Topic: wide characters - incomplete?


Author: jjmochel@world.std.com (Jim S Jackl-Mochel)
Date: Thu, 10 Nov 1994 14:30:31 GMT
I have just finished reading P. J. Plauger's book on the draft
standard C++ library.

I am in the midst of a project to produce an information retrieval
toolkit and one of our greatest challenges is support for
other languages.

At the start of the project I had naively assumed that wchar_t from
ANSI C would give me a character type of some guaranteed minimum
width that I could convert characters to and from. This doesn't seem
to be the case. After struggling with this and other problems, I hoped
that the proposed changes for C++ would help. They may, but I admit
that I can't tell from the draft standard whether they do.

The two sticking points are:

[1] Storage

I need to store international characters in some standard
format. Unicode seems to be one of several possible answers,
but it is not clear how to get ANY external format into
a wchar_t.

Am I misreading the C and C++ standards?

[2] Internal Format

If there is no standard method for converting between an
external and an internal format, this could be remedied by
a clearly defined internal encoding or size. Either
guarantee would let me write the translation
myself.

But the guarantees for wchar_t seem too weak to be useful:

"wchar_t is an integer type that can represent all values
for all wide character encodings supported by the implementation"

So if the implementation supports no wide character encodings,
wchar_t can be as narrow as a char.

So wchar_t seems to have no minimum size beyond that of char,
and its encoding is implementation-defined.

Again, am I misreading the standard?


My only hope seems to be something mentioned in P. J. Plauger's
chapter on the Standard C library: an amendment (ISO94) "to support
reading, writing, and manipulation of large character sets".

Does anybody know the status of this amendment to the Standard C
library? Does it specify any minimum sizes or encodings, or anything
else that might help me?

Thanks to all!

Jim Jackl-Mochel
jimjm@silverplatter.com


--
Jim Jackl-Mochel
jmochel@world.std.com