Topic: wide characters - incomplete ?
Author: jjmochel@world.std.com (Jim S Jackl-Mochel)
Date: Thu, 10 Nov 1994 14:30:31 GMT
I have just finished reading PJP's book on the draft
standard of the C++ library.
I am in the midst of a project to produce an information retrieval
toolkit and one of our greatest challenges is support for
other languages.
At the start of the project I had naively assumed that wchar_t from
ANSI C would give me a guaranteed minimum character width that I could
convert characters to and from. This doesn't seem to be the case. After
struggling with this and other problems I hoped that the projected
changes for C++ would help. They may, but I admit that I can't tell
from the draft standard whether they do.
The two sticking points are:
[1] Storage
I need to store international characters in some standard
format. Unicode seems to be one of several possible answers
but it is not clear how to get ANY external format into
a wide char.
Am I misreading the C and C++ standards ?
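For reference, the one conversion path that ANSI C itself provides is mbstowcs(), which maps a multibyte string in the current locale's encoding to wchar_t values. A minimal sketch follows; to_wide is a hypothetical wrapper name, not something from either standard. Both the external multibyte encoding and the resulting wchar_t values are whatever the locale defines, so this does not by itself produce Unicode.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: convert the multibyte string `mb` (in the encoding
   of the current locale) into at most cap-1 wide characters in `out`,
   always null-terminating the result.
   Returns the number of wide characters written, excluding the terminator,
   or (size_t)-1 if cap is 0 or `mb` is not valid in the current locale. */
size_t to_wide(const char *mb, wchar_t *out, size_t cap)
{
    size_t n;

    if (cap == 0)
        return (size_t)-1;

    n = mbstowcs(out, mb, cap - 1);   /* locale-dependent conversion */
    if (n == (size_t)-1)
        return n;                     /* invalid multibyte sequence */

    out[n] = L'\0';                   /* n <= cap-1, so this is in bounds */
    return n;
}
```

Called as to_wide("abc", buf, 8) in the default "C" locale, this fills buf with three wide characters. What values non-ASCII input would map to is exactly the part the standard leaves to the implementation.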
[2] Internal Format
If there is no standard method for going between an
external and internal format this can be remedied by
having a clearly defined internal encoding or size.
Either option will allow me to write a translation
myself.
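The kind of hand-written translation meant here might look like the sketch below. Everything in it is an assumption made for illustration: the internal type ichar, the function name decode_be16, and the choice of an external format made of big-endian 16-bit code units stored in 8-bit bytes. It uses unsigned long rather than wchar_t only because unsigned long is guaranteed to be at least 32 bits wide.

```c
#include <stddef.h>

/* Hypothetical internal character type with a known minimum width. */
typedef unsigned long ichar;

/* Decode big-endian 16-bit code units from `src` (nbytes bytes long)
   into `dst`, writing at most `cap` characters.
   Returns the number of characters written. A trailing odd byte is
   ignored; no validation of the code values is attempted. */
size_t decode_be16(const unsigned char *src, size_t nbytes,
                   ichar *dst, size_t cap)
{
    size_t n = 0;

    while (nbytes >= 2 && n < cap) {
        /* High byte first, then low byte; mask in case bytes exceed 8 bits. */
        dst[n++] = ((ichar)(src[0] & 0xFF) << 8) | (ichar)(src[1] & 0xFF);
        src += 2;
        nbytes -= 2;
    }
    return n;
}
```

With a fixed internal type like this the translation is a few lines. The difficulty is that nothing in the standard lets the same code target wchar_t portably.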
But the guarantees for wchar_t seem to be non-utilitarian.
"wchar_t is an integer type that can represent all values
for all wide character encodings supported by the implementation"
So if the implementation supports no wide character encodings,
wchar_t can be no wider than a char.
So wchar_t seems to have no minimum size and its encoding is
implementation defined.
Again, am I misreading the standard ?
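Since the width is left to the implementation, about the only portable thing a program can do is ask. A small sketch, with hypothetical function names; it assumes the WCHAR_MAX macro from <wchar.h>, which an older library may not provide.

```c
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Number of bits in the object representation of wchar_t
   on this implementation. */
size_t wchar_bits(void)
{
    return sizeof(wchar_t) * CHAR_BIT;
}

/* Nonzero if wchar_t can represent every value in 0..0xFFFF,
   i.e. if it is wide enough to hold any 16-bit code unit. */
int wchar_holds_16bit(void)
{
    return (unsigned long)WCHAR_MAX >= 0xFFFFUL;
}
```

A toolkit could check wchar_holds_16bit() at startup and fall back to its own internal type where the answer is no. This only establishes that the width is sufficient; it says nothing about which encoding the implementation uses for wchar_t.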
My only hope seems to be something mentioned in PJP's
chapter on the standard C library: an amendment, cited as ISO94,
intended "to support reading, writing, and manipulation of large
character sets".
Does anybody know the state of this amendment to the Standard C
library ? Does it cover any minimal sizes, encodings, or any
other things that might help me ?
Thanks to all !
Jim Jackl-Mochel
jimjm@silverplatter.com
--
Jim Jackl-Mochel
jmochel@world.std.com