Topic: Unicode streams


Author: Daniel <Danne.Esset@gmail.com>
Date: Sat, 9 Jun 2007 20:49:00 CST
"Many users of C++ need to manipulate Unicode character strings.
Unfortunately, there is no C++ standard means to do so." (New
Character Types in C++)

Where do these Unicode strings come from?
In my experience, most of them come from I/O of some kind (files, the
console, or the network). Therefore I wonder why there should not be
any Unicode I/O support.

According to "Minimal Unicode support for the standard library"
(revision 3):
"The rationale for leaving out stream specializations of the two new
types was that streams of non-char types have not attracted wide
usage, so it is not clear that there is a real need for doubling the
number of specializations of this very complicated machinery."

One of the drawbacks of streams based on wchar_t is that different
implementations have not been required to use the same encoding,
which has limited their usage.

I agree that adding Unicode stream support would be too complicated,
but only if it must be based on the existing basic_stream machinery.
Supporting character types other than char there requires additional
overloads of every I/O function (both the stream functions and the
functions that stream data) for each character type, unless templates
are used.
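To illustrate the "unless templates are used" point: a single function template over the character type and traits can serve every stream type, instead of one overload per character type. This is a minimal sketch; the Point type is hypothetical and only there to have something to stream.

```cpp
#include <ostream>

// Hypothetical user-defined type, used only for illustration.
struct Point {
    int x;
    int y;
};

// One inserter template covers std::ostream, std::wostream, and any other
// basic_ostream instantiation, instead of a separate overload per CharT.
template <class CharT, class Traits>
std::basic_ostream<CharT, Traits>&
operator<<(std::basic_ostream<CharT, Traits>& os, const Point& p) {
    // widen() converts the narrow punctuation into the stream's CharT.
    return os << os.widen('(') << p.x << os.widen(',') << p.y
              << os.widen(')');
}
```

Writing Point{1, 2} to a std::ostringstream yields "(1,2)", and to a std::wostringstream yields L"(1,2)", from the same function.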


What I would like Unicode streams to support:
- The normal stream functions (setf(), good(), eof(), etc.).
- Different encodings (locale-dependent?, UTF-16, UTF-32) through the
same interface/class.
- I/O of all Unicode character types (so you can output a u16string
and later read it back into a u32string).
- I/O of char/wchar_t, as long as the handling of errors is specified
(e.g. trying to read Unicode characters that cannot be represented as
char/wchar_t).
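The last two wishes imply conversions the stream would perform internally. Here is a minimal sketch of both: decoding UTF-16 code units into UTF-32 (so text written as a u16string can be read into a u32string), and narrowing to char with an explicitly specified error policy. The function names and both policies (replacing unpaired surrogates with U+FFFD, throwing std::range_error for anything outside ASCII) are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode UTF-16 into UTF-32. A high surrogate followed by a low surrogate
// is combined into one code point; an unpaired surrogate becomes U+FFFD
// (an assumed error policy).
std::u32string utf16_to_utf32(const std::u16string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t c = in[i];
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            char32_t lo = in[++i];
            out.push_back(0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00));
        } else if (c >= 0xD800 && c <= 0xDFFF) {
            out.push_back(U'\uFFFD');  // unpaired surrogate
        } else {
            out.push_back(c);
        }
    }
    return out;
}

// Narrow UTF-32 to char, accepting only 7-bit ASCII. Anything else throws,
// so the failure is specified rather than silently mangled (assumed policy).
std::string narrow_ascii(const std::u32string& in) {
    std::string out;
    for (char32_t c : in) {
        if (c > 0x7F)
            throw std::range_error("code point not representable as char");
        out.push_back(static_cast<char>(c));
    }
    return out;
}
```

A stream could just as well substitute a placeholder character or set failbit instead of throwing; the point is only that the chosen behaviour is fixed and documented.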

By taking a slightly different implementation approach than for
wchar_t streams, Unicode streams can be implemented without resorting
to virtual "<<" and ">>" operators: use a stream buffer (streambuf)
based on char16_t, and perform the conversion to or from the external
encoding when the buffer is loaded or saved, i.e. on underflow/overflow.
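A minimal sketch of the output half of that idea, assuming UTF-16LE as the external encoding: code units collect in a small char16_t buffer, and only when it fills (overflow) or is flushed (sync) are they converted to bytes and written to an ordinary byte stream. The class name utf16le_outbuf is hypothetical; std::basic_streambuf and its protected members (setp, pbump, overflow, sync) are standard. Formatted output through std::basic_ostream<char16_t> is avoided here, because the standard library is not required to provide the char16_t locale facets it would need.

```cpp
#include <ostream>
#include <streambuf>

// Hypothetical char16_t stream buffer that converts to UTF-16LE bytes
// only when its internal buffer is emptied.
class utf16le_outbuf : public std::basic_streambuf<char16_t> {
public:
    explicit utf16le_outbuf(std::ostream& sink) : sink_(sink) {
        setp(buf_, buf_ + kSize);
    }
    ~utf16le_outbuf() override { sync(); }

protected:
    // Called when the put area is full: convert it, then store ch.
    int_type overflow(int_type ch) override {
        if (flush_buffer() == -1) return traits_type::eof();
        if (!traits_type::eq_int_type(ch, traits_type::eof())) {
            *pptr() = traits_type::to_char_type(ch);
            pbump(1);
        }
        return traits_type::not_eof(ch);
    }

    // Called on flush: convert whatever is pending.
    int sync() override {
        if (flush_buffer() == -1) return -1;
        sink_.flush();
        return sink_ ? 0 : -1;
    }

private:
    static constexpr int kSize = 64;

    // Write each pending code unit as two bytes, low byte first,
    // then reset the put area to empty.
    int flush_buffer() {
        for (char16_t* p = pbase(); p != pptr(); ++p) {
            sink_.put(static_cast<char>(*p & 0xFF));
            sink_.put(static_cast<char>((*p >> 8) & 0xFF));
        }
        setp(buf_, buf_ + kSize);
        return sink_ ? 0 : -1;
    }

    std::ostream& sink_;
    char16_t buf_[kSize];
};
```

Usage: construct it over a std::ofstream opened in binary mode and call sputn() with char16_t data. An input counterpart would do the reverse in underflow(), decoding bytes into the get area.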

The conversion can be implemented either with an abstract class that
performs the conversion (so the encoding can easily be changed) or by
creating different streambufs that implement the conversions, one of
which is selected when the stream is constructed.
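The first option might look like the following minimal sketch: an abstract encoder interface with one concrete class per encoding, chosen at run time. The class and function names are hypothetical, and the sketch assumes its input consists of valid Unicode scalar values (no surrogates, nothing above U+10FFFF).

```cpp
#include <string>

// Hypothetical abstract converter: one subclass per encoding.
class unicode_encoder {
public:
    virtual ~unicode_encoder() = default;
    // Append the byte sequence for one code point to out.
    virtual void encode(char32_t cp, std::string& out) const = 0;
};

class utf8_encoder : public unicode_encoder {
public:
    void encode(char32_t cp, std::string& out) const override {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
};

class utf16le_encoder : public unicode_encoder {
public:
    void encode(char32_t cp, std::string& out) const override {
        if (cp >= 0x10000) {  // needs a surrogate pair
            cp -= 0x10000;
            put_unit(static_cast<char16_t>(0xD800 + (cp >> 10)), out);
            put_unit(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)), out);
        } else {
            put_unit(static_cast<char16_t>(cp), out);
        }
    }

private:
    // Little-endian: low byte first.
    static void put_unit(char16_t u, std::string& out) {
        out += static_cast<char>(u & 0xFF);
        out += static_cast<char>(u >> 8);
    }
};

// The caller only sees the base class, so the encoding can be swapped.
std::string encode_all(const std::u32string& text,
                       const unicode_encoder& enc) {
    std::string out;
    for (char32_t cp : text) enc.encode(cp, out);
    return out;
}
```

With this design a stream buffer could hold a reference to a unicode_encoder and call it on overflow, rather than hard-coding a single encoding.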

Example usage:

uofstream myfile(ustream::utf16, "file.txt");
myfile << "This will be saved in UTF16 format" << endl;
myfile.close();

u32string text;
uifstream myfile2(ustream::utf16, "file.txt");
getline(myfile2, text); // text will contain "This will be saved in UTF16 format"



One could also consider creating the following class hierarchy:

class ustream : public ios?       // Unicode stream classes
class u8stream : public ustream   // UTF-8 encoding, if it would be useful
class u16stream : public ustream  // differs from ustream by choosing UTF-16 encoding by default
class u32stream : public ustream  // differs from ustream by choosing UTF-32 encoding by default

Then one could use the following code instead:

u16ofstream myfile("file.txt");
myfile << "This will be saved in UTF16 format" << endl;
myfile.close();

u32string text;
u16ifstream myfile2("file.txt"); // UTF-16 by default
getline(myfile2, text); // text will contain "This will be saved in UTF16 format"
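The hierarchy idea can be sketched in compilable form with the stream machinery left out: ustream carries its encoding, and the derived classes differ only in the default they pass to the base. The class names follow the post; the encoding enum and get_encoding() are hypothetical, and whether ustream should derive from ios is left open here, as it is in the post.

```cpp
// Hypothetical sketch: nothing here is standard library API.
enum class encoding { utf8, utf16, utf32 };

class ustream {
public:
    explicit ustream(encoding enc) : enc_(enc) {}
    virtual ~ustream() = default;
    encoding get_encoding() const { return enc_; }

private:
    encoding enc_;
};

// Each derived class only supplies a different default encoding,
// which the caller may still override.
class u8stream : public ustream {
public:
    explicit u8stream(encoding enc = encoding::utf8) : ustream(enc) {}
};

class u16stream : public ustream {
public:
    explicit u16stream(encoding enc = encoding::utf16) : ustream(enc) {}
};

class u32stream : public ustream {
public:
    explicit u32stream(encoding enc = encoding::utf32) : ustream(enc) {}
};
```

Because the derived classes add no behaviour of their own, the same effect could also be had with factory functions or a defaulted constructor argument on ustream; the subclasses mainly give the common cases a short name.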

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]