Topic: Some questions to "Minimal Unicode support.." (N2207)
Author: "Daniel Krügler" <daniel.kruegler@googlemail.com>
Date: Fri, 30 Mar 2007 16:08:23 CST
In the currently publicly available 2nd revision of "Minimal
Unicode support for the standard library", found at
http://www2.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2207.html
I found some points I would like to discuss here:
a) Naming proposal for std::basic_string specializations:
The proposed typedefs are std::ustring (charT=u16char_t)
and std::u32string (charT=u32char_t) . I really don't like
the choice for the u16char_t variant for several reasons:
- It is not a regular naming scheme compared to the u32char_t
type.
- It scales badly if uchar8_t is *hopefully* added (I think that
UTF-8 is *very* important).
The proposal does this for a noble reason: less typing.
In this case I would strongly argue in favour of a regular,
longer name over a shorter one:
1) For the short-name freaks ustring is not short enough
and they will use their own shorter typedefs like "ustr",
"u16s" or similar.
2) Personally I would always ask: Which one of 8, 16, or
32 was this ustring again? Singling out u16 for the short
name might turn out to be a wrong guess compared to uchar8_t
and could lead to annoying discussions about the "historical
development" of the name.
3) ustring has more naming similarities to wstring than
to u32string, although uchar16_t and uchar32_t have
more thematic overlap. I predict much confusion due
to these contradictions.
4) Last but not least: It's a fact that later generations
often have to accept some inconveniences - a
somewhat longer name is one of the less frustrating
ones ;-)
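To make the regular scheme argued for in (a) concrete, here is a minimal sketch. It assumes a compiler that provides char16_t and char32_t as distinct types (the spelling used in the char_traits section of the paper). The namespace "sketch", the name u8string, the use of plain char for UTF-8, and the helper code_unit_size are my own illustrative assumptions, not part of the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical namespace: keeps the illustrative names out of namespace std.
namespace sketch {
    // Regular scheme: every typedef carries the code-unit width in its name.
    typedef std::basic_string<char>     u8string;   // assumption: UTF-8 code units stored in plain char
    typedef std::basic_string<char16_t> u16string;
    typedef std::basic_string<char32_t> u32string;

    // Size in bytes of one code unit of string type S (illustrative helper).
    template <class S>
    constexpr std::size_t code_unit_size() {
        return sizeof(typename S::value_type);
    }
}

// The name alone tells the reader which code unit type is stored.
static_assert(std::is_same<sketch::u16string::value_type, char16_t>::value,
              "u16string stores char16_t code units");
static_assert(std::is_same<sketch::u32string::value_type, char32_t>::value,
              "u32string stores char32_t code units");
```

With names like these, the question "which one of 8, 16, or 32 was this again?" is answered by the typedef itself.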
b) The section proposing the char_traits specializations
of the new character types mentions the types "ustreampos"
and "u32streampos", which are not explained. Most probably
these are typedefs for fpos<char_traits<char16_t>::state_type>
and fpos<char_traits<char32_t>::state_type>, which should
be explicitly mentioned as additions to the header <iosfwd>
synopsis.
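The typedefs guessed at in (b) could be written roughly as follows. This is only a sketch of my reading, not text from the paper: the namespace "sketch" is hypothetical, and the check at the end assumes a C++11 library, where char_traits<char16_t>::pos_type is defined in terms of fpos in just this way.

```cpp
#include <cassert>
#include <ios>          // std::fpos
#include <string>       // std::char_traits
#include <type_traits>

// Hypothetical namespace: keeps the illustrative names out of namespace std.
namespace sketch {
    // Guess at what the paper's "ustreampos" and "u32streampos" stand for.
    typedef std::fpos<std::char_traits<char16_t>::state_type> ustreampos;
    typedef std::fpos<std::char_traits<char32_t>::state_type> u32streampos;
}

// On a C++11 library the traits' pos_type should be this same fpos instantiation.
static_assert(std::is_same<sketch::ustreampos,
                           std::char_traits<char16_t>::pos_type>::value,
              "pos_type for char16_t is fpos over the traits' state_type");
static_assert(std::is_same<sketch::u32streampos,
                           std::char_traits<char32_t>::pos_type>::value,
              "pos_type for char32_t is fpos over the traits' state_type");
```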
c) Again in the same char_traits section we have two
typos:
uint_least_16_t -> uint_least16_t
uint_least_32_t -> uint_least32_t
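For reference, the corrected spellings are the names that <cstdint> (C99 <stdint.h>) provides. A small check, assuming a C++11 compiler where char16_t and char32_t have the same size as these types:

```cpp
#include <cassert>
#include <climits>      // CHAR_BIT
#include <cstdint>      // std::uint_least16_t, std::uint_least32_t
#include <type_traits>

// The least-width types are unsigned and at least as wide as their names say.
static_assert(std::is_unsigned<std::uint_least16_t>::value, "uint_least16_t is unsigned");
static_assert(sizeof(std::uint_least16_t) * CHAR_BIT >= 16, "at least 16 bits");
static_assert(sizeof(std::uint_least32_t) * CHAR_BIT >= 32, "at least 32 bits");

// In C++11 the new character types share size with these types.
static_assert(sizeof(char16_t) == sizeof(std::uint_least16_t), "same size as uint_least16_t");
static_assert(sizeof(char32_t) == sizeof(std::uint_least32_t), "same size as uint_least32_t");
```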
Greetings from Bremen,
Daniel Krügler
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html ]
Author: "Mathias Gaunard" <loufoque@gmail.com>
Date: Sun, 1 Apr 2007 14:39:42 CST
On Mar 31, 12:08 am, "Daniel Krügler" <daniel.krueg...@googlemail.com>
wrote:
> In the current public available 2nd revision of "Minimal
> Unicode support for the standard library", found at
>
> http://www2.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2207.html
>
> I found some points I would like to discuss here
>
> a) Naming proposal for std::basic_string specializations:
> The proposed typedefs are std::ustring (charT=u16char_t)
> and std::u32string (charT=u32char_t) . I really don't like
> the choice for the u16char_t variant for several reasons:
A sequence of 16-bit code units, or even of 32-bit code points, is
certainly not a Unicode string to begin with.
For example, operations done on a Unicode string shouldn't invalidate
the Unicode string. Just like when I use a map, I can't invalidate the
red-black tree.
And whatever you do, Unicode is still variable-width because of
grapheme clusters. So building a Unicode string on top of
basic_string is impossible unless you're willing to pay 32 + n*32
(with n fixed and big enough) bits per character.
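A small sketch of what "variable-width" means at each level. It assumes a C++11 compiler and uses the std::u16string and std::u32string names from C++11 rather than the typedef names in the paper; the helper functions are hypothetical and only for illustration. One code point outside the BMP takes two UTF-16 code units, and one user-perceived character ("e" followed by a combining acute accent) takes two code points even in UTF-32.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code points in a well-formed UTF-16 string by skipping
// low (trailing) surrogates, which never start a code point.
inline std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t unit : s) {
        if (unit < 0xDC00 || unit > 0xDFFF) {
            ++n;
        }
    }
    return n;
}

// U+1D11E (musical G clef): one code point, stored as a surrogate pair in UTF-16.
inline std::u16string clef_utf16() { return u"\U0001D11E"; }

// "e" + U+0301 (combining acute): one grapheme cluster, two code points.
inline std::u32string e_acute_utf32() { return U"e\u0301"; }
```

Here clef_utf16().size() reports two code units for a single code point, and e_acute_utf32().size() reports two code points for what a reader sees as a single character, so neither size() gives a "character" count.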
I believe all those "minimal support for Unicode" things are not only
inappropriate but also useless. If Unicode is to be supported, do it
well rather than with minimal (almost non-existent) and broken support.