
Topic: Some questions to "Minimal Unicode support.." (N2207)

Author: "Daniel Krügler" <daniel.kruegler@googlemail.com>
Date: Fri, 30 Mar 2007 16:08:23 CST

In the currently available public second revision of "Minimal
Unicode support for the standard library", found at

http://www2.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2207.html

I found some points I would like to discuss here:

a) Naming proposal for std::basic_string specializations:
The proposed typedefs are std::ustring (charT=u16char_t)
and std::u32string (charT=u32char_t). I really don't like
the name chosen for the u16char_t variant, for several reasons:

- It does not follow the same naming scheme as the u32char_t
type.
- It scales badly if uchar8_t is *hopefully* added (I think that
UTF-8 is *very* important).

The proposal does this for a noble reason: less typing.
In this case I would strongly argue in favour of a regular,
longer name rather than a shorter one:

1) For the short-name freaks, ustring is not short enough,
and they will use their own shorter typedefs like "ustr",
"u16s" or similar.
2) Personally I would always ask: which one of 8, 16, or
32 was this ustring again? The selective preference for
u16 might be a wrong guess compared to uchar8_t and
could lead to annoying discussions about the "historical
development" of the name.
3) ustring is closer in name to wstring than
to u32string, although uchar16_t and uchar32_t have
more in common thematically. I predict much confusion due
to this contradiction.
4) Last but not least: it is a fact that later
generations often have to accept some inconveniences; a
somewhat longer name is one of the less frustrating
ones ;-)
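
To make the regular scheme argued for above concrete, here is a minimal sketch in which the code-unit width is always part of the typedef name. It assumes the element types are spelled char16_t and char32_t (as in C++11) rather than u16char_t/u32char_t, and it uses plain char for the UTF-8 variant; the namespace "sketch" and the exact names are illustrative, not taken from N2207.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical regular naming: every typedef states its code-unit width.
namespace sketch {
    typedef std::basic_string<char>     u8string;   // UTF-8 code units (assumption: plain char)
    typedef std::basic_string<char16_t> u16string;  // UTF-16 code units
    typedef std::basic_string<char32_t> u32string;  // UTF-32 code units
}

// Returns the number of code units stored, whatever their width.
template <class String>
std::size_t code_unit_count(const String& s) { return s.size(); }
```

With names like these, the question "which one of 8, 16, or 32 was this again?" is answered by the name itself.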

b) The section proposing the char_traits specializations
for the new character types mentions the types "ustreampos"
and "u32streampos", which are not explained. Most probably
these are typedefs for fpos<char_traits<char16_t>::state_type>
and fpos<char_traits<char32_t>::state_type>, which should
be explicitly mentioned as additions to the header <iosfwd>
synopsis.
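
The typedefs guessed at in point b) could look roughly like the following. This is only a sketch of the presumed intent, written against a C++11 library; the names are placed in a hypothetical namespace "sketch", and the paper itself does not confirm these definitions.

```cpp
#include <cassert>
#include <cwchar>
#include <ios>
#include <string>
#include <type_traits>

namespace sketch {
    // Presumed meaning of the unexplained names (assumption, not from N2207).
    typedef std::fpos<std::char_traits<char16_t>::state_type> ustreampos;
    typedef std::fpos<std::char_traits<char32_t>::state_type> u32streampos;
}

// In C++11 both state_type members are std::mbstate_t, so the two
// typedefs above name the same specialization of std::fpos.
inline bool streampos_types_coincide() {
    return std::is_same<sketch::ustreampos, sketch::u32streampos>::value;
}
```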

c) Again in the same char_traits section we have two
typos:

uint_least_16_t -> uint_least16_t
uint_least_32_t -> uint_least32_t
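
For reference, the corrected spellings are the ones declared in <cstdint>. The sketch below checks them against the int_type members of the char_traits specializations as they appear in C++11; that int_type is the place where N2207 used these names is an assumption, since the post does not say.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// C++11 specifies these int_type typedefs for the new character types.
static_assert(std::is_same<std::char_traits<char16_t>::int_type,
                           std::uint_least16_t>::value,
              "char16_t traits use uint_least16_t");
static_assert(std::is_same<std::char_traits<char32_t>::int_type,
                           std::uint_least32_t>::value,
              "char32_t traits use uint_least32_t");

// Converts a UTF-16 code unit to its integer representation.
inline std::uint_least16_t to_int(char16_t c) {
    return std::char_traits<char16_t>::to_int_type(c);
}
```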

Greetings from Bremen,

Daniel Krügler


---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]

Author: "Mathias Gaunard" <loufoque@gmail.com>
Date: Sun, 1 Apr 2007 14:39:42 CST

On Mar 31, 12:08 am, "Daniel Krügler" <daniel.krueg...@googlemail.com>
wrote:
> In the currently available public second revision of "Minimal
> Unicode support for the standard library", found at
>
> http://www2.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2207.html
>
> I found some points I would like to discuss here:
>
> a) Naming proposal for std::basic_string specializations:
> The proposed typedefs are std::ustring (charT=u16char_t)
> and std::u32string (charT=u32char_t) . I really don't like
> the choice for the u16char_t variant for several reasons:

A sequence of 16-bit code units, or even of 32-bit code points, is
certainly not a Unicode string to begin with.
For example, operations on a Unicode string shouldn't be able to
invalidate it, just as using a map can't invalidate its underlying
red-black tree.

And whatever you do, Unicode is still variable-width because of
grapheme clusters. So building a Unicode string on top of
basic_string is impossible unless you're willing to pay 32 + n*32
(with n fixed and big enough) bits per character.
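
A small sketch of the distinction drawn above, assuming a C++11 compiler with the u"" and U"" literal prefixes; the helper names are made up for illustration. It shows that one code point may need two UTF-16 code units, that one grapheme cluster may need two code points even in UTF-32, and that an ordinary basic_string operation can leave an ill-formed sequence behind.

```cpp
#include <cassert>
#include <string>

// U+1F600 lies outside the BMP: one code point, two UTF-16 code units.
inline std::u16string emoji_utf16() { return u"\U0001F600"; }
inline std::u32string emoji_utf32() { return U"\U0001F600"; }

// 'e' followed by U+0301 COMBINING ACUTE ACCENT: a single grapheme
// cluster that still occupies two code points in UTF-32.
inline std::u32string e_acute_decomposed() { return U"e\u0301"; }

// True for the first half of a UTF-16 surrogate pair.
inline bool is_high_surrogate(char16_t c) {
    return c >= 0xD800 && c <= 0xDBFF;
}

// substr() cuts at code-unit boundaries, so it can split a surrogate
// pair and return a string that is no longer well-formed UTF-16.
inline std::u16string first_unit(const std::u16string& s) {
    return s.substr(0, 1);
}
```

Here first_unit(emoji_utf16()) yields a lone high surrogate, which is the kind of invalidation a real Unicode string type would have to rule out.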

I believe all those "minimal support for Unicode" things are not only
inappropriate but also useless. If Unicode is to be supported, do it
well rather than with minimal (almost non-existent) and broken support.
