Thread

Topic: Why are there no standardized string

Author: Thiago Macieira <thiago@macieira.org>
Date: Thu, 29 Jun 2017 22:31:45 -0700

On Thursday, 29 June 2017 00:34:59 PDT george.h.personal@gmail.com wrote:
> > Rule #1 about Unicode: If you think you know Unicode, then you do not know
> > enough about Unicode to make that determination.
> >
> > Corollary: If you think something in Unicode would be "easy to implement",
> > you do not know enough about Unicode to make that determination.
> >
> > Implementing a Unicode-aware `to_lowercase` is not a trivial matter. The
> > Unicode case conversion algorithm requires not just a large data table,
> > but some very oddball computations, as well as merging sequences of
> > characters (which itself suggests normalization).
>
> Ok, maybe some of the implementations wouldn't be trivial, at least not to
> someone like myself. However, literally every other programming language
> that sees heavy usage today except for C implements this functionality, so
> I would assume that a C++ dev skilled enough to work on the standard should
> be able to implement this.

Indeed. And like I said, there are suitable C libraries containing all the
relevant functionality we need. Library authors don't need to write all the
code, just use the libraries.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center

--
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/2585094.muaacRduSE%40tjmaciei-mobl1.

.

Author: Bengt Gustafsson <bengt.gustafsson@beamways.com>
Date: Mon, 3 Jul 2017 11:49:38 -0700 (PDT)

Isn't this code already inside the ctype towupper/towlower functions:

http://en.cppreference.com/w/cpp/string/wide/towupper

I would imagine that the stringwise tolower/toupper functions would just be
a loop around the corresponding character-wise functions. This brings out
the subject of defining the locale of the character set, which it seems
appropriate to have versions for explicitly and implicitly defining. Maybe
the somewhat messy nature of locales/facets and their scary application
wide settability is what has held proposals for this feature back?


.

Author: Thiago Macieira <thiago@macieira.org>
Date: Mon, 03 Jul 2017 12:02:32 -0700

On Monday, 3 July 2017 11:49:38 PDT Bengt Gustafsson wrote:
> Isn't this code already inside the ctype towupper/towlower functions:
>
> http://en.cppreference.com/w/cpp/string/wide/towupper

No. By the simple signature of the function, it's impossible:

 std::wint_t towupper( std::wint_t ch );

Compare to:

 QString("ß").toUpper()

which produces QString("SS").

> I would imagine that the stringwise tolower/toupper functions would just be
> a loop around the corresponding character-wise functions.

Yeah, that's the wrong implementation.
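
A minimal sketch of why the loop cannot work, assuming a hypothetical helper named `upper_per_char` (not part of any library). Since `std::towupper` maps one `wint_t` to one `wint_t`, a loop around it always yields a string of the same length as its input:

```cpp
#include <cwctype>
#include <string>

// Naive approach: uppercase each wchar_t independently with std::towupper.
// The mapping is strictly one-to-one, so the output length always equals
// the input length, whatever the locale does with individual characters.
std::wstring upper_per_char(const std::wstring& in) {
    std::wstring out;
    out.reserve(in.size());
    for (wchar_t ch : in) {
        out.push_back(static_cast<wchar_t>(
            std::towupper(static_cast<std::wint_t>(ch))));
    }
    return out;
}
```

For `L"stra\u00DFe"` the result still has six code units, so it can never be the seven-character "STRASSE".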

> This brings out
> the subject of defining the locale of the character set, which it seems
> appropriate to have versions for explicitly and implicitly defining. Maybe
> the somewhat messy nature of locales/facets and their scary application
> wide settability is what has held proposals for this feature back?

Right... there's the locale aspect. The Unicode case changing algorithms are
complex enough as it is.

And then there's Turkish, where towupper('i') = L'İ' and towlower('I') = L'ı'.
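
The Turkish case could be sketched roughly as follows. The function names and the `lang` parameter (a language tag such as "tr" or "en") are assumptions made for illustration, and only ASCII plus the dotted/dotless i pair is handled:

```cpp
#include <string>

// Hypothetical helpers, not a standard or library API.
// Turkish ("tr") and Azerbaijani ("az") tailor the mapping of i and I.
bool is_turkic(const std::string& lang) {
    return lang == "tr" || lang == "az";
}

// Simple (one-to-one) uppercase mapping for ASCII and dotless i only.
char32_t to_upper_tailored(char32_t c, const std::string& lang) {
    if (c == U'i') return is_turkic(lang) ? U'\u0130' : U'I';  // İ or I
    if (c == U'\u0131') return U'I';                           // ı -> I
    if (c >= U'a' && c <= U'z') return static_cast<char32_t>(c - U'a' + U'A');
    return c;  // everything else is left alone in this sketch
}

// Simple (one-to-one) lowercase mapping for ASCII only.
char32_t to_lower_tailored(char32_t c, const std::string& lang) {
    if (c == U'I') return is_turkic(lang) ? U'\u0131' : U'i';  // ı or i
    if (c >= U'A' && c <= U'Z') return static_cast<char32_t>(c - U'A' + U'a');
    return c;  // everything else is left alone in this sketch
}
```

The same input character therefore maps to different results depending on the language, which a signature taking only the character cannot express without global locale state.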

.

Author: Nicol Bolas <jmckesson@gmail.com>
Date: Mon, 3 Jul 2017 12:07:00 -0700 (PDT)

On Monday, July 3, 2017 at 2:49:38 PM UTC-4, Bengt Gustafsson wrote:
>
> Isn't this code already inside the ctype towupper/towlower functions:
>
> http://en.cppreference.com/w/cpp/string/wide/towupper
>

It depends on what you mean by "this code". If you mean "calling the
character functions", yes. If you mean "doing Unicode case conversion", no.

> I would imagine that the stringwise tolower/toupper functions would just be
> a loop around the corresponding character-wise functions.
>

You too have fallen victim to the Corollary to Rule #1 of Unicode.

Unicode case folding requires being able to convert one character into 2
characters as well as two characters into one.


> This brings out the subject of defining the locale of the character set,
> which it seems appropriate to have versions for explicitly and implicitly
> defining. Maybe the somewhat messy nature of locales/facets and their scary
> application wide settability is what has held proposals for this feature
> back?
>

.

Author: Fabio Fracassi <f.fracassi@gmx.net>
Date: Mon, 3 Jul 2017 21:19:40 +0200

On 03.07.17 21:02, Thiago Macieira wrote:
>  QString("ß").toUpper()
>
> which produces QString("SS").
>
>
As of last week we will need a new example for that, since the capital ß
has now been officially recognized. It was introduced into Unicode a few
years ago, but last week it was adopted as the correct capitalization for
ß instead of SS.

Currently the only non-German source I could find is Wikipedia:
https://en.wikipedia.org/wiki/Capital_%E1%BA%9E

best

Fabio

.

Author: Fabio Fracassi <f.fracassi@gmx.net>
Date: Mon, 3 Jul 2017 21:22:56 +0200

On 03.07.17 21:19, Fabio Fracassi wrote:
>
>
> On 03.07.17 21:02, Thiago Macieira wrote:
>>     QString("ß").toUpper()
>>
>> which produces QString("SS").
>>
>>
> As of last week we will need a new example for that, since the capital
> ß has now been officially recognized. It has already been introduced
> into unicode a few years ago, but last week it has been adopted as the
> correct capitalization for ß instead of SS.
>
Small (but important for maintainers) correction: as *A* correct
capitalization. SS will also remain a valid capitalization.

> Currently the only non German source I could find is Wikipedia
> https://en.wikipedia.org/wiki/Capital_%E1%BA%9E
>


> best
>
> Fabio
>

.

Author: Nicol Bolas <jmckesson@gmail.com>
Date: Mon, 3 Jul 2017 13:09:23 -0700 (PDT)

On Monday, July 3, 2017 at 3:23:01 PM UTC-4, Fabio Fracassi wrote:
>
> On 03.07.17 21:19, Fabio Fracassi wrote:
> >
> >
> > On 03.07.17 21:02, Thiago Macieira wrote:
> >>     QString("ß").toUpper()
> >>
> >> which produces QString("SS").
> >>
> >>
> > As of last week we will need a new example for that, since the capital
> > ß has now been officially recognized. It has already been introduced
> > into unicode a few years ago, but last week it has been adopted as the
> > correct capitalization for ß instead of SS.
> >
> Small (but important for maintainers) correction, as *A* correct
> capitalization, SS will also remain a valid capitalization.
>

So, did this actually change the Unicode case conversion tables? If not,
the primary point stands: the non-locale-based Unicode case conversion
algorithm still requires the original composition/decomposition.

.

Author: Fabio Fracassi <f.fracassi@gmx.net>
Date: Mon, 3 Jul 2017 22:16:41 +0200

On 03.07.17 22:09, Nicol Bolas wrote:
> On Monday, July 3, 2017 at 3:23:01 PM UTC-4, Fabio Fracassi wrote:
>
>     On 03.07.17 21:19, Fabio Fracassi wrote:
>     >
>     >
>     > On 03.07.17 21:02, Thiago Macieira wrote:
>     >>     QString("ß").toUpper()
>     >>
>     >> which produces QString("SS").
>     >>
>     >>
>     > As of last week we will need a new example for that, since the
>     > capital ß has now been officially recognized. It has already been
>     > introduced into unicode a few years ago, but last week it has been
>     > adopted as the correct capitalization for ß instead of SS.
>     >
>     Small (but important for maintainers) correction, as *A* correct
>     capitalization, SS will also remain a valid capitalization.
>
>
> So, did this actually change the Unicode case conversion tables? If
> not, the primary point stands: the non-locale-based Unicode case
> conversion algorithm still requires the original
> composition/decomposition.

I do not know.
The decision was made public just last week.
I do not know when, or even if, Unicode will follow.

.

Author: Nicol Bolas <jmckesson@gmail.com>
Date: Mon, 3 Jul 2017 13:45:36 -0700 (PDT)

On Monday, July 3, 2017 at 4:16:48 PM UTC-4, Fabio Fracassi wrote:
>
> On 03.07.17 22:09, Nicol Bolas wrote:
>
> On Monday, July 3, 2017 at 3:23:01 PM UTC-4, Fabio Fracassi wrote:
>>
>> On 03.07.17 21:19, Fabio Fracassi wrote:
>> >
>> >
>> > On 03.07.17 21:02, Thiago Macieira wrote:
>> >>     QString("ß").toUpper()
>> >>
>> >> which produces QString("SS").
>> >>
>> >>
>> > As of last week we will need a new example for that, since the capital
>> > ß has now been officially recognized. It has already been introduced
>> > into unicode a few years ago, but last week it has been adopted as the
>> > correct capitalization for ß instead of SS.
>> >
>> Small (but important for maintainers) correction, as *A* correct
>> capitalization, SS will also remain a valid capitalization.
>>
>
> So, did this actually change the Unicode case conversion tables? If not,
> the primary point stands: the non-locale-based Unicode case conversion
> algorithm still requires the original composition/decomposition.
>
> I do not know.
> The decision has been made public just last week.
> I do not know when or even if unicode will follow.
>

Oh, so this was just a German language decision, not a Unicode decision. So
until the Unicode specification and data tables themselves change, the
example is still valid.

.

Author: Thiago Macieira <thiago@macieira.org>
Date: Mon, 03 Jul 2017 22:16:56 -0700

On Monday, 3 July 2017 12:19:40 PDT Fabio Fracassi wrote:
> On 03.07.17 21:02, Thiago Macieira wrote:
> >  QString("ß").toUpper()
> >
> > which produces QString("SS").
>
> As of last week we will need a new example for that, since the capital ß
> has now been officially recognized. It has already been introduced into
> unicode a few years ago, but last week it has been adopted as the
> correct capitalization for ß instead of SS.
>
> Currently the only non German source I could find is Wikipedia
> https://en.wikipedia.org/wiki/Capital_%E1%BA%9E

Unicode 10.0 still says it's "SS":

ftp://ftp.unicode.org/Public/10.0.0/ucd/CaseFolding.txt has

00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
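
For readers who want to check such entries programmatically, here is a rough sketch of a parser for one data line in the `<code>; <status>; <mapping>; # <name>` layout shown above. The struct and function names are made up for illustration. Status `F` marks a full folding, whose mapping may be longer than one code point.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type for one CaseFolding.txt data line.
struct CaseFoldEntry {
    char32_t code = 0;              // source code point
    char status = 0;                // 'C', 'F', 'S' or 'T'
    std::vector<char32_t> mapping;  // one or more folded code points
};

// Parses "<code>; <status>; <mapping>; # <name>".
// Returns false for blank lines, comment lines and malformed input.
bool parse_case_fold_line(const std::string& line, CaseFoldEntry& out) {
    if (line.empty() || line[0] == '#') return false;

    std::istringstream fields(line);
    std::string code, status, mapping;
    if (!std::getline(fields, code, ';') ||
        !std::getline(fields, status, ';') ||
        !std::getline(fields, mapping, ';')) {
        return false;
    }

    CaseFoldEntry entry;
    unsigned long value = 0;

    std::istringstream code_in(code);
    if (!(code_in >> std::hex >> value)) return false;
    entry.code = static_cast<char32_t>(value);

    std::istringstream status_in(status);
    if (!(status_in >> entry.status)) return false;  // skips leading spaces

    // The mapping field is a space-separated list of hex code points.
    std::istringstream mapping_in(mapping);
    while (mapping_in >> std::hex >> value) {
        entry.mapping.push_back(static_cast<char32_t>(value));
    }
    if (entry.mapping.empty()) return false;

    out = entry;
    return true;
}
```

Applied to the line quoted above, this yields code point U+00DF, status `F`, and a two-element mapping of U+0073 U+0073.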


.

Author: Thiago Macieira <thiago@macieira.org>
Date: Mon, 03 Jul 2017 22:26:50 -0700

On Monday, 3 July 2017 12:19:40 PDT Fabio Fracassi wrote:
> On 03.07.17 21:02, Thiago Macieira wrote:
> >  QString("ß").toUpper()
> >
> > which produces QString("SS").
>
> As of last week we will need a new example for that, since the capital ß
> has now been officially recognized. It has already been introduced into
> unicode a few years ago, but last week it has been adopted as the
> correct capitalization for ß instead of SS.
>
> Currently the only non German source I could find is Wikipedia
> https://en.wikipedia.org/wiki/Capital_%E1%BA%9E

As for other examples, these are the only other Latin script case conversions
that expand in length and do not involve a combining character:

 "ẚ" -> "Aʾ"
 "ﬀ" -> "FF"
 "ﬁ" -> "FI"
 "ﬂ" -> "FL"
 "ﬃ" -> "FFI"
 "ﬄ" -> "FFL"
 "ﬅ" -> "ST"
 "ﬆ" -> "ST"

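The expansions listed above, together with the earlier ß example, can be sketched as a string-level conversion whose output may be longer than its input. The function name `to_upper_sketch` is hypothetical, and the table covers only these entries plus ASCII; a real implementation would consult the full Unicode data tables.

```cpp
#include <string>
#include <unordered_map>

// Sketch only: uppercases ASCII a-z and expands the handful of one-to-many
// mappings mentioned in this thread. All other code points pass through.
std::u32string to_upper_sketch(const std::u32string& in) {
    static const std::unordered_map<char32_t, std::u32string> expansions = {
        {U'\u00DF', U"SS"},        // LATIN SMALL LETTER SHARP S
        {U'\u1E9A', U"A\u02BE"},   // LATIN SMALL LETTER A WITH RIGHT HALF RING
        {U'\uFB00', U"FF"},
        {U'\uFB01', U"FI"},
        {U'\uFB02', U"FL"},
        {U'\uFB03', U"FFI"},
        {U'\uFB04', U"FFL"},
        {U'\uFB05', U"ST"},
        {U'\uFB06', U"ST"},
    };

    std::u32string out;
    for (char32_t c : in) {
        auto it = expansions.find(c);
        if (it != expansions.end()) {
            out += it->second;  // one code point becomes several
        } else if (c >= U'a' && c <= U'z') {
            out += static_cast<char32_t>(c - U'a' + U'A');
        } else {
            out += c;
        }
    }
    return out;
}
```

Because the function works on whole strings, it can return "STRASSE" for "straße", which a per-character interface cannot.
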
.