Topic: Unicode support by extending std::locale. Can we make it by 2020?


Author: Tom Honermann <tom@honermann.net>
Date: Wed, 28 Mar 2018 10:37:45 -0400

On 03/28/2018 10:18 AM, Nicol Bolas wrote:
>
>
> On Wednesday, March 28, 2018 at 8:45:40 AM UTC-4, Dimitrij Mijoski wrote:
>
>
>       Unicode support by extending std::locale. Can we make it by 2020?
>
>     Standard Unicode support has been requested many times and I
>     won't get into it. It is very obvious that we need it.
>
>     Goals:
>
>       * No new string class
>       * No new character type
>
> We kinda need a new character type. Unless you really like having to
> use `u8path` every time you want to use a UTF-8 string with
> `std::filesystem::path`. Having a type that says, "I'm really a UTF-8
> string" is important.

Obligatory links:
- [WG21 P0482]: char8_t: A type for UTF-8 characters and strings
  http://wg21.link/p0482
- [WG14 N2231]: char8_t: A type for UTF-8 characters and strings
  http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2231.htm

P0482 was approved by LEWG and EWG, and reviewed by CWG in
Jacksonville. I hope to see it make C++20 in either Rapperswil or San
Diego.

>       * Reuse std::locale and facet interfaces
>
> For the love of God, /why?!/

I think a friendlier tone could be used here. Nicol, *please* put more
effort into being more welcoming to contributors. Not everyone has your
experience or expertise. You have been singled out for inappropriate
comments many times. Do better.

That being said, I do agree that std::locale and facets have fundamental
deficiencies that likely make them ill-suited for continued use as
building blocks.

>
> I'm being serious: why would we want to compound the mistakes of
> `std::locale` by trying to improve it? It was a bad idea; let it die.
>
>       * Follow best practices and see how Linux and POSIX handle locales.
>       * Follow library ICU.
>       * See boost::locale which extends std::locale.
>       * Use bottom up approach while designing. First define low level
>         stuff (facets), then their use (e.g. in iostreams).
>
>     <snip>
>
>
>           Conclusion so far
>
>     Specifying the above facets is the absolute minimum for
>     decent Unicode support.
>
>
> How would this deal with Unicode case conversions which work based,
> not on letters, but /strings/? That is, a single lowercase codepoint
> converts into two uppercase ones, or vice-versa? How would this handle
> Unicode titlecase? And so forth.
>
> This is not "decent Unicode support" even if you ignore having to deal
> with `std::locale`'s garbage.
>
>     More advanced Unicode features like:
>
>      1. Querying character properties like general category
>      2. Language sensitive string case transformations (not character)
>      3. Normalization
>
>     Will need facets on their own.
>
> --
> You received this message because you are subscribed to the Google
> Groups "ISO C++ Standard - Future Proposals" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to std-proposals+unsubscribe@isocpp.org.
> To post to this group, send email to std-proposals@isocpp.org.
> To view this discussion on the web visit
> https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/c28eab4e-b5d6-455f-aee7-3e212a00b135%40isocpp.org





Author: Димитриј Мијоски <dim.mj.p@gmail.com>
Date: Wed, 28 Mar 2018 17:11:41 +0200

AFAIK Unicode defines both simple 1-to-1 case transformations at the
character level and more complex language-sensitive and context-sensitive
case transformations at the string level. You cannot care only about the
second and throw away the first. ctype<char32_t> would handle the first.

On Wed, Mar 28, 2018 at 4:55 PM, <martinho.fernandes@native-instruments.de>
wrote:

> I find that there are several important issues in this proposal that need
> to be addressed.
>
> On Wednesday, March 28, 2018 at 2:45:40 PM UTC+2, Dimitrij Mijoski wrote:
>>
>> Goals:
>>
>>    - [...]
>>    - Reuse std::locale and facet interfaces
>>
> This goal is contradictory with the goal of Unicode support. Several of
> these interfaces are simply not suitable for Unicode support (and IMO
> should be deprecated). Some of the textbook counterexamples are right there
> in the code sample provided. `charT toupper(charT, locale)` just cannot
> possibly correctly uppercase `ß` into `SS` (which AFAIK is still the
> correct way to uppercase this according to the CLDR locales). Even if the
> locale is changed so that it uppercases to `ẞ` (following last year's
> decision of the Rat für deutsche Rechtschreibung, which makes it an option,
> but not a requirement), it's still impossible to uppercase some hundred
> other characters, e.g. U+01F0 LATIN SMALL LETTER J WITH CARON. The
> fundamental assumption existing in the locale interface is that case
> mapping is a 1:1 mapping, but that isn't true.
>
>>
>>    - See boost::locale which extends std::locale.
>>
> Note that the Boost.Locale documentation even acknowledges the problem I
> described above: "You may notice that there are existing functions to_upper
> and to_lower in the Boost.StringAlgo library. The difference is that these
> function operate over an entire string instead of performing incorrect
> character-by-character conversions."
>
>
>>    - Unicode - a standard that combines ~1 million characters into a
>>    single set, maps each character to a unique integer, and defines a
>>    couple of encodings, namely UTF-32, UTF-16, and UTF-8. It then defines
>>    the byte serializations of UTF-16 and UTF-32 as UTF-16BE, UTF-16LE,
>>    UTF-32BE and UTF-32LE.
>>
> Nitpick: Note that the Unicode Standard defines a lot more than
> characters and encodings.
>
>
>> 3. Future proposal
>>
>>
>> ctype<char32_t>
>>
>> We should completely avoid this gotcha and make ctype<char32_t> work out
>> of the box for the whole Unicode range. The locale name should modify only
>> the widen() and narrow() functions.
>>
>>
> As mentioned above, this is not enough because the interface itself is
> unsuitable for this purpose.
>
>
>> Defining this facet will automatically enable decent Unicode regexes.
>>
>
> This is really debatable. The only thing that `char32_t` gives is the
> ability to match on code points instead of matching on code units (which is
> a disaster with `char16_t` and `char`). However, this isn't enough for even
> regular expression Level 1 Conformance, because the facilities in <regex>
> are currently unsuited for this purpose.
>
>
>> ctype<char16_t>
>>
>> This should behave exactly the same as the above, except that it will accept
>> only the first 65536 characters of Unicode, i.e. characters from the basic
>> multilingual plane (BMP).
>>
>
> This is just designing for deprecation. This ctype would prove entirely
> useless for UTF-16, for example. It's essentially UCS-2-only. Pretending
> UCS-2 is relevant is the same kind of mistake that <codecvt> made. This
> isn't "Unicode support"; it's "Unicode subset support". It's wishful
> thinking that people don't use, e.g. the Supplementary Ideographic Plane,
> mathematical symbols, or, heck, emoji. Let's not do that again.
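
To make the UCS-2 limitation concrete, here is a minimal sketch (not from the thread) of UTF-16 encoding for a single code point. The helper name `to_utf16` is hypothetical. Any code point above U+FFFF becomes a surrogate pair, i.e. two `char16_t` code units, so a facet that classifies one `char16_t` at a time never sees the whole character.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode scalar value as UTF-16.
std::u16string to_utf16(char32_t cp) {
    // Surrogate code points and values above U+10FFFF are not scalar values.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        // BMP: a single code unit, the only case UCS-2 can represent.
        return std::u16string(1, static_cast<char16_t>(cp));
    }
    // Supplementary planes: split the 20-bit offset into a surrogate pair.
    const char32_t offset = cp - 0x10000;
    std::u16string out;
    out += static_cast<char16_t>(0xD800 + (offset >> 10));   // high surrogate
    out += static_cast<char16_t>(0xDC00 + (offset & 0x3FF)); // low surrogate
    return out;
}
```

For example, `to_utf16(0x1F600)` (an emoji) yields the two code units 0xD83D 0xDE00, neither of which is a character on its own.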



Author: martinho.fernandes@native-instruments.de
Date: Wed, 28 Mar 2018 08:28:25 -0700 (PDT)

Ooops, I forgot to reply to the list.


On Wednesday, March 28, 2018 at 5:11:44 PM UTC+2, Dimitrij Mijoski wrote:
>
> AFAIK Unicode defines both simple 1-to-1 case transformations at character
> level and more complex language sensitive and context sensitive case
> transformations at string level. You can not just care about the second and
> throw away the first. The ctype<char32_t> would handle the first.
>
>

But 1:1 mapping isn't language-sensitive, just like you said. std::locale
interfaces like `charT toupper(charT, locale)` are supposed to be
language-sensitive, and have fundamentally broken assumptions.

Arguably the simple case mappings are also not important for "Unicode
support": the Unicode standard defines toUppercase and toLowercase
operations in section 3.13, R1 and R2. They both use the full case mappings
and not the simple ones. The simple ones are just a poor fallback for when
one doesn't have full casing support. It's a poor fallback because they
case characters with diacritics inconsistently: LATIN SMALL LETTER
A WITH GRAVE uppercases to LATIN CAPITAL LETTER A WITH GRAVE, but the
aforementioned LATIN SMALL LETTER J WITH CARON doesn't uppercase to LATIN
CAPITAL LETTER J WITH CARON.
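
The difference between simple and full mappings can be sketched as follows. This is a hand-written illustration, not ICU or any proposed interface: `simple_upper` and `full_upper` are hypothetical names, and the tables cover only the characters mentioned in this thread, whereas real code would be generated from UnicodeData.txt and SpecialCasing.txt.

```cpp
#include <cassert>
#include <string>

// Simple (1:1) uppercase mapping: a tiny, hypothetical subset of the
// data in UnicodeData.txt.
char32_t simple_upper(char32_t c) {
    assert(c <= 0x10FFFF); // code points end at U+10FFFF
    if (c >= U'a' && c <= U'z') return static_cast<char32_t>(c - U'a' + U'A');
    if (c == U'\u00E0') return U'\u00C0'; // a with grave -> A with grave
    return c; // U+00DF and U+01F0 have no simple uppercase mapping
}

// Full uppercase mapping: one code point may expand to several, so the
// result has to be a string. The two special entries follow SpecialCasing.txt.
std::u32string full_upper(const std::u32string& in) {
    std::u32string out;
    for (char32_t c : in) {
        switch (c) {
        case U'\u00DF': out += U"SS"; break;      // sharp s -> S S
        case U'\u01F0': out += U"J\u030C"; break; // j with caron -> J + combining caron
        default:        out += simple_upper(c); break;
        }
    }
    return out;
}
```

The return types carry the argument: `simple_upper` hands back a single `char32_t`, just like `charT toupper(charT, locale)`, so it has nowhere to put the second code point.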




Author: Димитриј Мијоски <dim.mj.p@gmail.com>
Date: Wed, 28 Mar 2018 17:42:56 +0200

1. For the sake of backward compatibility and easier porting,
ctype<char32_t> should be provided. For that same reason ICU provides
u_islower(), u_isupper(), u_tolower(), etc., even though Unicode provides
more fine-grained character classification.
2. For the more fine-grained character classification, other facets or free
functions may be added.
3. The simple case transformations are part of UnicodeData.txt and are good
enough for a large number of languages, so they will serve the purpose.

Sometimes you may even do this:

if (language uses simple case transformations completely) {
   use an algorithm based on simple case transformations;
   it should be faster
}
else {
   use string case transformations
}
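
The branch above might look like the following sketch. Everything here is an assumption for illustration: the thread does not say how to decide that a language "uses simple case transformations completely", so the caller passes that decision in as a flag, and the full string-level algorithm is passed in as a callback. `upper_with_fast_path`, `simple_upper_ascii` and `FullMapper` are hypothetical names, and the simple mapping handles ASCII only.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Stand-in for a table-driven simple (1:1) mapping; ASCII only.
char32_t simple_upper_ascii(char32_t c) {
    return (c >= U'a' && c <= U'z') ? static_cast<char32_t>(c - U'a' + U'A') : c;
}

// Hypothetical callback type for the full, string-level algorithm.
using FullMapper = std::function<std::u32string(const std::u32string&)>;

std::u32string upper_with_fast_path(const std::u32string& s,
                                    bool language_is_simple,
                                    const FullMapper& full_upper) {
    if (language_is_simple) {
        // Fast path: map in place, the output has the same length as the input.
        std::u32string out(s);
        for (char32_t& c : out) c = simple_upper_ascii(c);
        return out;
    }
    // Slow path: defer to the string-level mapping, which may change the length.
    assert(full_upper != nullptr); // an empty callback cannot be invoked
    return full_upper(s);
}
```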




Author: Viacheslav Usov <via.usov@gmail.com>
Date: Wed, 28 Mar 2018 17:45:45 +0200
Raw View
--94eb2c061de684574d05687ae6d0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Wed, Mar 28, 2018 at 5:11 PM, Димитриј Мијоски <dim.mj.p@gmail.com>
wrote:

> AFAIK Unicode defines both simple 1-to-1 case transformations at
> character level

Untrue in general. See, for example,
http://www.unicode.org/Public/UNIDATA/CaseFolding.txt

"Simple" conversions are given in addition to "full" ones only in some cases.

Cheers,
V.
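
For readers unfamiliar with the file linked above: each data line of CaseFolding.txt has the form `code; status; mapping; # name`, where the status is C (common), S (simple), F (full), or T (Turkic special case). A rough C++17 sketch of reading one such line is given below. `FoldEntry` and `parse_fold_line` are hypothetical names introduced here, and malformed hexadecimal fields are not handled (`std::stoul` would throw).

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// One parsed data line of CaseFolding.txt (hypothetical helper type).
struct FoldEntry {
    char32_t code;                  // source code point
    char status;                    // 'C', 'S', 'F' or 'T'
    std::vector<char32_t> mapping;  // one or more target code points
};

// Returns std::nullopt for blank lines and comment-only lines.
std::optional<FoldEntry> parse_fold_line(const std::string& line) {
    // Drop the trailing "# name" comment, if any.
    const std::string data = line.substr(0, line.find('#'));

    std::istringstream fields(data);
    std::string code, status, mapping;
    if (!std::getline(fields, code, ';') ||
        !std::getline(fields, status, ';') ||
        !std::getline(fields, mapping, ';'))
        return std::nullopt;

    const auto pos = status.find_first_not_of(' ');
    if (pos == std::string::npos) return std::nullopt;

    FoldEntry entry{};
    entry.code = static_cast<char32_t>(std::stoul(code, nullptr, 16));
    entry.status = status[pos];

    // The mapping field is a space-separated list of hexadecimal code points.
    std::istringstream cps(mapping);
    std::string cp;
    while (cps >> cp)
        entry.mapping.push_back(static_cast<char32_t>(std::stoul(cp, nullptr, 16)));
    return entry;
}
```

An entry whose `mapping` holds more than one code point is exactly the kind that a 1-to-1 transformation cannot represent.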

--94eb2c061de684574d05687ae6d0--

.


Author: Tony V E <tvaneerd@gmail.com>
Date: Wed, 28 Mar 2018 12:09:28 -0400
Raw View
Yes, we could add this partial level of support.
But there are people currently working on full Unicode support (or "fuller"; hard to say if complete support is possible).

So would we want this partial support, that _might_ make C++20, or full support that will probably make C++23?

Sent from my BlackBerry portable Babbage Device

From: Димитриј Мијоски
Sent: Wednesday, March 28, 2018 11:42 AM
To: std-proposals@isocpp.org
Reply To: std-proposals@isocpp.org
Subject: Re: [std-proposals] Re: Unicode support by extending std::locale. Can we make it by 2020?

> 1. For the sake of backward compatibility and easier porting, ctype<char32_t> should be provided. For that same reason ICU provides u_islower(), u_isupper(), etc. u_tolower() even if Unicode provides more fine grained character classification.
> 2. For the more fine grained character classification other facets or free functions may be added.
> 3. The simple case transformations are part of UnicodeData.txt and are good enough for large number of languages and they will serve the purpose.
>
> Sometimes you may even do this:
>
> if (language uses simple case transformations completely) {
>    algorithm that uses
>    simple case transformation, should be faster
> }
> else {
>    use string case
> }

.


Author: Thiago Macieira <thiago@macieira.org>
Date: Thu, 29 Mar 2018 22:30:07 -0700
Raw View
On Wednesday, 28 March 2018 08:42:56 PDT Димитриј Мијоски wrote:
> if (language uses simple case transformations completely) {

That condition evaluates to a constant false. There are characters with no
simple transformation in the language-independent section. Therefore, all
languages have at least one complex transformation.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center




.


Author: Димитриј Мијоски <dim.mj.p@gmail.com>
Date: Fri, 30 Mar 2018 13:16:00 +0000
Raw View
--001a114a857a422fc80568a10bdd
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

I don't understand this. Can you show examples?

On Fri, 30 Mar 2018 at 07:30, Thiago Macieira <thiago@macieira.org> wrote:

> On Wednesday, 28 March 2018 08:42:56 PDT Димитриј Мијоски wrote:
> > if (language uses simple case transformations completely) {
>
> That condition evaluates to a constant false. There are characters with no
> simple transformation in the language-independent section. Therefore, all
> languages have at least one complex transformation.
>
> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
>    Software Architect - Intel Open Source Technology Center

--001a114a857a422fc80568a10bdd--

.


Author: rmf@rmf.io
Date: Fri, 30 Mar 2018 06:27:54 -0700 (PDT)
Raw View
------=_Part_6065_166432053.1522416474907
Content-Type: text/plain; charset="UTF-8"

U+FB00 LATIN SMALL LIGATURE FF would be a prime example, I think.


------=_Part_6065_166432053.1522416474907--

.


Author: Thiago Macieira <thiago@macieira.org>
Date: Fri, 30 Mar 2018 08:42:52 -0700
Raw View
On Friday, 30 March 2018 06:16:00 PDT Димитриј Мијоски wrote:
> I don't understand this. Can you show examples?

Uppercasing of ß to SS

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center




.


Author: Димитриј Мијоски <dim.mj.p@gmail.com>
Date: Fri, 30 Mar 2018 16:05:36 +0000
Raw View
--94eb2c12500cc3e14f0568a3692c
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

I don't see how the sharp s is language independent. It is used only in
German.

On Fri, 30 Mar 2018 at 17:42, Thiago Macieira <thiago@macieira.org> wrote:

> On Friday, 30 March 2018 06:16:00 PDT Димитриј Мијоски wrote:
> > I don't understand this. Can you show examples?
> >
>
> Uppercasing of ß to SS
>
> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
>    Software Architect - Intel Open Source Technology Center

--94eb2c12500cc3e14f0568a3692c--

.


Author: Thiago Macieira <thiago@macieira.org>
Date: Fri, 30 Mar 2018 10:35:47 -0700
Raw View
On Friday, 30 March 2018 09:05:36 PDT Димитриј Мијоски wrote:
> I don't see how the sharp s is language independent. It is used only in
> German.

It's irrelevant which language uses the character. The rule is
language-independent, therefore it applies to all languages. Any application
failing to apply the rule is by definition broken. So please don't suggest
people write broken code.

CaseFolding.txt:
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
0149; F; 02BC 006E; # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
FB00; F; 0066 0066; # LATIN SMALL LIGATURE FF
FB01; F; 0066 0069; # LATIN SMALL LIGATURE FI
FB02; F; 0066 006C; # LATIN SMALL LIGATURE FL
FB03; F; 0066 0066 0069; # LATIN SMALL LIGATURE FFI
FB04; F; 0066 0066 006C; # LATIN SMALL LIGATURE FFL
FB05; F; 0073 0074; # LATIN SMALL LIGATURE LONG S T
FB06; F; 0073 0074; # LATIN SMALL LIGATURE ST

etc. (and those are just Latin script and not including combining characters)

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center
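
The F entries quoted above each map one code point to several, which is why a char-to-char interface cannot apply them. A minimal C++17 sketch of applying just this excerpt follows. `full_fold_excerpt` and `fold_excerpt` are hypothetical names, the table holds only the nine rows listed above, and lowercasing ASCII A-Z stands in for the rest of the file's common mappings (an assumption made to keep the sketch short).

```cpp
#include <string>
#include <string_view>

// Full ("F") case foldings copied from the CaseFolding.txt excerpt above.
// Returns an empty view for any code point outside this small excerpt.
inline std::u32string_view full_fold_excerpt(char32_t c) {
    switch (c) {
        case U'\u00DF': return U"ss";       // LATIN SMALL LETTER SHARP S
        case U'\u0149': return U"\u02BCn";  // N PRECEDED BY APOSTROPHE
        case U'\uFB00': return U"ff";       // LIGATURE FF
        case U'\uFB01': return U"fi";       // LIGATURE FI
        case U'\uFB02': return U"fl";       // LIGATURE FL
        case U'\uFB03': return U"ffi";      // LIGATURE FFI
        case U'\uFB04': return U"ffl";      // LIGATURE FFL
        case U'\uFB05': return U"st";       // LIGATURE LONG S T
        case U'\uFB06': return U"st";       // LIGATURE ST
        default:        return {};
    }
}

// Folds a string using the excerpt; the result may be longer than the input.
std::u32string fold_excerpt(std::u32string_view in) {
    std::u32string out;
    out.reserve(in.size());
    for (char32_t c : in) {
        if (auto folded = full_fold_excerpt(c); !folded.empty())
            out += folded;                          // 1-to-many mapping
        else if (c >= U'A' && c <= U'Z')
            out += static_cast<char32_t>(c + 32);   // stand-in for common mappings
        else
            out += c;
    }
    return out;
}
```

Because the output length can differ from the input length, the folding has to be done on strings rather than on individual characters.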




.


Author: Димитриј Мијоски <dim.mj.p@gmail.com>
Date: Fri, 30 Mar 2018 17:53:44 +0000
Raw View
--94eb2c12500c7b59060568a4ecd9
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Can you please stop trying to prove that I am wrong with some superficial
examples?

Can you show me a real-world example where you need to uppercase sharp s to
SS in Russian text?

There are many languages where case transformations are simple, and I really
don't care if sharp s gets uppercased to SS. In fact, having a sharp s in
Russian text is wrong in the first place.

On Fri, 30 Mar 2018 at 19:35, Thiago Macieira <thiago@macieira.org> wrote:

> On Friday, 30 March 2018 09:05:36 PDT Димитриј Мијоски wrote:
> > I don't see how the sharp s is language independent. It is used only in
> > German.
>
> It's irrelevant which language uses the character. The rule is
> language-independent, therefore it applies to all languages. Any
> application failing to apply the rule is by definition broken. So please
> don't suggest people write broken code.
>
> CaseFolding.txt:
> 00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
> 0149; F; 02BC 006E; # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
> FB00; F; 0066 0066; # LATIN SMALL LIGATURE FF
> FB01; F; 0066 0069; # LATIN SMALL LIGATURE FI
> FB02; F; 0066 006C; # LATIN SMALL LIGATURE FL
> FB03; F; 0066 0066 0069; # LATIN SMALL LIGATURE FFI
> FB04; F; 0066 0066 006C; # LATIN SMALL LIGATURE FFL
> FB05; F; 0073 0074; # LATIN SMALL LIGATURE LONG S T
> FB06; F; 0073 0074; # LATIN SMALL LIGATURE ST
>
> etc. (and those are just Latin script and not including combining
> characters)
>
> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
>    Software Architect - Intel Open Source Technology Center

--94eb2c12500c7b59060568a4ecd9--

.


Author: rmf@rmf.io
Date: Fri, 30 Mar 2018 11:41:02 -0700 (PDT)
Raw View
An example from the Russian Wikipedia, first paragraph: https://ru.wikipedia.org/wiki/%D0%93%D1%80%D0%BE%D1%81%D0%B1%D0%B5%D1%80%D0%B5%D0%BD

.


Author: Thiago Macieira <thiago@macieira.org>
Date: Fri, 30 Mar 2018 11:45:14 -0700
Raw View
On sexta-feira, 30 de março de 2018 10:53:44 PDT Димитриј Мијоски wrote:
> Can you please stop trying to prove that I am wrong with some superficial
> examples.
>
> Can you show me real world example where you need to uppercase sharp s to
> SS in Russian text?

футбол на немецком языке прописан "fußball"

(using Google Translate; apologies if it mangled the grammar)

> There are many languages where case transformations are simple and I really
> don't care if sharp s gets uppercased to SS. In fact having sharp s in
> Russian text is wrong in the first place.

The point is that the application needs to be prepared for mixed-language
text. It cannot rely on its own locale settings, because the text the user
entered may be in other languages. I am writing this email in English, but my
UI is in Portuguese (see the date my mail client added).
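
To make the mixed-language point concrete, here is a minimal sketch of uppercasing that ignores the application's locale and follows per-character Unicode mappings instead. The function name `to_upper_sketch` and the tiny hard-coded tables are assumptions made for illustration only; a real implementation would be driven by the full Unicode data files (UnicodeData.txt and SpecialCasing.txt), and nothing here is a proposed standard interface.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a standard or proposed API): uppercases UTF-32
// text without consulting any locale. Only a tiny illustrative subset of
// the Unicode case mappings is hard-coded here.
std::u32string to_upper_sketch(const std::u32string& in) {
    std::u32string out;
    out.reserve(in.size());
    for (char32_t c : in) {
        if (c == U'\u00DF') {
            // Special casing: LATIN SMALL LETTER SHARP S uppercases to "SS",
            // so one input code point yields two output code points.
            out += U"SS";
        } else if (c >= U'a' && c <= U'z') {
            out += static_cast<char32_t>(c - 0x20);  // ASCII a..z
        } else if (c >= 0x0430 && c <= 0x044F) {
            out += static_cast<char32_t>(c - 0x20);  // Cyrillic U+0430..U+044F
        } else {
            out += c;  // everything else is passed through in this sketch
        }
    }
    return out;
}

// Usage check on the mixed Russian/German word pair from the example text.
void check_upper_sketch() {
    const std::u32string in = U"\u0444\u0443\u0442\u0431\u043E\u043B fu\u00DFball";
    const std::u32string out = to_upper_sketch(in);
    assert(out == U"\u0424\u0423\u0422\u0411\u041E\u041B FUSSBALL");
    assert(out.size() == in.size() + 1);  // the result is longer than the input
}
```

The result being one code point longer than the input is the part that a one-character-in, one-character-out interface cannot express.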

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center




.


Author: Viacheslav Usov <via.usov@gmail.com>
Date: Fri, 30 Mar 2018 20:54:42 +0200
Raw View
On Fri, Mar 30, 2018 at 7:53 PM, Димитриј Мијоски <dim.mj.p@gmail.com>
wrote:

> Can you show me real world example where you need to uppercase sharp s to
> SS in Russian text?

Not sure about uppercasing, but case folding is generally necessary when
searching text. Any Russian dictionary that has an entry for
рейсмас (originally Reißmaß) could use it to advantage.

This is not limited to dictionaries. A lot of Russian fiction has
substantial content in Latin script. Tolstoy's War and Peace is a prime
example, although I am not familiar enough with it to say whether ß
appears in it. Then there is scientific literature, which at a minimum
would be expected to contain references to other works, whose titles and
authors might be in Latin script.

In fact, I cannot think of a single closed domain where pure Cyrillic
content could be assumed. Primary school textbooks, perhaps?

Cheers,
V.
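
As a rough illustration of the search use case above, the following sketch folds both the text and the query before comparing them. The names `fold_sketch` and `contains_folded` are hypothetical, and the table holds only a handful of entries in the style of the CaseFolding.txt lines quoted earlier in the thread (for example `FB06; F; 0073 0074`); a real implementation would load the complete file and would probably normalize the text as well.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: full case folding over UTF-32 text, with only a few
// hard-coded entries for illustration.
std::u32string fold_sketch(const std::u32string& in) {
    std::u32string out;
    out.reserve(in.size());
    for (char32_t c : in) {
        if (c == U'\u00DF') {
            out += U"ss";                            // 00DF; F; 0073 0073
        } else if (c == U'\uFB06') {
            out += U"st";                            // FB06; F; 0073 0074
        } else if (c >= U'A' && c <= U'Z') {
            out += static_cast<char32_t>(c + 0x20);  // ASCII A..Z
        } else if (c >= 0x0410 && c <= 0x042F) {
            out += static_cast<char32_t>(c + 0x20);  // Cyrillic U+0410..U+042F
        } else {
            out += c;  // left unchanged in this sketch
        }
    }
    return out;
}

// Hypothetical helper: caseless substring search by folding both sides.
bool contains_folded(const std::u32string& text, const std::u32string& query) {
    return fold_sketch(text).find(fold_sketch(query)) != std::u32string::npos;
}

// Usage check: a query typed as "REISSMASS" finds the entry spelled with
// sharp s, and the folded Cyrillic spelling matches its capitalized form.
void check_fold_sketch() {
    assert(contains_folded(U"Rei\u00DFma\u00DF", U"REISSMASS"));
    assert(fold_sketch(U"\u0420\u0435\u0439\u0441\u043C\u0430\u0441") ==
           U"\u0440\u0435\u0439\u0441\u043C\u0430\u0441");
}
```

Folding both sides, rather than uppercasing one of them, keeps the comparison symmetric and independent of the user's locale, which is what a dictionary lookup over mixed-script text would need.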

.