Topic: Comments on P0372R0, A type for utf-8 data
Author: Tom Honermann <tom@honermann.net>
Date: Tue, 14 Jun 2016 22:55:36 -0400
First, thank you for writing this paper! It has been on my todo list to
write such a proposal, but alas...
I spoke with Richard Smith about such a proposal in Jacksonville and he
mentioned a further justification for supporting a char8_t type -
optimization. Today, compilers are limited in optimizing code involving
char and unsigned char glvalues because these types are allowed to alias
objects of other types (C++14 3.10 [basic.lval] p10). If a char8_t type
were to be added that adhered to strict aliasing, then compilers could
more aggressively optimize code involving it. I think this may be a
benefit worth adding to the paper.
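To illustrate the aliasing point, here is a minimal sketch. It assumes a hypothetical stand-in type, `utf8_unit`, modelled as a scoped enum over `unsigned char`; the proposal's actual `char8_t` wording may differ, and the function names are invented for the example. The first loop stores through `unsigned char*`, which is allowed to alias any object, so the compiler has to assume `*len` may change on every iteration. The second loop stores through a distinct type with no such exemption, so the compiler is permitted (not required) to read `*len` only once.

```cpp
#include <cstddef>

// Hypothetical stand-in for the proposed char8_t: same representation as
// unsigned char, but a distinct type with no aliasing exemption.
enum class utf8_unit : unsigned char {};

// A store through unsigned char* may modify any object, including *len,
// so *len has to be treated as possibly changing on each iteration.
void fill_bytes(unsigned char* dst, const std::size_t* len) {
    for (std::size_t i = 0; i < *len; ++i) {
        dst[i] = 0x20;
    }
}

// A store through utf8_unit* cannot legally modify a std::size_t object,
// so an optimizer is free to hoist the load of *len out of the loop.
void fill_units(utf8_unit* dst, const std::size_t* len) {
    for (std::size_t i = 0; i < *len; ++i) {
        dst[i] = static_cast<utf8_unit>(0x20);
    }
}
```

Both functions produce the same bytes; any difference shows up only in the generated code, and whether a given compiler takes advantage of it is an implementation detail.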
Tom.
--
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/5760C3A8.5060804%40honermann.net.
.
Author: Nicol Bolas <jmckesson@gmail.com>
Date: Tue, 14 Jun 2016 20:32:22 -0700 (PDT)
On Tuesday, June 14, 2016 at 10:55:38 PM UTC-4, Tom Honermann wrote:
>
> First, thank you for writing this paper! It has been on my todo list to
> write such a proposal, but alas...
>
> I spoke with Richard Smith about such a proposal in Jacksonville and he
> mentioned a further justification for supporting a char8_t type -
> optimization. Today, compilers are limited in optimizing code involving
> char and unsigned char glvalues because these types are allowed to alias
> objects of other types (C++14 3.10 [basic.lval] p10). If a char8_t type
> were to be added that adhered to strict aliasing, then compilers could
> more aggressively optimize code involving it. I think this may be a
> benefit worth adding to the paper.
>
I'm quite certain that the proposal makes this illegal:
const char8_t *str = "Some String";
`char8_t` is meant for UTF-8 strings *only*. And most people's strings are
narrow character strings; on specific platforms, this may work out to being
UTF-8, but there is no guarantee of that. We need to differentiate between
narrow character strings and UTF-8 encoded strings at the type level.
The last thing we want is to encourage people to do this:
auto str = (const char8_t *)"Some String";
If people start doing casts like that to take advantage of more
aggressive optimizations, then we'll be right back where we were before: we
won't know if a string *really is* UTF-8 or not.
Solving the "char as byte array and string" problem is important. But we
shouldn't suggest that `char8_t` constitutes such a solution.
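A minimal sketch of that type-level distinction, and of how a cast defeats it. The `utf8_unit` enum is a hypothetical stand-in for the proposed `char8_t`, and `encoding`, `encoding_of`, and `relabel_unchecked` are names invented for this example, not part of the proposal.

```cpp
#include <type_traits>

// Hypothetical stand-in for the proposed char8_t.
enum class utf8_unit : unsigned char {};

enum class encoding { narrow, utf8 };

// With distinct types, overloads can tell the two kinds of string apart.
inline encoding encoding_of(const char*)      { return encoding::narrow; }
inline encoding encoding_of(const utf8_unit*) { return encoding::utf8; }

// There is no implicit conversion between the pointer types, so a narrow
// string cannot silently initialize a UTF-8 pointer.
static_assert(!std::is_convertible<const char*, const utf8_unit*>::value,
              "narrow strings must not convert to UTF-8 strings implicitly");

// The cast compiles, but it only relabels the bytes. Nothing checks that
// they are valid UTF-8, so the type no longer tells the truth.
inline const utf8_unit* relabel_unchecked(const char* s) {
    return reinterpret_cast<const utf8_unit*>(s);
}
```

After `relabel_unchecked`, overload resolution reports UTF-8 for a string whose real encoding is whatever the narrow execution character set happens to be, which is the loss of information described above.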
.
Author: Tom Honermann <tom@honermann.net>
Date: Tue, 14 Jun 2016 23:44:17 -0400
On 06/14/2016 11:32 PM, Nicol Bolas wrote:
> On Tuesday, June 14, 2016 at 10:55:38 PM UTC-4, Tom Honermann wrote:
>
> First, thank you for writing this paper! It has been on my todo
> list to
> write such a proposal, but alas...
>
> I spoke with Richard Smith about such a proposal in Jacksonville
> and he
> mentioned a further justification for supporting a char8_t type -
> optimization. Today, compilers are limited in optimizing code
> involving
> char and unsigned char glvalues because these types are allowed to
> alias
> objects of other types (C++14 3.10 [basic.lval] p10). If a
> char8_t type
> were to be added that adhered to strict aliasing, then compilers
> could
> more aggressively optimize code involving it. I think this may be a
> benefit worth adding to the paper.
>
>
> I'm quite certain that the proposal makes this illegal:
>
> const char8_t *str = "Some String";
I would hope so.
> `char8_t` is meant for UTF-8 strings /only/. And most people's strings
> are narrow character strings; on specific platforms, this may work out
> to being UTF-8, but there is no guarantee of that. We need to
> differentiate between narrow character strings and UTF-8 encoded
> strings at the type level.
>
> The last thing we want is to encourage people to do this:
>
> auto str = (const char8_t *)"Some String";
I agree.
> If people start doing casts like that to take advantage of more
> aggressive optimizations, then we'll be right back where we were
> before: we won't know if a string /really is/ UTF-8 or not.
>
> Solving the "char as byte array and string" problem is important. But
> we shouldn't suggest that `char8_t` constitutes such a solution.
I don't think the ability to abuse a feature should be sufficient
justification to not add it. I did not intend to suggest that char8_t
be used to circumvent existing aliasing rules. Rather, that giving it
strict aliasing behavior would enable optimizations for UTF-8 data.
That could potentially provide some motivation towards using UTF-8
strings in preference to narrow strings.
Tom.
.
Author: Tom Honermann <tom@honermann.net>
Date: Wed, 15 Jun 2016 11:07:19 -0400
On 6/14/2016 10:55 PM, Tom Honermann wrote:
> First, thank you for writing this paper! It has been on my todo list
> to write such a proposal, but alas...
>
> I spoke with Richard Smith about such a proposal in Jacksonville and
> he mentioned a further justification for supporting a char8_t type -
> optimization. Today, compilers are limited in optimizing code
> involving char and unsigned char glvalues because these types are
> allowed to alias objects of other types (C++14 3.10 [basic.lval]
> p10). If a char8_t type were to be added that adhered to strict
> aliasing, then compilers could more aggressively optimize code
> involving it. I think this may be a benefit worth adding to the paper.
I'd also like to propose that the implicit conversion from u8"" to const
char[] and u8'x' to char be introduced as deprecated features that can
be removed in a future standard.
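For context, a small sketch of the conversion in question. It relies on the feature-test macro `__cpp_char8_t`, which postdates this thread and is used here only so the snippet compiles in either mode; the alias `u8_elem` and the constant `u8_is_plain_char` are names invented for the example.

```cpp
#include <type_traits>

// Element type of a u8 string literal in the current compiler mode.
using u8_elem = std::remove_cv<
    std::remove_reference<decltype(u8"x"[0])>::type>::type;

constexpr bool u8_is_plain_char = std::is_same<u8_elem, char>::value;

#if defined(__cpp_char8_t)
// A mode with a distinct UTF-8 character type: the literal is not char.
static_assert(!u8_is_plain_char, "u8 literals use a distinct type here");
#else
// C++11/14 behaviour: u8"" is an array of const char, so it binds to
// const char* with no cast. This is the conversion proposed for deprecation.
static_assert(u8_is_plain_char, "u8 literals are plain char here");
inline const char* legacy_view() { return u8"text"; }
#endif
```

In the plain `char` mode, nothing in the type system distinguishes `legacy_view()` from a narrow string, which is why a deprecation period rather than immediate removal is suggested.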
Is there any implementation experience? Any chance that patches to gcc
or Clang exist? If so, I would be interested in experimenting with them.
Tom.
.
Author: Nicol Bolas <jmckesson@gmail.com>
Date: Wed, 15 Jun 2016 08:56:32 -0700 (PDT)
On Tuesday, June 14, 2016 at 11:44:19 PM UTC-4, Tom Honermann wrote:
>
> On 06/14/2016 11:32 PM, Nicol Bolas wrote:
>
> On Tuesday, June 14, 2016 at 10:55:38 PM UTC-4, Tom Honermann wrote:
>>
>> First, thank you for writing this paper! It has been on my todo list to
>> write such a proposal, but alas...
>>
>> I spoke with Richard Smith about such a proposal in Jacksonville and he
>> mentioned a further justification for supporting a char8_t type -
>> optimization. Today, compilers are limited in optimizing code involving
>> char and unsigned char glvalues because these types are allowed to alias
>> objects of other types (C++14 3.10 [basic.lval] p10). If a char8_t type
>> were to be added that adhered to strict aliasing, then compilers could
>> more aggressively optimize code involving it. I think this may be a
>> benefit worth adding to the paper.
>>
>
> I'm quite certain that the proposal makes this illegal:
>
> const char8_t *str = "Some String";
>
>
> I would hope so.
>
> `char8_t` is meant for UTF-8 strings *only*. And most people's strings
> are narrow character strings; on specific platforms, this may work out to
> being UTF-8, but there is no guarantee of that. We need to differentiate
> between narrow character strings and UTF-8 encoded strings at the type
> level.
>
> The last thing we want is to encourage people to do this:
>
> auto str = (const char8_t *)"Some String";
>
>
> I agree.
>
> If people start doing casts like that to take advantage of more
> aggressive optimizations, then we'll be right back where we were before: we
> won't know if a string *really is* UTF-8 or not.
>
> Solving the "char as byte array and string" problem is important. But we
> shouldn't suggest that `char8_t` constitutes such a solution.
>
>
> I don't think the ability to abuse a feature should be sufficient
> justification to not add it. I did not intend to suggest that char8_t be
> used to circumvent existing aliasing rules. Rather, that giving it strict
> aliasing behavior would enable optimizations for UTF-8 data. That could
> potentially provide some motivation towards using UTF-8 strings in
> preference to narrow strings.
>
Right, but it already has that. `char8_t`, based on the "unique, unsigned
type" statement in the proposal, is a different type from `char` and
`unsigned char`. It has the same value representation as those two, but the
way strict aliasing is defined already does not allow `char8_t*` to alias
other types, just as it doesn't allow `char16_t*` or `char32_t*` to do
so. The same goes for enums that use `char` as their underlying type;
arrays of them are not treated as `char` arrays by the strict aliasing rules.
The strict aliasing rules do not care what the underlying type of something
is.
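A compile-time sketch of that distinction, using only standard type traits. The enum name `byte_like` is invented for the example. Each pair below shares a representation, yet the types are distinct, and the aliasing exemption is granted by type (`char`, `unsigned char`), not by representation.

```cpp
#include <cstdint>
#include <type_traits>

// An enum whose underlying type is unsigned char.
enum class byte_like : unsigned char {};

// Same underlying type and size as unsigned char...
static_assert(std::is_same<std::underlying_type<byte_like>::type,
                           unsigned char>::value,
              "underlying type is unsigned char");
static_assert(sizeof(byte_like) == sizeof(unsigned char), "same size");

// ...but a different type, so it does not inherit the char exemption.
static_assert(!std::is_same<byte_like, unsigned char>::value,
              "distinct from its underlying type");

// char16_t is in the same position relative to std::uint_least16_t.
static_assert(sizeof(char16_t) == sizeof(std::uint_least16_t), "same size");
static_assert(!std::is_same<char16_t, std::uint_least16_t>::value,
              "distinct from its underlying type");
```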
What I'm saying is that we shouldn't *advertise* this as a selling point of
the feature. It shouldn't be listed in the motivation section, for example.
Otherwise you will encourage people to abuse it.
.
Author: Tom Honermann <tom@honermann.net>
Date: Wed, 15 Jun 2016 17:21:15 -0400
On 6/15/2016 11:56 AM, Nicol Bolas wrote:
>
>> If people start doing casts like that to take advantage of
>> more aggressive optimizations, then we'll be right back where we
>> were before: we won't know if a string /really is/ UTF-8 or not.
>>
>> Solving the "char as byte array and string" problem is important.
>> But we shouldn't suggest that `char8_t` constitutes such a solution.
>
> I don't think the ability to abuse a feature should be sufficient
> justification to not add it. I did not intend to suggest that
> char8_t be used to circumvent existing aliasing rules. Rather,
> that giving it strict aliasing behavior would enable optimizations
> for UTF-8 data. That could potentially provide some motivation
> towards using UTF-8 strings in preference to narrow strings.
>
>
> Right, but it already has that. `char8_t`, based on the "unique,
> unsigned type" statement in the proposal, is a different type from
> `char` and `unsigned char`. It has the same value representation as
> those two, but the way strict aliasing is defined already does not
> allow `char8_t*` to alias with other types. Just as it doesn't allow
> `char16_t*` or `char32_t*` to do so. The same goes for enums that use
> `char` as their underlying type; arrays of them are not treated as
> `char` arrays by the strict aliasing rules.
Until we have wording, or the proposal states otherwise, we don't know
what we have. I agree that aliasing-specific changes would have to
be made to the standard if the type were intended not to follow strict
aliasing rules.
> The strict aliasing rules do not care what the underlying type of
> something is.
>
> What I'm saying is that we shouldn't /advertise/ this as a selling
> point of the feature. It shouldn't be listed in the motivation
> section, for example. Otherwise you will encourage people to abuse it.
Uh oh, I think the cat is out of the bag...
Name a feature that people haven't figured out how to abuse. This isn't
any different. Listing the potential benefit and facilitating
discussion on the potential for abuse strikes me as a better approach
than "shhh".
Tom.
.
Author: Michael Spencer <bigcheesegs@gmail.com>
Date: Fri, 17 Jun 2016 13:18:41 -0700
On Tue, Jun 14, 2016 at 7:55 PM, Tom Honermann <tom@honermann.net> wrote:
> First, thank you for writing this paper! It has been on my todo list to
> write such a proposal, but alas...
>
> I spoke with Richard Smith about such a proposal in Jacksonville and he
> mentioned a further justification for supporting a char8_t type -
> optimization. Today, compilers are limited in optimizing code involving
> char and unsigned char glvalues because these types are allowed to alias
> objects of other types (C++14 3.10 [basic.lval] p10). If a char8_t type
> were to be added that adhered to strict aliasing, then compilers could more
> aggressively optimize code involving it. I think this may be a benefit
> worth adding to the paper.
>
> Tom.
I also had a quite similar conversation with Richard :).
We did consider covering aliasing in the paper, but in the end we felt
that it detracted from the core message of C++ needing a type for
utf-8. The aliasing properties are indeed useful for optimization, but
just adding new distinct types is a bad solution to the general
aliasing problem.
- Michael Spencer
.