Topic: std::byte thoughts
Author: pecholt@gmail.com
Date: Mon, 31 Oct 2016 05:28:20 -0700 (PDT)
Raw View
I noticed the "A byte type definition" proposal (P0298R1) in the latest WG21
mailing and I want to share some concerns. While I agree that a fundamental
byte type is indeed missing in C++ and should be added, I don't think the
presented way is the right one. The proposal suggests adding it as a library
type in namespace std. It argues that C++17 is expressive enough for a simple
library definition, as opposed to a keyword. I am not saying it's not
possible, but the question is: if the new type is added this way, will it fit
with the other fundamental types defined as keywords? Will it confuse users?
This is the weak point of the proposal, imho.
The fundamental type set can already be seen as confusing. First, we have
shortcuts: *short int* becomes *short*, *signed int* becomes *int*, etc.
This is obviously shared with C and cannot be changed. We are all used to
living with that. But then there is *char*, which is a distinct type and not
a shortcut for *signed char*. OK, confusion number one, but it happened a
long time ago, so there is nothing to do here either. Then the user has to
learn that some newly added types carry a _t suffix, like *wchar_t*,
*char16_t* and *char32_t*. Most inexperienced programmers tend to think _t
means a typedef'd type because that is a convention used by some libraries.
*wchar_t* really was implemented as a typedef in earlier MS compilers, but
that's no longer true. All these types are distinct opaque types, so a
typedef cannot be used for them. I assume _t was used as an uglifier which
helps preserve source code compatibility. So, confusion number two. And now
we want to add *std::byte*. I guess using namespace std everywhere is not
good practice, and using it in headers is widely considered wrong. So users
will have to learn another rule: new-generation types have to be prefixed
with a namespace qualifier. I cannot imagine any other way to bring the
confusion to a higher level than this.
We are still talking about the fundamental type set, not anything complicated.
Simplicity and clarity matter. So why so many rules, exceptions and
experiments? Why don't we use a byte_t keyword so it will at least fit with
the 2nd generation types? If that is not possible, I consider the new proposal
so confusing to users that it shouldn't be accepted. We have lived without it
since the beginning anyway. What do the others here think?
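
For readers who want to check the type relationships described above, here is a minimal sketch, assuming a conforming C++11 (or later) compiler. The helper names `narrow_id` and `wide_id` are made up for illustration and are not part of any proposal.

```cpp
#include <cassert>
#include <type_traits>

// "short" / "short int" and "int" / "signed int" are two spellings of one type.
static_assert(std::is_same<short, short int>::value, "same type");
static_assert(std::is_same<int, signed int>::value, "same type");

// char is a distinct type from both signed char and unsigned char.
static_assert(!std::is_same<char, signed char>::value, "distinct types");
static_assert(!std::is_same<char, unsigned char>::value, "distinct types");

// Because the three narrow character types are distinct, all three
// overloads can coexist (hypothetical helper, for illustration only).
constexpr int narrow_id(char)          { return 0; }
constexpr int narrow_id(signed char)   { return 1; }
constexpr int narrow_id(unsigned char) { return 2; }

// wchar_t, char16_t and char32_t are keywords naming distinct types,
// so these overloads can coexist as well (also hypothetical).
constexpr int wide_id(wchar_t)  { return 0; }
constexpr int wide_id(char16_t) { return 1; }
constexpr int wide_id(char32_t) { return 2; }
```

If any of these types were typedefs of one another, the overload sets would be redefinitions and the code would not compile.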
--
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/7da80620-e4bd-40eb-a4ec-4f3c2dcb2c98%40isocpp.org.
.
Author: mihailnajdenov@gmail.com
Date: Mon, 31 Oct 2016 06:58:14 -0700 (PDT)
Raw View
Off the top of my head.
- A native byte is out of the question, because it will break existing code
which defined something else behind that name.
Also, it will be very confusing to have two basic types with *exactly* the
same properties that are not "shortcuts" for one another.
- byte_t will break code as well.
Also, you can always do a 'using std::byte' in your code and you are all
set.
That said, it is probably more correct for it to be called std::byte_t, like
std::size_t, std::nullptr_t, std::nullopt_t, std::align_val_t.
The last one is *particularly* *relevant* because it is already just enum
class align_val_t : size_t {}; so that overloading can pick it up.
byte is much more akin to these basic types than to string, vector or map,
which don't have a _t.
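
A minimal sketch of the two points above, assuming C++11 or later: a using-declaration brings one name into scope, and an enum class with a fixed underlying type is a separate type for overload resolution. `demo::byte_t` and `overload_id` are hypothetical names chosen for illustration; they only mirror the `enum class align_val_t : size_t {};` shape mentioned above.

```cpp
#include <cassert>

namespace demo {
// Hypothetical stand-in for the proposed type, modelled on
// enum class align_val_t : size_t {};
enum class byte_t : unsigned char {};
}

// A using-declaration imports just this one name; no `using namespace` needed.
using demo::byte_t;

// Overload resolution distinguishes the enum from its underlying type.
constexpr int overload_id(unsigned char) { return 0; }
constexpr int overload_id(byte_t)        { return 1; }
```

Calling `overload_id(static_cast<byte_t>(42))` selects the second overload, while an `unsigned char` argument selects the first.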
.
Author: Dejan Milosavljevic <dmilos@gmail.com>
Date: Tue, 1 Nov 2016 21:28:21 +0100
Raw View
In <cstdint> there is uint8_t.
So far it is optional.
In GCC and MSVS this type exists.
On Mon, Oct 31, 2016 at 2:58 PM, <mihailnajdenov@gmail.com> wrote:
> Off the top of my head.
> - A native byte is out of the question, because it will break existing code
> which defined something else behind that name.
> Also, it will be very confusing to have two basic types with *exactly* the
> same properties that are not "shortcuts" for one another.
> - byte_t will break code as well.
>
> Also, you can always do a 'using std::byte' in your code and you are all
> set.
>
> That said, it is probably more correct for it to be called std::byte_t, like
> std::size_t, std::nullptr_t, std::nullopt_t, std::align_val_t.
>
> The last one is *particularly* *relevant* because it is already just enum
> class align_val_t : size_t {}; so that overloading can pick it up.
>
> byte is much more akin to these basic types than to string, vector or
> map, which don't have a _t.
.
Author: "D. B." <db0451@gmail.com>
Date: Tue, 1 Nov 2016 21:23:47 +0000
Raw View
It must always be optional, because C/C++ do not like to set (even 99%
coverage) rules for how an implementation must represent numbers.
But anyway, the proposal linked explicitly states that "std::byte is not an
integer and not a character". std::uint8_t is always the former, and on, I
bet, all of our machines here, it's also a typedef to the latter. So how is
it relevant?
Mind you, std::byte will ultimately *contain* an integer *unsigned char*,
just with some opaque wrapping around it to save the programmer from
themselves in some ill-defined way. I don't really see much need for it,
but as another tool in the stdlib, sure, the more the merrier.
I will say that I think that proposal oversteps its domain when it
expects to get allowances for a library wrapper type to participate in the
special cases of object lifetime, aliasing, etc. Come on! They need to
decide whether they're proposing a piece of convenience or a new
fundamental type. I was iffy enough about std::initializer_list straddling
the boundaries, and I'm really not sure this justifies another case of
that, and on a far flimsier pretext AFAICT.
.
Author: Thiago Macieira <thiago@macieira.org>
Date: Tue, 01 Nov 2016 18:21:19 -0700
Raw View
On Tuesday, 1 November 2016, at 21:28:21 PDT, Dejan Milosavljevic wrote:
> In <cstdint> there is uint8_t.
> So far it is optional.
> In GCC and MSVS this type exists.
Bytes don't have to be 8 bits.
Note that we don't strictly need the typedef. It's just a convenience so that
you don't see "char" and get confused.
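
The point about byte width can be checked at compile time. This is a minimal sketch, assuming C++11 or later; `has_uint8` is a name made up for illustration.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// A byte is CHAR_BIT bits wide; the standard only guarantees at least 8.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// sizeof counts bytes, so unsigned char always occupies exactly one byte.
static_assert(sizeof(unsigned char) == 1, "unsigned char is one byte");

// std::uint8_t is optional; the UINT8_MAX macro is defined only if it exists.
#ifdef UINT8_MAX
constexpr bool has_uint8 = true;
// An exact 8-bit type cannot be smaller than a byte, so bytes must be 8 bits.
static_assert(CHAR_BIT == 8, "uint8_t implies 8-bit bytes");
#else
constexpr bool has_uint8 = false;
#endif
```

On a platform where `CHAR_BIT` is larger than 8, `has_uint8` would be false, while `unsigned char` would still be exactly one byte.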
-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
.
Author: Nicol Bolas <jmckesson@gmail.com>
Date: Tue, 1 Nov 2016 20:10:28 -0700 (PDT)
Raw View
On Tuesday, November 1, 2016 at 9:21:27 PM UTC-4, Thiago Macieira wrote:
>
> On Tuesday, 1 November 2016, at 21:28:21 PDT, Dejan Milosavljevic wrote:
> > In <cstdint> there is uint8_t.
> > So far it is optional.
> > In GCC and MSVS this type exists.
>
> Bytes don't have to be 8 bits.
>
> Note that we don't strictly need the typedef. It's just a convenience so
> that you don't see "char" and get confused.
>
Actually, no. `std::byte` as proposed by P0298 is explicitly *not* a
typedef. It is a scoped enumeration whose underlying type is `unsigned
char`, but it is not a typedef.
Granted, I can't agree with that design decision. I would much rather it be
a genuine type, rather than using C++ enum chicanery to get strong aliases.
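
To make the typedef-versus-scoped-enumeration distinction concrete, here is a minimal sketch, assuming C++11 or later. The names in namespace `sketch` and the helper `to_value` are hypothetical; `byte_enum` only imitates the shape described above and is not the proposal's actual definition.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {
typedef unsigned char byte_alias;          // a typedef: another name, same type
enum class byte_enum : unsigned char {};   // a scoped enum: a new, distinct type
}

// The typedef is the very same type as unsigned char; the enum is not.
static_assert(std::is_same<sketch::byte_alias, unsigned char>::value, "alias");
static_assert(!std::is_same<sketch::byte_enum, unsigned char>::value, "distinct");

// The enum still stores its value as an unsigned char.
static_assert(std::is_same<std::underlying_type<sketch::byte_enum>::type,
                           unsigned char>::value, "underlying type");

// There is no implicit conversion in either direction.
static_assert(!std::is_convertible<sketch::byte_enum, unsigned char>::value, "");
static_assert(!std::is_convertible<unsigned char, sketch::byte_enum>::value, "");

// Getting the numeric value back requires an explicit cast.
constexpr unsigned char to_value(sketch::byte_enum b) {
    return static_cast<unsigned char>(b);
}
```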
.
Author: Thiago Macieira <thiago@macieira.org>
Date: Tue, 01 Nov 2016 21:55:41 -0700
Raw View
On Tuesday, 1 November 2016, at 20:10:28 PDT, Nicol Bolas wrote:
> Actually, no. `std::byte` as proposed by P0298 is explicitly *not* a
> typedef. It is a scoped enumeration whose underlying type is `unsigned
> char`, but it is not a typedef.
>
> Granted, I can't agree with that design decision. I would much rather it be
> a genuine type, rather than using C++ enum chicanery to get strong aliases.
That would also not be a good idea because unsigned char is, by definition, a
byte. Why should we have (more) types that mean exactly the same thing, and
this time in all platforms, by definition?
-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
.
Author: Nicol Bolas <jmckesson@gmail.com>
Date: Tue, 1 Nov 2016 22:16:32 -0700 (PDT)
Raw View
On Wednesday, November 2, 2016 at 12:55:46 AM UTC-4, Thiago Macieira wrote:
>
> On Tuesday, 1 November 2016, at 20:10:28 PDT, Nicol Bolas wrote:
> > Actually, no. `std::byte` as proposed by P0298 is explicitly *not* a
> > typedef. It is a scoped enumeration whose underlying type is `unsigned
> > char`, but it is not a typedef.
> >
> > Granted, I can't agree with that design decision. I would much rather
> > it be a genuine type, rather than using C++ enum chicanery to get strong
> > aliases.
>
> That would also not be a good idea because unsigned char is, by
> definition, a byte. Why should we have (more) types that mean exactly the
> same thing, and this time in all platforms, by definition?
>
Because "unsigned char" *also* means "unsigned character". With just
`unsigned char`, there is no way to distinguish between manipulating bytes
and manipulating unsigned characters.
That's what `byte` is for, as a type: a way to semantically differentiate
between operations on bytes and operations on characters. The types can be
inter-convertible, numerically speaking, but they don't mean the same thing.
My problem with using C++ enum chicanery is that, if you use the type
traits mechanisms to ask what `std::byte` is, it will say that it's an
enum, not that it's an integral type. There's no reason why `byte` should
not be an integral type.
Think about it. The C++ standard allows a break in strict aliasing rules
for `unsigned char` and `char`. Why? Not because the standard thinks it's
reasonable for people to alias UTF-8 strings, but because that's the only
way to pass/manipulate a byte array. And byte arrays need to be able to
alias.
It's the same reason why we have `char16_t` as a distinct type from
`uint_least16_t`: there is a fundamental semantic difference
between an array of unsigned integers that are at least 16 bits in size and
an array of UTF-16 code units. One of these is a string; the other is not.
It's time we had such a distinction for bytes. And UTF-8 code units, for
that matter.
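
The type-traits and aliasing points can be sketched as follows, assuming C++11 or later. `byte_enum` is a hypothetical stand-in with the shape attributed to P0298 earlier in the thread, and `sum_of_bytes` is an illustrative helper, not something from the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in: a scoped enum over unsigned char.
enum class byte_enum : unsigned char {};

// Type traits report an enumeration, not an integral type.
static_assert(std::is_enum<byte_enum>::value, "reported as an enum");
static_assert(!std::is_integral<byte_enum>::value, "not reported as integral");

// char16_t is a distinct type even where uint_least16_t has the same size.
static_assert(!std::is_same<char16_t, std::uint_least16_t>::value, "distinct");

// The aliasing rules permit reading any object's storage through
// unsigned char*, which is how byte-wise access is written today.
template <typename T>
unsigned sum_of_bytes(const T& obj) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&obj);
    unsigned total = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        total += p[i];
    }
    return total;
}
```

Nothing in the signature of `sum_of_bytes` says whether `p` points at characters or at raw bytes, which is the ambiguity described above.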
.
Author: Jared Grubb <jared.grubb@gmail.com>
Date: Tue, 1 Nov 2016 23:40:42 -0700 (PDT)
Raw View
On Tuesday, November 1, 2016 at 10:16:32 PM UTC-7, Nicol Bolas wrote:
>
> On Wednesday, November 2, 2016 at 12:55:46 AM UTC-4, Thiago Macieira wrote:
>>
>> On Tuesday, 1 November 2016, at 20:10:28 PDT, Nicol Bolas wrote:
>> > Actually, no. `std::byte` as proposed by P0298 is explicitly *not* a
>> > typedef. It is a scoped enumeration whose underlying type is `unsigned
>> > char`, but it is not a typedef.
>> >
>> > Granted, I can't agree with that design decision. I would much rather
>> > it be a genuine type, rather than using C++ enum chicanery to get
>> > strong aliases.
>>
>> That would also not be a good idea because unsigned char is, by
>> definition, a byte. Why should we have (more) types that mean exactly
>> the same thing, and this time on all platforms, by definition?
>>
>
> Because "unsigned char" *also* means "unsigned character". With just
> `unsigned char`, there is no way to distinguish between manipulating bytes
> and manipulating unsigned characters.
>
> That's what `byte` is for, as a type: a way to semantically differentiate
> between operations on bytes and operations on characters. The types can be
> inter-convertible, numerically speaking, but they don't mean the same thing.
>
> My problem with using C++ enum chicanery is that, if you use the type
> traits mechanisms to ask what `std::byte` is, it will say that it's an
> enum, not that it's an integral type. There's no reason why `byte` should
> not be an integral type.
>
> Think about it. The C++ standard allows a break in strict aliasing rules
> for `unsigned char` and `char`. Why? Not because the standard thinks it's
> reasonable for people to alias UTF-8 strings. But because that's the only
> way to pass/manipulate a byte array. And byte arrays need to be able to
> alias.
>
> It's the same reason why we have `char16_t` as a distinct type from
> `uint_least16_t`. Because there is a fundamental semantic difference
> between an array of unsigned integers that are at least 16 bits in size and
> an array of UTF-16 code units. One of these is a string; the other is not.
>
> It's time we had such a distinction for bytes. And UTF-8 code units, for
> that matter.
>
100% agree. I know I've had a few cases where I was annoyed by the lack of
an 8-bit number type. Although this proposal does *not* fix the
inconsistency in the following example, it at least provides an 8-bit
option.

#include <cstdint>
#include <iostream>

int main()
{
    std::cout << (std::uint32_t)65 << '\n';  // prints 65
    std::cout << (std::uint16_t)65 << '\n';  // prints 65
    std::cout << (std::uint8_t)65 << '\n';   // Surprise! Typically prints A
}
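The surprise above comes from overload resolution: on mainstream implementations `std::uint8_t` is an alias for `unsigned char` (an assumption, not a guarantee), so streaming it selects the character overload of `operator<<`. A minimal sketch of the usual workaround follows; the helper names `stream_raw` and `stream_promoted` are made up for illustration and appear nowhere in the proposal.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Streams the value as-is. Where std::uint8_t is an alias for unsigned char,
// this picks the character overload of operator<<.
inline std::string stream_raw(std::uint8_t v) {
    std::ostringstream os;
    os << v;
    return os.str();
}

// Unary plus promotes the value to int first, so the numeric overload is used.
inline std::string stream_promoted(std::uint8_t v) {
    std::ostringstream os;
    os << +v;
    return os.str();
}
```

On an ASCII-based platform, `stream_raw(65)` would be expected to yield "A" and `stream_promoted(65)` to yield "65".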
.
Author: Thiago Macieira <thiago@macieira.org>
Date: Tue, 01 Nov 2016 23:43:31 -0700
Raw View
On Tuesday, 1 November 2016, at 22:16:32 PDT, Nicol Bolas wrote:
> On Wednesday, November 2, 2016 at 12:55:46 AM UTC-4, Thiago Macieira wrote:
> > That would also not be a good idea because unsigned char is, by
> > definition, a byte. Why should we have (more) types that mean exactly
> > the same thing, and this time on all platforms, by definition?
>
> Because "unsigned char" *also* means "unsigned character". With just
> `unsigned char`, there is no way to distinguish between manipulating bytes
> and manipulating unsigned characters.
>
> That's what `byte` is for, as a type: a way to semantically differentiate
> between operations on bytes and operations on characters. The types can be
> inter-convertible, numerically speaking, but they don't mean the same thing.
I'm sorry, I don't agree that there's a distinction in the first place. Bytes
are used for more than just copying around. If you add, subtract, shift left
or right, perform bitwise operations, etc., you need the value. If I need the
value, then a zero is a zero is a zero, a 0x40 is still a 0x40.

Also, I can assign 'a' to any integer type. Maybe this was the main issue:
that single-quote character literals automatically convert to integral types
instead of staying characters. We're 40 years too late to change this, though
(since B).
> My problem with using C++ enum chicanery is that, if you use the type
> traits mechanisms to ask what `std::byte` is, it will say that it's an
> enum, not that it's an integral type. There's no reason why `byte` should
> not be an integral type.
Agreed, but I'm going to go further and say that it's pretty useless for a lot
of use cases. I need a value for a lot of operations, and an enum won't give it
to me unless I cast it to a suitable integral type first -- unsigned
char (that is, a *real* byte).
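The type-traits point can be checked at compile time. The sketch below uses a stand-in `traits_sketch::byte`, declared the way the thread describes the proposed type (a scoped enumeration over `unsigned char`); it is not the library's actual definition, and `to_unsigned` is a hypothetical helper showing the cast needed before any arithmetic.

```cpp
#include <type_traits>

namespace traits_sketch {
// Stand-in for the type discussed in the thread: a scoped enumeration whose
// underlying type is unsigned char. Not the actual library definition.
enum class byte : unsigned char {};

// Hypothetical helper: the explicit cast needed to get a numeric value back.
constexpr unsigned to_unsigned(byte b) { return static_cast<unsigned char>(b); }
}  // namespace traits_sketch

// The traits report an enumeration, not an integral type.
static_assert(std::is_enum<traits_sketch::byte>::value,
              "reported as an enum");
static_assert(!std::is_integral<traits_sketch::byte>::value,
              "not reported as an integral type");
static_assert(std::is_same<std::underlying_type<traits_sketch::byte>::type,
                           unsigned char>::value,
              "underlying type is unsigned char");
// A scoped enumeration does not convert implicitly to its underlying type.
static_assert(!std::is_convertible<traits_sketch::byte, unsigned char>::value,
              "no implicit conversion to unsigned char");
```

With this stand-in, something like `to_unsigned(b) + 1u` compiles, whereas `b + 1` does not.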
This week I've been spending time working on hashing algorithms, notably
SipHash (btw, implementations should reconsider their std::hash algorithms).
In order to implement it, I needed to access the byte array and do byte-level
operations like rotating left, XOR, and additions. Not only would std::byte
not work for me, I fail to see how the operations I'm doing are any different
from the operations on an unsigned char.
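For readers unfamiliar with this kind of code, here is a rough sketch of the byte-level operations mentioned (loading from a byte array, rotating left, XOR, addition) written against plain `unsigned char`. It is only an illustration of the style of operation, not SipHash and not the author's implementation; the names `rotl64`, `load_le64`, and `mix`, and the rotation count 13, are chosen here for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Rotate a 64-bit word left by r bits, for 0 < r < 64.
constexpr std::uint64_t rotl64(std::uint64_t x, unsigned r) {
    return (x << r) | (x >> (64u - r));
}

// Assemble eight bytes into a 64-bit word, least significant byte first.
// Each unsigned char is used directly as a number; no cast to get the value.
inline std::uint64_t load_le64(const unsigned char* p) {
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        v |= static_cast<std::uint64_t>(p[i]) << (8 * i);
    }
    return v;
}

// One add / rotate / XOR step on two words; an illustration only.
inline void mix(std::uint64_t& a, std::uint64_t& b) {
    a += b;
    b = rotl64(b, 13) ^ a;
}
```

With an enum-based byte type, the `p[i]` read in `load_le64` would need an explicit conversion to an integer before the shift.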
> It's the same reason why we have `char16_t` as a distinct type from
> `uint_least16_t`. Because there is a fundamental semantic difference
> between an array of unsigned integers that are at least 16 bits in size and
> an array of UTF-16 code units. One of these is a string; the other is not.
The only benefit I see there is allowing overloading.
But while that may be true, what's the point of an *unsigned* char? If you
want to do character operations, you use char. If you're using unsigned char,
that's because you want a byte, plain and simple. By this argument, we already
have the distinction between character operations and byte operations.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
.
Author: Andrey Semashev <andrey.semashev@gmail.com>
Date: Wed, 2 Nov 2016 12:02:04 +0300
Raw View
On 11/02/16 08:16, Nicol Bolas wrote:
> On Wednesday, November 2, 2016 at 12:55:46 AM UTC-4, Thiago Macieira wrote:
>
> > On Tuesday, 1 November 2016, at 20:10:28 PDT, Nicol Bolas wrote:
> > > Actually, no. `std::byte` as proposed by P0298 is explicitly *not* a
> > > typedef. It is a scoped enumeration whose underlying type is `unsigned
> > > char`, but it is not a typedef.
> > >
> > > Granted, I can't agree with that design decision. I would much rather
> > > it be a genuine type, rather than using C++ enum chicanery to get
> > > strong aliases.
> >
> > That would also not be a good idea because unsigned char is, by
> > definition, a byte. Why should we have (more) types that mean exactly
> > the same thing, and this time on all platforms, by definition?
>
> Because "unsigned char" /also/ means "unsigned character". With just
> `unsigned char`, there is no way to distinguish between manipulating
> bytes and manipulating unsigned characters.
And what is "unsigned character", exactly? The standard defines a
character set, but leaves character encoding implementation-defined,
except that it says that code units are representable by char. Assuming
that code units are always positive (which I don't think is mandated
anywhere, but let's keep things sane), you could also store code units
as unsigned chars. But neither char nor unsigned char represents a
character, unless a code point is equivalent to a code unit.

I think when you say "unsigned character" you should actually be saying
"code units", and at this point it's not that much different from
"bytes". I think unsigned char should be considered a byte in every
respect; if such a type is added to C++, it should either be a regular
typedef (std::byte_t) or an intrinsic integral type that is equivalent
to unsigned char.
> My problem with using C++ enum chicanery is that, if you use the type
> traits mechanisms to ask what `std::byte` is, it will say that it's an
> enum, not that it's an integral type. There's no reason why `byte`
> should not be an integral type.
Agreed.
> Think about it. The C++ standard allows a break in strict aliasing rules
> for `unsigned char` and `char`. Why? Not because the standard thinks
> it's reasonable for people to alias UTF-8 strings. But because that's
> the only way to pass/manipulate a byte array. And byte arrays need to be
> able to alias.
>
> It's the same reason why we have `char16_t` as a distinct type from
> `uint_least16_t`. Because there is a fundamental semantic difference
> between an array of unsigned integers that are at least 16 bits in size
> and an array of UTF-16 code units. One of these is a string; the other is
> not.
>
> It's time we had such a distinction for bytes. And UTF-8 code units, for
> that matter.
Thing is, unsigned char is already allowed to alias, and if we add an
intrinsic byte type that is also allowed to alias, we don't make the
distinction you talk about. And if we prohibit unsigned char from aliasing
other types, we will render lots of existing code invalid. We could add
an intrinsic char8_t instead and say it will only represent narrow
character code units and not alias other types. This way unsigned char
is left as the "byte" type and we have the other type for string processing.
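As a small illustration of what "allowed to alias" means in practice, the sketch below reads the object representation of an arbitrary object through an `unsigned char` pointer, which the aliasing rules permit. The function name `sum_of_bytes` is invented for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Sum the bytes of an object's representation. Reading any object through an
// unsigned char pointer is permitted by the aliasing rules; doing the same
// through, say, a short* would not be.
template <typename T>
unsigned sum_of_bytes(const T& object) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&object);
    unsigned total = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        total += p[i];
    }
    return total;
}
```

For a `std::uint32_t` holding `0x01020304` on a platform with 8-bit bytes, the four bytes are 1, 2, 3 and 4 in some order, so the sum does not depend on endianness. Existing code of this shape is what would break if `unsigned char` lost its aliasing permission.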
.
Author: mihailnajdenov@gmail.com
Date: Wed, 2 Nov 2016 02:27:17 -0700 (PDT)
Raw View
On Wednesday, November 2, 2016 at 8:43:35 AM UTC+2, Thiago Macieira wrote:
>
> On Tuesday, 1 November 2016, at 22:16:32 PDT, Nicol Bolas wrote:
> > On Wednesday, November 2, 2016 at 12:55:46 AM UTC-4, Thiago Macieira wrote:
> > > That would also not be a good idea because unsigned char is, by
> > > definition, a byte. Why should we have (more) types that mean exactly
> > > the same thing, and this time on all platforms, by definition?
> >
> > Because "unsigned char" *also* means "unsigned character". With just
> > `unsigned char`, there is no way to distinguish between manipulating
> > bytes and manipulating unsigned characters.
> >
> > That's what `byte` is for, as a type: a way to semantically differentiate
> > between operations on bytes and operations on characters. The types can
> > be inter-convertible, numerically speaking, but they don't mean the same
> > thing.
>
> I'm sorry, I don't agree that there's a distinction in the first place.
> Bytes are used for more than just copying around. If you add, subtract,
> shift left or right, perform bitwise operations, etc., you need the value.
> If I need the value, then a zero is a zero is a zero, a 0x40 is still a
> 0x40.
>
> ...
> But while that may be true, what's the point of an *unsigned* char? If you
> want to do character operations, you use char. If you're using unsigned
> char, that's because you want a byte, plain and simple. By this argument,
> we already have the distinction between character operations and byte
> operations.
>
> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
> Software Architect - Intel Open Source Technology Center
>
All byte is trying to do is make "unsigned char" officially different
from char: a better name, no conversions, and limited operations.
Also, it *does* define bitwise ops.
As for arithmetic, well, it seems a bit odd indeed, but let's not
forget that char is too small on the one hand and, on the other, unsigned
is not considered a good type for math (as per a '16 CppCon talk).

The idea is for it to be a pure storage format that does not represent values
(strings or math) without a cast.
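A minimal sketch of that "bitwise yes, arithmetic only via a cast" design, assuming a stand-in type `ops_sketch::byte` declared as a scoped enumeration over `unsigned char`. The operator set shown is a guess at the general shape, not the proposal's exact wording, and `add_wrapping` is a hypothetical helper.

```cpp
namespace ops_sketch {
// Stand-in for the proposed type: a scoped enumeration over unsigned char.
enum class byte : unsigned char {};

// Bitwise operators are supplied for the type itself.
constexpr byte operator|(byte l, byte r) {
    return static_cast<byte>(static_cast<unsigned char>(l) |
                             static_cast<unsigned char>(r));
}
constexpr byte operator&(byte l, byte r) {
    return static_cast<byte>(static_cast<unsigned char>(l) &
                             static_cast<unsigned char>(r));
}
constexpr byte operator^(byte l, byte r) {
    return static_cast<byte>(static_cast<unsigned char>(l) ^
                             static_cast<unsigned char>(r));
}
constexpr byte operator<<(byte b, unsigned shift) {
    // Shift in unsigned, then truncate back to the width of unsigned char.
    return static_cast<byte>(
        static_cast<unsigned char>(static_cast<unsigned>(b) << shift));
}

// No operator+ exists for byte, so arithmetic needs an explicit round trip
// through an integer type. The result wraps at the width of unsigned char.
constexpr byte add_wrapping(byte l, byte r) {
    return static_cast<byte>(static_cast<unsigned char>(
        static_cast<unsigned>(l) + static_cast<unsigned>(r)));
}
}  // namespace ops_sketch
```

With these definitions `a | b` compiles for two `byte` values, while `a + b` is rejected by the compiler and has to be spelled through a helper or an explicit cast.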
.
Author: Miro Knejp <miro.knejp@gmail.com>
Date: Wed, 2 Nov 2016 13:14:50 +0100
Raw View
--Apple-Mail=_76CE7E6B-C53E-47AF-A878-7CA2C4CFBBA6
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=UTF-8
> On 02 Nov 2016, at 10:02, Andrey Semashev <andrey.semashev@gmail.com> wrote:
>
> On 11/02/16 08:16, Nicol Bolas wrote:
>> On Wednesday, November 2, 2016 at 12:55:46 AM UTC-4, Thiago Macieira wrote:
>>
>>> On Tuesday, 1 November 2016, at 20:10:28 PDT, Nicol Bolas wrote:
>>>> Actually, no. `std::byte` as proposed by P0298 is explicitly *not* a
>>>> typedef. It is a scoped enumeration whose underlying type is `unsigned
>>>> char`, but it is not a typedef.
>>>>
>>>> Granted, I can't agree with that design decision. I would much rather
>>>> it be a genuine type, rather than using C++ enum chicanery to get
>>>> strong aliases.
>>>
>>> That would also not be a good idea because unsigned char is, by
>>> definition, a byte. Why should we have (more) types that mean exactly
>>> the same thing, and this time on all platforms, by definition?
>>
>> Because "unsigned char" /also/ means "unsigned character". With just
>> `unsigned char`, there is no way to distinguish between manipulating
>> bytes and manipulating unsigned characters.
>
> And what is "unsigned character", exactly? The standard defines a
> character set, but leaves character encoding implementation-defined,
> except that it says that code units are representable by char. Assuming
> that code units are always positive (which I don't think is mandated
> anywhere, but let's keep things sane), you could also store code units as
> unsigned chars. But neither char nor unsigned char represents a character,
> unless a code point is equivalent to a code unit.
>
> I think when you say "unsigned character" you should actually be saying
> "code units", and at this point it's not that much different from "bytes".
> I think, unsigned char should be considered as byte in every respect; if
> such a type is added to C++, it should either be a regular typedef
> (std::byte_t) or an intrinsic integral type that is equivalent to unsigned
> char.
>
>> My problem with using C++ enum chicanery is that, if you use the type
>> traits mechanisms to ask what `std::byte` is, it will say that it's an
>> enum, not that it's an integral type. There's no reason why `byte`
>> should not be an integral type.
>
> Agreed.
>
>> Think about it. The C++ standard allows a break in strict aliasing rules
>> for `unsigned char` and `char`. Why? Not because the standard thinks
>> it's reasonable for people to alias UTF-8 strings. But because that's
>> the only way to pass/manipulate a byte array. And byte arrays need to be
>> able to alias.
>>
>> It's the same reason why we have `char16_t` as a distinct type from
>> `uint_least16_t`. Because there is a fundamental semantic difference
>> between an array of unsigned integers that are at least 16 bits in size
>> and an array of UTF-16 code units. One of these is a string; the other is
>> not.
>>
>> It's time we had such a distinction for bytes. And UTF-8 code units, for
>> that matter.
>
> Thing is, unsigned char is already allowed to alias, and if we add the
> intrinsic byte type that is also allowed to alias, we don't make that
> distinction you talk about. And if we prohibit unsigned char to alias
> other types, we will render lots of existing code invalid. We could add an
> intrinsic char8_t instead and say it will only represent narrow character
> code units and not alias other types. This way unsigned char is left as
> the "byte" type and we have the other type for string processing.
I think this is the real issue here: a type that is dedicated to representing
only characters and is not allowed to alias anything else. Having a char* can
severely limit the compiler's ability to optimize your function because of all
the aliasing implications, even if you as the author know it never aliases
anything because it actually *is* a string. Having a type that clearly conveys
this semantic to the compiler would be useful.
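To make the optimization concern concrete, here is a small sketch with two hypothetical functions, `fill_reloading` and `fill_hoisted`. In the first, the store through `char*` may legally modify `*count`, so a compiler generally cannot keep the loop bound in a register; the second shows the manual workaround an author uses today. What a given compiler actually emits is not claimed here.

```cpp
#include <cstddef>

// Because dst is a char*, a store through it may legally modify *count, so
// the compiler generally has to re-read *count on every iteration.
inline void fill_reloading(char* dst, const std::size_t* count, char value) {
    for (std::size_t i = 0; i < *count; ++i) {
        dst[i] = value;
    }
}

// Copying the bound into a local first makes it independent of the stores,
// which is what the author already "knows" when dst really is a string.
inline void fill_hoisted(char* dst, const std::size_t* count, char value) {
    const std::size_t n = *count;
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = value;
    }
}
```

When `dst` and `count` do not overlap, both functions produce the same result; a character type without the aliasing permission would let the compiler treat the first like the second on its own.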
--Apple-Mail=_76CE7E6B-C53E-47AF-A878-7CA2C4CFBBA6
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset=UTF-8
<html><head><meta http-equiv=3D"Content-Type" content=3D"text/html charset=
=3Dutf-8"></head><body style=3D"word-wrap: break-word; -webkit-nbsp-mode: s=
pace; -webkit-line-break: after-white-space;" class=3D""><br class=3D""><di=
v><blockquote type=3D"cite" class=3D""><div class=3D"">On 02 Nov 2016, at 1=
0:02 , Andrey Semashev <<a href=3D"mailto:andrey.semashev@gmail.com" cla=
ss=3D"">andrey.semashev@gmail.com</a>> wrote:</div><br class=3D"Apple-in=
terchange-newline"><div class=3D""><span style=3D"font-family: Helvetica; f=
ont-size: 12px; font-style: normal; font-variant-caps: normal; font-weight:=
normal; letter-spacing: normal; orphans: auto; text-align: start; text-ind=
ent: 0px; text-transform: none; white-space: normal; widows: auto; word-spa=
cing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !im=
portant;" class=3D"">On 11/02/16 08:16, Nicol Bolas wrote:</span><br style=
=3D"font-family: Helvetica; font-size: 12px; font-style: normal; font-varia=
nt-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto=
; text-align: start; text-indent: 0px; text-transform: none; white-space: n=
ormal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" cl=
ass=3D""><blockquote type=3D"cite" style=3D"font-family: Helvetica; font-si=
ze: 12px; font-style: normal; font-variant-caps: normal; font-weight: norma=
l; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0=
px; text-transform: none; white-space: normal; widows: auto; word-spacing: =
0px; -webkit-text-stroke-width: 0px;" class=3D"">On Wednesday, November 2, =
2016 at 12:55:46 AM UTC-4, Thiago Macieira wrote:<br class=3D""><br class=
=3D""> Em ter=C3=A7a-feira, 1 de novembro de 2016, =C3=A0s=
20:10:28 PDT, Nicol Bolas<br class=3D""> escreveu:<br cla=
ss=3D""> > Actually, no. `std::byte` as proposed by P02=
98 is explicitly *not* a<br class=3D""> > typedef. It i=
s a scoped enumeration whose underlying type is<br class=3D""> &=
nbsp;`unsigned<br class=3D""> > char`, but it is not a =
typedef.<br class=3D""> ><br class=3D""> &nb=
sp;> Granted, I can't agree with that design decision. I would much<br c=
lass=3D""> rather it be<br class=3D""> &g=
t; a genuine type, rather than using C++ enum chicanery to get strong<br cl=
ass=3D""> aliases.<br class=3D""><br class=3D""> &nbs=
p; That would also not be a good idea because, unsigned char is, by<br=
class=3D""> defintion, a<br class=3D""> =
byte. Why should we have (more) types that mean exactly the same<br class=
=3D""> thing, and<br class=3D""> this tim=
e in all platforms, by definition?<br class=3D""><br class=3D""><br class=
=3D"">Because "unsigned char" /also/ means "unsigned character". With just<=
br class=3D"">`unsigned char`, there is no way to distinguish between manip=
ulating<br class=3D"">bytes and manipulating unsigned characters.<br class=
=3D""></blockquote><br style=3D"font-family: Helvetica; font-size: 12px; fo=
nt-style: normal; font-variant-caps: normal; font-weight: normal; letter-sp=
acing: normal; orphans: auto; text-align: start; text-indent: 0px; text-tra=
nsform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit=
-text-stroke-width: 0px;" class=3D""><span style=3D"font-family: Helvetica;=
font-size: 12px; font-style: normal; font-variant-caps: normal; font-weigh=
t: normal; letter-spacing: normal; orphans: auto; text-align: start; text-i=
ndent: 0px; text-transform: none; white-space: normal; widows: auto; word-s=
pacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !=
important;" class=3D"">And what is "unsigned character", exactly? The stand=
ard defines a character set, but leaves character encoding implementation-d=
efined, except that it says that code units are representable by char. Assu=
ming that code units are always positive (which I don't think is mandated a=
nywhere, but let's keep things sane), you could also store code units as un=
signed chars. But neither char nor unsigned char represents a character, un=
less a code point is equivalent to a code unit.</span><br style=3D"font-fam=
ily: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: nor=
mal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align=
: start; text-indent: 0px; text-transform: none; white-space: normal; widow=
s: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=3D""><br=
style=3D"font-family: Helvetica; font-size: 12px; font-style: normal; font=
-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans=
: auto; text-align: start; text-indent: 0px; text-transform: none; white-sp=
ace: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0p=
x;" class=3D""><span style=3D"font-family: Helvetica; font-size: 12px; font=
-style: normal; font-variant-caps: normal; font-weight: normal; letter-spac=
ing: normal; orphans: auto; text-align: start; text-indent: 0px; text-trans=
form: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-t=
ext-stroke-width: 0px; float: none; display: inline !important;" class=3D""=
>I think when you say "unsigned character" you should actually be saying "c=
ode units", and at this point it's not that much different from "bytes". I =
think, unsigned char should be considered as byte in every respect; if such=
a type is added to C++, it should either be a regular typedef (std::byte_t=
) or an intrinsic integral type that is equivalent to unsigned char.</span>=
<br style=3D"font-family: Helvetica; font-size: 12px; font-style: normal; f=
ont-variant-caps: normal; font-weight: normal; letter-spacing: normal; orph=
ans: auto; text-align: start; text-indent: 0px; text-transform: none; white=
-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width:=
0px;" class=3D""><br style=3D"font-family: Helvetica; font-size: 12px; fon=
t-style: normal; font-variant-caps: normal; font-weight: normal; letter-spa=
cing: normal; orphans: auto; text-align: start; text-indent: 0px; text-tran=
sform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-=
text-stroke-width: 0px;" class=3D""><blockquote type=3D"cite" style=3D"font=
-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps:=
normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-a=
lign: start; text-indent: 0px; text-transform: none; white-space: normal; w=
idows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=3D""=
>My problem with using C++ enum chicanery is that, if you use the type<br c=
lass=3D"">traits mechanisms to ask what `std::byte` is, it will say that it=
's an<br class=3D"">enum, not that it's an integral type. There's no reason=
why `byte`<br class=3D"">should not be an integral type.<br class=3D""></b=
lockquote><br style=3D"font-family: Helvetica; font-size: 12px; font-style:=
normal; font-variant-caps: normal; font-weight: normal; letter-spacing: no=
rmal; orphans: auto; text-align: start; text-indent: 0px; text-transform: n=
one; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-str=
oke-width: 0px;" class=3D""><span style=3D"font-family: Helvetica; font-siz=
e: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal=
; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0p=
x; text-transform: none; white-space: normal; widows: auto; word-spacing: 0=
px; -webkit-text-stroke-width: 0px; float: none; display: inline !important=
;" class=3D"">Agreed.</span><br style=3D"font-family: Helvetica; font-size:=
12px; font-style: normal; font-variant-caps: normal; font-weight: normal; =
letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px;=
text-transform: none; white-space: normal; widows: auto; word-spacing: 0px=
; -webkit-text-stroke-width: 0px;" class=3D""><br style=3D"font-family: Hel=
vetica; font-size: 12px; font-style: normal; font-variant-caps: normal; fon=
t-weight: normal; letter-spacing: normal; orphans: auto; text-align: start;=
text-indent: 0px; text-transform: none; white-space: normal; widows: auto;=
word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=3D""><blockquote=
type=3D"cite" style=3D"font-family: Helvetica; font-size: 12px; font-style=
: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: n=
ormal; orphans: auto; text-align: start; text-indent: 0px; text-transform: =
none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-st=
roke-width: 0px;" class=3D"">Think about it. The C++ standard allows a brea=
k in strict aliasing rules<br class=3D"">for `unsigned char` and `char`. Wh=
y? Not because the standard thinks<br class=3D"">it's reasonable for people=
to alias UTF-8 strings. But because that's<br class=3D"">the only way to p=
ass/manipulate a byte array. And byte arrays need to be<br class=3D"">able =
to alias.<br class=3D""><br class=3D"">It's the same reason why we have `ch=
ar16_t` as a distinct type from<br class=3D"">`uint_least16_t`. Because the=
re is a fundamental semantic difference<br class=3D"">between an array of u=
nsigned integers that are at least 16-bits in size<br class=3D"">and an arr=
ay of UTF-16 code units. One of this is a string; the other is<br class=3D"=
">not.<br class=3D""><br class=3D"">It's time we had such a distinction for=
bytes. And UTF-8 code units, for<br class=3D"">that matter.<br class=3D"">=
</blockquote><br style=3D"font-family: Helvetica; font-size: 12px; font-sty=
le: normal; font-variant-caps: normal; font-weight: normal; letter-spacing:=
normal; orphans: auto; text-align: start; text-indent: 0px; text-transform=
: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-=
stroke-width: 0px;" class=3D""><span style=3D"font-family: Helvetica; font-=
size: 12px; font-style: normal; font-variant-caps: normal; font-weight: nor=
mal; letter-spacing: normal; orphans: auto; text-align: start; text-indent:=
0px; text-transform: none; white-space: normal; widows: auto; word-spacing=
: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !import=
ant;" class=3D"">Thing is, unsigned char is already allowed to alias, and i=
f we add the intrinsic byte type that is also allowed to alias, we don't ma=
ke that distinction you talk about. And if we prohibit unsigned char to ali=
as other types, we will render lots of existing code invalid. We could add =
an intrinsic char8_t instead and say it will only represent narrow characte=
r code units and not alias other types. This way unsigned char is left as t=
he "byte" type and we have the other type for string processing.</span></di=
v></blockquote>I think this is the real issue here: a type that is dedicate=
d to represent only characters and not allowed to alias anything else. Havi=
ng a char* can severely limit the compiler=E2=80=99s ability to optimize yo=
ur function because of all the aliasing implications, even if you as the au=
thor know it never aliases anything because it actually *is* a string. Havi=
ng a type that clearly conveys this semantic to the compiler would be usefu=
l.</div><div><br class=3D""></div></body></html>
--Apple-Mail=_76CE7E6B-C53E-47AF-A878-7CA2C4CFBBA6--
.
Author: Tom Honermann <tom@honermann.net>
Date: Wed, 2 Nov 2016 09:46:15 -0400
Raw View
On 11/02/2016 01:16 AM, Nicol Bolas wrote:
> It's time we had such a distinction for bytes. And UTF-8 code units,
> for that matter.
>
In case you haven't seen the latest proposal yet:
- P0482R0: char8_t: A type for UTF-8 characters and strings
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0482r0.html
Any feedback you might have would be appreciated.
Tom.
--
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/988fd18c-c2ed-7ae6-fac4-5150af335cf1%40honermann.net.
.
Author: Nicol Bolas <jmckesson@gmail.com>
Date: Wed, 2 Nov 2016 06:55:29 -0700 (PDT)
Raw View
------=_Part_2423_1043699034.1478094929894
Content-Type: multipart/alternative;
boundary="----=_Part_2424_1028791155.1478094929894"
------=_Part_2424_1028791155.1478094929894
Content-Type: text/plain; charset=UTF-8
On Wednesday, November 2, 2016 at 2:43:35 AM UTC-4, Thiago Macieira wrote:
>
> On Tuesday, November 1, 2016, at 22:16:32 PDT, Nicol Bolas wrote:
> > On Wednesday, November 2, 2016 at 12:55:46 AM UTC-4, Thiago Macieira wrote:
> > > That would also not be a good idea because unsigned char is, by
> > > definition, a byte. Why should we have (more) types that mean exactly
> > > the same thing, and this time on all platforms, by definition?
> >
> > Because "unsigned char" *also* means "unsigned character". With just
> > `unsigned char`, there is no way to distinguish between manipulating
> > bytes and manipulating unsigned characters.
> >
> > That's what `byte` is for, as a type: a way to semantically differentiate
> > between operations on bytes and operations on characters. The types can
> > be inter-convertible, numerically speaking, but they don't mean the same
> > thing.
>
> I'm sorry, I don't agree that there's a distinction in the first place.
> Bytes are used for more than just copying around. If you add, subtract,
> shift left or right, perform bitwise operations, etc., you need the value.
> If I need the value, then a zero is a zero is a zero, a 0x40 is still a 0x40.
>
> Also, I can assign 'a' to any integer type. Maybe this was the main issue:
> that single-quote character literals automatically convert to integral,
> instead of staying a character. We're 40 years too late to change this,
> though (since B).
>
> > My problem with using C++ enum chicanery is that, if you use the type
> > traits mechanisms to ask what `std::byte` is, it will say that it's an
> > enum, not that it's an integral type. There's no reason why `byte`
> > should not be an integral type.
>
> Agreed, but I'm going to go further and say that it's pretty useless for a
> lot of use cases. I need a value for a lot of operations, and an enum won't
> give it to me unless I cast it to a suitable integral type in the first
> place -- unsigned char (that is, a *real* byte).
>
> This week I've been spending time working on hashing algorithms, notably
> SipHash (btw, implementations should reconsider their std::hash
> algorithms). In order to implement it, I needed to access the byte array
> and do byte-level operations like rotating left, XOR, and additions. Not
> only would std::byte not work for me, I fail to see how the operations I'm
> doing are any different than the operations on an unsigned char.
>
P0298's std::byte permits all of those operations via operator overloading.
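A minimal sketch of what that looks like, assuming the `std::byte` that shipped in C++17 (`<cstddef>`), which may differ in detail from the P0298 revision discussed here. The helper names `rotl8` and `add_wrapping` are hypothetical. In the shipped type, XOR and shifts are overloaded directly, while addition needs a round trip through `std::to_integer`; the trait checks illustrate the enum-versus-integral point raised above.

```cpp
#include <cstddef>      // std::byte, std::to_integer
#include <type_traits>

// std::byte is an enumeration type, so the traits report "enum", not "integral".
static_assert(std::is_enum_v<std::byte>);
static_assert(!std::is_integral_v<std::byte>);
static_assert(sizeof(std::byte) == 1);

// Hypothetical helper: rotate left by n bits using only std::byte's own
// shift and OR operators. Assumes 8-bit bytes.
constexpr std::byte rotl8(std::byte b, unsigned n) {
    n &= 7u;
    return (b << n) | (b >> ((8u - n) & 7u));
}

// Hypothetical helper: std::byte has no operator+, so convert to an integer,
// add, truncate to 8 bits, and convert back.
constexpr std::byte add_wrapping(std::byte a, std::byte b) {
    return static_cast<std::byte>(static_cast<unsigned char>(
        std::to_integer<unsigned>(a) + std::to_integer<unsigned>(b)));
}

// XOR needs no helper: operator^ is provided for std::byte.
constexpr std::byte mix(std::byte a, std::byte b) {
    return a ^ b;
}
```

Whether the explicit conversions for arithmetic are a feature or an obstacle is exactly the disagreement in this thread.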
> > It's the same reason why we have `char16_t` as a distinct type from
> > `uint_least16_t`. Because there is a fundamental semantic difference
> > between an array of unsigned integers that are at least 16 bits in size
> > and an array of UTF-16 code units. One of these is a string; the other is
> > not.
>
> The only benefit I see there is allowing overloading.
>
> But while that may be true, what's the point of an *unsigned* char? If you
> want to do character operations, you use char. If you're using unsigned
> char, that's because you want a byte, plain and simple. By this argument,
> we already have the distinction between character operations and byte
> operations.
>
You must not do much UTF-8 work. Because `char` can be signed or unsigned,
the only reasonable way to manipulate UTF-8 code units is to use `unsigned
char`. While a signed `char` is required to be able to store UTF-8 code
units, you don't want to invoke implementation-defined behavior when
bitshifting signed types. So you have to use `unsigned char`.
So it's hardly unreasonable to pass UTF-8 strings around via `unsigned
char*`, rather than `char*`.
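A minimal sketch of the kind of bit manipulation meant here, assuming well-formed input. The helper name `decode_two_byte` is hypothetical. It decodes a two-byte UTF-8 sequence through `unsigned char`, so every mask and shift operates on a non-negative value regardless of whether plain `char` is signed.

```cpp
#include <cstdint>

// Hypothetical helper: decodes a two-byte UTF-8 sequence (110xxxxx 10xxxxxx)
// into a code point. No validation is done; the input is assumed well formed.
constexpr std::uint32_t decode_two_byte(const unsigned char* s) {
    return (static_cast<std::uint32_t>(s[0] & 0x1Fu) << 6) |
            static_cast<std::uint32_t>(s[1] & 0x3Fu);
}

// Same decoding starting from plain char: each code unit is converted to
// unsigned char first, so the result does not depend on char's signedness.
constexpr std::uint32_t decode_two_byte(const char* s) {
    const unsigned char units[2] = {static_cast<unsigned char>(s[0]),
                                    static_cast<unsigned char>(s[1])};
    return decode_two_byte(units);
}
```

For example, the sequence 0xC3 0xA9 decodes to U+00E9 through either overload.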
------=_Part_2424_1028791155.1478094929894--
------=_Part_2423_1043699034.1478094929894--
.
Author: Tom Honermann <tom@honermann.net>
Date: Wed, 2 Nov 2016 10:01:49 -0400
Raw View
This is a multi-part message in MIME format.
--------------9BBFD717F93BBE58E2A94ABE
Content-Type: text/plain; charset=UTF-8; format=flowed
On 11/02/2016 05:02 AM, Andrey Semashev wrote:
> Assuming that code units are always positive (which I don't think is
> mandated anywhere, but let's keep things sane),
I don't know of any encodings that specify negative values for code
units or code points. However, the mapping of UTF-8 code units to char
results in negative code unit values in UTF-8 strings for
implementations that use a signed 8-bit char. So, in practice, sanity
does not always prevail. This is one of the motivations for P0482R0
<http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0482r0.html>.
Tom.
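A minimal sketch of the problem described above, assuming an 8-bit two's-complement `signed char`, which is used explicitly here to model an implementation whose plain `char` is signed. The function names are hypothetical. A naive range check against 0x80 never fires, because every UTF-8 lead or continuation byte is stored as a negative value.

```cpp
// Hypothetical helper: counts code units outside the ASCII range by comparing
// the stored value directly. With a signed 8-bit type the comparison is never
// true, because the largest representable value is 127.
inline int count_non_ascii_naive(const signed char* s, int n) {
    int count = 0;
    for (int i = 0; i < n; ++i)
        if (s[i] >= 0x80) ++count;
    return count;
}

// Hypothetical helper: converts each code unit to unsigned char before
// comparing, which recovers the intended 0..255 value.
inline int count_non_ascii(const signed char* s, int n) {
    int count = 0;
    for (int i = 0; i < n; ++i)
        if (static_cast<unsigned char>(s[i]) >= 0x80) ++count;
    return count;
}
```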
--------------9BBFD717F93BBE58E2A94ABE--
.
Author: Nicol Bolas <jmckesson@gmail.com>
Date: Wed, 2 Nov 2016 07:10:10 -0700 (PDT)
Raw View
------=_Part_598_382808584.1478095811041
Content-Type: multipart/alternative;
boundary="----=_Part_599_1527175469.1478095811041"
------=_Part_599_1527175469.1478095811041
Content-Type: text/plain; charset=UTF-8
On Wednesday, November 2, 2016 at 9:46:18 AM UTC-4, Tom Honermann wrote:
>
> On 11/02/2016 01:16 AM, Nicol Bolas wrote:
> > It's time we had such a distinction for bytes. And UTF-8 code units,
> > for that matter.
> >
> In case you haven't seen the latest proposal yet:
> - P0482R0: char8_t: A type for UTF-8 characters and strings
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0482r0.html
>
> Any feedback you might have would be appreciated.
>
A proposal which was apparently soundly rejected by EWG
<https://botondballo.wordpress.com/2016/07/06/trip-report-c-standards-meeting-in-oulu-june-2016/>
at the previous meeting.
------=_Part_599_1527175469.1478095811041--
------=_Part_598_382808584.1478095811041--
.
Author: Thiago Macieira <thiago@macieira.org>
Date: Wed, 02 Nov 2016 07:10:51 -0700
Raw View
On Wednesday, November 2, 2016, at 06:55:29 PDT, Nicol Bolas wrote:
> You must not do much UTF-8 work. Because `char` can be signed or unsigned,
> the only reasonable way to manipulate UTF-8 code units is to use `unsigned
> char`. While a signed `char` is required to be able to store UTF-8 code
> units, you don't want to invoke implementation-defined behavior when
> bitshifting signed types. So you have to use `unsigned char`.
>
> So it's hardly unreasonable to pass UTF-8 strings around via `unsigned
> char*`, rather than `char*`.
I didn't respond to that part of the email, about UTF-8, because I thought we
were getting char8_t (see Tom Honermann's email).
Why do we need:
- char
- unsigned char
- char8_t
- byte
It seems to me we have one too many. Why do we need four? What are the four
distinctions?
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
.
Author: Tom Honermann <tom@honermann.net>
Date: Wed, 2 Nov 2016 10:14:32 -0400
Raw View
This is a multi-part message in MIME format.
--------------636D9CBDB09B259F174D748E
Content-Type: text/plain; charset=UTF-8; format=flowed
On 11/02/2016 08:14 AM, Miro Knejp wrote:
> I think this is the real issue here: a type that is dedicated to
> represent only characters and not allowed to alias anything else.
> Having a char* can severely limit the compiler’s ability to optimize
> your function because of all the aliasing implications, even if you as
> the author know it never aliases anything because it actually *is* a
> string. Having a type that clearly conveys this semantic to the
> compiler would be useful.
I recall this being discussed during the presentation of P0257R0
<http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0257r0.pdf> in
Jacksonville; thoughts were expressed that, if an alternative type to
accommodate aliasing needs can be popularized, then perhaps the aliasing
behaviors of char can be removed in some future standard. I think that
is a more realistic perspective than popularizing a new type to replace
char. I'll note that the char8_t type proposed in P0482R0
<http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0482r0.html>
for UTF-8 characters and strings would not include char's aliasing
behavior, thus at least allowing for more optimization possibilities for
UTF-8 characters/strings.
Tom.
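A minimal sketch of the aliasing implication under discussion. The function name `zero_fill` is hypothetical. Because `dst` is a `char*`, it may legally point into any object, including the `int` that `n` refers to, so the compiler generally has to re-read `n` after each store rather than keep it in a register. The sketch assumes an `int` of at least 32 bits.

```cpp
// Hypothetical helper: zeroes bytes of dst while i < n and returns how many
// bytes were written. Each store through dst may change n, because char
// lvalues are allowed to alias objects of any type.
inline int zero_fill(char* dst, const int& n) {
    int writes = 0;
    for (int i = 0; i < n; ++i) {
        dst[i] = 0;   // may modify the object n refers to
        ++writes;
    }
    return writes;
}
```

When `dst` points at the bytes of the very `int` passed as `n`, the loop shrinks its own bound as it runs and stops after at most `sizeof(int)` stores. With an unrelated buffer it writes all `n` bytes. A character type without the aliasing exemption, as proposed for `char8_t`, would in principle let the compiler load `n` once before the loop.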
--------------636D9CBDB09B259F174D748E--
.
Author: Tom Honermann <tom@honermann.net>
Date: Wed, 2 Nov 2016 10:18:47 -0400
Raw View
This is a multi-part message in MIME format.
--------------8146AE986B0E1291A31F2AB4
Content-Type: text/plain; charset=UTF-8; format=flowed
On 11/02/2016 10:10 AM, Thiago Macieira wrote:
> On Wednesday, November 2, 2016, at 06:55:29 PDT, Nicol Bolas wrote:
>> You must not do much UTF-8 work. Because `char` can be signed or unsigned,
>> the only reasonable way to manipulate UTF-8 code units is to use `unsigned
>> char`. While a signed `char` is required to be able to store UTF-8 code
>> units, you don't want to invoke implementation-defined behavior when
>> bitshifting signed types. So you have to use `unsigned char`.
>>
>> So it's hardly unreasonable to pass UTF-8 strings around via `unsigned
>> char*`, rather than `char*`.
> I didn't respond to that part of the email, about UTF-8, because I thought we
> were getting char8_t (see Tom Honermann's email).
I'd like for that to happen, but P0482R0
<http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0482r0.html>
has not yet been presented or discussed. I hope to
present/discuss in Issaquah, time permitting.
Tom.
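To illustrate the quoted point about bit manipulation on `unsigned char`, here is a minimal sketch, assuming C++14 or later. The function name utf8_sequence_length is hypothetical and comes neither from this thread nor from P0482R0. It inspects only the bit pattern of a lead code unit and does not reject overlong or out-of-range encodings.

```cpp
#include <cstddef>

// Hypothetical helper (an assumption for illustration): returns the number
// of code units in the UTF-8 sequence introduced by `lead`, judged only by
// its leading bits, or 0 if `lead` cannot start a sequence.
constexpr std::size_t utf8_sequence_length(unsigned char lead) {
    // `lead` is promoted to a non-negative int, so every shift below is
    // fully defined regardless of the signedness of plain `char`.
    if ((lead >> 7) == 0x00) return 1;  // 0xxxxxxx: single code unit (ASCII)
    if ((lead >> 5) == 0x06) return 2;  // 110xxxxx
    if ((lead >> 4) == 0x0E) return 3;  // 1110xxxx
    if ((lead >> 3) == 0x1E) return 4;  // 11110xxx
    return 0;                           // 10xxxxxx (continuation) or 11111xxx
}
```

A caller holding `char` data would convert first, for example utf8_sequence_length(static_cast<unsigned char>(s[0])). Where plain `char` is signed, a code unit above 0x7f promotes to a negative int, and right-shifting a negative value was implementation-defined before C++20, which is the hazard the quoted text refers to.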
--------------8146AE986B0E1291A31F2AB4--
.
Author: Tom Honermann <tom@honermann.net>
Date: Wed, 2 Nov 2016 10:25:57 -0400
Raw View
This is a multi-part message in MIME format.
--------------A5B6ECF13C59CA3F444E7B8E
Content-Type: text/plain; charset=UTF-8; format=flowed
On 11/02/2016 10:10 AM, Nicol Bolas wrote:
> On Wednesday, November 2, 2016 at 9:46:18 AM UTC-4, Tom Honermann wrote:
>> On 11/02/2016 01:16 AM, Nicol Bolas wrote:
>>> It's time we had such a distinction for bytes. And UTF-8 code units,
>>> for that matter.
>>
>> In case you haven't seen the latest proposal yet:
>> - P0482R0: char8_t: A type for UTF-8 characters and strings
>> <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0482r0.html>
>>
>> Any feedback you might have would be appreciated.
>
> A proposal which was apparently soundly rejected by EWG
> <https://botondballo.wordpress.com/2016/07/06/trip-report-c-standards-meeting-in-oulu-june-2016/>
> at the previous meeting.
That isn't my understanding. P0372R0
<http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0372r0.html>
was presented to EWG in Oulu. The wiki notes state that further
discussion has been delegated to LEWG. The link that you provided
states similarly and does not state that it was rejected. I've had some
correspondence with the authors and with a LEWG member that raised
concerns. Some of those concerns are addressed in P0482R0
<http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0482r0.html>,
but I ran out of time before the Issaquah mailing deadline to address
them all. The paper notes some incomplete items at the end of the
introduction.
Tom.
--------------A5B6ECF13C59CA3F444E7B8E--
.
Author: Andrey Semashev <andrey.semashev@gmail.com>
Date: Wed, 2 Nov 2016 17:50:56 +0300
Raw View
On 11/02/16 17:01, Tom Honermann wrote:
> On 11/02/2016 05:02 AM, Andrey Semashev wrote:
>> Assuming that code units are always positive (which I don't think is
>> mandated anywhere, but let's keep things sane),
>
> I don't know of any encodings that specify negative values for code
> units or code points. However, the mapping of UTF-8 code units to char
> results in negative code unit values in UTF-8 strings for
> implementations that use a signed 8-bit char. So, in practice, sanity
> does not always prevail. This is one of the motivations for P0482R0
> <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0482r0.html>.
AFAIR, Unicode does not define negative code units. The fact that code
units greater than 0x7f cannot be represented in 8-bit signed `char`
simply means overflow and implementation-defined behavior. In other
words, UTF-8 code units cannot be represented by `char` on such platforms.
.
Author: Tom Honermann <tom@honermann.net>
Date: Wed, 2 Nov 2016 11:14:26 -0400
Raw View
On 11/2/2016 10:50 AM, Andrey Semashev wrote:
> On 11/02/16 17:01, Tom Honermann wrote:
>> On 11/02/2016 05:02 AM, Andrey Semashev wrote:
>>> Assuming that code units are always positive (which I don't think is
>>> mandated anywhere, but let's keep things sane),
>>
>> I don't know of any encodings that specify negative values for code
>> units or code points. However, the mapping of UTF-8 code units to char
>> results in negative code unit values in UTF-8 strings for
>> implementations that use a signed 8-bit char. So, in practice, sanity
>> does not always prevail. This is one of the motivations for P0482R0
>> <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0482r0.html>.
>
> AFAIR, Unicode does not define negative code units. The fact that code
> units greater than 0x7f cannot be represented in 8-bit signed `char`
> simply means overflow and implementation-defined behavior. In other
> words, UTF-8 code units cannot be represented by `char` on such
> platforms.
The standard specifies behavior a little stronger than that:
C++14 [basic.fundamental] 3.9.1p1:
"For each value i of type unsigned char in the range 0 to 255 inclusive,
there exists a value j of type char such that the result of an integral
conversion (4.7) from i to char is j, and the result of an integral
conversion from j to unsigned char is i."
So, arguably, UTF-8 code units with value greater than 0x7f must be
representable in char, though their values are mapped in an
implementation-defined way. Obtaining the UTF-8 code unit value
requires conversion to unsigned char. This means that it is possible to
fully work with UTF-8 data, including UTF-8 character and string
literals, though it is quite frustrating and error-prone.
Tom.
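The round-trip guarantee quoted above can be sketched in code. This is only an illustration, assuming C++14 or later; the names code_unit_value, decode_two_unit and round_trip_holds are hypothetical and do not come from the standard or from P0482R0. The decoder performs no validation.

```cpp
// Hypothetical helper: recovers the UTF-8 code unit value (0..255) from a
// `char`, whether or not plain `char` is signed on this implementation.
constexpr unsigned int code_unit_value(char c) {
    return static_cast<unsigned char>(c);
}

// Hypothetical helper: decodes a two-code-unit sequence (110xxxxx 10xxxxxx)
// into a code point. Inputs are assumed well-formed; nothing is checked.
constexpr char32_t decode_two_unit(char lead, char trail) {
    return static_cast<char32_t>(((code_unit_value(lead) & 0x1Fu) << 6) |
                                 (code_unit_value(trail) & 0x3Fu));
}

// Checks the quoted property for every value in 0..255:
// unsigned char -> char -> unsigned char yields the original value.
inline bool round_trip_holds() {
    for (unsigned int i = 0; i <= 255; ++i) {
        const unsigned char u = static_cast<unsigned char>(i);
        const char j = static_cast<char>(u);  // value of j is implementation-defined above 0x7f
        if (static_cast<unsigned char>(j) != u) return false;
    }
    return true;
}
```

For the sequence C3 A9, decode_two_unit('\xC3', '\xA9') yields U+00E9. By contrast, a direct comparison such as c == 0xC3 on a `char` c is false wherever plain `char` is a signed 8-bit type, which is the kind of error the paragraph above calls frustrating.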
.