Topic: Why are there no standardized string manipulation
Author: george.h.personal@gmail.com
Date: Wed, 28 Jun 2017 15:48:55 -0700 (PDT)
I assume the biggest argument against more diverse string manipulation
functions would be that strings are a container and, as such, most common
string related algorithms can be implemented with the algorithms library.
I feel like the biggest hurdle by far when presenting C++ to a new person
is the fact that I have to come up with an excuse for why the language
doesn't have basic string manipulation that they'd expect it to have and
tell them the alternative is to either use boost (and that opens up another
can of worms) or write their own. I also bet that there are a metric ton of
bugs out there that come as a result of poorly implemented string
processing functions (e.g. implementations of to lower case that aren't
compatible with unicode). I feel like having something like:
void std::to_lowercase(std::string& str);
void std::to_uppercase(std::string& str);
void std::string_replace(std::string& str, std::string pattern, std::string
replacement);
void std::trim_right(std::string& str, std::string pattern = " ");
void std::trim(std::string& str, std::string pattern = " ");
void std::trim_left(std::string& str, std::string pattern = " ");
bool std::string_contains(std::string& str, std::string pattern); //(arguably
this isn't needed when we have find, but I still have to read the
documentation to understand how to use find; in this case the function
signature is enough to understand how to use it)
... There are probably a dozen more necessary algorithms I haven't thought of.
I think they would be all easy to implement, to the point where anyone in
one of the wgs could spew a paper on the matter in a couple of minutes. The
boost library already has some of these algorithms (though some are
surprisingly slow), so implementation could even be more or less copied
from there.
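For reference, here is a rough sketch of what ASCII-only versions of a few of these could look like on top of what std::string already offers. The names (ascii_to_lower, trim, contains, replace_all) are hypothetical and not part of any standard or proposal, and treating the second argument of trim as a set of characters rather than a substring is an assumption. None of this handles Unicode.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helpers, ASCII-only: the string is treated as a byte sequence.

// Lowercase A-Z in place; every other byte (including UTF-8 multi-byte
// sequences) is left untouched.
inline void ascii_to_lower(std::string& s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a')
                                      : static_cast<char>(c);
    });
}

// Remove leading and trailing characters that appear in `chars`.
inline void trim(std::string& s, const std::string& chars = " ") {
    const auto first = s.find_first_not_of(chars);
    if (first == std::string::npos) {  // nothing but trim characters
        s.clear();
        return;
    }
    const auto last = s.find_last_not_of(chars);
    s = s.substr(first, last - first + 1);
}

// True if `needle` occurs anywhere in `haystack`.
inline bool contains(const std::string& haystack, const std::string& needle) {
    return haystack.find(needle) != std::string::npos;
}

// Replace every non-overlapping occurrence of `pattern` with `replacement`.
inline void replace_all(std::string& s, const std::string& pattern,
                        const std::string& replacement) {
    if (pattern.empty()) return;  // an empty pattern would never advance
    std::string::size_type pos = 0;
    while ((pos = s.find(pattern, pos)) != std::string::npos) {
        s.replace(pos, pattern.size(), replacement);
        pos += replacement.size();  // skip past the text just inserted
    }
}
```

Each of these is only a few lines, which is the point being made here; the catch is that "correct for ASCII" and "correct for text" are different things.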
Functions like std::min and std::max exist, and they are quite basic, so I
see no point in not having string manipulation functions just because they
are simple in implementation. As for them being less generic than using
the algorithms library... that is true, but so is using std::string instead
of using std::vector<rune_sized_type>. In the end I think dropping some
esoteric purity in design principle to make the language way more beginner
friendly and avoid an insane amount of code duplication would be for the
greater good.
That being my view on the matter, my question is: why is this view wrong?
I assume there were 1001 proposals to implement some string manipulation
functionality in the std, so why hasn't it been done?
--
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/02cf02f4-df2e-4eb0-adcf-a7d167cd7e56%40isocpp.org.
.
Author: Thiago Macieira <thiago@macieira.org>
Date: Wed, 28 Jun 2017 16:09:24 -0700
On Wednesday, 28 June 2017 15:48:55 PDT george.h.personal@gmail.com wrote:
> I assume there were 1001 proposals to implement some string manipulation
> functionality in the std, why hasn't it been done ?
One of the reasons is that doing so requires standard library vendors to bring
in a huge database of Unicode character properties.
Another is that the Standard Library does not have to provide every
functionality under the sun. There are other libraries that can be used to
supplement functionality, like ICU.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
.
Author: Nicol Bolas <jmckesson@gmail.com>
Date: Wed, 28 Jun 2017 17:31:58 -0700 (PDT)
On Wednesday, June 28, 2017 at 7:09:29 PM UTC-4, Thiago Macieira wrote:
>
> On Wednesday, 28 June 2017 15:48:55 PDT george.h...@gmail.com wrote:
> > I assume there were 1001 proposals to implement some string manipulation
> > functionality in the std, why hasn't it been done ?
>
> One of the reasons is that doing so requires standard library vendors to
> bring in a huge database of Unicode character properties.
>
> Another is that the Standard Library does not have to provide every
> functionality under the sun. There are other libraries that can be used to
> supplement functionality, like ICU.
>
Unicode is too big for the standard, but a 2D graphics library, *that's
fine*?! Where's the logic in that? You could probably do Unicode case
folding, normalization, and grapheme cluster iterators in the same compiled
storage it takes to make that graphics library work. Cairo can do what the
2D graphics library does, so why do we need one?
I don't mind the argument that Unicode is too much for the standard, or the
argument that there are other libraries that can supplement functionality.
But the fact is, the standard committee doesn't seem to agree with that
rationale.
2D graphics is happening because someone on the committee decided that it
needed to happen and got domain experts working on it. Unicode is not
happening because nobody with both clout and ability has really stepped
forward to *make* it happen.
.
Author: Nicol Bolas <jmckesson@gmail.com>
Date: Wed, 28 Jun 2017 17:47:42 -0700 (PDT)
On Wednesday, June 28, 2017 at 6:48:56 PM UTC-4, george.h...@gmail.com
wrote:
>
> I assume the biggest argument against more diverse string manipulation
> functions would be that strings are a container and, as such, most common
> string related algorithms can be implemented with the algorithms library.
>
> I feel like the biggest hurdle by far when presenting C++ to a new person
> is the fact that I have to come up with an excuse for why the language
> doesn't have basic string manipulation that they'd expect it to have and
> tell them the alternative is to either use boost (and that opens up another
> can of worms) or write their own. I also bet that there are a metric ton of
> bugs out there that come as a result of poorly implemented string
> processing functions (e.g. implementations of to lower case that aren't
> compatible with unicode). I feel like having something like:
>
> void std::to_lowercase(std::string& str);
>
> void std::to_uppercase(std::string& str);
>
> void std::string_replace(std::string& str, std::string pattern,
> std::string replacement);
>
> void std::trim_right(std::string& str, std::string pattern = " ");
>
> void std::trim(std::string& str, std::string pattern = " ");
>
> void std::trim_left(std::string& str, std::string pattern = " ");
>
> bool std::string_contains(std::string& str, std::string pattern); //(arguably
> this isn't needed when we have find, but I still have to read the
> documentation to understand how to use find; in this case the function
> signature is enough to understand how to use it)
>
Ignoring the question of how to implement them, these are not algorithms.
As you have written them, there is no reason not to make them member
functions of `std::string`. They don't even work with `basic_string`s of
non-`char` types, let alone something like `string_view` or any other
string container/range.
The whole point of STL algorithms is that they are *uncoupled* from the
container on which they operate. Your "algorithms" are specific to the
container. Those aren't algorithms; they're just namespace-scoped functions.
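To make the distinction concrete, here is a minimal sketch of a trim written in the STL style, on iterators plus a predicate, so that it is not tied to `std::string`. The names `trim_range` and `trim_spaces` are hypothetical and only for illustration; this is one possible shape, not a proposed interface.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <string_view>
#include <utility>

// Returns the sub-range [first, last) that remains after skipping elements
// satisfying `pred` at both ends. Works on any bidirectional range.
template <class BidirIt, class Pred>
std::pair<BidirIt, BidirIt> trim_range(BidirIt first, BidirIt last, Pred pred) {
    first = std::find_if_not(first, last, pred);
    // Search backwards from the end, never going past the new `first`.
    last = std::find_if_not(std::make_reverse_iterator(last),
                            std::make_reverse_iterator(first), pred)
               .base();
    return {first, last};
}

// Thin convenience wrapper for any character type; returns a view into the
// same storage, so nothing is copied or modified.
template <class CharT>
std::basic_string_view<CharT> trim_spaces(std::basic_string_view<CharT> s) {
    const auto r = trim_range(s.begin(), s.end(),
                              [](CharT c) { return c == CharT(' '); });
    return s.substr(static_cast<std::size_t>(r.first - s.begin()),
                    static_cast<std::size_t>(r.second - r.first));
}
```

The same `trim_range` can be called with iterators from a `std::string`, a `std::wstring`, a `string_view`, or a user-defined string type, which is what the namespace-scoped functions quoted above cannot do.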
> ... There's probably a dozen more necessary algorithm I haven't thought of.
> I think they would be all easy to implement, to the point where anyone in
> one of the wgs could spew a paper on the matter in a couple of minutes.
>
Rule #1 about Unicode: If you think you know Unicode, then you do not know
enough about Unicode to make that determination.
Corollary: If you think something in Unicode would be "easy to implement",
you do not know enough about Unicode to make that determination.
Implementing a Unicode-aware `to_lowercase` is not a trivial matter. The
Unicode case conversion algorithm requires not just a large data table, but
some very oddball computations, as well as merging sequences of characters
(which itself suggests normalization).
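A small illustration of why the per-character approach breaks down, using uppercase because the example is shorter: in Unicode's full case mappings, "ß" (U+00DF) uppercases to the two letters "SS", so the result has more characters than the input. A byte-by-byte transform cannot express that. The helper names below are hypothetical, and the input is assumed to be UTF-8.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: uppercases a-z byte by byte and leaves every other
// byte alone, which is what a naive std::transform-based version does.
inline std::string ascii_upper(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A')
                                      : static_cast<char>(c);
    });
    return s;
}

// "straße" encoded as UTF-8: the ß is the two bytes C3 9F. The literal is
// split so that the trailing 'e' is not parsed as part of the hex escape.
inline std::string strasse_utf8() { return "stra\xC3\x9F" "e"; }
```

Here `ascii_upper(strasse_utf8())` yields "STRA" followed by the untouched ß bytes and "E", not "STRASSE". Lowercasing has its own context-dependent cases, such as Greek capital sigma, which lowercases differently at the end of a word.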
>
> I assume there were 1001 proposals to implement some string manipulation
> functionality in the std, why hasn't it been done ?
>
If you really want people to take your position seriously, then you should
be willing to do some research. All of the proposals are public information
<http://www.open-std.org/JTC1/SC22/WG21/docs/papers/>, so you don't have to
"assume" anything.
.
Author: Thiago Macieira <thiago@macieira.org>
Date: Thu, 29 Jun 2017 00:17:44 -0700
On Wednesday, 28 June 2017 17:31:58 PDT Nicol Bolas wrote:
> On Wednesday, June 28, 2017 at 7:09:29 PM UTC-4, Thiago Macieira wrote:
> > On Wednesday, 28 June 2017 15:48:55 PDT george.h...@gmail.com wrote:
> > > I assume there were 1001 proposals to implement some string manipulation
> > > functionality in the std, why hasn't it been done ?
> >
> > One of the reasons is that doing so requires standard library vendors to
> > bring
> > in a huge database of Unicode character properties.
> >
> > Another is that the Standard Library does not have to provide every
> > functionality under the sun. There are other libraries that can be used to
> > supplement functionality, like ICU.
>
> Unicode is too big for the standard, but a 2D graphics library, *that's
> fine*?!
In my opinion? No.
I don't think we need a 2D graphics library in 2017 and I do think we need
Unicode.
> Where's the logic in that? You could probably do Unicode case
> folding, normalization, and grapheme cluster iterators in the same compiled
> storage it takes to make that graphics library work. Cairo can do what the
> 2D graphics library does, so why do we need one?
Probably. Cairo is bigger than qstring.o, which contains the code for the
transformations and the Unicode data for it.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
.
Author: george.h.personal@gmail.com
Date: Thu, 29 Jun 2017 00:34:59 -0700 (PDT)
>
> Rule #1 about Unicode: If you think you know Unicode, then you do not know
> enough about Unicode to make that determination.
>
> Corollary: If you think something in Unicode would be "easy to implement",
> you do not know enough about Unicode to make that determination.
>
> Implementing a Unicode-aware `to_lowercase` is not a trivial matter. The
> Unicode case conversion algorithm requires not just a large data table, but
> some very oddball computations, as well as merging sequences of characters
> (which itself suggests normalization).
>
Ok, maybe some of the implementations wouldn't be trivial, at least not to
someone like myself. However, literally every other programming language
that sees heavy usage today except for C implements this functionality, so
I would assume that a C++ dev skilled enough to work on the standard should
be able to implement this.
.
Author: george.h.personal@gmail.com
Date: Thu, 29 Jun 2017 00:42:47 -0700 (PDT)
>
> If you really want people to take your position seriously, then you should
> be willing to do some research. All of the proposals are public
> information <http://www.open-std.org/JTC1/SC22/WG21/docs/papers/>, so you
> don't have to "assume" anything.
>
Also my fault for not searching through this. Looking through the mailing
list from 2011 onward, I see literally 3 proposals that could be related to
this. Two of them are for split operations: one seems to have been made
recently by some people including the two of you, and I assume it didn't
make the cut for C++17; the other is like 4 years old and, again, since it's
not there I assume it didn't make the cut. Another one is related to making
different encodings of strings more type safe when it comes to operators
that currently work between them, and, again, that didn't make the standard.
The rest seem to be more focused on string_view behavior or optimization.
So is the problem here the fact that nobody cares enough to implement this?
Or are std::string and friends so messily implemented that implementing
said algorithms on them would be nightmarish?
.
Author: zamazan4ik@gmail.com
Date: Mon, 3 Jul 2017 21:09:47 -0700 (PDT)
Raw View
------=_Part_2185_1579200694.1499141387377
Content-Type: multipart/alternative;
boundary="----=_Part_2186_1047685316.1499141387377"
------=_Part_2186_1047685316.1499141387377
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Yeah, it's a well-known problem in C++. There are several issues:
1) The Unicode problem (you read about it above).
2) The STL is too conservative about adding new algorithms (I don't like that).
3) I think these functions shouldn't be methods inside std::string. They
should be free functions plus unified call syntax, because if I implement
my own ownNames::string, I want to be able to use them with it easily.
4) Try Boost.StringAlgo; in this library you will find a lot of
algorithms for working with strings.
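
To illustrate point 3, here is a minimal sketch of string helpers written as
free function templates, so that they work with std::string and with a
user-defined string type alike. The namespace strutil, the function names and
the body of ownNames::string are all hypothetical, made up for this example;
none of it is standard or taken from Boost. Unified call syntax is not in
C++17, so the calls use ordinary free-function syntax. Note that the
lowercase helper is deliberately ASCII-only, which is exactly the Unicode
limitation from point 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper namespace; not part of the standard library.
namespace strutil {

// Lowercases ASCII letters in place. Any other byte (including the bytes of
// UTF-8 multi-byte sequences) is left untouched, so this is not Unicode-aware.
// Requires only begin()/end() over char elements.
template <class String>
void to_lowercase_ascii(String& s) {
    std::transform(s.begin(), s.end(), s.begin(), [](char c) {
        return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
    });
}

// Removes leading and trailing spaces in place.
// Requires bidirectional begin()/end() and erase(first, last).
template <class String>
void trim(String& s) {
    // Walk back from the end past any trailing spaces, then erase them.
    auto last = s.end();
    while (last != s.begin() && *std::prev(last) == ' ') {
        --last;
    }
    s.erase(last, s.end());

    // Find the first non-space character and erase everything before it.
    auto first = std::find_if(s.begin(), s.end(),
                              [](char c) { return c != ' '; });
    s.erase(s.begin(), first);
}

}  // namespace strutil

// Stand-in for a user-defined string type (assumed interface).
namespace ownNames {

class string {
public:
    using iterator = std::vector<char>::iterator;

    explicit string(const char* text) {
        assert(text != nullptr);
        data_.assign(text, text + std::strlen(text));
    }

    iterator begin() { return data_.begin(); }
    iterator end() { return data_.end(); }
    iterator erase(iterator first, iterator last) {
        return data_.erase(first, last);
    }

    // Copies the contents into a std::string, for easy comparison.
    std::string str() const { return std::string(data_.begin(), data_.end()); }

private:
    std::vector<char> data_;
};

}  // namespace ownNames
```

With this in place, strutil::trim(s) and strutil::to_lowercase_ascii(s) are
called the same way whether s is a std::string or an ownNames::string; the
custom type only has to provide begin(), end() and erase().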
On Thursday, 29 June 2017 at 1:48:56 UTC+3, george.h...@gmail.com wrote:
>
> I assume the biggest argument against more diverse string manipulation
> functions would be that strings are a container and, as such, most common
> string related algorithms can be implemented with the algorithms library.
>
> I feel like the biggest hurdle by far when presenting C++ to a new person
> is the fact that I have to come up with an excuse for why the language
> doesn't have basic string manipulation that they'd expect it to have and
> tell them the alternative is to either use boost (and that opens up another
> can of worms) or write their own. I also bet that there are a metric ton of
> bugs out there that come as a result of poorly implemented string
> processing functions (e.g. implementations of to lower case that aren't
> compatible with unicode). I feel like having something like:
>
> void std::to_lowercase(std::string& str);
>
> void std::to_uppercase(std::string& str);
>
> void std::string_replace(std::string& str, std::string pattern,
> std::string replacement);
>
> void std::trim_right(std::string& str, std::string pattern = " ");
>
> void std::trim(std::string& str, std::string pattern = " ");
>
> void std::trim_left(std::string& str, std::string pattern = " ");
>
> bool std::string_contains(std::string& str, std::string str); //(arguably
> this isn't need when we have find, but I still have to read the
> documentation to understand how to use find, in this case the func
> signature is enough to understand how to use it)
>
> ... There's probably a dozen more necessary algorithm I haven't thought
> of. I think they would be all easy to implement, to the point where anyone
> in one of the wgs could spew a paper on the matter in a couple of minutes.
> The boost library already has some of these algorithms (though some are
> surprisingly slow), so implementation could even be more or less copied
> from there.
>
> Functions like std::min and std::max exist, and they are quite basic, so I
> see no point in not having string manipulation functions just because they
> are simple in implementation. As for them being less generic than using
> std::algorithm... that is true, but so is using std::string instead of
> using std::vector<rune_sized_type>, in the end I think dropping some
> esoteric purity in design principle to make the language way more beginner
> friendly and avoid an insane amount of code duplication would be for the
> greater good.
>
> That being my view on the matter, my questions is, why is this view wrong
> ? I assume there were 1001 proposals to implement some string manipulation
> functionality in the std, why hasn't it been done ?
>
------=_Part_2186_1047685316.1499141387377--
------=_Part_2185_1579200694.1499141387377--
.