Topic: std::match_results / smatch container version
Author: Daniel Gutson <danielgutson@gmail.com>
Date: Fri, 20 Jul 2018 15:57:17 -0300
I'm tired of seeing people wrongly assume that smatch and friends are
actual containers and then get heap corruption because the original source
was destroyed.
I would like to survey what people think of a new standalone match_results
equivalent (or at least smatch) containing copies of the matched substrings.
E.g. std::match_results_container
I can tell that the current semantics are counterintuitive for beginners
of the regex library.
Comments?
Daniel.
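
To make the idea concrete, here is a minimal sketch of what "copies of the matched substrings" could look like with today's library. The names `copy_matches` and `parse_key_value` are hypothetical helpers for illustration, not the proposed `std::match_results_container` and not an existing standard API.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of the standard library): copies every
// sub-match out of `m`, so the result no longer depends on the searched string.
std::vector<std::string> copy_matches(const std::smatch& m) {
    assert(m.ready());  // precondition: m was filled by regex_match/regex_search
    std::vector<std::string> owned;
    owned.reserve(m.size());
    for (const std::ssub_match& sm : m) {
        owned.push_back(sm.str());  // str() returns an independent std::string
    }
    return owned;
}

// Hypothetical example: returns owned copies of the whole match, the key and
// the value for a line of the form "key=value", or an empty vector otherwise.
std::vector<std::string> parse_key_value(const std::string& line) {
    static const std::regex re(R"((\w+)=(\w+))");
    std::smatch m;
    if (!std::regex_match(line, m, re)) return {};
    return copy_matches(m);  // safe to use after `line` is gone
}
```

The returned vector stays valid after the searched string is destroyed, at the cost of one allocation per sub-match (small-string optimization aside).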
--
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/CAFdMc-1Y2SAO0br6gKQn0EbLho64LGw189TnT152wZ1Pt3N27w%40mail.gmail.com.
.
Author: Nicol Bolas <jmckesson@gmail.com>
Date: Fri, 20 Jul 2018 14:26:22 -0700 (PDT)
First, `match_results` *is* a container. A container of `sub_match`es.
They're not a container of `string`s or whatever.
Second, `sub_match` is really just a `string_view` before those existed, so
if we were to take a second look at regex, that's what the match results
should be based on.
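
As a rough illustration of the `string_view` comparison, the sketch below turns a `sub_match` into a `std::string_view` (C++17). `as_view` and `view_points_into` are hypothetical names, and the sketch assumes the searched sequence is a `std::string`, whose storage is contiguous.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <string_view>

// Hypothetical helper: views a sub-match without copying. The view is only
// valid while the searched string is alive and unmodified.
std::string_view as_view(const std::ssub_match& sm) {
    // Avoid dereferencing an end iterator for unmatched or empty sub-matches.
    if (!sm.matched || sm.first == sm.second) return {};
    return std::string_view(&*sm.first, static_cast<std::size_t>(sm.length()));
}

// Shows that the view aliases the searched string rather than copying it.
bool view_points_into(const std::string& source) {
    static const std::regex re(R"((\w+)=(\d+))");
    std::smatch m;
    if (!std::regex_match(source, m, re)) return false;
    std::string_view value = as_view(m[2]);
    return value.data() == source.data() + m.position(2);
}
```

Like the `sub_match` it comes from, the view owns nothing, so it dangles as soon as the searched string goes away.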
Personally, I find the regex library to be problematic as it is currently
designed. It is simultaneously too low level for new users and too high level
for experienced users. A regex library appropriate for new users would
return strings as matches, not references to the matches in the sequence.
And a regex library appropriate for experienced users wouldn't force you to
use their own container to hold matches (or any container).
On the one hand, I would say that the regex library's interface needs to be
rethought, with a clear delineation between the lowest-level regular
expression system (something from which `regex_iterator` could be built. At
present, you can't actually do that, due to not being able to communicate
how far back the next match can look) and the higher-level systems like
`regex_replace` (which is bound to `std::string` and the like).
But on the other hand, I'm not sure it matters from a practical
perspective. From what I understand, standard library implementations of
regex are just... terrible in terms of performance. Boost.Regex is so much
faster despite having a very similar interface. So even if we could fix the
interface problem, would low level developers ever actually trust it to be
performant?
.
Author: Daniel Gutson <danielgutson@gmail.com>
Date: Fri, 20 Jul 2018 20:00:46 -0300
On Fri, Jul 20, 2018 at 6:26 PM Nicol Bolas <jmckesson@gmail.com> wrote:
> First, `match_results` *is* a container. A container of `sub_match`es.
> They're not a container of `string`s or whatever.
>
:) If you were a close friend, I would call this a pedestalling obvious
clarification based on abusive exactness of my initial post. But, I just
say, "sure, agree!".
Sorry, I just had to say something :)
>
> Second, `sub_match` is really just a `string_view` before those existed,
> so if we were to take a second look at regex, that's what the match results
> should be based on.
>
I will rephrase my question more accurately just in case: is there a
consensus that it is worth adding a class that contains copies of the
matched substrings, alongside the container of `sub_match`es, so people can
choose to use either one?
BTW, I like the idea that sub_match is a string view. Or that it be something
related to the CoW being concurrently discussed in another thread of this
mailing list.
In any case, thanks for providing your opinion.
Daniel.
.
Author: Nicol Bolas <jmckesson@gmail.com>
Date: Fri, 20 Jul 2018 18:18:26 -0700 (PDT)
On Friday, July 20, 2018 at 7:01:00 PM UTC-4, Daniel Gutson wrote:
> I will rephrase my question more accurately just in case: is there a
> consensus that it is worth adding a class that contains copies of the
> matched substrings, alongside the container of `sub_match`es, so people can
> choose to use either one?
> BTW, I like the idea that sub_match is a string view. Or that it be something
> related to the CoW being concurrently discussed in another thread of this
> mailing list.
>
> In any case, thanks for providing your opinion.
>
I think this is really "yet another part" of the general lifetime problem
C++ has. We don't want to have a bunch of different APIs, one for copying,
one for not copying, etc. We *want* to be able to have a container of
references into a temporary like that, I think. The only reason it doesn't
work is because the lifetime of the match container isn't properly
connected to the lifetime of the temporary string being searched.
Solve that problem, and you basically solve the issue.
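
For context, a small sketch of the lifetime issue with the library as it stands. As far as I know, since C++14 the `regex_match` and `regex_search` overloads that take an rvalue `std::string` together with a `match_results` are deleted, which catches the most direct form of the mistake at compile time; other forms (for example a named string going out of scope before the matches do) are not caught. `make_line` and `extract_value` are hypothetical names.

```cpp
#include <regex>
#include <string>

// Hypothetical producer of a temporary string.
std::string make_line() { return "version=17"; }

std::string extract_value() {
    static const std::regex re(R"(\w+=(\d+))");

    // std::smatch bad;
    // std::regex_match(make_line(), bad, re);  // rejected since C++14: the
    //                                          // rvalue-string overload is deleted

    const std::string line = make_line();  // keep the searched string alive...
    std::smatch m;
    if (!std::regex_match(line, m, re)) return {};
    return m[1].str();  // ...and copy the piece out before `line` is destroyed
}
```

Returning `m` itself from `extract_value` would compile but leave every `sub_match` pointing into the destroyed `line`, which is the disconnect between the two lifetimes described above.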
.
Author: pecholt@gmail.com
Date: Sat, 21 Jul 2018 01:14:16 -0700 (PDT)
I think Richard Smith is currently working on lifetime extension issues. His [[lifetimebound]] proposal should solve this, right?
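
As a rough idea of how such an annotation reads in practice, the sketch below uses Clang's vendor attribute `[[clang::lifetimebound]]`. This assumes the Clang spelling rather than any standardized one; the macro `SKETCH_LIFETIMEBOUND` and the function `first_token` are hypothetical, and on other compilers the macro expands to nothing.

```cpp
#include <string>
#include <string_view>

// Assumption: only Clang understands this attribute, so it is guarded.
#if defined(__clang__)
#define SKETCH_LIFETIMEBOUND [[clang::lifetimebound]]
#else
#define SKETCH_LIFETIMEBOUND
#endif

// The annotation says that the returned view refers into `text`, so a
// compiler that understands it can diagnose a call with a temporary
// argument whose result outlives the full expression.
std::string_view first_token(const std::string& text SKETCH_LIFETIMEBOUND) {
    // find() returns npos when there is no space; substr then yields all of text.
    return std::string_view(text).substr(0, text.find(' '));
}
```

Note that this is a diagnostic aid: it would warn about a dangling result rather than extend the lifetime of the temporary, so whether it fully "solves" the `match_results` case is an open question here.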
.