Thread

Topic: Limitations in string_ref

Author: rick@longbowgames.com
Date: Tue, 22 Jan 2013 00:27:33 -0800 (PST)

So I was just perusing the string_ref paper, and please tell me if I'm
wrong, but it seems to have some relatively severe limitations:

   - The string must be represented as a pointer/size pair.
   - The memory must be contiguous.
   - Each memory location must be a discrete character.
   - string_refs of different character types are incompatible.

This really only makes it useful for std::string, or clones of std::string,
which strikes me as odd, since the intent seems to be to create something
that can be used universally. Now, non-contiguous memory may be a somewhat
uncommon case, but what's not uncommon is UTF-8, which has been the
dominant character encoding (at least on the web, where we can measure it)
for at least three years, and this proposal seems to be completely
incompatible with it.

I think we need to ask two questions:

   - If this is only intended to make it easier to work with std::string
   clones, we should take a look at why std::string clones are being used at
   all. Would the need for a class like this be reduced by adding more
   functionality to std::string?
   - What would it take to make a truly universal string ref class?

I can think of a couple solutions to a universal string ref class, but they
definitely come with a cost. Both solutions require that the string ref be
designed around iterators/ranges rather than C strings, thereby
eliminating any index-based operations (notably, find*).

One solution would be to define a virtual base class, then have a generic
derived class that handles the actual iteration. This could all be hidden
inside a string_ref class, but requires virtual lookups for each operation,
and a bit of code bloat.

A potentially faster solution would be to use type erasure, but this would
require a lot of function pointers for each operation.
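
The virtual-base approach could be sketched roughly as follows. This is only an illustration under stated assumptions: `char_cursor`, `cursor_impl`, `any_string_ref`, and `collect_from_list` are hypothetical names, not anything from the string_ref paper, and the underlying container is assumed to expose `begin()`/`end()` iterators that yield `char`.

```cpp
#include <list>
#include <memory>
#include <string>

// Hypothetical abstract interface: yields one character per call.
class char_cursor {
public:
    virtual ~char_cursor() {}
    // Stores the next character in `out`; returns false once exhausted.
    virtual bool next(char& out) = 0;
};

// Generic derived class that performs the actual iteration for one iterator type.
template <class Iter>
class cursor_impl : public char_cursor {
    Iter cur_, end_;
public:
    cursor_impl(Iter first, Iter last) : cur_(first), end_(last) {}
    bool next(char& out) override {
        if (cur_ == end_) return false;
        out = *cur_;
        ++cur_;
        return true;
    }
};

// Hypothetical wrapper that hides the hierarchy. Construction costs a heap
// allocation and every character costs a virtual call.
class any_string_ref {
    std::unique_ptr<char_cursor> cursor_;
public:
    template <class Container>
    explicit any_string_ref(const Container& c)
        : cursor_(new cursor_impl<typename Container::const_iterator>(c.begin(), c.end())) {}

    // Drains the remaining characters into a std::string.
    std::string collect() {
        std::string out;
        char ch;
        while (cursor_->next(ch)) out.push_back(ch);
        return out;
    }
};

// Works even for non-contiguous storage such as std::list<char>.
inline std::string collect_from_list(const std::list<char>& chars) {
    any_string_ref ref(chars);
    return ref.collect();
}
```

Note that nothing index-based survives in this interface: only forward traversal is available, which is the cost mentioned above.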

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To post to this group, send email to std-proposals@isocpp.org.
To unsubscribe from this group, send email to std-proposals+unsubscribe@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/?hl=en.




.

Author: Martinho Fernandes <martinho.fernandes@gmail.com>
Date: Tue, 22 Jan 2013 10:17:13 +0100

On Tue, Jan 22, 2013 at 9:27 AM,  <rick@longbowgames.com> wrote:
> So I was just perusing the string_ref paper, and please tell me if I'm
> wrong, but it seems to have some relatively severe limitations:
>
> The string must be represented as a pointer/size pair.
> The memory must be contiguous.

Without these limitations, it becomes just simpler to use a range (or
a pair of iterators, if ranges don't get on). It is my impression that
these are not really limitations, but are actually *the* use case for
string_ref. It is intended to provide a zero-cost common interface to
various common ways of passing strings in the wild: things like
std::string&, char*+size, std::vector<char>::iterator or
std::string::iterator pairs, other string implementations, etc.
string_ref is intended merely as a special case of array_ref for
characters. Note that the word "character" has the C++ meaning of
"character".
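
A minimal sketch of that idea, assuming a hand-rolled pointer/size wrapper: `simple_string_ref` and `count_spaces` are hypothetical names used only for illustration and are not the interface from the paper.

```cpp
#include <cstddef>
#include <string>

// Hypothetical minimal stand-in for string_ref: a non-owning pointer/size pair.
class simple_string_ref {
    const char* data_;
    std::size_t size_;
public:
    simple_string_ref(const char* data, std::size_t size) : data_(data), size_(size) {}
    // Implicit on purpose, so a std::string can be passed directly.
    simple_string_ref(const std::string& s) : data_(s.data()), size_(s.size()) {}

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }
    const char* begin() const { return data_; }
    const char* end() const { return data_ + size_; }
};

// One non-template function serves every caller; the characters are never copied.
inline std::size_t count_spaces(simple_string_ref text) {
    std::size_t n = 0;
    for (const char* p = text.begin(); p != text.end(); ++p)
        if (*p == ' ') ++n;
    return n;
}
```

A caller holding a `std::string` can pass it as-is, while a caller holding a `char*` plus a length (or the `data()`/`size()` of a `std::vector<char>`) constructs the wrapper from those two values.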

> Each memory location must be a discrete character.

Since that is what happens in the use cases I don't see a problem with
it. I doubt many people put more than one character in the same memory
location or less than one character per memory location.

> string_refs of different character types are incompatible.

This might be a real limitation. I am not sure.

> This really only makes it useful for std::string, or clones of std::string,
> which strikes me as odd, since the intent seems to be to create something
> that can be used universally. Now, non-contiguous memory may be a somewhat
> uncommon case, but what's not uncommon is UTF-8, which has been the dominant
> character encoding (at least on the web, where we can measure it) for at
> least three years, and this proposal seems to be completely incompatible
> with it.

std::string accepts UTF-8 just fine. In fact, C++11 has redefined a
byte (and therefore char) to be "large enough to contain [...] the
eight-bit code units of the Unicode UTF-8 encoding form", so I see no
problem there.

> I think we need to ask two questions:
>
> If this is only intended to make it easier to work with std::string clones,
> we should take a look at why std::string clones are being used at all. Would
> the need for a class like this be reduced by adding more functionality to
> std::string?
> What would it take to make a truly universal string ref class?

I don't know, but I am not sure if the goal is worthy. If you want to
provide a truly universal interface, C++ already has templates.

> One solution would be to define a virtual base class, then have a generic
> derived class that handles the actual iteration. This could all be hidden
> inside a string_ref class, but requires virtual lookups for each operation,
> and a bit of code bloat.

Using subtype polymorphism *is* a form of type erasure.

> A potentially faster solution would be to use type erasure, but this would
> require a lot of function pointers for each operation.

Kind regards,

Martinho


.

Author: Nicol Bolas <jmckesson@gmail.com>
Date: Tue, 22 Jan 2013 01:46:16 -0800 (PST)




On Tuesday, January 22, 2013 12:27:33 AM UTC-8, ri...@longbowgames.com
wrote:
>
> So I was just perusing the string_ref paper, and please tell me if I'm
> wrong, but it seems to have some relatively severe limitations:
>
>    - The string must be represented as a pointer/size pair.
>    - The memory must be contiguous.
>    - Each memory location must be a discrete character.
>    - string_refs of different character types are incompatible.
>
> This really only makes it useful for std::string, or clones of std::string,
> which strikes me as odd, since the intent seems to be to create something
> that can be used universally. Now, non-contiguous memory may be a somewhat
> uncommon case, but what's not uncommon is UTF-8, which has been the
> dominant character encoding (at least on the web, where we can measure it)
> for at least three years, and this proposal seems to be completely
> incompatible with it.
>

In what way is it not compatible with UTF-8? The UTF-8 specification
doesn't state that the string needs to be contiguous, but the vast majority
of uses of UTF-8 strings *are* contiguously stored in memory. So there is
nothing incompatible with UTF-8.

Now, it doesn't cooperate with UTF-8, in that it doesn't know what a
code-point is, nor does it do transformation to codepoints or from
codepoints to code units. It can't find the next codepoint or skip to the
previous one. It can't search based on codepoints and the like.

But it *isn't supposed to*. Proper Unicode support of that caliber is for
another proposal and for another *class*. Basic_string_ref is no more and
no *less* compatible with UTF-8 than basic_string. If you could use
basic_string with UTF-8 (and people do. *Carefully*), then you can use
basic_string_ref with UTF-8. If your UTF-8 usage is relying on special
codepoint transformation iterators or something, then you're going to need
to look at some other type.
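
To make the code-unit versus codepoint distinction concrete, here is a small sketch. `count_code_points` is a hypothetical helper, not part of any proposal, and it assumes the input is already valid UTF-8 (no validation is attempted).

```cpp
#include <cstddef>
#include <string>

// Counts UTF-8 code points by skipping continuation bytes (bit pattern 10xxxxxx).
// Assumes valid UTF-8 input; malformed sequences are not detected.
inline std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char byte : utf8)
        if ((byte & 0xC0) != 0x80) ++n;
    return n;
}
```

For the string `"na\xC3\xAFve"` (the word "naïve" in UTF-8), `size()` reports 6 code units while `count_code_points` reports 5. Both basic_string and a pointer/size reference see the same 6 code units; anything codepoint-aware has to be layered on top.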

There's a lot of C code out there that takes/gives UTF-8 strings by `const
char*`+size. Your suggestion would effectively mean not being able to talk
to them, thus ironically making the code *less* compatible with UTF-8.

> I think we need to ask two questions:
>
>    - If this is only intended to make it easier to work with std::string
>    clones, we should take a look at why std::string clones are being used at
>    all. Would the need for a class like this be reduced by adding more
>    functionality to std::string?
>

"std::string clones" exist because people have written them. It's not
about how much functionality or lack thereof is in std::string. They exist.
Today. And even if std::string became some kind of magical any-string class
that could exactly mirror the behavior of any other string class in
existence (somehow), it would not suddenly cause all of those people to
stop using their home-grown versions.

Also, there are times when you don't want a general-purpose string class.
For example, I sometimes use a fixed-length 28-byte+hash string, for
resource identifiers. I find this string very useful, and comparisons are
exceptionally fast (since they're identifiers, they compare by taking 32- or
64-bit blocks of integer data and comparing them; I don't need
lexicographical ordering, just *some* ordering).

I *certainly* would not want to add cruft into std::basic_string to support
that *very specific* use case. But I would also want to be able to shove
these at some function via a basic_string_ref.
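
As a rough illustration of that kind of type, consider the sketch below. Everything in it is an assumption made for the example: the layout, the names `resource_id`, `make_id`, `id_less`, and `id_length`, the choice of FNV-1a as the hash, and the use of 32-bit blocks.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layout: 28 bytes of zero-padded text plus a 32-bit hash.
struct resource_id {
    char text[28];
    std::uint32_t hash;
};

// Builds an id from a C string, truncating to 28 bytes.
// The hash is FNV-1a, an arbitrary choice for this sketch.
inline resource_id make_id(const char* name) {
    resource_id id;
    std::memset(id.text, 0, sizeof id.text);
    std::size_t len = std::strlen(name);
    if (len > sizeof id.text) len = sizeof id.text;
    std::memcpy(id.text, name, len);
    id.hash = 2166136261u;
    for (std::size_t i = 0; i < len; ++i) {
        id.hash ^= static_cast<unsigned char>(id.text[i]);
        id.hash *= 16777619u;
    }
    return id;
}

// *Some* strict ordering, not a lexicographical one: compares the hash first,
// then the text as seven 32-bit integer blocks.
inline bool id_less(const resource_id& a, const resource_id& b) {
    if (a.hash != b.hash) return a.hash < b.hash;
    for (std::size_t i = 0; i < sizeof a.text; i += 4) {
        std::uint32_t x, y;
        std::memcpy(&x, a.text + i, 4);
        std::memcpy(&y, b.text + i, 4);
        if (x != y) return x < y;
    }
    return false;
}

// Number of meaningful characters, i.e. up to the first padding zero.
inline std::size_t id_length(const resource_id& id) {
    std::size_t n = 0;
    while (n < sizeof id.text && id.text[n] != '\0') ++n;
    return n;
}
```

Because the text is stored contiguously, `id.text` together with `id_length(id)` is exactly the pointer/size pair that a string reference needs, with no change to basic_string required.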

There will always be some special case need for strings that someone will
have. Trying to make basic_string service all of their needs is ridiculous
and doomed to failure. Uberstrings with template policies or whatever that
try to help everyone end up being so complex that they help nobody.

basic_string is very simple: a contiguously allocated array of characters.
basic_string_ref is equally simple: a constant reference to a contiguously
allocated array of characters.

>
>    - What would it take to make a truly universal string ref class?
>

Define "universal". If "universal" means "can support non-contiguous
strings", then it just becomes a range, which is not what we want. We want
basic_string_ref to *discriminate*; it's not just a range of char-like
objects. It is specifically an *array* of char-like objects.

The most basic use case for `basic_string_ref` is to take one as an
argument and pass it to some C interface that takes a `const char *`
(possibly with a size). A generic range can't handle that, because a
generic range is not required to be a *contiguous* range.
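
A sketch of that basic use case, assuming a hypothetical pointer/size struct named `contiguous_ref` (not the proposed class) and using the C library's `memchr` as the stand-in C interface:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a reference to a contiguous array of char.
struct contiguous_ref {
    const char* data;
    std::size_t size;
};

// Hypothetical helper: refers to the characters owned by a std::string.
inline contiguous_ref ref_to(const std::string& s) {
    contiguous_ref r = { s.data(), s.size() };
    return r;
}

// Hands the pointer and size straight to a C function. This is only valid
// because the characters are contiguous in memory.
// Returns the offset of the first occurrence of `c`, or -1 if absent.
inline std::ptrdiff_t find_byte(contiguous_ref s, char c) {
    if (s.size == 0) return -1;
    const void* hit = std::memchr(s.data, static_cast<unsigned char>(c), s.size);
    if (!hit) return -1;
    return static_cast<const char*>(hit) - s.data;
}
```

A range over, say, a `std::list<char>` has no single pointer that could be given to `memchr`, so it would have to be copied into a buffer first.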


.

Author: rick@longbowgames.com
Date: Tue, 22 Jan 2013 01:59:55 -0800 (PST)


On Tuesday, January 22, 2013 4:17:13 AM UTC-5, R. Martinho Fernandes wrote:
>
> Without these limitations, it becomes just simpler to use a range (or
> a pair of iterators, if ranges don't get on). It is my impression that
> these are not really limitations, but are actually *the* use case for
> string_ref. It is intended to provide a zero-cost common interface to
> various common ways of passing strings in the wild: things like
> std::string&, char*+size, std::vector<char>::iterator or
> std::string::iterator pairs, other string implementations, etc.
> string_ref is intended merely as a special case of array_ref for
> characters. Note that the word "character" has the C++ meaning of
> "character".
>

Perhaps the biggest problem here is that the proposal is claiming to be
something it's not. I'm getting the impression that the intended use of
this class isn't for users who don't care about the exact type of the
string, as implied in the overview, so much as it's for users looking to
ease C-interop.

> Each memory location must be a discrete character.
>
> Since that is what happens in the use cases I don't see a problem with
> it. I doubt many people put more than one character in the same memory
> location or less than one character per memory location.
>

I was referring here to the way UTF-8 works, where one character may span
multiple memory locations.


> std::string accepts UTF-8 just fine. In fact, C++11 has redefined a
> byte (and therefore char) to be "large enough to contain [...] the
> eight-bit code units of the Unicode UTF-8 encoding form", so I see no
> problem there.
>

It accepts a UTF-8 string, but it will not operate upon one correctly.
Dereferencing, iterating, subscripting, and size() are all ill-defined if
the string contained is UTF-8. Ideally, if I truly don't care what the
string type is, I should be able to compare a wstring and some UTF-8 string
class without worrying about the string type.

> I don't know, but I am not sure if the goal is worthy. If you want to
> provide a truly universal interface, C++ already has templates.
>

The proposal specifically addresses that:

> The callee is rewritten as a template and its implementation is moved to a
> header file. This can increase flexibility if the author takes the time to
> code to a weaker iterator concept, but it can also increase compile time
> and code size, and can even introduce bugs if the author misses an
> assumption that the argument's contents are contiguous.


Those seem to be valid concerns. Notice that it even mentions bugs caused by
assuming that the contents are contiguous, while the proposal itself makes
the very same assumption.


.

Author: Martinho Fernandes <martinho.fernandes@gmail.com>
Date: Tue, 22 Jan 2013 11:42:01 +0100

On Tue, Jan 22, 2013 at 10:59 AM,  <rick@longbowgames.com> wrote:
> Perhaps the biggest problem here is that the proposal is claiming to be
> something it's not. I'm getting the impression that the intended use of this
> class isn't for users who don't care about the exact type of the string, as
> implied in the overview, so much as it's for users looking to ease
> C-interop.

I believe that to be one of the driving motivations, yes. I think most
people take "string" to mean "array of characters", but maybe that
should be clarified anyway.

>> > Each memory location must be a discrete character.
>>
>> Since that is what happens in the use cases I don't see a problem with
>> it. I doubt many people put more than one character in the same memory
>> location or less than one character per memory location.
>
>
> I was referring here to the way UTF-8 works, where one character may span
> multiple memory locations.

"character" is a much overloaded term. It can be used to mean many
things, and that is why I will refrain from using it at all from now
on.

>> std::string accepts UTF-8 just fine. In fact, C++11 has redefined a
>> byte (and therefore char) to be "large enough to contain [...] the
>> eight-bit code units of the Unicode UTF-8 encoding form", so I see no
>> problem there.
>
>
> It accepts a UTF-8 string, but it will not operate upon one correctly.
> Dereferencing, iterating, subscripting, and size() are all ill-defined if
> the string contained is UTF-8.

I don't see how they are ill-defined. People just need to grow away
from the idea that std::string is more than an array of code units.
If you want Unicode support, ask for proper Unicode support. That is
far beyond the motivations of string_ref *and not incompatible with
it*, since a decent interface for Unicode algorithms must work fine
with any of std::string, string_ref, std::u16string, and so on.

I won't even go into detail about why I think size() based on
something other than code units is less useful than the code unit
counterpart, and why random-access isn't that great a thing. That was
already debated in the Unicode proposal thread.

> Ideally, if I truly don't care what the
> string type is, I should be able to compare a wstring and some UTF-8 string
> class without worrying about the string type.

And what kind of comparison would that be? The Unicode standard says
that canonically equivalent strings should be treated exactly the same
way. Are you saying operator== for string refs should normalize both
sides in order to make that comparison? Or that we should only have
*some* rushed Unicode support in *some* places?
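
A small example of the problem, as a sketch: the helper names are hypothetical, and the two strings are the standard precomposed and decomposed UTF-8 encodings of "é".

```cpp
#include <string>

// U+00E9 LATIN SMALL LETTER E WITH ACUTE, encoded as UTF-8 (2 code units).
inline std::string precomposed_e_acute() { return "\xC3\xA9"; }

// U+0065 followed by U+0301 COMBINING ACUTE ACCENT, encoded as UTF-8 (3 code units).
inline std::string decomposed_e_acute() { return "e\xCC\x81"; }

// Code-unit comparison, which is what std::string's operator== performs.
inline bool same_code_units(const std::string& a, const std::string& b) {
    return a == b;
}
```

The two strings are canonically equivalent, yet `same_code_units` returns false for them. Treating them as equal requires normalizing both sides first, which is a Unicode-library job rather than something a code-unit comparison can provide.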

There was already a thread about proper Unicode support in this
mailing list, and AFAIK it interoperates nicely with string_ref. I
recommend redirecting any related efforts there.

Kind regards,

Martinho


.

Author: Nicol Bolas <jmckesson@gmail.com>
Date: Tue, 22 Jan 2013 02:47:35 -0800 (PST)




On Tuesday, January 22, 2013 1:59:55 AM UTC-8, ri...@longbowgames.com wrote:
>
> On Tuesday, January 22, 2013 4:17:13 AM UTC-5, R. Martinho Fernandes wrote:
>>
>> Without these limitations, it becomes just simpler to use a range (or
>> a pair of iterators, if ranges don't get on). It is my impression that
>> these are not really limitations, but are actually *the* use case for
>> string_ref. It is intended to provide a zero-cost common interface to
>> various common ways of passing strings in the wild: things like
>> std::string&, char*+size, std::vector<char>::iterator or
>> std::string::iterator pairs, other string implementations, etc.
>> string_ref is intended merely as a special case of array_ref for
>> characters. Note that the word "character" has the C++ meaning of
>> "character".
>>
>
> Perhaps the biggest problem here is that the proposal is claiming to be
> something it's not. I'm getting the impression that the intended use of
> this class isn't for users who don't care about the exact type of the
> string, as implied in the overview, so much as it's for users looking to
> ease C-interop.
>

The vast majority of strings are contiguous and made up of characters, so I
don't see the problem here. It serves both the needs of C interop and C++
interop. Besides the SGI STL rope class, how many string classes do not use
contiguous arrays of characters of some form?

Even UTF-8 strings conform to this, if we assume "character" means
"code-unit". Indeed, Unicode does not actually define a "character", and
with good reason. It defines "code unit", "codepoint", "grapheme cluster"
and similar things, but not "character". Most APIs that take UTF-8 strings
take contiguous sequences of UTF-8 code units. Because that's what a UTF-8
string is.

Again, if you want to treat a UTF-8 string as a sequence of valid Unicode
codepoints, you need to use an API designed for Unicode strings. That
means both the callee and the caller need to be using tools appropriate for
that task, and that task is outside the scope of basic_string_ref.
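
To make the code-unit point concrete, here is a rough sketch. The helper
`count_code_points` is made up for this example (it is not part of the
proposal or of any library), and it assumes its input is already valid UTF-8.
The container only ever reports code units; anything codepoint-level has to be
layered on top by a function that knows about the encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode codepoints in a buffer of UTF-8 code
// units. Assumes the input is valid UTF-8; it does no validation.
std::size_t count_code_points(const char* data, std::size_t len) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < len; ++i) {
        // Continuation bytes have the bit pattern 10xxxxxx; every other
        // byte starts a new codepoint.
        const unsigned char unit = static_cast<unsigned char>(data[i]);
        if ((unit & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// Usage sketch: the string stores code units, not codepoints.
void demo_code_units() {
    // "Gr", U+00FC (C3 BC), U+00DF (C3 9F), "e": 5 codepoints, 7 code units.
    const std::string greeting = "Gr\xC3\xBC\xC3\x9F" "e";
    assert(greeting.size() == 7);
    assert(count_code_points(greeting.data(), greeting.size()) == 5);
}
```

The storage interface and the Unicode-aware interface are separate layers, and
a pointer/length view only needs to serve the first one.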

>> std::string accepts UTF-8 just fine. In fact, C++11 has redefined a
>> byte (and therefore char) to be "large enough to contain [...] the
>> eight-bit code units of the Unicode UTF-8 encoding form", so I see no
>> problem there.
>>
>
> It accepts a UTF-8 string, but it will not operate upon one correctly.
> Dereferencing, iterating, subscripting, and size() are all ill-defined if
> the string contained is UTF-8. Ideally, if I truly don't care what the
> string type is, I should be able to compare a wstring and some UTF-8 string
> class without worrying about the string type.
>

No, you should not. What you're asking for makes absolutely no sense.

basic_string_ref is not about being *any*_string_ref. basic_string_ref is
about divorcing the *consumer* of a string from the implementation details
of the storage of that string. You still need to care about what it
contains, otherwise you could *never* interface with it. There's nothing
that can wipe away the differences between a wchar_t and a char, and
expecting that is folly.

The use cases for this class are really quite simple.

1: I have a function that needs to do something with a contiguous sequence
of characters of known type. I neither know nor care how the caller
provides storage for those characters; I just want a contiguous sequence of
characters of known type.

2: I have a function that needs to return to you a contiguous sequence of
characters of known type. You neither know nor care how I am storing these
characters long-term; all you want is the contiguous sequence of characters
of known type.

This covers about 50-80% of the uses of strings in APIs. Basically, anytime
you would use a `const StringType &` in a function's interface, you should
be using a `basic_string_ref` instead.
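
A minimal sketch of both use cases follows. It assumes a cut-down stand-in
for the proposed class (just a pointer and a length); the real proposal has a
much richer interface, and `count_spaces` and `Config` are names invented for
the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for the proposed class: a non-owning pointer/length
// pair over contiguous chars. Not the proposal's actual interface.
class string_ref {
public:
    string_ref(const char* data, std::size_t size) : data_(data), size_(size) {
        assert(data != nullptr || size == 0);
    }
    // Implicit on purpose, so a std::string can be passed directly.
    string_ref(const std::string& s) : data_(s.data()), size_(s.size()) {}

    const char* begin() const { return data_; }
    const char* end() const { return data_ + size_; }
    std::size_t size() const { return size_; }

private:
    const char* data_;
    std::size_t size_;
};

// Use case 1: the callee consumes characters without caring who owns them.
std::size_t count_spaces(string_ref text) {
    std::size_t n = 0;
    for (char c : text) {
        if (c == ' ') ++n;
    }
    return n;
}

// Use case 2: the callee hands out characters without exposing its storage.
class Config {
public:
    explicit Config(std::vector<char> raw) : raw_(std::move(raw)) {}
    string_ref name() const { return string_ref(raw_.data(), raw_.size()); }

private:
    std::vector<char> raw_;  // could become a std::string later; callers won't notice
};
```

With this in place, `count_spaces(some_std_string)` and
`count_spaces(string_ref(buffer, length))` both work without the callee
changing, and `Config` can swap its storage without touching its interface.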

Most string classes store their strings as contiguous sequences of
characters of known type. Most interfaces that take or return strings deal
in specific contiguous sequences of characters of known type. And thus,
most string classes can relatively easily be used with
basic_string_ref-based interfaces.

>> I don't know, but I am not sure if the goal is worthy. If you want to
>> provide a truly universal interface, C++ already has templates.
>>
>
> The proposal specifically addresses that:
>
>> The callee is rewritten as a template and its implementation is moved to
>> a header file. This can increase flexibility if the author takes the time
>> to code to a weaker iterator concept, but it can also increase compile time
>> and code size, and can even introduce bugs if the author misses an
>> assumption that the argument's contents are contiguous.
>
>
> Those seem to be valid concerns. Notice that it even mentions problems
> with assuming that the contents are contiguous, while the proposal contains
> the very same issue.
>

The point in bringing that up in the proposal is to show that a truly
"universal" interface would only be viable via templates, which would bring
up other problems. In short, basic_string_ref is a reasonable *compromise*
between "universal" and "good code".
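
For comparison, the template alternative might look roughly like this. It is
only a sketch: `count_spaces` and the demo function are invented names, and
the point is simply that the callee must be written against a weak iterator
concept and be visible to every caller.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical template callee: accepts any input-iterator range of char,
// contiguous or not. The definition has to be visible to callers, which in
// practice means it lives in a header.
template <typename InputIt>
std::size_t count_spaces(InputIt first, InputIt last) {
    std::size_t n = 0;
    // Only ++ and * are used, so nothing here assumes contiguous storage.
    for (; first != last; ++first) {
        if (*first == ' ') ++n;
    }
    return n;
}

// Usage sketch: the same callee works on contiguous and non-contiguous data.
void demo_template_callee() {
    const std::list<char> pieces = {'a', ' ', 'b'};
    const std::string text = "a b c";
    assert(count_spaces(pieces.begin(), pieces.end()) == 1);
    assert(count_spaces(text.begin(), text.end()) == 2);
}
```

Each distinct iterator type instantiates its own copy of the function, which
is the compile-time and code-size cost the proposal mentions.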

------=_Part_334_24712142.1358851655725--

.

Author: Nicol Bolas <jmckesson@gmail.com>
Date: Tue, 22 Jan 2013 02:52:45 -0800 (PST) Raw View

------=_Part_695_32599562.1358851965561
Content-Type: text/plain; charset=ISO-8859-1

Or, to put it another way, think of it like this.

If I have a function that takes `const std::string &`, the user of my
function *must* be using std::string to manage the storage for their data.
If they are not internally using std::string, then they *must copy* their
string into std::string. All just so that they can call my function.

Why does my function need to tell you how to store your string data? My
function isn't going to modify that storage. It can't add or remove
elements from it, since you passed it as `const&` (and it's rude to remove
the `const`). There is absolutely no reason for my function to force the
user into using std::string if they don't want to.

If I have a function that takes `std::string_ref`, the user of my function
only needs to provide a conversion from whatever they're using to store
their string data into the `std::string_ref` format. This conversion will
in virtually all cases require *no* copying of the string.

One of these is fast, the other is slow. This is what basic_string_ref is
for.
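
A small sketch of the difference, assuming a stand-in `string_ref` that is
nothing more than a pointer and a length (the proposed class has far more to
it). All other names here are invented for the example. Each callee reports
where the characters it sees actually live, so the copy becomes visible.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for the proposed class (assumption: just a pointer and a length).
struct string_ref {
    const char* data;
    std::size_t size;
};

// Each callee reports the address of the characters it was given.
const char* seen_by_string(const std::string& s) { return s.data(); }
const char* seen_by_ref(string_ref s) { return s.data; }

// A caller that does not use std::string for its storage.
struct name_buffer {
    char chars[16];
    std::size_t length;
};

// True if calling through const std::string& had to copy the characters.
bool string_call_copies(const name_buffer& b) {
    assert(b.length <= sizeof(b.chars));
    // A temporary std::string is built here and the bytes are copied into it,
    // so the callee sees a different address than the caller's buffer.
    return seen_by_string(std::string(b.chars, b.length)) != b.chars;
}

// True if calling through string_ref had to copy the characters.
bool ref_call_copies(const name_buffer& b) {
    assert(b.length <= sizeof(b.chars));
    const string_ref view = {b.chars, b.length};  // two words, no copy
    return seen_by_ref(view) != b.chars;
}
```

For any `name_buffer`, `string_call_copies` returns true and
`ref_call_copies` returns false.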

------=_Part_695_32599562.1358851965561--

.

Author: rick@longbowgames.com
Date: Tue, 22 Jan 2013 03:01:19 -0800 (PST) Raw View

------=_Part_670_6395153.1358852479842
Content-Type: text/plain; charset=ISO-8859-1

Thinking about it some more, I'm curious if some existing basic_string
implementations could achieve what this proposal is looking for with one
function:

// If the implementation permits it, uses str to create a basic_string without
// allocating or freeing any memory. str must remain valid for as long as the
// returned string is in use.
template <typename charT>
const basic_string<charT>& make_string_ref(
    const charT* str, typename basic_string<charT>::size_type len);

Skimming over the Visual Studio implementation, for example, it seems to
have three variables in its basic_string implementation: the string size,
the string capacity, and a pointer. If the string capacity is more than the
size of a pointer, it treats that pointer as you would expect. Otherwise,
it treats that pointer as a character array instead of allocating memory
for it. In that case, obviously it doesn't need to deallocate anything.

If you had some private friend function that set the pointer to str, set
the size to len, and set the capacity to zero, this could achieve the
performance you're looking for without the need for a whole new class that
mirrors std::basic_string. It also works with existing code that already
accepts const string&, and you don't have to teach people to change their
parameter type.

------=_Part_670_6395153.1358852479842--

.

Author: rick@longbowgames.com
Date: Tue, 22 Jan 2013 03:19:17 -0800 (PST) Raw View

------=_Part_756_18426191.1358853557579
Content-Type: text/plain; charset=ISO-8859-1

On Tuesday, January 22, 2013 6:01:19 AM UTC-5, ri...@longbowgames.com wrote:

> If you had some private friend function that set the pointer to str, set
> the size to len, and set the capacity to zero, this could achieve the
> performance you're looking for without the need for a whole new class that
> mirrors std::basic_string. It also works with existing code that already
> accepts const string&, and you don't have to teach people to change their
> parameter type.
>

Doh, of course it wouldn't be that simple since it wouldn't deref the
pointer properly. Instead, you would need some third state, perhaps set by
setting the capacity to numeric_limits<size_type>::max() with a smattering
of ifs to make sure you're not deleting a constant string. Still, I'd be
curious to see if such an implementation would have negligible performance
penalties to people not using the feature.
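
Roughly what I have in mind, as a toy sketch only. `toy_string` is invented
for illustration and is not how Visual Studio or any other real basic_string
is laid out; it just shows the "third state" marked by a sentinel capacity
and the extra branch it costs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <limits>

// Toy class, not any real basic_string implementation.
class toy_string {
public:
    // Owning state: allocates and copies, like a normal string.
    toy_string(const char* str, std::size_t len)
        : data_(copy_chars(str, len)), size_(len), capacity_(len) {}

    // Borrowed state: the hypothetical make_string_ref path. No allocation;
    // str must outlive the returned object.
    static toy_string borrow(const char* str, std::size_t len) {
        toy_string s;
        s.data_ = str;
        s.size_ = len;
        s.capacity_ = borrowed_mark();
        return s;
    }

    toy_string(toy_string&& other) noexcept
        : data_(other.data_), size_(other.size_), capacity_(other.capacity_) {
        other.data_ = nullptr;
        other.size_ = 0;
        other.capacity_ = 0;
    }
    toy_string(const toy_string&) = delete;
    toy_string& operator=(const toy_string&) = delete;

    ~toy_string() {
        // The extra branch every user pays for, borrowed or not.
        if (!is_borrowed()) delete[] data_;
    }

    bool is_borrowed() const { return capacity_ == borrowed_mark(); }
    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    toy_string() : data_(nullptr), size_(0), capacity_(0) {}

    // Sentinel capacity meaning "this object does not own its characters".
    static std::size_t borrowed_mark() {
        return std::numeric_limits<std::size_t>::max();
    }

    static const char* copy_chars(const char* str, std::size_t len) {
        assert(str != nullptr);
        char* buf = new char[len];
        std::memcpy(buf, str, len);
        return buf;
    }

    const char* data_;
    std::size_t size_;
    std::size_t capacity_;
};
```

Any mutating member (append, reserve, and so on) would need the same
`is_borrowed()` check, which is where the "smattering of ifs" comes from.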

------=_Part_756_18426191.1358853557579--

.

Author: Nicol Bolas <jmckesson@gmail.com>
Date: Tue, 22 Jan 2013 03:34:26 -0800 (PST) Raw View

------=_Part_1051_14010848.1358854466695
Content-Type: text/plain; charset=ISO-8859-1

On Tuesday, January 22, 2013 3:01:19 AM UTC-8, ri...@longbowgames.com wrote:
>
> Thinking about it some more, I'm curious if some existing basic_string
> implementations could achieve what this proposal is looking for with one
> function:
>

That's over-complicating a class to no constructive purpose. You're
effectively overloading the class to *sometimes* contain data and sometimes
reference it.

------=_Part_1051_14010848.1358854466695--

.

Author: Olaf van der Spek <olafvdspek@gmail.com>
Date: Tue, 22 Jan 2013 03:47:59 -0800 (PST) Raw View

------=_Part_9_4943927.1358855279558
Content-Type: text/plain; charset=ISO-8859-1

On Tuesday, January 22, 2013 12:01:19 PM UTC+1, ri...@longbowgames.com
wrote:

> Skimming over the Visual Studio implementation, for example,
>

Isn't sizeof(std::string) == 32 or so in that implementation?

string_ref is designed to be as simple as possible. Something like this
should just work:

void f(string_ref);

std::string s = "Olaf";
f(s);

------=_Part_9_4943927.1358855279558--

.

Author: rick@longbowgames.com
Date: Tue, 22 Jan 2013 04:13:03 -0800 (PST) Raw View

------=_Part_248_767934.1358856783190
Content-Type: text/plain; charset=ISO-8859-1

I hope this doesn't get posted three times. Google Groups is being fussy.

On Tuesday, January 22, 2013 6:47:59 AM UTC-5, Olaf van der Spek wrote:

> Isn't sizeof(std::string) == 32 or so in that implementation?
>

24 in the latest version. Only one byte bigger than the ideal
representation of string_ref. But that should be compared against the
benefits of make_string_ref:

* The user can use this optimization with *every* function that accepts a
const string&, even if the author didn't realize it would be used that way,
or if the code was originally written in 1998.
* Simplifies the interface to a single function, rather than a mirrored
class which is, in every other way, identical to a const string&.
* No need to teach students why they should be taking strings by
string_ref, when every other class is taken by const&.


> string_ref is designed to be as simple as possible. Something like this
> should just work:
>
> void f(string_ref);
>
> std::string s = "Olaf";
> f(s);
>

Of course that code would be identical in a make_string_ref world. A better
example would be the case where you want to use C strings.

Here's what it would look like with string_ref:

void f(string_ref);

const char s[] = "Hello World";
// sizeof(s) counts the trailing '\0', so subtract 1 for the length.
f(string_ref(s, sizeof(s) - 1));

And here's what it would look like with make_string_ref:

void f(const string& s);

const char s[] = "Hello World";
// sizeof(s) counts the trailing '\0', so subtract 1 for the length.
f(make_string_ref(s, sizeof(s) - 1));

I would argue that the second one is simpler to use, because you take your
string by const&, just like you always have, and just like you do for any
other class.

On Tuesday, January 22, 2013 6:34:26 AM UTC-5, Nicol Bolas wrote:

> That's over-complicating a class to no constructive purpose. You're
> effectively overloading the class to *sometimes* contain data and
> sometimes reference it.
>

If you're taking a string by const string&, why would you care whether or not
it contains the data it's referencing? Do you care that current
implementations hop back and forth between the stack and the heap?

Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/?hl=3Den">http://groups.google.com/a/isocpp.org/group/std-pro=
posals/?hl=3Den</a>.<br />
&nbsp;<br />
&nbsp;<br />

------=_Part_248_767934.1358856783190--

.

Author: Olaf van der Spek <olafvdspek@gmail.com>
Date: Tue, 22 Jan 2013 13:18:27 +0100 Raw View

On Tue, Jan 22, 2013 at 1:13 PM,  <rick@longbowgames.com> wrote:
> On Tuesday, January 22, 2013 6:47:59 AM UTC-5, Olaf van der Spek wrote:
>>
>> Isn't sizeof(std::string) == 32 or so in that implementation?
>
>
> 24 in the latest version. Only one byte bigger than the ideal representation
> of string_ref. But that should be compared against the benefits of
> make_string_ref:

Isn't the simplest implementation just 3 pointers (12 bytes on x86)?
Still, 24 bytes is a lot more than 8.

>> string_ref is designed to be as simple as possible. Something like this
>> should just work:
>>
>> void f(string_ref);
>>
>> std::string s = "Olaf";
>> f(s);
>
>
> Of course that code would be identical in a make_string_ref world. A better
> example would be the case where you want to use C strings.

Right, I should've used vector<char> or so.

> Here's what it would look like with string_ref:
>
> void f(string_ref);
>
> const char s[] = "Hello World";
> f(string_ref(s, sizeof(s)));

Actually, it should be just f(s);

> And here's what it would look like with make_string_ref:
>
> void f(const string& s);
>
> const char s[] = "Hello World";
> f(make_string_ref(s, sizeof(s)));

That's rather ugly and thus doesn't meet the requirements for string_ref.
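One detail in the quoted snippets is worth spelling out: sizeof on a char array counts the terminating NUL, so passing sizeof(s) as the length includes it. A minimal sketch, assuming std::string_view (C++17) as a stand-in for the proposed string_ref; the names f_length and literal_length are made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical consumer: reports how many characters it was handed.
// std::string_view stands in for the proposed string_ref.
inline std::size_t f_length(std::string_view s) { return s.size(); }

// Length of a string literal stored in an array, excluding the NUL.
template <std::size_t N>
constexpr std::size_t literal_length(const char (&)[N]) { return N - 1; }
```

Given const char s[] = "Hello World";, f_length(s) sees 11 characters, while f_length(std::string_view(s, sizeof(s))) sees 12 because of the trailing NUL.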


--
Olaf




.

Author: rick@longbowgames.com
Date: Tue, 22 Jan 2013 05:19:23 -0800 (PST) Raw View

------=_Part_2246_9934083.1358860763278
Content-Type: text/plain; charset=ISO-8859-1

On Tuesday, January 22, 2013 7:18:27 AM UTC-5, Olaf van der Spek wrote:

> Isn't the simplest implementation just 3 pointers (12 bytes on x86)?
> Still, 24 bytes is a lot more than 8.
>

 Sorry, my brain was in 64-bit mode. It is indeed just three pointers.
(Actually, they pad it out to 32 bytes on x86 to take advantage of the
short string optimization, but there's no reason it couldn't be 12. Not
that x86 Windows will be very relevant in 2014/2017.)

> > Of course that code would be identical in a make_string_ref world. A
> > better example would be the case where you want to use C strings.
>
> Right, I should've used vector<char> or so.
>

Wouldn't that be fairly identical as well? In both cases you would need
&v[0] and v.size(), or else you're paying the price of a call to strlen.
Actually, since make_string_ref is a free function, you could implement
your own overload which accepts a const vector<char>& instead of requiring
a pair.

The "ugly" complaint is certainly subjective. The only thing you're losing
is implicit conversion from a char*, and in practice C-interop usually
involves an address/size pair. The implicit conversion is mostly intended
for string literals.

Of course, you can continue to pass string literals into a const string&,
and there's a healthy chance that the optimizer is *already* optimizing it
away.
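The extension point described above can be sketched as an overload set. This is only an illustration: standard std::string has no non-owning mode, so the sketch returns std::string_view (C++17) instead of a const string, and the names make_ref and count_spaces are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical factory for a pointer/size pair (the basic case).
inline std::string_view make_ref(const char* p, std::size_t n) {
    return std::string_view(p, n);
}

// User-supplied overload: callers pass the vector itself instead of
// spelling out the pointer and the size.
inline std::string_view make_ref(const std::vector<char>& v) {
    return std::string_view(v.data(), v.size());
}

// Example consumer that only needs read access to the characters.
inline std::size_t count_spaces(std::string_view s) {
    std::size_t n = 0;
    for (char c : s) {
        if (c == ' ') ++n;
    }
    return n;
}
```

A call site then reads count_spaces(make_ref(v)) for a vector<char> v, with no pointer arithmetic visible to the caller.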

------=_Part_2246_9934083.1358860763278--

.

Author: Olaf van der Spek <olafvdspek@gmail.com>
Date: Tue, 22 Jan 2013 14:55:15 +0100 Raw View

On Tue, Jan 22, 2013 at 2:19 PM,  <rick@longbowgames.com> wrote:
> On Tuesday, January 22, 2013 7:18:27 AM UTC-5, Olaf van der Spek wrote:
>>
>> Isn't the simplest implementation just 3 pointers (12 bytes on x86)?
>> Still, 24 bytes is a lot more than 8.
>
>
>  Sorry, my brain was in 64-bit mode. It is indeed just three pointers.
> (Actually, they pad it out to 32 bytes on x86 to take advantage of the short
> string optimization, but there's no reason it couldn't be 12.

Hmm, no short string optimization on x64?

> Not that x86
> Windows will be very relevant in 2014/2017.)

True, though 32-bit pointers / address space is theoretically still
useful on x64.

>> > Of course that code would be identical in a make_string_ref world. A
>> > better
>> > example would be the case where you want to use C strings.
>>
>> Right, I should've used vector<char> or so.
>
>
> Wouldn't that be fairly identical as well? In both cases you would need
> &v[0] and v.size(), or else you're paying the price of a call to strlen.
> Actually, since make_string_ref is a free function, you could implement your
> own overload which accepts a const vector<char>& instead of requiring a
> pair.
>
> The "ugly" complaint is certainly subjective. The only thing you're losing

Is it?

f(s)
vs
f(make_string_ref(s))

That's not a subjective difference. It requires changes to calling code too.

> is implicit conversion from a char*, and in practice C-interop usually
> involves an address/size pair. The implicit conversion is mostly intended for
> string literals.
>
> Of course, you can continue to pass string literals into a const string&,
> and there's a healthy chance that the optimizer is *already* optimizing it
> away.

You'd replace the const string& by string_ref.

Olaf




.

Author: rick@longbowgames.com
Date: Tue, 22 Jan 2013 09:48:33 -0800 (PST) Raw View

------=_Part_279_2256460.1358876913237
Content-Type: text/plain; charset=ISO-8859-1

On Tuesday, January 22, 2013 8:55:15 AM UTC-5, Olaf van der Spek wrote:
>
> Hmm, no short string optimization on x64?
>

The short string optimization is still there, it just doesn't need the
padding in order to be worthwhile, since 64-bit pointers are bigger, but
chars remain the same size.

> f(s)
> vs
> f(make_string_ref(s))
>
> That's not a subjective difference. It requires changes to calling code
> too.
>

Sure, that's not subjective when you only give one example, but with
make_string_ref, the user is free to make an overload for, say,
vector<char>. Then you have to compare these:

f(string_ref(&v[0], v.size()))
vs
f(make_string_ref(v))

Some examples are prettier with string_ref, some are prettier with
make_string_ref, but most of them are pretty similar.

As far as having to change the calling code, well, you either have to
change the calling code or the called code. There are certainly arguments
for both cases.

Here's a pros/cons breakdown for make_string_ref:

Pros:
* When passing a std::string, accepting a const string& only involves
copying one word for the reference, rather than copying two words for
string_ref's member variables.
* The optimization is useful even with existing code that currently uses
const string&, or if the original author did not foresee the need for the
optimization.
* It's trivial to allow the implementation to choose between owning and
non-owning semantics, making a null-terminated c_str() possible, even when
the input is not null-terminated.
* Since it's a free function, it's extendable to more than just C strings
and std::strings.
* Does not make C++ harder to teach. With string_ref, you have to teach
students that, unlike other large objects, strings should *not* be passed
by const&.

Cons:
* When passing a C string, accepting a const string& involves copying three
words (pointer, size, parameter reference) and assigning one default
(capacity), rather than copying two words for string_ref's member variables.
* Makes no guarantees that the optimization is actually being used.
* Since it does not implement c_str(), the implementation is free to
optimize for the case when the string is *not* null terminated.
* Although it can be implicitly created from a C string (it's a
std::string, after all) it will likely not be using the optimization in
this case.
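The "one word versus two words" comparison in the lists above can be checked directly. A rough sketch, assuming a typical implementation where a reference parameter is passed as a pointer; simple_ref is a hypothetical stand-in for string_ref, not the proposal's actual layout:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for string_ref: a pointer plus a length.
struct simple_ref {
    const char* data;
    std::size_t size;
};

// Bytes copied when passing by reference (typically one pointer).
constexpr std::size_t by_reference_bytes() { return sizeof(const std::string*); }

// Bytes copied when passing the pointer/size pair by value.
constexpr std::size_t by_value_bytes() { return sizeof(simple_ref); }
```

On common 64-bit platforms this comes out as 8 bytes against 16, but neither figure is guaranteed by the standard, and sizeof(std::string) itself varies between implementations.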


------=_Part_279_2256460.1358876913237--

.

Author: rick@longbowgames.com
Date: Tue, 22 Jan 2013 09:51:30 -0800 (PST) Raw View

------=_Part_431_5673994.1358877090976
Content-Type: text/plain; charset=ISO-8859-1

On Tuesday, January 22, 2013 12:48:33 PM UTC-5, ri...@longbowgames.com
wrote:

> * Since it does not implement c_str(), the implementation is free to
> optimize for the case when the string is *not* null terminated.
>

This con was supposed to be a pro for string_ref. It's poorly worded.

------=_Part_431_5673994.1358877090976--

.

Author: Olaf van der Spek <olafvdspek@gmail.com>
Date: Tue, 22 Jan 2013 18:57:53 +0100 Raw View

On Tue, Jan 22, 2013 at 6:48 PM,  <rick@longbowgames.com> wrote:
> On Tuesday, January 22, 2013 8:55:15 AM UTC-5, Olaf van der Spek wrote:
>>
>> Hmm, no short string optimization on x64?
>
>
> The short string optimization is still there, it just doesn't need the
> padding in order to be worthwhile, since 64-bit pointers are bigger, but
> chars remain the same size.

True, but why isn't it padded to 32 bytes too?

>> f(s)
>> vs
>> f(make_string_ref(s))
>>
>> That's not a subjective difference. It requires changes to calling code
>> too.
>
>
> Sure, that's not subjective when you only give one example, but with
> make_string_ref, the user is free to make an overload for, say,
> vector<char>. Then you have to compare these:
>
> f(string_ref(&v[0], v.size()))
> vs
> f(make_string_ref(v))

Nah, that should be just f(v) too (long-term). BTW, you probably
wanted to use v.data(), &v[0] is undefined when v is empty.
std::string is the common case though.
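The v.data() point can be illustrated with a small helper. A sketch only, using std::string_view (C++17) in place of the proposed string_ref; view_of is a made-up name:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Builds a view over the vector's characters. v.data() is valid to call
// on an empty vector, whereas v[0] would be undefined behaviour there.
inline std::string_view view_of(const std::vector<char>& v) {
    return std::string_view(v.data(), v.size());
}
```

With this, an empty vector simply yields an empty view instead of tripping over &v[0].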




.

Author: Nicol Bolas <jmckesson@gmail.com>
Date: Tue, 22 Jan 2013 11:22:38 -0800 (PST) Raw View

------=_Part_1007_29546604.1358882558056
Content-Type: text/plain; charset=ISO-8859-1



On Tuesday, January 22, 2013 4:13:03 AM UTC-8, ri...@longbowgames.com wrote:
>
> I hope this doesn't get posted three times. Google Groups is being fussy.
>
> On Tuesday, January 22, 2013 6:47:59 AM UTC-5, Olaf van der Spek wrote:
>
>> Isn't sizeof(std::string) == 32 or so in that implementation?
>>
>
> 24 in the latest version. Only one byte bigger than the ideal
> representation of string_ref. But that should be compared against the
> benefits of make_string_ref:
>
> * The user can use this optimization with *every* function that accepts a
> const string&, even if the author didn't realize it would be used that way,
> or if the code was originally written in 1998.
> * Simplifies the interface to a single function, rather than a mirrored
> class which is, in every other way, identical to a const string&.
> * No need to teach students why they should be taking strings by
> string_ref, when every other class is taken by const&.
>

You can tell them to take string_ref by `const&`. Also, you *shouldn't* be
teaching them to take everything by `const&`.

> On Tuesday, January 22, 2013 6:34:26 AM UTC-5, Nicol Bolas wrote:
>
>> That's over-complicating a class to no constructive purpose. You're
>> effectively overloading the class to *sometimes* contain data and
>> sometimes reference it.
>
> If you're taking a string by const string&, why would you care whether or
> not it contains the data it's referencing? Do you care that current
> implementations hop back and forth between the stack and the heap?

I care because std::string is a *container*; that's what the type does.
Your method is merging two completely different concepts (containment and
referencing) in one object. It's like proposing that std::reference_wrapper
should be able to have value semantics by calling std::ref_value or
something.

Also, it's coming dangerously close to CoW semantics, which are not a good
thing.

Lastly, as others have pointed out, it's exceedingly ugly to use at the
call site. With std::string_ref, if I have some string class, I can give it
an `operator std::string_ref` member function and it will automatically be
used with string_ref functions without any special work done at the call
site. No ugly `make_string_ref` nonsense; just passing a parameter like
normal.
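The conversion-operator approach might look roughly like this. It is a sketch under assumptions: std::string_view (C++17) stands in for std::string_ref, and my_string and length_of are hypothetical names invented for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical third-party string class with its own storage.
class my_string {
public:
    explicit my_string(const char* s) {
        while (*s != '\0') chars_.push_back(*s++);
    }

    // Implicit conversion: lets my_string be passed wherever a
    // std::string_view parameter is expected.
    operator std::string_view() const {
        return std::string_view(chars_.data(), chars_.size());
    }

private:
    std::vector<char> chars_;
};

// Consumer written against the view type only.
inline std::size_t length_of(std::string_view s) { return s.size(); }
```

The call site is then just length_of(s) for a my_string s; the conversion happens without any wrapper function being named.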

------=_Part_1007_29546604.1358882558056--

.

Author: Jeffrey Yasskin <jyasskin@googlers.com>
Date: Wed, 23 Jan 2013 00:47:15 -0800 Raw View

On Tue, Jan 22, 2013 at 12:27 AM,  <rick@longbowgames.com> wrote:
> So I was just perusing the string_ref paper, and please tell me if I'm
> wrong, but it seems to have some relatively severe limitations:
>
> The string must be represented as a pointer/size pair.

Must be representable, yes. The proposal doesn't specify a precise
implementation.

> The memory must be contiguous.

Yes. Part of the goal here is to erase the precise type from a
reasonable group of existing types, without needing expensive virtual
calls. I've played with the notion of a "chunked_string_ref", which
could type-erase strings stored in ropes or deques by limiting the
virtual calls to when it runs past the end of a "chunk", but it's not
fleshed out into a proposal yet. It can't be done as an
implementation-independent library (unlike string_ref) unless
http://lafstern.org/matt/segmented.pdf is adopted.

> Each memory location must be a discrete character.

There's no native support for multibyte character encodings, right.
Getting unicode right is a bigger task, and string_ref has been useful
in several codebases without that.

> string_refs of different character types are incompatible.
>
> This really only makes it useful for std::string, or clones of std::string,
> which strikes me as odd, since the intent seems to be to create something
> that can be used universally.

Clones of std::string are quite common, as are substrings of
std::string. They're worth supporting.

> Now, non-contiguous memory may be a somewhat
> uncommon case, but what's not uncommon is UTF-8, which has been the dominant
> character encoding (at least on the web, where we can measure it) for at
> least three years, and this proposal seems to be completely incompatible
> with it.

We use string_ref to hold UTF-8 data quite well. You have to use UTF-8
algorithms in some cases, but a lot of string_ref's (and
std::string's) methods continue to work.
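To make that split concrete, here is a small sketch, again with std::string_view (C++17) standing in for string_ref. Byte-oriented operations such as find keep working on UTF-8 data, while counting characters needs a UTF-8-aware loop; count_code_points is a hypothetical helper and assumes valid UTF-8 input:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Counts Unicode code points in UTF-8 data by skipping continuation
// bytes (those of the form 10xxxxxx). Assumes the input is valid UTF-8.
inline std::size_t count_code_points(std::string_view utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For the bytes "caf\xC3\xA9 ok", size() reports 8 code units, count_code_points reports 7 code points, and find("ok") still locates the substring at byte offset 6.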

We have been thinking of proposing a unicode-specific class, but we're
not sure that yet another representation for strings is a good idea.
However, there's a general need for a rope/cord class, so it might
make sense to piggyback unicode support onto that class. Random access
to Unicode code points (not units) seems like a good idea to prevent a
few kinds of bugs, and we'd want to borrow Python 3's technique of
storing the code points in an array whose unit size depends on the
maximum code point. This interacts really well with the rope/cord data
structure, since each chunk can have a different array unit. I've been
calling the combination a "unicord". ;)
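The per-chunk unit size could be chosen along these lines. This is a sketch, not part of any proposal: unit_width is a hypothetical name, and the 1/2/4-byte thresholds are an assumption modelled on the widths Python 3 uses:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Smallest unit width (in bytes) able to store every code point in the
// chunk: 1 for values up to 0xFF, 2 up to 0xFFFF, otherwise 4.
inline std::size_t unit_width(const std::vector<std::uint32_t>& code_points) {
    std::uint32_t max_cp = 0;
    for (std::uint32_t cp : code_points) {
        if (cp > max_cp) max_cp = cp;
    }
    if (max_cp <= 0xFF) return 1;
    if (max_cp <= 0xFFFF) return 2;
    return 4;
}
```

Because the width is computed per chunk, a rope holding mostly ASCII text with one emoji would pay the 4-byte cost only in the chunk that contains it.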

Again, this is way more complex than string_ref, and I don't think
there's any reason to delay string_ref's benefit until a full solution
for unicode presents itself.

> I think we need to ask two questions:
>
> If this is only intended to make it easier to work with std::string clones,
> we should take a look at why std::string clones are being used at all. Would
> the need for a class like this be reduced by adding more functionality to
> std::string?

Adding more functionality to std::string is likely a bad idea and
wouldn't help anyway. Often the "clones" exist because they predate
std::string being widely implemented and are used in external
interfaces.

> What would it take to make a truly universal string ref class?
>
> I can think of a couple solutions to a universal string ref class, but they
> definitely come with a cost. Both solutions require that the string ref be
> designed around iterators/ranges rather than C strings, and therefore
> eliminating any index-based operations (notably, find*).

Using a range of iterators is certainly one way to capture a universal
notion of a sequence of characters, but it requires all functions to
be templates, which isn't always what we want.

> One solution would be to define a virtual base class, then have a generic
> derived class that handles the actual iteration. This could all be hidden
> inside a string_ref class, but requires virtual lookups for each operation,
> and a bit of code bloat.

A virtual call per character is too expensive.

HTH,
Jeffrey


.

Author: rick@longbowgames.com
Date: Wed, 23 Jan 2013 05:13:19 -0800 (PST) Raw View

------=_Part_726_4132236.1358946799911
Content-Type: text/plain; charset=ISO-8859-1

Thank you for the reply, Jeffrey.

In my original post, I was initially thrown off by the overview in the
paper, which didn't seem to line up with what the paper was actually
delivering. As I understand it now, the motivation of the paper isn't
really for cases where you "don't care" what kind of string you're
receiving; the motivation is simply "I wish I could use a C string as
though it was a const string& without paying for a copy."

That's definitely a real problem, so even though I originally misunderstood
what you were trying to solve, I appreciate all the work you've put into
the proposal. I only have a few concerns:

* Does this make C++ harder to teach? As a general rule of thumb, we teach
that you should pass large objects by const&. Now we will have to teach
that strings are an exception to that rule.
* This would clearly be more efficient when working with C strings, but
does it make std::strings less efficient in the general case? When passing
a const string&, you're copying one word, but when passing a string_ref,
you're copying two.
* This will make it more efficient to work with C strings, but won't it
cause the opposite problem? What if I have a string_ref, and I need to call
a function that only accepts const string&?
* Instead of making a std::string-*like* interface for working with C
strings, would it be simpler to just add that optimization to std::string?
It sounds crazy, but over the past 24 hours I've looked at a few
implementations of std::string, and in each one it would be trivial to
permit this, so long as you stick to const functions. If you had a factory
that took a C string and returned a const string, that would allow you to
use the std::string interface with C strings without paying for a copy, and
wouldn't complicate strings for people who don't deal with C strings. It
would look something like this:

const string make_string_ref(const char*, size_type);

I don't want to pretend that implementing this "make_string_ref" factory is
a panacea; it has its own downsides:
* The consumer of the library has to ask for the optimization, rather than
the producer. On the flip side, the consumer *can* ask for the
optimization, even when the producer didn't foresee the need.
* Constructing and passing a const string& in this way has about twice the
cost of constructing and passing a string_ref. Of course, it's still much
better than the status quo, where we're copying entire strings.
* The implementation would probably be forced to make a copy if the C
string it received was not null-terminated, otherwise c_str() would not
work as expected. On the plus side, c_str() would be available.
* An ideal solution would probably require branches in
basic_string::~basic_string and basic_string::capacity.
* Implementations would be permitted to ignore the optimization, so there's
no guarantee that your code is optimal.

On Wednesday, January 23, 2013 3:47:15 AM UTC-5, Jeffrey Yasskin wrote:

> Yes. Part of the goal here is to erase the precise type from a
> reasonable group of existing types, without needing expensive virtual
> calls. I've played with the notion of a "chunked_string_ref", which
> could type-erase strings stored in ropes or deques by limiting the
> virtual calls to when it runs past the end of a "chunk", but it's not
> fleshed out into a proposal yet. It can't be done as an
> implementation-independent library (unlike string_ref) unless
> http://lafstern.org/matt/segmented.pdf is adopted.
>

That was an interesting proposal. I wonder what it would look like if it
were proposed with today's C++.

I think the most generic solution to this problem -- ignoring performance
for now -- would look a lot more like boost::any_range, and admittedly when
I started reading your proposal I got the impression you were going to
propose something like that, but with some niceties sprinkled in for string
usage. Obviously that's not what you were going for, but it's where my
original confusion came from.

I guess for the cost of a handful of extra branches and a variable to hold
all the type erasure stuff, the concept of string_ref and
chunked_string_ref could be merged, while only paying the full price for
virtual function calls if you need them. It seems like a chunked_string_ref
would be extraordinarily niche if it wasn't compatible with string_ref.
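
One way to picture the "virtual call only at chunk boundaries" idea, and how a contiguous string could be treated as the one-chunk case, is the sketch below. It is only an assumption about what such a design might look like: chunk_source, deque_source, single_source and count_char_chunked are invented names, and C++17's std::string_view stands in for string_ref.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <string_view>

// Hypothetical interface: one virtual call per chunk, not per character.
class chunk_source {
public:
    virtual ~chunk_source() = default;
    // Returns the next contiguous chunk, or an empty view when exhausted.
    virtual std::string_view next_chunk() = 0;
};

// Adapter for a deque of strings, standing in for a rope-like container.
class deque_source final : public chunk_source {
public:
    explicit deque_source(const std::deque<std::string>& pieces)
        : pieces_(pieces) {}
    std::string_view next_chunk() override {
        // Skip empty pieces so that an empty view always means "done".
        while (index_ < pieces_.size() && pieces_[index_].empty()) ++index_;
        if (index_ == pieces_.size()) return {};
        return pieces_[index_++];
    }
private:
    const std::deque<std::string>& pieces_;
    std::size_t index_ = 0;
};

// A single contiguous string is the degenerate one-chunk case.
class single_source final : public chunk_source {
public:
    explicit single_source(std::string_view text) : text_(text) {}
    std::string_view next_chunk() override {
        std::string_view result = text_;
        text_ = {};
        return result;
    }
private:
    std::string_view text_;
};

std::size_t count_char_chunked(chunk_source& src, char wanted) {
    std::size_t n = 0;
    for (std::string_view chunk = src.next_chunk(); !chunk.empty();
         chunk = src.next_chunk()) {
        for (char c : chunk) {  // tight, non-virtual inner loop
            if (c == wanted) ++n;
        }
    }
    return n;
}
```

With this shape, a consumer written once against chunk_source handles both a plain string and a segmented one, and the number of virtual calls grows with the number of chunks rather than the number of characters.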


> There's no native support for multibyte character encodings, right.
> Getting unicode right is a bigger task, and string_ref has been useful
> in several codebases without that.
>

I don't doubt that, and I'm glad it's already been added to boost.

> We have been thinking of proposing a unicode-specific class, but we're
> not sure that yet another representation for strings is a good idea.
>

I see we share a common concern :)

My worst nightmare would be a language that includes:

* C strings
* basic_string + typedefs
* basic_string_ref + typedefs
* basic_utf_string + typedefs
* basic_utf_string_ref + typedefs
* basic_chunked_string + typedefs
* basic_chunked_string_ref + typedefs
* unicord_string
* unicord_string_ref

Unfortunately, I believe another string representation will be unavoidable
in order to safely work with Unicode, and that's why, from my perspective,
*your* proposal looks like the "yet another".


> However, there's a general need for a rope/cord class, so it might
> make sense to piggyback unicode support onto that class. Random access
> to Unicode code points (not units) seems like a good idea to prevent a
> few kinds of bugs, and we'd want to borrow Python 3's technique of
> storing the code points in an array whose unit size depends on the
> maximum code point. This interacts really well with the rope/cord data
> structure, since each chunk can have a different array unit. I've been
> calling the combination a "unicord". ;)
>

That's an interesting idea. It's unfortunate that it would require
conversion from UTF, but that might not be the end of the world, since most
UTF data is going to come from streams, and streams are often read one char
at a time anyway. I would probably call it something like "text", but I'm
clearly less creative :)
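
The Python 3 technique mentioned above picks the narrowest fixed-width unit that can hold the largest code point in a string. A minimal sketch of that selection step, applied to one chunk, might look like this; unit_width is a hypothetical helper, and the 1/2/4-byte thresholds follow Python's flexible string representation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Smallest fixed unit size, in bytes, that can hold every code point in the
// chunk: 1 for up to U+00FF, 2 for up to U+FFFF, otherwise 4.
std::size_t unit_width(const std::u32string& code_points) {
    char32_t max_cp = 0;
    for (char32_t cp : code_points) {
        if (cp > max_cp) max_cp = cp;
    }
    if (max_cp <= 0xFF) return 1;
    if (max_cp <= 0xFFFF) return 2;
    return 4;
}
```

Because every unit in a chunk has the same width, indexing by code point stays a constant-time array lookup, and in a rope each chunk can make this choice independently.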


> Adding more functionality to std::string is likely a bad idea and
> wouldn't help anyway. Often the "clones" exist because they predate
> std::string being widely implemented and are used in external
> interfaces.
>

CString would be an example where people only use it because of legacy, but
I don't think that's the only reason these classes exist. I think a lot of
people legitimately appreciate the rich interface of QString. And while it
might not make sense to add a bunch more member functions, we could
definitely use some more free functions that are designed for use with
std::string. The std::split proposal in this mailing is one example, and
generic versions of std::string's find* functions would be another.

I've also experienced my own reason to make a std::string clone: when we
switched to UTF-8, every call to string::iterator::operator++ and
string::iterator::operator* became unsafe, as well as anything that
required random access. In our code base, random access into strings is
very rare, so it was simpler and safer to switch to a string class that
supports UTF-8.
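
To make the hazard concrete, here is a small sketch (not from the original post) showing that the number of bytes in a std::string and the number of code points it encodes diverge under UTF-8. The function name count_code_points is made up for illustration, and the function assumes well-formed input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code points in UTF-8 text by skipping continuation bytes, which
// always have the bit pattern 10xxxxxx.
// Assumes the input is well-formed UTF-8; it performs no validation.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For the six-byte string "na\xC3\xAFve", size() returns 6 while count_code_points returns 5, so s[2] or a single ++ on a string::iterator lands in the middle of a character.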

You can probably see where some of my bias comes from :)

> > What would it take to make a truly universal string ref class?
> >
> > I can think of a couple solutions to a universal string ref class, but they
> > definitely come with a cost. Both solutions require that the string ref be
> > designed around iterators/ranges rather than C strings, and therefore
> > eliminating any index-based operations (notably, find*).
>
> Using a range of iterators is certainly one way to capture a universal
> notion of a sequence of characters, but it requires all functions to
> be templates, which isn't always what we want.
>

What I meant to say was that a type-erased string would require the
elimination of random-access functions if you wanted to safely support
multibyte encodings.




------=_Part_726_4132236.1358946799911--

.

Author: Nevin Liber <nevin@eviloverlord.com>
Date: Wed, 23 Jan 2013 09:28:42 -0600

--f46d043893bddd03fe04d3f65b7a
Content-Type: text/plain; charset=ISO-8859-1

On 23 January 2013 07:13, <rick@longbowgames.com> wrote:

> * Does this make C++ harder to teach? As a general rule of thumb, we teach
> that you should pass large objects by const&. Now we will have to teach
> that strings are an exception to that rule.
>

string_ref != string, just as iterator != container.  If it manages the
resource, it is probably expensive to copy.  If it is a view on the
resource, it is probably cheap.

> * This would clearly be more efficient when working with C strings, but
> does it make std::strings less efficient in the general case? When passing
> a const string&, you're copying one word, but when passing a string_ref,
> you're copying two.
>

You could pass a string_ref by const reference, if you are really that
concerned.

> * This will make it more efficient to work with C strings, but won't it
> cause the opposite problem? What if I have a string_ref, and I need to call
> a function that only accepts const string&?
>

If you have to build another string, then it costs.  I don't see any way
around that.

> * Instead of making a std::string-*like* interface for working with C
> strings, would it be simpler to just add that optimization to std::string?
> It sounds crazy, but over the past 24 hours I've looked at a few
> implementations of std::string, and in each one it would be trivial to
> permit this, so long as you stick to const functions.

But that doesn't work in general.  Take the following:

void foo(std::string& s, const char* p)
{
    ++s[0];
}

std::string s = "Hello";
foo(s, s.c_str());

Copy on write strings are effectively not allowed in C++11 (although many
implementations have yet to catch up).  operator[] is not allowed to
invalidate references, pointers or iterators to elements of a string (n3485
21.4.1p6).

 Nevin ":-)" Liber  <mailto:nevin@eviloverlord.com>  (847) 691-1404


--f46d043893bddd03fe04d3f65b7a--

.

Author: rick@longbowgames.com
Date: Wed, 23 Jan 2013 14:48:55 -0800 (PST)

------=_Part_390_11241593.1358981335045
Content-Type: text/plain; charset=ISO-8859-1

On Wednesday, January 23, 2013 10:28:42 AM UTC-5, Nevin ":-)" Liber wrote:
>
> string_ref != string, just as iterator != container.  If it manages the
> resource, it is probably expensive to copy.  If it is a view on the
> resource, it is probably cheap.
>

That doesn't answer questions like "Why can't I just pass a string by const
string&?" or "Why isn't there a vector_ref?" *I* know the answers to those
questions, but the answers would make students roll their eyes.

>
> You could pass a string_ref by const reference, if you are really that
> concerned.
>

Like this?

void foo(const string_ref&);

void bar()
{
  string s;
  foo(s);
}

Now you're copying three words, because you're implicitly constructing a
two-word string_ref *and* passing it by reference. It doesn't really make
sense to take string_ref by const&.

>
> If you have to build another string, then it costs.  I don't see any way
> around that.
>

The problem comes with code like this:

// Written by user A
void foo(const string&);

// Written by user B
// foo needs a string, so r's characters are copied into a temporary.
void bar(string_ref r) { foo(string(r)); }

// Written by user C
int main()
{
  string s = "Hello World";
  bar(s);
}

User B *could* have used const string& instead of string_ref, but he's
being diligent about using string_ref, because he doesn't want to limit his
users. Unfortunately, he has to interface with user A's library, which only
accepts const string&, forcing him to make a copy. User C is now paying for
a feature she doesn't need.

This is the exact same problem string_ref is trying to solve, but
string_ref's presence presents yet another vector for such a problem, and
now you don't even have to leave the bounds of the standard library to get
snagged on it.
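
The three-user scenario above can be reproduced with C++17's std::string_view standing in for string_ref. This is only a sketch: shares_buffer, via_view and via_reference are hypothetical names, and the pointer comparison exists purely to make the extra copy observable.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// "User A": legacy interface taking const std::string&. Reports whether the
// string it received still uses the caller's original character buffer.
bool shares_buffer(const std::string& s, const char* original) {
    return s.data() == original;
}

// "User B", view-based: to call the legacy interface it has to materialize
// a std::string, which copies the characters into new storage.
bool via_view(std::string_view r, const char* original) {
    return shares_buffer(std::string(r), original);
}

// "User B", reference-based: the caller's string is forwarded untouched.
bool via_reference(const std::string& s, const char* original) {
    return shares_buffer(s, original);
}
```

Given std::string s = "Hello World", via_reference(s, s.data()) is true and via_view(s, s.data()) is false: the view-based path hands user A a different buffer, which is the copy user C ends up paying for.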


>
>
>> * Instead of making a std::string-*like* interface for working with C
>> strings, would it be simpler to just add that optimization to std::string?
>> It sounds crazy, but over the past 24 hours I've looked at a few
>> implementations of std::string, and in each one it would be trivial to
>> permit this, so long as you stick to const functions.
>
>
> But that doesn't work in general.  Take the following:
>
> void foo(std::string& s, const char* p)
> {
>     ++s[0];
> }
>
> std::string s = "Hello";
> foo(s, s.c_str());
>
> Copy on write strings are effectively not allowed in C++11 (although many
> implementations have yet to catch up).  operator[] is not allowed to
> invalidate references, pointers or iterators to elements of a string (n3485
> 21.4.1p6).
>

I'm not talking about CoW. Here's some pseudo code to illustrate what I'm
talking about:

const string make_string_ref(const char* str)
{
  string s;
  s.__m_pointer = str;
  s.__m_size = strlen(str);
  s.__m_capacity = ~0;
  return s;
}

basic_string<>::~basic_string()
{
  if(__m_capacity != ~0)
    delete [] __m_pointer;
}

size_type basic_string<>::capacity() const
{
  return __m_capacity == ~0 ? __m_size : __m_capacity;
}

The rest of basic_string remains virtually unchanged. The string returned
by the function is immutable, and that's enforced by making it const in the
return type. It has nothing to do with CoW. In fact, there is no W!

Your example would look more like this:

// This must be a *const* string& in order to accept the result of
// make_string_ref.
void foo(const std::string& s)
{
    ++s[0]; // Compile-time error! Cannot modify const char&!
}

foo(make_string_ref("Hello"));
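
The make_string_ref pseudo code above reaches into basic_string internals, so it cannot be compiled as written. The same idea can be approximated with a self-contained toy class, assuming C++17. Everything here (toy_string, borrow_tag, owns_buffer, the sentinel name borrowed) is invented for illustration and says nothing about how a real std::string is laid out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <limits>

// Hypothetical toy class; not a real std::string implementation.
class toy_string {
public:
    using size_type = std::size_t;

    // Owning constructor: copies the characters, as std::string does.
    explicit toy_string(const char* str)
        : size_(std::strlen(str)), capacity_(size_) {
        char* buf = new char[size_ + 1];
        std::memcpy(buf, str, size_ + 1);
        data_ = buf;
    }

    // Copying always produces an owning string, even from a borrowed one.
    toy_string(const toy_string& other)
        : size_(other.size_), capacity_(other.size_) {
        char* buf = new char[size_ + 1];
        std::memcpy(buf, other.data_, size_);
        buf[size_] = '\0';
        data_ = buf;
    }
    toy_string& operator=(const toy_string&) = delete;

    // The extra branch: a borrowed buffer must never be freed.
    ~toy_string() {
        if (capacity_ != borrowed) delete[] data_;
    }

    const char* c_str() const { return data_; }
    size_type size() const { return size_; }
    size_type capacity() const {
        return capacity_ == borrowed ? size_ : capacity_;
    }
    bool owns_buffer() const { return capacity_ != borrowed; }

    friend const toy_string make_string_ref(const char* str);

private:
    // Sentinel capacity marking a string that refers to external storage.
    static constexpr size_type borrowed =
        std::numeric_limits<size_type>::max();
    struct borrow_tag {};

    toy_string(borrow_tag, const char* str)
        : data_(str), size_(std::strlen(str)), capacity_(borrowed) {}

    const char* data_;
    size_type size_;
    size_type capacity_;
};

// Returns a const string that refers to str without copying it. The caller
// must keep str alive and null-terminated for the string's lifetime.
const toy_string make_string_ref(const char* str) {
    return toy_string(toy_string::borrow_tag{}, str);
}
```

With const char* lit = "Hello", the result of make_string_ref(lit) reports c_str() == lit and capacity() == 5, while a copy of it owns a fresh buffer. The sketch sidesteps the harder questions raised in the thread, such as input that is not null-terminated and what mutating members should do on a borrowed string.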




------=_Part_390_11241593.1358981335045--

.

Author: Nevin Liber <nevin@eviloverlord.com>
Date: Wed, 23 Jan 2013 17:16:54 -0600 Raw View

--f46d043893bd4e12ff04d3fce65f
Content-Type: text/plain; charset=ISO-8859-1

On 23 January 2013 16:48, <rick@longbowgames.com> wrote:

> const string make_string_ref(const char* str)
> {
>   string s;
>   s.__m_pointer = str;
>   s.__m_size = strlen(str);
>   s.__m_capacity = ~0;
>   return s;
> }
>

The following trivially breaks the convention:

auto s = make_string_ref("Oops");

It also breaks assignability if it is a member variable.

-1 for usability (and teachability).
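
To make the `auto` point concrete, here is a minimal sketch. `make_const_string` is a hypothetical stand-in for the `make_string_ref` above; it returns an ordinary owning string, since the borrowed-pointer trick cannot be written in portable code. It only shows that the `const` on the return type does not survive `auto` deduction.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for make_string_ref: returns a const-qualified value.
const std::string make_const_string(const char* text) {
    return std::string(text);
}

// `auto` deduces std::string, not const std::string, so the result is mutable.
std::string append_bang() {
    auto s = make_const_string("Oops");
    static_assert(!std::is_const<decltype(s)>::value,
                  "auto drops the top-level const of the return type");
    s += "!";  // compiles: nothing stops the caller from modifying s
    assert(s == "Oops!");
    return s;
}
```

With a string that merely borrowed its buffer, the `+=` would try to write through or reallocate memory the string never owned.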

I don't want string_ref to be a string; way too costly unless I am really,
really careful (euphemism for really, really perfect).

If you are a sink for a std::string, either provide both interfaces or pay
an efficiency cost; I don't see that as unreasonable.
--
 Nevin ":-)" Liber  <mailto:nevin@eviloverlord.com>  (847) 691-1404


--f46d043893bd4e12ff04d3fce65f--

.

Author: Nicol Bolas <jmckesson@gmail.com>
Date: Wed, 23 Jan 2013 16:30:53 -0800 (PST) Raw View

------=_Part_255_20510302.1358987453601
Content-Type: text/plain; charset=ISO-8859-1

On Wednesday, January 23, 2013 2:48:55 PM UTC-8, ri...@longbowgames.com
wrote:
>
> On Wednesday, January 23, 2013 10:28:42 AM UTC-5, Nevin ":-)" Liber wrote:
>>
>> string_ref != string, just as iterator != container.  If it manages the
>> resource, it is probably expensive to copy.  If it is a view on the
>> resource, it is probably cheap.
>>
>
> That doesn't answer questions like "Why can't I just pass a string by
> const string&?" or "Why isn't there a vector_ref?" *I* know the answers to
> those questions, but the answers would make students roll their eyes.
>

As far as I'm concerned, the answer to the second question is stupid; we
ought to have array_ref. And the first question is essentially irrelevant;
students at that stage need to learn things idiomatically. The reasons for
the idioms can be explained later.

For example, by making them write a fixed-length string class and giving it
an `operator string_ref` member, so that they can see that it can be passed
around just like std::string.
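
A rough sketch of that exercise, assuming C++17 and using `std::string_view` (the name under which the string_ref idea was later standardized) in place of `string_ref`. The class `fixed_string16`, its 16-character capacity, and `count_spaces` are all made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical fixed-capacity string: at most 16 characters, no allocation.
class fixed_string16 {
    std::array<char, 16> buf_{};
    std::size_t len_ = 0;

public:
    explicit fixed_string16(const char* text) {
        len_ = std::strlen(text);
        if (len_ > buf_.size()) len_ = buf_.size();  // silently truncate
        std::memcpy(buf_.data(), text, len_);
    }

    // The conversion that lets this type go wherever a view is accepted.
    operator std::string_view() const noexcept {
        return std::string_view(buf_.data(), len_);
    }
};

// One function serves both string types because it takes a view by value.
std::size_t count_spaces(std::string_view text) {
    std::size_t n = 0;
    for (char c : text) {
        if (c == ' ') ++n;
    }
    return n;
}

std::size_t demo_counts() {
    fixed_string16 fixed("a b c");
    std::string dynamic = "x y";
    std::size_t total = count_spaces(fixed) + count_spaces(dynamic);
    assert(total == 3);  // two spaces plus one space
    return total;
}
```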

Idiomatically, you should be telling users to take a std::string *by value*
if they want a copy (and then move it to where they want it), or to take a
std::string_ref *by value* if they want to reference a string owned by
someone else. The problem is your belief in taking these things by `const&`
to begin with.
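
The two idioms might look like this in practice. This is only a sketch: it assumes C++17 and substitutes `std::string_view` for the proposed `std::string_ref`, and the `Widget` class is invented for the example.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

class Widget {
    std::string name_;

public:
    // Sink: the object keeps its own copy, so take std::string by value
    // and move it into place.
    explicit Widget(std::string name) : name_(std::move(name)) {}

    // Observer: the function only reads, so take a view by value.
    bool name_starts_with(std::string_view prefix) const {
        return std::string_view(name_).substr(0, prefix.size()) == prefix;
    }

    const std::string& name() const { return name_; }
};

bool widget_demo() {
    std::string label = "gadget";
    Widget w(std::move(label));  // the buffer is moved, not copied
    assert(w.name() == "gadget");
    return w.name_starts_with("gad");
}
```

Callers that pass an lvalue to the constructor pay for one copy; callers that pass a temporary or `std::move` pay only for moves.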


>> You could pass a string_ref by const reference, if you are really that
>> concerned.
>>
>
> Like this?
>
> foo(const string_ref&)
>
> bar()
> {
>   string s;
>   foo(s);
> }
>
> Now you're copying three words, because you're implicitly constructing a
> two-word string_ref *and* passing it by reference. It doesn't really make
> sense to take string_ref by const&.
>

How much "sense" it makes in terms of data size is irrelevant; it works
with no real performance issues. So if you want to tell your students to
pass everything by `const&`, you can and their code will function.

>> If you have to build another string, then it costs.  I don't see any way
>> around that.
>>
>
> The problem comes with code like this:
>
> // Written by user A
> foo(const string&);
>
> // Written by user B
> bar(string_ref r) { foo(r); }
>
> // Written by user C
> int main()
> {
>   string s = "Hello World";
>   bar(s);
> }
>
> User B *could* have used const string& instead of string_ref, but he's
> being diligent about using string_ref, because he doesn't want to limit his
> users. Unfortunately, he has to interface with user A's library, which only
> accepts const string&, forcing him to make a copy. User C is now paying for
> a feature she doesn't need.
>
> This is the exact same problem string_ref is trying to solve, but
> string_ref's presence presents yet another vector for such a problem, and
> now you don't even have to leave the bounds of the standard library to get
> snagged on it.
>

That's no different from one person using `FILE*` handles and another
using iostreams, and you having to interoperate between them. These
problems already exist. You do what you can, where you can.

I appreciate that you want to try to make a `const std::string &` work like
`std::string_ref`. But that's simply not going to happen, and the methods
you'd have to use in order to do it (the constant use of `make_string_ref`
everywhere rather than a nice implicit conversion, not to mention
implementation issues) are unpleasant.
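
For reference, the user A / user B situation can be sketched as follows, assuming C++17 and `std::string_view` in place of `string_ref`. The extra `origin` parameter is not part of the scenario under discussion; it is added here only so the copy can be observed.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// "User A": only accepts const std::string&. For illustration it reports
// whether the string it received lives at the address `origin`.
bool foo(const std::string& s, const char* origin) {
    return s.data() == origin;
}

// "User B": accepts a view, but has to build a std::string to call foo.
// The temporary owns a fresh buffer, so foo never sees the caller's.
bool bar(std::string_view r) {
    return foo(std::string(r), r.data());
}

// "User C": owns a std::string and pays for the copy made inside bar.
bool user_c_sees_original_buffer() {
    std::string s = "Hello World";
    assert(foo(s, s.data()));  // direct call: same buffer, no copy
    return bar(s);             // via the view: a different buffer
}
```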


------=_Part_255_20510302.1358987453601--

.

Author: rick@longbowgames.com
Date: Wed, 23 Jan 2013 17:24:16 -0800 (PST) Raw View

------=_Part_680_989454.1358990656352
Content-Type: text/plain; charset=ISO-8859-1

On Wednesday, January 23, 2013 6:16:54 PM UTC-5, Nevin ":-)" Liber wrote:

> The following trivially breaks the convention:
>
> auto s = make_string_ref("Oops");
>

Doh, I forgot about disappearing consts.

Okay, another alternative would be to leave string_ref as it is, except add:

operator const basic_string<charT>() const

Implementations would be permitted to make this operation cheap, but making
a copy would also be a legal implementation.

That would have some important benefits:

* string_ref remains as elegant as ever.
* Users of string_ref are not negatively impacted when interfacing with
pre-C++14 code. In fact, passing a string_ref into a function which only
accepts const string& would become just as pretty as going in the other
direction.
* Users of string_ref have a cheap-if-possible solution for calls to c_str.
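
A minimal sketch of what that conversion could look like. The `string_ref` class below is a hypothetical, stripped-down stand-in for the one in the paper, and this version always copies, which is one of the two behaviors the suggestion permits. `legacy_length` and `forward` are invented names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical, minimal string_ref: a pointer/size pair plus the
// suggested conversion operator.
class string_ref {
    const char* data_;
    std::size_t size_;

public:
    string_ref(const char* s) : data_(s), size_(std::strlen(s)) {}
    string_ref(const std::string& s) : data_(s.data()), size_(s.size()) {}

    std::size_t size() const { return size_; }

    // The suggested addition. This sketch simply copies the characters.
    operator const std::string() const { return std::string(data_, size_); }
};

// Pre-existing code that only accepts const std::string&.
std::size_t legacy_length(const std::string& s) { return s.size(); }

// The string_ref is passed straight through; the conversion is implicit.
std::size_t forward(string_ref r) {
    std::size_t n = legacy_length(r);
    assert(n == r.size());
    return n;
}
```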


------=_Part_680_989454.1358990656352--

.

Author: Nicol Bolas <jmckesson@gmail.com>
Date: Wed, 23 Jan 2013 19:07:15 -0800 (PST) Raw View

------=_Part_120_10894548.1358996835176
Content-Type: text/plain; charset=ISO-8859-1



On Wednesday, January 23, 2013 5:24:16 PM UTC-8, ri...@longbowgames.com
wrote:
>
> On Wednesday, January 23, 2013 6:16:54 PM UTC-5, Nevin ":-)" Liber wrote:
>
>> The following trivially breaks the convention:
>>
>> auto s = make_string_ref("Oops");
>>
>
> Doh, I forgot about disappearing consts.
>
> Okay, another alternative would be to leave string_ref as it is, except
> add:
>
> operator const basic_string<charT>() const
>
> Implementations would be permitted to make this operation cheap, but
> making a copy would also be a legal implementation.
>
> That would have some important benefits:
>
> * string_ref remains as elegant as ever.
> * Users of string_ref are not negatively impacted when interfacing with
> pre-C++14 code. In fact, passing a string_ref into a function which only
> accepts const string& would become just as pretty as going in the other
> direction.
> * Users of string_ref have a cheap-if-possible solution for calls to c_str.
>

Thanks for mentioning `c_str`, because that's another reason why what you
want is not possible.

Consider the following string_ref code. Perfectly legal:

std::string_ref strref("Some Characters");
std::string_ref smaller = strref.substr(0, 4); //Shrink the string.

Now let's do what you suggest:

const std::string other = smaller;

std::string requires that `other.data()` be a null-terminated string. The
only way to implement this with your "constant string" idea is for
`other.data()` to perform a string copy. Which means that it will likely
have to allocate memory. But that's not allowed since
`std::basic_string::data` is defined as `noexcept`, and allocating memory
can throw exceptions.

Therefore, the allocation *must* have happened when the string was
constructed. So the copying must have happened then, not in `data()`. And
therefore, you gain nothing over regular string construction.
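
The null-termination problem can be seen directly in a small sketch, assuming C++17 and `std::string_view` standing in for `std::string_ref`. Peeking one character past the view is safe here only because the view is known to point into a longer string literal.

```cpp
#include <cassert>
#include <string>
#include <string_view>

constexpr std::string_view whole = "Some Characters";
constexpr std::string_view smaller = whole.substr(0, 4);  // "Some"

// The character after the view's last element is the next character of the
// original literal, not a terminator.
char char_after_view() {
    return smaller.data()[smaller.size()];
}

// A std::string built from the view owns a copy with its own terminator.
char char_after_copy() {
    const std::string other(smaller);
    assert(other == "Some");
    return other.c_str()[other.size()];
}
```

Here `char_after_view()` yields the space that follows "Some" in the literal, while `char_after_copy()` yields `'\0'`. A string that borrowed the view's buffer could not provide that terminator without copying.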

std::string is a container. You should not try to give a container
reference semantics. It dirties the interface and promotes CoW-style
shenanigans.


------=_Part_120_10894548.1358996835176--

.

Author: rick@longbowgames.com
Date: Thu, 24 Jan 2013 00:42:34 -0800 (PST) Raw View

------=_Part_2735_5234364.1359016954197
Content-Type: text/plain; charset=ISO-8859-1

On Wednesday, January 23, 2013 10:07:15 PM UTC-5, Nicol Bolas wrote:
>
> and promotes CoW-style shenanigans.
>

Let me try to make this as clear as possible:

*I am not proposing CoW.*

If the conversion requires a copy, that *must* be done *at the time of
conversion*.


------=_Part_2735_5234364.1359016954197--

.

Author: Nicol Bolas <jmckesson@gmail.com>
Date: Thu, 24 Jan 2013 01:23:35 -0800 (PST) Raw View

------=_Part_677_16761.1359019415804
Content-Type: text/plain; charset=ISO-8859-1

On Thursday, January 24, 2013 12:42:34 AM UTC-8, ri...@longbowgames.com
wrote:
>
> On Wednesday, January 23, 2013 10:07:15 PM UTC-5, Nicol Bolas wrote:
>>
>> and promotes CoW-style shenanigans.
>>
>
> Let me try to make this as clear as possible:
>
> *I am not proposing CoW.*
>
> If the conversion requires a copy, that *must* be done *at the time of
> conversion*.
>

Then under what circumstances will a copy *not* happen? As I pointed out,
`std::string::data` must return a null-terminated string. Assuming that
`make_string_ref` is the "time of conversion", if it isn't given a
null-terminated character range, then the constructed string will *have* to
copy it.

At which point, you've gained nothing at all.

Similarly, any attempt to get a substring from a `const std::string`, or to
remove elements from the end of such a string will be forced to copy the
character range.

Your way simply does not work much of the time. It *only* works when
dealing directly with a full null-terminated string. This makes it hard to
know when you're getting a copy and when you're getting a reference.

The `std::string_ref` way is very simple: `std::string` copies,
`std::string_ref` does not. `std::string` has value semantics;
`std::string_ref` has reference semantics. Always, every time, *no matter
what*. Your way is confusing and difficult to use.
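
The value/reference distinction can be illustrated with a short sketch, again assuming C++17 with `std::string_view` in place of `std::string_ref`. The `observed` struct and function name are invented for the example.

```cpp
#include <cassert>
#include <string>
#include <string_view>

struct observed {
    char via_view;  // what the view sees after the owner changes
    char via_copy;  // what the independent copy sees
};

observed mutate_and_observe() {
    std::string owner = "hello";
    std::string_view view = owner;  // reference semantics: no copy
    std::string copy = owner;       // value semantics: independent buffer

    owner[0] = 'j';  // in-place write; does not invalidate the view

    assert(view.data() == owner.data());
    assert(copy.data() != owner.data());
    return {view[0], copy[0]};
}
```

The view reports `'j'` because it refers to the owner's characters; the copy still reports `'h'`.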


------=_Part_677_16761.1359019415804--

.