Topic: Standard C++ String classes
Author: JdeBP@osmium.jba.co.uk (Jonathan de Boyne Pollard)
Date: 5 Aug 1994 12:23:25 GMT
>>NUL terminated, or not ? I've also been told that there will be no
>>such operator as standard and that there will be a char * c_str(void)
>>member instead.
>
> There will be no operator const char *. Too dangerous. There will be
>a member function that returns a const char * for a basic_string<char>, and
>a corresponding pointer type for other basic_string instantiations.
What's wrong with a
basic_string<T>::operator const T *
member ?
Talking of which, is a basic_string<T> going to be, in layman's terms, "a
variable-length array of T" ? e.g. is the length of such a string going to
be in units of sizeof(T) ?
In a supplementary question, this assumes that space will be allocated with
T::operator new[], so are we likely to see copy-on-write assignment in order
to save overhead ?
>>> .. And the reason is at least in part that ISO REQUIRES
>>>that new languages NOT treat any character specially, in particular
>>>because some ISO Standard multibyte character sequences contain
>>>NUL (0) valued characters.
>>
>>So the standard string class will handle multibyte characters instead of
>>this being in an additional class ?
>
> The standard basic_string template can be instantiated for wide
>characters, for chars, and for more-or-less arbitrary user-defined types. It
>will not handle multibyte characters directly, since they do not have a
>fixed-size representation.
This is what was confusing about the original quote above. It
seemed to imply that C strings were being outlawed. This seemed a
little strange to me. I'd guessed that multibyte character strings
would be a separate class, more than likely not in the standard at
all, rather than being assimilated into basic_string<char>.
If anything, as you say, I'd expect multibyte sequences to have to
be converted to basic_string<wchar_t> before use. Which I bet will
be the subject of many flame wars to come ...
JdeBP
Author: maxtal@physics.su.OZ.AU (John Max Skaller)
Date: Sun, 7 Aug 1994 19:45:47 GMT
In article <31j66g$gik@silver.jba.co.uk> "Jonathan de Boyne Pollard" <@jba.co.uk:JdeBP@osmium> writes:
>
>>>1.2 If embedded NULs are allowed and the string length takes
>>> account of them, what are the semantics of
>>> string::operator const char * ?
>>
>> Returns a pointer to an array containing the string.
>
>NUL terminated, or not ? I've also been told that there will be no
>such operator as standard and that there will be a char * c_str(void)
>member instead.
You are correct, although the name got changed recently
to "data". It might get changed back. And, a nul is appended
to the bytes of the string so you can use it like an asciiz
string provided it has no embedded nuls.
>
>> .. And the reason is at least in part that ISO REQUIRES
>>that new languages NOT treat any character specially, in particular
>>because some ISO Standard multibyte character sequences contain
>>NUL (0) valued characters.
>
>So the standard string class will handle multibyte characters instead of
>this being in an additional class ?
There will be a string of wchar_t as well.
Multibyte character handling will probably be done
by the iostream subsystem.
>
>Is the differentiation between ASCIIZ strings and multibyte strings in C++
>"going away" then ?
>
Internally, for international characters, use wchar_t strings.
I guess you can write your own handlers.
--
JOHN (MAX) SKALLER, INTERNET:maxtal@suphys.physics.su.oz.au
Maxtal Pty Ltd,
81A Glebe Point Rd, GLEBE Mem: SA IT/9/22,SC22/WG21
NSW 2037, AUSTRALIA Phone: 61-2-566-2189
Author: jaf3@ritz.cec.wustl.edu (John Andrew Fingerhut)
Date: 8 Aug 1994 14:52:46 -0500
In article <rfgCtsq3t.FAx@netcom.com>,
Ronald F. Guilmette <rfg@netcom.com> wrote:
:In article <CtozHx.9JH@ucc.su.OZ.AU> maxtal@physics.su.OZ.AU (John Max Skaller) writes:
:>In article <31865v$sq8@silver.jba.co.uk> "Jonathan de Boyne Pollard" <@jba.co.uk:JdeBP@osmium> writes:
:>
:>>1.1 Do embedded NULs alter the length value of strings ?
:>
:> Ordinary character.
:>
:>>1.2 If embedded NULs are allowed and the string length takes
:>> account of them, what are the semantics of
:>> string::operator const char * ?
:>
:> Returns a pointer to an array containing the string.
:
:Congratulations on giving two non-answer answers in a row John.
:
I don't view these as non-answers....It looks like he is saying that the
C++ string class accepts embedded null characters, they are included in the
length, and that a pointer to the first character in the string is returned
from string::operator const char *. I don't see what your problem is with
these answers. (Unless he is wrong ;-)
:> In a string, all characters are equal.
:
:Perhaps this is true for C++ ``Strings'', but it sure as heck is NOT true
:for the strings pointed to by (char*) pointers.
:
In case you missed the subject line, C++ ``Strings'' are what we are talking
about.
:--
:
:-- Ron Guilmette, Sunnyvale, CA ---------- RG Consulting -------------------
:---- domain addr: rfg@netcom.com ----------- Purveyors of Compiler Test ----
:---- uucp addr: ...!uunet!netcom!rfg ------- Suites and Bullet-Proof Shoes -
Author: JdeBP@osmium.jba.co.uk (Jonathan de Boyne Pollard)
Date: 1 Aug 1994 16:00:16 GMT
> You gather wrong. It is being completely replaced
>by a templated string class.
So I've been told. Good idea.
> MAKE NO ASSUMPTIONS ABOUT ANYTHING CLAIMING TO BE
>"TRACKING" THE STANDARDISATION EFFORT -- INCLUDING THE
>WORKING PAPER ITSELF. WAIT UNTIL THERE REALLY IS AN ISO STANDARD.
I knew that. That's why I asked in the first place.
>>1.2 If embedded NULs are allowed and the string length takes
>> account of them, what are the semantics of
>> string::operator const char * ?
>
> Returns a pointer to an array containing the string.
NUL terminated, or not ? I've also been told that there will be no
such operator as standard and that there will be a char * c_str(void)
member instead.
> Ordinary character.
> Ordinary character.
> No more than any other character.
> No, they behave like any other character.
I heard you the first time, thanks.
> .. And the reason is at least in part that ISO REQUIRES
>that new languages NOT treat any character specially, in particular
>because some ISO Standard multibyte character sequences contain
>NUL (0) valued characters.
So the standard string class will handle multibyte characters instead of
this being in an additional class ?
Is the differentiation between ASCIIZ strings and multibyte strings in C++
"going away" then ?
JdeBP
Author: pete@genghis.interbase.borland.com (Pete Becker)
Date: Tue, 2 Aug 1994 15:50:34 GMT
In article <31j66g$gik@silver.jba.co.uk>,
Jonathan de Boyne Pollard <@jba.co.uk:JdeBP@osmium> wrote:
>> You gather wrong. It is being completely replaced
>>by a templated string class.
>
>So I've been told. Good idea.
Well, it's not being "completely replaced". There's a template that
does the same thing. Whatever you know about the string class applies equally
to the basic_string template.
>
>> MAKE NO ASSUMPTIONS ABOUT ANYTHING CLAIMING TO BE
>>"TRACKING" THE STANDARDISATION EFFORT -- INCLUDING THE
>>WORKING PAPER ITSELF. WAIT UNTIL THERE REALLY IS AN ISO STANDARD.
>
>I knew that. That's why I asked in the first place.
That's a bit extreme.
>
>>>1.2 If embedded NULs are allowed and the string length takes
>>> account of them, what are the semantics of
>>> string::operator const char * ?
>>
>> Returns a pointer to an array containing the string.
>
>NUL terminated, or not ? I've also been told that there will be no
>such operator as standard and that there will be a char * c_str(void)
>member instead.
There will be no operator const char *. Too dangerous. There will be
a member function that returns a const char * for a basic_string<char>, and
a corresponding pointer type for other basic_string instantiations.
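The post does not say which danger is meant. One commonly cited hazard of an implicit conversion to a pointer is that ordinary-looking expressions silently operate on addresses instead of contents; another is that a pointer taken implicitly from a temporary dangles once the temporary is destroyed. The sketch below illustrates only the first. ImplicitString is a hypothetical class invented for this example and is not part of any library.

```cpp
#include <cassert>
#include <string>

// Hypothetical wrapper, for illustration only: it adds the implicit
// conversion that the standard string class deliberately leaves out.
class ImplicitString {
public:
    explicit ImplicitString(const char* s) : text_(s) {}
    operator const char*() const { return text_.c_str(); }  // the risky part
private:
    std::string text_;
};

// Looks like a content comparison, but both operands convert to
// const char*, so the built-in pointer comparison is what runs.
bool looks_equal(const ImplicitString& a, const ImplicitString& b) {
    return a == b;
}

// std::string has no such conversion; its own operator== compares contents.
bool really_equal(const std::string& a, const std::string& b) {
    return a == b;
}

void implicit_conversion_demo() {
    ImplicitString a("same"), b("same");
    assert(!looks_equal(a, b));  // distinct objects, distinct buffers
    assert(really_equal(std::string("same"), std::string("same")));
}
```

With an explicit member function the caller has to write the conversion out, so a comparison of addresses is at least visible in the source.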
>
>> .. And the reason is at least in part that ISO REQUIRES
>>that new languages NOT treat any character specially, in particular
>>because some ISO Standard multibyte character sequences contain
>>NUL (0) valued characters.
>
>So the standard string class will handle multibyte characters instead of
>this being in an additional class ?
>
The standard basic_string template can be instantiated for wide
characters, for chars, and for more-or-less arbitrary user-defined types. It
will not handle multibyte characters directly, since they do not have a
fixed-size representation. Someone, somewhere, has to translate them. Usually
that job is left to the I/O system, as in the normative addendum to the ISO C
standard, and in the latest version of the iostreams library.
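A minimal sketch of what "fixed-size representation" means in practice, assuming the std::string and std::wstring typedefs of the eventual ISO standard. The helper payload_bytes and the sample byte sequence (the UTF-8 encoding of U+00E9) are choices made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// size() counts elements of the character type, not bytes, so the byte
// count of the character data is size() * sizeof(CharT).
template <typename CharT>
std::size_t payload_bytes(const std::basic_string<CharT>& s) {
    return s.size() * sizeof(CharT);
}

void element_count_demo() {
    std::string narrow = "abc";
    std::wstring wide = L"abc";
    assert(narrow.size() == 3);
    assert(wide.size() == 3);  // same element count, wider elements
    assert(payload_bytes(wide) == 3 * sizeof(wchar_t));

    // A multibyte sequence stored in a narrow string is not decoded:
    // the two bytes that encode one character count as two elements.
    std::string multibyte = "\xC3\xA9";
    assert(multibyte.size() == 2);
}
```

The last assertion is the point of the paragraph above: the string stores the bytes faithfully but does not know that they form one character, so the translation has to happen elsewhere.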
>Is the differentiation between ASCIIZ strings and multibyte strings in C++
>"going away" then ?
>
No. They're different data types, with an underlying similarity. That
underlying similarity makes it possible to use a template to create a family
of classes that can handle the different types.
-- Pete
Author: JdeBP@osmium.jba.co.uk (Jonathan de Boyne Pollard)
Date: 28 Jul 1994 11:52:31 GMT
I gather that the proposed string classes are as good as finalised.
If so, could someone clear up the semantics of strings for me ?
1. What are the semantics of embedded NULs in strings ?
1.1 Do embedded NULs alter the length value of strings ?
1.2 If embedded NULs are allowed and the string length takes
account of them, what are the semantics of
string::operator const char * ?
1.3 Do embedded NULs cause loss of contents during concatenation ?
1.4 Do embedded NULs cause loss of contents during substring extraction ?
2. What are the semantics of substrings ?
2.1 Are substrings lvalues ?
2.1.1 Can assigning to a substring change the length of a parent string ?
2.1.2 If not, what is used to pad the rvalue if it is shorter than the
lvalue in a substring assignment expression ?
2.2 What happens to a substring if its parent's value changes ?
2.3 What happens to a substring if its parent is destroyed ?
2.4 Are we going to get "BASIC-like" operators such as
substring string::Left(size_t),
substring string::Mid(size_t),
substring string::Mid(size_t, size_t), and
substring string::Right(size_t)
or is string::operator() going to be overloaded ?
I can see scope here for at least two (if not more) different string
classes, with varying behaviour. The first being a simple extension of
the strings in <string.h> that we have all come to know and love, and
the second behaving more like strings in other languages. Which one
has been/is being/will be chosen ?
BTW, I'm not after a "string war". I use both sorts of strings
myself as appropriate to the task at hand. I just want to know what
conforming C++ libraries are going to give me in the future, and
what I shall still have to code myself.
JdeBP
Author: maxtal@physics.su.OZ.AU (John Max Skaller)
Date: Fri, 29 Jul 1994 07:58:45 GMT
In article <31865v$sq8@silver.jba.co.uk> "Jonathan de Boyne Pollard" <@jba.co.uk:JdeBP@osmium> writes:
>I gather that the proposed string classes are as good as finalised.
You gather wrong. It is being completely replaced
by a templated string class.
MAKE NO ASSUMPTIONS ABOUT ANYTHING CLAIMING TO BE
"TRACKING" THE STANDARDISATION EFFORT -- INCLUDING THE
WORKING PAPER ITSELF. WAIT UNTIL THERE REALLY IS AN ISO STANDARD.
>
>If so, could someone clear up the semantics of strings for me ?
>
>1. What are the semantics of embedded NULs in strings ?
Ordinary character.
>1.1 Do embedded NULs alter the length value of strings ?
Ordinary character.
>1.2 If embedded NULs are allowed and the string length takes
> account of them, what are the semantics of
> string::operator const char * ?
Returns a pointer to an array containing the string.
>1.3 Do embedded NULs cause loss of contents during concatenation ?
No more than any other character.
>1.4 Do embedded NULs cause loss of contents during substring extraction ?
No, they behave like any other character.
.. And the reason is at least in part that ISO REQUIRES
that new languages NOT treat any character specially, in particular
because some ISO Standard multibyte character sequences contain
NUL (0) valued characters.
In a string, all characters are equal.
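A short sketch of answers 1.1, 1.3 and 1.4, written against the std::string of the eventual ISO standard rather than the 1994 draft. The function names are invented for this example.

```cpp
#include <cassert>
#include <string>

std::string with_embedded_nul() {
    // The (pointer, count) constructor copies exactly 3 chars, NUL included.
    return std::string("a\0b", 3);
}

void nul_is_ordinary_demo() {
    std::string s = with_embedded_nul();
    assert(s.size() == 3);  // 1.1: the NUL is counted like any other char

    std::string joined = s + s;  // 1.3: concatenation keeps both NULs
    assert(joined.size() == 6);
    assert(joined[1] == '\0' && joined[4] == '\0');

    std::string middle = joined.substr(1, 4);  // 1.4: NUL, 'b', 'a', NUL
    assert(middle.size() == 4);
    assert(middle[0] == '\0' && middle[3] == '\0');
}
```

Note that constructing from a bare "a\0b" without the count would stop at the first NUL, because that constructor takes a C string; the loss would happen before the string class ever saw the data.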
--
JOHN (MAX) SKALLER, INTERNET:maxtal@suphys.physics.su.oz.au
Maxtal Pty Ltd,
81A Glebe Point Rd, GLEBE Mem: SA IT/9/22,SC22/WG21
NSW 2037, AUSTRALIA Phone: 61-2-566-2189
Author: jason@cygnus.com (Jason Merrill)
Date: Fri, 29 Jul 1994 08:21:03 GMT
>>>>> Jonathan de Boyne Pollard <JdeBP@osmium.jba.co.uk> writes:
> 1. What are the semantics of embedded NULs in strings ?
They have no special semantics.
> 1.2 If embedded NULs are allowed and the string length takes
> account of them, what are the semantics of
> string::operator const char * ?
There is no such operator. There is a member function c_str which tacks a
NUL onto the end of the array and returns it. It does not look inside the
array to see if there are any already there.
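To make the consequence concrete, here is a sketch assuming the c_str member as it appears in standard std::string. The helper c_length is a name made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length as a C routine sees it: strlen stops at the first NUL it meets,
// whether that is the appended terminator or an embedded one.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}

void c_str_demo() {
    std::string s("ab\0cd", 5);     // five chars, one of them a NUL
    assert(s.size() == 5);          // the string knows its real length
    assert(c_length(s) == 2);       // C code sees only "ab"
    assert(s.c_str()[5] == '\0');   // the terminator added after the data
}
```

The data is all still there behind the returned pointer; only code that relies on NUL termination to find the end will see the string as truncated.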
> 1.3 Do embedded NULs cause loss of contents during concatenation ?
No.
> 1.4 Do embedded NULs cause loss of contents during substring extraction?
No.
> 2. What are the semantics of substrings ?
There is no proper substring class; there is a substring method which
returns a new string copied from the old one. I imagine that a string
class which provides substring management will be a popular derivation.
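Because the result is a copy, questions 2.2 and 2.3 largely answer themselves. A small sketch, assuming the substr member of standard std::string (the post only calls it "a substring method"):

```cpp
#include <cassert>
#include <string>

void substr_is_a_copy_demo() {
    std::string parent = "hello world";
    std::string piece = parent.substr(0, 5);  // independent copy of "hello"

    parent[0] = 'J';   // changing the parent does not touch the copy
    parent.clear();    // neither does emptying it
    assert(piece == "hello");

    piece += "!";      // and changes to the copy never reach the parent
    assert(piece == "hello!");
    assert(parent.empty());
}
```

The cost is that there is no lvalue substring: assigning into part of a string has to go through members of the parent itself, or through a derived class of the kind suggested above.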
Jason
Author: pete@genghis.interbase.borland.com (Pete Becker)
Date: Fri, 29 Jul 1994 17:03:16 GMT
In article <CtozHx.9JH@ucc.su.oz.au>,
John Max Skaller <maxtal@physics.su.OZ.AU> wrote:
>In article <31865v$sq8@silver.jba.co.uk> "Jonathan de Boyne Pollard" <@jba.co.uk:JdeBP@osmium> writes:
>>I gather that the proposed string classes are as good as finalised.
>
> You gather wrong. It is being completely replaced
>by a templated string class.
>
That's a bit too strong. The interface to basic_string<char> is the
same as the interface to the string class. All it takes is a typedef, and
everything that folks have learned about the string class remains true.
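As the standard eventually turned out, the typedef relationship can be checked directly. The sketch below uses static_assert and <type_traits>, which are C++11 features that did not exist at the time of this discussion; count_unit is a helper invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// std::string and std::wstring are typedefs for basic_string instantiations.
static_assert(std::is_same<std::string, std::basic_string<char> >::value,
              "string is basic_string<char>");
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t> >::value,
              "wstring is basic_string<wchar_t>");

// One function template written against the shared interface serves
// every instantiation.
template <typename CharT>
std::size_t count_unit(const std::basic_string<CharT>& s, CharT unit) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == unit) ++n;
    }
    return n;
}

void typedef_demo() {
    assert(count_unit(std::string("banana"), 'a') == 3);
    assert(count_unit(std::wstring(L"banana"), L'a') == 3);
}
```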
-- Pete
Author: rfg@netcom.com (Ronald F. Guilmette)
Date: Sun, 31 Jul 1994 08:26:16 GMT
In article <CtozHx.9JH@ucc.su.OZ.AU> maxtal@physics.su.OZ.AU (John Max Skaller) writes:
>In article <31865v$sq8@silver.jba.co.uk> "Jonathan de Boyne Pollard" <@jba.co.uk:JdeBP@osmium> writes:
>
>>1.1 Do embedded NULs alter the length value of strings ?
>
> Ordinary character.
>
>>1.2 If embedded NULs are allowed and the string length takes
>> account of them, what are the semantics of
>> string::operator const char * ?
>
> Returns a pointer to an array containing the string.
Congratulations on giving two non-answer answers in a row John.
> In a string, all characters are equal.
Perhaps this is true for C++ ``Strings'', but it sure as heck is NOT true
for the strings pointed to by (char*) pointers.
--
-- Ron Guilmette, Sunnyvale, CA ---------- RG Consulting -------------------
---- domain addr: rfg@netcom.com ----------- Purveyors of Compiler Test ----
---- uucp addr: ...!uunet!netcom!rfg ------- Suites and Bullet-Proof Shoes -