Thread

Topic: Status of standard string class?

Author: Pete Becker <pete@borland.com>
Date: Tue, 29 Nov 1994 18:46:27 GMT Raw View

>
> In my last posting (<avib@datasrv.co.il> 16 Nov 1994) message, I had asked
> Borland's Pete Becker why the following program (which exercises the ANSI
> 'string' class) fails after somewhere between 4000 and 5000 iterations:
>
> > #include <cstring.h>
> > #include <iostream.h>
> > #include <stdlib.h>
> > #include <limits.h>
> >
> > // EXAMPLE 1 - allocate identical strings of length 1
> > void main()
> > {
> >     int i;
> >     string * p;
> >
> >     for (i = 0; i < SHRT_MAX; ++i)
> >     {
> >  p = new string("X");
> >  if (i != 0 && i%1000 == 0) cout << i << endl;
> >  // delete p; -- when uncommented, program will NOT fail
> >     }
> > }
>
> Pete responded (<pete@borland.com> 16 Nov 1994) as follows:
>
> >   I haven't tried either of them, but I assume that they "fail"
> > for the same reason as the following program:
> >
> >  int main()
> >  {
> >  for(;;)
> >   char *p = new char[10];
> >  return 0;
> >  }
> >
> > Sooner or later if you keep allocating objects and not deleting them
> > you run out of memory.
> >  It seems like there's an underlying assumption in example 1
> > that strings that are initialized with the same text will share
> > memory. That's not the case in our implementation, nor is it required
> > by the current working paper. On the other hand, our implementation
> > does use reference counting ...
>
> Pete's response is disingenuous. So is Brett Schuchert's. They both say
> "What do you expect will happen to memory after 4000+ allocations".
> They forgot to add "of strings of length 1."
>
> My string cracker program allocates at
> most 5000 (identical) strings of size 2 (length 1 + null terminator).
> Upon investigation of Borland's published 'string' and 'TStringRef'
> interface, the reason for the failure is that the minimum allocation
> for any string is about 80 bytes. Most of this space is taken up by
> the actual char buffer which is given an initial minimum capacity
> of 64 bytes. (Resize increments are also defined to be 64 bytes).
>
> Unfortunately, for applications that require lots of little strings
> (e.g. compilers, assemblers, debuggers, linkers, etc.), this
> implementation is unacceptably inefficient. In order to take advantage
> of the 'string' class methods, temporary 'string's must be constructed
> from a 'char *', operated upon, and saved again as 'char *' before the
> temporary 'string's are destructed.

 You can control the default size of the character buffer with
the function inital_capacity().
 -- Pete

Author: avib@zeus.datasrv.co.il (Avraham Bernstein - Pitkha Systems)
Date: 23 Nov 1994 11:42:23 GMT Raw View

In my last posting (<avib@datasrv.co.il> 16 Nov 1994) message, I had asked
Borland's Pete Becker why the following program (which exercises the ANSI
'string' class) fails after somewhere between 4000 and 5000 iterations:

> #include <cstring.h>
> #include <iostream.h>
> #include <stdlib.h>
> #include <limits.h>
>
> // EXAMPLE 1 - allocate identical strings of length 1
> void main()
> {
>     int i;
>     string * p;
>
>     for (i = 0; i < SHRT_MAX; ++i)
>     {
>  p = new string("X");
>  if (i != 0 && i%1000 == 0) cout << i << endl;
>  // delete p; -- when uncommented, program will NOT fail
>     }
> }

Pete responded (<pete@borland.com> 16 Nov 1994) as follows:

>   I haven't tried either of them, but I assume that they "fail"
> for the same reason as the following program:
>
>  int main()
>  {
>  for(;;)
>   char *p = new char   10   ;
>  return 0;
>  }
>
> Sooner or later if you keep allocating objects and not deleting them
> you run out of memory.
>  It seems like there's an underlying assumption in example 1
> that strings that are initialized with the same text will share
> memory. That's not the case in our implementation, nor is it required
> by the current working paper. On the other hand, our implementation
> does use reference counting ...

Pete's response is disingenuous. So is Brett Schuchert's. They both say
"What do you expect will happen to memory after 4000+ allocations".
They forgot to add "of strings of length 1."

My string cracker program allocates at
most 5000 (identical) strings of size 2 (length 1 + null terminator).
Upon investigation of Borland's published 'string' and 'TStringRef'
interface, the reason for the failure is that the minimum allocation
for any string is about 80 bytes. Most of this space is taken up by
the actual char buffer which is given an initial minimum capacity
of 64 bytes. (Resize increments are also defined to be 64 bytes).

Unfortunately, for applications that require lots of little strings
(e.g. compilers, assemblers, debuggers, linkers, etc.), this
implementation is unacceptably inefficient. In order to take advantage
of the 'string' class methods, temporary 'string's must be constructed
from a 'char *', operated upon, and saved again as 'char *' before the
temporary 'string's are destructed.
--
Avraham (Abe) Bernstein, Technical Director, Pitkha Info Systems, Jerusalem.
Finger <avib@datasrv.co.il>. Voice +972 (2) 513-639. Fax +972 (2) 514-310.

Author: avib@zeus.datasrv.co.il (Avraham Bernstein - Pitkha Systems)
Date: 7 Nov 1994 15:08:17 GMT Raw View

Accel Technologies (accel@crash.cts.com) wrote:
: At various times during the preceeding months, we've read about the
: inclusion of a string class into the C++ standard library, and have read
: varying accounts of its demise or its incorporation into other template
: classes.  Would someone please summarize the status of string handling
: in the C++ standard library?

I am not aware of the official ANSI status of the 'cstring' class, BUT
we have had some horrible experience using the Borland C++ v4.02
implementation, although the problems described below are really the
result of the interface specification rather than Borland's implementation.

The cstring class relies upon a reference counting
scheme, where the actual strings are stored in a hash table. It is very
important to note that the hash table implementation is not in any
way controllable by the public interface - wherein lies the problem.
Applications such as compilers and assemblers which typically construct and
destruct thousands of strings during a run, can break a textbook hash
table implemenation in the following 2 ways:

 - insertions can simply exceed the capacity of the hash table
 - deletions can also cause the table to become effectively full,
          and of course while the number of deletions grows, the
   efficiency of insertions and searches rapidly degrades

If I recall correctly, the size of Borland's hash table is (the prime
number) 219. We broke their cstring class with the following loop:

 - construct a unique cstring using a char*
 - compare the cstring to the char* used to construct it
 - destroy the cstring

After a number of iterations not much larger than 219, the comparison
failed.

For our own internal use, we are 'fixing' the cstring class by using our
own proprietary hash table algorithm which allows the table to
efficiently grow and which knows when deleted entries can be
zeroed; but ANSI must provide a more general solution, whereby the user
can optionally specify the hash table mechanism.

IMHO, currently, the cstring class is not industrial strength, and
allowing it to be embedded in other ANSI classes (such as the exception
classes) is a big mistake.

------------------------------------------------------------------------
Avraham (Abe) Bernstein; finger <avib@datasrv.co.il>
Pitkha Info Systems Ltd, Jerusalem ISRAEL; tel +972 (2) 513-639; fax 514-310
product: custom compilers, & s/w development environments for full custom CPUs

Author: jason@cygnus.com (Jason Merrill)
Date: Mon, 7 Nov 1994 17:58:32 GMT Raw View

>>>>> Avraham Bernstein <avib@zeus.datasrv.co.il> writes:

> I am not aware of the official ANSI status of the 'cstring' class, BUT
> we have had some horrible experience using the Borland C++ v4.02
> implementation, although the problems described below are really the
> result of the interface specification rather than Borland's implementation.

Why do you say that?

> The cstring class relies upon a reference counting scheme, where the
> actual strings are stored in a hash table.

Nowhere in the Working Paper is this specified.  My implementation of the
ANSI string classes uses a reference counting scheme, but not a hash table.
An earlier implementation used neither.

> It is very important to note that the hash table implementation is not in
> any way controllable by the public interface

Author: pete@genghis.interbase.borland.com (Pete Becker)
Date: Mon, 7 Nov 1994 21:34:56 GMT Raw View

In article <JASON.94Nov7095832@phydeaux.cygnus.com>,
Jason Merrill <jason@cygnus.com> wrote:
>>>>>> Avraham Bernstein <avib@zeus.datasrv.co.il> writes:
>
>> I am not aware of the official ANSI status of the 'cstring' class, BUT
>> we have had some horrible experience using the Borland C++ v4.02
>> implementation, although the problems described below are really the
>> result of the interface specification rather than Borland's implementation.
>
>Why do you say that?
>
>> The cstring class relies upon a reference counting scheme, where the
>> actual strings are stored in a hash table.
>
>Nowhere in the Working Paper is this specified.  My implementation of the
>ANSI string classes uses a reference counting scheme, but not a hash table.
>An earlier implementation used neither.
>

 Ours does not use a hash table, either. Despite the claim in the
original message. It does have a hash() member function that you can use if
you want to generate a hash value from a string.
 -- Pete

Author: pete@genghis.interbase.borland.com (Pete Becker)
Date: Mon, 7 Nov 1994 22:04:42 GMT Raw View

In article <39lft1$10n@axil.datasrv.co.il>,
Avraham Bernstein - Pitkha Systems <avib@zeus.datasrv.co.il> wrote:
>Accel Technologies (accel@crash.cts.com) wrote:
>: At various times during the preceeding months, we've read about the
>: inclusion of a string class into the C++ standard library, and have read
>: varying accounts of its demise or its incorporation into other template
>: classes.  Would someone please summarize the status of string handling
>: in the C++ standard library?
>
>I am not aware of the official ANSI status of the 'cstring' class, BUT
>we have had some horrible experience using the Borland C++ v4.02
>implementation, although the problems described below are really the
>result of the interface specification rather than Borland's implementation.
>
>The cstring class relies upon a reference counting
>scheme, where the actual strings are stored in a hash table.

 This is simply wrong. There is no hash table in the implementation of
our string class. There is a hash() member function which is available for
users who want a hash function, but this is not used internally.

 We do not provide a class named 'cstring', although the definition of
the string class is in a header named cstring.h.
 -- Pete

Author: accel@crash.cts.com (Accel Technologies)
Date: Tue, 1 Nov 1994 01:20:26 GMT Raw View

At various times during the preceeding months, we've read about the
inclusion of a string class into the C++ standard library, and have read
varying accounts of its demise or its incorporation into other template
classes.  Would someone please summarize the status of string handling
in the C++ standard library?

Thanks -

-- Andy

-------------------------------------------------------------------------
  Andy Gray                  Email: agray@acceltech.com
  Software Engineer          Voice: (619) 554-1000
  ACCEL Technologies, Inc.   Fax:   (619) 554-1019