Topic: Probable FAQ on realloc equivalent for C++


Author: dfoster@jarthur.claremont.edu (Derek R. Foster)
Date: Sun, 25 Oct 1992 18:48:24 GMT
Raw View
In article <1992Oct23.184332.6959@taumet.com> steve@taumet.com (Steve Clamage) writes:
>frank@Cookie.secapl.com (Frank Adams) writes:
>
>>In article <1992Oct21.153505.1353@taumet.com> steve@taumet.com (Steve Clamage) writes:
>>>faustus@ygdrasil.CS.Berkeley.EDU (Wayne A. Christopher) writes:
>>>>Why is there no equivalent to realloc() that goes
>>>>with "new" and "delete"?
>>>When you make heap array of objects
>>> T* p = new T [N];
>>>you have created N objects of type T.  A "re-new" operation must
>>>perform the following logical steps:
>>>    1. Create a new array of T objects.
>>>    2. Copy up to N objects of the old array to the new one.
>>>    3. Delete the old array.
>
>>It seems obvious enough to me.  Given the above, and an expression like:
>
>>       renew [NNEW] p
>
>>Steps 1 and 2 should be combined; allocate the storage for NNEW objects of
>>type T.  Initialize the first min(N,NNEW) of them using the copy constructor
>>from the old array.  Initialize the remaining ones using the default
>>constructor.
>
>>For step 3, just do the equivalent of:
>
>>       delete [] p;
>
>>Now use the pointer to the new array as the expression value.
>
>This is certainly plausible, and a "renew" operator like this could
>be added to C++ without, I think, doing violence to the language.
>
>Arrays of objects don't really make any sense in an OO paradigm.
>For one thing, an object has its own identity which is somewhat
>lost when it is just an anonymous member of an array.  You usually
>deal with a particular object, or with a pointer to a base class
>to take advantage of polymorphism.  This doesn't fit well with
>arrays of objects, since arrays are not first-class types in C++.
>(Of course you can take the address of the array element and pass
>it around, but this is mixing styles.  Why not just use pointers
>to begin with?)

Because pointers require an extra indirection every time the object
is accessed. Consider:

class ArrayType
{
  MyObject * array;
  int size;
  public:
    ArrayType() {size = 0; array = 0;}
    changesize(int newsize) {array = renew [size=newsize] array;}
    MyObject & operator [] (unsigned index)
    {
      assert(index < size); return array[index];
    }
}

If this were implemented with a MyObject** array instead, an extra
indirection would be required each time to access the object. If this
object is accessed frequently, this could involve a significant
overhead.

>I would first ask, "Why realloc?"
>
>There is an implicit assumption that realloc will be able to extend
>the array in place often enough to justify its existence.

This is not necessarily true. Realloc is also used to SHRINK arrays
when it is known that a significant portion of them is no longer
needed. For instance, suppose I want to write a program that reads
in an arbitrary number of integers (which I know will always be
less than 10000). Suppose also that a lot of memory will be needed
by a later stage of the program for, say, a bunch of linked lists
or other small data structures (lack of memory is not so much of a
problem on UNIX due to virtual memory, but in PC's, it is very
frequently a problem.)

A reasonably efficient way to write this would be:

void main(void)
{
  int * array = malloc(10000*sizeof(int));
  int anint, index=0;
  while (cin >> anint)
    array[index++] = anint;
  array = realloc(array, index * sizeof(int));
}

The only alternatives I see to the above method involve storing
the values in some temporary data structure (linked list, etc.)
and then copying them into the final array when the number of
elements is known. This is not very efficient.

Now, suppose that I was reading in MyObject's instead of int's. THAT's
why I want operator renew().

(Yes, I realize that it is possible to overload operator new so that
it calls malloc, etc. But this requires a lot of ICK to implement a
generic realloc(), possibly including: defining constructors with
placement syntax for each object, deriving all objects from a common
base class, etc. I don't see any reason why the compiler can't do this
for me.)

>how often that is the case in real programs.  For this to happen,
>all of these must be true:
>    1 Malloc grabs chunks of memory from the bottom (not the top) of
>      the virtual address space, doling out portions as requested.

    Please keep in mind that not all computers use virtual memory....

>    2 Malloc doles out portions from the lowest address of the chunk.
>    3 No allocation requests occur between the original malloc and
>      the realloc.
>    4 The space left in the big chunk that malloc manages is enough
>      to satisfy the realloc request, or the OS can extend the big
>      chunk in the same virtual address space.

    Or an intervening 'free' has cleared up the space following the
    original malloc, and that space is now available for reallocation
    by extension of the original malloc.

>When not all these criteria are met, realloc just allocates new space,
>copies the data, frees the original space.  I suspect this is most
>often what really happens.

Not on PC's, particularly when there are a lot of malloc's and
free's (in the case of, for example, dynamically allocated strings,
when the number of strings is not large, but the number of allocations/
deallocations is.)

>When a programmer uses the realloc paradigm in the hopes of somehow
>being space-efficient, I think the program will usually have a
>significant performance hit.

Where is the 'performance hit' associated with the example I showed
above (the array of integers)?

>Worse, every realloc takes longer than
>the one before, since bigger chunks are needed and more data needs
>to be copied.
>  It can also happen that one big allocation could have
>been satisfied, but there isn't enough memory to do the big
>allocation while holding onto the smaller piece already allocated.

This is just as true in the case of manual copying, which is what
you have suggested as an alternative to renew().

>There are other techniques which don't waste memory and don't suffer
>the realloc performance problem.  One technique is block allocation,
>with an array of pointers to each block.  This requires one extra
>indirection on data access through the block table.  In exchange, you
>never copy the existing data.

Doesn't it seem rather silly and/or wasteful, in my above example, to have
a block of pointers to int's? What if I was instead using complex
numbers? The overhead for the allocations might well be as much as for
the numbers themselves.

>Moving back to C++, it seems to me that "renew" is of very little
>value.  In particular, instead of just a block copy of data, the renew
>would require individual construction and destruction of every object
>in the array.

In the case of an object with no defined copy constructor and no
defined destructor, the renew could simply do a block move of memory,
in a quite efficient manner. This could be considered a compiler
optimization. Also, the entire purpose of using renew would be based
on the assumption that a significant amount of the time, copying of the
entire array would not be necessary, either because the array was being
shrunk or because it was likely to be able to grow.

Actually, I wouldn't mind a 'renew' that just did a bitwise move, and only
called constructors when extending the array, or destructors when
shortening it. I suppose there are some objects for which it is vitally
important that they know what absolute address they are stored at, and
that this address not change, but it seems to me that the vast majority of
objects don't need to know this (they reference the 'this' pointer), and
can therefore be bitwise-moved without any ill side effects. (of course
outside code which references those objects via pointer, etc. need to
know, but this is a problem no matter HOW one accomplishes the copying.)
In the case of those rare objects which DO need to know this, the solution
would simply be "don't use renew().. or overload it to call constructors/
destructors for copying."

>Steve Clamage, TauMetric Corp, steve@taumet.com
>Vice Chair, ANSI C++ Committee, X3J16

Derek Riippa Foster