Topic: Some comments about N1953 (Allocator version 2)


Author: "=?iso-8859-1?q?Ion_Gazta=F1aga?=" <igaztanaga@gmail.com>
Date: Fri, 14 Apr 2006 02:38:25 CST
[Sorry if this is duplicated. I sent this two days ago but I haven't
received any notification, so I'm resending it]

Hi to all,

 I've been reading N1953 and I must say that I find it very interesting
and a good addition for improving C++ performance in general. I've found
some issues that I would like to point out.

COMMENT 1: SIMPLE POOLED ALLOCATORS
-----------------------------------

 Reviewing how I would implement these new functions in several known
STL-compatible allocators, I find that they can be quite limiting when
implementing pooled allocators.

Suppose we have a pooled allocator using simple segregated storage.
Common implementations take a big chunk of memory, divide it into
nodes, and insert those nodes into a free list, using the node memory
itself to hold the linked-list pointer. This technique is used, for
example, in Boost.Pool.
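
Something like this minimal sketch of the technique (my own
illustration, not Boost.Pool's actual code):

   #include <cstddef>
   #include <new>

   // A chunk is carved into fixed-size nodes; each free node stores
   // the free-list pointer inside its own memory, so no per-node
   // metadata is needed.
   class node_pool {
      union node { node *next; };   // payload lives here when allocated
      node *free_list_;

   public:
      node_pool() : free_list_(0) {}

      // Thread a raw chunk onto the free list as fixed-size nodes.
      void add_chunk(void *chunk, std::size_t node_size, std::size_t count) {
         char *p = static_cast<char*>(chunk);
         for (std::size_t i = 0; i != count; ++i, p += node_size) {
            node *n = reinterpret_cast<node*>(p);
            n->next = free_list_;
            free_list_ = n;
         }
      }

      void *allocate_node() {         // O(1), no size metadata stored
         if (!free_list_) throw std::bad_alloc();
         node *n = free_list_;
         free_list_ = n->next;
         return n;
      }

      void deallocate_node(void *p) { // size is implicit: one node
         node *n = static_cast<node*>(p);
         n->next = free_list_;
         free_list_ = n;
      }
   };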

The problem is that there is no metadata associated with a node,
because we designed this allocation type on purpose to avoid all the
overhead of storing the allocation count.

Currently such an allocator implements allocate(1) by returning a node
from the free list and implements allocate(n) using the underlying
general-purpose allocator (usually new[]).

Now let's try to implement any of the version 2 functions. The
allocator has no trivial way to recover the size of an allocation from
the pointer alone. For example, Boost.Pool couldn't easily deduce the
size of an allocation given only the external pointer.

That would require storing the allocation size next to the node, just
as with array allocations, or keeping auxiliary data to deduce the size
from the pointer. And that's exactly the overhead we want to avoid with
this type of node allocation. This is similar to "new" and "new[]":
"new" does not need to store the size of the allocation and, because of
that, can be more efficient than "new[]".

Let's suppose we keep node allocators as version 1 allocators. After
all, a node allocator serves node containers that always request 1
node, so we don't need the version 2 functions. But now consider a
pseudo-node container, an unordered_map, where we will have at least
two allocators, both constructed from MyAllocator<T>:

-> One for the node allocation: MyAllocator<Node>
-> One for the bucket/index array: MyAllocator<Bucket>

The first one is used for node allocation, whereas the second one is a
vector (we could even implement it with std::vector). We would lose all
the version 2 grow capabilities if we used a node allocator, or we
would have to avoid node allocators altogether. But surely it would be
better to optimize node allocation than array allocation, since node
allocation is far more frequent.

What can we do? I can think of several options:

1) Simple segregated list allocators are only version 1 allocators.
unordered_map's bucket array can't benefit from expand in place.

2) Impose a restriction: you can't mix version 1 and version 2
functions. If you use allocate(n) you can't use expand, etc. This way
we can use "allocate(1)" for node allocation and "allocate(size_type
size_requested, size_type &received_size)" for array allocation in
unordered_map. Mixing the old deallocate with the new allocation
functions can also be tricky if the old allocate(1) uses a pool: you
could call the new allocate(1, received_size), and the allocator can't
know, when deallocate(p, 1) is called, whether the memory was allocated
with allocate(1) (using the node pool) or with allocate(1,
received_size). This approach is quite dangerous.

3) Create two new single-object allocation functions, "allocate_one()"
and "deallocate_one()", to mimic the "new"/"new[]" approach.
deallocate(p, n) would deallocate both old and new allocations, but
node pooling would only be activated through "allocate_one" and
"deallocate_one", and the old allocate(1) would behave exactly like the
new allocate(1, received_size), ignoring received_size. The user must
take care that memory allocated with "allocate_one" is only deallocated
with "deallocate_one", just like new/delete and new[]/delete[].

I prefer option 3. Otherwise option 1.
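
A rough sketch of how the option 3 interface could look (the allocator
internals are my own illustration, building on the pool sketched
above):

   #include <cstddef>
   #include <new>

   template <class T>
   class pooled_allocator {
      node_pool pool_;   // per-type pool, as sketched above

   public:
      typedef T*          pointer;
      typedef std::size_t size_type;

      // Single-object interface: uses the node pool, stores no size.
      pointer allocate_one()
         { return static_cast<pointer>(pool_.allocate_node()); }
      void deallocate_one(pointer p)
         { pool_.deallocate_node(p); }

      // Array interface: forwards to a general-purpose mechanism, so
      // allocate(1, received_size) never touches the pooling path.
      pointer allocate(size_type n, size_type &received_size) {
         received_size = n;
         return static_cast<pointer>(::operator new(n * sizeof(T)));
      }
      void deallocate(pointer p, size_type)
         { ::operator delete(p); }
   };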

COMMENT 2: Variadic templates to the rescue
-------------------------------------------

I think a version 2 allocator is a good place to introduce a new
"construct()" function taking variadic template arguments to implement
in-place construction (this is a good example showing that variadic
templates are not only for metaprogramming).

This would turn STL allocators into fully workable memory management
classes. The current copy-constructing "construct(const value_type
&val)" is very inefficient and limiting. If move semantics are
accepted, template<class U> construct(U &&val) is a better approach,
but still not optimal. For example, for strings with the small string
optimization, the move constructor has no speed advantage; however, a
variadic construct could build a std::string in a container directly
from a const char *. So a variadic construct would allow zero-overhead
insertions into containers.
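
A minimal sketch of such a construct, assuming variadic templates and
rvalue references are available as proposed:

   #include <memory>
   #include <utility>

   template <class T>
   struct allocator_v2 {
      typedef T* pointer;

      // Construct T in place from whatever arguments the caller has;
      // e.g. construct(p, "hello") builds a std::string element
      // directly from the const char *, with no temporary value_type.
      template <class... Args>
      void construct(pointer p, Args&&... args)
         { ::new (static_cast<void*>(p)) T(std::forward<Args>(args)...); }

      void destroy(pointer p) { p->~T(); }
   };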

A variadic "construct" function can also help constructing nodes for
node containers, where the node can be constructed directly from the
value type and the pointers. Currently, some STL implementations
maintain several allocators and they construct first the value part of
the node with allocator<T> and after that, they construct the pointers
of the node with allocator<node_pointer> (because they don't suppose
node pointers are raw pointers, but allocator::pointer derived
pointers). With variadic "construct" node containers just need an
allocator of nodes and the node can be directly constructed. With
unordered_map, we would need 2 allocators even with complex
allocator::pointer typedefs, one for bucket allocation and one for node
allocation.
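
For example (the node layout is illustrative, not taken from any
particular STL implementation):

   #include <string>
   #include <utility>

   template <class T, class NodePtr>
   struct list_node {
      NodePtr prev;
      NodePtr next;
      T       value;

      // The whole node, pointers and value, is built in one shot.
      template <class... Args>
      explicit list_node(Args&&... args)
         : prev(), next(), value(std::forward<Args>(args)...) {}
   };

   // node_alloc.construct(p, "abc") would construct the node and its
   // std::string value in place, with no separate allocator<T> step,
   // even if NodePtr is a smart-pointer-like allocator::pointer.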

COMMENT 3: "request" function
-----------------------------

If I understand this function correctly, it is the same as
"allocate(size, received_size)", but on failure it returns a size for a
future allocation that *maybe* will succeed. Howard suggests that the
user could compare it with a minimum allocation size and then request
that suggested size.

I don't see the utility of this if I need two steps: first request the
size I want, then compare it to my minimum size and, if it's
acceptable, request it again. In multithreaded applications the second
allocation can still fail. I would propose merging "allocate(size,
received_size)" and "request" into a single atomic function:

allocate_at_least(size_type min_size, size_type preferred_size,
                  size_type& size_received);

"try to get preferred_size objects and if it's nos possible, return at
least min_size objects, otherwise, throw bad_alloc".

This way we only need a single heap traversal/lookup: if we don't find
enough memory for preferred_size, we return the biggest block found,
provided it is bigger than min_size.

I know that there is no proposed C function to implement this, but I
think this approach is much more efficient than the "request +
allocate" approach.

These are the steps we would typically take when inserting into a
vector where size() equals capacity() and we want to insert N new
objects (see the sketch after these steps):

-> Use

   expand(p, m_capacity+N, size_received)

to find out whether we can get more memory without moving the old
objects.

-> If this fails, use

   allocate_at_least(m_capacity+N,
                     m_capacity*growth_factor,
                     size_received)

and move the data to the new buffer with move semantics (or copy
construct from the source buffer to the destination buffer if moves
are not available).
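
Put together, the insertion path could look like this (expand is the
proposed version 2 call; the vector internals and growth factor are
illustrative):

   #include <cstddef>

   template <class T, class Alloc>
   struct vector_sketch {
      typedef typename Alloc::pointer pointer;
      typedef std::size_t             size_type;

      Alloc     alloc_;
      pointer   begin_;
      size_type size_, capacity_;

      void grow_for_insert(size_type n) {
         size_type size_received;
         // 1) Try to expand in place: no object moves if this works.
         if (alloc_.expand(begin_, capacity_ + n, size_received)) {
            capacity_ = size_received;
            return;
         }
         // 2) One atomic call: prefer geometric growth, but accept
         //    any buffer that at least fits the n new elements.
         pointer p = alloc_.allocate_at_least(capacity_ + n,
                                              capacity_ * 2,
                                              size_received);
         // ... move (or copy) the elements into p, release the old
         //     buffer, then begin_ = p; capacity_ = size_received; ...
      }
   };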


COMMENT 4: "resize" function
----------------------------

Similar to "request" --> "allocate_at_least", I would make resizing an
atomic operation instead of a "try again with this parameter" style
function.

Since "expand" is already implemented this way, I suggest replacing
"resize" with a "shrink" function as a counterpart of expand:

bool shrink(pointer p, size_type max_size, size_type preferred_size,
size_type& size_received) throw()

Imagine a vector that has 100000 objects. After erasing duplicates,
you see that you only have 100 objects and you want to return some
memory to the system. You might call a new vector function called
unreserve:

   vector::unreserve(size_type new_capacity);

which (supposing new_capacity is smaller than capacity()) calls

   allocator::shrink(p, m_capacity, new_capacity, size_received);

where size_received would be between the new and the old capacity (or
equal to the new capacity). We would free memory to the system without
copying/moving a single object of the vector, and this operation is
atomic/thread-safe. If shrink returns false, we can't do anything.

Again, there is no proposed C function to implement this, but shrink
would be quite easy to implement.
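
A sketch of the unreserve side (member names illustrative; shrink is
assumed to return false when it cannot release memory in place):

   #include <cstddef>

   template <class T, class Alloc>
   void unreserve_sketch(Alloc &alloc, typename Alloc::pointer p,
                         std::size_t &capacity, std::size_t new_capacity)
   {
      std::size_t size_received;
      if (new_capacity < capacity &&
          alloc.shrink(p, capacity, new_capacity, size_received)) {
         // Memory beyond size_received went back to the system and
         // not a single object was copied or moved.
         capacity = size_received;
      }
      // If shrink returned false, the capacity is unchanged.
   }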

END OF COMMENTS
---------------

Otherwise, I find the proposal very interesting, as it tries to solve
some C++ memory management weaknesses. With move semantics and an
improved "construct" function, STL containers could get a huge
performance boost.

Regards,

Ion

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: howard.hinnant@gmail.com (Howard Hinnant)
Date: Fri, 14 Apr 2006 16:14:33 GMT
In article <1144882676.683725.214070@i39g2000cwa.googlegroups.com>,
 "Ion Gaztañaga" <igaztanaga@gmail.com> wrote:

>  I've been reading N1953 and I must say that I find it very interesting
> and a good addition for improving C++ performance in general. I've found
> some issues that I would like to point out.

<snip>

I just want to go on record with my general support of Ion's comments.
He raises some good points and offers reasonable solutions.

The current LWG support for this proposal is lukewarm. Ditto on the C
committee (for the C-level parts, and I think C support is critical).
Lukewarm support isn't generally sufficient to get a new library
interface through two committees.

-Howard

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]