Thread

Topic: string_table - an allocator for read only strings

Author: fmatthew5876@gmail.com
Date: Sun, 26 Jan 2014 06:12:55 -0800 (PST) Raw View

------=_Part_988_26782659.1390745575357
Content-Type: text/plain; charset=UTF-8

Dealing with string data efficiently is a hard problem due to the variable
length nature of strings. The standard way to store string data is to use
the std::string class. This has a lot of problems in situations where you
want to store a lot of read only strings at runtime, especially if the
strings are small.
- Every string is a separate memory allocation. Each allocation requires
extra bytes to store allocator meta-data and also may waste bytes for
allocators which use strategies such as small block allocation. For large
numbers of small strings, this can add up.
- Lots of small variable length allocations can contribute to memory
fragmentation.
- If std::string uses the small string optimization, we avoid the memory
allocations but then we can also waste a lot of space for unused bytes in
the internal buffer.
- You still need to store the std::string objects in some other data
structure.
- Using std::strings with const char*/string_view/string literals in some
contexts require constructing temporary std::string objects, which in turn
will allocate memory and kill the performance of parsing routines.

I've "invented" a data structure to store large numbers of read only
strings efficiently. I say invented in quotes because while I came up with
this idea in isolation and used it for a few projects, I doubt it hasn't
been done before. This is an append only data structure. It has only 2
basic routines, store and clear. It efficiently stores variable length read
only string data.

class string_table {
public:
  //Initialize the table with the given page size, does not do any memory
allocations
  string_table(size_t page_size = /* some default*/) : _head(nullptr),
_page_size(page_size) {}

  string_table(const string_table&) = delete;
  string_table& operator=(const string_table&) = delete;

  //this now owns the memory pages owned by other. other is cleared.
  string_table(string_table&& other) noexcept : _head(other._head),
_page_size(other._page_size) { other._head = nullptr; }

  //clear this, take the memory pages owned by other, clear other.
  string_table& operator=(string_table&&) noexcept {
    if(this != &other) {
       clear();
       _head = other._head;
       _page_size = other._page_size;
       other._head = nullptr;
    }
    return *this;
  }

  ~string_table() { clear(); }

  //Make a copy of the string pointed to by s and store it in the table
with a null terminator.
  //Return a string view which points to the stored memory. The returned
string view is
  //guaranteed to be null terminated and will be valid for either the
lifetime or this or when clear() is called.
  string_view string_table::store(string_view s);

  //Frees all string data stored in the table. All pointers (string_view)
pointing to memory addresses allocated by the table are invalidated.
  void string_table::clear();

  //Return the currently configured page size
  size_t get_page_size() const { return _page_size; }

  //Change the page size, all currently allocated pages will be unaffected
but newly allocated pages
  //will take the new page size.
  void set_page_size(size_t ps) { _page_size  = ps; }

private:
  struct Page;
  Page* _head;
  size_t _page_size;
};

And thats it, this can be used in many different applications:
- Storing hash table keys/value
- Parsing large configuration files
- Symbol tables

Implementation:
The way this data structure works is by keeping a singly linked list of
memory pages. Each page is of size page_size bytes. Each stored string is
tightly packed into these pages to minimize wasted memory.

The pages are kept sorted in order by how many bytes are available in each
page. The page with the most available bytes is first while the page with
the least available bytes (likely the full pages) are kept last.

Store operation:
1) L = s.length() + 1.
2) Find the page P with the least available memory which has at least L
bytes. O(number of pages)
  - If such page does not exist, allocate a new page P of size max(L,
page_size), and store it at the head of the list.
3) Copy the bytes of s to the first L-1 bytes available in P. Then append a
null terminator. O(L)
4) Move the page P into its correctly sorted position. O(number of pages)
5) Return a string_view pointing to the stored string in P with length
initialized to L-1.

Clear operation:
1) Walk to the tail of the page list
2) Free each page from the tail to the head.

The trade off in memory usage is the overhead of lots of std::string
objects and separate memory allocations vs the overhead of the unused space
in the string table pages. In terms of flexibility, the string table allows
allows you the freedom represent string data in any way you like
(string_view, const char*, make a copy into a std::string).

What do you think?

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

------=_Part_988_26782659.1390745575357
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div>Dealing with string data efficiently is a hard proble=
m due to the variable length nature of strings.&nbsp;<span style=3D"font-si=
ze: 13px;">The standard way to store string data is to use the std::string =
class. This has a lot of problems in situations where you want to store a l=
ot of read only strings at runtime, especially if the strings are small.</s=
pan></div><div>- Every string is a separate memory allocation. Each allocat=
ion requires extra bytes to store allocator meta-data and also may waste by=
tes for allocators which use strategies such as small block allocation. For=
 large numbers of small strings, this can add up.</div><div>- Lots of small=
 variable length allocations can contribute to memory fragmentation.</div><=
div>- If std::string uses the small string optimization, we avoid the memor=
y allocations but then we can also waste a lot of space for unused bytes in=
 the internal buffer.</div><div>- You still need to store the std::string o=
bjects in some other data structure.</div><div>- Using std::strings with co=
nst char*/string_view/string literals in some contexts require constructing=
 temporary std::string objects, which in turn will allocate memory and kill=
 the performance of parsing routines.</div><div><br></div><div>I've "invent=
ed" a data structure to store large numbers of read only strings efficientl=
y. I say invented in quotes because while I came up with this idea in isola=
tion and used it for a few projects, I doubt it hasn't been done before. Th=
is is an append only data structure. It has only 2 basic routines, store an=
d clear. It efficiently stores variable length read only string data.<br></=
div><div><br></div><div>class string_table {</div><div>public:</div><div>&n=
bsp; //Initialize the table with the given page size, does not do any memor=
y allocations</div><div>&nbsp; string_table(size_t page_size =3D /* some de=
fault*/) : _head(nullptr), _page_size(page_size) {}</div><div><br></div><di=
v>&nbsp; string_table(const string_table&amp;) =3D delete;</div><div>&nbsp;=
 string_table&amp; operator=3D(const string_table&amp;) =3D delete;</div><d=
iv><br></div><div>&nbsp; //this now owns the memory pages owned by other. o=
ther is cleared.</div><div>&nbsp; string_table(string_table&amp;&amp; other=
) noexcept : _head(other._head), _page_size(other._page_size) { other._head=
 =3D nullptr; }</div><div><br></div><div>&nbsp; //clear this, take the memo=
ry pages owned by other, clear other.</div><div>&nbsp; string_table&amp; op=
erator=3D(string_table&amp;&amp;) noexcept {</div><div>&nbsp; &nbsp; if(thi=
s !=3D &amp;other) {<br>&nbsp; &nbsp; &nbsp; &nbsp;clear();</div><div>&nbsp=
; &nbsp; &nbsp; &nbsp;_head =3D other._head;</div><div>&nbsp; &nbsp; &nbsp;=
 &nbsp;_page_size =3D other._page_size;</div><div>&nbsp; &nbsp; &nbsp; &nbs=
p;other._head =3D nullptr;</div><div>&nbsp; &nbsp; }&nbsp;</div><div><span =
style=3D"font-size: 13px;">&nbsp; &nbsp; return *this;&nbsp;</span></div><d=
iv>&nbsp; }</div><div><br></div><div>&nbsp; ~string_table() { clear(); }</d=
iv><div><br></div><div>&nbsp; //Make a copy of the string pointed to by s a=
nd store it in the table with a null terminator.&nbsp;</div><div>&nbsp; //R=
eturn a string view which points to&nbsp;<span style=3D"font-size: 13px;">t=
he stored memory. The returned string view is&nbsp;</span></div><div><span =
style=3D"font-size: 13px;">&nbsp; //guaranteed to be null terminated and wi=
ll be valid for either the lifetime or this or when clear() is called.</spa=
n></div><div>&nbsp; string_view string_table::store(string_view s);</div><d=
iv><br></div><div>&nbsp; //Frees all string data stored in the table. All p=
ointers (string_view) pointing to memory addresses allocated by the table a=
re invalidated.</div><div>&nbsp; void string_table::clear();</div><div><br>=
</div><div>&nbsp; //Return the currently configured page size</div><div>&nb=
sp; size_t get_page_size() const { return _page_size; }</div><div><br></div=
><div>&nbsp; //Change the page size, all currently allocated pages will be =
unaffected but newly allocated pages</div><div>&nbsp; //will take the new p=
age size.</div><div>&nbsp; void set_page_size(size_t ps) { _page_size &nbsp=
;=3D ps; }</div><div><br></div><div>private:</div><div>&nbsp; struct Page;<=
/div><div>&nbsp; Page* _head;</div><div>&nbsp; size_t _page_size;</div><div=
>};</div><div><br></div><div>And thats it, this can be used in many differe=
nt applications:</div><div>- Storing hash table keys/value</div><div>- Pars=
ing large configuration files</div><div>- Symbol tables</div><div><br></div=
><div>Implementation:</div><div>The way this data structure works is by kee=
ping a singly linked list of memory pages. Each page is of size page_size b=
ytes. Each stored string is tightly packed into these pages to minimize was=
ted memory.</div><div><br></div><div>The pages are kept sorted in order by =
how many bytes are available in each page. The page with the most available=
 bytes is first while the page with the least available bytes (likely the f=
ull pages) are kept last.</div><div><br></div><div>Store operation:</div><d=
iv>1) L =3D s.length() + 1.&nbsp;</div><div>2) Find the page P with the lea=
st available memory which has at least L bytes. O(number of pages)</div><di=
v>&nbsp; - If such page does not exist, allocate a new page P of size max(L=
, page_size), and store it at the head of the list.</div><div>3) Copy the b=
ytes of s to the first L-1 bytes available in P. Then append a null termina=
tor. O(L)</div><div>4) Move the page P into its correctly sorted position. =
O(number of pages)</div><div>5) Return a string_view pointing to the stored=
 string in P with length initialized to L-1.</div><div><br></div><div>Clear=
 operation:</div><div>1) Walk to the tail of the page list</div><div>2) Fre=
e each page from the tail to the head.</div><div><br></div><div>The trade o=
ff in memory usage is the overhead of lots of std::string objects and separ=
ate memory allocations vs the overhead of the unused space in the string ta=
ble pages. In terms of flexibility, the string table allows allows you the =
freedom represent string data in any way you like (string_view, const char*=
, make a copy into a std::string).</div><div><br></div><div>What do you thi=
nk?</div></div>

<p></p>

-- <br />
&nbsp;<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.<br />
To post to this group, send email to std-proposals@isocpp.org.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

------=_Part_988_26782659.1390745575357--

.

Author: Thiago Macieira <thiago@macieira.org>
Date: Sun, 26 Jan 2014 09:50:44 -0800 Raw View

On domingo, 26 de janeiro de 2014 06:12:55, fmatthew5876@gmail.com wrote:
> Store operation:
> 1) L = s.length() + 1.
> 2) Find the page P with the least available memory which has at least L
> bytes. O(number of pages)
>   - If such page does not exist, allocate a new page P of size max(L,
> page_size), and store it at the head of the list.
> 3) Copy the bytes of s to the first L-1 bytes available in P. Then append a
> null terminator. O(L)
> 4) Move the page P into its correctly sorted position. O(number of pages)
> 5) Return a string_view pointing to the stored string in P with length
> initialized to L-1.

> The trade off in memory usage is the overhead of lots of std::string
> objects and separate memory allocations vs the overhead of the unused space
> in the string table pages. In terms of flexibility, the string table allows
> allows you the freedom represent string data in any way you like
> (string_view, const char*, make a copy into a std::string).
>
> What do you think?

I think you're missing the part about storing the link to the next free region
and its size in the store operation. With that information, there's no
guarantee that this allocator is going to be more efficient than the system's
malloc() at all.

In any case, wouldn't this be more useful as a std::allocator instance?

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center
      PGP/GPG: 0x6EF45358; fingerprint:
      E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

.

Author: Matt Fioravante <fmatthew5876@gmail.com>
Date: Sun, 26 Jan 2014 12:37:09 -0800 (PST) Raw View

------=_Part_197_30071620.1390768629675
Content-Type: text/plain; charset=UTF-8

On Sunday, January 26, 2014 12:50:44 PM UTC-5, Thiago Macieira wrote:
>
> On domingo, 26 de janeiro de 2014 06:12:55, fmatth...@gmail.com<javascript:>wrote:
> > Store operation:
> > 1) L = s.length() + 1.
> > 2) Find the page P with the least available memory which has at least L
> > bytes. O(number of pages)
> >   - If such page does not exist, allocate a new page P of size max(L,
> > page_size), and store it at the head of the list.
> > 3) Copy the bytes of s to the first L-1 bytes available in P. Then
> append a
> > null terminator. O(L)
> > 4) Move the page P into its correctly sorted position. O(number of
> pages)
> > 5) Return a string_view pointing to the stored string in P with length
> > initialized to L-1.
>
> > The trade off in memory usage is the overhead of lots of std::string
> > objects and separate memory allocations vs the overhead of the unused
> space
> > in the string table pages. In terms of flexibility, the string table
> allows
> > allows you the freedom represent string data in any way you like
> > (string_view, const char*, make a copy into a std::string).
> >
> > What do you think?
>
> I think you're missing the part about storing the link to the next free
> region
> and its size in the store operation.

All of that was not mentioned but it is taken care of. This is just a high
level sketch. Each page has some metadata at the beginning about how many
bytes are occupied which can be used to compute the offset to the next free
position and sort order.

> With that information, there's no
> guarantee that this allocator is going to be more efficient than the
> system's
> malloc() at all.
>

It was more efficient for my use case. This was a c++98 project which I
designed a class which abstracted over a std::tr1::unordered_map<const
char*, T, CStringHash, CStringCmp>. The reason I did it this way is because
this hash table was populated once at startup and then used a lot, most
often with text parsed from a file. The parsing and hash lookup time was
critical and constructing temporary std::string objects just to do a hash
lookup was a huge bottleneck. For this use case I designed this allocator
specifically to find a way to store string data and avoid being hamstrung
with std::string. I also considered one large std::vector<char>, but that
is very difficult to manage because each resize invalidates all of the
pointers.

My use case may go away with the new hash proposals which allow you to use
a const char* lookup in a std::unordered_map<std::string, T> without
constructing temporary strings.

I'm curious what people plan to do with string_view when it gets
standardized. What will you use as a memory backing for your string data?
Lots of std::string objects? What is the best way to handle storing
variable length data such as character strings?

As far as real memory savings and speedup for the general case, more
benchmarking would be needed. It would also be interesting to measure heap
fragmentation. Allocating a bunch of fixed size pages is much easier on the
heap then a bunch of random length strings.

>
> In any case, wouldn't this be more useful as a std::allocator instance?
>

I suppose its possible but there are some issues. Allocators are designed
around general purpose allocation and freeing of individual blocks of
memory. They model malloc()/free(). This allocation strategy has no
individual free operation so if someone tried to use it as a general
purpose allocator in a situation where memory is often allocated and then
freed (std::vector resizing as an example), it would quickly waste a lot of
memory.

Its designed for read-only data you allocate once and use for a long time.
I'm not sure that usage pattern fits well into a std::allocator.

>
> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
>    Software Architect - Intel Open Source Technology Center
>       PGP/GPG: 0x6EF45358; fingerprint:
>       E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358
>
>

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

------=_Part_197_30071620.1390768629675
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><br><br>On Sunday, January 26, 2014 12:50:44 PM UTC-5, Thi=
ago Macieira wrote:<blockquote class=3D"gmail_quote" style=3D"margin: 0;mar=
gin-left: 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;">On domingo,=
 26 de janeiro de 2014 06:12:55, <a href=3D"javascript:" target=3D"_blank" =
gdf-obfuscated-mailto=3D"UpFH45jeW9EJ" onmousedown=3D"this.href=3D'javascri=
pt:';return true;" onclick=3D"this.href=3D'javascript:';return true;">fmatt=
h...@gmail.com</a> wrote:
<br>&gt; Store operation:
<br>&gt; 1) L =3D s.length() + 1.=20
<br>&gt; 2) Find the page P with the least available memory which has at le=
ast L=20
<br>&gt; bytes. O(number of pages)
<br>&gt; &nbsp; - If such page does not exist, allocate a new page P of siz=
e max(L,=20
<br>&gt; page_size), and store it at the head of the list.
<br>&gt; 3) Copy the bytes of s to the first L-1 bytes available in P. Then=
 append a=20
<br>&gt; null terminator. O(L)
<br>&gt; 4) Move the page P into its correctly sorted position. O(number of=
 pages)
<br>&gt; 5) Return a string_view pointing to the stored string in P with le=
ngth=20
<br>&gt; initialized to L-1.
<br>
<br>&gt; The trade off in memory usage is the overhead of lots of std::stri=
ng=20
<br>&gt; objects and separate memory allocations vs the overhead of the unu=
sed space=20
<br>&gt; in the string table pages. In terms of flexibility, the string tab=
le allows
<br>&gt; allows you the freedom represent string data in any way you like
<br>&gt; (string_view, const char*, make a copy into a std::string).
<br>&gt;=20
<br>&gt; What do you think?
<br>
<br>I think you're missing the part about storing the link to the next free=
 region=20
<br>and its size in the store operation. </blockquote><div><br></div><div>A=
ll of that was not mentioned but it is taken care of. This is just a high l=
evel sketch. Each page has some metadata at the beginning about how many by=
tes are occupied which can be used to compute the offset to the next free p=
osition and sort order.</div><div>&nbsp;</div><blockquote class=3D"gmail_qu=
ote" style=3D"margin: 0;margin-left: 0.8ex;border-left: 1px #ccc solid;padd=
ing-left: 1ex;">With that information, there's no=20
<br>guarantee that this allocator is going to be more efficient than the sy=
stem's=20
<br>malloc() at all.
<br></blockquote><div><br></div><div>It was more efficient for my use case.=
 This was a c++98 project which I designed a class which abstracted over a =
std::tr1::unordered_map&lt;const char*, T, CStringHash, CStringCmp&gt;. The=
 reason I did it this way is because this hash table was populated once at =
startup and then used a lot, most often with text parsed from a file. The p=
arsing and hash lookup time was critical and constructing temporary std::st=
ring objects just to do a hash lookup was a huge bottleneck. For this use c=
ase I designed this allocator specifically to find a way to store string da=
ta and avoid being hamstrung with std::string. I also considered one large =
std::vector&lt;char&gt;, but that is very difficult to manage because each =
resize invalidates all of the pointers.</div><div><br></div><div>My use cas=
e may go away with the new hash proposals which allow you to use a const ch=
ar* lookup in a std::unordered_map&lt;std::string, T&gt; without constructi=
ng temporary strings.</div><div><br></div><div>I'm curious what people plan=
 to do with string_view when it gets standardized. What will you use as a m=
emory backing for your string data? Lots of std::string objects? What is th=
e best way to handle storing variable length data such as character strings=
?</div><div><br></div><div>As far as real memory savings and speedup for th=
e general case, more benchmarking would be needed. It would also be interes=
ting to measure heap fragmentation. Allocating a bunch of fixed size pages =
is much easier on the heap then a bunch of random length strings.<br></div>=
<blockquote class=3D"gmail_quote" style=3D"margin: 0;margin-left: 0.8ex;bor=
der-left: 1px #ccc solid;padding-left: 1ex;">
<br>In any case, wouldn't this be more useful as a std::allocator instance?
<br></blockquote><div><br></div><div>I suppose its possible but there are s=
ome issues. Allocators are designed around general purpose allocation and f=
reeing of individual blocks of memory. They model malloc()/free(). This all=
ocation strategy has no individual free operation so if someone tried to us=
e it as a general purpose allocator in a situation where memory is often al=
located and then freed (std::vector resizing as an example), it would quick=
ly waste a lot of memory.</div><div><br></div><div>Its designed for read-on=
ly data you allocate once and use for a long time. I'm not sure that usage =
pattern fits well into a std::allocator.</div><blockquote class=3D"gmail_qu=
ote" style=3D"margin: 0;margin-left: 0.8ex;border-left: 1px #ccc solid;padd=
ing-left: 1ex;">
<br>--=20
<br>Thiago Macieira - thiago (AT) <a href=3D"http://macieira.info" target=
=3D"_blank" onmousedown=3D"this.href=3D'http://www.google.com/url?q\75http%=
3A%2F%2Fmacieira.info\46sa\75D\46sntz\0751\46usg\75AFQjCNEswDUBNCNanbu7euhq=
Ln_62FW8ag';return true;" onclick=3D"this.href=3D'http://www.google.com/url=
?q\75http%3A%2F%2Fmacieira.info\46sa\75D\46sntz\0751\46usg\75AFQjCNEswDUBNC=
Nanbu7euhqLn_62FW8ag';return true;">macieira.info</a> - thiago (AT) <a href=
=3D"http://kde.org" target=3D"_blank" onmousedown=3D"this.href=3D'http://ww=
w.google.com/url?q\75http%3A%2F%2Fkde.org\46sa\75D\46sntz\0751\46usg\75AFQj=
CNHGRJdo5_JYG1DowztwAHAKs80XSA';return true;" onclick=3D"this.href=3D'http:=
//www.google.com/url?q\75http%3A%2F%2Fkde.org\46sa\75D\46sntz\0751\46usg\75=
AFQjCNHGRJdo5_JYG1DowztwAHAKs80XSA';return true;">kde.org</a>
<br>&nbsp; &nbsp;Software Architect - Intel Open Source Technology Center
<br>&nbsp; &nbsp; &nbsp; PGP/GPG: 0x6EF45358; fingerprint:
<br>&nbsp; &nbsp; &nbsp; E067 918B B660 DBD1 105C &nbsp;966C 33F5 F005 6EF4=
 5358
<br>
<br></blockquote></div>

<p></p>

-- <br />
&nbsp;<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.<br />
To post to this group, send email to std-proposals@isocpp.org.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

------=_Part_197_30071620.1390768629675--

.

Author: Thiago Macieira <thiago@macieira.org>
Date: Sun, 26 Jan 2014 14:31:47 -0800 Raw View

--nextPart11144128.1rvxB5JFVb
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"

On domingo, 26 de janeiro de 2014 12:37:09, Matt Fioravante wrote:
> It was more efficient for my use case.

A sample size of 1 is not very indicative of anything...

> As far as real memory savings and speedup for the general case, more
> benchmarking would be needed. It would also be interesting to measure heap
> fragmentation. Allocating a bunch of fixed size pages is much easier on the
> heap then a bunch of random length strings.

Please benchmark an allocator using an obstack too.

> > In any case, wouldn't this be more useful as a std::allocator instance?
>
> I suppose its possible but there are some issues. Allocators are designed
> around general purpose allocation and freeing of individual blocks of
> memory. They model malloc()/free(). This allocation strategy has no
> individual free operation so if someone tried to use it as a general
> purpose allocator in a situation where memory is often allocated and then
> freed (std::vector resizing as an example), it would quickly waste a lot of
> memory.
>
> Its designed for read-only data you allocate once and use for a long time.
> I'm not sure that usage pattern fits well into a std::allocator.

I don't see why not. Just implement the allocator's free operation as a no-op.
If all the information is going to be read-only after the initial allocation,
that should be fine.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center
      PGP/GPG: 0x6EF45358; fingerprint:
      E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358

--nextPart11144128.1rvxB5JFVb
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part.
Content-Transfer-Encoding: 7Bit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)

iD8DBQBS5YzaM/XwBW70U1gRAnviAJ4qkVI2vAtlVdwNFl/dq+dRjpBPvACfX1Pa
i4kUd28wEngduBEZgrtW6D0=
=MC8n
-----END PGP SIGNATURE-----

--nextPart11144128.1rvxB5JFVb--


.

Author: Matt Fioravante <fmatthew5876@gmail.com>
Date: Sun, 26 Jan 2014 14:46:59 -0800 (PST) Raw View

------=_Part_343_27039121.1390776419579
Content-Type: text/plain; charset=UTF-8

On Sunday, January 26, 2014 5:31:47 PM UTC-5, Thiago Macieira wrote:
>
> On domingo, 26 de janeiro de 2014 12:37:09, Matt Fioravante wrote:
> > It was more efficient for my use case.
>
> A sample size of 1 is not very indicative of anything...
>

Indeed, and the primary benefit for my use case was avoiding std::string,
not the actual allocation strategy. I would have done pretty well with a
vector<char*> and strdup() as horrible as that solution may sound.

Also slightly related are allocation strategies sometimes used by game
programmers.

Frame allocators put aside a fixed size buffer and each frame anything can
allocate from this buffer by just taking the next N bytes of the buffer.
Once the frame is done rendered, the entire buffer is freed.

Another example are stack based allocators. Again put aside a fixed size
heap buffer. Push all of your base game data onto this heap. Then push your
Level 1 data. Once level 1 is completed, pop all the data off and push
Level 2's data etc..

>
> > As far as real memory savings and speedup for the general case, more
> > benchmarking would be needed. It would also be interesting to measure
> heap
> > fragmentation. Allocating a bunch of fixed size pages is much easier on
> the
> > heap then a bunch of random length strings.
>
> Please benchmark an allocator using an obstack too.
>
> > > In any case, wouldn't this be more useful as a std::allocator
> instance?
> >
> > I suppose its possible but there are some issues. Allocators are
> designed
> > around general purpose allocation and freeing of individual blocks of
> > memory. They model malloc()/free(). This allocation strategy has no
> > individual free operation so if someone tried to use it as a general
> > purpose allocator in a situation where memory is often allocated and
> then
> > freed (std::vector resizing as an example), it would quickly waste a lot
> of
> > memory.
> >
> > Its designed for read-only data you allocate once and use for a long
> time.
> > I'm not sure that usage pattern fits well into a std::allocator.
>
> I don't see why not. Just implement the allocator's free operation as a
> no-op.
> If all the information is going to be read-only after the initial
> allocation,
> that should be fine.
>

This allocation strategy could be fine with writable data as well. The
reason its read only is because typically if you want to write to a string
you also want to change its length, which this allocation strategy fails at.

In general I'd love to start a discussion about allocation strategies for
variable length string data. Its a difficult problem I've been pondering
for a long time.

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

------=_Part_343_27039121.1390776419579
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><br><br>On Sunday, January 26, 2014 5:31:47 PM UTC-5, Thia=
go Macieira wrote:<blockquote class=3D"gmail_quote" style=3D"margin: 0;marg=
in-left: 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;">On domingo, =
26 de janeiro de 2014 12:37:09, Matt Fioravante wrote:
<br>&gt; It was more efficient for my use case.=20
<br>
<br>A sample size of 1 is not very indicative of anything...
<br></blockquote><div><br></div><div>Indeed, and the primary benefit for my=
 use case was avoiding std::string, not the actual allocation strategy. I w=
ould have done pretty well with a vector&lt;char*&gt; and strdup() as horri=
ble as that solution may sound.</div><div><br></div><div>Also slightly rela=
ted are allocation strategies sometimes used by game programmers.&nbsp;</di=
v><div><br></div><div>Frame allocators put aside a fixed size buffer and ea=
ch frame anything can allocate from this buffer by just taking the next N b=
ytes of the buffer. Once the frame is done rendered, the entire buffer is f=
reed.&nbsp;</div><div><br></div><div>Another example are stack based alloca=
tors. Again put aside a fixed size heap buffer. Push all of your base game =
data onto this heap. Then push your Level 1 data. Once level 1 is completed=
, pop all the data off and push Level 2's data etc..</div><blockquote class=
=3D"gmail_quote" style=3D"margin: 0;margin-left: 0.8ex;border-left: 1px #cc=
c solid;padding-left: 1ex;">
<br>&gt; As far as real memory savings and speedup for the general case, mo=
re
<br>&gt; benchmarking would be needed. It would also be interesting to meas=
ure heap
<br>&gt; fragmentation. Allocating a bunch of fixed size pages is much easi=
er on the
<br>&gt; heap then a bunch of random length strings.
<br>
<br>Please benchmark an allocator using an obstack too.
<br>
<br>&gt; &gt; In any case, wouldn't this be more useful as a std::allocator=
 instance?
<br>&gt;=20
<br>&gt; I suppose its possible but there are some issues. Allocators are d=
esigned
<br>&gt; around general purpose allocation and freeing of individual blocks=
 of
<br>&gt; memory. They model malloc()/free(). This allocation strategy has n=
o
<br>&gt; individual free operation so if someone tried to use it as a gener=
al
<br>&gt; purpose allocator in a situation where memory is often allocated a=
nd then
<br>&gt; freed (std::vector resizing as an example), it would quickly waste=
 a lot of
<br>&gt; memory.
<br>&gt;=20
<br>&gt; Its designed for read-only data you allocate once and use for a lo=
ng time.
<br>&gt; I'm not sure that usage pattern fits well into a std::allocator.
<br>
<br>I don't see why not. Just implement the allocator's free operation as a=
 no-op.=20
<br>If all the information is going to be read-only after the initial alloc=
ation,=20
<br>that should be fine.
<br></blockquote><div><br></div><div>This allocation strategy could be fine=
 with writable data as well. The reason its read only is because typically =
if you want to write to a string you also want to change its length, which =
this allocation strategy fails at.</div><div><br></div><div>In general I'd =
love to start a discussion about allocation strategies for variable length =
string data. Its a difficult problem I've been pondering for a long time.</=
div><div>&nbsp;</div></div>

<p></p>

-- <br />
&nbsp;<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.<br />
To post to this group, send email to std-proposals@isocpp.org.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

------=_Part_343_27039121.1390776419579--

.