Topic: A library to provide a virtual memory vector of strings


Author: Steve Heller <technovelist@gmail.com>
Date: Mon, 22 Jul 2013 17:37:19 -0700 (PDT)

I own a library that I would like to propose for addition to the C++
standard. It provides an efficient method of managing variable-length
strings via a custom virtual memory implementation, so that you can
minimize the amount of physical memory when accessing large numbers
(millions) of strings. Several implementations of this library have been
used in industry for some time; it's not just a concept (no pun intended)
or a toy library. Some recent tests indicate that in a resource-constrained
environment, it can reduce memory pressure significantly, thus allowing the
system to continue functioning normally in cases where std::vector<string>
causes thrashing.

The maximum file size in the reference implementation is 64 GB. Individual
data types have their own maxima, described below.
The following data types are already implemented. The strings can vary
dynamically in length up to a maximum size that is implementation-defined
(approximately 16K in the reference implementation).

1. A vector of strings of dynamically variable length. The number of
strings in one array is implementation-defined; in the reference
implementation it is approximately 16 million.
2. A map of strings to strings. The keys and the data can be of dynamically
varying size. The same limitation as above applies to the number of
elements in the map.
3. A dynamically growing vector of unsigned 4-byte integers. The maximum
number of elements in the vector is constrained by the maximum file size or
by UINT_MAX, whichever is smaller.
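
As a rough illustration of the limit described in item 3, the following sketch works out which of the two bounds is smaller. It assumes that "64 GB" means 64 * 2^30 bytes and that the entire file is available for 4-byte elements; neither detail is stated in the post, and the names used here are illustrative only.

```cpp
#include <climits>
#include <cstdint>

// Assumption: "64 GB" is 64 * 2^30 bytes and the whole file stores elements.
constexpr std::uint64_t kMaxFileBytes = 64ULL << 30;
constexpr std::uint64_t kElementBytes = 4;  // unsigned 4-byte integers

// Element count permitted by the file size alone.
constexpr std::uint64_t kFileBound = kMaxFileBytes / kElementBytes;

// Effective maximum: whichever of the two bounds is smaller.
constexpr std::uint64_t max_elements() {
    return kFileBound < UINT_MAX ? kFileBound : UINT_MAX;
}

static_assert(kFileBound == (1ULL << 34), "64 GiB / 4 bytes = 2^34 elements");
static_assert(max_elements() <= UINT_MAX, "limit never exceeds UINT_MAX");
```

On a platform where unsigned int is 32 bits wide, the file-size bound (about 17.2 billion elements) exceeds UINT_MAX (about 4.29 billion), so under these assumptions UINT_MAX would be the binding limit.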


--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.




.


Author: Nicol Bolas <jmckesson@gmail.com>
Date: Mon, 22 Jul 2013 18:43:25 -0700 (PDT)

You can write up a proposal if you like. But this sounds *incredibly* special case. That's not to say that it isn't useful. But the situations in which you might
need such a class sound really specific.


.


Author: Olaf van der Spek <olafvdspek@gmail.com>
Date: Tue, 23 Jul 2013 04:58:44 -0700 (PDT)

On Tuesday, July 23, 2013 2:37:19 AM UTC+2, Steve Heller wrote:

> I own a library that I would like to propose for addition to the C++
> standard.
>

Has the library been published already? If so, where?
What about first trying to get this into Boost?


.


Author: Jonathan Wakely <cxx@kayari.org>
Date: Wed, 24 Jul 2013 03:01:07 -0700 (PDT)

On Tuesday, July 23, 2013 1:37:19 AM UTC+1, Steve Heller wrote:
>
> I own a library that I would like to propose for addition to the C++
> standard. It provides an efficient method of managing variable-length
> strings via a custom virtual memory implementation, so that you can
> minimize the amount of physical memory when accessing large numbers
> (millions) of strings. Several implementations of this library have been
> used in industry for some time; it's not just a concept (no pun intended)
> or a toy library. Some recent tests indicate that in a resource-constrained
> environment, it can reduce memory pressure significantly, thus allowing the
> system to continue functioning normally in cases where std::vector<string>
> causes thrashing.
>
> The maximum file size in the reference implementation is 64 GB. Individual
> data types have their own maxima, described below.
> The following data types are already implemented. The strings can vary
> dynamically in length up to a maximum size that is implementation-defined
> (approximately 16K in the reference implementation).
>
> 1. A vector of strings of dynamically variable length. The number of
> strings in one array is implementation-defined; in the reference
> implementation it is approximately 16 million.
> 2. A map of strings to strings. The keys and the data can be of
> dynamically varying size. The same limitation as above applies to the
> number of elements in the map.
> 3. A dynamically growing vector of unsigned 4-byte integers. The maximum
> number of elements in the vector is constrained by the maximum file size or
> by UINT_MAX, whichever is smaller.
>
>
Do other languages have anything similar in their standard libraries? If
so, which? What are the problems it solves? (Efficient access to data in
memory is far too vague - is it just an optimisation technique?) If other
standard libraries don't have this, why not? Is it a problem specific to
C++, or is it just not something worth standardising because there isn't
demand for it?

From a (very quick) glance at your PDF it seems the interesting part and
the novelty is in the implementation, but the C++ standard is not an
implementation, it's a specification of visible behaviour, not internal
details.

IMHO if this was an open source library some people would find it very
useful, but I don't see it as a candidate for standardisation. But that's
just my opinion.



.


Author: Steve Heller <technovelist@gmail.com>
Date: Wed, 24 Jul 2013 07:16:21 -0700 (PDT)

I don't know much about standard libraries of other languages, so I can't
answer that question.
The problems my library solves are those in which random access to large
amounts (these days, typically several hundred megabytes to several
gigabytes) of variable-length data is required without demanding that much
virtual address space from the operating system. I have had several
situations in my career where this library has been indispensable in this
application, and I doubt my experience is that unusual.
It is not specific to C++, but since C++ has a systems programming tilt,
C++ would be more likely to need this than a language mostly used for
smaller problems.
It could be implemented in different ways; I have described my
implementation.
As for whether it should be standardized, if a programmer knew that the
facilities it provides would be available with any conforming compiler,
then he could write his programs assuming that those facilities exist,
rather than having to implement them himself or find an open-source library
with similar (but probably not identical) facilities.
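
To give a sense of the kind of facility being discussed, here is a minimal sketch of random access to variable-length strings whose bytes live in a file rather than in the process's address space. It is not the library described in this thread, and it does not reflect that library's interface or internals; the class name FileBackedStrings and its members push_back, at, and size are hypothetical. As a simplification, the sketch keeps a small offset/length index in memory and does no caching, paging, or error recovery.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical illustration only; not the library discussed in this thread.
// String bytes live in a backing file; only an offset/length index is in RAM.
class FileBackedStrings {
public:
    // Creates (or truncates) the backing file at the given path.
    explicit FileBackedStrings(const std::string& path)
        : file_(path, std::ios::in | std::ios::out | std::ios::binary |
                          std::ios::trunc) {
        assert(file_.is_open());
    }

    // Append a string to the end of the file and record where it was stored.
    void push_back(const std::string& s) {
        file_.clear();
        file_.seekp(0, std::ios::end);
        const std::streamoff offset = file_.tellp();
        file_.write(s.data(), static_cast<std::streamsize>(s.size()));
        index_.push_back({offset, static_cast<std::uint32_t>(s.size())});
    }

    // Read one string back from the file on demand.
    std::string at(std::size_t i) {
        assert(i < index_.size());
        const Entry& e = index_[i];
        std::string out(e.length, '\0');
        file_.clear();
        file_.seekg(e.offset);
        file_.read(&out[0], static_cast<std::streamsize>(e.length));
        return out;
    }

    std::size_t size() const { return index_.size(); }

private:
    struct Entry {
        std::streamoff offset;  // byte position in the backing file
        std::uint32_t length;   // string length in bytes
    };
    std::fstream file_;
    std::vector<Entry> index_;
};
```

A caller would construct the object with a file path, append strings with push_back, and fetch any of them by position with at. Only the string being read is brought into memory, although this simplified version still pays for one index entry per string.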



.