Thread

Topic: Any plan out there to add a real hash map to the standard?

Author: Bob Fang <dorafmon@gmail.com>
Date: Thu, 21 Dec 2017 07:36:46 -0800 (PST) Raw View

------=_Part_5731_1300045454.1513870606455
Content-Type: multipart/alternative;
 boundary="----=_Part_5732_1838361727.1513870606455"

------=_Part_5732_1838361727.1513870606455
Content-Type: text/plain; charset="UTF-8"

Hi,

At this point we all know unordered_map is by design cache unfriendly and
slow, I am just wondering if there is any outstanding papers that describe
a hash map that we can implement to be fast.

Thanks a lot.

Best,
Bob

--
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/eb22c8aa-4a71-42a5-92c0-83cee38cb91c%40isocpp.org.

------=_Part_5732_1838361727.1513870606455
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">Hi,<div><br></div><div>At this point we all know unordered=
_map is by design cache unfriendly and slow, I am just wondering if there i=
s any outstanding papers that describe a hash map that we can implement to =
be fast.</div><div><br></div><div>Thanks a lot.</div><div><br></div><div>Be=
st,</div><div>Bob</div></div>

<p></p>

-- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
To view this discussion on the web visit <a href=3D"https://groups.google.c=
om/a/isocpp.org/d/msgid/std-proposals/eb22c8aa-4a71-42a5-92c0-83cee38cb91c%=
40isocpp.org?utm_medium=3Demail&utm_source=3Dfooter">https://groups.google.=
com/a/isocpp.org/d/msgid/std-proposals/eb22c8aa-4a71-42a5-92c0-83cee38cb91c=
%40isocpp.org</a>.<br />

------=_Part_5732_1838361727.1513870606455--

------=_Part_5731_1300045454.1513870606455--

.

Author: Nicol Bolas <jmckesson@gmail.com>
Date: Thu, 21 Dec 2017 07:52:15 -0800 (PST) Raw View

------=_Part_5691_2016697822.1513871536072
Content-Type: multipart/alternative;
 boundary="----=_Part_5692_2110764630.1513871536072"

------=_Part_5692_2110764630.1513871536072
Content-Type: text/plain; charset="UTF-8"

On Thursday, December 21, 2017 at 10:36:46 AM UTC-5, Bob Fang wrote:
>
> Hi,
>
> At this point we all know unordered_map is by design cache unfriendly and
> slow,
>

We do? If we do indeed "all know" that, then obviously you should be able
to:

1) explain exactly what it is about the design of `unordered` containers
that requires this.

2) explain what a design would look like that would permit a better
implementation, along with the limitations of said design.

So please, provide such explanations.

--
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/f3b90a2b-2f3e-4dbe-8b0b-6a92a4fd2711%40isocpp.org.

------=_Part_5692_2110764630.1513871536072
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">On Thursday, December 21, 2017 at 10:36:46 AM UTC-5, Bob F=
ang wrote:<blockquote class=3D"gmail_quote" style=3D"margin: 0;margin-left:=
 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;"><div dir=3D"ltr">Hi,=
<div><br></div><div>At this point we all know unordered_map is by design ca=
che unfriendly and slow,</div></div></blockquote><div><br></div><div>We do?=
 If we do indeed &quot;all know&quot; that, then obviously you should be ab=
le to:</div><div><br></div><div>1) explain exactly what it is about the des=
ign of `unordered` containers that requires this.</div><div><br></div><div>=
2) explain what a design would look like that would permit a better impleme=
ntation, along with the limitations of said design.</div><div><br></div><di=
v>So please, provide such explanations.</div></div>

<p></p>

-- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
To view this discussion on the web visit <a href=3D"https://groups.google.c=
om/a/isocpp.org/d/msgid/std-proposals/f3b90a2b-2f3e-4dbe-8b0b-6a92a4fd2711%=
40isocpp.org?utm_medium=3Demail&utm_source=3Dfooter">https://groups.google.=
com/a/isocpp.org/d/msgid/std-proposals/f3b90a2b-2f3e-4dbe-8b0b-6a92a4fd2711=
%40isocpp.org</a>.<br />

------=_Part_5692_2110764630.1513871536072--

------=_Part_5691_2016697822.1513871536072--

.

Author: Thiago Macieira <thiago@macieira.org>
Date: Thu, 21 Dec 2017 18:16:29 -0200 Raw View

On quinta-feira, 21 de dezembro de 2017 13:36:46 -02 Bob Fang wrote:
> Hi,
>
> At this point we all know unordered_map is by design cache unfriendly and
> slow, I am just wondering if there is any outstanding papers that describe
> a hash map that we can implement to be fast.

unordered_map is cache-unfriendly because it's a hashing table. I don't know
of any cache-friendly hasing tables.

It's not slow.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center

--
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/3945078.WOE4bWxDhs%40tjmaciei-mobl1.

.

Author: Andrew Tomazos <andrewtomazos@gmail.com>
Date: Thu, 21 Dec 2017 14:32:49 -0600 Raw View

--001a11450ac88bc74d0560df9ae5
Content-Type: text/plain; charset="UTF-8"

Doesn't the current unordered_map spec require (implicitly) that the
elements are heap allocated (so pointers are not invalidated across a
rehash), and linked collision resolution?  An alternative design would be
to use open addressing and store the elements within the actual hash table
(no heap allocation).  The later would be significantly faster for some T
as it avoids heap allocation on write, and an extra level of indirection on
read (which is cache-unfriendly).  Or do I miss something?

On Thu, Dec 21, 2017 at 2:16 PM, Thiago Macieira <thiago@macieira.org>
wrote:

> On quinta-feira, 21 de dezembro de 2017 13:36:46 -02 Bob Fang wrote:
> > Hi,
> >
> > At this point we all know unordered_map is by design cache unfriendly and
> > slow, I am just wondering if there is any outstanding papers that
> describe
> > a hash map that we can implement to be fast.
>
> unordered_map is cache-unfriendly because it's a hashing table. I don't
> know
> of any cache-friendly hasing tables.
>
> It's not slow.
>
> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
>    Software Architect - Intel Open Source Technology Center
>
> --
> You received this message because you are subscribed to the Google Groups
> "ISO C++ Standard - Future Proposals" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to std-proposals+unsubscribe@isocpp.org.
> To post to this group, send email to std-proposals@isocpp.org.
> To view this discussion on the web visit https://groups.google.com/a/
> isocpp.org/d/msgid/std-proposals/3945078.WOE4bWxDhs%40tjmaciei-mobl1.
>

--
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/CAB%2B4KHLqus%2B2U32CcpxWAck2iYnKqXYaLuOrDW56R3T6WS4MPw%40mail.gmail.com.

--001a11450ac88bc74d0560df9ae5
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">Doesn&#39;t the current unordered_map spec require (implic=
itly) that the elements are heap allocated (so pointers are not invalidated=
 across a rehash), and linked collision resolution?=C2=A0 An alternative de=
sign would be to use open addressing and store the elements within the actu=
al hash table (no heap allocation).=C2=A0 The later would be significantly =
faster for some T as it avoids heap allocation on write, and an extra level=
 of indirection on read (which is cache-unfriendly).=C2=A0 Or do I miss som=
ething?</div><div class=3D"gmail_extra"><br><div class=3D"gmail_quote">On T=
hu, Dec 21, 2017 at 2:16 PM, Thiago Macieira <span dir=3D"ltr">&lt;<a href=
=3D"mailto:thiago@macieira.org" target=3D"_blank">thiago@macieira.org</a>&g=
t;</span> wrote:<br><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0=
 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=3D"">On quin=
ta-feira, 21 de dezembro de 2017 13:36:46 -02 Bob Fang wrote:<br>
&gt; Hi,<br>
&gt;<br>
&gt; At this point we all know unordered_map is by design cache unfriendly =
and<br>
&gt; slow, I am just wondering if there is any outstanding papers that desc=
ribe<br>
&gt; a hash map that we can implement to be fast.<br>
<br>
</span>unordered_map is cache-unfriendly because it&#39;s a hashing table. =
I don&#39;t know<br>
of any cache-friendly hasing tables.<br>
<br>
It&#39;s not slow.<br>
<br>
--<br>
Thiago Macieira - thiago (AT) <a href=3D"http://macieira.info" rel=3D"noref=
errer" target=3D"_blank">macieira.info</a> - thiago (AT) <a href=3D"http://=
kde.org" rel=3D"noreferrer" target=3D"_blank">kde.org</a><br>
=C2=A0 =C2=A0Software Architect - Intel Open Source Technology Center<br>
<span class=3D""><br>
--<br>
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br>
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals%2Bunsubscribe@isocpp.org">std-propo=
sals+unsubscribe@<wbr>isocpp.org</a>.<br>
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br>
</span>To view this discussion on the web visit <a href=3D"https://groups.g=
oogle.com/a/isocpp.org/d/msgid/std-proposals/3945078.WOE4bWxDhs%40tjmaciei-=
mobl1" rel=3D"noreferrer" target=3D"_blank">https://groups.google.com/a/<wb=
r>isocpp.org/d/msgid/std-<wbr>proposals/3945078.WOE4bWxDhs%<wbr>40tjmaciei-=
mobl1</a>.<br>
</blockquote></div><br></div>

<p></p>

-- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
To view this discussion on the web visit <a href=3D"https://groups.google.c=
om/a/isocpp.org/d/msgid/std-proposals/CAB%2B4KHLqus%2B2U32CcpxWAck2iYnKqXYa=
LuOrDW56R3T6WS4MPw%40mail.gmail.com?utm_medium=3Demail&utm_source=3Dfooter"=
>https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/CAB%2B4KHLqus=
%2B2U32CcpxWAck2iYnKqXYaLuOrDW56R3T6WS4MPw%40mail.gmail.com</a>.<br />

--001a11450ac88bc74d0560df9ae5--

.

Author: Richard Hodges <hodges.r@gmail.com>
Date: Thu, 21 Dec 2017 21:33:23 +0100 Raw View

--001a114aad788737270560df9c57
Content-Type: text/plain; charset="UTF-8"

I can't say that unordered_map has ever been a bottleneck for me.



On 21 December 2017 at 21:16, Thiago Macieira <thiago@macieira.org> wrote:

> On quinta-feira, 21 de dezembro de 2017 13:36:46 -02 Bob Fang wrote:
> > Hi,
> >
> > At this point we all know unordered_map is by design cache unfriendly and
> > slow, I am just wondering if there is any outstanding papers that
> describe
> > a hash map that we can implement to be fast.
>
> unordered_map is cache-unfriendly because it's a hashing table. I don't
> know
> of any cache-friendly hasing tables.
>
> It's not slow.
>
> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
>    Software Architect - Intel Open Source Technology Center
>
> --
> You received this message because you are subscribed to the Google Groups
> "ISO C++ Standard - Future Proposals" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to std-proposals+unsubscribe@isocpp.org.
> To post to this group, send email to std-proposals@isocpp.org.
> To view this discussion on the web visit https://groups.google.com/a/
> isocpp.org/d/msgid/std-proposals/3945078.WOE4bWxDhs%40tjmaciei-mobl1.
>

--
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/CALvx3hbpwnXSb8NXp%3DZtmYNNLn8vWp2D7iX31hz7Pwv1GWMcHQ%40mail.gmail.com.

--001a114aad788737270560df9c57
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div>I can&#39;t say that unordered_map has ever been a bo=
ttleneck for me.</div><div><br></div><div><br></div></div><div class=3D"gma=
il_extra"><br><div class=3D"gmail_quote">On 21 December 2017 at 21:16, Thia=
go Macieira <span dir=3D"ltr">&lt;<a href=3D"mailto:thiago@macieira.org" ta=
rget=3D"_blank">thiago@macieira.org</a>&gt;</span> wrote:<br><blockquote cl=
ass=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;p=
adding-left:1ex"><span class=3D"">On quinta-feira, 21 de dezembro de 2017 1=
3:36:46 -02 Bob Fang wrote:<br>
&gt; Hi,<br>
&gt;<br>
&gt; At this point we all know unordered_map is by design cache unfriendly =
and<br>
&gt; slow, I am just wondering if there is any outstanding papers that desc=
ribe<br>
&gt; a hash map that we can implement to be fast.<br>
<br>
</span>unordered_map is cache-unfriendly because it&#39;s a hashing table. =
I don&#39;t know<br>
of any cache-friendly hasing tables.<br>
<br>
It&#39;s not slow.<br>
<br>
--<br>
Thiago Macieira - thiago (AT) <a href=3D"http://macieira.info" rel=3D"noref=
errer" target=3D"_blank">macieira.info</a> - thiago (AT) <a href=3D"http://=
kde.org" rel=3D"noreferrer" target=3D"_blank">kde.org</a><br>
=C2=A0 =C2=A0Software Architect - Intel Open Source Technology Center<br>
<span class=3D""><br>
--<br>
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br>
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals%2Bunsubscribe@isocpp.org">std-propo=
sals+unsubscribe@<wbr>isocpp.org</a>.<br>
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br>
</span>To view this discussion on the web visit <a href=3D"https://groups.g=
oogle.com/a/isocpp.org/d/msgid/std-proposals/3945078.WOE4bWxDhs%40tjmaciei-=
mobl1" rel=3D"noreferrer" target=3D"_blank">https://groups.google.com/a/<wb=
r>isocpp.org/d/msgid/std-<wbr>proposals/3945078.WOE4bWxDhs%<wbr>40tjmaciei-=
mobl1</a>.<br>
</blockquote></div><br></div>

<p></p>

-- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
To view this discussion on the web visit <a href=3D"https://groups.google.c=
om/a/isocpp.org/d/msgid/std-proposals/CALvx3hbpwnXSb8NXp%3DZtmYNNLn8vWp2D7i=
X31hz7Pwv1GWMcHQ%40mail.gmail.com?utm_medium=3Demail&utm_source=3Dfooter">h=
ttps://groups.google.com/a/isocpp.org/d/msgid/std-proposals/CALvx3hbpwnXSb8=
NXp%3DZtmYNNLn8vWp2D7iX31hz7Pwv1GWMcHQ%40mail.gmail.com</a>.<br />

--001a114aad788737270560df9c57--

.

Author: Ben Craig <ben.craig@gmail.com>
Date: Thu, 21 Dec 2017 13:31:24 -0800 (PST) Raw View

------=_Part_6680_473071213.1513891884954
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

I do not know of any better hash map proposals that are in flight.  There i=
s a flat map proposal though.

The main issue with unordered map is that it needs to be implemented with l=
ink chaining via a linked list per bucket.  At best, for a hash miss, you h=
ave one memory lookup (good!), and two lookups for a hash hit (less good). =
At worst, you need to inspect every element in the bucket, which is usually=
 going to be one or more cache misses per element.

Link chaining is a requirement because of the iterator invalidation guarant=
ees.  A rehash is not allowed to invalidate iterators.

A (typically) faster implementation is to embed the elements in the top lev=
el array, and then use some probing method for collisions.  When traversing=
 down the collisions, many probing methods will allow the next element in t=
he collision chain to be on the same cache line.  In the best cases, hits a=
nd misses can both require just one cache line lookup.  In the presence of =
collisions, you get fewer than N cache lookups.

This scheme has some drawbacks.  Large key / value pairs consume a lot more=
 memory than with link chaining.  Each key type needs two sentinel values..=
.. one for not present and one for recently deleted.

--=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp=
..org/d/msgid/std-proposals/3efdfdc2-94c3-4ec0-a08f-15f20ff26cae%40isocpp.or=
g.

------=_Part_6680_473071213.1513891884954--

.

Author: Thiago Macieira <thiago@macieira.org>
Date: Thu, 21 Dec 2017 21:43:33 -0200 Raw View

On quinta-feira, 21 de dezembro de 2017 18:32:49 -02 Andrew Tomazos wrote:
> Doesn't the current unordered_map spec require (implicitly) that the
> elements are heap allocated (so pointers are not invalidated across a
> rehash), and linked collision resolution?  An alternative design would be
> to use open addressing and store the elements within the actual hash table
> (no heap allocation).  The later would be significantly faster for some T
> as it avoids heap allocation on write, and an extra level of indirection on
> read (which is cache-unfriendly).  Or do I miss something?

Please benchmark whether this is a significant bottleneck or not. My gut
feeling is that it's not going to be a relevant contribution.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center

--
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/3668279.eIDDDPsqeC%40tjmaciei-mobl1.

.

Author: Arthur O'Dwyer <arthur.j.odwyer@gmail.com>
Date: Thu, 21 Dec 2017 17:16:51 -0800 (PST) Raw View

------=_Part_2143_1046756556.1513905411409
Content-Type: multipart/alternative;
 boundary="----=_Part_2144_350237831.1513905411409"

------=_Part_2144_350237831.1513905411409
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Thursday, December 21, 2017 at 1:31:25 PM UTC-8, Ben Craig wrote:
>
> I do not know of any better hash map proposals that are in flight.  There=
=20
> is a flat map proposal though.
>
> The main issue with unordered map is that it needs to be implemented with=
=20
> link chaining via a linked list per bucket.  At best, for a hash miss, yo=
u=20
> have one memory lookup (good!), and two lookups for a hash hit (less good=
).=20
> At worst, you need to inspect every element in the bucket, which is usual=
ly=20
> going to be one or more cache misses per element.
>
> Link chaining is a requirement because of the iterator invalidation=20
> guarantees.  A rehash is not allowed to invalidate iterators.
>

More directly, link chaining is a requirement because of the following=20
members of unordered_map:
- typename local_iterator
- bucket_count()
- bucket(const Key&)
- bucket_size(size_t)
- begin(size_t)
- end(size_t)
- load_factor()
- rehash(size_t)

Notice that load_factor() is also the sole reason that unordered_map=20
requires a floating-point unit.

=20

> A (typically) faster implementation is to embed the elements in the top=
=20
> level array, and then use some probing method for collisions.  When=20
> traversing down the collisions, many probing methods will allow the next=
=20
> element in the collision chain to be on the same cache line.  In the best=
=20
> cases, hits and misses can both require just one cache line lookup.  In t=
he=20
> presence of collisions, you get fewer than N cache lookups.
>
> This scheme has some drawbacks.  Large key / value pairs consume a lot=20
> more memory than with link chaining.  Each key type needs two sentinel=20
> values... one for not present and one for recently deleted.
>

The thing about data structures, IMHO, is that if you really care about=20
their performance you probably won't be happy with *any* product of a=20
committee. Whereas there is a flourishing cottage industry producing data=
=20
structure libraries for general use. See for example Google's=20
dense_hash_map and sparse_hash_map; and a quick googling turns up sparsepp,=
=20
hopscotch-map, ska::flat_hash_map,... And a special mention to=20
ska::flat_hash_map because its author has written some really nice blog=20
posts <https://probablydance.com/2017/02/26/i-wrote-the-fastest-hashtable/>=
=20
on the subject of hashtable performance and benchmarks (including more=20
hashtable implementations I didn't mention in this paragraph).

We can't standardize all of these containers, but at the same time, nothing=
=20
prevents you from using them in production today (except natural caution=20
about performance "optimizations" =E2=80=94 and such caution should apply j=
ust as=20
gravely to anything coming out of the C++ standard library).

The benefit of std::unordered_map is that it provides a relatively=20
hard-to-misuse vocabulary type for storing objects that may not have an=20
efficient (or even possible) "less-than" ordering. It doesn't *need* to be=
=20
fast because its primary benefit *is not performance-related*.

(Now, I still think unordered_map is stupider than it ought to be, with all=
=20
that bucket crap and reliance on float and such; but my complaints pertain=
=20
to wanting *less* of the API we *have*, as opposed to wanting a whole=20
nother API to learn *also*.)

=E2=80=93Arthur

--=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp=
..org/d/msgid/std-proposals/8fbf9854-fb12-4d5a-888f-464d14e5e706%40isocpp.or=
g.

------=_Part_2144_350237831.1513905411409
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">On Thursday, December 21, 2017 at 1:31:25 PM UTC-8, Ben Cr=
aig wrote:<blockquote class=3D"gmail_quote" style=3D"margin: 0;margin-left:=
 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;">I do not know of any=
 better hash map proposals that are in flight. =C2=A0There is a flat map pr=
oposal though.<p>The main issue with unordered map is that it needs to be i=
mplemented with link chaining via a linked list per bucket. =C2=A0At best, =
for a hash miss, you have one memory lookup (good!), and two lookups for a =
hash hit (less good). At worst, you need to inspect every element in the bu=
cket, which is usually going to be one or more cache misses per element.</p=
><p>Link chaining is a requirement because of the iterator invalidation gua=
rantees. =C2=A0A rehash is not allowed to invalidate iterators.</p></blockq=
uote><div><br></div><div>More directly, link chaining is a requirement beca=
use of the following members of unordered_map:</div><div>- typename local_i=
terator</div><div>- bucket_count()</div><div>- bucket(const Key&amp;)</div>=
<div>- bucket_size(size_t)</div><div>- begin(size_t)<br></div><div>- end(si=
ze_t)</div><div>- load_factor()</div><div>- rehash(size_t)</div><div><br></=
div><div>Notice that load_factor() is also the sole reason that unordered_m=
ap requires a floating-point unit.</div><div><br></div><div>=C2=A0</div><bl=
ockquote class=3D"gmail_quote" style=3D"margin: 0;margin-left: 0.8ex;border=
-left: 1px #ccc solid;padding-left: 1ex;"><p>A (typically) faster implement=
ation is to embed the elements in the top level array, and then use some pr=
obing method for collisions. =C2=A0When traversing down the collisions, man=
y probing methods will allow the next element in the collision chain to be =
on the same cache line. =C2=A0In the best cases, hits and misses can both r=
equire just one cache line lookup. =C2=A0In the presence of collisions, you=
 get fewer than N cache lookups.</p><p>This scheme has some drawbacks. =C2=
=A0Large key / value pairs consume a lot more memory than with link chainin=
g. =C2=A0Each key type needs two sentinel values... one for not present and=
 one for recently deleted.</p></blockquote><div><br></div><div>The thing ab=
out data structures, IMHO, is that if you really care about their performan=
ce you probably won&#39;t be happy with <i>any</i> product of a committee. =
Whereas there is a flourishing cottage industry producing data structure li=
braries for general use. See for example Google&#39;s dense_hash_map and sp=
arse_hash_map; and a quick googling turns up sparsepp, hopscotch-map, ska::=
flat_hash_map,... And a special mention to ska::flat_hash_map because its a=
uthor has written <a href=3D"https://probablydance.com/2017/02/26/i-wrote-t=
he-fastest-hashtable/">some really nice blog posts</a> on the subject of ha=
shtable performance and benchmarks (including more hashtable implementation=
s I didn&#39;t mention in this paragraph).</div><div><br></div><div>We can&=
#39;t standardize all of these containers, but at the same time, nothing pr=
events you from using them in production today (except natural caution abou=
t performance &quot;optimizations&quot; =E2=80=94 and such caution should a=
pply just as gravely to anything coming out of the C++ standard library).</=
div><div><br></div><div>The benefit of std::unordered_map is that it provid=
es a relatively hard-to-misuse vocabulary type for storing objects that may=
 not have an efficient (or even possible) &quot;less-than&quot; ordering. I=
t doesn&#39;t <i>need</i> to be fast because its primary benefit <i>is not =
performance-related</i>.</div><div><br></div><div>(Now, I still think unord=
ered_map is stupider than it ought to be, with all that bucket crap and rel=
iance on float and such; but my complaints pertain to wanting <i>less</i>=
=C2=A0of the API we <i>have</i>, as opposed to wanting a whole nother API t=
o learn <i>also</i>.)</div><div><br></div><div>=E2=80=93Arthur</div></div>

<p></p>

-- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
To view this discussion on the web visit <a href=3D"https://groups.google.c=
om/a/isocpp.org/d/msgid/std-proposals/8fbf9854-fb12-4d5a-888f-464d14e5e706%=
40isocpp.org?utm_medium=3Demail&utm_source=3Dfooter">https://groups.google.=
com/a/isocpp.org/d/msgid/std-proposals/8fbf9854-fb12-4d5a-888f-464d14e5e706=
%40isocpp.org</a>.<br />

------=_Part_2144_350237831.1513905411409--

------=_Part_2143_1046756556.1513905411409--

.

Author: Andrey Semashev <andrey.semashev@gmail.com>
Date: Fri, 22 Dec 2017 13:18:37 +0300 Raw View

On 12/22/17 04:16, Arthur O'Dwyer wrote:
>=20
> The thing about data structures, IMHO, is that if you really care about=
=20
> their performance you probably won't be happy with /any/ product of a=20
> committee. Whereas there is a flourishing cottage industry producing=20
> data structure libraries for general use. See for example Google's=20
> dense_hash_map and sparse_hash_map; and a quick googling turns up=20
> sparsepp, hopscotch-map, ska::flat_hash_map,... And a special mention to=
=20
> ska::flat_hash_map because its author has written some really nice blog=
=20
> posts=20
> <https://probablydance.com/2017/02/26/i-wrote-the-fastest-hashtable/> on=
=20
> the subject of hashtable performance and benchmarks (including more=20
> hashtable implementations I didn't mention in this paragraph).
>=20
> We can't standardize all of these containers, but at the same time,=20
> nothing prevents you from using them in production today (except natural=
=20
> caution about performance "optimizations" =E2=80=94 and such caution shou=
ld=20
> apply just as gravely to anything coming out of the C++ standard library)=
..
>=20
> The benefit of std::unordered_map is that it provides a relatively=20
> hard-to-misuse vocabulary type for storing objects that may not have an=
=20
> efficient (or even possible) "less-than" ordering. It doesn't /need/ to=
=20
> be fast because its primary benefit /is not performance-related/.

I disagree. The primary argument you hear about using unordered_map vs.=20
map is that the former provides O(1) amortized lookup, which is a=20
performance argument. The fact that unordered_map does not require=20
ordering its keys, while may be significant, I don't think is crucial=20
when making that choice. Most of the time, if not always, there is no=20
problem in defining an ordering predicate, even if not natively provided=20
by the key type.

I think the premise that standard containers are not for performance is=20
wrong because it effectively tells people to not use the standard=20
library containers. The standard library has to provide the best=20
implementation of the generally useful components. If that means having=20
more containers in the standard library, so be it.

--=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp=
..org/d/msgid/std-proposals/40e441c7-d8fa-3c2e-6a9e-49022fc3cce0%40gmail.com=
..

.

Author: Nicol Bolas <jmckesson@gmail.com>
Date: Fri, 22 Dec 2017 07:39:27 -0800 (PST) Raw View

------=_Part_8515_913646218.1513957167393
Content-Type: multipart/alternative;
 boundary="----=_Part_8516_846056506.1513957167393"

------=_Part_8516_846056506.1513957167393
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable



On Friday, December 22, 2017 at 5:18:41 AM UTC-5, Andrey Semashev wrote:
>
> On 12/22/17 04:16, Arthur O'Dwyer wrote:=20
> >=20
> > The thing about data structures, IMHO, is that if you really care about=
=20
> > their performance you probably won't be happy with /any/ product of a=
=20
> > committee. Whereas there is a flourishing cottage industry producing=20
> > data structure libraries for general use. See for example Google's=20
> > dense_hash_map and sparse_hash_map; and a quick googling turns up=20
> > sparsepp, hopscotch-map, ska::flat_hash_map,... And a special mention t=
o=20
> > ska::flat_hash_map because its author has written some really nice blog=
=20
> > posts=20
> > <https://probablydance.com/2017/02/26/i-wrote-the-fastest-hashtable/>=
=20
> on=20
> > the subject of hashtable performance and benchmarks (including more=20
> > hashtable implementations I didn't mention in this paragraph).=20
> >=20
> > We can't standardize all of these containers, but at the same time,=20
> > nothing prevents you from using them in production today (except natura=
l=20
> > caution about performance "optimizations" =E2=80=94 and such caution sh=
ould=20
> > apply just as gravely to anything coming out of the C++ standard=20
> library).=20
> >=20
> > The benefit of std::unordered_map is that it provides a relatively=20
> > hard-to-misuse vocabulary type for storing objects that may not have an=
=20
> > efficient (or even possible) "less-than" ordering. It doesn't /need/ to=
=20
> > be fast because its primary benefit /is not performance-related/.=20
>
> I disagree. The primary argument you hear about using unordered_map vs.=
=20
> map is that the former provides O(1) amortized lookup, which is a=20
> performance argument. The fact that unordered_map does not require=20
> ordering its keys, while may be significant, I don't think is crucial=20
> when making that choice. Most of the time, if not always, there is no=20
> problem in defining an ordering predicate, even if not natively provided=
=20
> by the key type.=20
>
> I think the premise that standard containers are not for performance is=
=20
> wrong because it effectively tells people to not use the standard=20
> library containers. The standard library has to provide the best=20
> implementation of the generally useful components. If that means having=
=20
> more containers in the standard library, so be it.
>

The question of course is what constitutes "generally useful".

A hash table is "generally useful". There are times when you genuinely need=
=20
O(1) lookup, when your algorithm demands that in order to perform=20
reasonably.

However, there are multiple ways one could implement a hash table, each=20
having different properties. One implementation may prioritize element=20
contiguity over element stability. And so forth. But a lot of users of hash=
=20
tables don't really care that much which implementation it is; they just=20
need a hash table and take an appropriate one off the shelf.

That's what the standard library containers are for: being that type you=20
can pull off the shelf and use, which has the expected properties of that=
=20
kind of container.

To me, the standard library needs a number of containers. But it shouldn't=
=20
try to provide every possible container that someone, somewhere,* might*=20
want to use. For every general "kind" of container, the standard library=20
should provide an appropriate type. But it should avoid trying to make=20
multiple variations of the same "kind" of container.

Now yes, exceptions can be made. The flat set/maps are really just regular=
=20
sets/maps which restrict the interface in order to provide a more optimal=
=20
implementation. But because they're more optimal in a* lot* of cases, there=
=20
is a justified reason to provide them.

So I would say that if we want this alternative unordered series of=20
containers, there needs to be a clear justification for why they are needed=
..

--=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp=
..org/d/msgid/std-proposals/9fab492f-561e-4229-bd84-2f7848e26f8e%40isocpp.or=
g.

------=_Part_8516_846056506.1513957167393
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><br><br>On Friday, December 22, 2017 at 5:18:41 AM UTC-5, =
Andrey Semashev wrote:<blockquote class=3D"gmail_quote" style=3D"margin: 0;=
margin-left: 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;">On 12/22=
/17 04:16, Arthur O&#39;Dwyer wrote:
<br>&gt;=20
<br>&gt; The thing about data structures, IMHO, is that if you really care =
about=20
<br>&gt; their performance you probably won&#39;t be happy with /any/ produ=
ct of a=20
<br>&gt; committee. Whereas there is a flourishing cottage industry produci=
ng=20
<br>&gt; data structure libraries for general use. See for example Google&#=
39;s=20
<br>&gt; dense_hash_map and sparse_hash_map; and a quick googling turns up=
=20
<br>&gt; sparsepp, hopscotch-map, ska::flat_hash_map,... And a special ment=
ion to=20
<br>&gt; ska::flat_hash_map because its author has written some really nice=
 blog=20
<br>&gt; posts=20
<br>&gt; &lt;<a onmousedown=3D"this.href=3D&#39;https://www.google.com/url?=
q\x3dhttps%3A%2F%2Fprobablydance.com%2F2017%2F02%2F26%2Fi-wrote-the-fastest=
-hashtable%2F\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNE9-_L7HL3CG0fKP2LM1zf=
Kvr0IeQ&#39;;return true;" onclick=3D"this.href=3D&#39;https://www.google.c=
om/url?q\x3dhttps%3A%2F%2Fprobablydance.com%2F2017%2F02%2F26%2Fi-wrote-the-=
fastest-hashtable%2F\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNE9-_L7HL3CG0fK=
P2LM1zfKvr0IeQ&#39;;return true;" href=3D"https://probablydance.com/2017/02=
/26/i-wrote-the-fastest-hashtable/" target=3D"_blank" rel=3D"nofollow">http=
s://probablydance.com/<wbr>2017/02/26/i-wrote-the-<wbr>fastest-hashtable/</=
a>&gt; on=20
<br>&gt; the subject of hashtable performance and benchmarks (including mor=
e=20
<br>&gt; hashtable implementations I didn&#39;t mention in this paragraph).
<br>&gt;=20
<br>&gt; We can&#39;t standardize all of these containers, but at the same =
time,=20
<br>&gt; nothing prevents you from using them in production today (except n=
atural=20
<br>&gt; caution about performance &quot;optimizations&quot; =E2=80=94 and =
such caution should=20
<br>&gt; apply just as gravely to anything coming out of the C++ standard l=
ibrary).
<br>&gt;=20
<br>&gt; The benefit of std::unordered_map is that it provides a relatively=
=20
<br>&gt; hard-to-misuse vocabulary type for storing objects that may not ha=
ve an=20
<br>&gt; efficient (or even possible) &quot;less-than&quot; ordering. It do=
esn&#39;t /need/ to=20
<br>&gt; be fast because its primary benefit /is not performance-related/.
<br>
<br>I disagree. The primary argument you hear about using unordered_map vs.=
=20
<br>map is that the former provides O(1) amortized lookup, which is a=20
<br>performance argument. The fact that unordered_map does not require=20
<br>ordering its keys, while may be significant, I don&#39;t think is cruci=
al=20
<br>when making that choice. Most of the time, if not always, there is no=
=20
<br>problem in defining an ordering predicate, even if not natively provide=
d=20
<br>by the key type.
<br>
<br>I think the premise that standard containers are not for performance is=
=20
<br>wrong because it effectively tells people to not use the standard=20
<br>library containers. The standard library has to provide the best=20
<br>implementation of the generally useful components. If that means having=
=20
<br>more containers in the standard library, so be it.<br></blockquote><div=
><br></div><div>The question of course is what constitutes &quot;generally =
useful&quot;.</div><div><br></div><div>A hash table is &quot;generally usef=
ul&quot;. There are times when you genuinely need O(1) lookup, when your al=
gorithm demands that in order to perform reasonably.</div><div><br></div><d=
iv>However, there are multiple ways one could implement a hash table, each =
having different properties. One implementation may prioritize element cont=
iguity over element stability. And so forth. But a lot of users of hash tab=
les don&#39;t really care that much which implementation it is; they just n=
eed a hash table and take an appropriate one off the shelf.</div><div><br><=
/div><div>That&#39;s what the standard library containers are for: being th=
at type you can pull off the shelf and use, which has the expected properti=
es of that kind of container.</div><div><br></div><div>To me, the standard =
library needs a number of containers. But it shouldn&#39;t try to provide e=
very possible container that someone, somewhere,<i> might</i> want to use. =
For every general &quot;kind&quot; of container, the standard library shoul=
d provide an appropriate type. But it should avoid trying to make multiple =
variations of the same &quot;kind&quot; of container.</div><div><br></div><=
div>Now yes, exceptions can be made. The flat set/maps are really just regu=
lar sets/maps which restrict the interface in order to provide a more optim=
al implementation. But because they&#39;re more optimal in a<i> lot</i> of =
cases, there is a justified reason to provide them.</div><div><br></div><di=
v>So I would say that if we want this alternative unordered series of conta=
iners, there needs to be a clear justification for why they are needed.</di=
v></div>

<p></p>

-- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
To view this discussion on the web visit <a href=3D"https://groups.google.c=
om/a/isocpp.org/d/msgid/std-proposals/9fab492f-561e-4229-bd84-2f7848e26f8e%=
40isocpp.org?utm_medium=3Demail&utm_source=3Dfooter">https://groups.google.=
com/a/isocpp.org/d/msgid/std-proposals/9fab492f-561e-4229-bd84-2f7848e26f8e=
%40isocpp.org</a>.<br />

------=_Part_8516_846056506.1513957167393--

------=_Part_8515_913646218.1513957167393--

.