Topic: Types Don't Know #


Author: John Bytheway <jbytheway@gmail.com>
Date: Wed, 02 Apr 2014 22:02:50 -0400
Raw View
On 2014-04-02 19:09, Howard Hinnant wrote:
> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html
>
>  We're aiming at the pre-Rapperswil mailing.  Any feedback, positive
> or negative, will be much appreciated.  The paper currently lacks
> proposed wording.  It is more of a technology demonstration.  There's
> a co-author spot open for anyone who wants to turn the demonstration
> into concrete proposed wording. :-)

An excellent presentation.

I see no mention of how to handle types whose hash-worthy content is not
visible in the header (pimpl'ed types, etc.).  For these a templated
hash_append function is impossible and I guess you would need some sort
of type-erased hasher.

These types of objects are less likely to be used as keys, but I think
they ought to be supported.

One way to support this nicely would be to use operator() in place of
append, and then std::function<void(void const*, size_t)> would work as
a type-erased hasher.

Even easier for the user, you could require all hashers to inherit from
some common base class (with a pure virtual append member function), so
that any class X has the choice between the following two styles of
hash_append:

template <class Hasher>
friend void hash_append(Hasher& h, X const& x) noexcept
{
    using hash_defaults::hash_append;
    hash_append(h, x.date_, x.data_);
}

friend void hash_append(std::hasher_base& h, X const& x) noexcept
{
    using hash_defaults::hash_append;
    hash_append(h, x.date_, x.data_);
}

Concrete hasher classes could mark themselves final so as to avoid
unnecessary virtual function call overhead.

Unfortunately, I think any solution will inevitably entail virtual
function call overhead for every append call on every contiguous
subobject of x.  All the more reason to hash the largest possible
contiguous ranges!


Do you think this issue warrants mention?  Do you have any other ideas
as to how to handle it?

John Bytheway

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Wed, 2 Apr 2014 19:09:22 -0400
Raw View
http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/=
master/hashing.html

We're aiming at the pre-Rapperswil mailing.  Any feedback, positive or nega=
tive, will be much appreciated.  The paper currently lacks proposed wording=
..  It is more of a technology demonstration.  There's a co-author spot open=
 for anyone who wants to turn the demonstration into concrete proposed word=
ing. :-)

Howard

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Sean Middleditch <sean.middleditch@gmail.com>
Date: Wed, 2 Apr 2014 18:24:52 -0700 (PDT)
Raw View
------=_Part_2136_33432103.1396488292368
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Very well done.

The only thing that stands out to me is the lack of mention of any utility=
=20
for users who may have a contiguous buffer of what may or may not be=20
contiguously hashable types. It'd be handy to pass in some pointer and a=20
size and have it Do The Right Thing(tm). Essentially the T(&a)[N] overload=
=20
but for dynamic-length arrays (pointers+length). If ranges are added to the=
=20
stdlib, it would be fine to just make sure that range<T*> does the right=20
thing when hashed. Otherwise, some new function (likely with a new name,=20
not an overload, to avoid issues with the variadic overload of hash_append)=
=20
that explicitly takes a T* and a size_t works.

In terms of use case: plenty of users implement their own containers or=20
buffers (even in modern C++). Look at how Chrome and LLVM both have a=20
stack_vector kind of thing, as one easy example. Circular buffers, KD-trees=
=20
(with lead nodes have >1 object stored in a contiguous chunk), and=20
specialized key-value containers (objects/tables like in YAML/JSON readers=
=20
or Lua generators) all also could use this new hashing infrastructure but=
=20
shouldn't need a lot of repeated template SFINAE magic to make use of the=
=20
contiguous optimization.

On Wednesday, April 2, 2014 4:09:22 PM UTC-7, Howard Hinnant wrote:
>
>
> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blo=
b/master/hashing.html=20
>
> We=E2=80=99re aiming at the pre-Rapperswil mailing.  Any feedback, positi=
ve or=20
> negative, will be much appreciated.  The paper currently lacks proposed=
=20
> wording.  It is more of a technology demonstration.  There=E2=80=99s a co=
-author=20
> spot open for anyone who wants to turn the demonstration into concrete=20
> proposed wording. :-)=20
>
> Howard=20
>
>

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

------=_Part_2136_33432103.1396488292368
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">Very well done.<div><br></div><div>The only thing that sta=
nds out to me is the lack of mention of any utility for users who may have =
a contiguous buffer of what may or may not be contiguously hashable types. =
It'd be handy to pass in some pointer and a size and have it Do The Right T=
hing(tm). Essentially the T(&amp;a)[N] overload but for dynamic-length arra=
ys (pointers+length). If ranges are added to the stdlib, it would be fine t=
o just make sure that range&lt;T*&gt; does the right thing when hashed. Oth=
erwise, some new function (likely with a new name, not an overload, to avoi=
d issues with the variadic overload of hash_append) that explicitly takes a=
 T* and a size_t works.</div><div><br></div><div>In terms of use case: plen=
ty of users implement their own containers or buffers (even in modern C++).=
 Look at how Chrome and LLVM both have a stack_vector kind of thing, as one=
 easy example. Circular buffers, KD-trees (with lead nodes have &gt;1 objec=
t stored in a contiguous chunk), and specialized key-value containers (obje=
cts/tables like in YAML/JSON readers or Lua generators) all also could use =
this new hashing infrastructure but shouldn't need a lot of repeated templa=
te SFINAE magic to make use of the contiguous optimization.<br><div><br>On =
Wednesday, April 2, 2014 4:09:22 PM UTC-7, Howard Hinnant wrote:<blockquote=
 class=3D"gmail_quote" style=3D"margin: 0;margin-left: 0.8ex;border-left: 1=
px #ccc solid;padding-left: 1ex;"><a href=3D"http://htmlpreview.github.io/?=
https://github.com/HowardHinnant/papers/blob/master/hashing.html" target=3D=
"_blank" onmousedown=3D"this.href=3D'http://www.google.com/url?q\75http%3A%=
2F%2Fhtmlpreview.github.io%2F%3Fhttps%3A%2F%2Fgithub.com%2FHowardHinnant%2F=
papers%2Fblob%2Fmaster%2Fhashing.html\46sa\75D\46sntz\0751\46usg\75AFQjCNHP=
0pXriy_dMNhgyflMJ_r9qoA63Q';return true;" onclick=3D"this.href=3D'http://ww=
w.google.com/url?q\75http%3A%2F%2Fhtmlpreview.github.io%2F%3Fhttps%3A%2F%2F=
github.com%2FHowardHinnant%2Fpapers%2Fblob%2Fmaster%2Fhashing.html\46sa\75D=
\46sntz\0751\46usg\75AFQjCNHP0pXriy_dMNhgyflMJ_r9qoA63Q';return true;">http=
://htmlpreview.github.io/?<wbr>https://github.com/<wbr>HowardHinnant/papers=
/blob/<wbr>master/hashing.html</a>
<br>
<br>We=E2=80=99re aiming at the pre-Rapperswil mailing. &nbsp;Any feedback,=
 positive or negative, will be much appreciated. &nbsp;The paper currently =
lacks proposed wording. &nbsp;It is more of a technology demonstration. &nb=
sp;There=E2=80=99s a co-author spot open for anyone who wants to turn the d=
emonstration into concrete proposed wording. :-)
<br>
<br>Howard
<br>
<br></blockquote></div></div></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

------=_Part_2136_33432103.1396488292368--

.


Author: Sebastian Gesemann <s.gesemann@gmail.com>
Date: Thu, 3 Apr 2014 12:46:57 +0200
Raw View
As for hash functions I would like to point out another option:
SipHash. It is fast but "cryotographic enough" to be able to avoid
hash collision attacks. If anyone of you thinks about using
unordered_map for some service where an attacker can choose keys for
values to store, it should be hard for an attacker to let the
unordered_map degenerate to a linked list. One way of avoiding this is
to use SipHash with a randomly chosen key.

https://131002.net/siphash/

So far, SipHash is getting very popular. It's already used in Perl,
Python, Redis, Ruby, Rust, systemd, OpenDNS, Haskell ...

On Thu, Apr 3, 2014 at 1:09 AM, Howard Hinnant <howard.hinnant@gmail.com> w=
rote:
> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blo=
b/master/hashing.html
>
> We're aiming at the pre-Rapperswil mailing.  Any feedback, positive or ne=
gative, will be much appreciated.  The paper currently lacks proposed wordi=
ng.  It is more of a technology demonstration.  There's a co-author spot op=
en for anyone who wants to turn the demonstration into concrete proposed wo=
rding. :-)
>
> Howard
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups=
 "ISO C++ Standard - Future Proposals" group.
> To unsubscribe from this group and stop receiving emails from it, send an=
 email to std-proposals+unsubscribe@isocpp.org.
> To post to this group, send email to std-proposals@isocpp.org.
> Visit this group at http://groups.google.com/a/isocpp.org/group/std-propo=
sals/.

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Daniel James <dnljms@gmail.com>
Date: Thu, 3 Apr 2014 12:18:23 +0100
Raw View
On 3 April 2014 11:46, Sebastian Gesemann <s.gesemann@gmail.com> wrote:
> As for hash functions I would like to point out another option:
> SipHash.

FWIW I implemented a SipHash example a while ago, but didn't release
it because I wasn't happy with the implementation and didn't feel
enthusiastic enough to improve it:

https://github.com/boostorg/unordered/tree/0f080552fa6c14076938b2a2aa68635d61cca8a0/examples/siphash

But I think the strength of this proposal is that it supports a
generic mechanism for supporting different hash functions, so should
support both a randomized hash function and a more traditional one.
And not really specify the exact algorithm, a lot of the popular
hashing algorithms aren't portable to a variety of processor types.

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

.


Author: Daniel James <dnljms@gmail.com>
Date: Thu, 3 Apr 2014 03:04:51 +0100
Raw View
On 3 April 2014 00:09, Howard Hinnant <howard.hinnant@gmail.com> wrote:
> We're aiming at the pre-Rapperswil mailing.  Any feedback, positive or ne=
gative, will be much appreciated.  The paper currently lacks proposed wordi=
ng.  It is more of a technology demonstration.  There's a co-author spot op=
en for anyone who wants to turn the demonstration into concrete proposed wo=
rding. :-)

I agree that hash_combine shouldn't be adopted, it has some problems
(e.g. no initialisation or finalisation of the hash value, limited
state). Since I think you're comparing various hash function with the
boost implementation, it might be relevant that I'm working on a new
version, but haven't found much time for it and am constrained by API
compatibility with previous versions. But it should be a bit better.

Although, using a faster, low quality hash function is entirely in
line with the current standard. If higher quality hash functions are
wanted, the standard really should say so. There's no requirement
about how hash values for unequal values are distributed. As it is,
unordered containers can only assume the very basic hash function
requirements that are stated in the standard, so they need to put in
extra effort to deal with poor hash functions. If a slower, higher
quality hash function is used, then the cost has to be paid twice -
both in the hash function and in the container.

A problem with Hasher as it's currently defined is that it doesn't
take into account the beginning and end of lists. For example, the
list [[1,2],[3]] will hash to the same value as [[1],[2,3]] and a list
of empty lists will always hash to the same value regardless of its
length.

It would be desirable to be able to use randomized hash functions,
such as siphash. These would need to be constructed from values stored
in the hash function object, so might not be default constructible, so
I don't think Hasher should be required to be default constructible.

I dislike using a cast to get the hash value from the Hasher. It seems
like an operation that should have a name, especially because many
hash functions have a finalisation stage, so they're doing something
more than a conversion. Since it's an explicit conversion, it's often
more verbose to call 'static_cast' than to call a member function
anyway. And I feel like a conversion confuses two different types.

It would also be nice if the finalisation function could return types
other than std::size_t. In some circumstances a 64-bit hash value can
be expensive and unnecessary (the standard containers use std::size_t,
but this more general hash function could have other uses). Similarly,
this mechanism could be used for stronger hash functions with much
larger hash values.

A type that is contiguously hashable can probably also use a binary
comparison for equality checks, but I don't that's strictly a
consequence, so it be better to have a slightly stronger trait which
indicates: "x=3D=3Dy if and only if memcmp(addressof(x), addressof(y),
sizeof(T)) =3D=3D 0". Could be called something like
contiguously_equality_comparable.

Is it possible that there will be ambiguity if 'hash_append' is
overloaded separately for a Hasher and a type?

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: =?UTF-8?Q?Andrzej_Krzemie=C5=84ski?= <akrzemi1@gmail.com>
Date: Thu, 3 Apr 2014 07:06:02 -0700 (PDT)
Raw View
------=_Part_484_30263887.1396533962598
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable



W dniu czwartek, 3 kwietnia 2014 01:09:22 UTC+2 u=C5=BCytkownik Howard Hinn=
ant=20
napisa=C5=82:
>
>
> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blo=
b/master/hashing.html=20
>
> We=E2=80=99re aiming at the pre-Rapperswil mailing.  Any feedback, positi=
ve or=20
> negative, will be much appreciated.  The paper currently lacks proposed=
=20
> wording.  It is more of a technology demonstration.  There=E2=80=99s a co=
-author=20
> spot open for anyone who wants to turn the demonstration into concrete=20
> proposed wording. :-)=20
>
> Howard=20
>

This technique of adapting types to the framework appears to me to be=20
similar to what Boost.Serialization<http://www.boost.org/doc/libs/1_55_0/li=
bs/serialization/doc/index.html>does. I wonder if it could be generalized t=
o things beyond hashing, like=20
serialization. =20

Regards,
&rzej

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

------=_Part_484_30263887.1396533962598
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><br><br>W dniu czwartek, 3 kwietnia 2014 01:09:22 UTC+2 u=
=C5=BCytkownik Howard Hinnant napisa=C5=82:<blockquote class=3D"gmail_quote=
" style=3D"margin: 0;margin-left: 0.8ex;border-left: 1px #ccc solid;padding=
-left: 1ex;"><a href=3D"http://htmlpreview.github.io/?https://github.com/Ho=
wardHinnant/papers/blob/master/hashing.html" target=3D"_blank" onmousedown=
=3D"this.href=3D'http://www.google.com/url?q\75http%3A%2F%2Fhtmlpreview.git=
hub.io%2F%3Fhttps%3A%2F%2Fgithub.com%2FHowardHinnant%2Fpapers%2Fblob%2Fmast=
er%2Fhashing.html\46sa\75D\46sntz\0751\46usg\75AFQjCNHP0pXriy_dMNhgyflMJ_r9=
qoA63Q';return true;" onclick=3D"this.href=3D'http://www.google.com/url?q\7=
5http%3A%2F%2Fhtmlpreview.github.io%2F%3Fhttps%3A%2F%2Fgithub.com%2FHowardH=
innant%2Fpapers%2Fblob%2Fmaster%2Fhashing.html\46sa\75D\46sntz\0751\46usg\7=
5AFQjCNHP0pXriy_dMNhgyflMJ_r9qoA63Q';return true;">http://htmlpreview.githu=
b.io/?<wbr>https://github.com/<wbr>HowardHinnant/papers/blob/<wbr>master/ha=
shing.html</a>
<br>
<br>We=E2=80=99re aiming at the pre-Rapperswil mailing. &nbsp;Any feedback,=
 positive or negative, will be much appreciated. &nbsp;The paper currently =
lacks proposed wording. &nbsp;It is more of a technology demonstration. &nb=
sp;There=E2=80=99s a co-author spot open for anyone who wants to turn the d=
emonstration into concrete proposed wording. :-)
<br>
<br>Howard
<br></blockquote><div><br>This technique of adapting types to the framework=
 appears to me to be similar to what&nbsp;<a href=3D"http://www.boost.org/d=
oc/libs/1_55_0/libs/serialization/doc/index.html">Boost.Serialization</a> d=
oes. I wonder if it could be generalized to things beyond hashing, like ser=
ialization.&nbsp; <br><br>Regards,<br>&amp;rzej<br></div></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

------=_Part_484_30263887.1396533962598--

.


Author: rhalbersma@gmail.com
Date: Thu, 3 Apr 2014 12:42:37 -0700 (PDT)
Raw View
------=_Part_1405_6684668.1396554157587
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On Thursday, April 3, 2014 4:06:02 PM UTC+2, Andrzej Krzemie=C5=84ski wrote=
:
>
> This technique of adapting types to the framework appears to me to be=20
>> similar to what Boost.Serialization<http://www.boost.org/doc/libs/1_55_0=
/libs/serialization/doc/index.html>does. I wonder if it could be generalize=
d to things beyond hashing, like=20
>> serialization. =20
>>
>
> Regards,
> &rzej
>

Both hashing/serialization *extract* the bit sequence. A similar thing=20
would be to allow to* insert* the bit sequence. This allows randomly=20
generating any class X (e.g. to stress test the very hash function you are=
=20
interested in!).=20

So perhaps one could define two lower-level functions bit_insert() and=20
bit_extract() [or maybe names like bit_push_back()/bit_pop_front() to=20
reflect FIFO character of bit sequence?] on top of which higher-level=20
functionality like hash_append, serialize() and random_generate() could be=
=20
written.

Rein=20

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

------=_Part_1405_6684668.1396554157587
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">On Thursday, April 3, 2014 4:06:02 PM UTC+2, Andrzej Krzem=
ie=C5=84ski wrote:<blockquote class=3D"gmail_quote" style=3D"margin: 0;marg=
in-left: 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;"><div dir=3D"=
ltr"><blockquote class=3D"gmail_quote" style=3D"margin:0;margin-left:0.8ex;=
border-left:1px #ccc solid;padding-left:1ex"><span style=3D"font-size: 13px=
;">This technique of adapting types to the framework appears to me to be si=
milar to what&nbsp;</span><a href=3D"http://www.boost.org/doc/libs/1_55_0/l=
ibs/serialization/doc/index.html" target=3D"_blank" onmousedown=3D"this.hre=
f=3D'http://www.google.com/url?q\75http%3A%2F%2Fwww.boost.org%2Fdoc%2Flibs%=
2F1_55_0%2Flibs%2Fserialization%2Fdoc%2Findex.html\46sa\75D\46sntz\0751\46u=
sg\75AFQjCNEUpLd8vkkGx8p3FP6IH9i_c24ZGw';return true;" onclick=3D"this.href=
=3D'http://www.google.com/url?q\75http%3A%2F%2Fwww.boost.org%2Fdoc%2Flibs%2=
F1_55_0%2Flibs%2Fserialization%2Fdoc%2Findex.html\46sa\75D\46sntz\0751\46us=
g\75AFQjCNEUpLd8vkkGx8p3FP6IH9i_c24ZGw';return true;" style=3D"font-size: 1=
3px;">Boost.Serialization</a><span style=3D"font-size: 13px;"> does. I wond=
er if it could be generalized to things beyond hashing, like serialization.=
 &nbsp;</span><br></blockquote><div><br>Regards,<br>&amp;rzej<br></div></di=
v></blockquote><div><br></div><div>Both hashing/serialization <i>extract</i=
> the bit sequence. A similar thing would be to allow to<i> insert</i> the =
bit sequence. This allows randomly generating any class X (e.g. to stress t=
est the very hash function you are interested in!).&nbsp;</div><div><br></d=
iv><div>So perhaps one could define two lower-level functions bit_insert() =
and bit_extract() [or maybe names like bit_push_back()/bit_pop_front() to r=
eflect FIFO character of bit sequence?] on top of which higher-level functi=
onality like hash_append, serialize() and random_generate() could be writte=
n.</div><div><span style=3D"font-size: 13px;"><br></span></div><div><span s=
tyle=3D"font-size: 13px;">Rein&nbsp;</span></div></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

------=_Part_1405_6684668.1396554157587--

.


Author: Alex B <devalexb@gmail.com>
Date: Thu, 3 Apr 2014 13:03:20 -0700 (PDT)
Raw View
------=_Part_20_8440078.1396555400981
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

I'm not sure that hashing and serialization can be combined so simply in a=
=20
common interface.
=20
For serialization, when extracting a sequence container, you usually need=
=20
to first extract the size of the container.
For hashing, it is probably not desirable to hash_append the size of the=20
container.
=20

On Thursday, April 3, 2014 3:42:37 PM UTC-4, rhalb...@gmail.com wrote:

> On Thursday, April 3, 2014 4:06:02 PM UTC+2, Andrzej Krzemie=C5=84ski wro=
te:
>>
>> This technique of adapting types to the framework appears to me to be=20
>>> similar to what Boost.Serialization<http://www.boost.org/doc/libs/1_55_=
0/libs/serialization/doc/index.html>does. I wonder if it could be generaliz=
ed to things beyond hashing, like=20
>>> serialization. =20
>>>
>>
>> Regards,
>> &rzej
>>
>
> Both hashing/serialization *extract* the bit sequence. A similar thing=20
> would be to allow to* insert* the bit sequence. This allows randomly=20
> generating any class X (e.g. to stress test the very hash function you ar=
e=20
> interested in!).=20
>
> So perhaps one could define two lower-level functions bit_insert() and=20
> bit_extract() [or maybe names like bit_push_back()/bit_pop_front() to=20
> reflect FIFO character of bit sequence?] on top of which higher-level=20
> functionality like hash_append, serialize() and random_generate() could b=
e=20
> written.
>
> Rein=20
>

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

------=_Part_20_8440078.1396555400981
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div>I'm not sure that hashing and serialization can be co=
mbined so simply&nbsp;in a common interface.</div><div>&nbsp;</div><div>For=
 serialization, when extracting a sequence container, you usually need to f=
irst extract the size of the container.</div><div>For hashing, it is probab=
ly not desirable to hash_append the size of the container.</div><div>&nbsp;=
</div><div><br>On Thursday, April 3, 2014 3:42:37 PM UTC-4, rhalb...@gmail.=
com wrote:</div><blockquote class=3D"gmail_quote" style=3D"margin: 0px 0px =
0px 0.8ex; padding-left: 1ex; border-left-color: rgb(204, 204, 204); border=
-left-width: 1px; border-left-style: solid;"><div dir=3D"ltr">On Thursday, =
April 3, 2014 4:06:02 PM UTC+2, Andrzej Krzemie=C5=84ski wrote:<blockquote =
class=3D"gmail_quote" style=3D"margin: 0px 0px 0px 0.8ex; padding-left: 1ex=
; border-left-color: rgb(204, 204, 204); border-left-width: 1px; border-lef=
t-style: solid;"><div dir=3D"ltr"><blockquote class=3D"gmail_quote" style=
=3D"margin: 0px 0px 0px 0.8ex; padding-left: 1ex; border-left-color: rgb(20=
4, 204, 204); border-left-width: 1px; border-left-style: solid;"><span styl=
e=3D"font-size: 13px;">This technique of adapting types to the framework ap=
pears to me to be similar to what&nbsp;</span><a style=3D"font-size: 13px;"=
 onmousedown=3D"this.href=3D'http://www.google.com/url?q\75http%3A%2F%2Fwww=
..boost.org%2Fdoc%2Flibs%2F1_55_0%2Flibs%2Fserialization%2Fdoc%2Findex.html\=
46sa\75D\46sntz\0751\46usg\75AFQjCNEUpLd8vkkGx8p3FP6IH9i_c24ZGw';return tru=
e;" onclick=3D"this.href=3D'http://www.google.com/url?q\75http%3A%2F%2Fwww.=
boost.org%2Fdoc%2Flibs%2F1_55_0%2Flibs%2Fserialization%2Fdoc%2Findex.html\4=
6sa\75D\46sntz\0751\46usg\75AFQjCNEUpLd8vkkGx8p3FP6IH9i_c24ZGw';return true=
;" href=3D"http://www.boost.org/doc/libs/1_55_0/libs/serialization/doc/inde=
x.html" target=3D"_blank">Boost.Serialization</a><span style=3D"font-size: =
13px;"> does. I wonder if it could be generalized to things beyond hashing,=
 like serialization. &nbsp;</span><br></blockquote><div><br>Regards,<br>&am=
p;rzej<br></div></div></blockquote><div><br></div><div>Both hashing/seriali=
zation <i>extract</i> the bit sequence. A similar thing would be to allow t=
o<i> insert</i> the bit sequence. This allows randomly generating any class=
 X (e.g. to stress test the very hash function you are interested in!).&nbs=
p;</div><div><br></div><div>So perhaps one could define two lower-level fun=
ctions bit_insert() and bit_extract() [or maybe names like bit_push_back()/=
bit_pop_front(<wbr>) to reflect FIFO character of bit sequence?] on top of =
which higher-level functionality like hash_append, serialize() and random_g=
enerate() could be written.</div><div><span style=3D"font-size: 13px;"><br>=
</span></div><div><span style=3D"font-size: 13px;">Rein&nbsp;</span></div><=
/div></blockquote></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

------=_Part_20_8440078.1396555400981--

.


Author: Richard Smith <richard@metafoo.co.uk>
Date: Thu, 3 Apr 2014 13:31:21 -0700
Raw View
--001a1136526ac0744404f62948cd
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On Thu, Apr 3, 2014 at 1:03 PM, Alex B <devalexb@gmail.com> wrote:

> I'm not sure that hashing and serialization can be combined so simply in =
a
> common interface.
>
> For serialization, when extracting a sequence container, you usually need
> to first extract the size of the container.
> For hashing, it is probably not desirable to hash_append the size of the
> container.
>

As has been demonstrated already, leaving out the size when hashing a
container results in a bad hash in this model; [ [1], [2, 3] ] and [ [1,
2], [3] ] would hash the same.


> On Thursday, April 3, 2014 3:42:37 PM UTC-4, rhalb...@gmail.com wrote:
>
>> On Thursday, April 3, 2014 4:06:02 PM UTC+2, Andrzej Krzemie=C5=84ski wr=
ote:
>>>
>>> This technique of adapting types to the framework appears to me to be
>>>> similar to what Boost.Serialization<http://www.boost.org/doc/libs/1_55=
_0/libs/serialization/doc/index.html>does. I wonder if it could be generali=
zed to things beyond hashing, like
>>>> serialization.
>>>>
>>>
>>> Regards,
>>> &rzej
>>>
>>
>> Both hashing/serialization *extract* the bit sequence. A similar thing
>> would be to allow to* insert* the bit sequence. This allows randomly
>> generating any class X (e.g. to stress test the very hash function you a=
re
>> interested in!).
>>
>> So perhaps one could define two lower-level functions bit_insert() and
>> bit_extract() [or maybe names like bit_push_back()/bit_pop_front() to
>> reflect FIFO character of bit sequence?] on top of which higher-level
>> functionality like hash_append, serialize() and random_generate() could =
be
>> written.
>>
>> Rein
>>
>  --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "ISO C++ Standard - Future Proposals" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to std-proposals+unsubscribe@isocpp.org.
> To post to this group, send email to std-proposals@isocpp.org.
> Visit this group at
> http://groups.google.com/a/isocpp.org/group/std-proposals/.
>

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

--001a1136526ac0744404f62948cd
Content-Type: text/html; charset=ISO-8859-2
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div class=3D"gmail_extra"><div class=3D"gmail_quote">On T=
hu, Apr 3, 2014 at 1:03 PM, Alex B <span dir=3D"ltr">&lt;<a href=3D"mailto:=
devalexb@gmail.com" target=3D"_blank">devalexb@gmail.com</a>&gt;</span> wro=
te:<br><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-=
left:1px #ccc solid;padding-left:1ex">
<div dir=3D"ltr"><div>I&#39;m not sure that hashing and serialization can b=
e combined so simply=A0in a common interface.</div><div>=A0</div><div>For s=
erialization, when extracting a sequence container, you usually need to fir=
st extract the size of the container.</div>
<div>For hashing, it is probably not desirable to hash_append the size of t=
he container.</div></div></blockquote><div><br></div><div>As has been demon=
strated already, leaving out the size when hashing a container results in a=
 bad hash in this model; [ [1], [2, 3] ] and [ [1, 2], [3] ] would hash the=
 same.</div>
<div>=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;=
border-left:1px #ccc solid;padding-left:1ex"><div dir=3D"ltr"><div><div cla=
ss=3D"h5"><div>On Thursday, April 3, 2014 3:42:37 PM UTC-4, <a href=3D"mail=
to:rhalb...@gmail.com" target=3D"_blank">rhalb...@gmail.com</a> wrote:</div=
>
<blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;padding=
-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-l=
eft-style:solid"><div dir=3D"ltr">On Thursday, April 3, 2014 4:06:02 PM UTC=
+2, Andrzej Krzemie=F1ski wrote:<blockquote class=3D"gmail_quote" style=3D"=
margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204=
);border-left-width:1px;border-left-style:solid">
<div dir=3D"ltr"><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px =
0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-w=
idth:1px;border-left-style:solid"><span style=3D"font-size:13px">This techn=
ique of adapting types to the framework appears to me to be similar to what=
=A0</span><a style=3D"font-size:13px" href=3D"http://www.boost.org/doc/libs=
/1_55_0/libs/serialization/doc/index.html" target=3D"_blank">Boost.Serializ=
ation</a><span style=3D"font-size:13px"> does. I wonder if it could be gene=
ralized to things beyond hashing, like serialization. =A0</span><br>
</blockquote><div><br>Regards,<br>&amp;rzej<br></div></div></blockquote><di=
v><br></div><div>Both hashing/serialization <i>extract</i> the bit sequence=
.. A similar thing would be to allow to<i> insert</i> the bit sequence. This=
 allows randomly generating any class X (e.g. to stress test the very hash =
function you are interested in!).=A0</div>
<div><br></div><div>So perhaps one could define two lower-level functions b=
it_insert() and bit_extract() [or maybe names like bit_push_back()/bit_pop_=
front(<u></u>) to reflect FIFO character of bit sequence?] on top of which =
higher-level functionality like hash_append, serialize() and random_generat=
e() could be written.</div>
<div><span style=3D"font-size:13px"><br></span></div><div><span style=3D"fo=
nt-size:13px">Rein=A0</span></div></div></blockquote></div></div></div><div=
 class=3D"HOEnZb"><div class=3D"h5">

<p></p>

-- <br>
<br>
--- <br>
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br>
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org" target=3D"_=
blank">std-proposals+unsubscribe@isocpp.org</a>.<br>
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org" target=3D"_blank">std-proposals@isocpp.org</a>.<br>
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/" target=3D"_blank">http://groups.google.com/a/isocpp.org/gro=
up/std-proposals/</a>.<br>
</div></div></blockquote></div><br></div></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

--001a1136526ac0744404f62948cd--

.


Author: Andrew Tomazos <andrewtomazos@gmail.com>
Date: Fri, 4 Apr 2014 00:10:57 +0200
Raw View
--047d7b670833ed86e104f62aace4
Content-Type: text/plain; charset=ISO-8859-1

Hi Howard,

I did quickly read your proposal and would like to share a couple of ideas:

- In cryptography a secure hash function has a well-defined mathematical
definition.  Roughly, if we define a true random hash function as a
function that contains an infinite table with an independent true random
integer value in the range [0..2^n) assigned to every possible bit string
{0, 1, 00, 01, 10, 11, 000, 001,...}, then a secure hash function is a
function that cannot be distinguished from a true random hash by an
attacker in polynomial time of n (called with "negligble" probability over
guessing).  I believe it can be shown that for compound sequence types
(such as arrays, vectors, lists, tuples and simple "struct" classes) that
applying a secure hash function to the bitstring formed by concatenating
the results (hash codes) of a secure hash function applied to its
subobjects, is itself a secure hash function.  This addresses the [[1],
[2,3]] vs [[1,2], [3]] issue.  I think a similar approach can be argued
less rigorously with "nearly" secure hash functions.  It wasn't 100% clear
if you have studied or considered this idea from the proposal.

- We are working on a reflection feature that will allow you to enumerate
and visit the member sub-objects of a class type given as a type template
parameter.  Interesting to think about would be rather than defining a
separate "hash visitor" per type, you could instead define the visitor once
in the program such that it defaults to a memberwise hash composition.
 This is most likely orthogonal to what you are proposing, but worth
pondering.

Regards,
Andrew.




On Thu, Apr 3, 2014 at 1:09 AM, Howard Hinnant <howard.hinnant@gmail.com>wrote:

>
> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html
>
> We're aiming at the pre-Rapperswil mailing.  Any feedback, positive or
> negative, will be much appreciated.  The paper currently lacks proposed
> wording.  It is more of a technology demonstration.  There's a co-author
> spot open for anyone who wants to turn the demonstration into concrete
> proposed wording. :-)
>
> Howard
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "ISO C++ Standard - Future Proposals" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to std-proposals+unsubscribe@isocpp.org.
> To post to this group, send email to std-proposals@isocpp.org.
> Visit this group at
> http://groups.google.com/a/isocpp.org/group/std-proposals/.
>

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

--047d7b670833ed86e104f62aace4
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">Hi Howard,<div><br></div><div>I did quickly read your prop=
osal and would like to share a couple of ideas:</div><div><br></div><div>- =
In cryptography a secure hash function has a well-defined mathematical defi=
nition. =A0Roughly, if we define a true random hash function as a function =
that contains an infinite table with an independent true random integer val=
ue in the range [0..2^n) assigned to every possible bit string {0, 1, 00, 0=
1, 10, 11, 000, 001,...}, then a secure hash function is a function that ca=
nnot be distinguished from a true random hash by an attacker in polynomial =
time of n (called with &quot;negligble&quot; probability over guessing). =
=A0I believe it can be shown that for compound sequence types (such as arra=
ys, vectors, lists, tuples and simple &quot;struct&quot; classes) that appl=
ying a secure hash function to the bitstring formed by concatenating the re=
sults (hash codes) of a secure hash function applied to its subobjects, is =
itself a secure hash function. =A0This addresses the [[1], [2,3]] vs [[1,2]=
, [3]] issue. =A0I think a similar approach can be argued less rigorously w=
ith &quot;nearly&quot; secure hash functions. =A0It wasn&#39;t 100% clear i=
f you have studied or considered this idea from the proposal.</div>
<div><br></div><div>- We are working on a reflection feature that will allo=
w you to enumerate and visit the member sub-objects of a class type given a=
s a type template parameter. =A0Interesting to think about would be rather =
than defining a separate &quot;hash visitor&quot; per type, you could inste=
ad define the visitor once in the program such that it defaults to a member=
wise hash composition. =A0This is most likely orthogonal to what you are pr=
oposing, but worth pondering.</div>
<div><br></div><div>Regards,</div><div>Andrew.</div><div><br></div><div><br=
></div></div><div class=3D"gmail_extra"><br><br><div class=3D"gmail_quote">=
On Thu, Apr 3, 2014 at 1:09 AM, Howard Hinnant <span dir=3D"ltr">&lt;<a hre=
f=3D"mailto:howard.hinnant@gmail.com" target=3D"_blank">howard.hinnant@gmai=
l.com</a>&gt;</span> wrote:<br>
<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex"><a href=3D"http://htmlpreview.github.io/?htt=
ps://github.com/HowardHinnant/papers/blob/master/hashing.html" target=3D"_b=
lank">http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers=
/blob/master/hashing.html</a><br>

<br>
We&#39;re aiming at the pre-Rapperswil mailing. =A0Any feedback, positive o=
r negative, will be much appreciated. =A0The paper currently lacks proposed=
 wording. =A0It is more of a technology demonstration. =A0There&#39;s a co-=
author spot open for anyone who wants to turn the demonstration into concre=
te proposed wording. :-)<br>

<span class=3D"HOEnZb"><font color=3D"#888888"><br>
Howard<br>
<br>
--<br>
<br>
---<br>
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br>
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals%2Bunsubscribe@isocpp.org">std-propo=
sals+unsubscribe@isocpp.org</a>.<br>
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br>
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/" target=3D"_blank">http://groups.google.com/a/isocpp.org/gro=
up/std-proposals/</a>.<br>
</font></span></blockquote></div><br></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

--047d7b670833ed86e104f62aace4--

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Thu, 3 Apr 2014 20:24:34 -0400
Raw View
Thanks Sean.  I'll give this more thought.  But I'm not immediately coming =
up with an idea for an API.  If we had ranges (or when we get ranges), I ca=
n easily see that range<T> should do the right thing.  Perhaps it is best t=
o just wait for ranges to settle before proceeding on this point.  In the m=
ean time, if people use vector (or whatever std::container), then they can =
just pass the whole vector to the Hasher and it will do the right thing.

Howard

On Apr 2, 2014, at 9:24 PM, Sean Middleditch <sean.middleditch@gmail.com> w=
rote:

> Very well done.
>=20
> The only thing that stands out to me is the lack of mention of any utilit=
y for users who may have a contiguous buffer of what may or may not be cont=
iguously hashable types. It'd be handy to pass in some pointer and a size a=
nd have it Do The Right Thing(tm). Essentially the T(&a)[N] overload but fo=
r dynamic-length arrays (pointers+length). If ranges are added to the stdli=
b, it would be fine to just make sure that range<T*> does the right thing w=
hen hashed. Otherwise, some new function (likely with a new name, not an ov=
erload, to avoid issues with the variadic overload of hash_append) that exp=
licitly takes a T* and a size_t works.
>=20
> In terms of use case: plenty of users implement their own containers or b=
uffers (even in modern C++). Look at how Chrome and LLVM both have a stack_=
vector kind of thing, as one easy example. Circular buffers, KD-trees (with=
 lead nodes have >1 object stored in a contiguous chunk), and specialized k=
ey-value containers (objects/tables like in YAML/JSON readers or Lua genera=
tors) all also could use this new hashing infrastructure but shouldn't need=
 a lot of repeated template SFINAE magic to make use of the contiguous opti=
mization.
>=20
> On Wednesday, April 2, 2014 4:09:22 PM UTC-7, Howard Hinnant wrote:
> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blo=
b/master/hashing.html=20
>=20
> We're aiming at the pre-Rapperswil mailing.  Any feedback, positive or ne=
gative, will be much appreciated.  The paper currently lacks proposed wordi=
ng.  It is more of a technology demonstration.  There's a co-author spot op=
en for anyone who wants to turn the demonstration into concrete proposed wo=
rding. :-)=20
>=20
> Howard=20
>=20
>=20
> --=20
>=20
> ---=20
> You received this message because you are subscribed to the Google Groups=
 "ISO C++ Standard - Future Proposals" group.
> To unsubscribe from this group and stop receiving emails from it, send an=
 email to std-proposals+unsubscribe@isocpp.org.
> To post to this group, send email to std-proposals@isocpp.org.
> Visit this group at http://groups.google.com/a/isocpp.org/group/std-propo=
sals/.

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Thu, 3 Apr 2014 20:27:01 -0400
Raw View
On Apr 2, 2014, at 10:02 PM, John Bytheway <jbytheway@gmail.com> wrote:

> On 2014-04-02 19:09, Howard Hinnant wrote:
>> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/bl=
ob/master/hashing.html
>>=20
>> We're aiming at the pre-Rapperswil mailing.  Any feedback, positive
>> or negative, will be much appreciated.  The paper currently lacks
>> proposed wording.  It is more of a technology demonstration.  There's
>> a co-author spot open for anyone who wants to turn the demonstration
>> into concrete proposed wording. :-)
>=20
> An excellent presentation.
>=20
> I see no mention of how to handle types whose hash-worthy content is not
> visible in the header (pimpl'ed types, etc.).  For these a templated
> hash_append function is impossible and I guess you would need some sort
> of type-erased hasher.
>=20
> These types of objects are less likely to be used as keys, but I think
> they ought to be supported.
>=20
> One way to support this nicely would be to use operator() in place of
> append, and then std::function<void(void const*, size_t)> would work as
> a type-erased hasher.

This is a most interesting suggestion, thanks.  The authors have actually b=
een debating such a signature, but purely on aesthetic grounds.  Your argum=
ent gives heavy weight to the the use of operator() on *functional* grounds=
..  That is most significant, and most appreciated.  Thanks!

Howard

>=20
> Even easier for the user, you could require all hashers to inherit from
> some common base class (with a pure virtual append member function), so
> that any class X has the choice between the following two styles of
> hash_append:
>=20
> template <class Hasher>
> friend void hash_append(Hasher& h, X const& x) noexcept
> {
>    using hash_defaults::hash_append;
>    hash_append(h, x.date_, x.data_);
> }
>=20
> friend void hash_append(std::hasher_base& h, X const& x) noexcept
> {
>    using hash_defaults::hash_append;
>    hash_append(h, x.date_, x.data_);
> }
>=20
> Concrete hasher classes could mark themselves final so as to avoid
> unnecessary virtual function call overhead.
>=20
> Unfortunately, I think any solution will inevitably entail virtual
> function call overhead for every append call on every contiguous
> subobject of x.  All the more reason to hash the largest possible
> contiguous ranges!
>=20
>=20
> Do you think this issue warrants mention?  Do you have any other ideas
> as to how to handle it?
>=20
> John Bytheway
>=20
> --=20
>=20
> ---=20
> You received this message because you are subscribed to the Google Groups=
 "ISO C++ Standard - Future Proposals" group.
> To unsubscribe from this group and stop receiving emails from it, send an=
 email to std-proposals+unsubscribe@isocpp.org.
> To post to this group, send email to std-proposals@isocpp.org.
> Visit this group at http://groups.google.com/a/isocpp.org/group/std-propo=
sals/.

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Alex B <devalexb@gmail.com>
Date: Thu, 3 Apr 2014 20:53:27 -0400
Raw View
--089e013d1c4c15e71304f62cf24d
Content-Type: text/plain; charset=ISO-8859-1

On Thu, Apr 3, 2014 at 4:31 PM, Richard Smith <richard@metafoo.co.uk> wrote:

> On Thu, Apr 3, 2014 at 1:03 PM, Alex B <devalexb@gmail.com> wrote:
>
>> I'm not sure that hashing and serialization can be combined so simply in
>> a common interface.
>>
>> For serialization, when extracting a sequence container, you usually need
>> to first extract the size of the container.
>> For hashing, it is probably not desirable to hash_append the size of the
>> container.
>>
>
> As has been demonstrated already, leaving out the size when hashing a
> container results in a bad hash in this model; [ [1], [2, 3] ] and [ [1,
> 2], [3] ] would hash the same.
>
>

You will only have bad hash when you have containers of containers. If you
want to get the hash of a single container/string, there is no need to
combine/append the size of the container.

For pair<string, string>, you only need to combine/append the size of the
first string, not both of them.

For vector<string>, you need the size of the size()-1 first strings (no
need for last one). No need for the size of the vector.

But I admit that generalizing and expressing this (that the size is needed
in some contexts but not all) with clean code design can be quite a
challenge and it is very tempting to just always use the size. But it would
be a bit sad, considering that one of the most wanted hashable is a simple
std::string, which does not need the size in the hash (when taken alone).

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

--089e013d1c4c15e71304f62cf24d
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">On Thu, Apr 3, 2014 at 4:31 PM, Richard Smith <span dir=3D=
"ltr">&lt;<a href=3D"mailto:richard@metafoo.co.uk" target=3D"_blank">richar=
d@metafoo.co.uk</a>&gt;</span> wrote:<br><div class=3D"gmail_extra"><div cl=
ass=3D"gmail_quote">

<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex"><div dir=3D"ltr"><div class=3D"gmail_extra">=
<div class=3D"gmail_quote"><div>On Thu, Apr 3, 2014 at 1:03 PM, Alex B <spa=
n dir=3D"ltr">&lt;<a href=3D"mailto:devalexb@gmail.com" target=3D"_blank">d=
evalexb@gmail.com</a>&gt;</span> wrote:<br>

<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex">
<div dir=3D"ltr"><div>I&#39;m not sure that hashing and serialization can b=
e combined so simply=A0in a common interface.</div><div>=A0</div><div>For s=
erialization, when extracting a sequence container, you usually need to fir=
st extract the size of the container.</div>


<div>For hashing, it is probably not desirable to hash_append the size of t=
he container.</div></div></blockquote><div><br></div></div><div>As has been=
 demonstrated already, leaving out the size when hashing a container result=
s in a bad hash in this model; [ [1], [2, 3] ] and [ [1, 2], [3] ] would ha=
sh the same.</div>


<div>=A0</div></div></div></div></blockquote><div><br></div><div>You will o=
nly have bad hash when you have containers of containers. If you want to ge=
t the hash of a single container/string, there is no need to combine/append=
 the size of the container.</div>

<div><br></div><div>For pair&lt;string, string&gt;, you only need to combin=
e/append the size of the first string, not both of them.</div><div><br></di=
v><div>For vector&lt;string&gt;, you need the size of the size()-1 first st=
rings (no need for last one). No need for the size of the vector.</div>
<div><br></div><div>But I admit that generalizing and expressing this (that=
 the size is needed in some contexts but not all) with clean code design ca=
n be quite a challenge and it is very tempting to just always use the size.=
 But it would be a bit sad, considering that one of the most wanted hashabl=
e is a simple std::string, which does not need the size in the hash (when t=
aken alone).</div>
</div></div></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

--089e013d1c4c15e71304f62cf24d--

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Thu, 3 Apr 2014 21:33:47 -0400
Raw View
On Apr 2, 2014, at 10:04 PM, Daniel James <dnljms@gmail.com> wrote:

> A problem with Hasher as it's currently defined is that it doesn't
> take into account the beginning and end of lists. For example, the
> list [[1,2],[3]] will hash to the same value as [[1],[2,3]] and a list
> of empty lists will always hash to the same value regardless of its
> length.

Thanks for pointing out these issues.

The first issue looks like two different types in C++:  list [[1,2],[3]] an=
d list [[1],[2,3]].  There are two different classes of clients to hash_app=
end:

1.  The unordered containers which will be hashing types of only one kind (=
e.g. X, not Y).  If hash(X) =3D=3D hash(Y), such containers shouldn't care.

2.  Type Z, composed of X and Y, and is implementing its hash_append by has=
h_append'ing hash_append(X) and hash_append(Y).  The author of Z may or may=
 not be concerned that the hash of X and the hash of Y could be the same fo=
r degenerate cases.  If the author of Z does have such concerns, we can eas=
ily deal with such issues in the hash_append for Z, for example:

template <class Hasher>
void
hash_append(Hasher& h, const Z& z)
{
    using std::hash_append;
    hash_append("list 1", z.list1_);
    hash_append("list 2", z.list2_);
}

This seems like the ideal:  "pay for the feature only if you need it" solut=
ion.  unordered_set<X> doesn't care that it may hash(X) with collisions gen=
erated by unordered_set<Y>, and so should not suffer any extra hashing expe=
nse.  Clients like Z may indeed care.  And can very easily add whatever ext=
ra attributes (and expense) to meet their needs.

The second issue is similar: an empty list of lists.  Here we only have one=
 type, e.g. list<list<T>>.  And so unordered_set<list<list<T>>> might right=
ly be concerned about this.  However only the author of the instantiation u=
nordered_set<list<list<T>>> can truly know if empty lists are a case of con=
cern or not.  And if so, can easily prescribe the minimal fix in exactly th=
e right place for their problem domain:

template <class Hasher =3D spooky>
struct my_container_hash
{
    template <class Container>
    std::size_t
    operator()(Container const & c) const noexcept
    {
        Hasher h;
        using std::hash_append;
        hash_append(h, c.size(), c);
        return static_cast<std::size_t>(h);
    }
};

....

std::unordered_set<list<list<T>>, my_container_hash<>> my_container.

Now we have an especially elegant solution to the problem:  We happily pay =
the price for prepending the hash of the outer container size, but have no =
desire/motivation to pay that price for the inner container size.  Again, e=
verybody ends up happy: paying for only as much as they need.

> It would be desirable to be able to use randomized hash functions,
> such as sip hash.

Agreed.  We are currently using a hash function that takes a seed:

template <class T, class Hasher =3D detail::spooky_wrapper>
class hardened_hash
    : public detail::hardened_hash_base <std::size_t>
{
    typedef detail::hardened_hash_base <std::size_t> base;
public:
    typedef T argument_type;
    using detail::hardened_hash_base <std::size_t>::result_type;

public:
    hardened_hash() =3D default;
    explicit hardened_hash(result_type seed)
        : base (seed)
    {
    }

    result_type
    operator() (argument_type const& key) const noexcept
    {
        Hasher h {base::seed()};
        hash_append (h, key);
        return static_cast<result_type> (h);
    }
};

It is not currently clear to us if this API should be standardized, or if i=
t should be left up to the programmer to provide this API if wanted.  Shown=
 above we have templated it on T.  But this actually is not necessary, and =
could be simplified to:

template <class Hasher =3D my_favorite_hasher>
class hardened_hash
{
    // store seed here...
public:

    hardened_hash() =3D default;
    explicit hardened_hash(result_type seed);

    template <class T>
    result_type
    operator() (T const& key) const noexcept
    {
        Hasher h {seed()};
        hash_append (h, key);
        return static_cast<result_type> (h);
    }
};

If we don't standardize this, hardened_hash demonstrates that it is still e=
asily available to the programmer with O(1) effort.

> I dislike using a cast to get the hash value from the Hasher.


This is an admittedly experimental interface.

> It would also be nice if the finalisation function could return types
> other than std::size_t.

I am interested in exploring this possibility as well.

> A type that is contiguously hashable can probably also use a binary
> comparison for equality checks, but I don't that's strictly a
> consequence, so it be better to have a slightly stronger trait which
> indicates: "x=3D=3Dy if and only if memcmp(addressof(x), addressof(y),
> sizeof(T)) =3D=3D 0". Could be called something like
> contiguously_equality_comparable.

We will give this more thought.

> Is it possible that there will be ambiguity if 'hash_append' is
> overloaded separately for a Hasher and a type?


We are hoping there is zero motivation to do so.  The goal is that hash_app=
end is ignorant of the Hasher.

Howard

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Thu, 3 Apr 2014 21:34:55 -0400
Raw View
On Apr 3, 2014, at 6:46 AM, Sebastian Gesemann <s.gesemann@gmail.com> wrote:

> As for hash functions I would like to point out another option:
> SipHash. It is fast but "cryotographic enough" to be able to avoid
> hash collision attacks. If anyone of you thinks about using
> unordered_map for some service where an attacker can choose keys for
> values to store, it should be hard for an attacker to let the
> unordered_map degenerate to a linked list. One way of avoiding this is
> to use SipHash with a randomly chosen key.
>
> https://131002.net/siphash/
>
> So far, SipHash is getting very popular. It's already used in Perl,
> Python, Redis, Ruby, Rust, systemd, OpenDNS, Haskell ...

I agree SipHash looks very attractive.  I've only glanced at it, but I believe it can easily be cast to meet the requirements of a Hasher.

Howard

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Thu, 3 Apr 2014 21:41:36 -0400
Raw View
On Apr 3, 2014, at 8:53 PM, Alex B <devalexb@gmail.com> wrote:

> For pair<string, string>, you only need to combine/append the size of the=
 first string, not both of them.
>=20

Actually for pair<string, string> you do not need the size for either.  The=
 proposal proposes to hash the trailing null of a string, even for empty st=
rings.  This is done to make it compatible (exactly the same) for the hash =
of string literals, which are always composed of at least one byte.

Howard

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Felipe Magno de Almeida <felipe.m.almeida@gmail.com>
Date: Thu, 3 Apr 2014 22:44:15 -0300
Raw View
On Thu, Apr 3, 2014 at 10:41 PM, Howard Hinnant
<howard.hinnant@gmail.com> wrote:
> On Apr 3, 2014, at 8:53 PM, Alex B <devalexb@gmail.com> wrote:
>
>> For pair<string, string>, you only need to combine/append the size of th=
e first string, not both of them.
>>
> Actually for pair<string, string> you do not need the size for either.  T=
he proposal proposes to hash the trailing null of a string, even for empty =
strings.  This is done to make it compatible (exactly the same) for the has=
h of string literals, which are always composed of at least one byte.

Wouldn't these two clash?

"\0" "abc"
"" "\0abc"

> Howard
>
> --

Regards,
--=20
Felipe Magno de Almeida

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Thu, 3 Apr 2014 21:45:19 -0400
Raw View
On Apr 3, 2014, at 6:10 PM, Andrew Tomazos <andrewtomazos@gmail.com> wrote:

>  I think a similar approach can be argued less rigorously with "nearly" s=
ecure hash functions.  It wasn't 100% clear if you have studied or consider=
ed this idea from the proposal.

This work was motivated by a desire by the programmer to apply hash functio=
n X, where the programmer gets to choose X, and not have it compromised by =
a combining step.  Algorithm X may be motivated by cryptography, or efficie=
ncy, or possibly designed for a known finite set of inputs (a special case,=
 non-secure, perfect hash).

Howard

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Thu, 3 Apr 2014 22:07:16 -0400
Raw View
On Apr 3, 2014, at 9:44 PM, Felipe Magno de Almeida <felipe.m.almeida@gmail=
..com> wrote:

> On Thu, Apr 3, 2014 at 10:41 PM, Howard Hinnant
> <howard.hinnant@gmail.com> wrote:
>> On Apr 3, 2014, at 8:53 PM, Alex B <devalexb@gmail.com> wrote:
>>=20
>>> For pair<string, string>, you only need to combine/append the size of t=
he first string, not both of them.
>>>=20
>> Actually for pair<string, string> you do not need the size for either.  =
The proposal proposes to hash the trailing null of a string, even for empty=
 strings.  This is done to make it compatible (exactly the same) for the ha=
sh of string literals, which are always composed of at least one byte.
>=20
> Wouldn't these two clash?
>=20
> "\0" "abc"
> "" "\0abc"

Absolutely yes.  So will these:

    make_pair(1u, 2)
    make_pair(1, 2u)

And so will these:

    string("abc") + string("def")
    string("abcdef")

Note in this latter case that also:

    string("abc") + string("def") =3D=3D string("abcdef")

and in the second case, it would be reasonable to amend the standard such t=
hat:

    make_pair(1u, 2) =3D=3D make_pair(1, 2u)

In all cases you've got two (or more) values (maybe the same type, maybe no=
t) and when their hashable states are concatenated, create the same contigu=
ous stream of bytes.  This *is* the feature.

If a type X has a pair of such types, and does not want this behavior, such=
 a type can easily insert unique bytes between these embedded types:

template <class Hasher>
void
hash_append(Hasher& h, const X& x)
{
   using std::hash_append;
   hash_append("string 1", x.string1_);
   hash_append("string 2", x.string2_);
}

The point of this system is that it puts the class author in complete contr=
ol of the hashed state of the object, without making the class author commi=
t to any particular hashing algorithm.  He only needs to write the hashing =
support code once.  And once done, his clients can hash his object with any=
 algorithm of *their* choosing (not his choosing).

Client of type X's responsibility:  Choose a generic hashing algorithm to h=
ash X with.

Author of type X's responsibility: Decide what bytes to present in X to an =
unknown hashing algorithm (and in what order).

Hashing algorithm author's responsibility:  Decide how to hash a stream of =
bytes.

None of these 3 programmers need coordinate with the other, beyond conformi=
ng to the universal requirements of Hasher and hash_append.

Howard

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Fri, 4 Apr 2014 00:10:15 -0400
Raw View
I believe this discussion has helped me realize one further requirement:

A type X should document how it exposes itself to a hash function.  For exa=
mple it is important for the class X author to know that its member data ve=
ctor<int> simply concatenates the hashable states of all those ints.  And t=
hat its member string simply concatenates the state of all those chars.

The type should clearly document how it exposes its state to the Hasher, so=
 that other types which embed it, can better decide how they want to expose=
 themselves to a Hasher.

For example if some type Person, which contains two std::strings, wants to =
expose itself to a Hasher, it is important for Person to know if string exp=
oses its length or not to the Hasher.  That way Person can make an informed=
 decision as to whether or not it wants to prepend (for example) extra data=
 to either of its two std::strings while exposing itself to the hasher (suc=
h as the string's size, or the color of its passport).

Such a decision is so much better made by Person, than it is by std::string=
..  If std::string did not clearly document what it exposed, then Person cou=
ld not portably decide how to expose / amend its two strings.

And even though all these types publicly disclose what bytes they are sendi=
ng to the hashing algorithm, the hashing algorithm can still be just as sec=
ure / reliable.  After all, the hashing algorithms simply say:  give me you=
r bytes unscrambled:  I will scramble them for you.  That's my job, not you=
rs.

Howard

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Bjorn Reese <breese@mail1.stofanet.dk>
Date: Fri, 04 Apr 2014 13:48:00 +0200
Raw View
On 04/03/2014 04:06 PM, Andrzej Krzemie=C5=84ski wrote:

> This technique of adapting types to the framework appears to me to be
> similar to what Boost.Serialization
> <http://www.boost.org/doc/libs/1_55_0/libs/serialization/doc/index.html>
> does. I wonder if it could be generalized to things beyond hashing, like
> serialization.

What a wonderful observation.

I just wrote a small hashing archive for Boost.Serialization, and it
worked pretty much out-of-the-box. Class X then becomes:

     class X
     {
         std::tuple<short, unsigned char, unsigned char> date_;
         std::vector<std::pair<int, int>> data_;

     public:
         X();

         template<typename T>
         void serialize(T& archive, const unsigned int)
         {
             archive & date_;
             archive & data_;
         }
     };

The question about how data items are combined can be delegate to
specializations of the non-intrusive serialize() functions.

     namespace boost { namespace serialization {

     template <typename Archive, typename T1, typename T2>
     void save(Archive& archive,
               const std::pair<T1, T2>& data,
               const unsigned int /* version */)
     {
         archive << data.first;
         archive << data.second;
     }

     // And a serialize() function that calls split_free
     }}

Now I can calculate the hash of class X using:

     X x;
     hasher::oarchive<hasher::fnv1a> archive;
     archive << x;
     std::size_t result =3D archive.get();

And I can serialize the same class to JSON without any changes to X:

     X x;
     std::ostringsteam output;
     json::oarchive archive(output);
     archive << x;
     std::string result =3D output.str();

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: dixlorenz@gmail.com
Date: Fri, 4 Apr 2014 10:28:18 -0700 (PDT)
Raw View
------=_Part_147_1262393.1396632498620
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

I like the idea; I think it could be even more generic and be templated on=
=20
the resulting hash type.

When specialising std::hash the resulting type obviously has to be=20
std::size_t; but with this concept the Hasher could actually return=20
a std::array<unsigned char, 16> (MD5), std::array<unsigned char, 20> (SHA1)=
=20
etc... Such a Hasher wouldn't be directly usable for a std::unordered_map=
=20
any more, but it would open up different uses for it without (AFAICT)=20
affecting the normal use with Hashers that return a std::size_t.

On Thursday, April 3, 2014 1:09:22 AM UTC+2, Howard Hinnant wrote:
>
>
> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blo=
b/master/hashing.html=20
>
> We=E2=80=99re aiming at the pre-Rapperswil mailing.  Any feedback, positi=
ve or=20
> negative, will be much appreciated.  The paper currently lacks proposed=
=20
> wording.  It is more of a technology demonstration.  There=E2=80=99s a co=
-author=20
> spot open for anyone who wants to turn the demonstration into concrete=20
> proposed wording. :-)=20
>
> Howard=20
>
>

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

------=_Part_147_1262393.1396632498620
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">I like the idea; I think it could be even more generic and=
 be templated on the resulting hash type.<div><br></div><div>When specialis=
ing std::hash the resulting type obviously has to be std::size_t; but with =
this concept the Hasher could actually return a&nbsp;std::array&lt;unsigned=
 char, 16&gt; (MD5),&nbsp;std::array&lt;unsigned char, 20&gt; (SHA1) etc...=
 Such a Hasher wouldn't be directly usable for a std::unordered_map any mor=
e, but it would open up different uses for it without (AFAICT) affecting th=
e normal use with Hashers that return a std::size_t.<br><br>On Thursday, Ap=
ril 3, 2014 1:09:22 AM UTC+2, Howard Hinnant wrote:<blockquote class=3D"gma=
il_quote" style=3D"margin: 0;margin-left: 0.8ex;border-left: 1px #ccc solid=
;padding-left: 1ex;"><a href=3D"http://htmlpreview.github.io/?https://githu=
b.com/HowardHinnant/papers/blob/master/hashing.html" target=3D"_blank" onmo=
usedown=3D"this.href=3D'http://www.google.com/url?q\75http%3A%2F%2Fhtmlprev=
iew.github.io%2F%3Fhttps%3A%2F%2Fgithub.com%2FHowardHinnant%2Fpapers%2Fblob=
%2Fmaster%2Fhashing.html\46sa\75D\46sntz\0751\46usg\75AFQjCNHP0pXriy_dMNhgy=
flMJ_r9qoA63Q';return true;" onclick=3D"this.href=3D'http://www.google.com/=
url?q\75http%3A%2F%2Fhtmlpreview.github.io%2F%3Fhttps%3A%2F%2Fgithub.com%2F=
HowardHinnant%2Fpapers%2Fblob%2Fmaster%2Fhashing.html\46sa\75D\46sntz\0751\=
46usg\75AFQjCNHP0pXriy_dMNhgyflMJ_r9qoA63Q';return true;">http://htmlprevie=
w.github.io/?<wbr>https://github.com/<wbr>HowardHinnant/papers/blob/<wbr>ma=
ster/hashing.html</a>
<br>
<br>We=E2=80=99re aiming at the pre-Rapperswil mailing. &nbsp;Any feedback,=
 positive or negative, will be much appreciated. &nbsp;The paper currently =
lacks proposed wording. &nbsp;It is more of a technology demonstration. &nb=
sp;There=E2=80=99s a co-author spot open for anyone who wants to turn the d=
emonstration into concrete proposed wording. :-)
<br>
<br>Howard
<br>
<br></blockquote></div></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

------=_Part_147_1262393.1396632498620--

.


Author: Sean Middleditch <sean@middleditch.us>
Date: Fri, 4 Apr 2014 16:38:03 -0700
Raw View
On Fri, Apr 4, 2014 at 10:28 AM,  <dixlorenz@gmail.com> wrote:
> I like the idea; I think it could be even more generic and be templated on
> the resulting hash type.

The general interface has no reliance on the result type. The types
being hashed operate on a template "hasher" type with no assumptions
about what the hasher ultimately produces. The example also shows that
a static_cast<size_t>() is used at least in one case, indicating that
a hasher could provide operator size_t(), operator std::array<char,
16>(), or whatever else it likes. I 'm not seeing any reason that a
hasher with a result type incompatible with std::hash (no size_t
output) couldn't be used with the general hash_append machinery. I
think this design satisfies your needs already here.

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

.


Author: Geoffrey Romer <gromer@google.com>
Date: Fri, 4 Apr 2014 22:41:45 -0700
Raw View
--047d7bdc7826028c6004f64517a0
Content-Type: text/plain; charset=UTF-8

It might be helpful if the paper explicitly discussed how it relates to
N3333 and N3898, e.g. where any points of agreement or disagreement are,
etc.
On Apr 2, 2014 4:09 PM, "Howard Hinnant" <howard.hinnant@gmail.com> wrote:

>
> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html
>
> We're aiming at the pre-Rapperswil mailing.  Any feedback, positive or
> negative, will be much appreciated.  The paper currently lacks proposed
> wording.  It is more of a technology demonstration.  There's a co-author
> spot open for anyone who wants to turn the demonstration into concrete
> proposed wording. :-)
>
> Howard
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "ISO C++ Standard - Future Proposals" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to std-proposals+unsubscribe@isocpp.org.
> To post to this group, send email to std-proposals@isocpp.org.
> Visit this group at
> http://groups.google.com/a/isocpp.org/group/std-proposals/.
>

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

--047d7bdc7826028c6004f64517a0
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<p dir=3D"ltr">It might be helpful if the paper explicitly discussed how it=
 relates to N3333 and N3898, e.g. where any points of agreement or disagree=
ment are, etc.</p>
<div class=3D"gmail_quote">On Apr 2, 2014 4:09 PM, &quot;Howard Hinnant&quo=
t; &lt;<a href=3D"mailto:howard.hinnant@gmail.com">howard.hinnant@gmail.com=
</a>&gt; wrote:<br type=3D"attribution"><blockquote class=3D"gmail_quote" s=
tyle=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<a href=3D"http://htmlpreview.github.io/?https://github.com/HowardHinnant/p=
apers/blob/master/hashing.html" target=3D"_blank">http://htmlpreview.github=
..io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html</a><b=
r>

<br>
We&#39;re aiming at the pre-Rapperswil mailing. =C2=A0Any feedback, positiv=
e or negative, will be much appreciated. =C2=A0The paper currently lacks pr=
oposed wording. =C2=A0It is more of a technology demonstration. =C2=A0There=
&#39;s a co-author spot open for anyone who wants to turn the demonstration=
 into concrete proposed wording. :-)<br>

<br>
Howard<br>
<br>
--<br>
<br>
---<br>
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br>
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals%2Bunsubscribe@isocpp.org">std-propo=
sals+unsubscribe@isocpp.org</a>.<br>
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br>
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/" target=3D"_blank">http://groups.google.com/a/isocpp.org/gro=
up/std-proposals/</a>.<br>
</blockquote></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

--047d7bdc7826028c6004f64517a0--

.


Author: dixlorenz@gmail.com
Date: Sat, 5 Apr 2014 00:19:44 -0700 (PDT)
Raw View
------=_Part_1386_3438840.1396682384803
Content-Type: text/plain; charset=UTF-8


>
> The general interface has no reliance on the result type. The types

being hashed operate on a template "hasher" type with no assumptions
> about what the hasher ultimately produces. The example also shows that
> a static_cast<size_t>() is used at least in one case, indicating that
> a hasher could provide operator size_t(), operator std::array<char,
> 16>(), or whatever else it likes.


Exactly, it doesn't matter what the hasher produces, uhash will always
result in a std::size_t. That is my point, uhash could be templated on the
result type and then the user could decide if he wants the "actual" hash
result, a std::size_t or whatever. As it stands, if I had an MD5 hasher I
would have to replicate uhash::operator() to get an actual MD5 hash value;
if uhash was templated on the result type I could reuse uhash everywhere
for every hash function.

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

------=_Part_1386_3438840.1396682384803
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><blockquote class=3D"gmail_quote" style=3D"margin: 0;margi=
n-left: 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;">The general i=
nterface has no reliance on the result type. The types&nbsp;</blockquote><b=
lockquote class=3D"gmail_quote" style=3D"margin: 0;margin-left: 0.8ex;borde=
r-left: 1px #ccc solid;padding-left: 1ex;">being hashed operate on a templa=
te "hasher" type with no assumptions
<br>about what the hasher ultimately produces. The example also shows that
<br>a static_cast&lt;size_t&gt;() is used at least in one case, indicating =
that
<br>a hasher could provide operator size_t(), operator std::array&lt;char,
<br>16&gt;(), or whatever else it likes. </blockquote><div><br></div><div>E=
xactly, it doesn't matter what the hasher produces, uhash will always resul=
t in a std::size_t. That is my point, uhash could be templated on the resul=
t type and then the user could decide if he wants the "actual" hash result,=
 a std::size_t or whatever. As it stands, if I had an MD5 hasher I would ha=
ve to replicate uhash::operator() to get an actual MD5 hash value; if uhash=
 was templated on the result type I could reuse uhash everywhere for every =
hash function.</div></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

------=_Part_1386_3438840.1396682384803--

.


Author: David Krauss <potswa@gmail.com>
Date: Sat, 5 Apr 2014 15:28:04 +0800
Raw View
--Apple-Mail=_FFD02DB0-9C8C-4C64-9D7F-BC5E7F5AA3D1
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=ISO-8859-1


On 2014-04-05, at 3:19 PM, dixlorenz@gmail.com wrote:

> The general interface has no reliance on the result type. The types=20
> being hashed operate on a template "hasher" type with no assumptions=20
> about what the hasher ultimately produces. The example also shows that=20
> a static_cast<size_t>() is used at least in one case, indicating that=20
> a hasher could provide operator size_t(), operator std::array<char,=20
> 16>(), or whatever else it likes.
>=20
> Exactly, it doesn't matter what the hasher produces, uhash will always re=
sult in a std::size_t. That is my point, uhash could be templated on the re=
sult type and then the user could decide if he wants the "actual" hash resu=
lt, a std::size_t or whatever. As it stands, if I had an MD5 hasher I would=
 have to replicate uhash::operator() to get an actual MD5 hash value; if uh=
ash was templated on the result type I could reuse uhash everywhere for eve=
ry hash function.

I'm working on a generalized alternative interface... hoped to post it yest=
erday... stay tuned...

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

--Apple-Mail=_FFD02DB0-9C8C-4C64-9D7F-BC5E7F5AA3D1
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset=ISO-8859-1

<html><head><meta http-equiv=3D"Content-Type" content=3D"text/html charset=
=3Dwindows-1252"></head><body style=3D"word-wrap: break-word; -webkit-nbsp-=
mode: space; -webkit-line-break: after-white-space;"><br><div><div>On 2014&=
ndash;04&ndash;05, at 3:19 PM, <a href=3D"mailto:dixlorenz@gmail.com">dixlo=
renz@gmail.com</a> wrote:</div><br class=3D"Apple-interchange-newline"><blo=
ckquote type=3D"cite"><div dir=3D"ltr"><blockquote class=3D"gmail_quote" st=
yle=3D"margin: 0;margin-left: 0.8ex;border-left: 1px #ccc solid;padding-lef=
t: 1ex;">The general interface has no reliance on the result type. The type=
s&nbsp;</blockquote><blockquote class=3D"gmail_quote" style=3D"margin: 0;ma=
rgin-left: 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;">being hash=
ed operate on a template "hasher" type with no assumptions
<br>about what the hasher ultimately produces. The example also shows that
<br>a static_cast&lt;size_t&gt;() is used at least in one case, indicating =
that
<br>a hasher could provide operator size_t(), operator std::array&lt;char,
<br>16&gt;(), or whatever else it likes. </blockquote><div><br></div><div>E=
xactly, it doesn't matter what the hasher produces, uhash will always resul=
t in a std::size_t. That is my point, uhash could be templated on the resul=
t type and then the user could decide if he wants the "actual" hash result,=
 a std::size_t or whatever. As it stands, if I had an MD5 hasher I would ha=
ve to replicate uhash::operator() to get an actual MD5 hash value; if uhash=
 was templated on the result type I could reuse uhash everywhere for every =
hash function.</div></div></blockquote><div><br></div><div>I&rsquo;m workin=
g on a generalized alternative interface&hellip; hoped to post it yesterday=
&hellip; stay tuned&hellip;</div></div><br></body></html>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

--Apple-Mail=_FFD02DB0-9C8C-4C64-9D7F-BC5E7F5AA3D1--

.


Author: Daniel James <dnljms@gmail.com>
Date: Sat, 5 Apr 2014 08:57:48 +0100
Raw View
On 4 April 2014 02:33, Howard Hinnant <howard.hinnant@gmail.com> wrote:
> On Apr 2, 2014, at 10:04 PM, Daniel James <dnljms@gmail.com> wrote:
>
>> A problem with Hasher as it's currently defined is that it doesn't
>> take into account the beginning and end of lists. For example, the
>> list [[1,2],[3]] will hash to the same value as [[1],[2,3]] and a list
>> of empty lists will always hash to the same value regardless of its
>> length.
>
> Thanks for pointing out these issues.
>
> The first issue looks like two different types in C++:  list [[1,2],[3]] =
and list [[1],[2,3]].  There are two different classes of clients to hash_a=
ppend:

They were meant to be the same type, If it wasn't clear I was using
JSON style lists. In C++ it would be:

std::vector<std::vector<int>> x{{1,2},{3}};
std::vector<std::vector<int>> y{{1},{2,3}};

These both have the same type. The hash algorithm just hashes the leaf
elements, which are the same, so they'll always hash to same value.

> 2.  Type Z, composed of X and Y, and is implementing its hash_append by h=
ash_append'ing hash_append(X) and hash_append(Y).  The author of Z may or m=
ay not be concerned that the hash of X and the hash of Y could be the same =
for degenerate cases.  If the author of Z does have such concerns, we can e=
asily deal with such issues in the hash_append for Z, for example:
>
> template <class Hasher>
> void
> hash_append(Hasher& h, const Z& z)
> {
>     using std::hash_append;

It might be better to use a 'boost::swap' style mechanism to call
hash_append in order to avoid requiring 'using std::hash_append' which
is easily forgotten.

>     hash_append("list 1", z.list1_);
>     hash_append("list 2", z.list2_);
> }
>
> This seems like the ideal:  "pay for the feature only if you need it" sol=
ution.

If an attacker knew you used that method, they could possibly force
collisions for that algorithm. For example with strings, I think
{"","list 2\0"} and {"\0list 2", ""} will hash to the same value.

If there's a choice between an unsafe, fast method and a safe, but
slower method, IMO the default should be the safe method. The
processing requirement is not that large.

>unordered_set<X> doesn't care that it may hash(X) with collisions generate=
d by unordered_set<Y>

I only wrote about problems when hashing the same type.

> The second issue is similar: an empty list of lists.

The problem is for a list containing empty lists. Appending them is a
no-op, so it doesn't alter the hash value. For example, these two have
the same length, but will hash to the same value:

std::list<std::list<int> > x{{},{1},{},{2}};
std::list<std::list<int> > y{{1},{},{},{2}};

>> It would be desirable to be able to use randomized hash functions,
>> such as sip hash.
>
> Agreed.  We are currently using a hash function that takes a seed:
[snip]
>     hardened_hash() =3D default;

This default constructor is problematic. It's very easy to
accidentally use a default constructed hash function, so in order to
be safe it should generate a random hash, which would be surprising
and potentially expensive. My point here was that the default
constructor shouldn't be required.

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Daniel James <dnljms@gmail.com>
Date: Sat, 5 Apr 2014 09:05:17 +0100
Raw View
On 4 April 2014 12:48, Bjorn Reese <breese@mail1.stofanet.dk> wrote:
> On 04/03/2014 04:06 PM, Andrzej Krzemie=F1ski wrote:
>
>> This technique of adapting types to the framework appears to me to be
>> similar to what Boost.Serialization
>> <http://www.boost.org/doc/libs/1_55_0/libs/serialization/doc/index.html>
>>
>> does. I wonder if it could be generalized to things beyond hashing, like
>> serialization.
>
>
> What a wonderful observation.

It that's done, there will still need to be a separate hook for
creating a different hash function as serialization and hashing are
often concerned with different things. For example, a unicode string
might want to use a normalized form for comparison, but serialize the
original encoding to be lossless. Another example is an object which
caches its hash value so it doesn't have to be recomputed every time.

My worry here is that someone wouldn't consider a particular use case
when implementing the general function. Later someone else would use
it and assume that it'd work as there are no compile errors. So they
might find that their elements in an unordered_set aren't unique, or
that they have lossy serialization. I'm not sure how likely that is.

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Bjorn Reese <breese@mail1.stofanet.dk>
Date: Sat, 05 Apr 2014 14:13:04 +0200
Raw View
On 04/05/2014 10:05 AM, Daniel James wrote:

> It that's done, there will still need to be a separate hook for
> creating a different hash function as serialization and hashing are
> often concerned with different things. For example, a unicode string

I agree completely with this. I will even raise the ante and claim that
we also need to distinguish between different types of serialization
(e.g. some formats use type-length-value to serialize containers
whereas others use delimiters), and perhaps even between different
types of hashing.

There is actually a need for two different variation points. One for
the user-defined X::serialize() member function as you mentioned, and
one for the non-intrusive counterparts used to serialize classes like
std::pair and std::vector.

The latter could have been handled easily if C++ had had partial
specialization for template functions. However, we delegating the
serialization to a functor to do the partial specialization. Here is
an actual example:


http://sourceforge.net/p/protoc/code/ci/master/tree/include/protoc/json/serialization.hpp

http://sourceforge.net/p/protoc/code/ci/master/tree/include/protoc/json/map.hpp

Regarding the first variation point, there are several potential
solutions. For example, Boost.Serialization already has a way of
distinguishing between the direction of serialization (saving and
loading) via its split functionality:

   http://www.boost.org/libs/serialization/doc/tutorial.html#splitting

This could be generalized somehow, although that would not be my
favorite solution.

Another solution is to overload X::serialize() on the archive type. For
instance:

     class X
     {
         std::tuple<short, unsigned char, unsigned char> date_;
         std::vector<std::pair<int, int>> data_;

     public:
         X();

         // Catch all
         template<typename T>
         void serialize(T& archive, const unsigned int)
         {
             archive & date_;
             archive & data_;
         }
         // Specialization for hashing
         template<typename T>
         void serialize(hasher::oarchive<T>& archive, const unsigned int)
         {
             archive & date_;
             // data_ is not hashed
         }
     };

A third solution would be to use tag-based dispatching.

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

.


Author: Chandler Carruth <chandlerc@googlers.com>
Date: Sat, 5 Apr 2014 09:42:16 -0700
Raw View
--047d7b5d524e35e50704f64e510b
Content-Type: text/plain; charset=UTF-8

On Wed, Apr 2, 2014 at 4:09 PM, Howard Hinnant <howard.hinnant@gmail.com>wrote:

>
> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html
>
> We're aiming at the pre-Rapperswil mailing.  Any feedback, positive or
> negative, will be much appreciated.


As Geoffrey Romer has mentioned, I too want to emphasize that I think it is
a waste of our time to look at further papers on hashing which fail to even
respond to the several existing papers on hashing, or to the fairly
extensive discussion on these matters that has taken place in the committee.

The last time this was discussed there seemed to be some consensus from the
group that the direction forward would be to update N3333 and add support
for consistent / predictable result hashes to it.

I don't see anything incompatible with N3333 and your paper other than the
fundamental problem of using an append method on a class as the fundamental
mechanism of composition has serious optimization problems in applications.
(Please note, these problems will *not* show up in benchmarks or other
small tests. This was discussed somewhat extensively in Issaquah, so I'm
hesitant to re-(forgive me)-hash it here.)

-Chandler


>  The paper currently lacks proposed wording.  It is more of a technology
> demonstration.  There's a co-author spot open for anyone who wants to turn
> the demonstration into concrete proposed wording. :-)
>
> Howard
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "ISO C++ Standard - Future Proposals" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to std-proposals+unsubscribe@isocpp.org.
> To post to this group, send email to std-proposals@isocpp.org.
> Visit this group at
> http://groups.google.com/a/isocpp.org/group/std-proposals/.
>

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

--047d7b5d524e35e50704f64e510b
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div class=3D"gmail_extra"><div class=3D"gmail_quote">On W=
ed, Apr 2, 2014 at 4:09 PM, Howard Hinnant <span dir=3D"ltr">&lt;<a href=3D=
"mailto:howard.hinnant@gmail.com" target=3D"_blank">howard.hinnant@gmail.co=
m</a>&gt;</span> wrote:<br>
<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex"><a href=3D"http://htmlpreview.github.io/?htt=
ps://github.com/HowardHinnant/papers/blob/master/hashing.html" target=3D"_b=
lank">http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers=
/blob/master/hashing.html</a><br>

<br>
We&#39;re aiming at the pre-Rapperswil mailing. =C2=A0Any feedback, positiv=
e or negative, will be much appreciated.</blockquote><div><br></div><div>As=
 Geoffrey Romer has mentioned, I too want to emphasize that I think it is a=
 waste of our time to look at further papers on hashing which fail to even =
respond to the several existing papers on hashing, or to the fairly extensi=
ve discussion on these matters that has taken place in the committee.</div>
<div><br></div><div>The last time this was discussed there seemed to be som=
e consensus from the group that the direction forward would be to update N3=
333 and add support for consistent / predictable result hashes to it.</div>
<div><br></div><div>I don&#39;t see anything incompatible with N3333 and yo=
ur paper other than the fundamental problem of using an append method on a =
class as the fundamental mechanism of composition has serious optimization =
problems in applications. (Please note, these problems will *not* show up i=
n benchmarks or other small tests. This was discussed somewhat extensively =
in Issaquah, so I&#39;m hesitant to re-(forgive me)-hash it here.)</div>
<div><br></div><div>-Chandler</div><div>=C2=A0</div><blockquote class=3D"gm=
ail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-le=
ft:1ex"> =C2=A0The paper currently lacks proposed wording. =C2=A0It is more=
 of a technology demonstration. =C2=A0There&#39;s a co-author spot open for=
 anyone who wants to turn the demonstration into concrete proposed wording.=
 :-)<br>

<span class=3D"HOEnZb"><font color=3D"#888888"><br>
Howard<br>
<br>
--<br>
<br>
---<br>
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br>
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals%2Bunsubscribe@isocpp.org">std-propo=
sals+unsubscribe@isocpp.org</a>.<br>
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br>
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/" target=3D"_blank">http://groups.google.com/a/isocpp.org/gro=
up/std-proposals/</a>.<br>
</font></span></blockquote></div><br></div></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

--047d7b5d524e35e50704f64e510b--

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Sat, 5 Apr 2014 12:56:30 -0400
Raw View
On Apr 5, 2014, at 12:42 PM, Chandler Carruth <chandlerc@googlers.com> wrot=
e:

> As Geoffrey Romer has mentioned, I too want to emphasize that I think it =
is a waste of our time to look at further papers on hashing which fail to e=
ven respond to the several existing papers on hashing, or to the fairly ext=
ensive discussion on these matters that has taken place in the committee.
>=20
> The last time this was discussed there seemed to be some consensus from t=
he group that the direction forward would be to update N3333 and add suppor=
t for consistent / predictable result hashes to it.
>=20
> I don't see anything incompatible with N3333 and your paper other than th=
e fundamental problem of using an append method on a class as the fundament=
al mechanism of composition has serious optimization problems in applicatio=
ns. (Please note, these problems will *not* show up in benchmarks or other =
small tests. This was discussed somewhat extensively in Issaquah, so I'm he=
sitant to re-(forgive me)-hash it here.)
>=20

My apologies.  N3333 did not go unread by me.  I shall insert the appropria=
te references.

Howard

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Sun, 6 Apr 2014 17:12:52 -0400
Raw View
Updated version with many suggestions from this discussion implemented:

http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html

If I have mis-credited or failed to credit anyone, please let me know.  Further constructive criticism welcome.  Thanks to all who have helped us make this a much better paper.

Howard

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

.


Author: Richard Smith <richard@metafoo.co.uk>
Date: Sun, 6 Apr 2014 16:47:51 -0700
Raw View
--bcaec51dd5510846e004f66861f6
Content-Type: text/plain; charset=ISO-8859-1

On Sun, Apr 6, 2014 at 2:12 PM, Howard Hinnant <howard.hinnant@gmail.com>wrote:

> Updated version with many suggestions from this discussion implemented:
>
>
> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html
>
> If I have mis-credited or failed to credit anyone, please let me know.
>  Further constructive criticism welcome.  Thanks to all who have helped us
> make this a much better paper.


It might be useful to explain how sending the size last satisfies Rule 2 --
without careful study, it might seem to be possible to construct two
vector<vector<int>>s where the sizes of one are elements of the other.
Further, it seems that this only works because your data blocks are
self-delimiting when read from right to left. I think this also needs to be
part of the documented contract, since using types that don't follow this
rule (but do follow Rule 1 and Rule 2) can lead to hash collisions.

For instance, if I add MyVector<T> that's just like vector<T> but hashes
the size first, then these collide:

typedef std::pair<vector<int>, MyVector<int>> P;
P x = { {0}, {} };
P y = { {}, {0} };

(both of them give the hasher the input "0 1 0").

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

--bcaec51dd5510846e004f66861f6
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div class=3D"gmail_extra"><div class=3D"gmail_quote">On S=
un, Apr 6, 2014 at 2:12 PM, Howard Hinnant <span dir=3D"ltr">&lt;<a href=3D=
"mailto:howard.hinnant@gmail.com" target=3D"_blank">howard.hinnant@gmail.co=
m</a>&gt;</span> wrote:<br>
<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex">Updated version with many suggestions from t=
his discussion implemented:<br>
<br>
<a href=3D"http://htmlpreview.github.io/?https://github.com/HowardHinnant/p=
apers/blob/master/hashing.html" target=3D"_blank">http://htmlpreview.github=
..io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html</a><b=
r>

<br>
If I have mis-credited or failed to credit anyone, please let me know. =A0F=
urther constructive criticism welcome. =A0Thanks to all who have helped us =
make this a much better paper.</blockquote><div><br></div><div>It might be =
useful to explain how sending the size last satisfies Rule 2 -- without car=
eful study, it might seem to be possible to construct two vector&lt;vector&=
lt;int&gt;&gt;s where the sizes of one are elements of the other. Further, =
it seems that this only works because your data blocks are self-delimiting =
when read from right to left. I think this also needs to be part of the doc=
umented contract, since using types that don&#39;t follow this rule (but do=
 follow Rule 1 and Rule 2) can lead to hash collisions.</div>
<div><br></div><div>For instance, if I add MyVector&lt;T&gt; that&#39;s jus=
t like vector&lt;T&gt; but hashes the size first, then these collide:</div>=
<div><br></div><div>typedef std::pair&lt;vector&lt;int&gt;, MyVector&lt;int=
&gt;&gt; P;</div>
<div>P x =3D { {0}, {} };</div><div>P y =3D { {}, {0} };</div><div><br></di=
v><div>(both of them give the hasher the input &quot;0 1 0&quot;).</div></d=
iv></div></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

--bcaec51dd5510846e004f66861f6--

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Sun, 6 Apr 2014 20:36:57 -0400
Raw View
On Apr 6, 2014, at 7:47 PM, Richard Smith <richard@metafoo.co.uk> wrote:

> On Sun, Apr 6, 2014 at 2:12 PM, Howard Hinnant <howard.hinnant@gmail.com>=
 wrote:
>> Updated version with many suggestions from this discussion implemented:
>>=20
>> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/bl=
ob/master/hashing.html
>>=20
>> If I have mis-credited or failed to credit anyone, please let me know.  =
Further constructive criticism welcome.  Thanks to all who have helped us m=
ake this a much better paper.
>>=20
> It might be useful to explain how sending the size last satisfies Rule 2 =
-- without careful study, it might seem to be possible to construct two vec=
tor<vector<int>>s where the sizes of one are elements of the other.

I don't have a theoretical proof that there can be no collision.  If anyone=
 sees one, I would welcome it.  I have not come up with a collision scenari=
o beyond an attacker somehow inserting their own type+message as you show b=
elow.

> Further, it seems that this only works because your data blocks are self-=
delimiting when read from right to left.

You are speaking of the size as the delimiter?

> I think this also needs to be part of the documented contract, since usin=
g types that don't follow this rule (but do follow Rule 1 and Rule 2) can l=
ead to hash collisions.

I fully agree in the strongest terms.  I will update the paper to make this=
 clear.  Type's messages that are sent to hash_append are part of their pub=
lic API.  Other types will need to depend on the details of these messages =
to avoid collisions (e.g. is the container size prepended or appended?), as=
 you note  below:

> For instance, if I add MyVector<T> that's just like vector<T> but hashes =
the size first, then these collide:
>=20
> typedef std::pair<vector<int>, MyVector<int>> P;
> P x =3D { {0}, {} };
> P y =3D { {}, {0} };
>=20
> (both of them give the hasher the input "0 1 0").

Thanks,
Howard

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Geoffrey Romer <gromer@google.com>
Date: Mon, 7 Apr 2014 14:50:58 -0700
Raw View
--001a11c300a6e073e904f67adc5b
Content-Type: text/plain; charset=UTF-8

The paper still doesn't give me a good sense of how it relates to N3333. In
particular, it's confusing that the paper begins by posing the question
"How do we write the hash function for X?", but doesn't consider N3333's
answer to that question as one of the options, instead treating it as a
sort of footnote to N3876. N3333 seems to be strictly superior to N3876 as
far as your paper is concerned, and N3333 seemed to have stronger support
in Issaquah, so N3333 would be a better primary point of comparison. As a
related editorial comment, I'd suggest that when referring to hash_combine,
you should be explicit about which paper's proposal you're referring to
("What about using the proposed std::hash_combine you rightly ask?" is one
example where that's a problem).

The paper also doesn't address the optimization concerns Chandler raised in
Issaquah, but to be fair I'm not sure how it could, since I'm not aware of
any record of those concerns apart from some very terse notes in the
meeting wiki.

As best I can tell, the primary point of disagreement between N3333 and
your paper is the fact that N3333's API does not permit the hash algorithm
to be customized independently of the per-type hashing logic (although it
seems to me that it's overstating the case to say that it would "hard-code
into the standard a single hash concatenation algorithm", since it does not
dictate any particular algorithm). On the other hand, your proposal does
allow such customization, but reportedly pays for it with substantially
degraded performance. That being the case, it would probably be helpful to
explicitly discuss the perceived benefits of making the library generic
with respect to the hash algorithm. My hunch is that these benefits will
mostly apply outside the core use case of hashing data for lookup in a
transient, process-local hash table; if that's the case, you should also
address N3333's proposal that std::hash and related APIs should focus
solely on that use case.


On Sun, Apr 6, 2014 at 2:12 PM, Howard Hinnant <howard.hinnant@gmail.com>wrote:

> Updated version with many suggestions from this discussion implemented:
>
>
> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html
>
> If I have mis-credited or failed to credit anyone, please let me know.
>  Further constructive criticism welcome.  Thanks to all who have helped us
> make this a much better paper.
>
> Howard
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "ISO C++ Standard - Future Proposals" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to std-proposals+unsubscribe@isocpp.org.
> To post to this group, send email to std-proposals@isocpp.org.
> Visit this group at
> http://groups.google.com/a/isocpp.org/group/std-proposals/.
>

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

--001a11c300a6e073e904f67adc5b
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">The paper still doesn&#39;t give me a good sense of how it=
 relates to N3333. In particular, it&#39;s confusing that the paper begins =
by posing=C2=A0the question &quot;How do we write the hash function for X?&=
quot;, but doesn&#39;t consider N3333&#39;s answer to that question as one =
of the options, instead treating it as a sort of footnote to N3876. N3333 s=
eems to be strictly superior to=C2=A0N3876 as far as your paper is concerne=
d, and N3333 seemed to have stronger support in Issaquah, so N3333 would be=
 a better primary point of comparison. As a related editorial comment, I&#3=
9;d suggest that when referring to hash_combine, you should be explicit abo=
ut which paper&#39;s proposal you&#39;re referring to (&quot;What about usi=
ng the proposed std::hash_combine you rightly ask?&quot; is one example whe=
re that&#39;s a problem).<div>
<br></div><div>The paper also doesn&#39;t address the optimization concerns=
 Chandler raised in Issaquah, but to be fair I&#39;m not sure how it could,=
 since I&#39;m not aware of any record of those concerns apart from some ve=
ry terse notes in the meeting wiki.</div>
<div><br></div><div>As best I can tell, the primary point of disagreement b=
etween N3333 and your paper is the fact that N3333&#39;s API does not permi=
t the hash algorithm to be customized independently of the per-type hashing=
 logic (although it seems to me that it&#39;s overstating the case to say t=
hat it would &quot;hard-code into the standard a single hash concatenation =
algorithm&quot;, since it does not dictate any particular algorithm). On th=
e other hand, your proposal does allow such customization, but reportedly p=
ays for it with substantially degraded performance. That being the case, it=
 would probably be helpful to explicitly discuss the perceived benefits of =
making the library generic with respect to the hash algorithm. My hunch is =
that these benefits will mostly apply outside the core use case of hashing =
data for lookup in a transient, process-local hash table; if that&#39;s the=
 case, you should also address N3333&#39;s proposal that std::hash and rela=
ted APIs should focus solely on that use case.</div>
</div><div class=3D"gmail_extra"><br><br><div class=3D"gmail_quote">On Sun,=
 Apr 6, 2014 at 2:12 PM, Howard Hinnant <span dir=3D"ltr">&lt;<a href=3D"ma=
ilto:howard.hinnant@gmail.com" target=3D"_blank">howard.hinnant@gmail.com</=
a>&gt;</span> wrote:<br>
<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex">Updated version with many suggestions from t=
his discussion implemented:<br>
<br>
<a href=3D"http://htmlpreview.github.io/?https://github.com/HowardHinnant/p=
apers/blob/master/hashing.html" target=3D"_blank">http://htmlpreview.github=
..io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html</a><b=
r>

<br>
If I have mis-credited or failed to credit anyone, please let me know. =C2=
=A0Further constructive criticism welcome. =C2=A0Thanks to all who have hel=
ped us make this a much better paper.<br>
<span class=3D"HOEnZb"><font color=3D"#888888"><br>
Howard<br>
</font></span><div class=3D"HOEnZb"><div class=3D"h5"><br>
--<br>
<br>
---<br>
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br>
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals%2Bunsubscribe@isocpp.org">std-propo=
sals+unsubscribe@isocpp.org</a>.<br>
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br>
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/" target=3D"_blank">http://groups.google.com/a/isocpp.org/gro=
up/std-proposals/</a>.<br>
</div></div></blockquote></div><br></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

--001a11c300a6e073e904f67adc5b--

.


Author: jgottman6@gmail.com
Date: Mon, 7 Apr 2014 20:36:38 -0700 (PDT)
Raw View
------=_Part_1832_29277187.1396928198220
Content-Type: text/plain; charset=UTF-8



On Sunday, April 6, 2014 5:12:52 PM UTC-4, Howard Hinnant wrote:
>
> Updated version with many suggestions from this discussion implemented:
>
>
> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html
>
> If I have mis-credited or failed to credit anyone, please let me know.
>  Further constructive criticism welcome.  Thanks to all who have helped us
> make this a much better paper.
>
>
>
I think you should standardize type_erased_hasher as well.  It  seems
complex enough that this would be worthwhile. If it is standardized, you
can replace the use of std::function and std::ref with a class that does
the right thing and hides std::ref and std::function as implementation
details.

class type_erased_hasher {
public:
   template <class Hasher>
   explicit type_erased_hasher(Hasher &hasher)
     : h_(std::ref(hasher))
   {}

   void operator()(const void *key, size_t len) {
      h_(key, len);
   }

private:
   std::function<void (const void *, size_t)> h_;
};





--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

------=_Part_1832_29277187.1396928198220
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><br><br>On Sunday, April 6, 2014 5:12:52 PM UTC-4, Howard =
Hinnant wrote:<blockquote class=3D"gmail_quote" style=3D"margin: 0;margin-l=
eft: 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;">Updated version =
with many suggestions from this discussion implemented:
<br>
<br><a href=3D"http://htmlpreview.github.io/?https://github.com/HowardHinna=
nt/papers/blob/master/hashing.html" target=3D"_blank" onmousedown=3D"this.h=
ref=3D'http://www.google.com/url?q\75http%3A%2F%2Fhtmlpreview.github.io%2F%=
3Fhttps%3A%2F%2Fgithub.com%2FHowardHinnant%2Fpapers%2Fblob%2Fmaster%2Fhashi=
ng.html\46sa\75D\46sntz\0751\46usg\75AFQjCNHP0pXriy_dMNhgyflMJ_r9qoA63Q';re=
turn true;" onclick=3D"this.href=3D'http://www.google.com/url?q\75http%3A%2=
F%2Fhtmlpreview.github.io%2F%3Fhttps%3A%2F%2Fgithub.com%2FHowardHinnant%2Fp=
apers%2Fblob%2Fmaster%2Fhashing.html\46sa\75D\46sntz\0751\46usg\75AFQjCNHP0=
pXriy_dMNhgyflMJ_r9qoA63Q';return true;">http://htmlpreview.github.io/?<wbr=
>https://github.com/<wbr>HowardHinnant/papers/blob/<wbr>master/hashing.html=
</a>
<br>
<br>If I have mis-credited or failed to credit anyone, please let me know. =
&nbsp;Further constructive criticism welcome. &nbsp;Thanks to all who have =
helped us make this a much better paper.
<br>
<br>
<br></blockquote><div><br>I think you should standardize type_erased_hasher=
 as well.&nbsp; It&nbsp; seems complex enough that this would be worthwhile=
.. If it is standardized, you can replace the use of std::function and std::=
ref with a class that does the right thing and hides std::ref and std::func=
tion as implementation details.<br><br><div class=3D"prettyprint" style=3D"=
background-color: rgb(250, 250, 250); border-color: rgb(187, 187, 187); bor=
der-style: solid; border-width: 1px; word-wrap: break-word;"><code class=3D=
"prettyprint"><div class=3D"subprettyprint"><span style=3D"color: #008;" cl=
ass=3D"styled-by-prettify">class</span><span style=3D"color: #000;" class=
=3D"styled-by-prettify"> type_erased_hasher </span><span style=3D"color: #6=
60;" class=3D"styled-by-prettify">{</span><span style=3D"color: #000;" clas=
s=3D"styled-by-prettify"><br></span><span style=3D"color: #008;" class=3D"s=
tyled-by-prettify">public</span><span style=3D"color: #660;" class=3D"style=
d-by-prettify">:</span><span style=3D"color: #000;" class=3D"styled-by-pret=
tify"><br>&nbsp; &nbsp;</span><span style=3D"color: #008;" class=3D"styled-=
by-prettify">template</span><span style=3D"color: #000;" class=3D"styled-by=
-prettify"> </span><span style=3D"color: #660;" class=3D"styled-by-prettify=
">&lt;</span><span style=3D"color: #008;" class=3D"styled-by-prettify">clas=
s</span><span style=3D"color: #000;" class=3D"styled-by-prettify"> </span><=
span style=3D"color: #606;" class=3D"styled-by-prettify">Hasher</span><span=
 style=3D"color: #660;" class=3D"styled-by-prettify">&gt;</span><span style=
=3D"color: #000;" class=3D"styled-by-prettify"><br>&nbsp; &nbsp;</span><spa=
n style=3D"color: #008;" class=3D"styled-by-prettify">explicit</span><span =
style=3D"color: #000;" class=3D"styled-by-prettify"> type_erased_hasher</sp=
an><span style=3D"color: #660;" class=3D"styled-by-prettify">(</span><span =
style=3D"color: #606;" class=3D"styled-by-prettify">Hasher</span><span styl=
e=3D"color: #000;" class=3D"styled-by-prettify"> </span><span style=3D"colo=
r: #660;" class=3D"styled-by-prettify">&amp;</span><span style=3D"color: #0=
00;" class=3D"styled-by-prettify">hasher</span><span style=3D"color: #660;"=
 class=3D"styled-by-prettify">)</span><span style=3D"color: #000;" class=3D=
"styled-by-prettify"><br>&nbsp; &nbsp; &nbsp;</span><span style=3D"color: #=
660;" class=3D"styled-by-prettify">:</span><span style=3D"color: #000;" cla=
ss=3D"styled-by-prettify"> h_</span><span style=3D"color: #660;" class=3D"s=
tyled-by-prettify">(</span><span style=3D"color: #000;" class=3D"styled-by-=
prettify">std</span><span style=3D"color: #660;" class=3D"styled-by-prettif=
y">::</span><span style=3D"color: #008;" class=3D"styled-by-prettify">ref</=
span><span style=3D"color: #660;" class=3D"styled-by-prettify">(</span><spa=
n style=3D"color: #000;" class=3D"styled-by-prettify">hasher</span><span st=
yle=3D"color: #660;" class=3D"styled-by-prettify">))</span><span style=3D"c=
olor: #000;" class=3D"styled-by-prettify"><br>&nbsp; &nbsp;</span><span sty=
le=3D"color: #660;" class=3D"styled-by-prettify">{}</span><span style=3D"co=
lor: #000;" class=3D"styled-by-prettify"><br><br>&nbsp; &nbsp;</span><span =
style=3D"color: #008;" class=3D"styled-by-prettify">void</span><span style=
=3D"color: #000;" class=3D"styled-by-prettify"> </span><span style=3D"color=
: #008;" class=3D"styled-by-prettify">operator</span><span style=3D"color: =
#660;" class=3D"styled-by-prettify">()(</span><span style=3D"color: #008;" =
class=3D"styled-by-prettify">const</span><span style=3D"color: #000;" class=
=3D"styled-by-prettify"> </span><span style=3D"color: #008;" class=3D"style=
d-by-prettify">void</span><span style=3D"color: #000;" class=3D"styled-by-p=
rettify"> </span><span style=3D"color: #660;" class=3D"styled-by-prettify">=
*</span><span style=3D"color: #000;" class=3D"styled-by-prettify">key</span=
><span style=3D"color: #660;" class=3D"styled-by-prettify">,</span><span st=
yle=3D"color: #000;" class=3D"styled-by-prettify"> size_t len</span><span s=
tyle=3D"color: #660;" class=3D"styled-by-prettify">)</span><span style=3D"c=
olor: #000;" class=3D"styled-by-prettify"> </span><span style=3D"color: #66=
0;" class=3D"styled-by-prettify">{</span><span style=3D"color: #000;" class=
=3D"styled-by-prettify"><br>&nbsp; &nbsp; &nbsp; h_</span><span style=3D"co=
lor: #660;" class=3D"styled-by-prettify">(</span><span style=3D"color: #000=
;" class=3D"styled-by-prettify">key</span><span style=3D"color: #660;" clas=
s=3D"styled-by-prettify">,</span><span style=3D"color: #000;" class=3D"styl=
ed-by-prettify"> len</span><span style=3D"color: #660;" class=3D"styled-by-=
prettify">);</span><span style=3D"color: #000;" class=3D"styled-by-prettify=
"><br>&nbsp; &nbsp;</span><span style=3D"color: #660;" class=3D"styled-by-p=
rettify">}</span><span style=3D"color: #000;" class=3D"styled-by-prettify">=
<br><br></span><span style=3D"color: #008;" class=3D"styled-by-prettify">pr=
ivate</span><span style=3D"color: #660;" class=3D"styled-by-prettify">:</sp=
an><span style=3D"color: #000;" class=3D"styled-by-prettify"><br>&nbsp; &nb=
sp;std</span><span style=3D"color: #660;" class=3D"styled-by-prettify">::</=
span><span style=3D"color: #008;" class=3D"styled-by-prettify">function</sp=
an><span style=3D"color: #660;" class=3D"styled-by-prettify">&lt;</span><sp=
an style=3D"color: #008;" class=3D"styled-by-prettify">void</span><span sty=
le=3D"color: #000;" class=3D"styled-by-prettify"> </span><span style=3D"col=
or: #660;" class=3D"styled-by-prettify">(</span><span style=3D"color: #008;=
" class=3D"styled-by-prettify">const</span><span style=3D"color: #000;" cla=
ss=3D"styled-by-prettify"> </span><span style=3D"color: #008;" class=3D"sty=
led-by-prettify">void</span><span style=3D"color: #000;" class=3D"styled-by=
-prettify"> </span><span style=3D"color: #660;" class=3D"styled-by-prettify=
">*,</span><span style=3D"color: #000;" class=3D"styled-by-prettify"> size_=
t</span><span style=3D"color: #660;" class=3D"styled-by-prettify">)&gt;</sp=
an><span style=3D"color: #000;" class=3D"styled-by-prettify"> h_</span><spa=
n style=3D"color: #660;" class=3D"styled-by-prettify">;</span><span style=
=3D"color: #000;" class=3D"styled-by-prettify"><br></span><span style=3D"co=
lor: #660;" class=3D"styled-by-prettify">};</span><span style=3D"color: #00=
0;" class=3D"styled-by-prettify"> &nbsp; &nbsp;<br>&nbsp; &nbsp;<br></span>=
</div></code></div><br><br><br><br></div></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

------=_Part_1832_29277187.1396928198220--

.


Author: Christopher Jefferson <chris@bubblescope.net>
Date: Sat, 12 Apr 2014 10:30:58 +0100
Raw View
When it comes to avoiding hashing collisions, the only real way of to
prove no collisions is to prove that the data fed into hash_append is
enough to uniquely reconstruct the original object. I would personally
be unhappy about a design where I no implementation of hash_combine
could avoid collisions for (for example) a combination of standard
library classes/containers.

If you prefix with the size of containers rather than postfix, then it
is easy to prove there are no hash collision issues. The only problem
with doing this is forward_list and list, where you don't know the
size until you have reached the end of the object.

Another option would be to add a 'mark()' function to hash_append, for
use with variable-length objects. hash_append implementations which
didn't care about collisions could just ignore this function, ones
which did could use this in various ways to ensure collisions do not
occur.

This is not just made up on the page, this is what I use in my
personal hashing library. I personally implement 'mark()' to place a
randomly generated (at program start time) 32-bit int into the stream,
which I find provides an acceptably low (never occuring in practice)
level of collisions.

Chris



On 8 April 2014 04:36,  <jgottman6@gmail.com> wrote:
>
>
> On Sunday, April 6, 2014 5:12:52 PM UTC-4, Howard Hinnant wrote:
>>
>> Updated version with many suggestions from this discussion implemented:
>>
>>
>> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html
>>
>> If I have mis-credited or failed to credit anyone, please let me know.
>> Further constructive criticism welcome.  Thanks to all who have helped us
>> make this a much better paper.
>>
>>
>
> I think you should standardize type_erased_hasher as well.  It  seems
> complex enough that this would be worthwhile. If it is standardized, you can
> replace the use of std::function and std::ref with a class that does the
> right thing and hides std::ref and std::function as implementation details.
>
> class type_erased_hasher {
> public:
>    template <class Hasher>
>    explicit type_erased_hasher(Hasher &hasher)
>      : h_(std::ref(hasher))
>    {}
>
>    void operator()(const void *key, size_t len) {
>       h_(key, len);
>    }
>
> private:
>    std::function<void (const void *, size_t)> h_;
> };
>
>
>
>
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "ISO C++ Standard - Future Proposals" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to std-proposals+unsubscribe@isocpp.org.
> To post to this group, send email to std-proposals@isocpp.org.
> Visit this group at
> http://groups.google.com/a/isocpp.org/group/std-proposals/.

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Sat, 12 Apr 2014 11:26:00 +0100
Raw View
What is your easy proof that if you prefix with the size of containers rather than postfix, that there can be no collisions?  I don't doubt that you are correct that there will be no collisions.  But I fail to see how prefix vs postfix makes a difference.

Howard

On Apr 12, 2014, at 10:30 AM, Christopher Jefferson <chris@bubblescope.net> wrote:

> When it comes to avoiding hashing collisions, the only real way of to
> prove no collisions is to prove that the data fed into hash_append is
> enough to uniquely reconstruct the original object. I would personally
> be unhappy about a design where I no implementation of hash_combine
> could avoid collisions for (for example) a combination of standard
> library classes/containers.
>
> If you prefix with the size of containers rather than postfix, then it
> is easy to prove there are no hash collision issues. The only problem
> with doing this is forward_list and list, where you don't know the
> size until you have reached the end of the object.
>
> Another option would be to add a 'mark()' function to hash_append, for
> use with variable-length objects. hash_append implementations which
> didn't care about collisions could just ignore this function, ones
> which did could use this in various ways to ensure collisions do not
> occur.
>
> This is not just made up on the page, this is what I use in my
> personal hashing library. I personally implement 'mark()' to place a
> randomly generated (at program start time) 32-bit int into the stream,
> which I find provides an acceptably low (never occuring in practice)
> level of collisions.
>
> Chris
>
>
>
> On 8 April 2014 04:36,  <jgottman6@gmail.com> wrote:
>>
>>
>> On Sunday, April 6, 2014 5:12:52 PM UTC-4, Howard Hinnant wrote:
>>>
>>> Updated version with many suggestions from this discussion implemented:
>>>
>>>
>>> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html
>>>
>>> If I have mis-credited or failed to credit anyone, please let me know.
>>> Further constructive criticism welcome.  Thanks to all who have helped us
>>> make this a much better paper.
>>>
>>>
>>
>> I think you should standardize type_erased_hasher as well.  It  seems
>> complex enough that this would be worthwhile. If it is standardized, you can
>> replace the use of std::function and std::ref with a class that does the
>> right thing and hides std::ref and std::function as implementation details.
>>
>> class type_erased_hasher {
>> public:
>>   template <class Hasher>
>>   explicit type_erased_hasher(Hasher &hasher)
>>     : h_(std::ref(hasher))
>>   {}
>>
>>   void operator()(const void *key, size_t len) {
>>      h_(key, len);
>>   }
>>
>> private:
>>   std::function<void (const void *, size_t)> h_;
>> };
>>
>>
>>
>>
>>
>> --
>>
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "ISO C++ Standard - Future Proposals" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to std-proposals+unsubscribe@isocpp.org.
>> To post to this group, send email to std-proposals@isocpp.org.
>> Visit this group at
>> http://groups.google.com/a/isocpp.org/group/std-proposals/.
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
> To post to this group, send email to std-proposals@isocpp.org.
> Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

.


Author: Christopher Jefferson <chris@bubblescope.net>
Date: Mon, 14 Apr 2014 00:02:23 +0100
Raw View
--089e0160bea64eeeed04f6f48f98
Content-Type: text/plain; charset=UTF-8

On 12 Apr 2014 11:25, "Howard Hinnant" <howard.hinnant@gmail.com> wrote:
>
> What is your easy proof that if you prefix with the size of containers
rather than postfix, that there can be no collisions?  I don't doubt that
you are correct that there will be no collisions.  But I fail to see how
prefix vs postfix makes a difference.
>

The advantage of prefix is that first we read the length, then we know how
many values we have to read to get to the end of the object, so we can
reconstruct the original object easily from the values we feed to
hash_append. Similarly with recursive objects, we still always read the
length first, so know how far too read. With postfix we are never sure if
we are reading another value in the object, or the length at the end of it.
Of course it might turn out even with postfix you can prove there is only
one valid reading, but it is not so obvious to me.

> Howard
>
> On Apr 12, 2014, at 10:30 AM, Christopher Jefferson <chris@bubblescope.net>
wrote:
>
> > When it comes to avoiding hashing collisions, the only real way of to
> > prove no collisions is to prove that the data fed into hash_append is
> > enough to uniquely reconstruct the original object. I would personally
> > be unhappy about a design where I no implementation of hash_combine
> > could avoid collisions for (for example) a combination of standard
> > library classes/containers.
> >
> > If you prefix with the size of containers rather than postfix, then it
> > is easy to prove there are no hash collision issues. The only problem
> > with doing this is forward_list and list, where you don't know the
> > size until you have reached the end of the object.
> >
> > Another option would be to add a 'mark()' function to hash_append, for
> > use with variable-length objects. hash_append implementations which
> > didn't care about collisions could just ignore this function, ones
> > which did could use this in various ways to ensure collisions do not
> > occur.
> >
> > This is not just made up on the page, this is what I use in my
> > personal hashing library. I personally implement 'mark()' to place a
> > randomly generated (at program start time) 32-bit int into the stream,
> > which I find provides an acceptably low (never occuring in practice)
> > level of collisions.
> >
> > Chris
> >
> >
> >
> > On 8 April 2014 04:36,  <jgottman6@gmail.com> wrote:
> >>
> >>
> >> On Sunday, April 6, 2014 5:12:52 PM UTC-4, Howard Hinnant wrote:
> >>>
> >>> Updated version with many suggestions from this discussion
implemented:
> >>>
> >>>
> >>>
http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html
> >>>
> >>> If I have mis-credited or failed to credit anyone, please let me know.
> >>> Further constructive criticism welcome.  Thanks to all who have
helped us
> >>> make this a much better paper.
> >>>
> >>>
> >>
> >> I think you should standardize type_erased_hasher as well.  It  seems
> >> complex enough that this would be worthwhile. If it is standardized,
you can
> >> replace the use of std::function and std::ref with a class that does
the
> >> right thing and hides std::ref and std::function as implementation
details.
> >>
> >> class type_erased_hasher {
> >> public:
> >>   template <class Hasher>
> >>   explicit type_erased_hasher(Hasher &hasher)
> >>     : h_(std::ref(hasher))
> >>   {}
> >>
> >>   void operator()(const void *key, size_t len) {
> >>      h_(key, len);
> >>   }
> >>
> >> private:
> >>   std::function<void (const void *, size_t)> h_;
> >> };
> >>
> >>
> >>
> >>
> >>
> >> --
> >>
> >> ---
> >> You received this message because you are subscribed to the Google
Groups
> >> "ISO C++ Standard - Future Proposals" group.
> >> To unsubscribe from this group and stop receiving emails from it, send
an
> >> email to std-proposals+unsubscribe@isocpp.org.
> >> To post to this group, send email to std-proposals@isocpp.org.
> >> Visit this group at
> >> http://groups.google.com/a/isocpp.org/group/std-proposals/.
> >
> > --
> >
> > ---
> > You received this message because you are subscribed to the Google
Groups "ISO C++ Standard - Future Proposals" group.
> > To unsubscribe from this group and stop receiving emails from it, send
an email to std-proposals+unsubscribe@isocpp.org.
> > To post to this group, send email to std-proposals@isocpp.org.
> > Visit this group at
http://groups.google.com/a/isocpp.org/group/std-proposals/.
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
"ISO C++ Standard - Future Proposals" group.
> To unsubscribe from this group and stop receiving emails from it, send an
email to std-proposals+unsubscribe@isocpp.org.
> To post to this group, send email to std-proposals@isocpp.org.
> Visit this group at
http://groups.google.com/a/isocpp.org/group/std-proposals/.

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

--089e0160bea64eeeed04f6f48f98
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<p dir=3D"ltr"><br>
On 12 Apr 2014 11:25, &quot;Howard Hinnant&quot; &lt;<a href=3D"mailto:howa=
rd.hinnant@gmail.com">howard.hinnant@gmail.com</a>&gt; wrote:<br>
&gt;<br>
&gt; What is your easy proof that if you prefix with the size of containers=
 rather than postfix, that there can be no collisions? =C2=A0I don&#39;t do=
ubt that you are correct that there will be no collisions. =C2=A0But I fail=
 to see how prefix vs postfix makes a difference.<br>

&gt;</p>
<p dir=3D"ltr">The advantage of prefix is that first we read the length, th=
en we know how many values we have to read to get to the end of the object,=
 so we can reconstruct the original object easily from the values we feed t=
o hash_append. Similarly with recursive objects, we still always read the l=
ength first, so know how far too read. With postfix we are never sure if we=
 are reading another value in the object, or the length at the end of it. O=
f course it might turn out even with postfix you can prove there is only on=
e valid reading, but it is not so obvious to me.<br>
</p>
<p dir=3D"ltr">&gt; Howard<br>
&gt;<br>
&gt; On Apr 12, 2014, at 10:30 AM, Christopher Jefferson &lt;<a href=3D"mai=
lto:chris@bubblescope.net">chris@bubblescope.net</a>&gt; wrote:<br>
&gt;<br>
&gt; &gt; When it comes to avoiding hashing collisions, the only real way o=
f to<br>
&gt; &gt; prove no collisions is to prove that the data fed into hash_appen=
d is<br>
&gt; &gt; enough to uniquely reconstruct the original object. I would perso=
nally<br>
&gt; &gt; be unhappy about a design where I no implementation of hash_combi=
ne<br>
&gt; &gt; could avoid collisions for (for example) a combination of standar=
d<br>
&gt; &gt; library classes/containers.<br>
&gt; &gt;<br>
&gt; &gt; If you prefix with the size of containers rather than postfix, th=
en it<br>
&gt; &gt; is easy to prove there are no hash collision issues. The only pro=
blem<br>
&gt; &gt; with doing this is forward_list and list, where you don&#39;t kno=
w the<br>
&gt; &gt; size until you have reached the end of the object.<br>
&gt; &gt;<br>
&gt; &gt; Another option would be to add a &#39;mark()&#39; function to has=
h_append, for<br>
&gt; &gt; use with variable-length objects. hash_append implementations whi=
ch<br>
&gt; &gt; didn&#39;t care about collisions could just ignore this function,=
 ones<br>
&gt; &gt; which did could use this in various ways to ensure collisions do =
not<br>
&gt; &gt; occur.<br>
&gt; &gt;<br>
&gt; &gt; This is not just made up on the page, this is what I use in my<br=
>
&gt; &gt; personal hashing library. I personally implement &#39;mark()&#39;=
 to place a<br>
&gt; &gt; randomly generated (at program start time) 32-bit int into the st=
ream,<br>
&gt; &gt; which I find provides an acceptably low (never occuring in practi=
ce)<br>
&gt; &gt; level of collisions.<br>
&gt; &gt;<br>
&gt; &gt; Chris<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt; On 8 April 2014 04:36, =C2=A0&lt;<a href=3D"mailto:jgottman6@gmai=
l.com">jgottman6@gmail.com</a>&gt; wrote:<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt; On Sunday, April 6, 2014 5:12:52 PM UTC-4, Howard Hinnant wro=
te:<br>
&gt; &gt;&gt;&gt;<br>
&gt; &gt;&gt;&gt; Updated version with many suggestions from this discussio=
n implemented:<br>
&gt; &gt;&gt;&gt;<br>
&gt; &gt;&gt;&gt;<br>
&gt; &gt;&gt;&gt; <a href=3D"http://htmlpreview.github.io/?https://github.c=
om/HowardHinnant/papers/blob/master/hashing.html">http://htmlpreview.github=
..io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html</a><b=
r>

&gt; &gt;&gt;&gt;<br>
&gt; &gt;&gt;&gt; If I have mis-credited or failed to credit anyone, please=
 let me know.<br>
&gt; &gt;&gt;&gt; Further constructive criticism welcome. =C2=A0Thanks to a=
ll who have helped us<br>
&gt; &gt;&gt;&gt; make this a much better paper.<br>
&gt; &gt;&gt;&gt;<br>
&gt; &gt;&gt;&gt;<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt; I think you should standardize type_erased_hasher as well. =
=C2=A0It =C2=A0seems<br>
&gt; &gt;&gt; complex enough that this would be worthwhile. If it is standa=
rdized, you can<br>
&gt; &gt;&gt; replace the use of std::function and std::ref with a class th=
at does the<br>
&gt; &gt;&gt; right thing and hides std::ref and std::function as implement=
ation details.<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt; class type_erased_hasher {<br>
&gt; &gt;&gt; public:<br>
&gt; &gt;&gt; =C2=A0 template &lt;class Hasher&gt;<br>
&gt; &gt;&gt; =C2=A0 explicit type_erased_hasher(Hasher &amp;hasher)<br>
&gt; &gt;&gt; =C2=A0 =C2=A0 : h_(std::ref(hasher))<br>
&gt; &gt;&gt; =C2=A0 {}<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt; =C2=A0 void operator()(const void *key, size_t len) {<br>
&gt; &gt;&gt; =C2=A0 =C2=A0 =C2=A0h_(key, len);<br>
&gt; &gt;&gt; =C2=A0 }<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt; private:<br>
&gt; &gt;&gt; =C2=A0 std::function&lt;void (const void *, size_t)&gt; h_;<b=
r>
&gt; &gt;&gt; };<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt; --<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt; ---<br>
&gt; &gt;&gt; You received this message because you are subscribed to the G=
oogle Groups<br>
&gt; &gt;&gt; &quot;ISO C++ Standard - Future Proposals&quot; group.<br>
&gt; &gt;&gt; To unsubscribe from this group and stop receiving emails from=
 it, send an<br>
&gt; &gt;&gt; email to <a href=3D"mailto:std-proposals%2Bunsubscribe@isocpp=
..org">std-proposals+unsubscribe@isocpp.org</a>.<br>
&gt; &gt;&gt; To post to this group, send email to <a href=3D"mailto:std-pr=
oposals@isocpp.org">std-proposals@isocpp.org</a>.<br>
&gt; &gt;&gt; Visit this group at<br>
&gt; &gt;&gt; <a href=3D"http://groups.google.com/a/isocpp.org/group/std-pr=
oposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/</a>.<b=
r>
&gt; &gt;<br>
&gt; &gt; --<br>
&gt; &gt;<br>
&gt; &gt; ---<br>
&gt; &gt; You received this message because you are subscribed to the Googl=
e Groups &quot;ISO C++ Standard - Future Proposals&quot; group.<br>
&gt; &gt; To unsubscribe from this group and stop receiving emails from it,=
 send an email to <a href=3D"mailto:std-proposals%2Bunsubscribe@isocpp.org"=
>std-proposals+unsubscribe@isocpp.org</a>.<br>
&gt; &gt; To post to this group, send email to <a href=3D"mailto:std-propos=
als@isocpp.org">std-proposals@isocpp.org</a>.<br>
&gt; &gt; Visit this group at <a href=3D"http://groups.google.com/a/isocpp.=
org/group/std-proposals/">http://groups.google.com/a/isocpp.org/group/std-p=
roposals/</a>.<br>
&gt;<br>
&gt; --<br>
&gt;<br>
&gt; ---<br>
&gt; You received this message because you are subscribed to the Google Gro=
ups &quot;ISO C++ Standard - Future Proposals&quot; group.<br>
&gt; To unsubscribe from this group and stop receiving emails from it, send=
 an email to <a href=3D"mailto:std-proposals%2Bunsubscribe@isocpp.org">std-=
proposals+unsubscribe@isocpp.org</a>.<br>
&gt; To post to this group, send email to <a href=3D"mailto:std-proposals@i=
socpp.org">std-proposals@isocpp.org</a>.<br>
&gt; Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/g=
roup/std-proposals/">http://groups.google.com/a/isocpp.org/group/std-propos=
als/</a>.<br>
</p>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

--089e0160bea64eeeed04f6f48f98--

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Sun, 13 Apr 2014 20:21:46 -0400
Raw View
On Apr 13, 2014, at 7:02 PM, Christopher Jefferson <chris@bubblescope.net> =
wrote:

> The advantage of prefix is that first we read the length, then we know ho=
w many values we have to read to get to the end of the object, so we can re=
construct the original object easily from the values we feed to hash_append=
.. Similarly with recursive objects, we still always read the length first, =
so know how far too read. With postfix we are never sure if we are reading =
another value in the object, or the length at the end of it. Of course it m=
ight turn out even with postfix you can prove there is only one valid readi=
ng, but it is not so obvious to me.

Ok, I guess this is a major difference between parsing and hashing.  The pr=
oposed hash_append signatures are not meant to be parseable.  Just to satis=
fy the rules relating hash_append to operator=3D=3D:

http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/=
master/hashing.html#hash_append_rules

It is my belief that if vector outputs the hash_append of each element foll=
owed by the hash_append of vector.size(), that no two vectors which compare=
 equal will send different messages to the hasher (satisfies rule 1), and t=
hat no two vectors which compare unequal will send the same message to the =
hash (satisfies rule 2).

Admittedly I haven't taken the time to work out a formal proof.  But it see=
ms intuitively obvious to me.  Nonetheless, I would welcome a formal proof.

Note also that no other hash-related paper I'm aware of that has been submi=
tted to the standards committee even deal with this issue at all.  Indeed, =
they seem to leave it up to the client to get right.  This proposal strongl=
y advocates that this detail, however it may be resolved, should be resolve=
d by std::vector itself, not by clients who want to hash a vector.  I've co=
me to believe that asking your clients to do your hashing for you is akin t=
o asking your clients to compute operator=3D=3D for you, or swap for you.

There is only one right answer as to what state of a type should participat=
e in a hash algorithm.  And the type author is in the best position to make=
 that call.

Howard

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Nevin Liber <nevin@eviloverlord.com>
Date: Sun, 13 Apr 2014 19:25:42 -0500
Raw View
--047d7bacc0faadb16404f6f5bb48
Content-Type: text/plain; charset=UTF-8

On 13 April 2014 19:21, Howard Hinnant <howard.hinnant@gmail.com> wrote:

> Note also that no other hash-related paper I'm aware of that has been
> submitted to the standards committee even deal with this issue at all.
>  Indeed, they seem to leave it up to the client to get right.  This
> proposal strongly advocates that this detail, however it may be resolved,
> should be resolved by std::vector itself, not by clients who want to hash a
> vector.  I've come to believe that asking your clients to do your hashing
> for you is akin to asking your clients to compute operator== for you, or
> swap for you.
>

+1, especially since a == b --> hash(a) == hash(b).
--
 Nevin ":-)" Liber  <mailto:nevin@eviloverlord.com>  (847) 691-1404

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

--047d7bacc0faadb16404f6f5bb48
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">On 13 April 2014 19:21, Howard Hinnant <span dir=3D"ltr">&=
lt;<a href=3D"mailto:howard.hinnant@gmail.com" target=3D"_blank">howard.hin=
nant@gmail.com</a>&gt;</span> wrote:<br><div class=3D"gmail_extra"><div cla=
ss=3D"gmail_quote">

<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex"><div class=3D"">Note also that no other hash=
-related paper I&#39;m aware of that has been submitted to the standards co=
mmittee even deal with this issue at all. =C2=A0Indeed, they seem to leave =
it up to the client to get right. =C2=A0This proposal strongly advocates th=
at this detail, however it may be resolved, should be resolved by std::vect=
or itself, not by clients who want to hash a vector. =C2=A0I&#39;ve come to=
 believe that asking your clients to do your hashing for you is akin to ask=
ing your clients to compute operator=3D=3D for you, or swap for you.<br>

</div></blockquote><div><br></div><div>+1, especially since a =3D=3D b --&g=
t; hash(a) =3D=3D hash(b).</div></div>-- <br>=C2=A0Nevin &quot;:-)&quot; Li=
ber=C2=A0 &lt;mailto:<a href=3D"mailto:nevin@eviloverlord.com" target=3D"_b=
lank">nevin@eviloverlord.com</a>&gt;=C2=A0 (847) 691-1404
</div></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

--047d7bacc0faadb16404f6f5bb48--

.


Author: Richard Smith <richard@metafoo.co.uk>
Date: Sun, 13 Apr 2014 21:39:18 -0700
Raw View
--001a1134a56e3319ec04f6f9442e
Content-Type: text/plain; charset=ISO-8859-1

On 13 Apr 2014 16:02, "Christopher Jefferson" <chris@bubblescope.net> wrote:
>
>
> On 12 Apr 2014 11:25, "Howard Hinnant" <howard.hinnant@gmail.com> wrote:
> >
> > What is your easy proof that if you prefix with the size of containers
rather than postfix, that there can be no collisions?  I don't doubt that
you are correct that there will be no collisions.  But I fail to see how
prefix vs postfix makes a difference.
> >
>
> The advantage of prefix is that first we read the length, then we know
how many values we have to read to get to the end of the object, so we can
reconstruct the original object easily from the values we feed to
hash_append. Similarly with recursive objects, we still always read the
length first, so know how far too read. With postfix we are never sure if
we are reading another value in the object, or the length at the end of it.

Postfix is exactly the same. Just parse right to left.

> Of course it might turn out even with postfix you can prove there is only
one valid reading, but it is not so obvious to me.
>
> > Howard
> >
> > On Apr 12, 2014, at 10:30 AM, Christopher Jefferson <
chris@bubblescope.net> wrote:
> >
> > > When it comes to avoiding hashing collisions, the only real way of to
> > > prove no collisions is to prove that the data fed into hash_append is
> > > enough to uniquely reconstruct the original object. I would personally
> > > be unhappy about a design where I no implementation of hash_combine
> > > could avoid collisions for (for example) a combination of standard
> > > library classes/containers.
> > >
> > > If you prefix with the size of containers rather than postfix, then it
> > > is easy to prove there are no hash collision issues. The only problem
> > > with doing this is forward_list and list, where you don't know the
> > > size until you have reached the end of the object.
> > >
> > > Another option would be to add a 'mark()' function to hash_append, for
> > > use with variable-length objects. hash_append implementations which
> > > didn't care about collisions could just ignore this function, ones
> > > which did could use this in various ways to ensure collisions do not
> > > occur.
> > >
> > > This is not just made up on the page, this is what I use in my
> > > personal hashing library. I personally implement 'mark()' to place a
> > > randomly generated (at program start time) 32-bit int into the stream,
> > > which I find provides an acceptably low (never occuring in practice)
> > > level of collisions.
> > >
> > > Chris
> > >
> > >
> > >
> > > On 8 April 2014 04:36,  <jgottman6@gmail.com> wrote:
> > >>
> > >>
> > >> On Sunday, April 6, 2014 5:12:52 PM UTC-4, Howard Hinnant wrote:
> > >>>
> > >>> Updated version with many suggestions from this discussion
implemented:
> > >>>
> > >>>
> > >>>
http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html
> > >>>
> > >>> If I have mis-credited or failed to credit anyone, please let me
know.
> > >>> Further constructive criticism welcome.  Thanks to all who have
helped us
> > >>> make this a much better paper.
> > >>>
> > >>>
> > >>
> > >> I think you should standardize type_erased_hasher as well.  It  seems
> > >> complex enough that this would be worthwhile. If it is standardized,
you can
> > >> replace the use of std::function and std::ref with a class that does
the
> > >> right thing and hides std::ref and std::function as implementation
details.
> > >>
> > >> class type_erased_hasher {
> > >> public:
> > >>   template <class Hasher>
> > >>   explicit type_erased_hasher(Hasher &hasher)
> > >>     : h_(std::ref(hasher))
> > >>   {}
> > >>
> > >>   void operator()(const void *key, size_t len) {
> > >>      h_(key, len);
> > >>   }
> > >>
> > >> private:
> > >>   std::function<void (const void *, size_t)> h_;
> > >> };
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >>
> > >> ---
> > >> You received this message because you are subscribed to the Google
Groups
> > >> "ISO C++ Standard - Future Proposals" group.
> > >> To unsubscribe from this group and stop receiving emails from it,
send an
> > >> email to std-proposals+unsubscribe@isocpp.org.
> > >> To post to this group, send email to std-proposals@isocpp.org.
> > >> Visit this group at
> > >> http://groups.google.com/a/isocpp.org/group/std-proposals/.
> > >
> > > --
> > >
> > > ---
> > > You received this message because you are subscribed to the Google
Groups "ISO C++ Standard - Future Proposals" group.
> > > To unsubscribe from this group and stop receiving emails from it,
send an email to std-proposals+unsubscribe@isocpp.org.
> > > To post to this group, send email to std-proposals@isocpp.org.
> > > Visit this group at
http://groups.google.com/a/isocpp.org/group/std-proposals/.
> >
> > --
> >
> > ---
> > You received this message because you are subscribed to the Google
Groups "ISO C++ Standard - Future Proposals" group.
> > To unsubscribe from this group and stop receiving emails from it, send
an email to std-proposals+unsubscribe@isocpp.org.
> > To post to this group, send email to std-proposals@isocpp.org.
> > Visit this group at
http://groups.google.com/a/isocpp.org/group/std-proposals/.
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
"ISO C++ Standard - Future Proposals" group.
> To unsubscribe from this group and stop receiving emails from it, send an
email to std-proposals+unsubscribe@isocpp.org.
> To post to this group, send email to std-proposals@isocpp.org.
> Visit this group at
http://groups.google.com/a/isocpp.org/group/std-proposals/.

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

--001a1134a56e3319ec04f6f9442e
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

<p dir=3D"ltr">On 13 Apr 2014 16:02, &quot;Christopher Jefferson&quot; &lt;=
<a href=3D"mailto:chris@bubblescope.net">chris@bubblescope.net</a>&gt; wrot=
e:<br>
&gt;<br>
&gt;<br>
&gt; On 12 Apr 2014 11:25, &quot;Howard Hinnant&quot; &lt;<a href=3D"mailto=
:howard.hinnant@gmail.com">howard.hinnant@gmail.com</a>&gt; wrote:<br>
&gt; &gt;<br>
&gt; &gt; What is your easy proof that if you prefix with the size of conta=
iners rather than postfix, that there can be no collisions? =A0I don&#39;t =
doubt that you are correct that there will be no collisions. =A0But I fail =
to see how prefix vs postfix makes a difference.<br>

&gt; &gt;<br>
&gt;<br>
&gt; The advantage of prefix is that first we read the length, then we know=
 how many values we have to read to get to the end of the object, so we can=
 reconstruct the original object easily from the values we feed to hash_app=
end. Similarly with recursive objects, we still always read the length firs=
t, so know how far too read. With postfix we are never sure if we are readi=
ng another value in the object, or the length at the end of it.</p>

<p dir=3D"ltr">Postfix is exactly the same. Just parse right to left.</p>
<p dir=3D"ltr"> &gt; Of course it might turn out even with postfix you can =
prove there is only one valid reading, but it is not so obvious to me.<br>
&gt;<br>
&gt; &gt; Howard<br>
&gt; &gt;<br>
&gt; &gt; On Apr 12, 2014, at 10:30 AM, Christopher Jefferson &lt;<a href=
=3D"mailto:chris@bubblescope.net">chris@bubblescope.net</a>&gt; wrote:<br>
&gt; &gt;<br>
&gt; &gt; &gt; When it comes to avoiding hashing collisions, the only real =
way of to<br>
&gt; &gt; &gt; prove no collisions is to prove that the data fed into hash_=
append is<br>
&gt; &gt; &gt; enough to uniquely reconstruct the original object. I would =
personally<br>
&gt; &gt; &gt; be unhappy about a design where I no implementation of hash_=
combine<br>
&gt; &gt; &gt; could avoid collisions for (for example) a combination of st=
andard<br>
&gt; &gt; &gt; library classes/containers.<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; If you prefix with the size of containers rather than postfi=
x, then it<br>
&gt; &gt; &gt; is easy to prove there are no hash collision issues. The onl=
y problem<br>
&gt; &gt; &gt; with doing this is forward_list and list, where you don&#39;=
t know the<br>
&gt; &gt; &gt; size until you have reached the end of the object.<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; Another option would be to add a &#39;mark()&#39; function t=
o hash_append, for<br>
&gt; &gt; &gt; use with variable-length objects. hash_append implementation=
s which<br>
&gt; &gt; &gt; didn&#39;t care about collisions could just ignore this func=
tion, ones<br>
&gt; &gt; &gt; which did could use this in various ways to ensure collision=
s do not<br>
&gt; &gt; &gt; occur.<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; This is not just made up on the page, this is what I use in =
my<br>
&gt; &gt; &gt; personal hashing library. I personally implement &#39;mark()=
&#39; to place a<br>
&gt; &gt; &gt; randomly generated (at program start time) 32-bit int into t=
he stream,<br>
&gt; &gt; &gt; which I find provides an acceptably low (never occuring in p=
ractice)<br>
&gt; &gt; &gt; level of collisions.<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; Chris<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; On 8 April 2014 04:36, =A0&lt;<a href=3D"mailto:jgottman6@gm=
ail.com">jgottman6@gmail.com</a>&gt; wrote:<br>
&gt; &gt; &gt;&gt;<br>
&gt; &gt; &gt;&gt;<br>
&gt; &gt; &gt;&gt; On Sunday, April 6, 2014 5:12:52 PM UTC-4, Howard Hinnan=
t wrote:<br>
&gt; &gt; &gt;&gt;&gt;<br>
&gt; &gt; &gt;&gt;&gt; Updated version with many suggestions from this disc=
ussion implemented:<br>
&gt; &gt; &gt;&gt;&gt;<br>
&gt; &gt; &gt;&gt;&gt;<br>
&gt; &gt; &gt;&gt;&gt; <a href=3D"http://htmlpreview.github.io/?https://git=
hub.com/HowardHinnant/papers/blob/master/hashing.html">http://htmlpreview.g=
ithub.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html<=
/a><br>

&gt; &gt; &gt;&gt;&gt;<br>
&gt; &gt; &gt;&gt;&gt; If I have mis-credited or failed to credit anyone, p=
lease let me know.<br>
&gt; &gt; &gt;&gt;&gt; Further constructive criticism welcome. =A0Thanks to=
 all who have helped us<br>
&gt; &gt; &gt;&gt;&gt; make this a much better paper.<br>
&gt; &gt; &gt;&gt;&gt;<br>
&gt; &gt; &gt;&gt;&gt;<br>
&gt; &gt; &gt;&gt;<br>
&gt; &gt; &gt;&gt; I think you should standardize type_erased_hasher as wel=
l. =A0It =A0seems<br>
&gt; &gt; &gt;&gt; complex enough that this would be worthwhile. If it is s=
tandardized, you can<br>
&gt; &gt; &gt;&gt; replace the use of std::function and std::ref with a cla=
ss that does the<br>
&gt; &gt; &gt;&gt; right thing and hides std::ref and std::function as impl=
ementation details.<br>
&gt; &gt; &gt;&gt;<br>
&gt; &gt; &gt;&gt; class type_erased_hasher {<br>
&gt; &gt; &gt;&gt; public:<br>
&gt; &gt; &gt;&gt; =A0 template &lt;class Hasher&gt;<br>
&gt; &gt; &gt;&gt; =A0 explicit type_erased_hasher(Hasher &amp;hasher)<br>
&gt; &gt; &gt;&gt; =A0 =A0 : h_(std::ref(hasher))<br>
&gt; &gt; &gt;&gt; =A0 {}<br>
&gt; &gt; &gt;&gt;<br>
&gt; &gt; &gt;&gt; =A0 void operator()(const void *key, size_t len) {<br>
&gt; &gt; &gt;&gt; =A0 =A0 =A0h_(key, len);<br>
&gt; &gt; &gt;&gt; =A0 }<br>
&gt; &gt; &gt;&gt;<br>
&gt; &gt; &gt;&gt; private:<br>
&gt; &gt; &gt;&gt; =A0 std::function&lt;void (const void *, size_t)&gt; h_;=
<br>
&gt; &gt; &gt;&gt; };<br>
&gt; &gt; &gt;&gt;<br>
&gt; &gt; &gt;&gt;<br>
&gt; &gt; &gt;&gt;<br>
&gt; &gt; &gt;&gt;<br>
&gt; &gt; &gt;&gt;<br>
&gt; &gt; &gt;&gt; --<br>
&gt; &gt; &gt;&gt;<br>
&gt; &gt; &gt;&gt; ---<br>
&gt; &gt; &gt;&gt; You received this message because you are subscribed to =
the Google Groups<br>
&gt; &gt; &gt;&gt; &quot;ISO C++ Standard - Future Proposals&quot; group.<b=
r>
&gt; &gt; &gt;&gt; To unsubscribe from this group and stop receiving emails=
 from it, send an<br>
&gt; &gt; &gt;&gt; email to <a href=3D"mailto:std-proposals%2Bunsubscribe@i=
socpp.org">std-proposals+unsubscribe@isocpp.org</a>.<br>
&gt; &gt; &gt;&gt; To post to this group, send email to <a href=3D"mailto:s=
td-proposals@isocpp.org">std-proposals@isocpp.org</a>.<br>
&gt; &gt; &gt;&gt; Visit this group at<br>
&gt; &gt; &gt;&gt; <a href=3D"http://groups.google.com/a/isocpp.org/group/s=
td-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/</=
a>.<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; --<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; ---<br>
&gt; &gt; &gt; You received this message because you are subscribed to the =
Google Groups &quot;ISO C++ Standard - Future Proposals&quot; group.<br>
&gt; &gt; &gt; To unsubscribe from this group and stop receiving emails fro=
m it, send an email to <a href=3D"mailto:std-proposals%2Bunsubscribe@isocpp=
..org">std-proposals+unsubscribe@isocpp.org</a>.<br>
&gt; &gt; &gt; To post to this group, send email to <a href=3D"mailto:std-p=
roposals@isocpp.org">std-proposals@isocpp.org</a>.<br>
&gt; &gt; &gt; Visit this group at <a href=3D"http://groups.google.com/a/is=
ocpp.org/group/std-proposals/">http://groups.google.com/a/isocpp.org/group/=
std-proposals/</a>.<br>
&gt; &gt;<br>
&gt; &gt; --<br>
&gt; &gt;<br>
&gt; &gt; ---<br>
&gt; &gt; You received this message because you are subscribed to the Googl=
e Groups &quot;ISO C++ Standard - Future Proposals&quot; group.<br>
&gt; &gt; To unsubscribe from this group and stop receiving emails from it,=
 send an email to <a href=3D"mailto:std-proposals%2Bunsubscribe@isocpp.org"=
>std-proposals+unsubscribe@isocpp.org</a>.<br>
&gt; &gt; To post to this group, send email to <a href=3D"mailto:std-propos=
als@isocpp.org">std-proposals@isocpp.org</a>.<br>
&gt; &gt; Visit this group at <a href=3D"http://groups.google.com/a/isocpp.=
org/group/std-proposals/">http://groups.google.com/a/isocpp.org/group/std-p=
roposals/</a>.<br>
&gt;<br>
&gt; -- <br>
&gt;<br>
&gt; --- <br>
&gt; You received this message because you are subscribed to the Google Gro=
ups &quot;ISO C++ Standard - Future Proposals&quot; group.<br>
&gt; To unsubscribe from this group and stop receiving emails from it, send=
 an email to <a href=3D"mailto:std-proposals%2Bunsubscribe@isocpp.org">std-=
proposals+unsubscribe@isocpp.org</a>.<br>
&gt; To post to this group, send email to <a href=3D"mailto:std-proposals@i=
socpp.org">std-proposals@isocpp.org</a>.<br>
&gt; Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/g=
roup/std-proposals/">http://groups.google.com/a/isocpp.org/group/std-propos=
als/</a>.<br>
</p>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

--001a1134a56e3319ec04f6f9442e--

.


Author: Jeremy Maitin-Shepard <jeremy@jeremyms.com>
Date: Sun, 13 Apr 2014 23:27:36 -0700 (PDT)
Raw View
------=_Part_323_28006788.1397456856475
Content-Type: text/plain; charset=UTF-8

On Sunday, April 13, 2014 9:39:18 PM UTC-7, Richard Smith wrote:
>
> Postfix is exactly the same. Just parse right to left.


It seems it may make sense to standardize (for user-defined types as well,
not just for the standard containers) that the hashing be done in such a
way as to allow parsing from right to left, since as has been noted,
combining right-to-left parsable hash sequences with left-to-right parsable
hash sequences can result in ambiguity.

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

------=_Part_323_28006788.1397456856475
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">On Sunday, April 13, 2014 9:39:18 PM UTC-7, Richard Smith =
wrote:<blockquote class=3D"gmail_quote" style=3D"margin: 0;margin-left: 0.8=
ex;border-left: 1px #ccc solid;padding-left: 1ex;">Postfix is exactly the s=
ame. Just parse right to left.
</blockquote><div><br>It seems it may make sense to standardize (for user-d=
efined types as well, not just for the standard containers) that the hashin=
g be done in such a way as to allow parsing from right to left, since as ha=
s been noted, combining right-to-left parsable hash sequences with left-to-=
right parsable hash sequences can result in ambiguity.<br><br></div></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

------=_Part_323_28006788.1397456856475--

.


Author: Christopher Jefferson <chris@bubblescope.net>
Date: Mon, 14 Apr 2014 09:52:26 +0100
Raw View
On 14 April 2014 01:21, Howard Hinnant <howard.hinnant@gmail.com> wrote:
> On Apr 13, 2014, at 7:02 PM, Christopher Jefferson <chris@bubblescope.net=
> wrote:
>
>> The advantage of prefix is that first we read the length, then we know h=
ow many values we have to read to get to the end of the object, so we can r=
econstruct the original object easily from the values we feed to hash_appen=
d. Similarly with recursive objects, we still always read the length first,=
 so know how far too read. With postfix we are never sure if we are reading=
 another value in the object, or the length at the end of it. Of course it =
might turn out even with postfix you can prove there is only one valid read=
ing, but it is not so obvious to me.
>
> Ok, I guess this is a major difference between parsing and hashing.  The =
proposed hash_append signatures are not meant to be parseable.  Just to sat=
isfy the rules relating hash_append to operator=3D=3D:
>
> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blo=
b/master/hashing.html#hash_append_rules
>
> It is my belief that if vector outputs the hash_append of each element fo=
llowed by the hash_append of vector.size(), that no two vectors which compa=
re equal will send different messages to the hasher (satisfies rule 1), and=
 that no two vectors which compare unequal will send the same message to th=
e hash (satisfies rule 2).
>
> Admittedly I haven't taken the time to work out a formal proof.  But it s=
eems intuitively obvious to me.  Nonetheless, I would welcome a formal proo=
f.
>

I will construct such a proof. I believe the easiest way to construct
such a proof is to show the output is parseable (I believe the
hash_append rules and "the output of hash_append being parsable back
to the original object" will be equivalent for most sensible
implementations). Furthermore, I now agree that both prefix and
postfix are correct.

> Note also that no other hash-related paper I'm aware of that has been sub=
mitted to the standards committee even deal with this issue at all.  Indeed=
, they seem to leave it up to the client to get right.  This proposal stron=
gly advocates that this detail, however it may be resolved, should be resol=
ved by std::vector itself, not by clients who want to hash a vector.  I've =
come to believe that asking your clients to do your hashing for you is akin=
 to asking your clients to compute operator=3D=3D for you, or swap for you.

I agree. This setup is better than previous setups. But if we take the
choice off the client, we obviously have to make sure we get it right,
as users and implementors can't fix out mistakes!

However, I now agree the following is true, and will write out a proof
just to satisfy myself:

Hashing a variable-sized container either prefix or postfix (where
either prefix or postfix is used for all containers), and fixed size
containers simply enumerating the elements (and the same for user
types which can be treated like a pair/tuple), leads to no
'hash_append' collisions in the standard library, where (just for
clarity):

We treat std::map (and std::multimap) as a container of pairs.
We treat std::string like a container of chars, and prefix or postfix
the size like every other container.
We treat std::optional as a container of size 0 (disengaged) or size 1
(engaged).
I ignore the hashed containers for now.

We would then teach users that we recommend they follow the same rules
(just hash_append any fixed length data, postfix (if that's what we
decided) the length of any variable length data), although obviously
we wouldn't enforce that rule.

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Mon, 14 Apr 2014 10:52:10 -0400
Raw View
On Apr 14, 2014, at 4:52 AM, Christopher Jefferson <chris@bubblescope.net> =
wrote:

> We treat std::map (and std::multimap) as a container of pairs.

Agreed.

> We treat std::string like a container of chars, and prefix or postfix
> the size like every other container.

I much prefer to make an exception of std::basic_string.  We can make std::=
basic_string unambiguous by hashing the trailing null.  Doing so has one hu=
ge benefit:

hash_append(h, std::string("any text at all")) =3D=3D hash_append(h, "any t=
ext at all")

for any h. (one can't actually compare the results of hash_apppend using th=
e operator=3D=3D syntax)  Note that "any text at all" has type const char[1=
6], which includes the trailing null char.  Also note that const char[16] i=
s a fixed length type, and can be unambiguously hashed by the same rules yo=
u've already stated for other fixed length containers.  Note also that this=
 is consistent with existing operator=3D=3D overloads:

std::string("any text at all") =3D=3D "any text at all"

> We treat std::optional as a container of size 0 (disengaged) or size 1
> (engaged).

Sounds good.

> I ignore the hashed containers for now.

As it turns out, one can not hash an unordered container.  There is no way =
to satisfy rule 1:

http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/=
master/hashing.html#hash_append_rules

because two unordered containers can compare equal, and yet have their elem=
ents in different orders.  hash_append can not specify the proper order to =
hash_append the elements.  Therefore hash_append for unordered containers s=
imply will not exist, so if someone tries to hash an unordered container, t=
hey will get a compile-time error.

> We would then teach users that we recommend they follow the same rules
> (just hash_append any fixed length data, postfix (if that's what we
> decided) the length of any variable length data), although obviously
> we wouldn't enforce that rule.

Agreed completely.

Howard

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Miro Knejp <miro@knejp.de>
Date: Mon, 14 Apr 2014 17:37:54 +0200
Raw View
Am 14.04.2014 16:52, schrieb Howard Hinnant:
> size like every other container.
> I much prefer to make an exception of std::basic_string.  We can make std=
::basic_string unambiguous by hashing the trailing null.  Doing so has one =
huge benefit:
>
> hash_append(h, std::string("any text at all")) =3D=3D hash_append(h, "any=
 text at all")
>
> for any h. (one can't actually compare the results of hash_apppend using =
the operator=3D=3D syntax)  Note that "any text at all" has type const char=
[16], which includes the trailing null char.  Also note that const char[16]=
 is a fixed length type, and can be unambiguously hashed by the same rules =
you've already stated for other fixed length containers.  Note also that th=
is is consistent with existing operator=3D=3D overloads:
>
> std::string("any text at all") =3D=3D "any text at all"
>
This will unfortunatly not work with string_view should it become part=20
of std, then it would certainly be beneficial if hashers produced the=20
same value for all three variations of strings we have, i.e. string,=20
string_view and const char[n]. Fortunately string_view can be implicitly=20
constructed from const char*, so maybe it's worth elaborating if it=20
would be better to omit hash_append(const char[N]) but add=20
hash_append(string_view) instead and not hash the trailing \0 in string.

Miro

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Mon, 14 Apr 2014 12:13:41 -0400
Raw View
On Apr 14, 2014, at 11:37 AM, Miro Knejp <miro@knejp.de> wrote:

>=20
> Am 14.04.2014 16:52, schrieb Howard Hinnant:
>> size like every other container.
>> I much prefer to make an exception of std::basic_string.  We can make st=
d::basic_string unambiguous by hashing the trailing null.  Doing so has one=
 huge benefit:
>>=20
>> hash_append(h, std::string("any text at all")) =3D=3D hash_append(h, "an=
y text at all")
>>=20
>> for any h. (one can't actually compare the results of hash_apppend using=
 the operator=3D=3D syntax)  Note that "any text at all" has type const cha=
r[16], which includes the trailing null char.  Also note that const char[16=
] is a fixed length type, and can be unambiguously hashed by the same rules=
 you've already stated for other fixed length containers.  Note also that t=
his is consistent with existing operator=3D=3D overloads:
>>=20
>> std::string("any text at all") =3D=3D "any text at all"
>>=20
> This will unfortunatly not work with string_view should it become part of=
 std, then it would certainly be beneficial if hashers produced the same va=
lue for all three variations of strings we have, i.e. string, string_view a=
nd const char[n]. Fortunately string_view can be implicitly constructed fro=
m const char*, so maybe it's worth elaborating if it would be better to omi=
t hash_append(const char[N]) but add hash_append(string_view) instead and n=
ot hash the trailing \0 in string.

You bring up a good point, and I agree that all 3 should hash identically. =
 However I do not think there is a problem in hashing the appending null ch=
ar:

template <class Hasher, class charT, class traits>
std::enable_if_t
<
    is_contiguously_hashable<charT>{}
>
hash_append (Hasher& h, basic_string_view<charT, traits> const& sv)
{
    h (sv.data(), sv.data() + sv.size() * sizeof(charT));
    hash_append (h, charT{});
}

template <class Hasher, class charT, class traits>
std::enable_if_t
<
    !is_contiguously_hashable<charT>{}
>
hash_append (Hasher& h, basic_string_view<charT, traits> const& sv)
{
    for (charT const& c : sv)
        hash_append (h, c);
    hash_append (h, charT{});
}

Howard

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Jeremy Maitin-Shepard <jeremy@jeremyms.com>
Date: Mon, 14 Apr 2014 09:44:27 -0700 (PDT)
Raw View
------=_Part_889_20291212.1397493867574
Content-Type: text/plain; charset=UTF-8

On Monday, April 14, 2014 9:13:41 AM UTC-7, Howard Hinnant wrote:
>
> You bring up a good point, and I agree that all 3 should hash identically.
>  However I do not think there is a problem in hashing the appending null
> char:
>
>
Appending a 0 character does not produce a parseable result, either left to
right or right to left.  The string may itself contain an embedded 0,
resulting in an ambiguity in left to right parsing:

Consider:
pair<string,string>(string("Hello\0World",11), string("Hello",5))
pair<string,string>(string("Hello",5), string("World\0Hello",11))

Under the proposed scheme of hashing the character data followed by 0, both
will hash to:

Hello\0World\0Hello\0

Even without embedded 0 characters, it is never right-to-left parseable,
which means it could result in an ambiguity if combined with right-to-left
parseable types (i.e. all other containers).

It seems to me it would be better for string and string_view to remain
consistent with other containers and hash just the container contents
(excluding the implicit trailing 0 character in the case of string)
followed by the length.

Trying to ensure hash consistency between objects of different types that
compare equal is almost certainly impractical.  Consider integers of
different sizes, integer vs double, static vs variable-length containers.

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

------=_Part_889_20291212.1397493867574
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">On Monday, April 14, 2014 9:13:41 AM UTC-7, Howard Hinnant=
 wrote:<blockquote class=3D"gmail_quote" style=3D"margin: 0;margin-left: 0.=
8ex;border-left: 1px #ccc solid;padding-left: 1ex;">You bring up a good poi=
nt, and I agree that all 3 should hash identically. &nbsp;However I do not =
think there is a problem in hashing the appending null char:
<br>
<br></blockquote><div><br>Appending a 0 character does not produce a parsea=
ble result, either left to right or right to left.&nbsp; The string may its=
elf contain an embedded 0, resulting in an ambiguity in left to right parsi=
ng:<br><br>Consider:<br>pair&lt;string,string&gt;(string("Hello\0World",11)=
, string("Hello",5))<br>pair&lt;string,string&gt;(string("Hello",5), string=
("World\0Hello",11))<br><br>Under the proposed scheme of hashing the charac=
ter data followed by 0, both will hash to:<br><br>Hello\0World\0Hello\0<br>=
<br>Even without embedded 0 characters, it is never right-to-left parseable=
, which means it could result in an ambiguity if combined with right-to-lef=
t parseable types (i.e. all other containers).<br><br>It seems to me it wou=
ld be better for string and string_view to remain consistent with other con=
tainers and hash just the container contents (excluding the implicit traili=
ng 0 character in the case of string) followed by the length.<br><br>Trying=
 to ensure hash consistency between objects of different types that compare=
 equal is almost certainly impractical.&nbsp; Consider integers of differen=
t sizes, integer vs double, static vs variable-length containers.<br><br></=
div></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

------=_Part_889_20291212.1397493867574--

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Mon, 14 Apr 2014 13:14:19 -0400
Raw View
On Apr 14, 2014, at 12:44 PM, Jeremy Maitin-Shepard <jeremy@jeremyms.com> wrote:

> On Monday, April 14, 2014 9:13:41 AM UTC-7, Howard Hinnant wrote:
> You bring up a good point, and I agree that all 3 should hash identically.  However I do not think there is a problem in hashing the appending null char:
>
>
> Appending a 0 character does not produce a parseable result, either left to right or right to left.  The string may itself contain an embedded 0, resulting in an ambiguity in left to right parsing:
>
> Consider:
> pair<string,string>(string("Hello\0World",11), string("Hello",5))
> pair<string,string>(string("Hello",5), string("World\0Hello",11))
>
> Under the proposed scheme of hashing the character data followed by 0, both will hash to:
>
> Hello\0World\0Hello\0

That is a compelling example, thanks!

>
> Even without embedded 0 characters, it is never right-to-left parseable, which means it could result in an ambiguity if combined with right-to-left parseable types (i.e. all other containers).
>
> It seems to me it would be better for string and string_view to remain consistent with other containers and hash just the container contents (excluding the implicit trailing 0 character in the case of string) followed by the length.

I'm now wondering if T[N] should be appended with size_t(N), and/or if we should specialize char[N] et al.  I will give this more thought.

>
> Trying to ensure hash consistency between objects of different types that compare equal is almost certainly impractical.  Consider integers of different sizes, integer vs double, static vs variable-length containers.

I'm hoping to lay the groundwork for:

unordered_set<string> s;
// ...
s.find("some text");  // avoids constructing string("some text")

Howard

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

.


Author: Jeremy Maitin-Shepard <jeremy@jeremyms.com>
Date: Mon, 14 Apr 2014 10:53:22 -0700 (PDT)
Raw View
------=_Part_965_11540763.1397498002845
Content-Type: text/plain; charset=UTF-8

On Monday, April 14, 2014 7:52:10 AM UTC-7, Howard Hinnant wrote:
>
> As it turns out, one can not hash an unordered container.  There is no way
> to satisfy rule 1:
>
>
> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html#hash_append_rules
>
> because two unordered containers can compare equal, and yet have their
> elements in different orders.  hash_append can not specify the proper order
> to hash_append the elements.  Therefore hash_append for unordered
> containers simply will not exist, so if someone tries to hash an unordered
> container, they will get a compile-time error.
>

A standard approach for hashing unordered containers (taken by Java, for
instance) is to sum the hash codes of the elements:

http://stackoverflow.com/questions/18021643/hashing-a-set-of-integers-in-an-order-independent-way

This seems to be an unfortunate limitation of the proposal, in that there
is a very useful form of hash function composition that is not supported by
it.  Summing the hash codes seems to be fundamentally incompatible with the
message-based hash_append scheme.  However, if the proposal is accepted,
hash_append will become the standard way to compose hash functions, and
users would have to manually define the hash function for any type
(transitively) containing unordered_{set,map} (or any other user-defined
hash map) essentially from scratch.

A workaround would be to define hash_append for unordered containers as
follows (ignoring the extra template parameters to unordered_set):

template <class Hasher, class T>
void hash_append(Hasher &h, unordered_set<T> const &x) {
    typename Hasher::result_type sum = 0;
    for (auto &&y : x) {
        Hasher h2 = h; // Note 1
        hash_append(h2, y);
        sum += static_cast<typename Hasher::result_type>(h2); // Note 2
    }
    hash_append(h, sum);
}

Note 1: Default constructing h2 does not make sense if h contains important
runtime state independent of the data that has been hashed, for instance
parameters of the hash function.  On the other hand, copy constructing h2
from h would likely copy the accumulated data state as well, which isn't
really desired, though possibly not harmful.

Note 2: This assumes that result_type is summable, which would not
necessarily be a requirement on Hasher.  For some hash functions, e.g.
SHA1, it might be more natural to produce an array of bytes, which might
require some encapsulation to become summable.

I don't know to what extent this definition would preserve the
collision-avoidance properties of the original hash function.  If this
definition of hash_append for unordered_set were included in the standard
library, users would have to be warned that it behaves differently than
other hash_append definitions, and may be more prone to collisions.  Still,
for the common case where the user just wishes to use a type that contains
unordered_{set,map} as the key of another unordered_{set,map}, this would
be very suitable and convenient.  If the definition were not included in
the standard library, a user would be forced to rely on a custom Hasher
type or a user-defined type as the element of unordered_{set,map} in order
for ADL to find the definition, which would be much less convenient.  That
is a strong argument in favor of including the definition.

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

------=_Part_965_11540763.1397498002845
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">On Monday, April 14, 2014 7:52:10 AM UTC-7, Howard Hinnant=
 wrote:<blockquote class=3D"gmail_quote" style=3D"margin: 0;margin-left: 0.=
8ex;border-left: 1px #ccc solid;padding-left: 1ex;">As it turns out, one ca=
n not hash an unordered container. &nbsp;There is no way to satisfy rule 1:
<br>
<br><a href=3D"http://htmlpreview.github.io/?https://github.com/HowardHinna=
nt/papers/blob/master/hashing.html#hash_append_rules" target=3D"_blank" onm=
ousedown=3D"this.href=3D'http://www.google.com/url?q\75http%3A%2F%2Fhtmlpre=
view.github.io%2F%3Fhttps%3A%2F%2Fgithub.com%2FHowardHinnant%2Fpapers%2Fblo=
b%2Fmaster%2Fhashing.html%23hash_append_rules\46sa\75D\46sntz\0751\46usg\75=
AFQjCNGsOYwWol7lNICjPbjbaAILbWr1Tg';return true;" onclick=3D"this.href=3D'h=
ttp://www.google.com/url?q\75http%3A%2F%2Fhtmlpreview.github.io%2F%3Fhttps%=
3A%2F%2Fgithub.com%2FHowardHinnant%2Fpapers%2Fblob%2Fmaster%2Fhashing.html%=
23hash_append_rules\46sa\75D\46sntz\0751\46usg\75AFQjCNGsOYwWol7lNICjPbjbaA=
ILbWr1Tg';return true;">http://htmlpreview.github.io/?<wbr>https://github.c=
om/<wbr>HowardHinnant/papers/blob/<wbr>master/hashing.html#hash_<wbr>append=
_rules</a>
<br>
<br>because two unordered containers can compare equal, and yet have their =
elements in different orders. &nbsp;hash_append can not specify the proper =
order to hash_append the elements. &nbsp;Therefore hash_append for unordere=
d containers simply will not exist, so if someone tries to hash an unordere=
d container, they will get a compile-time error.
<br></blockquote><div><br>A standard approach for hashing unordered contain=
ers (taken by Java, for instance) is to sum the hash codes of the elements:=
<br><br>http://stackoverflow.com/questions/18021643/hashing-a-set-of-intege=
rs-in-an-order-independent-way<br><br>This seems to be an unfortunate limit=
ation of the proposal, in that there is a very useful form of hash function=
 composition that is not supported by it.&nbsp; Summing the hash codes seem=
s to be fundamentally incompatible with the message-based hash_append schem=
e.&nbsp; However, if the proposal is accepted, hash_append will become the =
standard way to compose hash functions, and users would have to manually de=
fine the hash function for any type (transitively) containing unordered_{se=
t,map} (or any other user-defined hash map) essentially from scratch.<br><b=
r>A workaround would be to define hash_append for unordered containers as f=
ollows (ignoring the extra template parameters to unordered_set):<br><br>te=
mplate &lt;class Hasher, class T&gt;<br>void hash_append(Hasher &amp;h, uno=
rdered_set&lt;T&gt; const &amp;x) {<br>&nbsp;&nbsp;&nbsp; typename Hasher::=
result_type sum =3D 0;<br>&nbsp;&nbsp;&nbsp; for (auto &amp;&amp;y : x) {<b=
r>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Hasher h2 =3D h; // Note 1<br>=
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; hash_append(h2, y);<br>&nbsp;&nb=
sp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; sum +=3D static_cast&lt;typename Hasher::=
result_type&gt;(h2); // Note 2<br>&nbsp;&nbsp;&nbsp; }<br>&nbsp;&nbsp;&nbsp=
; hash_append(h, sum);<br>}<br><br>Note 1: Default constructing h2 does not=
 make sense if h contains important runtime state independent of the data t=
hat has been hashed, for instance parameters of the hash function.&nbsp; On=
 the other hand, copy constructing h2 from h would likely copy the accumula=
ted data state as well, which isn't really desired, though possibly not har=
mful.<br><br>Note 2: This assumes that result_type is summable, which would=
 not necessarily be a requirement on Hasher.&nbsp; For some hash functions,=
 e.g. SHA1, it might be more natural to produce an array of bytes, which mi=
ght require some encapsulation to become summable.<br><br>I don't know to w=
hat extent this definition would preserve the collision-avoidance propertie=
s of the original hash function.&nbsp; If this definition of hash_append fo=
r unordered_set were included in the standard library, users would have to =
be warned that it behaves differently than other hash_append definitions, a=
nd may be more prone to collisions.&nbsp; Still, for the common case where =
the user just wishes to use a type that contains unordered_{set,map} as the=
 key of another unordered_{set,map}, this would be very suitable and conven=
ient.&nbsp; If the definition were not included in the standard library, a =
user would be forced to rely on a custom Hasher type or a user-defined type=
 as the element of unordered_{set,map} in order for ADL to find the definit=
ion, which would be much less convenient.&nbsp; That is a strong argument i=
n favor of including the definition.<br></div></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

------=_Part_965_11540763.1397498002845--

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Mon, 14 Apr 2014 13:58:13 -0400
Raw View
On Apr 14, 2014, at 1:53 PM, Jeremy Maitin-Shepard <jeremy@jeremyms.com> wr=
ote:

> On Monday, April 14, 2014 7:52:10 AM UTC-7, Howard Hinnant wrote:
> As it turns out, one can not hash an unordered container.  There is no wa=
y to satisfy rule 1:=20
>=20
> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blo=
b/master/hashing.html#hash_append_rules=20
>=20
> because two unordered containers can compare equal, and yet have their el=
ements in different orders.  hash_append can not specify the proper order t=
o hash_append the elements.  Therefore hash_append for unordered containers=
 simply will not exist, so if someone tries to hash an unordered container,=
 they will get a compile-time error.=20
>=20
> A standard approach for hashing unordered containers (taken by Java, for =
instance) is to sum the hash codes of the elements:
>=20
> http://stackoverflow.com/questions/18021643/hashing-a-set-of-integers-in-=
an-order-independent-way
>=20
> This seems to be an unfortunate limitation of the proposal, in that there=
 is a very useful form of hash function composition that is not supported b=
y it.  Summing the hash codes seems to be fundamentally incompatible with t=
he message-based hash_append scheme.  However, if the proposal is accepted,=
 hash_append will become the standard way to compose hash functions, and us=
ers would have to manually define the hash function for any type (transitiv=
ely) containing unordered_{set,map} (or any other user-defined hash map) es=
sentially from scratch.
>=20
> A workaround would be to define hash_append for unordered containers as f=
ollows (ignoring the extra template parameters to unordered_set):
>=20
> template <class Hasher, class T>
> void hash_append(Hasher &h, unordered_set<T> const &x) {
>     typename Hasher::result_type sum =3D 0;
>     for (auto &&y : x) {
>         Hasher h2 =3D h; // Note 1
>         hash_append(h2, y);
>         sum +=3D static_cast<typename Hasher::result_type>(h2); // Note 2
>     }
>     hash_append(h, sum);
> }
>=20
> Note 1: Default constructing h2 does not make sense if h contains importa=
nt runtime state independent of the data that has been hashed, for instance=
 parameters of the hash function.  On the other hand, copy constructing h2 =
from h would likely copy the accumulated data state as well, which isn't re=
ally desired, though possibly not harmful.
>=20
> Note 2: This assumes that result_type is summable, which would not necess=
arily be a requirement on Hasher.  For some hash functions, e.g. SHA1, it m=
ight be more natural to produce an array of bytes, which might require some=
 encapsulation to become summable.
>=20
> I don't know to what extent this definition would preserve the collision-=
avoidance properties of the original hash function.  If this definition of =
hash_append for unordered_set were included in the standard library, users =
would have to be warned that it behaves differently than other hash_append =
definitions, and may be more prone to collisions.  Still, for the common ca=
se where the user just wishes to use a type that contains unordered_{set,ma=
p} as the key of another unordered_{set,map}, this would be very suitable a=
nd convenient.  If the definition were not included in the standard library=
, a user would be forced to rely on a custom Hasher type or a user-defined =
type as the element of unordered_{set,map} in order for ADL to find the def=
inition, which would be much less convenient.  That is a strong argument in=
 favor of including the definition.

Chris Jefferson also brought this to my attention.

Thanks, I will give this issue more thought.

Howard

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Jeremy Maitin-Shepard <jeremy@jeremyms.com>
Date: Mon, 14 Apr 2014 11:07:41 -0700 (PDT)
Raw View
------=_Part_745_9326109.1397498861328
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On Monday, April 14, 2014 10:14:19 AM UTC-7, Howard Hinnant wrote:
>
> I=E2=80=99m now wondering if T[N] should be appended with size_t(N), and/=
or if we=20
> should specialize char[N] et al.  I will give this more thought.=20
>
If there is an array_view type, then it really doesn't matter much whether=
=20
the size is appended or not.  (array_view would presumably be consistent=20
with vector.)

Specializing char[N] seems like a bad idea to me.

>=20
> > Trying to ensure hash consistency between objects of different types=20
> that compare equal is almost certainly impractical.  Consider integers of=
=20
> different sizes, integer vs double, static vs variable-length containers.=
=20
>
> I=E2=80=99m hoping to lay the groundwork for:=20
>
> unordered_set<string> s;=20
> // =E2=80=A6=20
> s.find(=E2=80=9Csome text=E2=80=9D);  // avoids constructing string(=E2=
=80=9Csome text=E2=80=9D)=20
>
>
How about:

s.find(string_view("some text"))

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

------=_Part_745_9326109.1397498861328
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">On Monday, April 14, 2014 10:14:19 AM UTC-7, Howard Hinnan=
t wrote:<blockquote class=3D"gmail_quote" style=3D"margin: 0;margin-left: 0=
..8ex;border-left: 1px #ccc solid;padding-left: 1ex;">I=E2=80=99m now wonder=
ing if T[N] should be appended with size_t(N), and/or if we should speciali=
ze char[N] et al. &nbsp;I will give this more thought.
<br></blockquote>If there is an array_view type, then it really doesn't mat=
ter much whether the size is appended or not.&nbsp; (array_view would presu=
mably be consistent with vector.)<br><br>Specializing char[N] seems like a =
bad idea to me.<br><br><blockquote class=3D"gmail_quote" style=3D"margin: 0=
;margin-left: 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;">&gt;=20
<br>&gt; Trying to ensure hash consistency between objects of different typ=
es that compare equal is almost certainly impractical. &nbsp;Consider integ=
ers of different sizes, integer vs double, static vs variable-length contai=
ners.
<br>
<br>I=E2=80=99m hoping to lay the groundwork for:
<br>
<br>unordered_set&lt;string&gt; s;
<br>// =E2=80=A6
<br>s.find(=E2=80=9Csome text=E2=80=9D); &nbsp;// avoids constructing strin=
g(=E2=80=9Csome text=E2=80=9D)
<br>
<br></blockquote><div><br>How about:<br><br>s.find(string_view("some text")=
)<br><br></div></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

------=_Part_745_9326109.1397498861328--

.


Author: Daniel James <dnljms@gmail.com>
Date: Mon, 14 Apr 2014 20:08:28 +0100
Raw View
On 14 April 2014 18:14, Howard Hinnant <howard.hinnant@gmail.com> wrote:
> I'm now wondering if T[N] should be appended with size_t(N), and/or if we should specialize char[N] et al.  I will give this more thought.

Should note that since operator==(T[N],T[N]) compares the addresses of
the arrays, hashing the contents doesn't follow your guideline for
calling hash_append. Strictly speaking the address should be hashed,
although that might be error prone.

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

.


Author: Jeremy Maitin-Shepard <jeremy@jeremyms.com>
Date: Mon, 14 Apr 2014 13:20:12 -0700 (PDT)
Raw View
------=_Part_3089_2156620.1397506812976
Content-Type: text/plain; charset=UTF-8



On Monday, April 14, 2014 12:08:28 PM UTC-7, Daniel James wrote:
>
> On 14 April 2014 18:14, Howard Hinnant <howard....@gmail.com <javascript:>>
> wrote:
> > I'm now wondering if T[N] should be appended with size_t(N), and/or if
> we should specialize char[N] et al.  I will give this more thought.
>
> Should note that since operator==(T[N],T[N]) compares the addresses of
> the arrays, hashing the contents doesn't follow your guideline for
> calling hash_append. Strictly speaking the address should be hashed,
> although that might be error prone.
>

It is hard to imagine any case in which decaying to a pointer would be the
desired behavior for the hash function.  The safest choice might be to
poison hash_append for T[N] to avoid unexpected behavior.

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

------=_Part_3089_2156620.1397506812976
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><br><br>On Monday, April 14, 2014 12:08:28 PM UTC-7, Danie=
l James wrote:<blockquote class=3D"gmail_quote" style=3D"margin: 0;margin-l=
eft: 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;">On 14 April 2014=
 18:14, Howard Hinnant &lt;<a href=3D"javascript:" target=3D"_blank" gdf-ob=
fuscated-mailto=3D"pePmbhiqD8QJ" onmousedown=3D"this.href=3D'javascript:';r=
eturn true;" onclick=3D"this.href=3D'javascript:';return true;">howard....@=
gmail.com</a>&gt; wrote:
<br>&gt; I'm now wondering if T[N] should be appended with size_t(N), and/o=
r if we should specialize char[N] et al. &nbsp;I will give this more though=
t.
<br>
<br>Should note that since operator=3D=3D(T[N],T[N]) compares the addresses=
 of
<br>the arrays, hashing the contents doesn't follow your guideline for
<br>calling hash_append. Strictly speaking the address should be hashed,
<br>although that might be error prone.
<br></blockquote><div><br>It is hard to imagine any case in which decaying =
to a pointer would be the desired behavior for the hash function.&nbsp; The=
 safest choice might be to poison hash_append for T[N] to avoid unexpected =
behavior.<br></div></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

------=_Part_3089_2156620.1397506812976--

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Sat, 26 Apr 2014 21:58:10 -0400
Raw View
Updated:

http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/=
master/hashing.html

Lots of changes, including:

* std::string is no longer special.  It hashes just like vector.

* unordered containers are mentioned, but not handled.  Examples of ways to=
 handle them are included.

* New requirement for Hashers:  CopyConstructible, CopyAssignable, value se=
mantics.

* Updated type_erase_hasher example to reflect new requirements.

* Updated tests.

* Updated comparison / contrast with N3333 (now using llvm implementation o=
f N3333).

* Pointer to github implementation of this proposal + lots of example code.

* New siphash adaptor.  Really nice.  Authors claim algorithm is cryptograp=
hically secure.  And the speed is competitive, especially on longer message=
 sizes.  On shorter messages the very simple FNV-1a is hard to beat.  Empha=
sis is placed not on the quality on any one hashing algorithm, but the abil=
ity to easily change among them.

* New quality tests.

Still to-do:  proposed wording.

Howard

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Jeremy Maitin-Shepard <jeremy@jeremyms.com>
Date: Sun, 27 Apr 2014 01:09:13 -0700 (PDT)
Raw View
------=_Part_2472_13757677.1398586153617
Content-Type: text/plain; charset=UTF-8

On Saturday, April 26, 2014 6:58:10 PM UTC-7, Howard Hinnant wrote:
>
> Updated:
>
>
> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html
>
> [snip]

> * unordered containers are mentioned, but not handled.  Examples of ways
> to handle them are included.
>

There is one issue with hashing unordered_{set,map} by first sorting the
keys: they only have an equal_to predicate but not an ordering predicate.
(A different, generic hash table library might have an ordering predicate,
though.)

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

------=_Part_2472_13757677.1398586153617
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">On Saturday, April 26, 2014 6:58:10 PM UTC-7, Howard Hinna=
nt wrote:<blockquote class=3D"gmail_quote" style=3D"margin: 0;margin-left: =
0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;">Updated:
<br>
<br><a href=3D"http://htmlpreview.github.io/?https://github.com/HowardHinna=
nt/papers/blob/master/hashing.html" target=3D"_blank" onmousedown=3D"this.h=
ref=3D'http://www.google.com/url?q\75http%3A%2F%2Fhtmlpreview.github.io%2F%=
3Fhttps%3A%2F%2Fgithub.com%2FHowardHinnant%2Fpapers%2Fblob%2Fmaster%2Fhashi=
ng.html\46sa\75D\46sntz\0751\46usg\75AFQjCNHP0pXriy_dMNhgyflMJ_r9qoA63Q';re=
turn true;" onclick=3D"this.href=3D'http://www.google.com/url?q\75http%3A%2=
F%2Fhtmlpreview.github.io%2F%3Fhttps%3A%2F%2Fgithub.com%2FHowardHinnant%2Fp=
apers%2Fblob%2Fmaster%2Fhashing.html\46sa\75D\46sntz\0751\46usg\75AFQjCNHP0=
pXriy_dMNhgyflMJ_r9qoA63Q';return true;">http://htmlpreview.github.io/?<wbr=
>https://github.com/<wbr>HowardHinnant/papers/blob/<wbr>master/hashing.html=
</a>
<br>
<br></blockquote><div>[snip] <br></div><blockquote class=3D"gmail_quote" st=
yle=3D"margin: 0;margin-left: 0.8ex;border-left: 1px #ccc solid;padding-lef=
t: 1ex;">* unordered containers are mentioned, but not handled. &nbsp;Examp=
les of ways to handle them are included.
<br></blockquote><div><br>There is one issue with hashing unordered_{set,ma=
p} by first sorting the keys: they only have an equal_to predicate but not =
an ordering predicate.&nbsp; (A different, generic hash table library might=
 have an ordering predicate, though.)<br></div></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

------=_Part_2472_13757677.1398586153617--

.


Author: Christopher Jefferson <chris@bubblescope.net>
Date: Sun, 27 Apr 2014 12:44:11 +0100
Raw View
On 27 April 2014 09:09, Jeremy Maitin-Shepard <jeremy@jeremyms.com> wrote:
> On Saturday, April 26, 2014 6:58:10 PM UTC-7, Howard Hinnant wrote:
>>
>> Updated:
>>
>>
>> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html
>>
> [snip]
>>
>> * unordered containers are mentioned, but not handled.  Examples of ways
>> to handle them are included.
>
>
> There is one issue with hashing unordered_{set,map} by first sorting the
> keys: they only have an equal_to predicate but not an ordering predicate.
> (A different, generic hash table library might have an ordering predicate,
> though.)

That is true. A better route would be to count the number of
occurrences of each value (which is very cheap if there is few hash
collisions, and basically free if there are no hash collisions), hash
pairs of the form pair(value in hash table, count), and add these
hashes together.

I am convinced (and could produce a proof if necessary), that if your
base hash function is good (for an appropriate definitions of good),
then just adding the hashes of these pairs will produce a good quality
hash.

Of course for hash sets, every value occurs at most once, so this is
just the same as adding up the hashes . The only problem with just
adding the hashes of everything in an unordered_multiset is that as
values occur increasingly often, there is some lowing of quality of
the lower order bits (as an extreme, if there was 2^30 occurrences of
some value in a hash table, and you were using a 32-bit hash, then any
number*(2^30) takes one of 4 values).

The only problem with all this is if you have a hash which (for
example) hashes the integers to themselves, then you can get very poor
behaviour. One could either fix that in the hash of unordered_* with a
mixing function, or just tell users that rubbish hash in -> rubbish
hash out.

Chris

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

.


Author: r4ndmuser@gmail.com
Date: Tue, 29 Apr 2014 06:44:04 -0700 (PDT)
Raw View
------=_Part_49_22249747.1398779044329
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

> Should vector expose its size() to the Hasher?
In general, yes, but e.g. for vector<int> there's no need to send the size.=
=20
 If we hash a vector<T> the hasher knows the number of bytes hashed and the=
=20
bytes itself, which means that vector<T> could reconstruct his own size=20
from that. For the outermost type, the size is implicitly known. So I=20
propose the following addition:

// hash_append_first is guaranteed to be the first hash_append called.
template <class Hasher, class T, class Alloc>=20
void
hash_append_first(Hasher& h, std::vector<T, Alloc> const& v) noexcept
{
    for (auto const& t : v)
        hash_append(h, t);
    // no need anymore for: hash_append(h, v.size());
}
uhash::operator()(T const& t) const noexcept
{
    Hasher h;
    using std::hash_append_first;
    hash_append_first(h, t); // falls back to hash_append if no overload=20
found.
    return static_cast<result_type>(h);
}

Both rules are satisfied, the proof that no collisions happen is analogous.

This may seem as a micro-optimization, but from that we gain:
1. s.find(=E2=80=9Csome text=E2=80=9D); works again, because std::string do=
es not append=20
the size anymore (unless he's embedded somewhere, e.g. in=20
pair<string,string>).
2. unordered_set<void*> can have a faster hash algorithm. There are faster=
=20
hash functions for fixed-size elements of small size (e.g.=20
<=3Dsizeof(size_t)). Depending on the data, the best hash function may be t=
he=20
identity. This sort of hashing is safe (there can't be any collisions) if=
=20
the size of each element is always the same. Note we cannot *mix* different=
=20
algorithms at runtime for (number of bytes hashed)<=3Dsizeof(size_t) and=20
>=3Dsizeof(size_t), since there may be collisions. But we can change to a=
=20
different algorithm for (number of bytes hashed)=3Dconst. Thus we can do

template <class Hasher>
void hash_append_first(Hasher& h, size_t x) noexcept
{
    hash_only(h, x); // we guarantee that this is the only hash append
}


and have the option to implement an optimized hash_only for each hasher (ha=
sh_only may fall back to hash_append).

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

------=_Part_49_22249747.1398779044329
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">&gt;&nbsp;<span style=3D"color: rgb(0, 0, 0); font-family:=
 serif; font-size: medium; text-align: justify;">Should&nbsp;</span><code s=
tyle=3D"color: rgb(0, 0, 0); text-align: justify;">vector</code><span style=
=3D"color: rgb(0, 0, 0); font-family: serif; font-size: medium; text-align:=
 justify;">&nbsp;expose its&nbsp;</span><code style=3D"color: rgb(0, 0, 0);=
 text-align: justify;">size()</code><span style=3D"color: rgb(0, 0, 0); fon=
t-family: serif; font-size: medium; text-align: justify;">&nbsp;to the&nbsp=
;</span><code style=3D"color: rgb(0, 0, 0); text-align: justify;">Hasher</c=
ode><span style=3D"color: rgb(0, 0, 0); font-family: serif; font-size: medi=
um; text-align: justify;">?</span><div>In general, yes, but e.g. for vector=
&lt;int&gt; there's no need to send the size. &nbsp;If we hash a vector&lt;=
T&gt; the hasher knows the number of bytes hashed and the bytes itself, whi=
ch means that vector&lt;T&gt; could reconstruct his own size from that. For=
 the outermost type, the size is implicitly known. So I propose the followi=
ng addition:</div><div><br></div><div><div><font face=3D"courier new, monos=
pace">// hash_append_first is guaranteed to be the first hash_append called=
..</font></div><div><font face=3D"courier new, monospace">template &lt;class=
 Hasher, class T, class Alloc&gt;&nbsp;</font></div><div><font face=3D"cour=
ier new, monospace">void</font></div><div><font face=3D"courier new, monosp=
ace">hash_append_first(Hasher&amp; h, std::vector&lt;T, Alloc&gt; const&amp=
; v) noexcept</font></div><div><font face=3D"courier new, monospace">{</fon=
t></div><div><font face=3D"courier new, monospace">&nbsp; &nbsp; for (auto =
const&amp; t : v)</font></div><div><font face=3D"courier new, monospace">&n=
bsp; &nbsp; &nbsp; &nbsp; hash_append(h, t);</font></div><div><font face=3D=
"courier new, monospace">&nbsp; &nbsp; // no need anymore for: hash_append(=
h, v.size());</font></div><div><font face=3D"courier new, monospace">}</fon=
t></div><div><font face=3D"courier new, monospace">uhash::operator()(T cons=
t&amp; t) const noexcept</font></div><div><font face=3D"courier new, monosp=
ace">{</font></div><div><font face=3D"courier new, monospace">&nbsp; &nbsp;=
 Hasher h;</font></div><div><font face=3D"courier new, monospace">&nbsp; &n=
bsp; using std::hash_append_first;</font></div><div><font face=3D"courier n=
ew, monospace">&nbsp; &nbsp; hash_append_first(h, t); // falls back to hash=
_append if no overload found.</font></div><div><font face=3D"courier new, m=
onospace">&nbsp; &nbsp; return static_cast&lt;result_type&gt;(h);</font></d=
iv><div><font face=3D"courier new, monospace">}</font></div></div><div><br>=
</div><div><div>Both rules are satisfied, the proof that no collisions happ=
en is analogous.</div></div><div><br></div><div>This may seem as a micro-op=
timization, but from that we gain:</div><div>1. s.find(=E2=80=9Csome text=
=E2=80=9D); works again, because std::string does not append the size anymo=
re (unless he's embedded somewhere, e.g. in pair&lt;string,string&gt;).</di=
v><div>2. unordered_set&lt;void*&gt; can have a faster hash algorithm. Ther=
e are faster hash functions for fixed-size elements of small size (e.g. &lt=
;=3Dsizeof(size_t)). Depending on the data, the best hash function may be t=
he identity. This sort of hashing is safe (there can't be any collisions) i=
f the size of each element is always the same. Note we cannot *mix* differe=
nt algorithms at runtime for (number of bytes hashed)&lt;=3Dsizeof(size_t) =
and &gt;=3Dsizeof(size_t), since there may be collisions. But we can change=
 to a different algorithm for (number of bytes hashed)=3Dconst. Thus we can=
 do</div><div><br></div><div><pre style=3D"color: rgb(0, 0, 0);">template &=
lt;class Hasher&gt;
void hash_append_first(Hasher&amp; h, size_t x) noexcept
{
    hash_only(h, x); // we guarantee that this is the only hash append
}</pre><pre style=3D"color: rgb(0, 0, 0);"><br></pre><pre style=3D"color: r=
gb(0, 0, 0);"><span style=3D"color: rgb(34, 34, 34); font-family: Arial, He=
lvetica, sans-serif;">and have the option to implement an optimized hash_on=
ly for each hasher (hash_only may fall back to hash_append).</span></pre></=
div></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

------=_Part_49_22249747.1398779044329--

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Sat, 3 May 2014 15:28:52 -0400
Raw View
On Apr 29, 2014, at 9:44 AM, r4ndmuser@gmail.com wrote:

> > Should vector expose its size() to the Hasher?
> In general, yes, but e.g. for vector<int> there's no need to send the siz=
e.  If we hash a vector<T> the hasher knows the number of bytes hashed and =
the bytes itself, which means that vector<T> could reconstruct his own size=
 from that. For the outermost type, the size is implicitly known. So I prop=
ose the following addition:
>=20
> // hash_append_first is guaranteed to be the first hash_append called.
> template <class Hasher, class T, class Alloc>=20
> void
> hash_append_first(Hasher& h, std::vector<T, Alloc> const& v) noexcept
> {
>     for (auto const& t : v)
>         hash_append(h, t);
>     // no need anymore for: hash_append(h, v.size());
> }

What if hash_append(h, t); sends no update to h?  The client might write su=
ch a type.  tuple<> is such a type.  In this case vector<tuple<>>(0) hashes=
 to the same thing as vector<tuple<>>(1).

> uhash::operator()(T const& t) const noexcept
> {
>     Hasher h;
>     using std::hash_append_first;
>     hash_append_first(h, t); // falls back to hash_append if no overload =
found.
>     return static_cast<result_type>(h);
> }
>=20
> Both rules are satisfied, the proof that no collisions happen is analogou=
s.
>=20
> This may seem as a micro-optimization, but from that we gain:
> 1. s.find("some text"); works again, because std::string does not append =
the size anymore (unless he's embedded somewhere, e.g. in pair<string,strin=
g>).
> 2. unordered_set<void*> can have a faster hash algorithm. There are faste=
r hash functions for fixed-size elements of small size (e.g. <=3Dsizeof(siz=
e_t)). Depending on the data, the best hash function may be the identity. T=
his sort of hashing is safe (there can't be any collisions) if the size of =
each element is always the same. Note we cannot *mix* different algorithms =
at runtime for (number of bytes hashed)<=3Dsizeof(size_t) and >=3Dsizeof(si=
ze_t), since there may be collisions. But we can change to a different algo=
rithm for (number of bytes hashed)=3Dconst. Thus we can do
>=20
> template <class Hasher>
> void hash_append_first(Hasher& h, size_t x) noexcept
> {
>     hash_only(h, x); // we guarantee that this is the only hash append
> }
>=20
>=20
> and have the option to implement an optimized hash_only for each hasher (=
hash_only may fall back to hash_append).

But the cost is that *every* type now has to implement both hash_append and=
 hash_append_first.  This complicates things by O(N).  The odds of this pro=
posal being excepted are already extremely low.  O(N) costs must bring huge=
 benefits.

Howard

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Sat, 3 May 2014 15:28:53 -0400
Raw View
Now with proposed wording:

http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html

Howard

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

.


Author: John Bytheway <jbytheway@gmail.com>
Date: Sat, 03 May 2014 17:19:12 -0400
Raw View
On 2014-05-03 15:28, Howard Hinnant wrote:
> Now with proposed wording:
>
> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html

Comments on the proposed wording:

* HashAlgorithm

You mention "The finalize operation" without ever defining it.

You did not specify that two consecutive calls to operator() are
equivalent to one call with the two sequences of bytes concatenated.  I
assume this was deliberate (given the note about calls with len==0).
However, later in the note about hash_append for arrays, you imply that
this is the case.  It should perhaps be clarified one way or the other.

* hash_append

I don't like "except that the function will modify the state of t prior
to forwarding it to h in such a way that the contiguously hashable
condition is preserved".  At the same time, I have tried and I realize
it's tough to write out the condition you really mean; something like:

Effects: h(p_t, len_t) where p_t points to len_t valid bytes with values
derived from t.  If t==u then len_t==len_u and memcmp(p_t, p_u, len_t) == 0.

(I'm not sure whether 'valid' is the right adjective there)

John Bytheway

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Sat, 3 May 2014 17:59:03 -0400
Raw View
Thanks John.  Updated

http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html

On May 3, 2014, at 5:19 PM, John Bytheway <jbytheway@gmail.com> wrote:

> On 2014-05-03 15:28, Howard Hinnant wrote:
>> Now with proposed wording:
>>
>> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html
>
> Comments on the proposed wording:
>
> * HashAlgorithm
>
> You mention "The finalize operation" without ever defining it.

Thanks, I've instead referred to the "conversion operation".

>
> You did not specify that two consecutive calls to operator() are
> equivalent to one call with the two sequences of bytes concatenated.  I
> assume this was deliberate (given the note about calls with len==0).
> However, later in the note about hash_append for arrays, you imply that
> this is the case.  It should perhaps be clarified one way or the other.

I've attempted to clarify that concatenated calls are equivalent.

>
> * hash_append
>
> I don't like "except that the function will modify the state of t prior
> to forwarding it to h in such a way that the contiguously hashable
> condition is preserved".  At the same time, I have tried and I realize
> it's tough to write out the condition you really mean; something like:
>
> Effects: h(p_t, len_t) where p_t points to len_t valid bytes with values
> derived from t.  If t==u then len_t==len_u and memcmp(p_t, p_u, len_t) == 0.
>
> (I'm not sure whether 'valid' is the right adjective there)

<nod> This is a rough paragraph.  I've taken another shot at it.  I'm now taking advantage of some tricky standardization definitions:

shall - means in order to conform, one must do this.

should - means we would like for this to happen, but it doesn't have to.

The latter is done to accommodate hash_append(h, nan).  Given a floating point x with a value of nan: x != x, but hash_append(h, x) == hash_append(h, x) (I'm using pseudo-code here, hash_append returns void).

Howard

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

.


Author: John Bytheway <jbytheway@gmail.com>
Date: Sat, 03 May 2014 19:36:26 -0400
Raw View
On 2014-05-03 17:59, Howard Hinnant wrote:
> On May 3, 2014, at 5:19 PM, John Bytheway <jbytheway@gmail.com>
> wrote:
>
>> On 2014-05-03 15:28, Howard Hinnant wrote:
>>> Now with proposed wording:
>>>
>>> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html
>>
>>> Comments on the proposed wording:
>>
>> * HashAlgorithm
>>
>> You did not specify that two consecutive calls to operator() are
>> equivalent to one call with the two sequences of bytes
>> concatenated.  I assume this was deliberate (given the note about
>> calls with len==0). However, later in the note about hash_append
>> for arrays, you imply that this is the case.  It should perhaps be
>> clarified one way or the other.
>
> I've attempted to clarify that concatenated calls are equivalent.

Why are concatenated calls not necessarily equivalent when at least one
has zero length?  This seems an odd deviation from expectation, and I
don't see any justification for it in the rest of the paper.  It seems
likely to cause problems because it *will* be true for most
HashAlgorithms, and will come to be relied upon in code which will then
break when someone supplies a HashAlgorithm for which it is not true.

Also, having made this change, the comment about optimizing hash_append
for std::basic_string can be demoted to a Note.

>> * hash_append
>>
>> I don't like "except that the function will modify the state of t
>> prior to forwarding it to h in such a way that the contiguously
>> hashable condition is preserved".  At the same time, I have tried
>> and I realize it's tough to write out the condition you really
>> mean; something like:
>>
>> Effects: h(p_t, len_t) where p_t points to len_t valid bytes with
>> values derived from t.  If t==u then len_t==len_u and memcmp(p_t,
>> p_u, len_t) == 0.
>>
>> (I'm not sure whether 'valid' is the right adjective there)
>
> <nod> This is a rough paragraph.  I've taken another shot at it.  I'm
> now taking advantage of some tricky standardization definitions:
>
> shall - means in order to conform, one must do this.
>
> should - means we would like for this to happen, but it doesn't have
> to.
>
> The latter is done to accommodate hash_append(h, nan).  Given a
> floating point x with a value of nan: x != x, but hash_append(h, x)
> == hash_append(h, x) (I'm using pseudo-code here, hash_append returns
> void).

You are specifying in terms of the state to which h is updated, rather
than the call (or 'message') used to update that state.  This seems a
little dangerous, although I guess the intent is clear.  In particular,
if e.g. the state of h is 32 bits then there will be many pairs of
unequal doubles which update h to the same state.

Relatedly, I wonder whether a similar use of 'should' is warranted in
describing the effects of HashAlgorithm's operator().  Something like:

"... If h1 and h2 are given two non-equivalent keys then they should
have different updated state."

But perhaps this is redundant and unhelpful because it obviously can't
be achieved in general for any reasonable HashAlgorithm with bounded
state.  And the properties we are really striving for in the state
update are rather more subtle.

John

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Sat, 3 May 2014 20:41:36 -0400
Raw View
On May 3, 2014, at 7:36 PM, John Bytheway <jbytheway@gmail.com> wrote:

> On 2014-05-03 17:59, Howard Hinnant wrote:
>> On May 3, 2014, at 5:19 PM, John Bytheway <jbytheway@gmail.com>
>> wrote:
>>=20
>>> On 2014-05-03 15:28, Howard Hinnant wrote:
>>>> Now with proposed wording:
>>>>=20
>>>> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/=
blob/master/hashing.html
>>>=20
>>>> Comments on the proposed wording:
>>>=20
>>> * HashAlgorithm
>>>=20
>>> You did not specify that two consecutive calls to operator() are=20
>>> equivalent to one call with the two sequences of bytes
>>> concatenated.  I assume this was deliberate (given the note about
>>> calls with len=3D=3D0). However, later in the note about hash_append
>>> for arrays, you imply that this is the case.  It should perhaps be
>>> clarified one way or the other.
>>=20
>> I've attempted to clarify that concatenated calls are equivalent.
>=20
> Why are concatenated calls not necessarily equivalent when at least one
> has zero length?  This seems an odd deviation from expectation, and I
> don't see any justification for it in the rest of the paper.  It seems
> likely to cause problems because it *will* be true for most
> HashAlgorithms, and will come to be relied upon in code which will then
> break when someone supplies a HashAlgorithm for which it is not true.

Ah!  That is a very important misunderstanding.  Thank you for bringing it =
to light!

I am attempting to say that given HashAlgorithm1 and HashAlgorithm2, it is =
unspecified whether or not a 0 length key updates the state.  However if tw=
o instances of the *same* HashAlgorithm type see a 0 length key, they shoul=
d treat it the same way.  I had not realized this ambiguity in my English, =
and will work on it.


> Also, having made this change, the comment about optimizing hash_append
> for std::basic_string can be demoted to a Note.

Good point, thanks.

>=20
>>> * hash_append
>>>=20
>>> I don't like "except that the function will modify the state of t
>>> prior to forwarding it to h in such a way that the contiguously
>>> hashable condition is preserved".  At the same time, I have tried
>>> and I realize it's tough to write out the condition you really
>>> mean; something like:
>>>=20
>>> Effects: h(p_t, len_t) where p_t points to len_t valid bytes with
>>> values derived from t.  If t=3D=3Du then len_t=3D=3Dlen_u and memcmp(p_=
t,
>>> p_u, len_t) =3D=3D 0.
>>>=20
>>> (I'm not sure whether 'valid' is the right adjective there)
>>=20
>> <nod> This is a rough paragraph.  I've taken another shot at it.  I'm
>> now taking advantage of some tricky standardization definitions:
>>=20
>> shall - means in order to conform, one must do this.
>>=20
>> should - means we would like for this to happen, but it doesn't have
>> to.
>>=20
>> The latter is done to accommodate hash_append(h, nan).  Given a
>> floating point x with a value of nan: x !=3D x, but hash_append(h, x)
>> =3D=3D hash_append(h, x) (I'm using pseudo-code here, hash_append return=
s
>> void).
>=20
> You are specifying in terms of the state to which h is updated, rather
> than the call (or 'message') used to update that state.  This seems a
> little dangerous, although I guess the intent is clear.  In particular,
> if e.g. the state of h is 32 bits then there will be many pairs of
> unequal doubles which update h to the same state.

Right.  This is another good reason for the "should" in:

> And if t1 !=3D t2, then hash_append(h, t1) should update the state of h t=
o a different state as does hash_append(h, t2).=20


> Relatedly, I wonder whether a similar use of 'should' is warranted in
> describing the effects of HashAlgorithm's operator().  Something like:
>=20
> "... If h1 and h2 are given two non-equivalent keys then they should
> have different updated state."
>=20
> But perhaps this is redundant and unhelpful because it obviously can't
> be achieved in general for any reasonable HashAlgorithm with bounded
> state.  And the properties we are really striving for in the state
> update are rather more subtle.

This is not a bad suggestion at all.  I will try adding it.  We can always =
take it back out if it doesn't seem to work.

Howard

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: John Bytheway <jbytheway@gmail.com>
Date: Sun, 04 May 2014 17:24:59 -0400
Raw View
On 2014-04-02 19:09, Howard Hinnant wrote:
> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blo=
b/master/hashing.html
>=20
> We're aiming at the pre-Rapperswil mailing.  Any feedback, positive or ne=
gative, will be much appreciated.  The paper currently lacks proposed wordi=
ng.  It is more of a technology demonstration.  There's a co-author spot op=
en for anyone who wants to turn the demonstration into concrete proposed wo=
rding. :-)
>=20
> Howard

It occurs to me that the type trait is_contiguously_hashable proposed in
this paper is really declaring that, for a particular type, value
equality is equivalent to representation equality.  By 'representation
equality' I mean equality according to memcmp.  This has potential to
optimize things other than hashing.  For example, it could be used to
optimize any of the std:: algorithms which need to test for equality.
This includes find, find_end, find_first, adjacent_find, count,
mismatch, equal, search, replace, remove, and unique.

Does this seem like a trait we should be offering for these purposes?

If so, it would be nice to give the trait a more generic name which
reflects its wider utility.  Unfortunately, I can't think of a good
name.  Another bikeshedding opportunity...  The best I can think of is
along the lines of respects_representation_equality.

John Bytheway

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: John Bytheway <jbytheway@gmail.com>
Date: Sun, 04 May 2014 17:58:56 -0400
Raw View
On 2014-05-03 20:41, Howard Hinnant wrote:
>
> On May 3, 2014, at 7:36 PM, John Bytheway <jbytheway@gmail.com>
> wrote:

<snip>

>> Why are concatenated calls not necessarily equivalent when at least
>> one has zero length?  This seems an odd deviation from expectation,
>> and I don't see any justification for it in the rest of the paper.
>> It seems likely to cause problems because it *will* be true for
>> most HashAlgorithms, and will come to be relied upon in code which
>> will then break when someone supplies a HashAlgorithm for which it
>> is not true.
>
> Ah!  That is a very important misunderstanding.  Thank you for
> bringing it to light!
>
> I am attempting to say that given HashAlgorithm1 and HashAlgorithm2,
> it is unspecified whether or not a 0 length key updates the state.
> However if two instances of the *same* HashAlgorithm type see a 0
> length key, they should treat it the same way.  I had not realized
> this ambiguity in my English, and will work on it.

Based on reading this message, it didn't seem that you had understood my
concern.  However, I see the latest version no longer has a special case
for length zero keys, so I am happy :).

John

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Sun, 4 May 2014 18:00:18 -0400
Raw View
On May 4, 2014, at 5:58 PM, John Bytheway <jbytheway@gmail.com> wrote:

> On 2014-05-03 20:41, Howard Hinnant wrote:
>>
>> On May 3, 2014, at 7:36 PM, John Bytheway <jbytheway@gmail.com>
>> wrote:
>
> <snip>
>
>>> Why are concatenated calls not necessarily equivalent when at least
>>> one has zero length?  This seems an odd deviation from expectation,
>>> and I don't see any justification for it in the rest of the paper.
>>> It seems likely to cause problems because it *will* be true for
>>> most HashAlgorithms, and will come to be relied upon in code which
>>> will then break when someone supplies a HashAlgorithm for which it
>>> is not true.
>>
>> Ah!  That is a very important misunderstanding.  Thank you for
>> bringing it to light!
>>
>> I am attempting to say that given HashAlgorithm1 and HashAlgorithm2,
>> it is unspecified whether or not a 0 length key updates the state.
>> However if two instances of the *same* HashAlgorithm type see a 0
>> length key, they should treat it the same way.  I had not realized
>> this ambiguity in my English, and will work on it.
>
> Based on reading this message, it didn't seem that you had understood my
> concern.  However, I see the latest version no longer has a special case
> for length zero keys, so I am happy :).

Agreed.  Your argument didn't sink through my thick skull until it had slept on it. :-)

Howard

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Sun, 4 May 2014 18:00:33 -0400
Raw View
On May 4, 2014, at 5:24 PM, John Bytheway <jbytheway@gmail.com> wrote:

> It occurs to me that the type trait is_contiguously_hashable proposed in
> this paper is really declaring that, for a particular type, value
> equality is equivalent to representation equality.  By 'representation
> equality' I mean equality according to memcmp.  This has potential to
> optimize things other than hashing.  For example, it could be used to
> optimize any of the std:: algorithms which need to test for equality.
> This includes find, find_end, find_first, adjacent_find, count,
> mismatch, equal, search, replace, remove, and unique.
>=20
> Does this seem like a trait we should be offering for these purposes?
>=20
> If so, it would be nice to give the trait a more generic name which
> reflects its wider utility.  Unfortunately, I can't think of a good
> name.  Another bikeshedding opportunity...  The best I can think of is
> along the lines of respects_representation_equality.

If we could demonstrate such use, perhaps.  I've just looked at find, and I=
 don't really see an optimization opportunity there (though I could easily =
be missing it).  There is precedent for introducing a general bit of infras=
tructure to support a specific proposal.  ratio and common_type were standa=
rdized in this very way (introduced to support <chrono>).

At this time, I don't see a lot of opportunity (perhaps with std::equal, no=
t positive).

If the LWG shows interest in this proposal, that is certainly an aspect the=
y could look at (and bike shed).

N3333 called this trait is_contiguous_layout<T>, which I don't think is as =
good as is_contiguously_hashable<T>, because float has a contiguous layout,=
 and yet is still not contiguously hashable.  The closest thing there is to=
 a reference implementation for N3333 (llvm/include/llvm/ADT/Hashing.h) cal=
ls it is_hashable_data<T>.

<shrug> Good names are tough.

Howard

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Roman Perepelitsa <roman.perepelitsa@gmail.com>
Date: Tue, 6 May 2014 16:12:54 +0200
Raw View
--047d7b677d36485b1704f8bbd96c
Content-Type: text/plain; charset=UTF-8

2014-05-06 16:07 GMT+02:00 Howard Hinnant <howard.hinnant@gmail.com>:

> On May 5, 2014, at 12:22 PM, 'Geoffrey Romer' via ISO C++ Standard -
> Future Proposals <std-proposals@isocpp.org> wrote:
>
> > In _Elements of Programming_, Stepanov uses the term "uniquely
> represented" for this property. How do you feel about
> is_uniquely_represented<T>?
> >
>
> I've gone back and forth on this good suggestion, but I think I'm settling
> on not liking it.  Rationale:
>
> A quote from Stepanov's excellent book:
>
> > For example, a type representing a truth value as a byte that interprets
> > zero as false and nonzero as true is not uniquely represented.
>
> And I respect that definition, and agree with it.
>
> On the other hand, on my platform, bool is implemented by the compiler as
> a single byte constrained to hold one of the two bit patterns:
>
> 0x00
> 0x01
>
> Without playing with undefined behavior / aliasing, one simply can not
> give a bool any other bit pattern.  E.g:
>
> bool b = 128;
>
> compiles and sets b to 0x01.
>
> So while not guaranteed by the standard, on my platform, bool is
> contiguously hashable.  And that is a sufficiently important optimization
> that I would want my std::lib to make
> std::is_contiguously_hashable<bool>::value == true on this platform.  On
> another platform that implemented bool as something that (for example)
> could take on any bit pattern of a byte, and any of the non-zero bit
> patterns represented true, std::is_contiguously_hashable<bool>::value
> should be false.  And admittedly, this *is* precisely what Alexander says
> in his book.
>
> However, in order to avoid the situation of people reading EOP and
> concluding that is_uniquely_represented<bool>::value shall *always* be
> false, I would rather not use that name, in favor of the more pragmatic
> concerns implied by is_contiguously_hashable<bool>.  Otherwise a FAQ will
> be:
>
> EOP says bool is not uniquely represented.  But I'm seeing
> std::is_uniquely_represented<bool>::value == true.  Is that a bug?
>

EOP doesn't say that std::is_uniquely_represented<bool>::value is false. It
says that std::is_uniquely_represented<MyBoolean>::value must be false if
MyBoolean is a byte with 0 meaning false and 1 meaning true.

is_uniquely_represented looks like a perfect fit to me.

Roman Perepelitsa.

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

--047d7b677d36485b1704f8bbd96c
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div class=3D"gmail_extra"><div class=3D"gmail_quote">2014=
-05-06 16:07 GMT+02:00 Howard Hinnant <span dir=3D"ltr">&lt;<a href=3D"mail=
to:howard.hinnant@gmail.com" target=3D"_blank">howard.hinnant@gmail.com</a>=
&gt;</span>:<br>

<blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-=
left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;p=
adding-left:1ex"><div class=3D"">On May 5, 2014, at 12:22 PM, &#39;Geoffrey=
 Romer&#39; via ISO C++ Standard - Future Proposals &lt;<a href=3D"mailto:s=
td-proposals@isocpp.org">std-proposals@isocpp.org</a>&gt; wrote:<br>


<br>
&gt; In _Elements of Programming_, Stepanov uses the term &quot;uniquely re=
presented&quot; for this property. How do you feel about is_uniquely_repres=
ented&lt;T&gt;?<br>
&gt;<br>
<br>
</div>I&#39;ve gone back and forth on this good suggestion, but I think I&#=
39;m settling on not liking it. =C2=A0Rationale:<br>
<br>
A quote from Stepanov&#39;s excellent book:<br>
<br>
&gt; For example, a type representing a truth value as a byte that interpre=
ts<br>
&gt; zero as false and nonzero as true is not uniquely represented.<br>
<br>
And I respect that definition, and agree with it.<br>
<br>
On the other hand, on my platform, bool is implemented by the compiler as a=
 single byte constrained to hold one of the two bit patterns:<br>
<br>
0x00<br>
0x01<br>
<br>
Without playing with undefined behavior / aliasing, one simply can not give=
 a bool any other bit pattern. =C2=A0E.g:<br>
<br>
bool b =3D 128;<br>
<br>
compiles and sets b to 0x01.<br>
<br>
So while not guaranteed by the standard, on my platform, bool is contiguous=
ly hashable. =C2=A0And that is a sufficiently important optimization that I=
 would want my std::lib to make std::is_contiguously_hashable&lt;bool&gt;::=
value =3D=3D true on this platform. =C2=A0On another platform that implemen=
ted bool as something that (for example) could take on any bit pattern of a=
 byte, and any of the non-zero bit patterns represented true, std::is_conti=
guously_hashable&lt;bool&gt;::value should be false. =C2=A0And admittedly, =
this *is* precisely what Alexander says in his book.<br>


<br>
However, in order to avoid the situation of people reading EOP and concludi=
ng that is_uniquely_represented&lt;bool&gt;::value shall *always* be false,=
 I would rather not use that name, in favor of the more pragmatic concerns =
implied by is_contiguously_hashable&lt;bool&gt;. =C2=A0Otherwise a FAQ will=
 be:<br>


<br>
EOP says bool is not uniquely represented. =C2=A0But I&#39;m seeing std::is=
_uniquely_represented&lt;bool&gt;::value =3D=3D true. =C2=A0Is that a bug?<=
br></blockquote><div><br></div><div>EOP doesn&#39;t say that=C2=A0std::is_u=
niquely_represented&lt;bool&gt;::value is false. It says that=C2=A0std::is_=
uniquely_represented&lt;MyBoolean&gt;::value must be false if MyBoolean is =
a byte with 0 meaning false and 1 meaning true.</div>

<div><br></div><div>is_uniquely_represented looks like a perfect fit to me.=
</div><div><br></div><div>Roman Perepelitsa.</div></div></div></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

--047d7b677d36485b1704f8bbd96c--

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Tue, 6 May 2014 10:07:11 -0400
Raw View
On May 5, 2014, at 12:22 PM, 'Geoffrey Romer' via ISO C++ Standard - Future=
 Proposals <std-proposals@isocpp.org> wrote:

> In _Elements of Programming_, Stepanov uses the term "uniquely represente=
d" for this property. How do you feel about is_uniquely_represented<T>?
>=20

I've gone back and forth on this good suggestion, but I think I'm settling =
on not liking it.  Rationale:

A quote from Stepanov's excellent book:

> For example, a type representing a truth value as a byte that interprets
> zero as false and nonzero as true is not uniquely represented.

And I respect that definition, and agree with it.

On the other hand, on my platform, bool is implemented by the compiler as a=
 single byte constrained to hold one of the two bit patterns:

0x00
0x01

Without playing with undefined behavior / aliasing, one simply can not give=
 a bool any other bit pattern.  E.g:

bool b =3D 128;

compiles and sets b to 0x01.

So while not guaranteed by the standard, on my platform, bool is contiguous=
ly hashable.  And that is a sufficiently important optimization that I woul=
d want my std::lib to make std::is_contiguously_hashable<bool>::value =3D=
=3D true on this platform.  On another platform that implemented bool as so=
mething that (for example) could take on any bit pattern of a byte, and any=
 of the non-zero bit patterns represented true, std::is_contiguously_hashab=
le<bool>::value should be false.  And admittedly, this *is* precisely what =
Alexander says in his book.

However, in order to avoid the situation of people reading EOP and concludi=
ng that is_uniquely_represented<bool>::value shall *always* be false, I wou=
ld rather not use that name, in favor of the more pragmatic concerns implie=
d by is_contiguously_hashable<bool>.  Otherwise a FAQ will be:

EOP says bool is not uniquely represented.  But I'm seeing std::is_uniquely=
_represented<bool>::value =3D=3D true.  Is that a bug?

Howard

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: "'Geoffrey Romer' via ISO C++ Standard - Future Proposals" <std-proposals@isocpp.org>
Date: Tue, 6 May 2014 10:29:38 -0700
Raw View
--001a113ab056aa285504f8be977a
Content-Type: text/plain; charset=UTF-8

On Tue, May 6, 2014 at 7:07 AM, Howard Hinnant <howard.hinnant@gmail.com>wrote:

> On May 5, 2014, at 12:22 PM, 'Geoffrey Romer' via ISO C++ Standard -
> Future Proposals <std-proposals@isocpp.org> wrote:
>
> > In _Elements of Programming_, Stepanov uses the term "uniquely
> represented" for this property. How do you feel about
> is_uniquely_represented<T>?
> >
>
> I've gone back and forth on this good suggestion, but I think I'm settling
> on not liking it.  Rationale:
>
> A quote from Stepanov's excellent book:
>
> > For example, a type representing a truth value as a byte that interprets
> > zero as false and nonzero as true is not uniquely represented.
>
> And I respect that definition, and agree with it.
>
> On the other hand, on my platform, bool is implemented by the compiler as
> a single byte constrained to hold one of the two bit patterns:
>
> 0x00
> 0x01
>
> Without playing with undefined behavior / aliasing, one simply can not
> give a bool any other bit pattern.  E.g:
>
> bool b = 128;
>
> compiles and sets b to 0x01.
>
> So while not guaranteed by the standard, on my platform, bool is
> contiguously hashable.  And that is a sufficiently important optimization
> that I would want my std::lib to make
> std::is_contiguously_hashable<bool>::value == true on this platform.  On
> another platform that implemented bool as something that (for example)
> could take on any bit pattern of a byte, and any of the non-zero bit
> patterns represented true, std::is_contiguously_hashable<bool>::value
> should be false.  And admittedly, this *is* precisely what Alexander says
> in his book.
>
> However, in order to avoid the situation of people reading EOP and
> concluding that is_uniquely_represented<bool>::value shall *always* be
> false, I would rather not use that name, in favor of the more pragmatic
> concerns implied by is_contiguously_hashable<bool>.  Otherwise a FAQ will
> be:
>
> EOP says bool is not uniquely represented.  But I'm seeing
> std::is_uniquely_represented<bool>::value == true.  Is that a bug?
>

This sounds highly speculative to me, and it proves far too much; if we're
willing to reject a perfectly good name on the grounds that some people
might hypothetically misunderstand a particular book's discussion of that
term, and apply that misunderstanding to C++'s use of that term, then no
possible name for anything is safe.


>
> Howard
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "ISO C++ Standard - Future Proposals" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to std-proposals+unsubscribe@isocpp.org.
> To post to this group, send email to std-proposals@isocpp.org.
> Visit this group at
> http://groups.google.com/a/isocpp.org/group/std-proposals/.
>

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

--001a113ab056aa285504f8be977a
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div class=3D"gmail_extra"><br><div class=3D"gmail_quote">=
On Tue, May 6, 2014 at 7:07 AM, Howard Hinnant <span dir=3D"ltr">&lt;<a hre=
f=3D"mailto:howard.hinnant@gmail.com" target=3D"_blank">howard.hinnant@gmai=
l.com</a>&gt;</span> wrote:<br>
<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex"><div class=3D"">On May 5, 2014, at 12:22 PM,=
 &#39;Geoffrey Romer&#39; via ISO C++ Standard - Future Proposals &lt;<a hr=
ef=3D"mailto:std-proposals@isocpp.org">std-proposals@isocpp.org</a>&gt; wro=
te:<br>

<br>
&gt; In _Elements of Programming_, Stepanov uses the term &quot;uniquely re=
presented&quot; for this property. How do you feel about is_uniquely_repres=
ented&lt;T&gt;?<br>
&gt;<br>
<br>
</div>I&#39;ve gone back and forth on this good suggestion, but I think I&#=
39;m settling on not liking it. =C2=A0Rationale:<br>
<br>
A quote from Stepanov&#39;s excellent book:<br>
<br>
&gt; For example, a type representing a truth value as a byte that interpre=
ts<br>
&gt; zero as false and nonzero as true is not uniquely represented.<br>
<br>
And I respect that definition, and agree with it.<br>
<br>
On the other hand, on my platform, bool is implemented by the compiler as a=
 single byte constrained to hold one of the two bit patterns:<br>
<br>
0x00<br>
0x01<br>
<br>
Without playing with undefined behavior / aliasing, one simply can not give=
 a bool any other bit pattern. =C2=A0E.g:<br>
<br>
bool b =3D 128;<br>
<br>
compiles and sets b to 0x01.<br>
<br>
So while not guaranteed by the standard, on my platform, bool is contiguous=
ly hashable. =C2=A0And that is a sufficiently important optimization that I=
 would want my std::lib to make std::is_contiguously_hashable&lt;bool&gt;::=
value =3D=3D true on this platform. =C2=A0On another platform that implemen=
ted bool as something that (for example) could take on any bit pattern of a=
 byte, and any of the non-zero bit patterns represented true, std::is_conti=
guously_hashable&lt;bool&gt;::value should be false. =C2=A0And admittedly, =
this *is* precisely what Alexander says in his book.<br>

<br>
However, in order to avoid the situation of people reading EOP and concludi=
ng that is_uniquely_represented&lt;bool&gt;::value shall *always* be false,=
 I would rather not use that name, in favor of the more pragmatic concerns =
implied by is_contiguously_hashable&lt;bool&gt;. =C2=A0Otherwise a FAQ will=
 be:<br>

<br>
EOP says bool is not uniquely represented. =C2=A0But I&#39;m seeing std::is=
_uniquely_represented&lt;bool&gt;::value =3D=3D true. =C2=A0Is that a bug?<=
br></blockquote><div><br></div><div>This sounds highly speculative to me, a=
nd it proves far too much; if we&#39;re willing to reject a perfectly good =
name on the grounds that some people might hypothetically misunderstand a p=
articular book&#39;s discussion of that term, and apply that misunderstandi=
ng to C++&#39;s use of that term, then no possible name for anything is saf=
e.</div>
<div>=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8=
ex;border-left:1px #ccc solid;padding-left:1ex">
<div class=3D"HOEnZb"><div class=3D"h5"><br>
Howard<br>
<br>
--<br>
<br>
---<br>
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br>
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals%2Bunsubscribe@isocpp.org">std-propo=
sals+unsubscribe@isocpp.org</a>.<br>
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br>
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/" target=3D"_blank">http://groups.google.com/a/isocpp.org/gro=
up/std-proposals/</a>.<br>
</div></div></blockquote></div><br></div></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

--001a113ab056aa285504f8be977a--

.


Author: "'Geoffrey Romer' via ISO C++ Standard - Future Proposals" <std-proposals@isocpp.org>
Date: Mon, 5 May 2014 09:22:25 -0700
Raw View
--047d7bdc84c46c75b204f8a9898f
Content-Type: text/plain; charset=UTF-8

On Sun, May 4, 2014 at 3:00 PM, Howard Hinnant <howard.hinnant@gmail.com>wrote:

> On May 4, 2014, at 5:24 PM, John Bytheway <jbytheway@gmail.com> wrote:
>
> > It occurs to me that the type trait is_contiguously_hashable proposed in
> > this paper is really declaring that, for a particular type, value
> > equality is equivalent to representation equality.  By 'representation
> > equality' I mean equality according to memcmp.  This has potential to
> > optimize things other than hashing.  For example, it could be used to
> > optimize any of the std:: algorithms which need to test for equality.
> > This includes find, find_end, find_first, adjacent_find, count,
> > mismatch, equal, search, replace, remove, and unique.
> >
> > Does this seem like a trait we should be offering for these purposes?
> >
> > If so, it would be nice to give the trait a more generic name which
> > reflects its wider utility.  Unfortunately, I can't think of a good
> > name.  Another bikeshedding opportunity...  The best I can think of is
> > along the lines of respects_representation_equality.
>
> If we could demonstrate such use, perhaps.  I've just looked at find, and
> I don't really see an optimization opportunity there (though I could easily
> be missing it).  There is precedent for introducing a general bit of
> infrastructure to support a specific proposal.  ratio and common_type were
> standardized in this very way (introduced to support <chrono>).
>
> At this time, I don't see a lot of opportunity (perhaps with std::equal,
> not positive).
>
> If the LWG shows interest in this proposal, that is certainly an aspect
> they could look at (and bike shed).
>
> N3333 called this trait is_contiguous_layout<T>, which I don't think is as
> good as is_contiguously_hashable<T>, because float has a contiguous layout,
> and yet is still not contiguously hashable.  The closest thing there is to
> a reference implementation for N3333 (llvm/include/llvm/ADT/Hashing.h)
> calls it is_hashable_data<T>.
>
> <shrug> Good names are tough.
>

In _Elements of Programming_, Stepanov uses the term "uniquely represented"
for this property. How do you feel about is_uniquely_represented<T>?


>
> Howard
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "ISO C++ Standard - Future Proposals" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to std-proposals+unsubscribe@isocpp.org.
> To post to this group, send email to std-proposals@isocpp.org.
> Visit this group at
> http://groups.google.com/a/isocpp.org/group/std-proposals/.
>

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

--047d7bdc84c46c75b204f8a9898f
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div class=3D"gmail_extra"><br><div class=3D"gmail_quote">=
On Sun, May 4, 2014 at 3:00 PM, Howard Hinnant <span dir=3D"ltr">&lt;<a hre=
f=3D"mailto:howard.hinnant@gmail.com" target=3D"_blank">howard.hinnant@gmai=
l.com</a>&gt;</span> wrote:<br>
<blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-=
left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;p=
adding-left:1ex"><div class=3D"">On May 4, 2014, at 5:24 PM, John Bytheway =
&lt;<a href=3D"mailto:jbytheway@gmail.com">jbytheway@gmail.com</a>&gt; wrot=
e:<br>

<br>
&gt; It occurs to me that the type trait is_contiguously_hashable proposed =
in<br>
&gt; this paper is really declaring that, for a particular type, value<br>
&gt; equality is equivalent to representation equality. =C2=A0By &#39;repre=
sentation<br>
&gt; equality&#39; I mean equality according to memcmp. =C2=A0This has pote=
ntial to<br>
&gt; optimize things other than hashing. =C2=A0For example, it could be use=
d to<br>
&gt; optimize any of the std:: algorithms which need to test for equality.<=
br>
&gt; This includes find, find_end, find_first, adjacent_find, count,<br>
&gt; mismatch, equal, search, replace, remove, and unique.<br>
&gt;<br>
&gt; Does this seem like a trait we should be offering for these purposes?<=
br>
&gt;<br>
&gt; If so, it would be nice to give the trait a more generic name which<br=
>
&gt; reflects its wider utility. =C2=A0Unfortunately, I can&#39;t think of =
a good<br>
&gt; name. =C2=A0Another bikeshedding opportunity... =C2=A0The best I can t=
hink of is<br>
&gt; along the lines of respects_representation_equality.<br>
<br>
</div>If we could demonstrate such use, perhaps. =C2=A0I&#39;ve just looked=
 at find, and I don&#39;t really see an optimization opportunity there (tho=
ugh I could easily be missing it). =C2=A0There is precedent for introducing=
 a general bit of infrastructure to support a specific proposal. =C2=A0rati=
o and common_type were standardized in this very way (introduced to support=
 &lt;chrono&gt;).<br>

<br>
At this time, I don&#39;t see a lot of opportunity (perhaps with std::equal=
, not positive).<br>
<br>
If the LWG shows interest in this proposal, that is certainly an aspect the=
y could look at (and bike shed).<br>
<br>
N3333 called this trait is_contiguous_layout&lt;T&gt;, which I don&#39;t th=
ink is as good as is_contiguously_hashable&lt;T&gt;, because float has a co=
ntiguous layout, and yet is still not contiguously hashable. =C2=A0The clos=
est thing there is to a reference implementation for N3333 (llvm/include/ll=
vm/ADT/Hashing.h) calls it is_hashable_data&lt;T&gt;.<br>

<br>
&lt;shrug&gt; Good names are tough.<br></blockquote><div><br></div><div>In =
_Elements of Programming_, Stepanov uses the term &quot;uniquely represente=
d&quot; for this property. How do you feel about is_uniquely_represented&lt=
;T&gt;?<br>
</div><div>=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0p=
x 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);bo=
rder-left-style:solid;padding-left:1ex">
<span class=3D""><font color=3D"#888888"><br>
Howard<br>
</font></span><div class=3D""><div class=3D"h5"><br>
--<br>
<br>
---<br>
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br>
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals%2Bunsubscribe@isocpp.org">std-propo=
sals+unsubscribe@isocpp.org</a>.<br>
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br>
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/" target=3D"_blank">http://groups.google.com/a/isocpp.org/gro=
up/std-proposals/</a>.<br>
</div></div></blockquote></div><br></div></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

--047d7bdc84c46c75b204f8a9898f--

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Tue, 6 May 2014 23:00:44 -0400
Raw View
On May 6, 2014, at 1:29 PM, 'Geoffrey Romer' via ISO C++ Standard - Future =
Proposals <std-proposals@isocpp.org> wrote:

> This sounds highly speculative to me, and it proves far too much; if we'r=
e willing to reject a perfectly good name on the grounds that some people m=
ight hypothetically misunderstand a particular book's discussion of that te=
rm, and apply that misunderstanding to C++'s use of that term, then no poss=
ible name for anything is safe.

Fair enough.  I'm inclined to include an Appendix of issues such as this.  =
Ultimately it will be the LWG, not the authors of this paper, that will dec=
ide details like this, assuming the proposal even gets that far.

Thanks,
Howard

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

.


Author: Daniel James <dnljms@gmail.com>
Date: Mon, 5 May 2014 11:22:59 +0100
Raw View
On 3 May 2014 22:59, Howard Hinnant <howard.hinnant@gmail.com> wrote:
>
> http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html

The "plausible implementation" of a hash function for floating point
numbers won't always work for IEEE floats, as they can include
padding. For example, intel linux represents long double with an
80-bit float (i.e. 10 bytes), but sizeof(long double) is 12 for 32-bit
linux, and 16 for 64-bit.

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

.


Author: jgottman6@gmail.com
Date: Fri, 9 May 2014 15:43:32 -0700 (PDT)
Raw View
------=_Part_731_16087672.1399675412664
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

You should also define hash_append for the string_view and optional classes=
=20
in the Library Fundamentals paper.

On Tuesday, May 6, 2014 11:00:44 PM UTC-4, Howard Hinnant wrote:
>
> On May 6, 2014, at 1:29 PM, 'Geoffrey Romer' via ISO C++ Standard - Futur=
e=20
> Proposals <std-pr...@isocpp.org <javascript:>> wrote:=20
>
> > This sounds highly speculative to me, and it proves far too much; if=20
> we're willing to reject a perfectly good name on the grounds that some=20
> people might hypothetically misunderstand a particular book's discussion =
of=20
> that term, and apply that misunderstanding to C++'s use of that term, the=
n=20
> no possible name for anything is safe.=20
>
> Fair enough.  I=E2=80=99m inclined to include an Appendix of issues such =
as this.=20
>  Ultimately it will be the LWG, not the authors of this paper, that will=
=20
> decide details like this, assuming the proposal even gets that far.=20
>
> Thanks,=20
> Howard=20
>
>

--=20

---=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposa=
ls/.

------=_Part_731_16087672.1399675412664
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">You should also define hash_append for the string_view and=
 optional classes in the Library Fundamentals paper.<br><br>On Tuesday, May=
 6, 2014 11:00:44 PM UTC-4, Howard Hinnant wrote:<blockquote class=3D"gmail=
_quote" style=3D"margin: 0;margin-left: 0.8ex;border-left: 1px #ccc solid;p=
adding-left: 1ex;">On May 6, 2014, at 1:29 PM, 'Geoffrey Romer' via ISO C++=
 Standard - Future Proposals &lt;<a href=3D"javascript:" target=3D"_blank" =
gdf-obfuscated-mailto=3D"YoqIfk_SsysJ" onmousedown=3D"this.href=3D'javascri=
pt:';return true;" onclick=3D"this.href=3D'javascript:';return true;">std-p=
r...@isocpp.org</a>&gt; wrote:
<br>
<br>&gt; This sounds highly speculative to me, and it proves far too much; =
if we're willing to reject a perfectly good name on the grounds that some p=
eople might hypothetically misunderstand a particular book's discussion of =
that term, and apply that misunderstanding to C++'s use of that term, then =
no possible name for anything is safe.
<br>
<br>Fair enough. &nbsp;I=E2=80=99m inclined to include an Appendix of issue=
s such as this. &nbsp;Ultimately it will be the LWG, not the authors of thi=
s paper, that will decide details like this, assuming the proposal even get=
s that far.
<br>
<br>Thanks,
<br>Howard
<br>
<br></blockquote></div>

<p></p>

-- <br />
<br />
--- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
Visit this group at <a href=3D"http://groups.google.com/a/isocpp.org/group/=
std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

------=_Part_731_16087672.1399675412664--

.