Topic: Cryptographic hash functions reloaded [was


Author: Zhihao Yuan <zy@miator.net>
Date: Sat, 23 Aug 2014 17:17:58 -0400
Raw View

On Sat, Aug 23, 2014 at 4:14 PM, Markus Mayer <lotharlutz@gmx.de> wrote:

>
> Design (heavily based on boost::crc):
>

I used boost::detail::sha1 in two projects but later dropped it,
because it has a serious flaw: the hasher object becomes
unusable after you get the digest (it is finalized), but you
have no way to tell whether such an object is still usable or
not if the object is returned from a function.

Python's hashlib has no such problem, and the solution is
very simple: just copy the internal state before you finalize it.

Now my design is a thin wrapper around OpenSSL, and provides all
the functionality of Python's hashlib at compile time:

  https://github.com/lichray/cpp-deuceclient/blob/master/src/hashlib.h


>   hash_function& process_bytes( void const *buffer, std::size_t
> byte_count);
>

void* is a very bad interface.  It destroys type safety.  A user
may call process_bytes(my_std_string, a_shorter_length)
and think he is getting the hash of a prefix of the string.
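
To illustrate the kind of mistake such an interface permits (the hasher
type below is purely illustrative, not from any proposal):

#include <cstddef>
#include <string>

struct hasher
{
    hasher& process_bytes(void const* buffer, std::size_t byte_count);
};

void example(hasher& h)
{
    std::string s = "hello, world";
    // Compiles silently, but hashes the std::string object representation
    // (pointer/size/capacity or the SSO buffer), not the characters:
    h.process_bytes(&s, s.size());
    // What was probably meant:
    h.process_bytes(s.data(), s.size());
}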


>   const result_type& hash_value();
>

The issue I explained above: we need to return a value
to make the object less stateful.
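
For illustration, the shape of a non-finalizing accessor, assuming the old
OpenSSL SHA256 C API that my wrapper builds on (the class and member names
here are made up):

#include <openssl/sha.h>
#include <array>
#include <cstddef>

class sha256_hasher
{
    SHA256_CTX state_;
public:
    using result_type = std::array<unsigned char, SHA256_DIGEST_LENGTH>;

    sha256_hasher() { SHA256_Init(&state_); }

    void update(void const* p, std::size_t n) { SHA256_Update(&state_, p, n); }

    // Finalize a copy of the state; *this stays usable afterwards.
    result_type digest() const
    {
        SHA256_CTX copy = state_;
        result_type r;
        SHA256_Final(r.data(), &copy);
        return r;
    }
};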


>
> -Why 'result_type'?
> To be consistent with std::function.
>

It's not a function, so why be consistent with one?  Plus, the TR1
function interface no longer matters.


>
> -Why not add an iterator based process_bytes?
> For now I consider it to complex.
>

Yes.  Iterators are for type-generic algorithms, while a message
digest has much more limited input types.


> -How to handle the large state of a hash function?
> Hash function can have a large internal state (64 Byte for sha2, 200 Byte
> for sha3) is it OK to put such objects on the stack, or do we need to
> allocate them dynamically (using allocators)?
>

I don't think we need to.  200 bytes does not look large to me,
compared with the state of pseudo-random number generators.


> -How to hash files?
> Hashing a file is quite common. As the iterator interface was removed,
> there is no easy way to hash a file (using istream_iterator). How to do it
> now?
>

I tried, but I found that the internal data flow of streambuf is
too complex for a hasher.

Users can use an external buffer to do so, or use system
calls like read(2) directly.  To me, it does not seem to be a
problem that must be solved in the standard.
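
For what it's worth, hashing a file through an external buffer is only a
few lines (a sketch against the interface proposed here; error handling
omitted):

#include <cstddef>
#include <fstream>
#include <vector>

template <class Hasher>
typename Hasher::result_type hash_file(char const* path)
{
    Hasher h;
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buf(64 * 1024);
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        if (in.gcount() > 0)
            h.process_bytes(buf.data(), static_cast<std::size_t>(in.gcount()));
    }
    return h.hash_value();
}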

--
Zhihao Yuan, ID lichray
The best way to predict the future is to invent it.
___________________________________________________
4BSD -- http://bit.ly/blog4bsd



.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Sat, 23 Aug 2014 21:25:38 -0400
Raw View
On Aug 23, 2014, at 4:14 PM, Markus Mayer <lotharlutz@gmx.de> wrote:

> Design (heavily based on boost::crc):
> 
> class hash_function
> {
> public:
>  typedef std::array<unsigned char, ALGORITHM_DEFINED> result_type;
>  //Default constructible
>  //Copyable
>  //Moveable
> 
>  hash_function& process_bytes( void const *buffer, std::size_t byte_count);
> 
>  void reset();
> 
>  const result_type& hash_value();
> 
> };
> 
> //I am not sure about this function yet...
> template<class hash>
> typename hash::result_type calculate_hash(void const *buffer, std::size_t byte_count);
> 
> The implemented algorithms will be (the class name is given in the list):
> -md5
> -sha_1
> -sha_224
> -sha_256
> -sha_384
> -sha_512
> -sha3_224
> -sha3_256
> -sha3_384
> -sha3_512
> -Various flavors of crc
> 
> 
> Rationale:
> -Why 'result_type'?
> To be consistent with std::function.

Agreed with result_type, but see below.

> 
> -Why 'unsigned char' and not 'uint8_t' for result_type?
> Most hash algorithms work on an 8 bit basis. But as long as unsigned char
> has a multiple of 8 bits, the algorithms can still be applied. So 'unsigned
> char' enables those architectures to implement the functions. Architectures
> where 'unsigned char' is not a multiple of 8 bits will be excluded by the
> proposal.

Ditto for uint8_t, except that if you use uint8_t, you'll find out about
those excluded architectures at compile time instead of at run time.

> -Why not implement operator()?
> Having a function (with a name) is more vocal (and clear) than just
> braces. IMHO Operator() is only useful if the object will be used as a
> functor (as in std::less). But the signature is too uncommon to be used in
> any standard algorithm. But I'm willing to change if someone comes up with
> a good example.

The "Types Don't Know #" proposal used operator() because it made it much
easier to design a type-erased hasher that could be used for pimpl types.

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#pimpl
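
Roughly, the point is that any algorithm with the (void const*, size_t)
call signature can be erased behind something like std::function, so a
pimpl'd class can feed its members to a hasher without naming the algorithm
in its header.  A much simplified sketch of the idea (N3980 has the real
treatment; the names below are illustrative):

#include <cstddef>
#include <functional>

using type_erased_hasher = std::function<void(void const*, std::size_t)>;

class widget
{
    int id_ = 42;
public:
    // Declared in the header, defined next to the private members;
    // callers never see which hash algorithm is plugged in.
    void hash_append_to(type_erased_hasher& h) const
    {
        h(&id_, sizeof id_);
    }
};

// Usage, e.g. with the sha256 adaptor shown below:
//   sha256 algo;
//   type_erased_hasher h = [&algo](void const* p, std::size_t n) { algo(p, n); };
//   w.hash_append_to(h);
//   auto digest = static_cast<sha256::result_type>(algo);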

> -Why not rename 'hash_value' to result_type()?
> What do you prefer?
> auto result = myHash.hash_value();
> or
> auto result = static_cast<hash_function::result_type>(myHash);

Or:

hash_function::result_type result{myHash};

auto is a double-edged sword.  I personally love it.  But I'm also finding
it can be abused.  Sometimes it is helpful to have the type of a variable
be more explicit.  But see below where it is shown that minimal code
interfaces directly with hash algorithms.  Only hash functors should call
hash algorithms.  And hash functors are what everybody else calls, such as
unordered containers, or clients needing the result of a sha256.  The
distinction between hash functors and hash algorithms is what enables
end-user-clients to easily swap out hash algorithms, simply by telling the
hash functor to switch hash algorithms.


> -Sync with 'Types Don't Know #'

Here is a 'Types Don't Know #' version of sha256 based on the implementation
at http://www.aarongifford.com/computers/sha.html

class sha256
{
    SHA256_CTX state_;
public:
    static constexpr std::endian endian = std::endian::big;
    using result_type = std::array<std::uint8_t, SHA256_DIGEST_LENGTH>;

    sha256() noexcept
    {
        SHA256_Init(&state_);
    }

    void
    operator()(void const* key, std::size_t len) noexcept
    {
        SHA256_Update(&state_, static_cast<uint8_t const*>(key), len);
    }

    explicit
    operator result_type() noexcept
    {
        result_type r;
        SHA256_Final(r.data(), &state_);
        return r;
    }
};

The types, constants, and functions starting with SHA256_ come from the C
implementation at http://www.aarongifford.com/computers/sha.html.  sha256 is
copy constructible (and thus move constructible).  It is not complicated --
a simple adaptor of existing C implementations.  And it is easy and
type-safe to use:

int
main()
{
    std::uhash<sha256> h;
    auto r = h(54);
}

r has type sha256::result_type, which is a 32 byte std::array.  It hashes
the int 54, converted to big endian prior to feeding it to the hash (if
necessary).  Little endian or native endian could just as easily have been
chosen.  One uses native endian if you don't care about the results being
consistent across platforms with differing endian.  Note that the endian
bits are lacking from N3980.  They are a later refinement, strictly to aid
fingerprinting applications such as sha256.

Note that neither clients such as the main() shown above, nor unordered
containers, use sha256 directly, except to parameterize hash functions such
as uhash.  The hash functions such as uhash communicate directly with the
hash algorithms such as sha256.  This (compile-time) level of indirection is
what makes it possible to very easily and quickly switch between sha256 and
sha3_512, or siphash, etc.
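
In other words, the default hash functor is essentially just this
(simplified sketch of the idea in N3980, assuming the hash_append
customization point from that paper):

template <class HashAlgorithm>
struct uhash_sketch
{
    using result_type = typename HashAlgorithm::result_type;

    template <class T>
    result_type
    operator()(T const& t) const noexcept
    {
        HashAlgorithm h;
        hash_append(h, t);                   // customization point, per N3980
        return static_cast<result_type>(h);
    }
};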

std::uhash is an example, default, std-supplied hash_function, much like
std::vector is an example std-supplied container.  However std::uhash is far
simpler than std::vector.  Clients can very easily supply custom hash
functors to replace std::uhash to do things such as seeding, padding and
salting.  For example from:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#seeding

   std::unordered_set<MyType, randomly_seeded_hash<acme::spooky>> my_set;

Using sha256 in this seeded context would fail at compile time, since
sha256 is not seedable (at least not the implementation I used).

Finally note that if, instead of hashing an int, I want to hash a string,
it is simply:

int
main()
{
    std::uhash<sha256> h;
    auto r = h(std::string("54"));
}

The remarkable thing to note here is that neither int nor std::string has
the slightest notion of sha256, nor even std::uhash.  These types are
hash-algorithm-agnostic.  And this means if tomorrow, someone invents and
implements super_sha, then all we have to do to use it is:

int
main()
{
    std::uhash<super_sha> h;
    auto r = h(std::string("54"));
}

This is in stark contrast to today's std::hash<T> model where one would
have to revisit int, std::string, and all other types, to teach them about
super_sha.

Howard


.


Author: Miro Knejp <miro@knejp.de>
Date: Sun, 24 Aug 2014 05:38:06 +0200
Raw View


On 24.08.2014 03:25, Howard Hinnant wrote:
>> -Why 'unsigned char' and not 'uint8_t' for result_type?
>> Most hash algorithms work on an 8 bit basis. But as long as unsigned char
>> has a multiple of 8 bits, the algorithms can still be applied. So 'unsigned
>> char' enables those architectures to implement the functions. Architectures
>> where 'unsigned char' is not a multiple of 8 bits will be excluded by the
>> proposal.
> Ditto for uint8_t, except that if you use uint8_t, you'll find out about
> those excluded architectures at compile time instead of at run time.
>
Wouldn't it be the job of the implementation on such platforms to make
the algorithm work on the machine? That an algorithm works on an 8 bit
basis does not mean it cannot be implemented on architectures with other
word sizes as long as the input size conforms to the algorithm's
requirements and instructions for the necessary bit fiddling are available.

Besides, such architectures would most likely have a freestanding
implementation (see 17.6.1.3 [compliance]) where most standard headers
are optional and excluded features must be documented, so I see no
reason to add this restriction to a proposal.

Miro




.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Sat, 23 Aug 2014 23:54:55 -0400
Raw View
On Aug 23, 2014, at 11:38 PM, Miro Knejp <miro@knejp.de> wrote:

> 
> On 24.08.2014 03:25, Howard Hinnant wrote:
>>> -Why 'unsigned char' and not 'uint8_t' for result_type?
>>> Most hash algorithms work on an 8 bit basis. But as long as unsigned char
>>> has a multiple of 8 bits, the algorithms can still be applied. So 'unsigned
>>> char' enables those architectures to implement the functions. Architectures
>>> where 'unsigned char' is not a multiple of 8 bits will be excluded by the
>>> proposal.
>>> 
>> Ditto for uint8_t, except that if you use uint8_t, you'll find out about
>> those excluded architectures at compile time instead of at run time.
>> 
> Wouldn't it be the job of the implementation on such platforms to make the
> algorithm work on the machine? That an algorithm works on an 8 bit basis
> does not mean it cannot be implemented on architectures with other word
> sizes as long as the input size conforms to the algorithm's requirements
> and instructions for the necessary bit fiddling are available.
> 
> Besides, such architectures would most likely have a freestanding
> implementation (see 17.6.1.3 [compliance]) where most standard headers are
> optional and excluded features must be documented, so I see no reason to
> add this restriction to a proposal.

I misspoke.  This sentence of mine is a side issue, not worthy of comment.
My apologies for the noise.

Howard


.


Author: Jens Maurer <Jens.Maurer@gmx.net>
Date: Sun, 24 Aug 2014 08:30:46 +0200
Raw View
On 08/23/2014 10:14 PM, Markus Mayer wrote:
> -Why 'unsigned char' and not 'uint8_t' for result_type?
> Most hash algorithms work on an 8 bit basis. But as long as unsigned
> char has a multiple of 8 bits, the algorithms can still be applied. So
> 'unsigned char' enables those architectures to implement the functions.
> Architectures where 'unsigned char' is not a multiple of 8 bits will be
> excluded by the proposal.

If you specify that a hash function operates on octets, i.e. 8-bit
quantities (I believe all of them do), then it seems ok to have a 9-bit
unsigned char, whose top bit will always be zero.

Conversion from object representation to octets is a separate issue.

> -Why not implement operator()?
> Having a function (with a name) is more vocal (and clear) then just
> braces. IMHO Operator() is only useful if the object will be used as a
> functor (as in std::less). But the signature is to uncommon to be used
> in any standard algorithm. But I'm willing to change if someone came up
> with a good example.

In my opinion, a hash is a function object and should have operator().
> Open topics:
> -How to handle the large state of a hash function?
> Hash function can have a large internal state (64 Byte for sha2, 200
> Byte for sha3) is it OK to put such objects on the stack, or do we need
> to allocate them dynamically (using allocators)?

No, you've got a constant state size.  Putting 200 bytes on the stack
is totally ok; implementations where that isn't acceptable are free to use
another strategy.

> -How to hash files?
> Hashing a file is quite common. As the iterator interface was removed,
> there is no easy way to hash a file (using istream_iterator). How to do
> it now?

It's certainly possible to provide a generic wrapper template that converts
from input iterators to byte buffers, i.e.

template<class H, class InputIterator>
void process(H&, InputIterator first, InputIterator last);
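
For example, roughly (assuming the (unsigned char const*, size_t)
operator() interface discussed elsewhere in this thread):

#include <cstddef>

template<class H, class InputIterator>
void process(H& h, InputIterator first, InputIterator last)
{
    unsigned char buf[1024];
    std::size_t n = 0;
    for (; first != last; ++first) {
        buf[n++] = static_cast<unsigned char>(*first);
        if (n == sizeof buf) {
            h(buf, n);      // flush a full chunk to the hasher
            n = 0;
        }
    }
    if (n != 0)
        h(buf, n);          // flush the final partial chunk
}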

> -Should we only add some crc classes (like crc_32, crc_16 or crc_ccitt)
> or a generic (templated) crc algorithm (like in boost::crc)?

Yes, please add crc32c at least.  Intel has a special instruction for it
that we should make easily usable.
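
For reference, a byte-at-a-time update loop with that instruction looks
like this (x86-specific sketch via the SSE4.2 intrinsic; a portable
table-driven fallback would still be needed, and the usual CRC-32C
convention starts from 0xFFFFFFFF and complements the final value):

#include <nmmintrin.h>   // SSE4.2
#include <cstddef>
#include <cstdint>

inline std::uint32_t
crc32c_update(std::uint32_t crc, unsigned char const* p, std::size_t n)
{
    for (std::size_t i = 0; i != n; ++i)
        crc = _mm_crc32_u8(crc, p[i]);
    return crc;
}

// e.g.  std::uint32_t crc = ~crc32c_update(0xFFFFFFFFu, data, len);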

> -Add 'nothrow' where applicable

noexcept

Jens


.


Author: Jens Maurer <Jens.Maurer@gmx.net>
Date: Sun, 24 Aug 2014 08:44:57 +0200
Raw View
On 08/24/2014 03:25 AM, Howard Hinnant wrote:
> On Aug 23, 2014, at 4:14 PM, Markus Mayer <lotharlutz@gmx.de> wrote:
>>  hash_function& process_bytes( void const *buffer, std::size_t byte_count);

I'd like to argue in favor of "unsigned char*", not "void*".

>> -Why not implement operator()?
>> Having a function (with a name) is more vocal (and clear) then just brac=
es. IMHO Operator() is only useful if the object will be used as a functor =
(as in std::less). But the signature is to uncommon to be used in any stand=
ard algorithm. But I'm willing to change if someone came up with a good exa=
mple.
>=20
> The "Types Don't Know #" proposal used operator() because it made it much=
 easier to design a type-erased hasher that could be used for pimpl types.

Yes, operator() is better for an entity that does essentially just one thin=
g.

>> -Sync with 'Types Don't Know #'
> 
> Here is a 'Types Don't Know #' version of sha256 based on the implementation
> at http://www.aarongifford.com/computers/sha.html
> 
> class sha256
> {
>     SHA256_CTX state_;
> public:
>     static constexpr std::endian endian = std::endian::big;
>     using result_type = std::array<std::uint8_t, SHA256_DIGEST_LENGTH>;
> 
>     sha256() noexcept
>     {
>         SHA256_Init(&state_);
>     }
> 
>     void
>     operator()(void const* key, std::size_t len) noexcept
>     {
>         SHA256_Update(&state_, static_cast<uint8_t const*>(key), len);
>     }

I'd like to have "unsigned char *" as the interface. Your implementation
makes it clear it doesn't work on non-8-bit-char machines, which is
fine, but that shouldn't constrain the interface.

> int
> main()
> {
>     std::uhash<sha256> h;
>     auto r = h(54);
> }

It's definitely the right approach to separate the core hash algorithm from
the preparation and adaptation of values of arbitrary type that gets passed in.

> r has type sha256::result_type, which is a 32 byte std::array.  It hashes
> the int 54, converted to big endian prior to feeding it to the hash (if
> necessary).  Little endian or native endian could just as easily have been
> chosen.  One uses native endian if you don't care about the results being
> consistent across platforms with differing endian.  Note that the endian
> bits are lacking from N3980.  They are a later refinement, strictly to aid
> fingerprinting applications such as sha256.

I'm not convinced the "endian" parts are well-designed as-is.

A hash function such as SHA256 should be defined to hash a sequence
of octets, accessed via "unsigned char *".  At that level, endianness
doesn't matter.  I agree there is a choice for std::uhash<> when
converting a, say, "int" to a sequence of "unsigned char"s whether to
perform endian conversion or not, and that influences the cross-platform
reproducibility of the hash when e.g. hashing a sequence of "int"
values.

Some hash functions that process more than one octet in one step
could omit repeated endian conversion when accumulating multiple
"unsigned char"s into a step-unit (e.g. uint64_t).  Specializations
of std::uhash<> that exploit that might be useful, possibly with an
optional
   operator()(const uint64_t *, size_t len)
interface for the individual hash function that makes the dependency
explicit.

Jens


.


Author: David Krauss <potswa@gmail.com>
Date: Sun, 24 Aug 2014 15:05:35 +0800
Raw View


On 2014-08-24, at 2:30 PM, Jens Maurer <Jens.Maurer@gmx.net> wrote:

> If you specify that a hash function operates on octets, i.e. 8-bit
> quantities (I believe all of them do), then it seems ok to have a 9-bit
> unsigned char, whose top bit will always be zero.
>=20
> Conversion from object representation to octets is a separate issue.

9-bit systems use the extra bit for parity anyway, and the hash would just
ignore it.

Anyway, portable hashes should be defined in terms of values, not
representation bytes. For text, any system will give you a well-defined
octet sequence. For anything else, it's better to follow the lines of
Hinnant's research than to give up and
reinterpret_cast< unsigned char * >( obj ), even for simple integers.

I'm even more leery of unsigned char * than void * as a generic interface.
The user needs a template to handle anything besides text, which is covered
by ambiguously-signed plain char *.

> In my opinion, a hash is a function object and should have operator().

+1. Function objects are amenable to metaprogramming facilities. Hashes are
often used as filters, so they should work with higher-order functions. A
method name inside a one-function class only introduces repetition. A method
name inside a generic interface implemented by various classes reserves that
name from the generic implementations.

>> -Should we only add some crc classes (like crc_32, crc_16 or crc_ccitt)
>> or a generic (templated) crc algorithm (like in boost::crc)?
> 
> Yes, please add crc32c at least.  Intel has a special instruction for it
> that we should make easily usable.

A generic template still allows a hardware-accelerated specialization.
Other hardware supports CRC16 in its various flavors.

I think the random number engines ([rand.eng] and [rand.predef]) set a good
precedent.



.


Author: Markus Mayer <lotharlutz@gmx.de>
Date: Sun, 24 Aug 2014 10:33:44 +0200
Raw View
On 08/24/2014 05:38 AM, Miro Knejp wrote:
>
> On 24.08.2014 03:25, Howard Hinnant wrote:
>>> -Why 'unsigned char' and not 'uint8_t' for result_type?
>>> Most hash algorithms work on an 8 bit basis. But as long as unsigned char
>>> has a multiple of 8 bits, the algorithms can still be applied. So 'unsigned
>>> char' enables those architectures to implement the functions. Architectures
>>> where 'unsigned char' is not a multiple of 8 bits will be excluded by the
>>> proposal.
>> Ditto for uint8_t, except that if you use uint8_t, you'll find out about
>> those excluded architectures at compile time instead of at run time.
>>
> Wouldn't it be the job of the implementation on such platforms to make
> the algorithm work on the machine? That an algorithm works on an 8 bit
> basis does not mean it cannot be implemented on architectures with other
> word sizes as long as the input size conforms to the algorithm's
> requirements and instructions for the necessary bit fiddling are available.
>
> Besides, such architectures would most likely have a freestanding
> implementation (see 17.6.1.3 [compliance]) where most standard headers
> are optional and excluded features must be documented, so I see no
> reason to add this restriction to a proposal.
>
> Miro
>

That's a really good point. I will use 'unsigned char' and let
implementations of odd-sized architectures decide for themselves whether
they implement it.


.


Author: Markus Mayer <lotharlutz@gmx.de>
Date: Sun, 24 Aug 2014 10:52:53 +0200
Raw View
On 08/24/2014 03:25 AM, Howard Hinnant wrote:
> On Aug 23, 2014, at 4:14 PM, Markus Mayer <lotharlutz@gmx.de> wrote:
>
>> -Why not implement operator()?
>> Having a function (with a name) is more vocal (and clear) than just
>> braces. IMHO Operator() is only useful if the object will be used as a
>> functor (as in std::less). But the signature is too uncommon to be used in
>> any standard algorithm. But I'm willing to change if someone comes up with
>> a good example.
>
> The "Types Don't Know #" proposal used operator() because it made it much
> easier to design a type-erased hasher that could be used for pimpl types.
>
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#pimpl

That's a good reason for adding operator(). I'm adding it to the next version.

>
>> -Why not rename 'hash_value' to result_type()?
>> What do you prefer?
>> auto result = myHash.hash_value();
>> or
>> auto result = static_cast<hash_function::result_type>(myHash);
>
> Or:
>
> hash_function::result_type result{myHash};

With my version it is:
hash_function::result_type result{myHash.hash_value()};

I think my solution allows everything your solution allows, plus
auto result = myHash.hash_value();

So I stick with my solution (for now)...
>
> auto is a double-edged sword.  I personally love it.  But I'm also finding
> it can be abused.  Sometimes it is helpful to have the type of a variable
> be more explicit.  But see below where it is shown that minimal code
> interfaces directly with hash algorithms.
> Only hash functors should call hash algorithms.

I'm not sure if I get your point, but I think others will also call hash
algorithms. Think about implementing a protocol which uses a checksum
for the header and the content.

First you load it into a buffer, then you want to check whether the
checksum matches.

vector<unsigned char> buffer;
myHash(buffer.data(), buffer.size());
//compare checksum...

As you don't know how the hasher for vector works (maybe it adds the size
to the hash), you have to use the algorithm directly.




.


Author: Markus Mayer <lotharlutz@gmx.de>
Date: Sun, 24 Aug 2014 11:05:59 +0200
Raw View
Thanks for all your feedback. Here comes the new version...

Changes:
-Add operator()
-Make hash_value() const and return a value
-Remove question whether an allocator is needed. It is not.
-Rename 'nothrow' to 'noexcept'
-Add a rationale for operator()
-Rewrite 'Why 'unsigned char' and not 'uint8_t' for result_type?'

New open questions:
-Should we add an overload for 'signed char' and 'char' also?

Design:

class hash_function
{
public:
   typedef std::array<unsigned char, ALGORITHM_DEFINED> result_type;
   //Default constructible
   //Copyable
   //Moveable

   hash_function& process_bytes( void unsigned char *buffer, std::size_t
byte_count);

   hash_function& operator() ( void unsigned char *buffer, std::size_t
byte_count);

   void reset();

   result_type hash_value() const;

};

//I am not sure about this function yet...
template<class hash>
typename hash::result_type calculate_hash(void const *buffer,
std::size_t byte_count);
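
A short usage sketch of the intended interface (sha_256 taken as the
concrete class from the list below; this is illustrative only):

#include <cstddef>

template <class HashFunction>
typename HashFunction::result_type
digest_of(unsigned char const* p, std::size_t n)
{
    HashFunction h;
    h(p, n);                    // or: h.process_bytes(p, n)
    return h.hash_value();
}

// e.g.  auto d = digest_of<sha_256>(data, len);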

The implemented algorithms will be (the class name is given in the list):
-md5
-sha_1
-sha_224
-sha_256
-sha_384
-sha_512
-sha3_224
-sha3_256
-sha3_384
-sha3_512
-Various flavors of crc


Rationale:
-Why 'result_type'?
To be consistent with std::function. And the name fits pretty well anyway.

-Why 'unsigned char' and not 'uint8_t' for result_type?
Most hash algorithms work on an 8 bit basis. But they can often be
implemented on odd-sized architectures as well. As such architectures
often use freestanding implementations, they are also free not to
implement it.

-Why 'unsigned char' and not 'char' for result_type?
It is to prevent people from thinking that the result is text (a string).
Furthermore, a raw byte is always interpreted as non-negative; you would not
read 0xFF as a negative value when it appears in a raw byte stream.

-Why 'process_bytes' and not 'write', 'update', ...?
Well, naming is hard. I will stick to 'process_bytes' during design, but
I'm open to suggestions.

-Why operator()?
To allow the usage of hash functions as function objects.

-Why not rename 'hash_value' to result_type()?
What do you prefer?
auto result = myHash.hash_value();
or
auto result = static_cast<hash_function::result_type>(myHash);

-Why 'sha_512' and not 'sha2_512'?
'sha_512' is the official name for the algorithm. I know it is bad, but
better to be consistent.

-Why not add an iterator based process_bytes?
For now I consider it too complex.

-Why not add/delete algorithm XXX?
I think these are the most common. But I am open to suggestions.

-Why not use the naming of 'N3980: Types Don't Know #'?
This is already discussed above.

Open topics:

-How to hash files?
Hashing a file is quite common. As the iterator interface was removed,
there is no easy way to hash a file (using istream_iterator). How to do
it now?

-Sync with 'Types Don't Know #'

-Should we only add some crc classes (like crc_32, crc_16 or crc_ccitt)
or a generic (templated) crc algorithm (like in boost::crc)?

-Add 'noexcept' where applicable

-More naming discussions

-Find a suitable header file (maybe functional or algorithm)


regards
   Markus


.


Author: David Krauss <potswa@gmail.com>
Date: Sun, 24 Aug 2014 17:41:55 +0800
Raw View


On 2014-08-24, at 5:05 PM, Markus Mayer <lotharlutz@gmx.de> wrote:

> -Should we only add some crc classes (like crc_32, crc_16 or crc_ccitt) or a generic (templated) crc algorithm (like in boost::crc)?

Please use generic templates wherever possible. Provide typedefs to the common cases. Special cases and hardware acceleration can be done in template specializations, but that should be transparent to the user.

It's better to let the library worry about a generic solution than to leave users without a paddle. There are many weird permutations of CRC in the wild. And it shouldn't be much worry to the library, if there's already such a thing as Boost CRC.
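
Something along these lines, shape-wise (the parameters follow the usual
CRC catalogue: width, polynomial, initial value, final XOR, input/output
reflection; the names are only illustrative):

#include <cstddef>
#include <cstdint>

template <std::size_t Bits, std::uint64_t Poly, std::uint64_t Init,
          std::uint64_t XorOut, bool ReflectIn, bool ReflectOut>
class crc;   // generic engine; hardware-accelerated specializations allowed

using crc_32    = crc<32, 0x04C11DB7, 0xFFFFFFFF, 0xFFFFFFFF, true,  true>;
using crc_16    = crc<16, 0x8005,     0x0000,     0x0000,     true,  true>;
using crc_ccitt = crc<16, 0x1021,     0xFFFF,     0x0000,     false, false>;
using crc_32c   = crc<32, 0x1EDC6F41, 0xFFFFFFFF, 0xFFFFFFFF, true,  true>;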



.


Author: Jens Maurer <Jens.Maurer@gmx.net>
Date: Sun, 24 Aug 2014 14:31:26 +0200
Raw View
On 08/24/2014 11:05 AM, Markus Mayer wrote:
> class hash_function
> {
> public:
>    typedef std::array<unsigned char, ALGORITHM_DEFINED> result_type;
>    //Default constructible
>    //Copyable
>    //Moveable
>
>    hash_function& process_bytes( void unsigned char *buffer, std::size_t
> byte_count);
>
>    hash_function& operator() ( void unsigned char *buffer, std::size_t
> byte_count);

Remove the "void".

Remove "process_bytes()", it's redundant.

>    void reset();
>
>    result_type hash_value() const;

I've got a slight preference towards a named "result" function,
but I can also live with the explicit conversion operator.

> -How to hash files?
> Hashing a file is quite common. As the iterator interface was removed,
> there is no easy way to hash a file (using istream_iterator). How to do
> it now?

Provide a thin wrapper template that processes input iterators
in e.g. 1024-byte chunks.

> -Should we only add some crc classes (like crc_32, crc_16 or crc_ccitt)
> or a generic (templated) crc algorithm (like in boost::crc)?

I've no objection against providing a fully general CRC capability,
but I'd like to see a typedef for at least crc32c, since Intel has
a hardware instruction for this that asks for specialization.

> -Find a suitable header file (maybe functional or algorithm)

I'd be inclined to ask for a separate header file.

Are there any legal restrictions on secure hash functions somewhere?
There are definitely some for encryption algorithms in some places.

Jens


.


Author: Ion Gaztañaga <igaztanaga@gmail.com>
Date: Sun, 24 Aug 2014 15:15:44 +0200
Raw View
On 24/08/2014 10:33, Markus Mayer wrote:
> That's a really good point. I will use 'unsigned char' and let
> implementations of odd-sized architectures decide themselves if they
> implement it.

Although it's pretty similar, you can use uint_least8_t to express that
the result will be put in array storage of a type that only uses
8 bits and is interpreted as an unsigned number.

That means that if SHA-256 is used, ALGORITHM_DEFINED will always be 32
on all platforms. Otherwise, on a platform with CHAR_BIT==32 (say, a 32
bit DSP) one could store 32 bits in an unsigned char and
ALGORITHM_DEFINED would be 8 for SHA-256. This is important for the user,
as she has to correctly interpret how the result is stored.

Best,

Ion



.


Author: Olaf van der Spek <olafvdspek@gmail.com>
Date: Sun, 24 Aug 2014 07:30:16 -0700 (PDT)
Raw View

On Sunday, August 24, 2014 8:45:01 AM UTC+2, Jens Maurer wrote:
>
> On 08/24/2014 03:25 AM, Howard Hinnant wrote:
> > On Aug 23, 2014, at 4:14 PM, Markus Mayer <lotha...@gmx.de> wrote:
> >>  hash_function& process_bytes( void const *buffer, std::size_t byte_count);
>
> I'd like to argue in favor of "unsigned char*", not "void*".
>

(const void*, size_t) is a pretty standard interface, so why use
const unsigned char*?
It'd disallow easily hashing a string, for example.
Perhaps we should have (array_view<>) instead?




.


Author: Olaf van der Spek <olafvdspek@gmail.com>
Date: Sun, 24 Aug 2014 07:36:26 -0700 (PDT)
Raw View



On Sunday, August 24, 2014 11:06:02 AM UTC+2, Markus Mayer wrote:
>
> -Why not add/delete algorithm XXX?
> I think these are the most common. But I am open to suggestions.
>
>
bcrypt or something suitable for hashing passwords?

> hash_value()

The 'hash' part seems a bit redundant; isn't there a standard name for
returning the value from such an object?



.


Author: Thiago Macieira <thiago@macieira.org>
Date: Sun, 24 Aug 2014 08:29:51 -0700
Raw View
On Sunday 24 August 2014 08:44:57 Jens Maurer wrote:
> Yes, operator() is better for an entity that does essentially just one
> thing.

This one does more than one thing.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center
      PGP/GPG: 0x6EF45358; fingerprint:
      E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358


.


Author: Myriachan <myriachan@gmail.com>
Date: Sun, 24 Aug 2014 08:31:33 -0700 (PDT)
Raw View
I think "unsigned char *" would be a confusing interface, unless the docume=
ntation explicitly stated that only the low 8 bits of each char were used a=
s input.  Remember, the MD5 and the SHA family are formally defined as a bi=
twise algorithm, not bytewise, so on an architecture in which CHAR_BIT > 8,=
 does that mean every bit gets hashed, or only the low 8?  Keep in mind tha=
t there are weird 12-bit microcontrollers and such, though they may be too =
small to have C++ and these algorithms.

The American government agency NIST actually certifies algorithms as either=
 supporting bitwise mode or only supporting bytewise.

Even if these *are* considered to be just be the low 8 bits, I suppose we a=
ssume big-endian bit order in all cases?  (MD5 is big-endian bit order, lit=
tle-endian byte order; the SHA-1 and -2 series is big-endian for both; no i=
dea about SHA-3).

By the way, I would make sure that there is no built-in way in the interfac=
e to avoid the +=3D step at the end of an MD5/SHA-1/SHA-256 series of round=
s.  Otherwise, it's trivial to turn these into block ciphers, and then it'd=
 be a legal mess.

Melissa


.


Author: Thiago Macieira <thiago@macieira.org>
Date: Sun, 24 Aug 2014 08:32:37 -0700
Raw View
On Sunday 24 August 2014 11:05:59 Markus Mayer wrote:
>    //Default contructable
>    //Copyable
>    //Moveable

Do you mean trivially for any of the above?

I don't see how the types could be trivially constructible. And if you allow
the implementations to allocate heap for the state, they can't be trivially
copyable, movable or destructible either.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center
      PGP/GPG: 0x6EF45358; fingerprint:
      E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358


.


Author: Markus Mayer <lotharlutz@gmx.de>
Date: Sun, 24 Aug 2014 18:05:47 +0200
Raw View
On 08/24/2014 02:31 PM, Jens Maurer wrote:
> On 08/24/2014 11:05 AM, Markus Mayer wrote:
>
> Are there any legal restrictions on secure hash functions somewhere?
> There are definitely some for encryption algorithms in some places.
>

I have not found evidence that hash functions are export restricted, but
this has to be checked by a lawyer eventually.


.


Author: Markus Mayer <lotharlutz@gmx.de>
Date: Sun, 24 Aug 2014 18:38:17 +0200
Raw View
On 08/24/2014 04:36 PM, Olaf van der Spek wrote:
>
>
> On Sunday, August 24, 2014 11:06:02 AM UTC+2, Markus Mayer wrote:
>
>     -Why not add/delete algorithm XXX?
>     I think these are the most common. But I am open to suggestions.
>
>
> bcrypt or something suitable for hashing passwords?

I have added bcrypt and scrypt.

>  > hash_value()
>
> The hash part seems a bit redundant, isn't there a standard name to
> return the value from such an object?

std::future uses get(). But it does not fit that well, imho.


.


Author: Markus Mayer <lotharlutz@gmx.de>
Date: Sun, 24 Aug 2014 18:41:07 +0200
Raw View
On 08/24/2014 04:30 PM, Olaf van der Spek wrote:
> On Sunday, August 24, 2014 8:45:01 AM UTC+2, Jens Maurer wrote:
>
>     On 08/24/2014 03:25 AM, Howard Hinnant wrote:
>      > On Aug 23, 2014, at 4:14 PM, Markus Mayer <lotha...@gmx.de
>     <javascript:>> wrote:
>      >>  hash_function& process_bytes( void const *buffer, std::size_t
>     byte_count);
>
>     I'd like to argue in favor of "unsigned char*", not "void*".
>
>
> (const void*, size_t) is a pretty standard type, why use const unsigned
> char*?
> It'd disallow easily hashing a string for example.
> Perhaps we should have (array_view<>) instead?
>

I reverted to (const void*, size_t). It is used by boost::crc and
boost::asio.

I think a common usage is:
myHashFunc.process_bytes(&myStruct, sizeof(myStruct));

array_view<> would require creating an additional object and writing more
code, so I'll stick with (const void*, size_t).


.


Author: Markus Mayer <lotharlutz@gmx.de>
Date: Sun, 24 Aug 2014 18:44:41 +0200
Raw View
On 08/24/2014 05:32 PM, Thiago Macieira wrote:
> On Sunday 24 August 2014 11:05:59 Markus Mayer wrote:
>>     //Default contructable
>>     //Copyable
>>     //Moveable
>
> Do you mean trivially for any of the above?
>
> I don't see how the types could be trivially constructible. And if you allow
> the implementations to allocate heap for the state, they can't be trivially
> copyable, movable or destructible either.
>
No, I don't mean trivially. Maybe they will be noexcept, which effectively
rules out dynamic allocation (allocation can throw). I'm not sure the
implementors need that much flexibility.


.


Author: Markus Mayer <lotharlutz@gmx.de>
Date: Sun, 24 Aug 2014 19:16:17 +0200
Raw View

Thanks for all your feedback. Here comes the new version...

Changes:
-change signature of process_bytes and operator() (I think I will keep
both of them)
-add more algorithms
-add draft of a function to process a range given by two iterators
blockwise (process_blockwise)

Design:

class hash_function
{
public:
   typedef std::array<unsigned char, ALGORITHM_DEFINED> result_type;
   //Default constructible
   //Copyable
   //Moveable

   hash_function& process_bytes( const void *buffer, std::size_t byte_count);

   hash_function& operator() ( const void *buffer, std::size_t byte_count);

   void reset();

   result_type hash_value() const;
};

template<class InputIt, class Func>
void process_blockwise(InputIt first, InputIt last, std::size_t block_size,
                       Func& processing_function);

//Maybe add process_blockwise_n

The implemented algorithms will be (the class name is given in the list):
-md5
-sha_1
-sha_224
-sha_256
-sha_384
-sha_512
-sha3_224
-sha3_256
-sha3_384
-sha3_512
-crc as a generic implementation with typedefs for crc_32, crc_16,
crc_32c, crc_xmodem and crc_ccitt
-bcrypt
-scrypt
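
For illustration, a minimal usage sketch of the interface above (based on
the draft as it stands; names and details are not final):

   std::string msg = "The quick brown fox";

   sha_256 hasher;                                  // default constructed
   hasher.process_bytes(msg.data(), msg.size());    // feed some bytes
   sha_256::result_type digest = hasher.hash_value();

   hasher.reset();                                  // reuse the object
   hasher(msg.data(), msg.size());                  // operator() alternative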



Rationale:
-Why 'result_type'?
To be consistent with std::function. And the name fits pretty well anyway.

-Why 'unsigned char' and not 'uint8_t' for result_type?
Most hash algorithms work on an 8-bit basis, but they can often be
implemented on odd-sized architectures as well. As such architectures
often use freestanding implementations, they are also free not to
implement it.

-Why 'unsigned char' and not 'char' for result_type?
It is to prevent people from thinking that the result is text (a string).
Furthermore, a raw byte is always interpreted as non-negative; or do you
interpret 0xFF as -1 when it is given in a raw byte stream?

-Why 'process_bytes' and not 'write', 'update', ...?
Well, naming is hard. I will stick to 'process_bytes' during design, but
I'm open to suggestions.

-Why operator()?
To allow the usage of hash functions as function objects.

-Why not rename 'hash_value' to 'operator result_type()'?
What do you prefer?
auto result = myHash.hash_value();
or
auto result = static_cast<hash_function::result_type>(myHash);

-Why 'sha_512' and not 'sha2_512'?
'sha_512' is the official name for the algorithm. I know it is bad, but
better to be consistent.

-Why not add an iterator based process_bytes?
For now I consider it too complex.

-Why not add/delete algorithm XXX?
I think these are the most common. But I am open to suggestions.

-Why not use the naming of 'N3980: Types Don't Know #'?
It is already discussed above.

Open topics:

-How to hash a file?
Hashing a file is quite common. As the iterator interface was removed,
there is no easy way to hash a file (using istream_iterator). How to do
it now? (A possible buffered approach is sketched after this list.)

-Sync with 'Types Don't Know #'

-Add 'noexcept' where applicable

-More naming discussions

-Find a suitable header file (maybe functional or algorithm)
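
A sketch of the buffered approach mentioned above under 'How to hash a
file?' (illustrative only; it assumes the interface drafted above, and the
file name is just an example):

   #include <fstream>

   sha_256 hasher;
   std::ifstream in("input.bin", std::ios::binary);
   char buf[4096];
   while (in.read(buf, sizeof(buf)) || in.gcount() > 0)
       hasher.process_bytes(buf, static_cast<std::size_t>(in.gcount()));
   sha_256::result_type digest = hasher.hash_value();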


regards
   Markus


.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Sun, 24 Aug 2014 13:39:53 -0400
Raw View
I wrote:

>> class sha256
>> {
>>    SHA256_CTX state_;
>> public:
>>    static constexpr std::endian endian = std::endian::big;


On Aug 24, 2014, at 2:44 AM, Jens Maurer <Jens.Maurer@gmx.net> wrote:

> I agree there is a choice for std::uhash<> when
> converting a, say, "int" to a sequence of "unsigned char"s whether to
> perform endian conversion or not, and that influences the cross-platform
> reproducibility of the hash when e.g. hashing a sequence of "int"
> values.

This is the only role of the endian specifier, which can be any one of:

   static constexpr std::endian endian = std::endian::big;
   static constexpr std::endian endian = std::endian::little;
   static constexpr std::endian endian = std::endian::native;

std::endian::native will be equal to one of std::endian::little or
std::endian::big.  These enums are nothing more than the C++ version of the
already popular macros:  __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__,
__ORDER_BIG_ENDIAN__.

And endian is really not even used directly by the hash functor uhash<>:

template <class Hasher = acme::siphash>
struct uhash
{
    using result_type = typename Hasher::result_type;

    template <class T>
    result_type
    operator()(T const& t) const noexcept
    {
        Hasher h;
        hash_append(h, t);
        return static_cast<result_type>(h);
    }
};

Instead there are now two traits:

template <class T> struct is_uniquely_represented;
template <class T, class HashAlgorithm> struct is_contiguously_hashable;

The first, is_uniquely_represented, is exactly what N3980 called
is_contiguously_hashable.  And is_contiguously_hashable has evolved into:

template <class T, class HashAlgorithm>
struct is_contiguously_hashable
    : public std::integral_constant<bool, is_uniquely_represented<T>{} &&
                                      (sizeof(T) == 1 ||
                                       HashAlgorithm::endian == endian::native)>
{};

That is, whether or not a type T is contiguously hashable depends not only
on if the type is_uniquely_represented, but also on whether the
HashAlgorithm needs to reverse the bytes of T before consuming it.  If the
HashAlgorithm's *requested endian* (HashAlgorithm::endian) is the same as
the platform's *native endian* (endian::native), then the bytes do not need
to be reversed prior to feeding T to the HashAlgorithm.  And thus if T is
also uniquely represented, then one can feed T (or an array of T's)
directly to the HashAlgorithm.  This is the job of this function:

template <class Hasher, class T>
inline
std::enable_if_t
<
    is_contiguously_hashable<T, Hasher>{}
>
hash_append(Hasher& h, T const& t) noexcept
{
    h(std::addressof(t), sizeof(t));
}

If we are dealing with a platform/HashAlgorithm disagreement in endian,
then an alternative hash_append can be used for scalars:

template <class Hasher, class T>
inline
std::enable_if_t
<
    !is_contiguously_hashable<T, Hasher>{} &&
    (std::is_integral<T>{} || std::is_pointer<T>{} || std::is_enum<T>{})
>
hash_append(Hasher& h, T t) noexcept
{
    detail::reverse_bytes(t);
    h(std::addressof(t), sizeof(t));
}

Although it doesn't look like it in the source code, reverse_bytes is
carefully crafted to compile down to x86 instructions such as bswapl, at
least on clang/OSX, and so is maximally efficient.

So in summary, when uhash<HashAlgorithm> says:

        hash_append(h, t);

A correct hash_append will be chosen (at compile time), which for scalar
types that have endian issues, may or may not reverse the bytes of t
depending on what the hash algorithm has requested, and what the platform's
native endian is.

Most hash functions won't care about the endian of the scalars fed to them,
and they can indicate this by requesting the native endian, whatever that
is:

   static constexpr std::endian endian = std::endian::native;

Given an underlying C implementation of a hash algorithm (such as SHA-256),
it would be quite easy to write adaptors around that C code with varying
endian requirements, so that different parts of your code could use SHA-256
but reverse, or not reverse scalars as required:

class sha256
{
    SHA256_CTX state_;
public:
    static constexpr xstd::endian endian = xstd::endian::native;
    // ...


class sha256_little
{
    SHA256_CTX state_;
public:
    static constexpr xstd::endian endian = xstd::endian::little;
    // ...

// ...

uhash<sha256> h1;  // don't worry about endian
uhash<sha256_little> h2;  // ensure scalars are little endian prior to hashing

Finally note that the implementation of hash_append is made simpler by the
use of the (const void*, size_t) interface, as opposed to a (const unsigned
char*, size_t) interface.  With the latter, one would have to code:

template <class Hasher, class T>
inline
std::enable_if_t
<
    is_contiguously_hashable<T, Hasher>{}
>
hash_append(Hasher& h, T const& t) noexcept
{
    h(reinterpret_cast<const unsigned char*>(std::addressof(t)), sizeof(t));
}

See https://github.com/HowardHinnant/hash_append for complete code.

Howard


.


Author: Myriachan <myriachan@gmail.com>
Date: Sun, 24 Aug 2014 14:01:37 -0700 (PDT)
Raw View
On Sunday, August 24, 2014 10:39:58 AM UTC-7, Howard Hinnant wrote:
> > I agree there is a choice for std::uhash<> when
> > converting a, say, "int" to a sequence of "unsigned char"s whether to
> > perform endian conversion or not, and that influences the cross-platform
> > reproducibility of the hash when e.g. hashing a sequence of "int"
> > values.

Sadly, C++ still supports non-two's-complement architectures, so what
happens when you hash a negative int?

> This is the only role of the endian specifier, which can be any one of:
>
>    static constexpr std::endian endian = std::endian::big;
>    static constexpr std::endian endian = std::endian::little;
>    static constexpr std::endian endian = std::endian::native;
>
> std::endian::native will be equal to one of std::endian::little or
> std::endian::big.  These enums are nothing more than the C++ version of
> the already popular macros:  __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__,
> __ORDER_BIG_ENDIAN__.

What about PDP-endian?

> Although it doesn't look like it in the source code, reverse_bytes is
> carefully crafted to compile down to x86 instructions such as bswapl, at
> least on clang/OSX, and so is maximally efficient.

On x86:

For 16-bit swaps, use "rol reg, 8".  "xchg al, ah" and equivalent work, but
may be slower.
For 32-bit swaps, use "bswap reg".
For 64-bit swaps on x86-32, use "bswap reg", but use the opposite registers
for results.
For 64-bit swaps on x86-64, use "bswap reg".

On GCC and Clang, the intrinsics for the above are __builtin_bswap16,
__builtin_bswap32 and __builtin_bswap64.
On Visual C++, the intrinsics for the above are _byteswap_ushort,
_byteswap_ulong and _byteswap_uint64 respectively.
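
A tiny, hypothetical wrapper over those intrinsics (illustrative only, not
part of any proposal here) might look like:

#include <cstdint>
#if defined(_MSC_VER)
#include <stdlib.h>   // _byteswap_ulong
#endif

inline std::uint32_t bswap32(std::uint32_t v) noexcept
{
#if defined(_MSC_VER)
    return _byteswap_ulong(v);
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(v);
#else
    // Portable fallback; good compilers recognize this as a byte swap.
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
#endif
}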

> A correct hash_append will be chosen (at compile time), which for scalar
> types that have endian issues, may or may not reverse the bytes of t
> depending on what the hash algorithm has requested, and what the
> platform's native endian is.

Does this mechanism support handling multiple scalars at once?  On some
systems, byte swapping may be more efficient if done as vector math.

> See https://github.com/HowardHinnant/hash_append for complete code.

Its SHA-256 code is unsuitable for any platform for which "int" is larger
than 32 bits.  It will have undefined behavior in those cases due to signed
integer overflow or shifting left a negative number.

Melissa


.


Author: Jens Maurer <Jens.Maurer@gmx.net>
Date: Sun, 24 Aug 2014 23:29:41 +0200
Raw View
On 08/24/2014 04:30 PM, Olaf van der Spek wrote:
> On Sunday, August 24, 2014 8:45:01 AM UTC+2, Jens Maurer wrote:
>
>     On 08/24/2014 03:25 AM, Howard Hinnant wrote:
>     > On Aug 23, 2014, at 4:14 PM, Markus Mayer <lotha...@gmx.de <javascript:>> wrote:
>     >>  hash_function& process_bytes( void const *buffer, std::size_t byte_count);
>
>     I'd like to argue in favor of "unsigned char*", not "void*".
>
>
> (const void*, size_t) is a pretty standard type, why use const unsigned char*?
> It'd disallow easily hashing a string for example.

And it disallows (directly) hashing

  struct S {
     char c;
     int x;
  };

  S s[10] = { };
  my_hash(s, sizeof(s));    // bad: allowed with "void *" proposal

which is a good thing, because all the padding between "c" and "x"
has unspecified value, so your hash is very unpredictable.

Also, simply hashing
   int x[] = { 0, 1, 2, 3 };
   my_hash(x, sizeof(x));

doesn't give you a consistent hash value across different machines,
because the hash operates on bytes, but the endianness of the "int"
(i.e. the mapping to bytes) differs.

> Perhaps we should have (array_view<>) instead?

Well, array_view<unsigned char> is probably equivalent to the
"unsigned char" proposal.

(I'm also weakly in favor of an iterator-style
  const unsigned char * first, const unsigned char * last
interface, because it seems to save one register in some
hash functions compared to the "pointer, length" style,
but it's certainly possible to convert.)

Jens


.


Author: Jens Maurer <Jens.Maurer@gmx.net>
Date: Sun, 24 Aug 2014 23:35:48 +0200
Raw View
On 08/24/2014 07:39 PM, Howard Hinnant wrote:
> I wrote:
>
>>> class sha256
>>> {
>>>    SHA256_CTX state_;
>>> public:
>>>    static constexpr std::endian endian = std::endian::big;
>
>
> On Aug 24, 2014, at 2:44 AM, Jens Maurer <Jens.Maurer@gmx.net> wrote:
>
>> I agree there is a choice for std::uhash<> when
>> converting a, say, "int" to a sequence of "unsigned char"s whether to
>> perform endian conversion or not, and that influences the cross-platform
>> reproducibility of the hash when e.g. hashing a sequence of "int"
>> values.
>
> This is the only role of the endian specifier, which can be any one of:
>
>    static constexpr std::endian endian = std::endian::big;
>    static constexpr std::endian endian = std::endian::little;
>    static constexpr std::endian endian = std::endian::native;
>
> std::endian::native will be equal to one of std::endian::little or std::endian::big.  These enums are nothing more than the C++ version of the already popular macros:  __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__, __ORDER_BIG_ENDIAN__.

We need a better abstraction for this.  C++ makes no assumption on
endianness, and there is certainly more than big and little
(VAX-endian springs to mind).

> And endian is really not even used directly by the hash functor uhash<>:
>
> template <class Hasher = acme::siphash>
> struct uhash
> {
>     using result_type = typename Hasher::result_type;
>
>     template <class T>
>     result_type
>     operator()(T const& t) const noexcept
>     {
>         Hasher h;
>         hash_append(h, t);
>         return static_cast<result_type>(h);
>     }
> };

Uh, is there some incremental functionality (without finalizing)
as well, and a variadic template overload?

Jens


.


Author: Jens Maurer <Jens.Maurer@gmx.net>
Date: Sun, 24 Aug 2014 23:38:02 +0200
Raw View
On 08/24/2014 06:41 PM, Markus Mayer wrote:
> I reverted to (const void*, size_t). It is used by boost::crc and
> boost::asio.

That doesn't mean it's the best choice.

> I think a common usage is:
> myHashFunc.process_bytes(&myStruct, sizeof(myStruct));

which is totally bogus, because it hashes all the padding inside
the struct, which likely has unspecified values.

Jens


.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Sun, 24 Aug 2014 19:21:17 -0400
Raw View
On Aug 24, 2014, at 5:01 PM, Myriachan <myriachan@gmail.com> wrote:

> On Sunday, August 24, 2014 10:39:58 AM UTC-7, Howard Hinnant wrote:
>>> I agree there is a choice for std::uhash<> when
>>> converting a, say, "int" to a sequence of "unsigned char"s whether to
>>> perform endian conversion or not, and that influences the cross-platform
>>> reproducibility of the hash when e.g. hashing a sequence of "int"
>>> values.
>
> Sadly, C++ still supports non-two's-complement architectures, so what
> happens when you hash a negative int?

I presume the int's bytes would be hashed just like any other scalar.

>
>> This is the only role of the endian specifier, which can be any one of:
>>
>>   static constexpr std::endian endian = std::endian::big;
>>   static constexpr std::endian endian = std::endian::little;
>>   static constexpr std::endian endian = std::endian::native;
>>
>> std::endian::native will be equal to one of std::endian::little or
>> std::endian::big.  These enums are nothing more than the C++ version of
>> the already popular macros:  __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__,
>> __ORDER_BIG_ENDIAN__.
>
> What about PDP-endian?

I stand corrected.  On PDP-endian std::endian::native has a value that is
not equal to either of std::endian::little or std::endian::big.  If the
HashAlgorithm specifies little or big endian, the implementation is
required to map the bytes of the int (or whatever) to little or big endian
prior to feeding it to the HashAlgorithm.  For std::endian::big, this is
the exact same thing as inserting a call to hton() prior to the HashAlgorithm.

> By the late 1990s, not only DEC but most of the New England computer
> industry which had been built around minicomputers similar to the PDP-11
> collapsed in the face of microcomputer-based workstations and servers.

>> A correct hash_append will be chosen (at compile time), which for scalar
>> types that have endian issues, may or may not reverse the bytes of t
>> depending on what the hash algorithm has requested, and what the
>> platform's native endian is.
>
> Does this mechanism support handling multiple scalars at once?  On some
> systems, byte swapping may be more efficient if does as vector math.

An implementation could conceivably create a hash_append overload on
arrays of int and optimize that.

>
>> See https://github.com/HowardHinnant/hash_append for complete code.
>
> Its SHA-256 code is unsuitable for any platform for which "int" is larger
> than 32 bits.  It will have undefined behavior in those cases due to
> signed integer overflow or shifting left a negative number.

Thanks, I'll put in a static_assert.

Howard


.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Sun, 24 Aug 2014 19:21:42 -0400
Raw View
On Aug 24, 2014, at 5:35 PM, Jens Maurer <Jens.Maurer@gmx.net> wrote:

> On 08/24/2014 07:39 PM, Howard Hinnant wrote:
>> I wrote:
>>
>>>> class sha256
>>>> {
>>>>   SHA256_CTX state_;
>>>> public:
>>>>   static constexpr std::endian endian = std::endian::big;
>>
>>
>> On Aug 24, 2014, at 2:44 AM, Jens Maurer <Jens.Maurer@gmx.net> wrote:
>>
>>> I agree there is a choice for std::uhash<> when
>>> converting a, say, "int" to a sequence of "unsigned char"s whether to
>>> perform endian conversion or not, and that influences the cross-platform
>>> reproducibility of the hash when e.g. hashing a sequence of "int"
>>> values.
>>
>> This is the only role of the endian specifier, which can be any one of:
>>
>>   static constexpr std::endian endian = std::endian::big;
>>   static constexpr std::endian endian = std::endian::little;
>>   static constexpr std::endian endian = std::endian::native;
>>
>> std::endian::native will be equal to one of std::endian::little or
>> std::endian::big.  These enums are nothing more than the C++ version of
>> the already popular macros:  __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__,
>> __ORDER_BIG_ENDIAN__.
>
> We need a better abstraction for this.  C++ makes no assumption on
> endianess, and there is certainly more than big and little
> (VAX-endian springs to mind).

I have a hard time getting excited about hardware that you can't find
outside of a museum.  I do have fond memories of learning the VAX operating
system, but that was a very long time ago.  However on such a machine
std::endian::native has a value that is not equal to either of
std::endian::little or std::endian::big.  If the HashAlgorithm specifies
little or big endian, the implementation is required to map the bytes of
the int (or whatever) to little or big endian prior to feeding it to the
HashAlgorithm.  For std::endian::big, this is the exact same thing as
inserting a call to hton() prior to the HashAlgorithm.

>
>> And endian is really not even used directly by the hash functor uhash<>:
>>
>> template <class Hasher = acme::siphash>
>> struct uhash
>> {
>>    using result_type = typename Hasher::result_type;
>>
>>    template <class T>
>>    result_type
>>    operator()(T const& t) const noexcept
>>    {
>>        Hasher h;
>>        hash_append(h, t);
>>        return static_cast<result_type>(h);
>>    }
>> };
>
> Uh, is there some incremental functionality (without finalizing)
> as well, and a variadic template overload?

Yes to both, for example:

struct S
{
    char c;
    int x;
};

template <class HashAlgorithm>
void
hash_append(HashAlgorithm& h, const S& s)
{
    using std::hash_append;
    hash_append(h, s.c, s.x);
}

One can not hash S without writing this hash_append.  And once written, S
can be hashed with any hash algorithm conforming to HashAlgorithm.  I've
demonstrated FNV-1A, Jenkins1, Spooky, Murmur2, SipHash, and SHA-256, none
of which require any changes to hash_append(HashAlgorithm& h, const S& s).
To choose an algorithm, one simply specifies it as the template parameter
to uhash at the point of use:

    std::unordered_set<S, uhash<fnv1a>> s;

Within hash_append, the HashAlgorithm is being updated incrementally, first
with the char c, then with the int x (not both at once, because of the
padding issues you have referred to).  That is, one could have written
hash_append for S as:

template <class HashAlgorithm>
void
hash_append(HashAlgorithm& h, const S& s)
{
    using std::hash_append;
    hash_append(h, s.c);
    hash_append(h, s.x);
}

These are exercising the incremental hash functionality.
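
For concreteness, a bare-bones HashAlgorithm that could be plugged in as
the uhash template parameter might look roughly like this (an illustrative
FNV-1a sketch, not the code from the repository; it uses the proposed
endian member discussed earlier):

#include <cstddef>
#include <cstdint>

class fnv1a
{
    std::uint64_t state_ = 14695981039346656037u;   // FNV-1a offset basis
public:
    using result_type = std::uint64_t;
    static constexpr std::endian endian = std::endian::native;

    void
    operator()(const void* key, std::size_t len) noexcept
    {
        const unsigned char* p = static_cast<const unsigned char*>(key);
        for (std::size_t i = 0; i < len; ++i)
        {
            state_ ^= p[i];
            state_ *= 1099511628211u;               // FNV-1a prime
        }
    }

    explicit
    operator result_type() noexcept
    {
        return state_;
    }
};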

This is composable.  For example consider another type Y that has S as a
member:

struct Y
{
    S s;
    std::string name;
};

hash_append for this could be written:

template <class HashAlgorithm>
void
hash_append(HashAlgorithm& h, const Y& y)
{
    using std::hash_append;
    hash_append(h, y.s, y.name);
}

And Y can then be hashed:

    std::unordered_set<Y, uhash<fnv1a>> y;

or:

    std::cout << uhash<siphash>{}(y) << '\n';

Y doesn't have to know how to hash S, nor std::string.  And Y doesn't have
to use anything like hash_combine on the results of hashing S or
std::string.  This is because *all* of the hashing is happening
incrementally, with only the hash functor (uhash in this case) doing the
hash algorithm initialization and finalization.  And uhash is only used at
the top level of a data structure when you actually need a hash code.

Individual types don't really know how to create a hash code.  They only
know how to present themselves to a generic hash algorithm so as to
incrementally append their state to the state of the hash algorithm.  Types
Don't Know #.

Hash functors initialize a generic hash algorithm, ask a type to append its
state to the hash algorithm (which recursively asks its bases and members
to append their state to the generic hash algorithm), and finalize the
generic hash algorithm.

Top-level clients, such as unordered containers, combine a hash functor
template with a specific hash algorithm, to create a hash functor that will
hash any type for which hash_append has been implemented, using that
specific hash algorithm as if all of the discontinuous bits of memory had
been rearranged into one contiguous chunk of memory.

Howard


.


Author: Myriachan <myriachan@gmail.com>
Date: Mon, 25 Aug 2014 08:16:40 -0700 (PDT)
Raw View
On Sunday, August 24, 2014 4:22:35 PM UTC-7, Howard Hinnant wrote:
> I have a hard time getting excited about hardware that you can't find
> outside of museum.  I do have fond memories of learning the VAX
> operating system, but that was a very long time ago.  However on
> such a machine std::endian::native has a value that is not equal to
> either of std::endian::little or std::endian::big.  If the HashAlgorithm
> specifies little or big endian, the implementation is required to map the
> bytes of the int (or whatever) to little or big endian prior to feeding
> it to the HashAlgorithm.  For std::endian::big, this is the exact same
> thing as inserting a call to hton() prior to the HashAlgorithm.

Don't you mean std::endian::little, unless the hash algorithm is, say, MD5,
or the little-endian version of one of the SHA series?  (The latter of
which would not make sense in the Standard since they're non-standard, and
the former is deprecated for security reasons.)

Also, I would love to say ¡adiós! to such architectures as well, because it
means signed integer overflow being undefined would have a chance in Hell
of being removed instead of the current no.  Only overzealous compiler
optimizers would be in the way, rather than architectures that'd be left
behind. >.<

Does your design easily templatize into constructions like HMAC?

Melissa


.


Author: Olaf van der Spek <olafvdspek@gmail.com>
Date: Mon, 25 Aug 2014 17:21:23 +0200
Raw View
On Sun, Aug 24, 2014 at 11:29 PM, Jens Maurer <Jens.Maurer@gmx.net> wrote:
>> (const void*, size_t) is a pretty standard type, why use const unsigned char*?
>> It'd disallow easily hashing a string for example.
>
> And it disallows (directly) hashing ...
> which is a good thing, because all the padding between "c" and "x"
> has unspecified value, so your hash is very unpredictable.

True

> Also, simply hashing
>    int x[] = { 0, 1, 2, 3 };
>    my_hash(x, sizeof(x));
>
> doesn't give you a consistent hash value across different machines,
> because the hash operates on bytes, but the endianess of the "int"
> (i.e. the mapping to bytes) differs.
>
>> Perhaps we should have (array_view<>) instead?
>
> Well, array_view<unsigned char> is probably equivalent to the
> "unsigned char" proposal.

Kinda. Shouldn't both array_view<> and string_view<> be supported?

--
Olaf


.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Mon, 25 Aug 2014 11:45:57 -0400
Raw View
On Aug 25, 2014, at 11:16 AM, Myriachan <myriachan@gmail.com> wrote:

> On Sunday, August 24, 2014 4:22:35 PM UTC-7, Howard Hinnant wrote:
>> I have a hard time getting excited about hardware that you can't find
>> outside of museum.  I do have fond memories of learning the VAX
>> operating system, but that was a very long time ago.  However on
>> such a machine std::endian::native has a value that is not equal to
>> either of std::endian::little or std::endian::big.  If the HashAlgorithm
>> specifies little or big endian, the implementation is required to map the
>> bytes of the int (or whatever) to little or big endian prior to feeding
>> it to the HashAlgorithm.  For std::endian::big, this is the exact same
>> thing as inserting a call to hton() prior to the HashAlgorithm.
>
> Don't you mean std::endian::little, unless the hash algorithm is, say,
> MD5, or the little-endian version of one of the SHA series?  (The latter
> of which would not make sense in the Standard since they're non-standard,
> and the former is deprecated for security reasons.)

The intended semantics is:

struct MyHashAlgorithm
{
    static constexpr std::endian endian = std::endian::big;
    // ...
};

means:  Attention hash_append function: Prior to feeding MyHashAlgorithm
bytes from scalar types, map them (from native) into big endian.  So if two
platforms had scalars with identical layout except for endian, the two
platforms could generate identical hash codes for identical scalar input.

>
> Also, I would love to say ¡adiós! to such architectures as well, because
> it means signed integer overflow being undefined would have a chance in
> Hell of being removed instead of the current no.  Only overzealous
> compiler optimizers would be in the way, rather than architectures that'd
> be left behind. >.<
>
> Does your design easily templatize into constructions like HMAC?

I am unsure.

One can easily build hash functors that can be seeded, for example:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#seeding

I.e. you choose a seed (key?) and use that to initialize the HashAlgorithm.
And then the hash functor runs the update and finalization stages in the
same way as previously shown.

One could also prepend a message, by updating the HashAlgorithm with a key
prior to updating it with the message.  Or one could append a message by
updating the HashAlgorithm just prior to running the finalization stage.

But I am unsure if the abilities of seeding, prepending and appending are
sufficient to generate a HMAC.

Howard


.


Author: "'Jeffrey Yasskin' via ISO C++ Standard - Future Proposals" <std-proposals@isocpp.org>
Date: Mon, 25 Aug 2014 11:01:42 -0700
Raw View
On Mon, Aug 25, 2014 at 8:45 AM, Howard Hinnant
<howard.hinnant@gmail.com> wrote:
> On Aug 25, 2014, at 11:16 AM, Myriachan <myriachan@gmail.com> wrote:
>
>> On Sunday, August 24, 2014 4:22:35 PM UTC-7, Howard Hinnant wrote:
>>> I have a hard time getting excited about hardware that you can't find
>>> outside of museum.  I do have fond memories of learning the VAX
>>> operating system, but that was a very long time ago.  However on
>>> such a machine std::endian::native has a value that is not equal to
>>> either of std::endian::little or std::endian::big.  If the HashAlgorithm
>>> specifies little or big endian, the implementation is required to map
>>> the bytes of the int (or whatever) to little or big endian prior to
>>> feeding it to the HashAlgorithm.  For std::endian::big, this is the
>>> exact same thing as inserting a call to hton() prior to the HashAlgorithm.
>>
>> Don't you mean std::endian::little, unless the hash algorithm is, say,
>> MD5, or the little-endian version of one of the SHA series?  (The latter
>> of which would not make sense in the Standard since they're non-standard,
>> and the former is deprecated for security reasons.)
>
> The intended semantics is:
>
> struct MyHashAlgorithm
> {
>     static constexpr std::endian endian = std::endian::big;
>     // ...
> };
>
> means:  Attention hash_append function: Prior to feeding MyHashAlgorithm
> bytes from scalar types, map them (from native) into big endian.  So if
> two platforms had scalars with identical layout except for endian, the
> two platforms could generate identical hash codes for identical scalar
> input.
>
>>
>> Also, I would love to say ¡adiós! to such architectures as well, because
>> it means signed integer overflow being undefined would have a chance in
>> Hell of being removed instead of the current no.  Only overzealous
>> compiler optimizers would be in the way, rather than architectures
>> that'd be left behind. >.<
>>
>> Does your design easily templatize into constructions like HMAC?
>
> I am unsure.
>
> One can easily build hash functors that can be seeded, for example:
>
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#seeding
>
> I.e. you choose a seed (key?) and use that to initialize the
> HashAlgorithm.  And then the hash functor runs the update and
> finalization stages in the same way as previously shown.
>
> One could also prepend a message, by updating the HashAlgorithm with a
> key prior to updating it with the message.  Or one could append a message
> by updating the HashAlgorithm just prior to running the finalization
> stage.
>
> But I am unsure if the abilities of seeding, prepending and appending are
> sufficient to generate a HMAC.

I think you need a "block size" accessor on cryptographic hashers in
order to write the HMAC<> template, since HMAC involves padding the
key to the block size. You could write an HMAC_SHA256 without that by
hard-coding the block size.
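
To make that concrete, a rough HMAC sketch in terms of the hasher concept
discussed in this thread (block_size, the (const void*, size_t) call
operator and an array-like result_type with data()/size() are assumptions
about the eventual interface, not settled wording):

#include <cstddef>
#include <cstring>

// HMAC(K, m) = H((K' ^ opad) || H((K' ^ ipad) || m)), where K' is the key
// zero-padded (or pre-hashed, if too long) to the hash's block size.
template <class Hasher>
typename Hasher::result_type
hmac(const unsigned char* key, std::size_t key_len,
     const void* msg, std::size_t msg_len)
{
    constexpr std::size_t bs = Hasher::block_size;  // assumed static member
    unsigned char k[bs] = {};                       // zero-padded key block
    if (key_len > bs)
    {
        Hasher kh;
        kh(key, key_len);
        auto d = static_cast<typename Hasher::result_type>(kh);
        std::memcpy(k, d.data(), d.size() < bs ? d.size() : bs);
    }
    else
        std::memcpy(k, key, key_len);

    unsigned char ipad[bs], opad[bs];
    for (std::size_t i = 0; i < bs; ++i)
    {
        ipad[i] = static_cast<unsigned char>(k[i] ^ 0x36);
        opad[i] = static_cast<unsigned char>(k[i] ^ 0x5c);
    }

    Hasher inner;
    inner(ipad, bs);
    inner(msg, msg_len);
    auto inner_digest = static_cast<typename Hasher::result_type>(inner);

    Hasher outer;
    outer(opad, bs);
    outer(inner_digest.data(), inner_digest.size());
    return static_cast<typename Hasher::result_type>(outer);
}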


.


Author: Jens Maurer <Jens.Maurer@gmx.net>
Date: Mon, 25 Aug 2014 22:40:05 +0200
Raw View
On 08/25/2014 05:21 PM, Olaf van der Spek wrote:
> On Sun, Aug 24, 2014 at 11:29 PM, Jens Maurer <Jens.Maurer@gmx.net> wrote:
>> Also, simply hashing
>>    int x[] = { 0, 1, 2, 3 };
>>    my_hash(x, sizeof(x));
>>
>> doesn't give you a consistent hash value across different machines,
>> because the hash operates on bytes, but the endianess of the "int"
>> (i.e. the mapping to bytes) differs.
>>
>>> Perhaps we should have (array_view<>) instead?
>>
>> Well, array_view<unsigned char> is probably equivalent to the
>> "unsigned char" proposal.
>
> Kinda. Shouldn't both array_view<> and string_view<> be supported?

The mapping of application-level data (structs, strings etc.)
should be left to an adaptation layer, independent of the actual
hash algorithm.  Howard's proposal has this part right.

Jens



.


Author: Myriachan <myriachan@gmail.com>
Date: Mon, 25 Aug 2014 14:53:06 -0700 (PDT)
Raw View

On Monday, August 25, 2014 11:02:04 AM UTC-7, Jeffrey Yasskin wrote:
>
> On Mon, Aug 25, 2014 at 8:45 AM, Howard Hinnant
> > I.e. you choose a seed (key?) and use that to initialize the
> HashAlgorithm.  And then the hash functor runs the update and finalization
> stages in the same way as previously shown.
> >
> > One could also prepend a message, by updating the HashAlgorithm with a
> key prior to updating it with the message.  Or one could append a message
> by updating the HashAlgorithm just prior to running the finalization stage.
> >
> > But I am unsure if the abilities of seeding, prepending and appending
> are sufficient to generate a HMAC.
>
> I think you need a "block size" accessor on cryptographic hashers in
> order to write the HMAC<> template, since HMAC involves padding the
> key to the block size. You could write an HMAC_SHA256 without that by
> hard-coding the block size.
>

Yes; it seems like block_size would be a useful property to expose.  Maybe
a has_block_size as well, so that it's not necessary to do the whole member
detection mess just to check.  Or, perhaps, the block_size static constexpr
member could be mandatory, with the rule that it be 1 if no other value
makes sense for a given hash function.

So is this system going to be defined as only using the low 8 bits of each
unsigned char, and only for bytewise hash functions (or bitwise hash
functions defined in a certain bit order as multiples of 8)?

Melissa


.


Author: "'Geoffrey Romer' via ISO C++ Standard - Future Proposals" <std-proposals@isocpp.org>
Date: Tue, 26 Aug 2014 11:26:15 -0700
Raw View

On Sun, Aug 24, 2014 at 4:21 PM, Howard Hinnant <howard.hinnant@gmail.com>
wrote:

>
> On Aug 24, 2014, at 5:01 PM, Myriachan <myriachan@gmail.com> wrote:
>
> > On Sunday, August 24, 2014 10:39:58 AM UTC-7, Howard Hinnant wrote:
> >>> I agree there is a choice for std::uhash<> when
> >>> converting a, say, "int" to a sequence of "unsigned char"s whether to
> >>> perform endian conversion or not, and that influences the
> cross-platform
> >>> reproducibility of the hash when e.g. hashing a sequence of "int"
> >>> values.
> >
> > Sadly, C++ still supports non-two's-complement architectures, so what
> happens when you hash a negative int?
>
> I presume the int's bytes would be hashed just like any other scalar.
>

OK, so the output of this hasher is implementation-defined, and not
suitable for use cases that require cross-binary reproducibility. I think
that's the only reasonable choice, but then why bother with the whole
endian rigamarole?


>
> >
> >> This is the only role of the endian specifier, which can be any one of:
> >>
> >>   static constexpr std::endian endian = std::endian::big;
> >>   static constexpr std::endian endian = std::endian::little;
> >>   static constexpr std::endian endian = std::endian::native;
> >>
> >> std::endian::native will be equal to one of std::endian::little or
> std::endian::big.  These enums are nothing more than the C++ version of the
> already popular macros:  __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__,
> __ORDER_BIG_ENDIAN__.
> >
> > What about PDP-endian?
>
> I stand corrected.  On PDP-endian std::endian::native has a value that is
> not equal to either of std::endian::little or std::endian::big.  If the
> HashAlgorithm specifies little or big endian, the implementation is
> required to map the bytes of the int (or whatever) to little or big endian
> prior to feeding it to the HashAlgorithm.  For std::endian::big, this is
> the exact same thing as inserting a call to hton() prior to the
> HashAlgorithm.
>
> > By the late 1990s, not only DEC but most of the New England computer
> industry which had been built around minicomputers similar to the PDP-11
> collapsed in the face of microcomputer-based workstations and servers.
>
> >> A correct hash_append will be chosen (at compile time), which for scalar
> >> types that have endian issues, may or may not reverse the bytes of t
> >> depending on what the hash algorithm has requested, and what the
> >> platform's native endian is.
> >
> > Does this mechanism support handling multiple scalars at once?  On some
> systems, byte swapping may be more efficient if does as vector math.
>
> An implementation could conceivably create a hash_append overload on
> arrays of int and optimize that.
>
> >
> >> See https://github.com/HowardHinnant/hash_append for complete code.
> >
> > Its SHA-256 code is unsuitable for any platform for which "int" is
> larger than 32 bits.  It will have undefined behavior in those cases due to
> signed integer overflow or shifting left a negative number.
>
> Thanks, I'll put in a static_assert.
>
> Howard
>

std-proposals/">http://groups.google.com/a/isocpp.org/group/std-proposals/<=
/a>.<br />

--001a11c2cc185efdda05018c70ec--

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Tue, 26 Aug 2014 16:02:38 -0400
Raw View
On Aug 26, 2014, at 2:26 PM, 'Geoffrey Romer' via ISO C++ Standard - Future Proposals <std-proposals@isocpp.org> wrote:

> OK, so the output of this hasher is implementation-defined, and not suitable for use cases that require cross-binary reproducibility. I think that's the only reasonable choice, but then why bother with the whole endian rigamarole?

Because 98% of the platforms that actually have a modern C++ compiler have a pretty standard scalar layout that differs only by endian (or word size).  That is probably going to be useful to somebody.  No need to take that functionality away just because it wouldn't have worked on a platform that stopped being sold two decades ago.

Howard





.


Author: "'Geoffrey Romer' via ISO C++ Standard - Future Proposals" <std-proposals@isocpp.org>
Date: Tue, 26 Aug 2014 14:28:17 -0700
Raw View

On Tue, Aug 26, 2014 at 1:02 PM, Howard Hinnant <howard.hinnant@gmail.com>
wrote:

> On Aug 26, 2014, at 2:26 PM, 'Geoffrey Romer' via ISO C++ Standard -
> Future Proposals <std-proposals@isocpp.org> wrote:
>
> > OK, so the output of this hasher is implementation-defined, and not
> suitable for use cases that require cross-binary reproducibility. I think
> that's the only reasonable choice, but then why bother with the whole
> endian rigamarole?
>
> Because 98% of the platforms that actually have a modern C++ compiler have
> a pretty standard scalar layout that differs only by endian (or word
> size).


Good point about word size, but I don't see where your proposal handles it.


> That is probably going to be useful to somebody.  No need to take that
> functionality away just because it wouldn't have worked on a platform that
> stopped being sold two decades ago.
>

OK, let's assume that you're right, and in practice everyone will use the
same byte representation for int (up to endianness and maybe word size).
Can you say the same for an arbitrary type T? In other words, when I define
a hash_append overload, am I required to guarantee that the byte
representation it produces will not vary across platforms, or over time?

You are no doubt right that this functionality will "probably be useful to
somebody", but that's a fairly low bar to clear. Can you point to a
specific use case that would benefit from a generic fingerprinting (as
opposed to hashing) API?


>
> Howard
>
>
>
>

.


Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Tue, 26 Aug 2014 18:13:40 -0400
Raw View
On Aug 26, 2014, at 5:28 PM, 'Geoffrey Romer' via ISO C++ Standard - Future Proposals <std-proposals@isocpp.org> wrote:

>
>> On Tue, Aug 26, 2014 at 1:02 PM, Howard Hinnant <howard.hinnant@gmail.com> wrote:
>> On Aug 26, 2014, at 2:26 PM, 'Geoffrey Romer' via ISO C++ Standard - Future Proposals <std-proposals@isocpp.org> wrote:
>>
>> > OK, so the output of this hasher is implementation-defined, and not suitable for use cases that require cross-binary reproducibility. I think that's the only reasonable choice, but then why bother with the whole endian rigamarole?
>>
>> Because 98% of the platforms that actually have a modern C++ compiler have a pretty standard scalar layout that differs only by endian (or word size).
>>
> Good point about word size, but I don't see where your proposal handles it.

One can hash a std::int32_t (for example) if you want to specify integral size.  Technically, std::int32_t is optional in the standard.  But lots of platforms have a std::int32_t, and the client doesn't have to care if that is a short, int, or long.

>> That is probably going to be useful to somebody.  No need to take that functionality away just because it wouldn't have worked on a platform that stopped being sold two decades ago.
>>
> OK, let's assume that you're right, and in practice everyone will use the same byte representation for int (up to endianness and maybe word size). Can you say the same for an arbitrary type T? In other words, when I define a hash_append overload, am I required to guarantee that the byte representation it produces will not vary across platforms, or over time?

I missed the smiley on the end of that sentence.  You're kidding me, right?!

If we restrict T to the set of scalars, and if we say that hash_append is part of the std::lib, then the std::lib implementors will be required to provide a hash_append as the standard specifies for all scalars, and hopefully for many std-defined types as well.  That's all a standard could possibly do.

> You are no doubt right that this functionality will "probably be useful to somebody", but that's a fairly low bar to clear. Can you point to a specific use case that would benefit from a generic fingerprinting (as opposed to hashing) API?

I can well imagine that an application that implements a secure financial P2P protocol consisting of a history of public ledgers might use SHA-256 as a means of securing the integrity of those ledgers.  And I can imagine that it might be desirable to have that application run on platforms of differing endian.  Say PPC and x86.  When a 64-bit integral type is part of a message that gets hashed by the SHA-256 hash algorithm, it will be important that both the PPC machine and the x86 machine get the same result.  Otherwise the two machines will not be able to agree upon the contents of the public ledger.  Think bitcoin.  Or perhaps ripple:  https://ripple.com.
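
For concreteness, here is a rough sketch of how that might look with the hash_append machinery discussed here; the stub hasher, the std::endian-based byte swap, and all names below are illustrative assumptions rather than the proposal's actual code:

#include <algorithm>
#include <bit>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for a real SHA-256 HashAlgorithm; only its shape matters here.
// endian = little asks hash_append to present scalars in little endian.
struct sha256_little_stub
{
    static constexpr std::endian endian = std::endian::little;
    void operator()(void const* key, std::size_t len) noexcept
    {
        auto p = static_cast<unsigned char const*>(key);
        for (std::size_t i = 0; i < len; ++i)
            state_ ^= p[i];                    // placeholder, not real SHA-256
    }
    unsigned char state_ = 0;
};

// One way the library's scalar hash_append could honor that request:
// byte-swap only when the native order differs from what the hasher asked for.
template <class Hasher>
void hash_append(Hasher& h, std::int64_t v) noexcept
{
    unsigned char bytes[sizeof v];
    std::memcpy(bytes, &v, sizeof v);
    if constexpr (std::endian::native != Hasher::endian)
        std::reverse(bytes, bytes + sizeof v);
    h(bytes, sizeof bytes);
}

struct ledger_entry
{
    std::int64_t amount;       // fixed-width members, so the byte stream has the same size everywhere
    std::int64_t timestamp;
};

template <class Hasher>
void hash_append(Hasher& h, ledger_entry const& e) noexcept
{
    hash_append(h, e.amount);
    hash_append(h, e.timestamp);
}

Run through the same little-endian-pinned hasher, the same ledger_entry values should then feed identical byte streams to SHA-256 on both PPC and x86.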

Such applications will also have to limit their porting to that tiny slice of platforms that use ASCII/Unicode as well.  There will be no porting to EBCDIC-based platforms. :-)

Howard


.


Author: "'Geoffrey Romer' via ISO C++ Standard - Future Proposals" <std-proposals@isocpp.org>
Date: Wed, 27 Aug 2014 08:30:09 -0700
Raw View

On Tue, Aug 26, 2014 at 3:13 PM, Howard Hinnant <howard.hinnant@gmail.com>
wrote:

>
> On Aug 26, 2014, at 5:28 PM, 'Geoffrey Romer' via ISO C++ Standard -
> Future Proposals <std-proposals@isocpp.org> wrote:
>
> >
> >> On Tue, Aug 26, 2014 at 1:02 PM, Howard Hinnant <
> howard.hinnant@gmail.com> wrote:
> >> On Aug 26, 2014, at 2:26 PM, 'Geoffrey Romer' via ISO C++ Standard -
> Future Proposals <std-proposals@isocpp.org> wrote:
> >>
> >> > OK, so the output of this hasher is implementation-defined, and not
> suitable for use cases that require cross-binary reproducibility. I think
> that's the only reasonable choice, but then why bother with the whole
> endian rigamarole?
> >>
> >> Because 98% of the platforms that actually have a modern C++ compiler
> have a pretty standard scalar layout that differs only by endian (or word
> size).
> >>
> > Good point about word size, but I don't see where your proposal handles
> it.
>
> One can hash a std::int32_t (for example) if you want to specify integral
> size.  Technically, std::int32_t is optional in the standard.  But lots of
> platforms have a std::int32_t, and the client doesn't have to care if that
> is a short, int or long.
>
> >> That is probably going to be useful to somebody.  No need to take that
> functionality away just because it wouldn't have worked on a platform that
> stopped being sold two decades ago.
> >>
> > OK, let's assume that you're right, and in practice everyone will use
> the same byte representation for int (up to endianness and maybe word
> size). Can you say the same for an arbitrary type T? In other words, when I
> define a hash_append overload, am I required to guarantee that the byte
> representation it produces will not vary across platforms, or over time?
>
> I missed the smiley on the end of that sentence.  You're kidding me,
> right?!
>
> If we restrict T to the set of scalars.  And if we say that hash_append is
> part of the std::lib.  Then the std::lib implementors will be required to
> provide a hash_append as the standard specifies for all scalars.  Hopefully
> for many std-defined types as well.  That's all a standard could possibly
> do.
>

Imposing a contract on users of a standard-defined extension point is well
within the standard's purview; see std::hash, to name a particularly
salient example. I agree that the standard shouldn't impose this particular
requirement, but only because it would be exceedingly onerous, not because
the standard can't do it even in principle.


>
> > You are no doubt right that this functionality will "probably be useful
> to somebody", but that's a fairly low bar to clear. Can you point to a
> specific use case that would benefit from a generic fingerprinting (as
> opposed to hashing) API?
>
> I can well imagine that an application that implements a secure financial
> P2P protocol consisting of a history of public ledgers might use SHA-256 as
> a means of securing the integrity of those ledgers.  And I can imagine that
> it might be desirable to have that application run on platforms of
> differing endian.  Say PPC and x86.  When a 64-bit integral type is part
> of a message that gets hashed by the SHA-256 hash algorithm, it will be
> important that when this is computed by both the PPC machine and the x86
> machine, that they get the same result.  Otherwise the two machines will
> not be able to agree upon the contents of the public ledger.  Think
> bitcoin.  Or perhaps ripple:  https://ripple.com.


How does this use case benefit from a _generic_ fingerprinting API? How
does it benefit from using the same hash_append overloads for both
fingerprinting and in-memory hashing?

Don't get me wrong, this design sounds plausible for this particular use
case; I'm concerned that it may couple the protocol too closely to the C++
implementation, which may create rigidities as the two evolve, but no doubt
the engineers implementing the system have thought that through (or would
have, since this is of course entirely hypothetical ;-) ). However, what
makes this case plausible is the fact that cryptographic fingerprinting is
at the very core of the application, and so every class that's being
fingerprinted has been designed to support efficient, reliable
fingerprinting. That being the case, reusing the existing fingerprint
infrastructure for ordinary in-memory hashing makes sense.

However, if this API is standardized, it seems self-evident that the vast
majority of classes, and in particular their hash_append overloads, will be
written without much attention to fingerprinting, since in-memory hashing
is a much more common use case than persistent fingerprinting.
Consequently, hash_append authors are very likely to do things that are
inappropriate for fingerprinting, such as ignoring the endian flags, using
types with implementation-defined sizes, or changing hash_append to reflect
changes in the underlying class.

In an environment where fingerprinting is not a central concern, using
hash_append for both hashing and fingerprinting seems very likely to lead
to bugs. These would be the worst sort of bugs, too: bugs where the code
initially works just fine, and then breaks months or years later, after
you've had plenty of time for your faulty assumptions to become deeply
embedded in both the codebase and the protocol/serialization format that it
implements.


>
> Such applications will also have to limit their porting to that tiny slice
> of platforms that use ASCII/Unicode as well.  There will be no porting to
> EBCDIC-based platforms. :-)
>

This is doubtless the right choice for this application, but I don't think
the standard can afford to be so cavalier. Or maybe it can, but if so it
should start with the core language, not with library extensions (I for one
would be perfectly happy for C++ to standardize on Unicode, and for that
matter on two's complement little-endian LP64 integers and IEEE 754
floating-point, but I get the feeling there are good reasons that hasn't
happened). We can't just write off marginal platforms in the library while
claiming to support them in the language, _especially_ if our degraded
support comes in the form of correctness bugs, rather than build failures
or degraded performance.


> Howard
>

.


Author: "'Jeffrey Yasskin' via ISO C++ Standard - Future Proposals" <std-proposals@isocpp.org>
Date: Wed, 27 Aug 2014 09:05:32 -0700
Raw View
On Wed, Aug 27, 2014 at 8:30 AM, 'Geoffrey Romer' via ISO C++ Standard
- Future Proposals <std-proposals@isocpp.org> wrote:
>
>
> On Tue, Aug 26, 2014 at 3:13 PM, Howard Hinnant <howard.hinnant@gmail.com>
> wrote:
>>
>>
>> On Aug 26, 2014, at 5:28 PM, 'Geoffrey Romer' via ISO C++ Standard -
>> Future Proposals <std-proposals@isocpp.org> wrote:
>>
>> >
>> >> On Tue, Aug 26, 2014 at 1:02 PM, Howard Hinnant
>> >> <howard.hinnant@gmail.com> wrote:
>> >> On Aug 26, 2014, at 2:26 PM, 'Geoffrey Romer' via ISO C++ Standard -
>> >> Future Proposals <std-proposals@isocpp.org> wrote:
>> >>
>> >> > OK, so the output of this hasher is implementation-defined, and not
>> >> > suitable for use cases that require cross-binary reproducibility. I think
>> >> > that's the only reasonable choice, but then why bother with the whole endian
>> >> > rigamarole?
>> >>
>> >> Because 98% of the platforms that actually have a modern C++ compiler
>> >> have a pretty standard scalar layout that differs only by endian (or word
>> >> size).
>> >>
>> > Good point about word size, but I don't see where your proposal handles
>> > it.
>>
>> One can hash a std::int32_t (for example) if you want to specify integral
>> size.  Technically, std::int32_t is optional in the standard.  But lots of
>> platforms have a std::int32_t, and the client doesn't have to care if that
>> is a short, int or long.
>>
>> >> That is probably going to be useful to somebody.  No need to take that
>> >> functionality away just because it wouldn't have worked on a platform that
>> >> stopped being sold two decades ago.
>> >>
>> > OK, let's assume that you're right, and in practice everyone will use
>> > the same byte representation for int (up to endianness and maybe word size).
>> > Can you say the same for an arbitrary type T? In other words, when I define
>> > a hash_append overload, am I required to guarantee that the byte
>> > representation it produces will not vary across platforms, or over time?
>>
>> I missed the smiley on the end of that sentence.  You're kidding me,
>> right?!
>>
>> If we restrict T to the set of scalars.  And if we say that hash_append is
>> part of the std::lib.  Then the std::lib implementors will be required to
>> provide a hash_append as the standard specifies for all scalars.  Hopefully
>> for many std-defined types as well.  That's all a standard could possibly
>> do.
>
>
> Imposing a contract on users of a standard-defined extension point is well
> within the standard's purview; see std::hash, to name a particularly salient
> example. I agree that the standard shouldn't impose this particular
> requirement, but only because it would be exceedingly onerous, not because
> the standard can't do it even in principle.
>
>>
>>
>> > You are no doubt right that this functionality will "probably be useful
>> > to somebody", but that's a fairly low bar to clear. Can you point to a
>> > specific use case that would benefit from a generic fingerprinting (as
>> > opposed to hashing) API?
>>
>> I can well imagine that an application that implements a secure financial
>> P2P protocol consisting of a history of public ledgers might use SHA-256 as
>> a means of securing the integrity of those ledgers.  And I can imagine that
>> it might be desirable to have that application run on platforms of differing
>> endian.  Say PPC and x86.  When a 64-bit integral type is part of a message
>> that gets hashed by the SHA-256 hash algorithm, it will be important that
>> when this is computed by both the PPC machine and the x86 machine, that they
>> get the same result.  Otherwise the two machines will not be able to agree
>> upon the contents of the public ledger.  Think bitcoin.  Or perhaps ripple:
>> https://ripple.com.
>
>
> How does this use case benefit from a _generic_ fingerprinting API? How does
> it benefit from using the same hash_append overloads for both fingerprinting
> and in-memory hashing?
>
> Don't get me wrong, this design sounds plausible for this particular use
> case; I'm concerned that it may couple the protocol too closely to the C++
> implementation, which may create rigidities as the two evolve, but no doubt
> the engineers implementing the system have thought that through (or would
> have, since this is of course entirely hypothetical ;-) ). However, what
> makes this case plausible is the fact that cryptographic fingerprinting is
> at the very core of the application, and so every class that's being
> fingerprinted has been designed to support efficient, reliable
> fingerprinting. That being the case, reusing the existing fingerprint
> infrastructure for ordinary in-memory hashing makes sense.
>
> However, if this API is standardized, it seems self-evident that the vast
> majority of classes, and in particular their hash_append overloads, will be
> written without much attention to fingerprinting, since in-memory hashing is
> a much more common use case than persistent fingerprinting. Consequently,
> hash_append authors are very likely to do things that are inappropriate for
> fingerprinting, such as ignoring the endian flags, using types with
> implementation-defined sizes, or changing hash_append to reflect changes in
> the underlying class.
>
> In an environment where fingerprinting is not a central concern, using
> hash_append for both hashing and fingerprinting seems very likely to lead to
> bugs. These would be the worst sort of bugs, too: bugs where the code
> initially works just fine, and then breaks months or years later, after
> you've had plenty of time for your faulty assumptions to become deeply
> embedded in both the codebase and the protocol/serialization format that it
> implements.
>
>>
>>
>> Such applications will also have to limit their porting to that tiny slice
>> of platforms that use ASCII/Unicode as well.  There will be no porting to
>> EBCDIC-based platforms. :-)
>
>
> This is doubtless the right choice for this application, but I don't think
> the standard can afford to be so cavalier. Or maybe it can, but if so it
> should start with the core language, not with library extensions (I for one
> would be perfectly happy for C++ to standardize on Unicode, and for that
> matter on two's complement little-endian LP64 integers and IEEE 754
> floating-point, but I get the feeling there are good reasons that hasn't
> happened). We can't just write off marginal platforms in the library while
> claiming to support them in the language, _especially_ if our degraded
> support comes in the form of correctness bugs, rather than build failures or
> degraded performance.


I am suspicious that I don't know of any other platform that provides
cryptographic hash functions as a generic traversal over user-defined
data types. They all require byte-string inputs instead. If we did
that for the C++ library, users would need to explicitly serialize
their types before feeding them into the cryptographic hash, which
doesn't really seem that bad, especially if we have a way to stream
the serialization into the fingerprint function so there never needs
to be a large string allocated in memory.
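
A rough sketch of what that explicit, streamed serialization could look like; the byte_sink interface and all names below are made up for illustration, and endian normalization is omitted:

#include <cstddef>
#include <cstdint>
#include <string>

struct byte_sink                      // hypothetical incremental hasher interface
{
    virtual void update(void const* data, std::size_t len) = 0;
    virtual ~byte_sink() = default;
};

struct employee
{
    std::string   name;
    std::uint32_t id;
};

// The user serializes explicitly, streaming bytes into the hash as they go;
// no large intermediate string is ever allocated.
void serialize(employee const& e, byte_sink& out)
{
    std::uint64_t n = e.name.size();
    out.update(&n, sizeof n);                  // length prefix
    out.update(e.name.data(), e.name.size());  // characters
    out.update(&e.id, sizeof e.id);            // fixed-width id
}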

I went back to check
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#serialization,
and the problem Howard pointed out is that hashing wants to treat 0.0
and -0.0 as equivalent. But I don't see a reason to think that
*fingerprinting* needs to treat them as equivalent, especially if the
serialization code wants to treat them as distinct behaviors. If
you're generating a signature of your current state, and that state
depends on the difference between 0.0 and -0.0, I think you'd want to
incorporate that into the signature.

Jeffrey


.


Author: Jens Maurer <Jens.Maurer@gmx.net>
Date: Wed, 27 Aug 2014 23:38:44 +0200
Raw View
On 08/27/2014 06:05 PM, 'Jeffrey Yasskin' via ISO C++ Standard - Future Proposals wrote:
> I am suspicious that I don't know of any other platform that provides
> cryptographic hash functions as a generic traversal over user-defined
> data types. They all require byte-string inputs instead. If we did
> that for the C++ library, users would need to explicitly serialize
> their types before feeding them into the cryptographic hash, which
> doesn't really seem that bad, especially if we have a way to stream
> the serialization into the fingerprint function so there never needs
> to be a large string allocated in memory.

But hash_append() is exactly that: A "serialization" that streams
directly into the cryptographic hash.  You have to define hash_append()
for your own types (thereby defining the serialization format, and
which parts are salient for the hash and which ones aren't), and you
use hash_append() on scalar types and strings as defined by the standard
library.
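
For example, a sketch of that, assuming the scalar, string, and vector overloads the standard library would supply; the type and member names are illustrative, and the author of the type decides what is salient:

#include <cstddef>
#include <string>
#include <vector>

struct session
{
    std::string           user;
    std::vector<unsigned> open_files;
    mutable std::size_t   cached_hash = 0;   // derived data, not salient
};

template <class Hasher>
void hash_append(Hasher& h, session const& s)
{
    hash_append(h, s.user);        // assumed library overload: characters plus length
    hash_append(h, s.open_files);  // assumed library overload: element-wise
    // cached_hash is deliberately left out of the "serialization"
}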

> I went back to check
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#serialization,
> and the problem Howard pointed out is that hashing wants to treat 0.0
> and -0.0 as equivalent.

It will be hard to specify the details of  hash_append(double) ,
given that C++ says nearly nothing about floating-point behavior,
in general.

>  But I don't see a reason to think that
> *fingerprinting* needs to treat them as equivalent, especially if the
> serialization code wants to treat them as distinct behaviors. If
> you're generating a signature of your current state, and that state
> depends on the difference between 0.0 and -0.0, I think you'd want to
> incorporate that into the signature.

I think/hope hash_append() provides enough customization points to deal
with that.  For example, you could define your own SHA256 as a thin
wrapper around the library-provided one.  Now you have a customization
point for   hash_append(sha256, double)   to do what you want.
You can also distinguish that crypto-hashing from other kinds
of hashing, if you're so inclined.
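
A minimal sketch of that thin-wrapper customization point; the sha256 type is stubbed out here, and only the overload-resolution idea matters:

#include <cstddef>

struct sha256                        // stand-in for the library-provided hasher
{
    void operator()(void const*, std::size_t) noexcept { /* ... */ }
};

struct my_sha256 : sha256 {};        // thin wrapper: a distinct type to overload on

// Found only for my_sha256, so other hashers keep the default double handling.
void hash_append(my_sha256& h, double d) noexcept
{
    if (d == 0.0) d = 0.0;           // -0.0 == 0.0, so this collapses the sign; drop the line to keep them distinct
    h(&d, sizeof d);
}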

Yes, it's rather brittle if you switch your hash from sha256 to
something else.  If you want to avoid that, we should ensure the
design is flexible enough that I can have my own
my_crypto_hash_append()   that does what I want, without too much
repetition of code.  Yep, I can't use std::uhash<> any more, but
that should contain only trivial code, anyway.

Jens


.


Author: Jens Maurer <Jens.Maurer@gmx.net>
Date: Thu, 28 Aug 2014 00:41:14 +0200
Raw View
Hi Howard!

I'll have to ask a few more questions here.  If something gets
standardized in this area, I'd like to see a roadmap for how all
platforms supported by C++ can get portable (crypto) hash values,
even if not maximally efficient on "strange" environments.

I'm assuming that (in an abstract sense) a crypto hash algorithm
(and most others) hash a sequence of octets (i.e. 8-bit quantities).

(SHA256 and others can do odd trailing bits, but let's ignore
this for now.)

I presume I'm passing these octets to the hash algorithm using an
array of (unsigned) char, right?

Is this assumption also true for platforms where (unsigned) char
is e.g. 32-bits (DSPs)?  If so, the following implementation doesn't
work there, because it makes no effort to split e.g. T==int (suppose
it's 32-bit) into four individual (unsigned) char objects.

(Oh, and on such platforms, 1 == sizeof(char) == sizeof(int).)

> hash_append(Hasher& h, T const& t) noexcept
> {
>     h(std::addressof(t), sizeof(t));
> }


On 08/24/2014 07:39 PM, Howard Hinnant wrote:
> If we are dealing with a platform/HashAlgorithm disagreement in endian, then an alternative hash_append can be used for scalars:

So, the endianness is a boolean, not a three-way type?  Whether or not you're "native"
seems to be all that matters, from the code you presented.


I think we're mixing two slightly related, but distinct aspects here.

One aspect is the fact that hashing an array of "short" values with a portable
hash (such as sha256) should give the same value across all platforms.
(The cost of endianness normalization is negligible compared to the cost
of running sha256 at all.)  Maybe we need to identify those hash algorithms
that are supposed to give portable results.

So, for example, hashing this:

  short a[] = { 1, 2, 3, 4 };

with sha256 should give consistent cross-platform results.
Thus, when feeding octets to the hash algorithm, we must have a
(default) idea how to map a "short" value into (two? four? more?)
octets.  That might be big or little endian, or something entirely
different (e.g. a variable-length encoding, which might actually
turn out to be the most portable).


The other aspect is the fact that hash algorithms such as sha256 like
to process e.g. 32-bits (= 4 octets) at once.  When reading four
octets from memory, it's helpful to be able to simply read them
into a register on "suitable" platforms and only do the endianness
(or other) conversion on the remainder of the platforms.  But, on
the abstract level, this is not a configuration option, it's a
question of correctness.

[Giving the user a way to opt-out of this endianness correctness is
fine for me (emphasis: opt-out).]


I don't think a single "endian" value captures both aspects.


> uhash<sha256> h1;  // don't worry about endian
> uhash<sha256_little> h2;  // ensure scalars are little endian prior to hashing

The simple name must result in the portable hash value.

> Finally note that the implementation of hash_append is made simpler by the use of the (const void*, size_t) interface, as opposed to a (const unsigned char*, size_t) interface.  With the latter, one would have to code:
>
> template <class Hasher, class T>
> inline
> std::enable_if_t
> <
>     is_contiguously_hashable<T, Hasher>{}
> >
> hash_append(Hasher& h, T const& t) noexcept
> {
>     h(reinterpret_cast<const unsigned char*>(std::addressof(t)), sizeof(t));
> }

I agree "void *" is simpler, but I continue to believe this is a more dangerous
interface, allowing one to inadvertently pass stuff with padding in it.  Note
that the user is not expected to write this code, but to rely on the standard
library to hash scalar types.

(If hashing comes up in Urbana-Champaign, please grab me so that I can voice
a "strongly against" for this particular aspect.)

(You can use two static_casts via "void *" if the reinterpret_cast is too
dreadful.)
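
That is, something along these lines; the helper name is illustrative, not proposal text:

#include <cstddef>
#include <memory>

template <class Hasher, class T>
void hash_append_contiguous(Hasher& h, T const& t) noexcept
{
    void const* p = static_cast<void const*>(std::addressof(t));  // T const* -> void const*
    h(static_cast<unsigned char const*>(p), sizeof(t));           // void const* -> unsigned char const*
}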

Jens


.


Author: Miro Knejp <miro.knejp@gmail.com>
Date: Thu, 28 Aug 2014 02:51:07 +0200
Raw View
Maybe this whole issue simply shows that algorithms with strict size
requirements should not be defined in terms of char, short, int but
int8_t, int16_t, and so on. If hash_append has (by default) only
overloads for exactly sized types then the compiler should pick the
correct one when the user feeds it unsized types like int. On machines
where no int8_t exists no hash_append overload for int8_t exists.

If the same source code is to produce equal hashes for the same data
structures on different machines, then the only portable way is to use
the explicitly sized types. That is honestly the *only* way to be sure
of consistent hash values and should maybe be added as a note somewhere.

Though I think "char" (not "signed char" or "unsigned char", as those are
3 different types) should be treated as implementation-defined, as the
standard uses this type only for character values in strings. On a
machine where char has more than 8 bits, only the implementation knows
which of these bits are representative of the value of a string
character/codepoint. Same goes for wchar_t.

Regarding the void* question, I think Howard's is_contiguously_hashable
covers this nicely. If your type has no padding in it, specialize the
trait. If it does, provide your own hash_append overload and feed it the
required members. I think a good set of predefined overloads for
hash_append would be

hash_append(Hasher&, const T&)                        // enable if is_contiguously_hashable<T, Hasher> is true
hash_append(Hasher&, array_view<T>)                   // enable if hash_append(Hasher, T) is well-formed
hash_append(Hasher&, basic_string_view<Char, Traits>) // enable if hash_append(Hasher, Char) is well-formed
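
As a rough sketch of how the first of these might be constrained (not proposal text; the trait defaults to false and is specialized per type):

#include <cstddef>
#include <memory>
#include <type_traits>

template <class T, class Hasher>
struct is_contiguously_hashable : std::false_type {};   // opt in per type

template <class Hasher, class T>
std::enable_if_t<is_contiguously_hashable<T, Hasher>::value>
hash_append(Hasher& h, T const& t) noexcept
{
    // the whole object representation goes to the hasher in one call
    h(std::addressof(t), sizeof(t));
}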

Having is_contiguously_hashable<T, Hasher> predefined as true for the
types [u]int[8|16|32|64]_t, char, wchar_t and char[16|32]_t allows the
implementation to select which integral types are acceptable and can
then internally provide specializations with tag dispatching. Using
unsized types like short or int would pick the proper overload depending
on what int##_t typedefs alias to.

A further alternative would be to provide a third parameter in the form
hash_append(Hasher&, const T&, valid_bits<N>), so even on architectures
without direct int8_t support one could use ints for storage and only
mask the leading N bits as relevant for the hash.

This should make any need for a hash_append(void*, size_t) overload
obsolete. The Hasher itself needs a (void* p, size_t n) overload where n
denotes the number of valid OCTETS pointed to by p. hash_append() has
then already taken care of endianness and other details by applying
Hasher's traits. The nice thing here is that if a type has no padding and
the user *decided that endianness, etc. does not matter* then enabling
is_contiguously_hashable makes hash_append() feed the entire structure
to (void*, size_t). Appropriate warning signs should be positioned
around is_contiguously_hashable to make the user aware of its positive
and negative implications.

Does this make sense?

Am 28.08.2014 00:41, schrieb Jens Maurer:
> Hi Howard!
>
> I'll have to ask a few more questions here.  If something gets
> standardized in this area, I'd like to see a roadmap how all
> platforms supported by C++ can get portable (crypto) hash values,
> even if not maximally efficient on "strange" environments.
>
> I'm assuming that (in an abstract sense) a crypto hash algorithm
> (and most others) hash a sequence of octets (i.e. 8-bit quantities).
>
> (SHA256 and others can do odd trailing bits, but let's ignore
> this for now.)
>
> I presume I'm passing these octets to the hash algorithm using an
> array of (unsigned) char, right?
>
> Is this assumption also true for platforms where (unsigned) char
> is e.g. 32-bits (DSPs)?  If so, the following implementation doesn't
> work there, because it makes no effort to split e.g. T==int (suppose
> it's 32-bit) into four individual (unsigned) char objects.
>
> (Oh, and on such platforms, 1 == sizeof(char) == sizeof(int).)
>
>> hash_append(Hasher& h, T const& t) noexcept
>> {
>>      h(std::addressof(t), sizeof(t));
>> }
>
> On 08/24/2014 07:39 PM, Howard Hinnant wrote:
>> If we are dealing with a platform/HashAlgorithm disagreement in endian, then an alternative hash_append can be used for scalars:
> So, the endianness is a boolean, not a three-way type?  Either you're "native" or not
> seems all that matters, from the code you presented.
>
>
> I think we're mixing two slightly related, but distinct aspects here.
>
> One aspect is the fact that hashing an array of "short" values with a portable
> hash (such as sha256) should give the same value across all platforms.
> (The cost of endianess normalization is negligible compared to the cost
> of running sha256 at all.)  Maybe we need to identify those hash algorithms
> that are supposed to give portable results.
>
> So, for example, hashing this:
>
>    short a[] = { 1, 2, 3, 4 };
>
> with sha256 should give consistent cross-platform results.
> Thus, when feeding octets to the hash algorithm, we must have a
> (default) idea how to map a "short" value into (two? four? more?)
> octets.  That might be big or little endian, or something entirely
> different (e.g. a variable-length encoding, which might actually
> turn out to be the most portable).
>
>
> The other aspect is the fact that hash algorithms such as sha256 like
> to process e.g. 32-bits (= 4 octets) at once.  When reading four
> octets from memory, it's helpful to be able to simply read them
> into a register on "suitable" platforms and only do the endianess
> (or other) conversion on the remainder of the platforms.  But, on
> the abstract level, this is not a configuration option, it's a
> question of correctness.
>
> [Giving the user a way to opt-out of this endianess correctness is
> fine for me (emphasis: opt-out).]
>
>
> I don't think a single "endian" value captures both aspects.
>
>
>> uhash<sha256> h1;  // don't worry about endian
>> uhash<sha256_little> h2;  // ensure scalars are little endian prior to hashing
> The simple name must result in the portable hash value.
>
>> Finally note that the implementation of hash_append is made simpler by the use of the (const void*, size_t) interface, as opposed to a (const unsigned char*, size_t) interface.  With the latter, one would have to code:
>>
>> template <class Hasher, class T>
>> inline
>> std::enable_if_t
>> <
>>      is_contiguously_hashable<T, Hasher>{}
>> hash_append(Hasher& h, T const& t) noexcept
>> {
>>      h(reinterpret_cast<const unsigned char*>(std::addressof(t)), sizeof(t));
>> }
> I agree "void *" is simpler, but I continue to believe this is a more dangerous
> interface, allowing to inadvertently pass stuff with padding in it.  Note
> that the user is not expected to write this code, but rely on the standard
> library to hash scalar types.
>
> (If hashing comes up in Urbana-Champaign, please grab me so that I can voice
> a "strongly against" for this particular aspect.)
>
> (You can use two static_casts via "void *" if the reinterpret_cast is too
> dreadful.)
>
> Jens
>


.


Author: Jens Maurer <Jens.Maurer@gmx.net>
Date: Thu, 28 Aug 2014 16:25:04 +0200
Raw View
On 08/28/2014 02:51 AM, Miro Knejp wrote:
> Maybe this whole issue simply shows that algorithms with strict size
> requirements should not be defined in terms of char, short, int but
> int8_t, int16_t, and so on. If hash_append has (by default) only
> overloads for exactly sized types then the compiler should pick the
> correct one when the user feeds it unsized types like int. On machines
> where no int8_t exists no hash_append overload for int8_t exists.

If we go that route, we should use "uint8_t" etc, not the signed variants,
which have more platform freedom.

And, for a hash algorithm that operates on octets, what does it mean
to hash a uint16_t value, call it x?  I think the definition should be

   hash the two octets obtained by    x & 0xff  and   (x>>8) & 0xff,
    in sequence

but that should be stated explicitly, and also how the user can get
that result if he chooses to directly call the hash algorithm.
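
In code, that definition would amount to something like the following; the function name is illustrative only:

#include <cstdint>

template <class Hasher>
void hash_append_uint16(Hasher& h, std::uint16_t x) noexcept
{
    unsigned char octets[2] = {
        static_cast<unsigned char>(x & 0xff),         // low octet first
        static_cast<unsigned char>((x >> 8) & 0xff),  // then the high octet
    };
    h(octets, sizeof octets);
}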

> Where the same source code is to produce equal hashes for the same data
> structures on different machines then the only portable way is to use
> the explicitly sized types. That is honestly the *only* way to be sure
> of consistent hash values and should maybe be added as a note somewhere.

I don't think so.  If you strictly think about portably serializing
values to octets, the types on the platforms (other than the
integer/floating-point distinction) become meaningless, and you should
only think about how to represent values in a portable manner, using
(for example) a value-dependent variable-length integer representation.

(I'm not saying this is the only choice, but we shouldn't claim to
provide hash algorithms such as sha256 if the results aren't really portable.)

> Though I think "char" (not "signed char" or "unsigned char" as those are
> 3 different types) should be treated implementation-defined as the
> standard uses this type only for character values in strings.

This is not exactly true.  Standard filestreams use these for (possibly)
binary data, too.  (Unfortunately, in my opinion.)

>   On a
> machine where char has more than 8 bit only the implementation knows
> which of these bits are representative of the value of a string
> character/codepoint.  Same goes for wchar_t.

Note that for "unsigned char", all bits contribute to the value
(3.9.1p1 basic.fundamental).  (No padding allowed.)  That also
holds if "char" happens to be unsigned.

> Regarding the void* question, I think Howard's is_contiguously_hashable
> covers this nicely. If your type has no padding in it, specialize the
> trait. If it does, provide your own hash_append overload and feed it the
> required members. I think a good set of predefined overloads for
> hash_append would be
>
> hash_append(Hasher&, const T&) // enable if is_contiguously_hashable<T,
> Hasher> is true

Other than for nerd-ness, why is it easier to partially specialize
is_contiguously_hashable<my_T,Hasher> instead of overloading

void hash_append(Hasher& h, const my_T& value) {
  std::hash_append_contiguous(h, value);
}

?

The latter seems easier to read, and makes it easier to provide a
different answer for a different Hasher.

> hash_append(Hasher&, array_view<T>) // enable if hash_append(Hasher, T)
> is well-formed
> hash_append(Hasher&, basic_string_view<Char, Traits>) // enable if
> hash_append(Hasher, Char) is well-formed

Yes, this is basic decomposition.  We should have something like that.
(I'd be happy to omit the enable_if dance; you'll simply get an error
if T doesn't have a hash_append(), which is good.)

Question: Do you hash the length of the string or array separately, or
just the elements?  In the latter case, hashing two empty strings
is indistinguishable from hashing three empty strings.  This seems
undesirable for crypto-hashes.

> Having is_contiguously_hashable<T, Hasher> predefined as true for the
> types [u]int[8|16|32|64]_t, char, wchar_t and char[16|32]_t allows the
> implementation to select which integral types are acceptable and can
> then internally provide specializations with tag dispatching. Using
> unsized types like short or int would pick the proper overload depending
> on what int##_t typedefs alias to.

So would  std::hash_append()  overloads that each take one of the
scalar types mentioned above.  (The standard library can avoid
duplicate overloads for the uintX_t typedefs conflicting with char etc.)

I don't see why we need an   is_contiguously_hashable<T, Hasher>   trait.

> A further alternative would be to provide a third parameter in the form
> hash_append(Hasher&, const T&, valid_bits<N>), so even on architectures
> without direct int8_t support one could use ints for storage and only
> mask the leading N bits as relevant for the hash.

For scalar T, this should be the job of the standard library implementation.
I don't see a need for valid_bits<N> with a generic "T".  That said, I'm
not opposed to adding

  hash_append(Hasher&, const std::bitset<N>&);

where N is divisible by 8.  This allows one to pass an octet without
any concern about C++ built-in types at all.
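
(Such an overload might look roughly like this sketch, packing the bitset into N/8 octets before handing them to the algorithm; the packing order chosen here is arbitrary:)

#include <bitset>
#include <cstddef>

// Sketch: hash a bitset whose size is a multiple of 8 as a sequence of
// octets, least significant bit of each octet first.
template <class HashAlgorithm, std::size_t N>
void
hash_append(HashAlgorithm& h, const std::bitset<N>& bits) noexcept
{
    static_assert(N % 8 == 0, "only octet-sized bitsets");
    for (std::size_t i = 0; i < N; i += 8)
    {
        unsigned char octet = 0;
        for (std::size_t b = 0; b < 8; ++b)
            if (bits[i + b])
                octet |= static_cast<unsigned char>(1u << b);
        h(&octet, 1);
    }
}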

> This should make any need for a hash_append(void*, size_t) overload
> obsolete. The Hasher itself needs a (void* p, size_t n) overload where n
> denotes the number of valid OCTETS pointed to by p.

Then it's hard to call that function in portable code, because
sizeof(int) = 1 on some platforms where "int" is 32 bits (and "char", too).

>   hash_append() has
> then already taken care of endianness and other details by applying
> Hasher's traits. The nice thing here is that if a type has no padding and
> the user *decided that endianness, etc. does not matter* then enabling
> is_contiguously_hashable makes hash_append() feed the entire structure
> to (void*, size_t).

I believe it's a user-level policy decision for his struct type T to
determine whether it can be hashed contiguously, possibly depending
on the hasher (think two hash tables keyed off on different things,
both pointing to T objects).  And that policy decision is best left
to the user's overload of  hash_append(T).

Jens



Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Thu, 28 Aug 2014 13:41:54 -0400
Raw View
On Aug 27, 2014, at 6:41 PM, Jens Maurer <Jens.Maurer@gmx.net> wrote:

> Hi Howard!

Hi Jens! :-)

I suspect you intended this as a private message, but it came through public.  I'm responding publicly because I think you make some good points below and I thought everyone here could benefit from your points, and my response to some of them.  Hope that's ok.

> I'll have to ask a few more questions here.  If something gets
> standardized in this area, I'd like to see a roadmap how all
> platforms supported by C++ can get portable (crypto) hash values,
> even if not maximally efficient on "strange" environments.
>
> I'm assuming that (in an abstract sense) a crypto hash algorithm
> (and most others) hash a sequence of octets (i.e. 8-bit quantities).
>
> (SHA256 and others can do odd trailing bits, but let's ignore
> this for now.)
>
> I presume I'm passing these octets to the hash algorithm using an
> array of (unsigned) char, right?

As currently coded, there are hash_append overloads for both unsigned char, C-arrays of unsigned char, std::array<unsigned char, N>, etc.  There are also hash_append overloads for all other arithmetic types, and std-defined containers.  For each type, the hash_append function is responsible for deciding how that type should present itself to a generic hashing algorithm.  For example an unsigned char would just say: Consume this byte!

>
> Is this assumption also true for platforms where (unsigned) char
> is e.g. 32-bits (DSPs)?  If so, the following implementation doesn't
> work there, because it makes no effort to split e.g. T==int (suppose
> it's 32-bit) into four individual (unsigned) char objects.

The hash_append infrastructure makes no assumptions on the size of a byte.  However concrete hashing algorithms (such as SHA256) most certainly will make such assumptions.

>
> (Oh, and on such platforms, 1 == sizeof(char) == sizeof(int).)
>
>> hash_append(Hasher& h, T const& t) noexcept
>> {
>>    h(std::addressof(t), sizeof(t));
>> }
>
>
> On 08/24/2014 07:39 PM, Howard Hinnant wrote:
>> If we are dealing with a platform/HashAlgorithm disagreement in endian, then an alternative hash_append can be used for scalars:
>
> So, the endianness is a boolean, not a three-way type?  Either you're "native" or not
> seems all that matters, from the code you presented.

As currently coded, a hashing algorithm would set a member static const enum to one of three values:

    static constexpr xstd::endian endian = xstd::endian::native;
    static constexpr xstd::endian endian = xstd::endian::big;
    static constexpr xstd::endian endian = xstd::endian::little;

The meaning of these is to ask the hash_append overload for scalars influenced by endian (larger than char) to change the endian from native, to the requested endian, prior to sending the bytes into the hashing algorithm.  Concretely, in the order shown above:

1.  Map native endian to native endian (presumably this is always a no-op).
2.  Map native endian to big endian.  This will be a no-op on big endian machines.
3.  Map native endian to little endian.  This will be a no-op on little endian machines.

Non-fingerprinting hash applications will probably always use the native mapping (i.e. they don't care about endian).
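
(For illustration, here is roughly what a hashing algorithm declaring that member could look like; fnv1a is just a stand-in algorithm, and xstd::endian is the enumeration referred to above:)

#include <cstddef>
#include <cstdint>

// Sketch of a HashAlgorithm that does not care about input endian: it
// consumes bytes one at a time, so it advertises endian::native and the
// scalar hash_append overloads leave values untouched.
class fnv1a
{
    std::uint64_t state_ = 14695981039346656037u;
public:
    static constexpr xstd::endian endian = xstd::endian::native;

    void
    operator()(void const* key, std::size_t len) noexcept
    {
        unsigned char const* p = static_cast<unsigned char const*>(key);
        for (std::size_t i = 0; i < len; ++i)
            state_ = (state_ ^ p[i]) * 1099511628211u;
    }

    explicit
    operator std::size_t() noexcept
    {
        return static_cast<std::size_t>(state_);
    }
};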

> I think we're mixing two slightly related, but distinct aspects here.
>
> One aspect is the fact that hashing an array of "short" values with a portable
> hash (such as sha256) should give the same value across all platforms.
> (The cost of endianness normalization is negligible compared to the cost
> of running sha256 at all.)  Maybe we need to identify those hash algorithms
> that are supposed to give portable results.
>
> So, for example, hashing this:
>
>  short a[] = { 1, 2, 3, 4 };
>
> with sha256 should give consistent cross-platform results.
> Thus, when feeding octets to the hash algorithm, we must have a
> (default) idea how to map a "short" value into (two? four? more?)
> octets.  That might be big or little endian, or something entirely
> different (e.g. a variable-length encoding, which might actually
> turn out to be the most portable).

<nod> This is the aspect addressed by the enum above.

> The other aspect is the fact that hash algorithms such as sha256 like
> to process e.g. 32-bits (= 4 octets) at once.  When reading four
> octets from memory, it's helpful to be able to simply read them
> into a register on "suitable" platforms and only do the endianness
> (or other) conversion on the remainder of the platforms.  But, on
> the abstract level, this is not a configuration option, it's a
> question of correctness.
>
> [Giving the user a way to opt-out of this endianness correctness is
> fine for me (emphasis: opt-out).]
>
>
> I don't think a single "endian" value captures both aspects.

Agreed.  I see the second aspect above as an implementation detail of the hashing algorithm.  This detail does not impact the hashing algorithm's interface, except as to impact its results.  I see no motivation to "leak" this implementation detail into a standard specification, unless the standard is to specify concrete hashing algorithms.  In that event we could choose any number of options, of which I really have little opinion.  For example:  Sha256_output_little_endian as one hashing algorithm and Sha256_output_big_endian as another.  Or perhaps Sha256<endian> is another solution.

<shrug>, the hash_append proposal is hashing algorithm neutral, and does not address concrete hashing algorithms.  It only addresses a technique for easily switching among concrete hashing algorithms.  And this is why hash_append does have to deal with the "input endian" to the hashing algorithm, but does not address the "output endian" aspect.

>> uhash<sha256> h1;  // don't worry about endian
>> uhash<sha256_little> h2;  // ensure scalars are little endian prior to hashing
>
> The simple name must result in the portable hash value.

My understanding from http://en.wikipedia.org/wiki/Sha256 is that SHA256 output endian is always big.

>
>> Finally note that the implementation of hash_append is made simpler by the use of the (const void*, size_t) interface, as opposed to a (const unsigned char*, size_t) interface.  With the latter, one would have to code:
>>
>> template <class Hasher, class T>
>> inline
>> std::enable_if_t
>> <
>>    is_contiguously_hashable<T, Hasher>{}
>> >
>> hash_append(Hasher& h, T const& t) noexcept
>> {
>>    h(reinterpret_cast<const unsigned char*>(std::addressof(t)), sizeof(t));
>> }
>=20
> I agree "void *" is simpler, but I continue to believe this is a more dan=
gerous
> interface, allowing to inadvertently pass stuff with padding in it.  Note
> that the user is not expected to write this code, but rely on the standar=
d
> library to hash scalar types.
>=20
> (If hashing comes up in Urbana-Champaign, please grab me so that I can vo=
ice
> a "strongly against" for this particular aspect.)

Will do.

Howard

>=20
> (You can use two static_casts via "void *" if the reinterpret_cast is too
> dreadful.)
>=20
> Jens



Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Thu, 28 Aug 2014 14:00:12 -0400
Raw View
On Aug 27, 2014, at 8:51 PM, Miro Knejp <miro.knejp@gmail.com> wrote:

> Maybe this whole issue simply shows that algorithms with strict size requirements should not be defined in terms of char, short, int but int8_t, int16_t, and so on. If hash_append has (by default) only overloads for exactly sized types then the compiler should pick the correct one when the user feeds it unsized types like int. On machines where no int8_t exists no hash_append overload for int8_t exists.

Agreed.  We should think of hash_append similar to swap.  swap(int8_t&, int8_t&) exists only where int8_t exists. :-)  But swap(signed char&, signed char&) exists everywhere.  And so should hash_append(H&, signed char).

>
> Where the same source code is to produce equal hashes for the same data structures on different machines then the only portable way is to use the explicitly sized types. That is honestly the *only* way to be sure of consistent hash values and should maybe be added as a note somewhere.
>
> Though I think "char" (not "signed char" or "unsigned char" as those are 3 different types) should be treated implementation-defined as the standard uses this type only for character values in strings. On a machine where char has more than 8 bit only the implementation knows which of these bits are representative of the value of a string character/codepoint. Same goes for wchar_t.

This is pretty much the role of the is_uniquely_represented<T> trait.  The implementation will have to decide if a scalar is uniquely represented.

A type T is uniquely represented if for all combinations of two values of a type, say x and y, if x == y, then it must also be true that memcmp(addressof(x), addressof(y), sizeof(T)) == 0. I.e. if x == y, then x and y have the same bit pattern representation. A 2's complement int satisfies this property because every bit pattern an int can have results in a distinct value (rule 2). And there are no "padding bits" which might take on random values.
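
(To make the property concrete, a sketch -- the trait name follows this thread, the example type is hypothetical:)

#include <type_traits>

// Default: a type is not assumed to be uniquely represented; the
// implementation (or the type's author) opts suitable types in.
template <class T> struct is_uniquely_represented : std::false_type {};

// A 2's complement unsigned int has no padding bits, so equal values
// always have equal object representations.
template <> struct is_uniquely_represented<unsigned int> : std::true_type {};

// Padded would typically have padding bytes after 'c'; two objects that
// compare equal may still differ in those bytes, so memcmp can disagree
// with operator== and the type must not be opted in.
struct Padded
{
    char c;
    int  i;
};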

> Regarding the void* question, I think Howard's is_contiguously_hashable covers this nicely. If your type has no padding in it, specialize the trait. If it does, provide your own hash_append overload and feed it the required members. I think a good set of predefined overloads for hash_append would be
>
> hash_append(Hasher&, const T&) // enable if is_contiguously_hashable<T, Hasher> is true
> hash_append(Hasher&, array_view<T>) // enable if hash_append(Hasher, T) is well-formed
> hash_append(Hasher&, basic_string_view<Char, Traits>) // enable if hash_append(Hasher, Char) is well-formed
>
> Having is_contiguously_hashable<T, Hasher> predefined as true for the types [u]int[8|16|32|64]_t, char, wchar_t and char[16|32]_t allows the implementation to select which integral types are acceptable and can then internally provide specializations with tag dispatching. Using unsized types like short or int would pick the proper overload depending on what int##_t typedefs alias to.
>
> A further alternative would be to provide a third parameter in the form hash_append(Hasher&, const T&, valid_bits<N>), so even on architectures without direct int8_t support one could use ints for storage and only mask the leading N bits as relevant for the hash.
>
> This should make any need for a hash_append(void*, size_t) overload obsolete. The Hasher itself needs a (void* p, size_t n) overload where n denotes the number of valid OCTETS pointed to by p. hash_append() has then already taken care of endianness and other details by applying Hasher's traits. The nice thing here is that if a type has no padding and the user *decided that endianness, etc. does not matter* then enabling is_contiguously_hashable makes hash_append() feed the entire structure to (void*, size_t). Appropriate warning signs should be positioned around is_contiguously_hashable to make the user aware of its positive and negative implications.

The implementation should provide:

   hash_append(Hasher&, const T&)

for all arithmetic T, T*, and probably enums and nullptr_t as well.  Additionally the spec should provide hash_append for containers, pair, tuple, etc.  Anything that we today have a std::hash<T> for, we should definitely have a std::hash_append for.  And more.

hash_append is the API that everyday programmers will use to build their own hash_append.  For example:

class Customer
{
    std::string firstName_;
    std::string lastName_;
    int         age_;
public:
    // ...
    template <class HashAlgorithm>
    friend
    void
    hash_append(HashAlgorithm& h, const Customer& c)
    {
         using std::hash_append;
         hash_append(h, c.firstName_, c.lastName_, c.age_);
    }
};

And this is further composable:

class Sale
{
    Customer customer_;
    Product  product_;
public:
    // ...
    template <class HashAlgorithm>
    friend
    void
    hash_append(HashAlgorithm& h, const Sale& s)
    {
         using std::hash_append;
         hash_append(h, s.customer_, s.product_);
    }
};

As soon as we start saying something like:  "hash_append only exists for a small set of types", then we have completely changed the entire design, and completely missed the point of it.

hash_append is not about programmers stuffing bytes into a hash algorithm (that's the std::lib's job).  hash_append is about building a system whereby programmers can write their hashing support just once, for all hashing algorithms, and with absolutely no need for a hash_combine step.

hash_append should be as ubiquitous as swap, and operator==.
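
(A usage sketch of that payoff, assuming the uhash adaptor discussed earlier in this thread, a default hashing algorithm for uhash<>, and the Customer type above with an operator==:)

#include <unordered_set>

// The single hash_append friend written for Customer above serves every
// algorithm; switching algorithms is a one-token change.
std::unordered_set<Customer, std::uhash<>>       by_default_algorithm;
std::unordered_set<Customer, std::uhash<sha256>> by_sha256;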

Howard



Author: Miro Knejp <miro.knejp@gmail.com>
Date: Thu, 28 Aug 2014 22:42:44 +0200
Raw View
On 28.08.2014 at 16:25, Jens Maurer wrote:
> On 08/28/2014 02:51 AM, Miro Knejp wrote:
>> Maybe this whole issue simply shows that algorithms with strict size
>> requirements should not be defined in terms of char, short, int but
>> int8_t, int16_t, and so on. If hash_append has (by default) only
>> overloads for exactly sized types then the compiler should pick the
>> correct one when the user feeds it unsized types like int. On machines
>> where no int8_t exists no hash_append overload for int8_t exists.
> If we go that route, we should use "uint8_t" etc, not the signed variants,
> which have more platform freedom.
Possibly.
>
> And, for a hash algorithm that operates on octets, what does it mean
> to hash a uint16_t value, call it x?  I think the definition should be
>
>     hash the two octets obtained by    x & 0xff  and   (x>>8) & 0xff,
>      in sequence
>
> but that should be stated explicitly, and also how the user can get
> that result if he chooses to directly call the hash algorithm.
I'd say it's the job of hash_append() provided by the implementation for
the native builtin types to take care of this. If the Hasher has endian
requirements, x gets swapped as necessary and its contiguous (value
representing) *octets* fed to the Hasher in order of least to most
significant (or reversed). Every introductory course to programming I
have ever witnessed talks at length about bytes and how they make up
ints and what endianness is and so forth. I consider this basic
knowledge. The standardese has to describe this in detail of course (as
far as it can considering the C++ abstract machine), but I think for the
sake of this discussion what happens is clear. The real challenge I
think is figuring out how to define it portably for machines not
operating on multiples of 8 bit (if this is deemed relevant).
>> Where the same source code is to produce equal hashes for the same data
>> structures on different machines then the only portable way is to use
>> the explicitly sized types. That is honestly the *only* way to be sure
>> of consistent hash values and should maybe be added as a note somewhere.
> I don't think so.  If you strictly think about portably serializing
> values to octets, the types on the platforms (other than the
> integer/floating-point distinction) becomes meaningless, and you should
> only think about how to represent values in a portable manner, using
> (for example) a value-dependent variable-length integer representation.
>
> (I'm not saying this is the only choice, but we shouldn't claim
> hash algorithms such as sha256 if the results aren't really portable.)
Then I wonder who you want to put the burden on: the implementation or
the user? I do think portability is possible (certainly for machines
with 8*n bits) by only providing hash_append for selected native builtin
(maybe only unsigned) types which are either equivalent on all
architectures supporting them, or don't exist at all. At least if you
are on an architecture without these typedefs the compiler will kindly
remind you that your data structure is incompatible, instead of
producing funny hashes.
>> Though I think "char" (not "signed char" or "unsigned char" as those are
>> 3 different types) should be treated implementation-defined as the
>> standard uses this type only for character values in strings.
> This is not exactly true.  Standard filestreams use these for (possibly)
> binary data, too.  (Unfortunately, in my opinion.)
Well, sadly we have had the sized typedefs officially only since C++11...
>> hash_append(Hasher&, array_view<T>) // enable if hash_append(Hasher, T)
>> is well-formed
>> hash_append(Hasher&, basic_string_view<Char, Traits>) // enable if
>> hash_append(Hasher, Char) is well-formed
> Yes, this is basic decomposition.  We should have something like that.
> (I'd be happy to omit the enable_if dance; you'll simply get an error
> if T doesn't have a hash_append(), which is good.)
Whether it's an enable_if dance or predefined overloads doesn't really
matter as long as it does the job. This was more of an exposition.
> Question: Do you hash the length of the string or array separately, or
> just the elements?  In the latter case, hashing two empty strings
> is indistinguishable from hashing three empty strings.  This seems
> undesirable for crypto-hashes.
This is an issue that was discussed previously and I don't know if there
was a consensus. It's nested containers where it becomes headache-inducing.
>> Having is_contiguously_hashable<T, Hasher> predefined as true for the
>> types [u]int[8|16|32|64]_t, char, wchar_t and char[16|32]_t allows the
>> implementation to select which integral types are acceptable and can
>> then internally provide specializations with tag dispatching. Using
>> unsized types like short or int would pick the proper overload depending
>> on what int##_t typedefs alias to.
> So would  std::hash_append()  overloads that each take one of the
> scalar types mentioned above.  (The standard library can avoid
> duplicate overloads for the uintX_t typedefs conflicting with char etc.)
is_contiguously_hashable for the scalar types was just a tool for the
enable_if exposition (which may or may not be used). The implementation
should know how to turn these types into octet streams.
> I don't see why we need an   is_contiguously_hashable<T, Hasher>   trait.
As far as I understand it's primarily for optimization and boilerplate
reduction. If all bytes of my struct contribute to its value
representation without any padding, and endianness doesn't apply, I can
shove the entire struct (or array thereof) in the hungry maw of (void*,
size_t), and as has been mentioned earlier there are CPUs with hash/CRC
instructions so there is optimization potential present. This is usually
true for scalar types. For compound types it is upon the author of the
type to decide if this is true or not. This is obviously not a good
solution if the hash has to be communicated to the outside world where
endianness may matter. The introduction of hash_append_contiguous() may
make this trait obsolete. There's only the question left whether
defining a class partial specialization or a forwarding function
overload is less effort and less prone to error by the user. But I think
the trait approach makes it easier to apply the information
transitively. Plus you get a query for static_assert to check whether
your assumptions about the types of member variables hold:

// All three types have no padding, only value bytes, endianness doesn't matter.
// Ensured with type_traits and whatnot.
struct A { ... };
struct B { ... };
struct C
{
     A a;
     B b;
     int i;
};

// Overload solution 1
void hash_append(Hasher& h, const A& x) { ... }
void hash_append(Hasher& h, const B& x) { ... }
void hash_append(Hasher& h, const C& x)
{
     // Compiler may not figure out the entire struct can go in one big gulp,
     // making the optimization unlikely.
     hash_append(h, x.a);
     hash_append(h, x.b);
     hash_append(h, x.i);
}

// Overload solution 2
void hash_append(Hasher& h, const C& x)
{
     // There is no way to ensure this is actually correct
     // because we cannot check it for A and B
     hash_append_contiguous(h, x);
}

// Traits solution
template <> struct is_contiguously_hashable<Hasher, A> : true_type { };
template <> struct is_contiguously_hashable<Hasher, B> : true_type { };
template <> struct is_contiguously_hashable<Hasher, C> : true_type
{
     // Sprinkle code with magic static_assert dust to ensure
     // our assumptions about A and B are true.
     // (especially if their traits are defined in some other place)
};

>
>> A further alternative would be to provide a third parameter in the form
>> hash_append(Hasher&, const T&, valid_bits<N>), so even on architectures
>> without direct int8_t support one could use ints for storage and only
>> mask the leading N bits as relevant for the hash.
> For scalar T, this should be the job of the standard library implementation.
> I don't see a need for valid_bits<N> with a generic "T".  That said, I'm
> not opposed to adding
>
>    hash_append(Hasher&, const std::bitset<N>&);
>
> where N is divisible by 8.  This allows to pass an octet without
> concerns about C++ built-in types at all.
Good point about bitset. About my motivation: consider you have to use
uint32_t for your data type because the machine has no uint8_t but only
the lower 8 bit of the integer are to be hashed (the integer is
abstracted in a way as to behave like an 8 bit value), the upper 24 bits
must be discarded (not masked) in order to produce identical signatures
as on some remote machine that has direct 8 bit support. This
information cannot be provided using only the builtin scalar types.
bitset is a nice solution to this.
>
>> This should make any need for a hash_append(void*, size_t) overload
>> obsolete. The Hasher itself needs a (void* p, size_t n) overload where n
>> denotes the number of valid OCTETS pointed to by p.
> Then it's hard to call that function in portable code, because
> sizeof(int) = 1 on some platforms where "int" is 32 bits (and "char", too).
I didn't specify n in terms of sizeof() but *octets*. The hasher has to
know how to extract the octets from the pointed to contiguous octet
stream. Forget sizeof(char), this is very carefully defined as the
number of octets pointed to by the pointer. If sizeof(int) ==
sizeof(char) == 1, int has 32 bits, and all bits shall be hashed, then
it still holds that n == 4.
>>    hash_append() has
>> then already taken care of endianess and other details by applying
>> Hasher's traits. The nice hing here is that if a type has no padding and
>> the user *decided that endianess, etc. does not matter* then enabling
>> is_contiguously_hashable makes hash_append() feed the entire structure
>> to (void*, size_t).
> I believe it's a user-level policy decision for his struct type T to
> determine whether it can be hashed contiguously, possibly depending
> on the hasher (think two hash tables keyed off on different things,
> both pointing to T objects).  And that policy decision is best left
> to the user's overload of  hash_append(T).
The "user-level policy" you refer to is presented by Howard as=20
is_contiguously_hashable. But regardless of any traits one has always=20
the liberty of overloading hash_append() for cases like this or call=20
hash_append for the relevant members manually for a case like you=20
describe. I see is_contiguously_hashable primarily as a hint for=20
optimization (and reduction of boilerplate , and compile time checking=20
with static_assert).
>
> Jens
>


On 28.08.2014 at 20:00, Howard Hinnant wrote:
> On Aug 27, 2014, at 8:51 PM, Miro Knejp <miro.knejp@gmail.com> wrote:
>
>> Regarding the void* question, I think Howard's is_contiguously_hashable covers this nicely. If your type has no padding in it, specialize the trait. If it does, provide your own hash_append overload and feed it the required members. I think a good set of predefined overloads for hash_append would be
>>
>> hash_append(Hasher&, const T&) // enable if is_contiguously_hashable<T, Hasher> is true
>> hash_append(Hasher&, array_view<T>) // enable if hash_append(Hasher, T) is well-formed
>> hash_append(Hasher&, basic_string_view<Char, Traits>) // enable if hash_append(Hasher, Char) is well-formed
>>
>> Having is_contiguously_hashable<T, Hasher> predefined as true for the types [u]int[8|16|32|64]_t, char, wchar_t and char[16|32]_t allows the implementation to select which integral types are acceptable and can then internally provide specializations with tag dispatching. Using unsized types like short or int would pick the proper overload depending on what int##_t typedefs alias to.
>>
>> A further alternative would be to provide a third parameter in the form hash_append(Hasher&, const T&, valid_bits<N>), so even on architectures without direct int8_t support one could use ints for storage and only mask the leading N bits as relevant for the hash.
>>
>> This should make any need for a hash_append(void*, size_t) overload obsolete. The Hasher itself needs a (void* p, size_t n) overload where n denotes the number of valid OCTETS pointed to by p. hash_append() has then already taken care of endianness and other details by applying Hasher's traits. The nice thing here is that if a type has no padding and the user *decided that endianness, etc. does not matter* then enabling is_contiguously_hashable makes hash_append() feed the entire structure to (void*, size_t). Appropriate warning signs should be positioned around is_contiguously_hashable to make the user aware of its positive and negative implications.
> The implementation should provide:
>
>     hash_append(Hasher&, const T&)
>
> for all arithmetic T, T*, and probably enums and nullptr_t as well.  Additionally the spec should provide hash_append for containers, pair, tuple, etc.  Anything that we today have a std::hash<T> for, we should definitely have a std::hash_append for.  And more.
>
> (...)
>
> As soon as we start saying something like:  "hash_append only exists for a small set of types", then we have completely changed the entire design, and completely missed the point of it.
>
> hash_append is not about programmers stuffing bytes into a hash algorithm (that's the std::lib's job).  hash_append is about building a system whereby programmers can write their hashing support just once, for all hashing algorithms, and with absolutely no need for a hash_combine step.
I agree, and I was merely talking about the atomic building blocks (the
std::lib job of stuffing bytes) which depend on the implementation/machine,
Hasher endianness and whatnot, and on whether the arithmetic Ts (and
underlying values for enums) are exposed in the interface using explicitly
sized integer types. There should of course be further overloads for the
compound std types, containers, and stuff.
> hash_append should be as ubiquitous as swap, and operator==.
>
> Howard
>
Miro



Author: Jens Maurer <Jens.Maurer@gmx.net>
Date: Fri, 29 Aug 2014 00:11:26 +0200
Raw View
On 08/28/2014 07:41 PM, Howard Hinnant wrote:
> I suspect you intended this as a private message, but it came through public.

It was intended as a public message.  Let's try something different:

Hi everybody!

>> I'll have to ask a few more questions here.  If something gets
>> standardized in this area, I'd like to see a roadmap how all
>> platforms supported by C++ can get portable (crypto) hash values,
>> even if not maximally efficient on "strange" environments.
>>
>> I'm assuming that (in an abstract sense) a crypto hash algorithm
>> (and most others) hash a sequence of octets (i.e. 8-bit quantities).
>>
>> (SHA256 and others can do odd trailing bits, but let's ignore
>> this for now.)
>>
>> I presume I'm passing these octets to the hash algorithm using an
>> array of (unsigned) char, right?
>=20
> As currently coded, there are hash_append overloads for both unsigned char, C-arrays of unsigned char, std::array<unsigned char, N>, etc.  There are also hash_append overloads for all other arithmetic types, and std-defined containers.  For each type, the hash_append function is responsible for deciding how that type should present itself to a generic hashing algorithm.

I agree that the standard library should provide hash_append overloads for
all scalar types and standard containers, including C-style arrays.
(Btw, does hash_append on a container also hash the size, or just the
contents, i.e. the sequence of elements?)

That's not what I'm concerned about here.  I'm concerned about writing a
crypto hash sum that plays nicely with the framework, and is maximally
portable both in its implementation and its result value.

> For example an unsigned char would just say: Consume this byte!

A byte is not (necessarily) an octet.  I'm concerned about this
particular gap.

>> Is this assumption also true for platforms where (unsigned) char
>> is e.g. 32-bits (DSPs)?  If so, the following implementation doesn't
>> work there, because it makes no effort to split e.g. T==int (suppose
>> it's 32-bit) into four individual (unsigned) char objects.
>
> The hash_append infrastructure makes no assumptions on the size of a byte.  However concrete hashing algorithms (such as SHA256) most certainly will make such assumptions.

I want to write a portable implementation of a hash sum that also
works on a machine where 1 == sizeof(char) == sizeof(int) (= 32 bits).

If I've understood the interface correctly, both hashing an int x and
a char c will end up with a call to my hash algorithm h like this:

   h(&x, 1);
   h(&c, 1);

and the interface is type-erased (i.e. uses "void*" or "unsigned char*").

On a usual platform, the calls will end up like this (ignoring endianness
for now):

   h(&x, 4);
   h(&c, 1);

I don't think "h" will ever be able to produce the same hash sum on both
platforms, even if specifically tailored for the particular platform.
It seems too much information is lost on the   sizeof(char) =3D=3D sizeof(i=
nt)
platform.

One way to address this is to split the "int" into four octets and
assign a separate "unsigned char" for each octet in the hash_append
function on the DSP-style platform.  Then both calls end up as

   h(&x, 4);
   h(&c, 1);
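
(A sketch of that idea: on such a platform the library's own overload could emit four unsigned char objects, each carrying one octet of the 32-bit value; the little-endian octet order here is only for illustration:)

// Hypothetical std::lib overload for a platform where sizeof(int) == 1
// and int is 32 bits: split the value into four octets so the algorithm
// still sees h(&x, 4), as it would on an 8-bit-char platform.
template <class HashAlgorithm>
void
hash_append(HashAlgorithm& h, unsigned int x) noexcept
{
    unsigned char octets[4];
    for (int i = 0; i < 4; ++i)
        octets[i] = static_cast<unsigned char>((x >> (8 * i)) & 0xff);
    h(octets, 4);
}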


>> On 08/24/2014 07:39 PM, Howard Hinnant wrote:
>>> If we are dealing with a platform/HashAlgorithm disagreement in endian, then an alternative hash_append can be used for scalars:
>>
>> So, the endianness is a boolean, not a three-way type?  Either you're "native" or not
>> seems all that matters, from the code you presented.
>
> As currently coded, a hashing algorithm would set a member static const enum to one of three values:
>
>     static constexpr xstd::endian endian = xstd::endian::native;
>     static constexpr xstd::endian endian = xstd::endian::big;
>     static constexpr xstd::endian endian = xstd::endian::little;
>
> The meaning of these is to ask the hash_append overload for scalars influenced by endian (larger than char) to change the endian from native, to the requested endian, prior to sending the bytes into the hashing algorithm.  Concretely, in the order shown above:
>
> 1.  Map native endian to native endian (presumably this is always a no-op).
> 2.  Map native endian to big endian.  This will be a no-op on big endian machines.
> 3.  Map native endian to little endian.  This will be a no-op on little endian machines.

> Non-fingerprinting hash applications will probably always use the native mapping (i.e. they don't care about endian).

Yes.

It seems to me that these choices are, strictly speaking, not a property of the
(crypto) hash algorithm (that is only concerned with octets coming in), but with
the preferences / situation in which it is used.  As someone else pointed out,
we're essentially defining an ephemeral serialization format for purposes of
computing the hash.

I'd like to ask:

 - that (core) hash algorithm implementations such as SHA256 do not
specify the "endian" thing (it doesn't mean anything at this level) and

 - that there is a config option to do scalar endian conversions if so desired.

Example:

  std::uhash<std::sha256>        // unportable for scalars > char
  std::uhash<std::sha256, std::endian::big>   // convert scalars > char to "big endian" prior to feeding octets to hash algorithm

This also supports strange VAX endianness as the native endian
convention, I believe.

>> The other aspect is the fact that hash algorithms such as sha256 like
>> to process e.g. 32-bits (= 4 octets) at once.  When reading four
>> octets from memory, it's helpful to be able to simply read them
>> into a register on "suitable" platforms and only do the endianness
>> (or other) conversion on the remainder of the platforms.  But, on
>> the abstract level, this is not a configuration option, it's a
>> question of correctness.

> Agreed.  I see the second aspect above as an implementation detail of the hashing algorithm.  This detail does not impact the hashing algorithm's interface, except as to impact its results.  I see no motivation to "leak" this implementation detail into a standard specification, unless the standard is to specify concrete hashing algorithms.  In that event we could choose any number of options, of which I really have little opinion.  For example:  Sha256_output_little_endian as one hashing algorithm and Sha256_output_big_endian as another.  Or perhaps Sha256<endian> is another solution.

Well, there is a lost optimization opportunity if you're hashing an
array of unsigned int (32 bits) on a platform and with an endianness
choice that is just "right".  Otherwise, you get two endianness conversions
back-to-back: One for the scalar > char thing from above, and one when
sha256 tries to form its internal 32-bit chunks.

>> The simple name must result in the portable hash value.

Let me retract that; see details above.

> My understanding from http://en.wikipedia.org/wiki/Sha256 is that SHA256 output endian is always big.

The output is a sequence of octets.  It might be that this sequence is interpreted in
big endian style for (some) display purposes, but that's a minor detail.

Jens



Author: Howard Hinnant <howard.hinnant@gmail.com>
Date: Thu, 28 Aug 2014 22:58:25 -0400
Raw View
On Aug 28, 2014, at 6:11 PM, Jens Maurer <Jens.Maurer@gmx.net> wrote:

> On 08/28/2014 07:41 PM, Howard Hinnant wrote:
>> I suspect you intended this as a private message, but it came through public.
>
> It was intended as a public message.  Let's try something different:
>
> Hi everybody!

:-)

>>> I'll have to ask a few more questions here.  If something gets
>>> standardized in this area, I'd like to see a roadmap how all
>>> platforms supported by C++ can get portable (crypto) hash values,
>>> even if not maximally efficient on "strange" environments.
>>>=20
>>> I'm assuming that (in an abstract sense) a crypto hash algorithm
>>> (and most others) hash a sequence of octets (i.e. 8-bit quantities).
>>>=20
>>> (SHA256 and others can do odd trailing bits, but let's ignore
>>> this for now.)
>>>=20
>>> I presume I'm passing these octets to the hash algorithm using an
>>> array of (unsigned) char, right?
>>=20
>> As currently coded, there are hash_append overloads for both unsigned char, C-arrays of unsigned char, std::array<unsigned char, N>, etc.  There are also hash_append overloads for all other arithmetic types, and std-defined containers.  For each type, the hash_append function is responsible for deciding how that type should present itself to a generic hashing algorithm.
>
> I agree that the standard library should provide hash_append overloads for
> all scalar types and standard containers, including C-style arrays.
> (Btw, does hash_append on a container also hash the size, or just the
> contents, i.e. the sequence of elements?)

As currently coded, the run-time-sized containers append the size of the container.  This is to prevent hash collision between:

vector<vector<int>>{{}, {1}} and vector<vector<int>>{{1}, {}},

which if they did not include a size() in their message, would both result in the same message to the hash algorithm, and thus both generate the same hash code.

Append was chosen as opposed to prepend so that forward_list<T> would not have to traverse twice during hash_append.

Markers such as "begin container" and "end container" were also considered, but appending the size was considered to be simpler and more economical than the alternatives.
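
(Roughly, the decomposition for a run-time-sized container then looks like this sketch -- elements first, size appended last:)

#include <vector>

// Sketch: hash each element, then append the element count, so that
// {{}, {1}} and {{1}, {}} produce different messages.
template <class HashAlgorithm, class T, class Alloc>
void
hash_append(HashAlgorithm& h, std::vector<T, Alloc> const& v) noexcept
{
    for (auto const& e : v)
        hash_append(h, e);
    hash_append(h, v.size());
}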

> That's not what I'm concerned about here.  I'm concerned about writing a
> crypto hash sum that plays nicely with the framework, and is maximally
> portable both in its implementation and its result value.
>
>> For example an unsigned char would just say: Consume this byte!
>
> A byte is not (necessarily) an octet.  I'm concerned about this
> particular gap.

I think designing for a portable implementation of (for example) SHA256 for non-8-bit-byte platforms is an over-reaching goal.  Forgetting hash_append, and even C++, and designing whatever API you cared for, how would you write an implementation of SHA256 that was byte-size agnostic?  And how would that impact the API of SHA256?

Today, the SHA256 implementations I've seen (written in C) are simply #ifdef'd on endian, and are not portable to non-8-bit-byte platforms.  The hash_append proposal is not trying to impact this status-quo.  I consider all of these HashAlgorithm details to be worked out by the HashAlgorithm author.

>>> Is this assumption also true for platforms where (unsigned) char
>>> is e.g. 32-bits (DSPs)?  If so, the following implementation doesn't
>>> work there, because it makes no effort to split e.g. T==int (suppose
>>> it's 32-bit) into four individual (unsigned) char objects.
>>
>> The hash_append infrastructure makes no assumptions on the size of a byte.  However concrete hashing algorithms (such as SHA256) most certainly will make such assumptions.
>
> I want to write a portable implementation of a hash sum that also
> works on a machine where 1 == sizeof(char) == sizeof(int) (= 32 bits).

I anxiously await your prototype.

> If I've understood the interface correctly, both hashing an int x and
> a char c will end up with a call to my hash algorithm h like this:
>
>   h(&x, 1);
>   h(&c, 1);
>
> and the interface is type-erased (i.e. uses "void*" or "unsigned char*").

You are correct, assuming on this platform that all bits in both int and char participate in the type's representation.  OTOH, if char is padded with 24 bits of random values, the hash_append will be prohibited (by the specification) from sending those random bits to the hash algorithm.  hash_append might (for example) zero all the padding bits before sending the byte to the hashing algorithm.

> On a usual platform, the calls will end up like this (ignoring endianess
> for now):
>=20
>   h(&x, 4);
>   h(&c, 1);

Correct again.

> I don't think "h" will ever be able to produce the same hash sum on both
> platforms, even if specifically tailored for the particular platform.
> It seems too much information is lost on the   sizeof(char) =3D=3D sizeof=
(int)
> platform.

<shrug> My understanding is that hash algorithms consume bytes (and sometim=
es bits).  They don't care what the type is.  If we want to enforce that th=
e same byte representation for two different types sends different byte str=
eams to the hashing algorithm, hash_append is the right place to make that =
customization.

> One way to address this is to split the "int" into four octets and
> assign a separate "unsigned char" for each octet in the hash_append
> function on the DSP-style platform.  Then both calls end up as
>=20
>   h(&x, 4);
>   h(&c, 1);

The hash algorithm 'h' is going to see bytes, no matter what we do (void* o=
r unsigned char*).  Whether hashing algorithms change those bytes into octe=
ts (or words) or not is completely within their implementation.

If the committee proclaims that the "message" sent by a 32 bit int should b=
e different than the "message" sent by a 32 bit char should be different, s=
o be it.  The current hash_append proposal purposefully does not dictate su=
ch details.  My feeling is that dictating such details is bound to adversel=
y impact efficiency, but I am happy to see those details worked out in comm=
ittee.

>>> On 08/24/2014 07:39 PM, Howard Hinnant wrote:
>>>> If we are dealing with a platform/HashAlgorithm disagreement in endian, then an alternative hash_append can be used for scalars:
>>>
>>> So, the endianness is a boolean, not a three-way type?  Either you're "native" or not
>>> seems all that matters, from the code you presented.
>>
>> As currently coded, a hashing algorithm would set a member static const enum to one of three values:
>>
>>    static constexpr xstd::endian endian = xstd::endian::native;
>>    static constexpr xstd::endian endian = xstd::endian::big;
>>    static constexpr xstd::endian endian = xstd::endian::little;
>>
>> The meaning of these is to ask the hash_append overload for scalars influenced by endian (larger than char) to change the endian from native, to the requested endian, prior to sending the bytes into the hashing algorithm.  Concretely, in the order shown above:
>>
>> 1.  Map native endian to native endian (presumably this is always a no-op).
>> 2.  Map native endian to big endian.  This will be a no-op on big endian machines.
>> 3.  Map native endian to little endian.  This will be a no-op on little endian machines.
>
>> Non-fingerprinting hash applications will probably always use the native mapping (i.e. they don't care about endian).
>
> Yes.
>
> It seems to me that these choices are, strictly speaking, not a property of the
> (crypto) hash algorithm (that is only concerned with octets coming in),

but the algorithm will see bytes,

> but with
> the preferences / situation in which it is used.  As someone else pointed out,
> we're essentially defining an ephemeral serialization format for purposes of
> computing the hash.
>
> I'd like to ask:
>
> - that (core) hash algorithm implementations such as SHA256 do not
> specify the "endian" thing (it doesn't mean anything at this level) and

do you mean aspect 1 (input), or aspect 2 (output), or both?

>
> - that there is a config option to do scalar endian conversions if so desired.
>
> Example:
>
>  std::uhash<std::sha256>        // unportable for scalars > char
>  std::uhash<std::sha256, std::endian::big>   // convert scalars > char to "big endian" prior to feeding octets to hash algorithm

This part sounds exactly like what I'm proposing, except that the specification is applied to std::sha256 (and all hashing algorithms in general), instead of to std::uhash (and all hash functors in general).

The hash functor is the wrong place to specify endian requests because the hash functor should be as simple as possible.  It is only responsible for:

1.  Initializing the hash algorithm.
2.  Updating the hash algorithm with a single value.
3.  Finalizing the hash algorithm.

    template <class T>
    result_type
    operator()(T const& t) const noexcept
    {
        Hasher h;
        hash_append(h, t);
        return static_cast<result_type>(h);
    }

With as-simple-as-possible hash functor requirements, one maximizes the ability of the programmer to create a custom hash functor to do things such as seeding and/or salting.

On the other hand, hash_append is the perfect place to respond to such a request, as hash_append knows what type is to be hashed (which may or may not have endian concerns), and what type of hash algorithm it is dealing with.  If the hash algorithm specifies the desired endian input, this is very easily taken care of.  For example:

    template <class HashAlgorithm>
    void
    hash_append(HashAlgorithm& h, char c) noexcept
    {
        h(&c, 1);  // never any endian concerns
    }

On the other hand, for scalars > 1:

    template <class HashAlgorithm>
    void
    hash_append(HashAlgorithm& h, int i) noexcept
    {
        if (HashAlgorithm::endian != endian::native)
            i = convert_native_to(i, HashAlgorithm::endian);
        h(&i, sizeof(i));
    }

The above does not have to be the literal implementation.  I would prefer compile-time branching instead of run-time branching on the compile-time question of endian.  But I'm just trying to simplify the presentation.  And note that for platforms where sizeof(int) == sizeof(char) (and all bits are part of the representation), the std::lib implementation simplifies down to:

    template <class HashAlgorithm>
    void
    hash_append(HashAlgorithm& h, char c) noexcept
    {
        h(&c, 1);  // never any endian concerns
    }

    template <class HashAlgorithm>
    void
    hash_append(HashAlgorithm& h, int i) noexcept
    {
        h(&i, 1);  // never any endian concerns
    }

I.e. the implementation of the std::lib isn't portable.  But that's ok.  The std::lib implementors write non-portable code so that we don't have to.

If all bits *are not* part of the representation, then the std::lib implementation must mask off or zero the padding bits prior to sending a message to the hash algorithm.
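
(One way to get that compile-time branch in C++14 is plain tag dispatch on the algorithm's endian constant; a sketch, with reverse_bytes assumed to be the obvious byte-swapping helper and hash_append_scalar a hypothetical helper name:)

#include <type_traits>

// Sketch: dispatch at compile time on HashAlgorithm::endian; no run-time
// branch survives.  integral_constant plays the role of the tag.
template <class HashAlgorithm>
void
hash_append_scalar(HashAlgorithm& h, int i, std::true_type /*native*/) noexcept
{
    h(&i, sizeof(i));
}

template <class HashAlgorithm>
void
hash_append_scalar(HashAlgorithm& h, int i, std::false_type /*swap*/) noexcept
{
    i = reverse_bytes(i);
    h(&i, sizeof(i));
}

template <class HashAlgorithm>
void
hash_append(HashAlgorithm& h, int i) noexcept
{
    hash_append_scalar(h, i,
        std::integral_constant<bool,
            HashAlgorithm::endian == xstd::endian::native>{});
}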


>
> This also supports strange VAX endianness as the native endian
> convention, I believe.

Agreed.

>=20
>>> The other aspect is the fact that hash algorithms such as sha256 like
>>> to process e.g. 32-bits (=3D 4 octets) at once.  When reading four
>>> octets from memory, it's helpful to be able to simply read them
>>> into a register on "suitable" platforms and only do the endianess
>>> (or other) conversion on the remainder of the platforms.  But, on
>>> the abstract level, this is not a configuration option, it's a
>>> question of correctness.
>=20
>> Agreed.  I see the second aspect above is an implementation detail of th=
e hashing algorithm.  This detail does not impact the hashing algorithm's i=
nterface, except as to impact its results.  I see no motivation to "leak" t=
his implementation detail into a standard specification, unless the standar=
d is to specify concrete hashing algorithms.  In that event we could choose=
 any number of options, of which I really have little opinion.  For example=
:  Sha256_output_little_endian as one hashing algorithm and Sha256_output_b=
ig_endian as another.  Or perhaps Sha256<endian> is another solution. =20
>=20
> Well, there is a lost optimization opportunity if you're hashing an
> array of unsigned int (32 bits) on a platform and with an endianess
> choice that is just "right".  Otherwise, you get two endianess conversion=
s
> back-to-back: One for the scalar > char thing from above, and one when
> sha256 tries to form its internal 32-bit chunks.

This is precisely the optimization that my hash_proposal has gone to great =
lengths to preserve.  And also for std::string, std::vector<int>, std::arra=
y<int>, std::pair<int, int>, std::vector<std::pair<int, int>>, std::vector<=
std::tuple<int, int, int, int>>, etc, etc.

>>> The simple name must result in the portable hash value.
>
> Let me retract that; see details above.
>
>> My understanding from http://en.wikipedia.org/wiki/Sha256 is that SHA256 output endian is always big.
>
> The output is a sequence of octets.  It might be that this sequence is interpreted in
> big endian style for (some) display purposes, but that's a minor detail.

If by "display purposes" you also mean transmission across a network, I agree, though for me it is a major detail.

Howard



Author: Myriachan <myriachan@gmail.com>
Date: Thu, 28 Aug 2014 20:45:28 -0700 (PDT)
Raw View

On Wednesday, August 27, 2014 3:41:18 PM UTC-7, Jens Maurer wrote:
>
> Hi Howard!
>
> I'll have to ask a few more questions here.  If something gets
> standardized in this area, I'd like to see a roadmap how all
> platforms supported by C++ can get portable (crypto) hash values,
> even if not maximally efficient on "strange" environments.
>
> I'm assuming that (in an abstract sense) a crypto hash algorithm
> (and most others) hash a sequence of octets (i.e. 8-bit quantities).
>
> (SHA256 and others can do odd trailing bits, but let's ignore
> this for now.)
>
> I presume I'm passing these octets to the hash algorithm using an
> array of (unsigned) char, right?
>
> Is this assumption also true for platforms where (unsigned) char
> is e.g. 32-bits (DSPs)?  If so, the following implementation doesn't
> work there, because it makes no effort to split e.g. T==int (suppose
> it's 32-bit) into four individual (unsigned) char objects.
>
> (Oh, and on such platforms, 1 == sizeof(char) == sizeof(int).)
>

That's kind of the wrong way to look at SHA-256 and related
algorithms--they are actually bitwise algorithms, not bytewise.  For a
certain class of hash functions, naming off the top of my head RIPEMD-160,
MD5, SHA-0/1/224/256/384/512, your message is treated as some number of
input bits.  These input bits are chopped up into a block size of some
number of bits, typically 512 or 1024.  The internal hash state is updated
using a compression function with the previous value of the hash state and
the input block.  This repeats until there is no more input.

At the end, a single "1" bit is added to the input bits.  Then "0" bits are
added until the input size has reached (BLOCKSIZE - bitsizeof(SIZEFIELD))
mod BLOCKSIZE.  The SIZEFIELD is how big the length indicator is; this is
64 bits for the 512-bit blocks and 128 for the 1024-bit blocks.  This
length, which is counted in bits, again pointing out that these algorithms
work with bits and not bytes, is appended, which, given the modulo,
completes the final block.  The compression function is applied one last
time, and you have a result.

Most implementations simply ignore the fact that these are bitwise
algorithms.  They write an extra 0x80 byte--all these algorithms use
big-endian bit order, even the little-endian byte order algorithms, like
MD5--do the zero padding, then append the byte length multiplied by 8.
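
For illustration, the byte-oriented shortcut described above might look like
the following sketch for the 512-bit-block family (8-bit bytes assumed; the
function name is mine, not from any proposal):

    #include <cstdint>
    #include <vector>

    // Appends MD5/SHA-1/SHA-256 style padding to a byte-oriented message:
    // a 0x80 byte (the lone "1" bit in big-endian bit order), zero bytes
    // until the length is 56 mod 64, then the message length in bits as a
    // 64-bit big-endian integer, completing the final 512-bit block.
    std::vector<std::uint8_t> pad_512bit_block(std::vector<std::uint8_t> msg)
    {
        std::uint64_t bit_len = static_cast<std::uint64_t>(msg.size()) * 8;
        msg.push_back(0x80);
        while (msg.size() % 64 != 56)            // (BLOCKSIZE - SIZEFIELD) mod BLOCKSIZE
            msg.push_back(0x00);
        for (int shift = 56; shift >= 0; shift -= 8)
            msg.push_back(static_cast<std::uint8_t>(bit_len >> shift));
        return msg;
    }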


On Thursday, August 28, 2014 7:58:31 PM UTC-7, Howard Hinnant wrote:
>
>
> I think designing for a portable implementation of (for example) SHA256
> for non-8-bit-byte platforms is an over-reaching goal.  Forgetting
> hash_append, and even C++, and designing whatever API you cared for, how
> would you write an implementation of SHA256 that was byte-size agnostic?
> And how would that impact the API of SHA256?
>
> Today, the SHA256 implementations I've seen (written in C) are simply
> #ifdef'd on endian, and are not portable to non-8-bit-byte platforms.  The
> hash_append proposal is not trying to impact this status quo.  I consider
> all of these HashAlgorithm details to be worked out by the HashAlgorithm
> author.
>

Most are not portable to systems with INT_MAX > UINT32_MAX, either, as I
pointed out earlier...
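
One illustrative way to avoid that trap (my sketch, not code from any of the
implementations discussed) is to either use std::uint32_t throughout or mask
every word operation so the arithmetic behaves the same when unsigned int is
wider than 32 bits:

    #include <cstdint>

    // 32-bit word arithmetic that also works where the chosen unsigned type
    // is wider than 32 bits: mask after every operation (or simply use
    // std::uint32_t, which wraps at 2^32 by definition).
    inline std::uint_fast32_t add32(std::uint_fast32_t a, std::uint_fast32_t b)
    {
        return (a + b) & 0xFFFFFFFFu;
    }

    inline std::uint_fast32_t rotr32(std::uint_fast32_t x, unsigned n)  // n in 1..31
    {
        x &= 0xFFFFFFFFu;
        return ((x >> n) | (x << (32 - n))) & 0xFFFFFFFFu;
    }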

Melissa


.


Author: Jens Maurer <Jens.Maurer@gmx.net>
Date: Fri, 29 Aug 2014 23:48:59 +0200
Raw View
On 08/29/2014 04:58 AM, Howard Hinnant wrote:
> As currently coded, the run-time-sized containers append the size of the container.

Good.

> Append was chosen as opposed to prepend so that forward_list<T> would not have to traverse twice during hash_append.

Good point.

> I think designing for a portable implementation of (for example) SHA256 for non-8-bit-byte platforms is an over-reaching goal.  Forgetting hash_append, and even C++, and designing whatever API you cared for, how would you write an implementation of SHA256 that was byte-size agnostic?  And how would that impact the API of SHA256?

I'll take that as a challenge.

> Today, the SHA256 implementations I've seen (written in C) are simply #ifdef'd on endian, and are not portable to non-8-bit-byte platforms.

I agree with that.

>   OTOH, if char is
> padded with 24 bits of random values, the hash_append will be
> prohibited (by the specification) from sending those random bits to
> the hash algorithm.  hash_append might (for example) zero all the
> padding bits before sending the byte to the hashing algorithm.

This cannot happen for "char" because "char" has no padding.
See 3.9.1p1 "all bits of the object representation participate in the value
representation."

(It would be hard to reliably zero the padding, because random noise might
pop up again any time I read the "char".)

Padding can happen for "int" and other scalars.

> <shrug> My understanding is that hash algorithms consume bytes (and sometimes bits).  They don't care what the type is.

My understanding is that hash algorithms consume octets (and sometimes bits).
They don't care what the type is as long as they get values in the range
0-255 (one octet) at a time.  Anything else would be non-portable on
the input side, so I cannot expect portable output values.

>> One way to address this is to split the "int" into four octets and
>> assign a separate "unsigned char" for each octet in the hash_append
>> function on the DSP-style platform.  Then both calls end up as
>>
>>   h(&x, 4);
>>   h(&c, 1);
>
> The hash algorithm 'h' is going to see bytes, no matter what we do (void* or unsigned char*).  Whether hashing algorithms change those bytes into octets (or words) or not is completely within their implementation.

We could say that we pass "unsigned char *" and constrain each "unsigned char" to the minimum
value range that is portably representable in an "unsigned char", i.e. 0 .. 255.
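
As a sketch of what that could look like (the helper name and the hasher
interface h(ptr, n) are assumptions for illustration only), hash_append on
such a platform could decompose each scalar into octet-sized values before
handing them to the hasher:

    #include <climits>
    #include <cstddef>

    // Feed an unsigned integer to a hasher as octets (values 0..255),
    // least significant octet first, regardless of CHAR_BIT.
    // 'Hasher' is assumed to accept (unsigned char const*, std::size_t).
    template <class Hasher, class UInt>
    void append_as_octets(Hasher& h, UInt value)
    {
        constexpr std::size_t octets = (sizeof(UInt) * CHAR_BIT + 7) / 8;
        for (std::size_t i = 0; i < octets; ++i)
        {
            unsigned char octet = static_cast<unsigned char>(value & 0xFFu);
            h(&octet, 1);
            value >>= 8;
        }
    }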

> If the committee proclaims that the "message" sent by a 32-bit int should be different from the "message" sent by a 32-bit char, so be it.  The current hash_append proposal purposefully does not dictate such details.  My feeling is that dictating such details is bound to adversely impact efficiency, but I am happy to see those details worked out in committee.

See you there :-)

>> I'd like to ask:
>>
>> - that (core) hash algorithm implementations such as SHA256 do not
>> specify the "endian" thing (it doesn't mean anything at this level) and
>
> do you mean aspect 1 (input), or aspect 2 (output), or both?

Aspect 1 (input, e.g. mapping of source-level "short" to bytes).

>> - that there is a config option to do scalar endian conversions if so desired.
>>
>> Example:
>>
>>  std::uhash<std::sha256>        // unportable for scalars > char
>>  std::uhash<std::sha256, std::endian::big>   // convert scalars > char to "big endian" prior to feeding octets to hash algorithm
>
> This part sounds exactly like what I'm proposing, except that the specification is applied to std::sha256 (and all hashing algorithms in general), instead of to std::uhash (and all hash functors in general).

std::uhash<> is one specific hash functor that uses ADL on hash_append() to do its job.

> The hash functor is the wrong place to specify endian requests because the hash functor should be as simple as possible.  It is only responsible for:
>
> 1.  Initializing the hash algorithm.
> 2.  Updating the hash algorithm with a single value.
> 3.  Finalizing the hash algorithm.
>
>     template <class T>
>     result_type
>     operator()(T const& t) const noexcept
>     {
>         Hasher h;
>         hash_append(h, t);
>         return static_cast<result_type>(h);
>     }

My suggestion would lead to

    template <class T>
    result_type
    operator()(T const& t) const noexcept
    {
        ConvertToBigEndian<Hasher> h;
        hash_append(h, t);
        return static_cast<result_type>(h);
    }

in one of the three specializations (the other two are similar).
The hash functor requirements remain the same (and very simple).

> With as-simple-as-possible hash functor requirements, one maximizes the ability of the programmer to create a custom hash functor to do things such as seeding and/or salting.

Thinking about it, I see the endian normalization for aspect 1 (input)
as a user-level customization similar to seeding or salting.  This
should not be related to something like std::sha256.

> On the other hand, hash_append is the perfect place to respond to such a request, as hash_append knows what type is to be hashed (which may or may not have endian concerns), and what type of hash algorithm it is dealing with.  If the hash algorithm specifies the desired endian input, this is very easily taken care of.  For example:
>
>     template <class HashAlgorithm>
>     void
>     hash_append(HashAlgorithm& h, char c) noexcept
>     {
>         h(&c, 1);  // never any endian concerns
>     }

Agreed.

> On the other hand, for scalars > 1:
>
>     template <class HashAlgorithm>
>     void
>     hash_append(HashAlgorithm& h, int i) noexcept
>     {
>         if (HashAlgorithm::endian != endian::native)
>             i = convert_native_to(i, HashAlgorithm::endian);
>         h(&i, sizeof(i));
>     }

My suggestion is simpler: Let's have a more specialized overload:

    template <class HashAlgorithm>
    void
    hash_append(ConvertToBigEndian<HashAlgorithm>& h, int i) noexcept
    {
        i = convert_native_to_big_endian(i);
        h.base()(&i, sizeof(i));
    }

No fancy traits, no template metaprogramming.
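
For concreteness, a minimal sketch of what such a wrapper might look like
(my own illustration of the idea, not code from any proposal; it assumes the
wrapped HashAlgorithm exposes result_type, a byte-oriented operator(), and an
explicit conversion to result_type, as in the functor shown above):

    #include <cstddef>

    // Forwards the raw byte interface of the wrapped hasher; it exists only
    // so that overload resolution can pick endian-converting hash_append
    // overloads such as the one above for multi-byte scalars.
    template <class HashAlgorithm>
    class ConvertToBigEndian
    {
    public:
        using result_type = typename HashAlgorithm::result_type;

        void operator()(void const* key, std::size_t len) noexcept
        {
            h_(key, len);             // raw bytes pass straight through
        }

        HashAlgorithm& base() noexcept { return h_; }

        explicit operator result_type() noexcept
        {
            return static_cast<result_type>(h_);
        }

    private:
        HashAlgorithm h_;
    };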

> The above does not have to be the literal implementation.  I would prefer compile-time branching instead of run-time branching on the compile-time question of endian.  But I'm just trying to simplify the presentation.  And note that for platforms where sizeof(int) == sizeof(char) (and all bits are part of the representation), the std::lib implementation simplifies down to:

Sure.  (Not that either option would produce different code with today's optimizers.)

> I.e. the implementation of the std::lib isn't portable.  But that's ok.  The std::lib implementors write non-portable code so that we don't have to.

Fully agreed here.  However, I see this interface framework as something that
could have been user-written, so I'd like to see how it breaks for users
trying to do the same thing.

>> Well, there is a lost optimization opportunity if you're hashing an
>> array of unsigned int (32 bits) on a platform and with an endianness
>> choice that is just "right".  Otherwise, you get two endianness conversions
>> back-to-back: One for the scalar > char thing from above, and one when
>> sha256 tries to form its internal 32-bit chunks.
>
> This is precisely the optimization that my hash_append proposal has gone to great lengths to preserve.

How so?  Suppose we want to hash std::vector<int> and we're on a little endian machine.
Further, suppose the "portable hash value" convention, as defined by some user
community, says you should hash an "int" value as a big-endian sequence of bytes.
So, for "aspect 1" (endian conversion while forming bytes), you'll do a byte reversal
before passing the bytes through the "void *" interface border.

Then, according to http://tools.ietf.org/html/rfc6234#page-47 (bottom part),
we'll have an endian conversion for forming 32-bit "words" from input bytes for
SHA256 again.  This is "aspect 2", and beyond the "void *" interface border.

I can't see how to avoid the useless double endian conversion for this specific
case with your design.
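
To make the redundancy concrete, a sketch of the two steps (illustrative code
only, not taken from either proposal):

    #include <cstdint>

    // Aspect 1: hash_append byte-swaps the int and hands big-endian bytes
    // to the hasher through the byte-oriented interface border.
    inline void to_big_endian_bytes(std::uint32_t x, unsigned char out[4])
    {
        out[0] = static_cast<unsigned char>(x >> 24);
        out[1] = static_cast<unsigned char>(x >> 16);
        out[2] = static_cast<unsigned char>(x >> 8);
        out[3] = static_cast<unsigned char>(x);
    }

    // Aspect 2: inside SHA-256 (RFC 6234 style), the same four bytes are
    // reassembled into a 32-bit word.  The word equals the original int
    // value, so both conversions could have been skipped if the hasher
    // were able to consume native 32-bit words directly.
    inline std::uint32_t to_word(unsigned char const in[4])
    {
        return (std::uint32_t(in[0]) << 24) | (std::uint32_t(in[1]) << 16)
             | (std::uint32_t(in[2]) << 8)  |  std::uint32_t(in[3]);
    }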

>>> My understanding from http://en.wikipedia.org/wiki/Sha256 is that SHA256 output endian is always big.
>>
>> The output is a sequence of octets.  It might be that this sequence is interpreted in
>> big endian style for (some) display purposes, but that's a minor detail.
>
> If by "display purposes" you also mean transmission across a network, I agree, though for me it is a major detail.

Fine.  As long as there is a well-defined sequence of octets coming
out of it, I'm happy to call it duck-endian if need be.

Jens


.


Author: Markus Mayer <lotharlutz@gmx.de>
Date: Sun, 31 Aug 2014 19:15:21 +0200
Raw View
Thanks for your valuable feedback.

I've checked the proposal for its feasibility for different byte orders
and for odd-sized architectures. Here are my findings and fixes:
- The result type is an array of std::uint_least8_t. Each cell contains a
value between 0 and 255 (including 0 and 255). This can be represented
by every architecture. Byte order is no problem because it is a sequence
of single bytes.

- process_bytes takes an address and a count. It hashes 'count' bytes
(which have CHAR_BIT bits) starting at the given address in the order
they are represented in memory. Every bit of this range contributes to
the result. No byte-order conversion is performed. It is not the job of
the hash function; it must be handled by an upper layer.

With the current proposal you are unable to hash a range that is not a
multiple of a byte. It is possible to add a process_bits(unsigned char
byte, std::size_t bit_count), but I think it is not that useful. We can
add it once we have experience with how it is actually used.


Regarding the 'const void* buffer vs. const unsigned char* buffer' question:
I think 'const unsigned char*' is clearer, because it requires an
explicit cast. This is like a warning sign: "Warning: you have to care
about byte order, padding and other stuff by yourself!".

The 'const void*' solution is more convenient when you want to hash a
struct or a single variable.

I came to the conclusion that it depends on how common the 'interpret
this as a byte range' case is. I tried to come up with some actual
numbers, but my google-fu isn't strong enough.

For my usage, in 80% of cases I do not have an unsigned char*. But what about your
experience?


Other changes:
- Add a constexpr block_size. It specifies the smallest number of bytes
(having CHAR_BIT bits) the buffer length has to be to achieve optimal
performance. It is implementation-defined, because it is a property of
the specific implementation.


Open questions:
- process_bytes vs. operator(), or keep both
- const void* buffer vs. const unsigned char* buffer

class hash_function
{
public:
    typedef std::array<std::uint_least8_t, ALGORITHM_DEFINED> result_type;
    static constexpr std::size_t block_size = IMPLEMENTATION_DEFINED;
    //Default constructible
    //Copyable
    //Movable

    hash_function& process_bytes( const void *buffer, std::size_t byte_count);

    hash_function& operator() ( const void *buffer, std::size_t byte_count);

    void reset();

    result_type hash_value() const;

};

template<class InputIt, class Hasher>
void process_blockwise(InputIt first, InputIt last, Hasher& hash_fn);

//Maybe add process_blockwise_n
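
A short usage sketch of this interface as I understand it, hashing a file
through an external buffer fed block-wise (the name sha256 here stands in for
whatever concrete type models the hash_function interface above; it is not
part of the proposal text):

    #include <cstddef>
    #include <fstream>
    #include <vector>

    sha256::result_type hash_file(const char* path)
    {
        sha256 h;
        std::ifstream in(path, std::ios::binary);
        std::vector<char> buf(4096);   // ideally a multiple of sha256::block_size
        while (in.read(buf.data(), buf.size()) || in.gcount() > 0)
            h.process_bytes(buf.data(), static_cast<std::size_t>(in.gcount()));
        return h.hash_value();
    }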

regards
    Markus


.


Author: Jens Maurer <Jens.Maurer@gmx.net>
Date: Wed, 10 Sep 2014 21:32:01 +0200
Raw View
On 08/31/2014 07:15 PM, Markus Mayer wrote:
> Regarding the 'const void* buffer vs. const unsigned char* buffer' question:
> I think 'const unsigned char*' is clearer, because it requires an
> explicit cast. This is like a warning sign: "Warning: you have to care
> about byte order, padding and other stuff by yourself!".
>
> The 'const void*' solution is more convenient when you want to hash a
> struct or a single variable.
>
> I came to the conclusion that it depends on how common the 'interpret
> this as a byte range' case is. I tried to come up with some actual
> numbers, but my google-fu isn't strong enough.
>
> For my usage, in 80% of cases I do not have an unsigned char*. But what about your
> experience?

My experience is that type-safety is one of the strongest
assets of C++.  Also, I dislike an interface that makes it
easy to fall into a portability trap without warning.
(Yes, it's also bad that C++ allows narrowing conversions
on integers (which is a portability issue), but fortunately,
some compilers warn about these nowadays.)

Jens


.


Author: Zhihao Yuan <zy@miator.net>
Date: Wed, 10 Sep 2014 16:40:55 -0400
Raw View

On Wed, Sep 10, 2014 at 3:32 PM, Jens Maurer <Jens.Maurer@gmx.net> wrote:

>
> > For my usage in 80% I do not have a unsigned char*. But what about your
> > experience?
>
> My experience is that type-safety is one of the strongest
> assets of C++.  Also, I dislike an interface that makes it
> easy to fall into a portability trap without warning.
> (Yes, it's also bad that C++ allows narrowing conversions
> on integers (which is a portability issue), but fortunately,
> some compilers warn about these nowadays.)
>

Maybe we should standardize `storageof` as an `addressof`
returning `unsigned char*` for anything contiguously hashable.
The term `storage` comes from `aligned_storage`, which
gives you an `unsigned char[]`.
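
A rough sketch of what such a helper could look like (my reading of the
suggestion; the name and the trait used to gate it are purely illustrative,
and "contiguously hashable" would need a real definition):

    #include <type_traits>

    // Hypothetical helper: expose an object's storage as unsigned char*.
    // Trivially copyable types stand in here for "contiguously hashable".
    template <class T>
    unsigned char const* storageof(T const& t) noexcept
    {
        static_assert(std::is_trivially_copyable<T>::value,
                      "storageof requires a contiguously hashable type");
        return reinterpret_cast<unsigned char const*>(&t);
    }

The call site would then read, e.g., h.process_bytes(storageof(x), sizeof(x)),
with no raw cast written by the user.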

--
Zhihao Yuan, ID lichray
The best way to predict the future is to invent it.
___________________________________________________
4BSD -- http://bit.ly/blog4bsd


.