Thread

Topic: rand and random_shuffle

Author: kanze@gabi-soft.de (James Kanze)
Date: Sat, 4 Jan 2003 00:25:02 +0000 (UTC) Raw View

allan_w@my-dejanews.com (Allan W) wrote in message
news:<7f2735a5.0212311040.3a95e05b@posting.google.com>...
> kanze@gabi-soft.de (James Kanze) wrote
> > james@rethguals.fsnet.co.uk ("James Slaughter") wrote
> > > I was recently perusing the C99 standard when I spotted the
> > > following remark, under section 7.20.2.1 (para 3):

> > > "The implementation shall behave as if no library function calls
> > > the rand function."

> > > What does this mean for C++ library implementations?

> > That the implementation shall behave as if no library function calls
> > the rand function.

> I can make a guess as to the motivation. For a given seed value passed
> to srand(), the rand() function always produces the same sequence of
> values. A library call to rand() would consume one of those values.

Quite.

It sounds funny, but in many cases, it is important for rand() to be
predictable.  If I give it the same seed, I should recover the same
series of values.  (This is important for debugging, if for no other
reason.)

> > > I find no reference to, or exemption from, this restriction in the
> > > C++ standard, but do notice that several very good implementations
> > > of 'random_shuffle' et al do call the C function.

> > I don't know whether it is intentional or not, but if they do so
> > without somehow saving the C function's state and resuming it after,
> > they are not conform to the standard as it is written.

> And if they do save the state and resume after... well, that wouldn't
> be very random behavior, right?

The standard sets no requirements for the quality of the randomness.
Neither for rand() nor for random_shuffle().

> Imagine calling random_shuffle() three times without any intervening
> calls to rand(). The shuffle would be identical for all three calls.

> > as std::numeric_limits<size_t> -- on my system, RAND_MAX == 32767,
> > whereas std::numeric_limits<size_t>::max() == 18446744073709551615;
> > the difference is far from negligible (and it really isn't uncommon
> > to have containers with more than 32767 elements).  For that matter,
> > whatever the value of RAND_MAX, the type of the return value cannot
> > exceed 2147483647, which is still a good deal less than the
> > theoretical maximum number of elements in a container.

> [ 18446744073709551615 is 2<<64-1, and 32767 is 2<<15-1. ]

> I haven't yet used any 64-bit systems. However, I suspect that
> RAND_MAX would commonly be either 2<<64-1 or 2<<63-1 [ =
> 9223372036854775807 ].

Well, it can't be the former, because it has to fit in an int:-).

For various historical reasons, all of the systems I can recall have
RAND_MAX == 32767.  Don't ask me why: that was the value of RAND_MAX in
early Unix implementations, but I can't for the life of me imagine how
increasing it could break code.

Note that RAND_MAX is not necessarily the period.  (I'm too lazy to fire
up adb and find out for sure, but I seem to recall having run some tests
which suggested that Sun's rand() had a period of a little less than
2^31.  But in both 32 and 64 bit mode, RAND_MAX is 32767.)

> > There's also a problem in the standard, in that the parameter of the
> > user defined function is a ptrdiff_t, and not a size_t.  In
> > practice, this shouldn't cause a problem very often, however.

> And I suspect that on your system,
> std::numeric_limits<ptrdiff_t>::max() == 9223372036854775807

Correct.

The system isn't exactly exotic: a very ordinary Sparc under Solaris.
(Sparcs have been 64 bits for at least five years now.)

> > Finally, of course, if I define my own allocator, with a
> > size_type/difference_type which is a user defined type, rand is also
> > no good.

> The standard isn't clear on this. It requires that rand(n) returns "a
> randomly chosen value, which lies in the interval [0,n), and which has
> a type that is convertible to
> iterator_traits<RandomAccessIterator>::difference_type". But it
> doesn't say anything about n.

It does say one thing: n has type int.  On my system, size_t is a 64 bit
type, and int's are 32 bits.  So even in the best of cases, rand() won't
be able to cover the range.  As it is, rand() never returns a value
greater than 32767.

What's the point of having 18446744073709551615 bytes of address space
if you can't shuffle arrays of more than 32767 elements:-)?

> It seems to me that conceptually, it would be possible to have
> random_shuffle use a random generator whose range is far more
> constrained than size_t. For instance, it could call a function that
> was functionally identical to rand(), save only that it has it's own
> key. Conceptually, this difference could be handled by calling rand()
> several times if the container has more than RAND_MAX elements, and
> somehow combining the results. I am aware that this would tend to skew
> the random results.

If rand() is any good, there are ways of avoiding this skew.

> I am also aware that random_shuffle would specifically have to be
> handled to recognize the fact that the container has more than
> RAND_MAX elements, which would be particularly difficult in the case
> of the three-argument version.

In the case of the three argument-version, who cares.  It's the users
problem.

> > There's obviously a problem here (defect report).

> Yes.

It occurs to me that there is perhaps another defect as well.  Does the
rule that a library function cannot call rand also apply if I pass rand
as the parameter to the three argument version of random_shuffle?
(IMHO, it certainly shouldn't.)

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orient   e objet/
                    Beratung in objektorientierter Datenverarbeitung

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: thomas.hyer@ubsw.com (Tom Hyer)
Date: Tue, 31 Dec 2002 19:22:04 +0000 (UTC) Raw View

kanze@gabi-soft.de (James Kanze) wrote in message news:<d6651fb6.0212300528.5506960e@posting.google.com>...
> james@rethguals.fsnet.co.uk ("James Slaughter") wrote in message
> news:<auffq7$10j$1@newsg4.svr.pol.co.uk>...
>
> > I was recently perusing the C99 standard when I spotted the following
> > remark, under section 7.20.2.1 (para 3):
>
> > "The implementation shall behave as if no library function calls the
> > rand function."
>
> > What does this mean for C++ library implementations?
>
> That the implementation shall behave as if no library function calls the
> rand function.
>

    Bear in mind that the rand function has some internal state, which
changes every time it is called.  So what this means is that if you
don't call rand explicitly, then its state won't change due to other
library calls you make.

-- Tom Hyer

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: kanze@gabi-soft.de (James Kanze)
Date: Mon, 30 Dec 2002 20:21:49 +0000 (UTC) Raw View

james@rethguals.fsnet.co.uk ("James Slaughter") wrote in message
news:<auffq7$10j$1@newsg4.svr.pol.co.uk>...

> I was recently perusing the C99 standard when I spotted the following
> remark, under section 7.20.2.1 (para 3):

> "The implementation shall behave as if no library function calls the
> rand function."

> What does this mean for C++ library implementations?

That the implementation shall behave as if no library function calls the
rand function.

> I find no reference to, or exemption from, this restriction in the C++
> standard, but do notice that several very good implementations of
> 'random_shuffle' et al do call the C function.

I don't know whether it is intentional or not, but if they do so without
somehow saving the C function's state and resuming it after, they are
not conform to the standard as it is written.

Have you reported this error to your vendor.  On most machines I've
worked on, RAND_MAX has been too small to make rand useful here anyway.

> Is this restriction new for C99 (in which case it doesn't apply to
> C++),

No.

> or intended to be deliberately ignored by C++ (does it only encompass
> the C library functions?).

If there had been actual intent, I'm sure words would have been added to
say so.  More likely the issue was just overlooked.

> Indeed, if the version of random_shuffle which does not accept a
> generator does not use rand, it's not immediately clear how it should
> be seeded, but there would appear to be a lack of specification in
> this area.

There is absolutely no specification concerning random_shuffle without a
user specified generator.  Generally, when a function exists in two
forms, like this, the standard specifies what function should be used in
the case where the user doesn't provide one.  In this case, of course,
rand cannot be used for several reasons -- a library function is not
allowed to call it, as you've mentionned, but also, the standard
requires a return type in the interval [0, n), where n can be as large
as std::numeric_limits<size_t> -- on my system, RAND_MAX == 32767,
whereas std::numeric_limits<size_t>::max() == 18446744073709551615; the
difference is far from negligible (and it really isn't uncommon to have
containers with more than 32767 elements).  For that matter, whatever
the value of RAND_MAX, the type of the return value cannot exceed
2147483647, which is still a good deal less than the theoretical maximum
number of elements in a container.

There's also a problem in the standard, in that the parameter of the
user defined function is a ptrdiff_t, and not a size_t.  In practice,
this shouldn't cause a problem very often, however.

Finally, of course, if I define my own allocator, with a
size_type/difference_type which is a user defined type, rand is also no
good.

There's obviously a problem here (defect report).  Since the standard
doesn't say anything, it's pretty much implemnentation dependant as to
what function is used by default, but it is clear that rand cannot be
used, even without the specific words banning its use.  And as you say,
there is the problem of seeding whatever is used.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orient   e objet/
                    Beratung in objektorientierter Datenverarbeitung

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: allan_w@my-dejanews.com (Allan W)
Date: Thu, 2 Jan 2003 06:37:21 +0000 (UTC) Raw View

kanze@gabi-soft.de (James Kanze) wrote
> james@rethguals.fsnet.co.uk ("James Slaughter") wrote
> > I was recently perusing the C99 standard when I spotted the following
> > remark, under section 7.20.2.1 (para 3):
>
> > "The implementation shall behave as if no library function calls the
> > rand function."
>
> > What does this mean for C++ library implementations?
>
> That the implementation shall behave as if no library function calls the
> rand function.

I can make a guess as to the motivation. For a given seed value passed
to srand(), the rand() function always produces the same sequence of
values. A library call to rand() would consume one of those values.

> > I find no reference to, or exemption from, this restriction in the C++
> > standard, but do notice that several very good implementations of
> > 'random_shuffle' et al do call the C function.
>
> I don't know whether it is intentional or not, but if they do so without
> somehow saving the C function's state and resuming it after, they are
> not conform to the standard as it is written.

And if they do save the state and resume after... well, that wouldn't be
very random behavior, right? Imagine calling random_shuffle() three times
without any intervening calls to rand(). The shuffle would be identical
for all three calls.

> as std::numeric_limits<size_t> -- on my system, RAND_MAX == 32767,
> whereas std::numeric_limits<size_t>::max() == 18446744073709551615; the
> difference is far from negligible (and it really isn't uncommon to have
> containers with more than 32767 elements).  For that matter, whatever
> the value of RAND_MAX, the type of the return value cannot exceed
> 2147483647, which is still a good deal less than the theoretical maximum
> number of elements in a container.

[ 18446744073709551615 is 2<<64-1, and 32767 is 2<<15-1. ]

I haven't yet used any 64-bit systems. However, I suspect that RAND_MAX
would commonly be either 2<<64-1 or 2<<63-1 [ = 9223372036854775807 ].

> There's also a problem in the standard, in that the parameter of the
> user defined function is a ptrdiff_t, and not a size_t.  In practice,
> this shouldn't cause a problem very often, however.

And I suspect that on your system,
std::numeric_limits<ptrdiff_t>::max() == 9223372036854775807

> Finally, of course, if I define my own allocator, with a
> size_type/difference_type which is a user defined type, rand is also no
> good.

The standard isn't clear on this. It requires that rand(n) returns "a
randomly chosen value, which lies in the interval [0,n), and which
has a type that is convertible to
iterator_traits<RandomAccessIterator>::difference_type". But it doesn't
say anything about n.

It seems to me that conceptually, it would be possible to have
random_shuffle use a random generator whose range is far more
constrained than size_t. For instance, it could call a function that
was functionally identical to rand(), save only that it has it's own
key. Conceptually, this difference could be handled by calling rand()
several times if the container has more than RAND_MAX elements, and
somehow combining the results. I am aware that this would tend to
skew the random results. I am also aware that random_shuffle would
specifically have to be handled to recognize the fact that the
container has more than RAND_MAX elements, which would be particularly
difficult in the case of the three-argument version.

> There's obviously a problem here (defect report).

Yes.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: james@rethguals.fsnet.co.uk ("James Slaughter")
Date: Thu, 26 Dec 2002 19:40:28 +0000 (UTC) Raw View

Dear All -

I was recently perusing the C99 standard when I spotted the following
remark, under section 7.20.2.1 (para 3):

"The implementation shall behave as if no library function calls the rand
function."

What does this mean for C++ library implementations? I find no reference to,
or exemption from, this restriction in the C++ standard, but do notice that
several very good implementations of 'random_shuffle' et al do call the C
function.

Is this restriction new for C99 (in which case it doesn't apply to C++), or
intended to be deliberately ignored by C++ (does it only encompass the C
library functions?). Indeed, if the version of random_shuffle which does not
accept a generator does not use rand, it's not immediately clear how it
should be seeded, but there would appear to be a lack of specification in
this area.

Please correct me :)

Regards,
James.


---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]