Thread

Topic: possible additions to 26.4 [random number generation]

Author: Walter E Brown <wb@fnal.gov>
Date: Tue, 13 Nov 2007 17:25:18 CST Raw View

On 2007-11-12 11:10 AM, jkherciueh@gmx.net wrote:
> I recently embarked on implementing all distributions from N2461, which
> turned out to be a fun exercise. In the process, I noted that the random
> library is well-thought out (in particular the separation of distributions
> from their parameter types is an interesting detail).
>
> Anyway, I found that a few issues could be addressed better. Given that the
> library is well-designed, I might very well be mistaken; and therefore I
> would like to get some feed-back on the following ideas. In particular, if
> any of the following have been discussed and rejected, I would be
> interested in the rationale (or a pointer thereto).
>
>
>
>
> a) Support for unpredictable seeds
> ==================================
>
> The default constructor for all engines creates an object with fully
> predictable state. In order to create a random number generator whose
> initial state is different, one needs to provide a seed. If I want to
> create a random sequence that varies between runs of the program, I will
> need a somewhat unpredictable seed. May I suggest to add a convenience
> function in the utilities section 26.4.7:
>
>   <implementation defined arithmetic type> unpredictable_seed ();
>
> This should return (through implementation-defined means) what the name
> suggests: a seed-value that is hard to guess (at least harder than the
> current time).

This is the sort of use for which the seed_seq utility was intended.  By
constructing a seed_seq object from, say, the current time and the
process id, one obtains an entity that makes a good seed for any engine.
  Even if many seeds are needed (e.g., for tasks executing
concurrently), even small differences in the time and/or the process id
will lead to large differences in the outputs obtained from seed_seq's
constructed from them, and hence also to large differences in the
initial states of the engines.

>
>
>
>
> b) Basic support for combinatorial distributions
> ================================================
>
> I understand the rationale from N1588 (section 5) that it would not be
> feasible to include distributions that return collections of values instead
> of values. However, I think it would be possible and useful to make it
> easier for users to build such distributions on top of the sampling
> distributions provided by the library. In particular, I think it would be
> useful to add a discrete distribution with modifiable weights to 26.4.8.5,
> something like this:
>
>   template < typename WeightType = unsigned long, typename IntType = int >
>   class dynamic_discrete_distribution {
>   public:
>
>     typedef WeightType  weight_type;
>     typedef IntType     result_type;
>
>     class param_type {
>     public:
>
>       typedef dynamic_discrete_distribution distribution_type;
>       typedef WeightType weight_type;
>       typedef IntType    result_type;
>
>       param_type ();
>
>       template < typename InputIterator >
>       param_type ( InputIterator firstW, InputIterator lastW );
>
>       vector< double > probabilities ( void ) const;
>
>       result_type min ( void ) const;
>       result_type max ( void ) const;
>
>       weight_type weight ( result_type index ) const;
>       void weight ( result_type index, weight_type new_weight );
>       weight_type total ( void ) const;
>
>     }; // param_type
>
>     dynamic_discrete_distribution ();
>
>     template < typename InputIterator >
>     dynamic_discrete_distribution
>        ( InputIterator firstW, InputIterator lastW );
>
>     explicit
>     dynamic_discrete_distribution ( param_type const & parm );
>
>     vector< double > probabilities ( void ) const;
>
>     param_type param ( void ) const;
>
>     void param ( param_type const & parm );
>
>     result_type min ( void ) const;
>     result_type max ( void ) const;
>
>     weight_type weight ( result_type index ) const;
>     void weight ( result_type index, weight_type new_weight );
>     weight_type total ( void ) const;
>
>     void reset ( void );
>
>     template < typename UniformRNG >
>     result_type operator() ( UniformRNG & urng, param_type const & parm );
>
>     template < typename UniformRNG >
>     result_type operator() ( UniformRNG & urng );
>
>   }; // dynamic_discrete_distribution
>
>   Note: the sum of weights can be 0, however in that state,
>   sampling has undefined behavior.
>
>
> Of course, sampling from a distribution with changing weights can be
> realized on top of discrete_distribution<>. However, there is (a) a
> performance penalty in the case of many weights and many modifications; and
> one suffers (b) from a loss in code-coherence: Consider the following piece
> (which is the core part of generating a random combination without
> repetitions):
>
>   std::tr1::mt19937 urng;
>   dynamic_discrete_distribution< unsigned int > box
>     ( count.begin(), count.end() );
>   while ( box.total() > 0 ) {
>     unsigned int index = box( urng );
>     box.weight( index, box.weight( index ) - 1 );
>     std::cout << index << '\n';
>   }
>
> versus:
>
>   std::tr1::mt19937 urng;
>   unsigned int total = std::accumulate ( count.begin(), count.end(), 0u );
>   discrete_distribution<> dist;
>   while ( total > 0 ) {
>     kubux::discrete_distribution<>::param_type parm
>       ( count.begin(), count.end() );
>     unsigned int index = dist( urng, parm );
>     -- count[index];
>     -- total;
>     std::cout << index << '\n';
>   }
>
> In the second snippet, the programmer faces the alternative of either
> recomputing the total from scratch each iteration (not shown) or updating
> the total manually (shown). In the first snippet, the data is centralized
> and all invariants are handled by the box-object.

We agree that this is an interesting distribution.  However, it is not
clear that it is sufficiently general to warrant inclusion in the C++0X
standard.

We view your use case as simulating the drawing, without replacement, of
numbered objects from a box.  An alternative solution, only making use
of existing library components, would entail creating an std:vector<> to
contain instances of the numbered objects, then applying
std::random_shuffle() to that vector, then "drawing" sequentially from
the resulting shuffled vector.

>
>
>
>
> c) Convenient generation of uniform samples in half-open ranges
> ===============================================================
>
> This would probably be what occasional users of the random-facilities will
> be looking for most often. To make their life easier (predicting that they
> will resort to rand() if not handed something that is _very_ convenient to
> use), I would suggest to add a convenience class to 26.4.7 along the
> following lines:
>
>   template < typename Engine = implementation_defined >
>   class urandom {
>
>     // default construct with unpredictable inital state
>     urandom ();
>
>     // construct from seed
>     template < typename SeedType >
>     urandom ( SeedType const & seed )
>
>     // return a uniform deviate in [0,bound)
>     template < typename ArithmeticType >
>     ArithmeticType operator() ( ArithmeticType bound );
>
>   }; // urandom
>
> It can be used to create a (static) rng-object, which can replace rand() and
> takes care of the most basic sampling problem.

We are sympathetic to this idea, but believe there is insufficient time
to craft a formal proposal before C++0X is finalized.  In part, this is
because the above urandom template meets neither the requirements of an
engine nor those of a distribution, and so we would need to spell out
very carefully (especially in light of the forthcoming Concepts work)
what its exact requirements would be.

However, there is a second Technical Report already being drafted.  We
would be willing to work with you to craft a formal proposal targeting
that future TR.

Best,

-- WEB

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]

Author: jkherciueh@gmx.net
Date: Wed, 14 Nov 2007 11:17:09 CST Raw View

Walter E Brown wrote:

> On 2007-11-12 11:10 AM, jkherciueh@gmx.net wrote:
[snip]
>> a) Support for unpredictable seeds
>> ==================================
>>
>> The default constructor for all engines creates an object with fully
>> predictable state. In order to create a random number generator whose
>> initial state is different, one needs to provide a seed. If I want to
>> create a random sequence that varies between runs of the program, I will
>> need a somewhat unpredictable seed. May I suggest to add a convenience
>> function in the utilities section 26.4.7:
>>
>>   <implementation defined arithmetic type> unpredictable_seed ();
>>
>> This should return (through implementation-defined means) what the name
>> suggests: a seed-value that is hard to guess (at least harder than the
>> current time).
>
> This is the sort of use for which the seed_seq utility was intended.  By
> constructing a seed_seq object from, say, the current time and the
> process id, one obtains an entity that makes a good seed for any engine.
>   Even if many seeds are needed (e.g., for tasks executing
> concurrently), even small differences in the time and/or the process id
> will lead to large differences in the outputs obtained from seed_seq's
> constructed from them, and hence also to large differences in the
> initial states of the engines.

Thanks for the explanation.



>> b) Basic support for combinatorial distributions
>> ================================================
>>
>> I understand the rationale from N1588 (section 5) that it would not be
>> feasible to include distributions that return collections of values
>> instead of values. However, I think it would be possible and useful to
>> make it easier for users to build such distributions on top of the
>> sampling distributions provided by the library. In particular, I think it
>> would be useful to add a discrete distribution with modifiable weights to
>> 26.4.8.5, something like this:
>>
>>   template < typename WeightType = unsigned long, typename IntType = int
>>   > class dynamic_discrete_distribution {
>>   public:
>>
>>     typedef WeightType  weight_type;
>>     typedef IntType     result_type;
>>
>>     class param_type {
>>     public:
>>
>>       typedef dynamic_discrete_distribution distribution_type;
>>       typedef WeightType weight_type;
>>       typedef IntType    result_type;
>>
>>       param_type ();
>>
>>       template < typename InputIterator >
>>       param_type ( InputIterator firstW, InputIterator lastW );
>>
>>       vector< double > probabilities ( void ) const;
>>
>>       result_type min ( void ) const;
>>       result_type max ( void ) const;
>>
>>       weight_type weight ( result_type index ) const;
>>       void weight ( result_type index, weight_type new_weight );
>>       weight_type total ( void ) const;
>>
>>     }; // param_type
>>
>>     dynamic_discrete_distribution ();
>>
>>     template < typename InputIterator >
>>     dynamic_discrete_distribution
>>        ( InputIterator firstW, InputIterator lastW );
>>
>>     explicit
>>     dynamic_discrete_distribution ( param_type const & parm );
>>
>>     vector< double > probabilities ( void ) const;
>>
>>     param_type param ( void ) const;
>>
>>     void param ( param_type const & parm );
>>
>>     result_type min ( void ) const;
>>     result_type max ( void ) const;
>>
>>     weight_type weight ( result_type index ) const;
>>     void weight ( result_type index, weight_type new_weight );
>>     weight_type total ( void ) const;
>>
>>     void reset ( void );
>>
>>     template < typename UniformRNG >
>>     result_type operator() ( UniformRNG & urng, param_type const & parm
>>     );
>>
>>     template < typename UniformRNG >
>>     result_type operator() ( UniformRNG & urng );
>>
>>   }; // dynamic_discrete_distribution
>>
>>   Note: the sum of weights can be 0, however in that state,
>>   sampling has undefined behavior.
>>
>>
>> Of course, sampling from a distribution with changing weights can be
>> realized on top of discrete_distribution<>. However, there is (a) a
>> performance penalty in the case of many weights and many modifications;
>> and one suffers (b) from a loss in code-coherence: Consider the following
>> piece (which is the core part of generating a random combination without
>> repetitions):
>>
>>   std::tr1::mt19937 urng;
>>   dynamic_discrete_distribution< unsigned int > box
>>     ( count.begin(), count.end() );
>>   while ( box.total() > 0 ) {
>>     unsigned int index = box( urng );
>>     box.weight( index, box.weight( index ) - 1 );
>>     std::cout << index << '\n';
>>   }
>>
>> versus:
>>
>>   std::tr1::mt19937 urng;
>>   unsigned int total = std::accumulate ( count.begin(), count.end(), 0u
>>   ); discrete_distribution<> dist;
>>   while ( total > 0 ) {
>>     kubux::discrete_distribution<>::param_type parm
>>       ( count.begin(), count.end() );
>>     unsigned int index = dist( urng, parm );
>>     -- count[index];
>>     -- total;
>>     std::cout << index << '\n';
>>   }
>>
>> In the second snippet, the programmer faces the alternative of either
>> recomputing the total from scratch each iteration (not shown) or updating
>> the total manually (shown). In the first snippet, the data is centralized
>> and all invariants are handled by the box-object.
>
> We agree that this is an interesting distribution.  However, it is not
> clear that it is sufficiently general to warrant inclusion in the C++0X
> standard.
>
> We view your use case as simulating the drawing, without replacement, of
> numbered objects from a box.  An alternative solution, only making use
> of existing library components, would entail creating an std:vector<> to
> contain instances of the numbered objects, then applying
> std::random_shuffle() to that vector, then "drawing" sequentially from
> the resulting shuffled vector.

I understand that something has to be expected to be useful before it can be
considered for inclusion into the standard. So, allow me to strengthen my
case by presenting a few other possible use cases, which I think are harder
to deal with using only means that are currently proposed.


Consider the following simulation of diffusion:

int main ( void ) {
  std::tr1::mt19937 urng;
  /*
    Setup: we are given two boxes A and B and n kinds of items.
    Box A contains a_1, a2_, ..., a_n items of each kind and
    box B contains b_1, b_2, ..., b_n items of each kind. At each
    step an item is picked at random and moved to the other box.
  */
  // We model the case n = 4;
  // The following vector is the vector a_1,...,a_n,b1,...,b_n
  // We start with all items in box A.
  std::vector< unsigned long > counts;
  counts.push_back( 200000 );
  counts.push_back( 150000 );
  counts.push_back( 4000000 );
  counts.push_back( 50000 );
  counts.push_back( 0 );
  counts.push_back( 0 );
  counts.push_back( 0 );
  counts.push_back( 0 );
  dynamic_discrete_distribution< unsigned long > two_row_grid
    ( counts.begin(), counts.end() );
  while ( true ) {
    for ( int i = 0; i < 1000; ++i ) {
      unsigned long index = two_row_grid( urng );
      unsigned long where_to_put = ( index + 4 ) % 8;
      two_row_grid.weight( index,
                           two_row_grid.weight( index ) -1 );
      two_row_grid.weight( where_to_put,
                           two_row_grid.weight( where_to_put ) + 1 );
    }
    for ( int i = 0; i < 4; ++i ) {
      std::cout << std::setw(8) << two_row_grid.weight( i ) << " ";
    }
    std::cout << std::endl;
    for ( int i = 4; i < 8; ++i ) {
      std::cout << std::setw(8) << two_row_grid.weight( i ) << " ";
    }
    std::cout << std::endl;
    std::cout << std::endl;
  }
}

If you did this using 4,000,000+200,000 + 150,000 + 50,000 = 4,400,000
objects in a vector, each keeping track in which box it currently is, the
program would use much more resources and spend most of the time computing
totals for reporting.

I think that a discrete distribution with modifiable weights would be a very
useful building block in simulations where items move around, die or
procreate, and the number of items is large enough to warant not modeling
them individually but using counts. Think of colored dots moving within a
graph where the probabilities of a dot being chosen for some action
(possibly involving moving to a neighbor) depend on how many dots of each
color are present at the given vertex. For more complicated graphs, it
becomes less and less obvious how to model that using standard containers
and random draws and shuffles. Here is a simple random walk simulation to
illustrate the point:

int main ( void ) {
  std::tr1::mt19937 urng;
  /*
    Simple random walk with destruction at either right end.

    We have n sites arranged in a line. At each step an object
    is chosen at random. It then moves to the left or right
    with equal probability. If it leaves the range, it drops
    out.
  */
  // Setup:
  unsigned int const num_sites = 10;
  std::vector< unsigned long > counts;
  for ( unsigned int i = 0; i < num_sites; ++i ) {
    counts.push_back( 1000 );
  }
  bernouli_distribution coin;
  dynamic_discrete_distribution< unsigned int > walk
    ( counts.begin(), counts.end() );

  // Simulation:
  while ( walk.total() > 0 ) {
    unsigned int site = walk( urng );
    walk.weight( site, walk.weight( site ) - 1 );
    bool left = coin( urng );
    if ( left ) {
      if ( site > 0 ) {
        --site;
        walk.weight( site, walk.weight( site ) + 1 );
      }
    } else {
      if ( site < walk.max() ) {
        ++site;
        walk.weight( site, walk.weight( site ) + 1 );
      }
    }
    for ( unsigned int i = 0; i < num_sites; ++i ) {
      std::cout << walk.weight( i ) << " ";
    }
    std::cout << std::endl;
  }
}

Note: when such a stochastic model arises from discretization of a
continuous phenomenon (like heat), the number of sites can easily grow
large, and the number of items moving around can be huge.


More complex use cases also exist where the weights are not just item
counts, e.g., a battle simulation where rifles and machine guns are
involved. Simulating each bullet, the firing frequencies will determine the
probability of the bullet coming from a machine gun or a riffle. Thus, the
weight change for a party depends on whether a riffle or a machine gun
carrier got hit.


Also, even in the simple drawing from a box example one can already see a
certain space overhead in the vector+shuffle solution. If the number of
objects is huge, that can be a drawback.


(BTW: the use cases so far suggest that changing the weight is more useful
than setting the weight. So, the interface is up for debate, I guess.)



>> c) Convenient generation of uniform samples in half-open ranges
>> ===============================================================
>>
>> This would probably be what occasional users of the random-facilities
>> will be looking for most often. To make their life easier (predicting
>> that they will resort to rand() if not handed something that is _very_
>> convenient to use), I would suggest to add a convenience class to 26.4.7
>> along the following lines:
>>
>>   template < typename Engine = implementation_defined >
>>   class urandom {
>>
>>     // default construct with unpredictable inital state
>>     urandom ();
>>
>>     // construct from seed
>>     template < typename SeedType >
>>     urandom ( SeedType const & seed )
>>
>>     // return a uniform deviate in [0,bound)
>>     template < typename ArithmeticType >
>>     ArithmeticType operator() ( ArithmeticType bound );
>>
>>   }; // urandom
>>
>> It can be used to create a (static) rng-object, which can replace rand()
>> and takes care of the most basic sampling problem.
>
> We are sympathetic to this idea, but believe there is insufficient time
> to craft a formal proposal before C++0X is finalized.  In part, this is
> because the above urandom template meets neither the requirements of an
> engine nor those of a distribution, and so we would need to spell out
> very carefully (especially in light of the forthcoming Concepts work)
> what its exact requirements would be.

I see. Which papers should I read to get up to speed in this Concepts issue?


As for the requirements, I had in mind that the SeedType would be either any
arithmetic type or seed_seq. This way, the constructors would be largely
parallel to the constructors of engines.

With regard to the ArithmeticType, I had in mind that integer and floating
point types should be allowed. Maybe, in view of the general requirements
clause 26.4.1.1 one could specify that by splitting the case into the
following signatures:

     template < typename IntType >
     IntType operator() ( IntType bound );

  requires: bound > 0;
  returns: a random integer evenly distributed in [0,bound)

     template < typename RealType >
     RealType operator() ( RealType bound );

  requires: bound > 0;
  returns: a random real evenly distributed in [0,bound)


> However, there is a second Technical Report already being drafted.  We
> would be willing to work with you to craft a formal proposal targeting
> that future TR.

Sounds good.


Thanks

Kai-Uwe Bux

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]

Author: francis.glassborow@btinternet.com (Francis Glassborow)
Date: Wed, 14 Nov 2007 21:42:12 GMT Raw View

jkherciueh@gmx.net wrote:
> I understand that something has to be expected to be useful before it can be
> considered for inclusion into the standard. So, allow me to strengthen my
> case by presenting a few other possible use cases, which I think are harder
> to deal with using only means that are currently proposed.
>

Well being useful is sort of a requirement but there are many useful
things (even C++ things :-) ) that will never be in the Standard. Those
who work on the C++ Standard have limited resources so tend to work on
the things they find useful, in our more altruistic moments we have been
known to work on things that have no practical use to ourselves but that
appear to make C++ a better tool for others.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]

Author: wb@fnal.gov (Walter E Brown)
Date: Thu, 15 Nov 2007 20:50:08 GMT Raw View

On 2007-11-14 11:17 AM, jkherciueh@gmx.net wrote:
> Walter E Brown wrote:
>
>> On 2007-11-12 11:10 AM, jkherciueh@gmx.net wrote:
<snip>
>
>>> b) Basic support for combinatorial distributions
>>> ================================================
>>>
>>> I understand the rationale from N1588 (section 5) that it would not be
>>> feasible to include distributions that return collections of values
>>> instead of values. However, I think it would be possible and useful to
>>> make it easier for users to build such distributions on top of the
>>> sampling distributions provided by the library. In particular, I think it
>>> would be useful to add a discrete distribution with modifiable weights to
>>> 26.4.8.5, something like this:
>>>
<code snipped>
>>>   Note: the sum of weights can be 0, however in that state,
>>>   sampling has undefined behavior.
>>>
>>>
>>> Of course, sampling from a distribution with changing weights can be
>>> realized on top of discrete_distribution<>. However, there is (a) a
>>> performance penalty in the case of many weights and many modifications;
>>> and one suffers (b) from a loss in code-coherence: Consider the following
>>> piece (which is the core part of generating a random combination without
>>> repetitions):
>>>
>>>   std::tr1::mt19937 urng;
>>>   dynamic_discrete_distribution< unsigned int > box
>>>     ( count.begin(), count.end() );
>>>   while ( box.total() > 0 ) {
>>>     unsigned int index = box( urng );
>>>     box.weight( index, box.weight( index ) - 1 );
>>>     std::cout << index << '\n';
>>>   }
>>>
>>> versus:
>>>
>>>   std::tr1::mt19937 urng;
>>>   unsigned int total = std::accumulate ( count.begin(), count.end(), 0u
>>>   ); discrete_distribution<> dist;
>>>   while ( total > 0 ) {
>>>     kubux::discrete_distribution<>::param_type parm
>>>       ( count.begin(), count.end() );
>>>     unsigned int index = dist( urng, parm );
>>>     -- count[index];
>>>     -- total;
>>>     std::cout << index << '\n';
>>>   }
>>>
>>> In the second snippet, the programmer faces the alternative of either
>>> recomputing the total from scratch each iteration (not shown) or updating
>>> the total manually (shown). In the first snippet, the data is centralized
>>> and all invariants are handled by the box-object.
>> We agree that this is an interesting distribution.  However, it is not
>> clear that it is sufficiently general to warrant inclusion in the C++0X
>> standard.
>>
>> We view your use case as simulating the drawing, without replacement, of
>> numbered objects from a box.  An alternative solution, only making use
>> of existing library components, would entail creating an std:vector<> to
>> contain instances of the numbered objects, then applying
>> std::random_shuffle() to that vector, then "drawing" sequentially from
>> the resulting shuffled vector.
>
> I understand that something has to be expected to be useful before it can be
> considered for inclusion into the standard. So, allow me to strengthen my
> case by presenting a few other possible use cases, which I think are harder
> to deal with using only means that are currently proposed.
>
> Consider the following simulation of diffusion:
<code snipped>
>
> If you did this using 4,000,000+200,000 + 150,000 + 50,000 = 4,400,000
> objects in a vector, each keeping track in which box it currently is, the
> program would use much more resources and spend most of the time computing
> totals for reporting.
>
> I think that a discrete distribution with modifiable weights would be a very
> useful building block in simulations where items move around, die or
> procreate, and the number of items is large enough to warant not modeling
> them individually but using counts. Think of colored dots moving within a
> graph where the probabilities of a dot being chosen for some action
> (possibly involving moving to a neighbor) depend on how many dots of each
> color are present at the given vertex. For more complicated graphs, it
> becomes less and less obvious how to model that using standard containers
> and random draws and shuffles. Here is a simple random walk simulation to
> illustrate the point:
<code snipped>
>
> Note: when such a stochastic model arises from discretization of a
> continuous phenomenon (like heat), the number of sites can easily grow
> large, and the number of items moving around can be huge.
>
> More complex use cases also exist where the weights are not just item
> counts, e.g., a battle simulation where rifles and machine guns are
> involved. Simulating each bullet, the firing frequencies will determine the
> probability of the bullet coming from a machine gun or a riffle. Thus, the
> weight change for a party depends on whether a riffle or a machine gun
> carrier got hit.
>
> Also, even in the simple drawing from a box example one can already see a
> certain space overhead in the vector+shuffle solution. If the number of
> objects is huge, that can be a drawback.
>
> (BTW: the use cases so far suggest that changing the weight is more useful
> than setting the weight. So, the interface is up for debate, I guess.)

While you do appear to have strengthened your case for such a
distribution, I believe that there are still a number of details to be
worked out (including, as you point out, the precise interface).  You
might consider implementing one or more of the possible variations and
submitting the results (say, to Boost) for public exposure and feedback.
  Once a consensus has been reached and some experience gained, it may
be right to propose the end product for inclusion in TR2.

>
>>> c) Convenient generation of uniform samples in half-open ranges
>>> ===============================================================
>>>
>>> This would probably be what occasional users of the random-facilities
>>> will be looking for most often. To make their life easier (predicting
>>> that they will resort to rand() if not handed something that is _very_
>>> convenient to use), I would suggest to add a convenience class to 26.4.7
>>> along the following lines:
>>>
>>>   template < typename Engine = implementation_defined >
>>>   class urandom {
>>>
>>>     // default construct with unpredictable inital state
>>>     urandom ();
>>>
>>>     // construct from seed
>>>     template < typename SeedType >
>>>     urandom ( SeedType const & seed )
>>>
>>>     // return a uniform deviate in [0,bound)
>>>     template < typename ArithmeticType >
>>>     ArithmeticType operator() ( ArithmeticType bound );
>>>
>>>   }; // urandom
>>>
>>> It can be used to create a (static) rng-object, which can replace rand()
>>> and takes care of the most basic sampling problem.
>> We are sympathetic to this idea, but believe there is insufficient time
>> to craft a formal proposal before C++0X is finalized.  In part, this is
>> because the above urandom template meets neither the requirements of an
>> engine nor those of a distribution, and so we would need to spell out
>> very carefully (especially in light of the forthcoming Concepts work)
>> what its exact requirements would be.
>
> I see. Which papers should I read to get up to speed in this Concepts issue?
>
> As for the requirements, I had in mind that the SeedType would be either any
> arithmetic type or seed_seq. This way, the constructors would be largely
> parallel to the constructors of engines.
>
> With regard to the ArithmeticType, I had in mind that integer and floating
> point types should be allowed. Maybe, in view of the general requirements
> clause 26.4.1.1 one could specify that by splitting the case into the
> following signatures:
>
>      template < typename IntType >
>      IntType operator() ( IntType bound );
>
>   requires: bound > 0;
>   returns: a random integer evenly distributed in [0,bound)
>
>      template < typename RealType >
>      RealType operator() ( RealType bound );
>
>   requires: bound > 0;
>   returns: a random real evenly distributed in [0,bound)
>
>> However, there is a second Technical Report already being drafted.  We
>> would be willing to work with you to craft a formal proposal targeting
>> that future TR.
>
> Sounds good.

Then we'll wait to hear from you privately when you are ready, and we
can start to plan the document.  (There's no hurry, however, as
proposals for TR2 will be considered only after the backlog of C++0X
issues has been addressed.)

Best,

-- WEB

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]

Author: jkherciueh@gmx.net
Date: Mon, 12 Nov 2007 17:10:47 GMT Raw View

I recently embarked on implementing all distributions from N2461, which
turned out to be a fun exercise. In the process, I noted that the random
library is well-thought out (in particular the separation of distributions
from their parameter types is an interesting detail).

Anyway, I found that a few issues could be addressed better. Given that the
library is well-designed, I might very well be mistaken; and therefore I
would like to get some feed-back on the following ideas. In particular, if
any of the following have been discussed and rejected, I would be
interested in the rationale (or a pointer thereto).




a) Support for unpredictable seeds
==================================

The default constructor for all engines creates an object with fully
predictable state. In order to create a random number generator whose
initial state is different, one needs to provide a seed. If I want to
create a random sequence that varies between runs of the program, I will
need a somewhat unpredictable seed. May I suggest to add a convenience
function in the utilities section 26.4.7:

  <implementation defined arithmetic type> unpredictable_seed ();

This should return (through implementation-defined means) what the name
suggests: a seed-value that is hard to guess (at least harder than the
current time).




b) Basic support for combinatorial distributions
================================================

I understand the rationale from N1588 (section 5) that it would not be
feasible to include distributions that return collections of values instead
of values. However, I think it would be possible and useful to make it
easier for users to build such distributions on top of the sampling
distributions provided by the library. In particular, I think it would be
useful to add a discrete distribution with modifiable weights to 26.4.8.5,
something like this:

  template < typename WeightType = unsigned long, typename IntType = int >
  class dynamic_discrete_distribution {
  public:

    typedef WeightType  weight_type;
    typedef IntType     result_type;

    class param_type {
    public:

      typedef dynamic_discrete_distribution distribution_type;
      typedef WeightType weight_type;
      typedef IntType    result_type;

      param_type ();

      template < typename InputIterator >
      param_type ( InputIterator firstW, InputIterator lastW );

      vector< double > probabilities ( void ) const;

      result_type min ( void ) const;
      result_type max ( void ) const;

      weight_type weight ( result_type index ) const;
      void weight ( result_type index, weight_type new_weight );
      weight_type total ( void ) const;

    }; // param_type

    dynamic_discrete_distribution ();

    template < typename InputIterator >
    dynamic_discrete_distribution
       ( InputIterator firstW, InputIterator lastW );

    explicit
    dynamic_discrete_distribution ( param_type const & parm );

    vector< double > probabilities ( void ) const;

    param_type param ( void ) const;

    void param ( param_type const & parm );

    result_type min ( void ) const;
    result_type max ( void ) const;

    weight_type weight ( result_type index ) const;
    void weight ( result_type index, weight_type new_weight );
    weight_type total ( void ) const;

    void reset ( void );

    template < typename UniformRNG >
    result_type operator() ( UniformRNG & urng, param_type const & parm );

    template < typename UniformRNG >
    result_type operator() ( UniformRNG & urng );

  }; // dynamic_discrete_distribution

  Note: the sum of weights can be 0, however in that state,
  sampling has undefined behavior.


Of course, sampling from a distribution with changing weights can be
realized on top of discrete_distribution<>. However, there is (a) a
performance penalty in the case of many weights and many modifications; and
one suffers (b) from a loss in code-coherence: Consider the following piece
(which is the core part of generating a random combination without
repetitions):

  std::tr1::mt19937 urng;
  dynamic_discrete_distribution< unsigned int > box
    ( count.begin(), count.end() );
  while ( box.total() > 0 ) {
    unsigned int index = box( urng );
    box.weight( index, box.weight( index ) - 1 );
    std::cout << index << '\n';
  }

versus:

  std::tr1::mt19937 urng;
  unsigned int total = std::accumulate ( count.begin(), count.end(), 0u );
  discrete_distribution<> dist;
  while ( total > 0 ) {
    kubux::discrete_distribution<>::param_type parm
      ( count.begin(), count.end() );
    unsigned int index = dist( urng, parm );
    -- count[index];
    -- total;
    std::cout << index << '\n';
  }

In the second snippet, the programmer faces the alternative of either
recomputing the total from scratch each iteration (not shown) or updating
the total manually (shown). In the first snippet, the data is centralized
and all invariants are handled by the box-object.




c) Convenient generation of uniform samples in half-open ranges
===============================================================

This would probably be what occasional users of the random-facilities will
be looking for most often. To make their life easier (predicting that they
will resort to rand() if not handed something that is _very_ convenient to
use), I would suggest to add a convenience class to 26.4.7 along the
following lines:

  template < typename Engine = implementation_defined >
  class urandom {

    // default construct with unpredictable inital state
    urandom ();

    // construct from seed
    template < typename SeedType >
    urandom ( SeedType const & seed )

    // return a uniform deviate in [0,bound)
    template < typename ArithmeticType >
    ArithmeticType operator() ( ArithmeticType bound );

  }; // urandom

It can be used to create a (static) rng-object, which can replace rand() and
takes care of the most basic sampling problem.



Best

Kai-Uwe Bux

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]