Validating a Random Number Generator

Based on: "A Test of Randomness Based on the Consecutive Distance Between Random Number Pairs" by Matthew J. Duggan, John H. Drew, and Lawrence M. Leemis. Presented by: Sarah Daugherty, MSIM 852, Fall 2007

Introduction • Random numbers are critical to Monte Carlo simulation, discrete event simulation, and bootstrapping. • There is a need for random number generators (RNGs) with good statistical properties. • One of the most popular methods for generating random numbers in a computer program is a Lehmer RNG.

Lehmer Random Number Generators • Lehmer’s algorithm: the iterative equation $x_{i+1} = a x_i \bmod m$ produces a stream of random numbers. • Requires 3 inputs: m, a, and x0. • m = modulus, a large fixed prime number • a = multiplier, a fixed positive integer < m • x0 = initial seed, a positive integer < m • Produces integers from 1 to m-1, inclusive
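The recurrence on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `lehmer` is hypothetical, the values m = 401 and a = 23 are the small parameters used for the lattice example in these slides, and the seed x0 = 1 is an arbitrary assumption.

```python
def lehmer(m, a, x0, n):
    """Return the first n values of x_{i+1} = (a * x_i) mod m, starting after the seed x0."""
    if not (0 < a < m and 0 < x0 < m):
        raise ValueError("a and x0 must be positive integers less than m")
    values = []
    x = x0
    for _ in range(n):
        x = (a * x) % m  # one step of the Lehmer recurrence
        values.append(x)
    return values


# Assumed example parameters: m and a from the lattice slide, seed chosen arbitrarily.
stream = lehmer(m=401, a=23, x0=1, n=400)
```

Because m is prime and neither a nor x0 is a multiple of m, no output is ever 0, so every value falls between 1 and m-1.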

Problem • Lehmer RNGs are not truly random. • With carefully chosen m and a, it’s possible to generate output that is “random enough” from a statistical point of view. • They are still considered good generators because their output can be replicated and they are portable, efficient, and thoroughly documented. • However, Marsaglia (1968) discovered too much regularity in the output of Lehmer RNGs.

Marsaglia’s Discovery • He observed a lattice structure when consecutive random numbers were plotted as overlapping ordered pairs: (x0, x1), (x1, x2), …, (xn, xn+1). • Lattice created using m = 401, a = 23. • The lattice does not appear to be random at all, BUT a degree of randomness MAY be hidden in it.

Solution • Find the hidden randomness in the order in which the points are generated. • The observed distribution of the distance between consecutive pairs of random numbers should be close to the theoretical distribution. • Develop a test based on these distances. • The hope is to observe that points generally are not generated in order along a plane or in a regular pattern between planes.

Overlapping vs. Non-overlapping Pairs • When considering the distance between consecutive pairs of random numbers, the points can be overlapping or non-overlapping. • Overlapping: (xi, xi+1), (xi+1, xi+2) • Non-overlapping: (xi, xi+1), (xi+2, xi+3) • Both approaches are valid. • The non-overlapping case is mathematically easier: the 4 numbers involved are independent, so the 2 points they represent are also independent.
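The two pairing schemes can be illustrated with a short Python sketch. The helper names `overlapping_pairs` and `non_overlapping_pairs` are hypothetical, and the input list is a made-up stand-in for RNG output.

```python
def overlapping_pairs(xs):
    """(x0, x1), (x1, x2), ...: consecutive points share one number."""
    return list(zip(xs, xs[1:]))


def non_overlapping_pairs(xs):
    """(x0, x1), (x2, x3), ...: each number is used in exactly one point."""
    return list(zip(xs[0::2], xs[1::2]))


xs = [1, 2, 3, 4, 5, 6]  # placeholder values, not real generator output
print(overlapping_pairs(xs))      # 5 points from 6 numbers
print(non_overlapping_pairs(xs))  # 3 points from 6 numbers
```

The non-overlapping scheme yields roughly half as many points from the same stream, which is the price paid for independence between consecutive points.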

Non-overlapping Theoretical Distribution • If we assume X1, X2, X3, X4 are IID U(0,1) random variables, we can find the distance D between (X1, X2) and (X3, X4) as the Euclidean distance: $D = \sqrt{(X_1 - X_3)^2 + (X_2 - X_4)^2}$
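A minimal Python sketch of the distance calculation follows. It assumes that a distance is taken between each point and the next one in the sequence, which is one reading of "consecutive pairs"; the function name `consecutive_distances` is hypothetical. For a Lehmer generator the integer outputs would first be scaled into the unit square (for example by dividing by m), which is also an assumption here.

```python
import math


def consecutive_distances(points):
    """Euclidean distance between each point and the one that follows it."""
    return [math.dist(p, q) for p, q in zip(points, points[1:])]


# Made-up points in the unit square, for illustration only.
points = [(0.0, 0.0), (0.3, 0.4), (0.3, 0.4)]
distances = consecutive_distances(points)
```

Note that `math.dist` requires Python 3.8 or later.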

Non-overlapping Theoretical Distribution • The cumulative distribution, F(x), of D.
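As a rough illustration of how F(x) can be evaluated numerically, the sketch below integrates the standard density of the distance between two independent uniform points in the unit square. That density is a well-known geometric-probability result and is not taken from these slides, so the formula used in the paper may be expressed differently. The function names and the midpoint-rule step count are assumptions.

```python
import math


def distance_pdf(d):
    """Density of the distance between two independent uniform points in the unit square."""
    if d <= 0.0 or d >= math.sqrt(2.0):
        return 0.0
    if d <= 1.0:
        return 2.0 * d * (d * d - 4.0 * d + math.pi)
    r = math.sqrt(max(0.0, d * d - 1.0))  # guard against tiny negative rounding errors
    return 2.0 * d * (4.0 * r - (d * d + 2.0 - math.pi) - 4.0 * math.atan(r))


def distance_cdf(x, steps=2000):
    """Approximate F(x) = P(D <= x) with the composite midpoint rule."""
    upper = min(max(x, 0.0), math.sqrt(2.0))  # D only takes values in [0, sqrt(2)]
    h = upper / steps
    return sum(distance_pdf((k + 0.5) * h) for k in range(steps)) * h
```

On 0 ≤ x ≤ 1 the density is a polynomial, so the integral has the closed form πx² − 8x³/3 + x⁴/2, which gives a convenient check on the numerical result.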

Goodness-of-Fit Test • Now we can compare our theoretical distribution against the Lehmer generator. • Convert the distances between points into an empirical distribution, $\hat{F}(x)$, which will allow us to perform a hypothesis test. • $\hat{F}(x) = N(x)/n$, where N(x) = # of values that do not exceed x and n = # of distances collected
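The empirical distribution described on this slide, N(x)/n, can be sketched as follows. The function name `empirical_cdf` and the sample distances are hypothetical.

```python
import bisect


def empirical_cdf(distances):
    """Return a function x -> N(x) / n, where N(x) counts the values that do not exceed x."""
    data = sorted(distances)
    n = len(data)

    def f_hat(x):
        # bisect_right gives the number of sorted values <= x
        return bisect.bisect_right(data, x) / n

    return f_hat


# Made-up distances, for illustration only.
f_hat = empirical_cdf([0.2, 0.5, 0.5, 0.9])
```

Sorting once and using binary search keeps each evaluation at O(log n), which matters when the function is evaluated at every observed distance during a goodness-of-fit test.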

Classification of Results • Based on the results of 3 hypothesis tests (the Kolmogorov-Smirnov (KS), Cramér-von Mises (CVM), and Anderson-Darling (AD) tests), each RNG can be classified as: • Good – the null hypothesis was not rejected in any test. • Suspect – the null hypothesis was rejected in 1 or 2 of the tests. • Bad – the null hypothesis was rejected in all 3 tests.
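The classification rule can be written as a small Python function. It assumes the outcome of each test is already available as a boolean (True meaning the null hypothesis was rejected); the function and parameter names are hypothetical.

```python
def classify(rejected_ks, rejected_cvm, rejected_ad):
    """Classify a generator from the outcomes of the KS, CVM, and AD tests."""
    rejections = sum([bool(rejected_ks), bool(rejected_cvm), bool(rejected_ad)])
    if rejections == 0:
        return "Good"
    if rejections == 3:
        return "Bad"
    return "Suspect"  # rejected by 1 or 2 of the 3 tests


print(classify(False, False, False))
print(classify(True, False, False))
print(classify(True, True, True))
```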

Results • Interesting cases are when a multiplier is rejected by only 1 or 2 of the 3 tests. See a = 3 in table.

[Figure: F(x) (solid) vs. $\hat{F}(x)$ (dotted), with the random number pairs and the distances connecting pairs shown for a Good, a Suspect, and a Bad generator.]

Summary • A test of randomness was developed for Lehmer RNGs based on the distance between consecutive pairs of random numbers. • Since some multipliers are rejected by only 1 or 2 of the 3 hypothesis tests, the distance between parallel hyperplanes should not be used as the only basis for a test of randomness. The order in which pairs are generated is a second factor to consider.

Critique • Potential – limited. Many other tests exist for validating RNGs. • Impact – minimal. Frequently used RNGs use a modulus much larger than the m = 401 used here. • Overall – the paper is well written; in its current state, this test is a justified addition to the collection of tests for RNGs. • Future – use a larger modulus; improve the theoretical distribution by improving the numerical calculation of the integral for the cdf; test non-Lehmer generators such as additive linear, composite, or quadratic.
