340 likes | 497 Vues
Lesson 2: RNGs and Choosing from distributions. Random number generators (RNG) General need for random numbers Pseudo-random vs. random Linear congruential generators Extension of period of generator by combining several generators Choosing from distributions Discrete Continuous: Direct
 
                
                E N D
Lesson 2: RNGs and Choosing from distributions • Random number generators (RNG) • General need for random numbers • Pseudo-random vs. random • Linear congruential generators • Extension of period of generator by combining several generators • Choosing from distributions • Discrete • Continuous: Direct • Continuous: Rejection • Probability mixing • Multidimensional distributions • Coding event problems: FORTRAN code MCBASE.FOR
Basic view of MC process • We begin with the same black box drawing: • Lesson 1 was concerned with the outputs. • Lesson 2 is concerned with the inputs. • The quality of outputs depends on quality of inputs. • The rest of the course is about the black box
The quest for randomness • In the early days (1950s) of computer-based Monte Carlo, there was work done on MECHANICAL or ELECTRONIC random number generators to give us “true” binary randomness (0 or 1)=digits of fraction • Relay flops • Decay rates (odd or even) in • Etc. • Many of them “filtered” the answer to correct for non-50/50 defects • Two successive identical bits = 1 OR two successive different bits = 0 • Closer to 50/50 but (at least) twice as slow • Abandoned as too slow, too unwieldy, and—ultimately—undesirable
Pseudorandom generators (RNG) • We cannot have (nor do we really want) truly random variables. • Any random number generator from a computer is not truly random (since it is repeatable). • Referred to as “pseudorandom” • This repeatability is actually very important to us for validation reasons. Also, this repeatability is the basis of some Monte Carlo perturbation methods. • The most common form is that of "linear congruential generators" which generate a series of bounded integers, with each one using the one before it with the formula:
Possible difficulties • Periodicity: Each number depends uniquely on the one before it. So once a number repeats, the rest of the series will be the same as it was before. How many unique numbers you get is the PERIOD. • It can be shown that a "full" period of m can be assured by choosing: • a(mod 4)=1 • m as a power of 2 • b odd. • The finiteness of the string also means that certain domains of the problem may be unreachable. (That is, there will be “gaps” of width 1/m.) [STARS] • We have to be careful not to depend on the later digits of a uniform deviate: they may be less random than the early digits. (They may be the “remnants” of repeating digits of fractions.)
Extending the period • Even with “full period” L.C.G. random number generators, we still have the related problems: • The period can be no longer than m • m must be a number that can be represented as an integer: where N is the number of bits used to store integers (usu. 16. 32, or 64) • Fortunately, we can solve this problem by combining the output of several uncorrelated L.C.G.’s: • The resulting period is the PRODUCT of the individual periods • The following slide shows how KENO and MCNP implement this
KENO RNG (FORTRAN 77) DOUBLE PRECISION FUNCTION FLTRN() INTEGER X, Y, Z, X0, Y0, Z0, XM, YM, ZM DOUBLE PRECISION V, X1, Y1, Z1 EXTERNAL RANDNUM COMMON /QRNDOM/ X, Y, Z SAVE /QRNDOM/ PARAMETER ( MX = 157, MY = 146, MZ = 142 ) PARAMETER ( X0 = 32363, Y0 = 31727, Z0 = 31657 ) PARAMETER ( X1 = X0, Y1 = Y0, Z1 = Z0 ) X = MOD(MX*X,X0) Y = MOD(MY*Y,Y0) Z = MOD(MZ*Z,Z0) V = X/X1 + Y/Y1 + Z/Z1 FLTRN = V - INT(V) RETURN END
MCNP5 RNG (FORTRAN 95) integer, private, parameter :: R8 = selected_real_kind( 15, 307 ) integer, private, parameter :: I8 = selected_int_kind( 18 ) ... integer(I8), private, SAVE :: & & RN_MULT = 5_I8**19, & & RN_ADD = 0_I8, & & RN_STRIDE = 152917, & & RN_SEED0 = 5_I8**19, & & RN_MOD = 2_I8**48, & & RN_MASK = 2_I8**48 - 1_I8, & & RN_PERIOD = 2_I8**46 real(R8), private, SAVE :: & & RN_NORM = 1.0_R8/2._R8**48 ... function rang() implicit none real(R8) :: rang RN_SEED = iand( iand( RN_MULT*RN_SEED, RN_MASK) + RN_ADD, RN_MASK) rang = RN_SEED * RN_NORM RN_COUNT = RN_COUNT + 1 return end function rang
“Dimensionality” • Monte Carlo theory uses the word “dimension” in a way that is a little foreign to normal usage, so be ready • The dimension of a Monte Carlo algorithm is how many random numbers have to be drawn to generate one estimate • Examples: • Coin toss problem: 1 dimension • Our pi problem: 2 dimension • Homework #1 problem of x1+x2>1.4: 2 dimension • Neutron transport problem simulation: ? • Related to the “unit hypercube” point of view
“Discrepancy” • Monte Carlo theory also has the special word “discrepancy” which is important • The discrepancy of a RNG is a measure of the biggest “hole” in a ordered series of random numbers • Usually defined (the so-called “star discrepancy”) as the biggest difference between what fraction SHOULD have been chosen in a space from the origin (0,0,…,0) to any point (x,y,z,…) and the actual number that was ACTUALLY chosen [blackboard example] • The fraction that should have fallen in is x*y*z… • You only have to check at actual chosen points (with and without the point) • To my surprise, the size of the discrepancy is proportional to • It is the reason for the slow convergence of Monte Carlo
Quasi-random numbers • The logical extension of the idea of stratified sampling is to try to make N=M • Each of the i=1,2,…,N random numbers is located in its own subdomain ((i-1)/N,i/N) • This looks (and acts) in 1D like a simple Riemann choice • The trick is to make them jump around “randomly” • Result: Quasi-random numbers (also called “Low discrepancy sequences”) • They are not really random, so they have to be used very carefully. • They are not used very much for Monte Carlo methods in radiation transport. • But, because they are of such importance to the general field of Monte Carlo (i.e., beyond transport methods), you should have a taste of what they are, how to get and use them.
Quasi-random numbers (2) • "Quasi"-random numbers are not random at all; they are a deterministic, equal-division, sequence that are "ordered" in such a way that they can be used by Monte Carlo methods. • The easiest to generate are “van der Corput” sequences, which are found by "reflecting" the digits of the base counting integers about their radix point. • You pick a base. • Count in that base. • Reflect the number • Interpret as base 10.
Quasi-random numbers (3) • After M=BaseN-1 numbers in the sequence, the sequence consists of the equally spaced fractions (i/(M+1), i=1,2,3,...M). • Therefore, the sequence is actually deterministic, not stochastic. (Like a numerical method.) • The beauty of the sequence is that the re-ordering results in the an (almost) even coverage of the domain (0,1) even if you stop in the middle. • Because you need ALL of the numbers in the sequence to cover the (0,1) range (which is not true of the pseudo-random sequence), it is important that all of the numbers of the sequence be used on the SAME decision. • So, if your problem has more than one decision (e.g., the pi problem had 2—x and y coordinates) you have to let each decision have its own RNG
Quasi-random numbers (4) • For a Monte Carlo process that has more than one decision, a different sequence must be used for each decision. (This is why we don’t use them.) • The most common way this situation is handled is to use prime numbers in order—2,3,5,7,11,etc.—for decisions. (“Halton” sequence, after UNC professor) • Asymptotic estimate of error is which means closer to ~ 1/N • but deteriorates very quickly with dimensionality • A Java subroutine is given in the public area. • The resulting standard deviations calculated and printed by standard MC codes are NOT accurate
Overview of pdf and cdf • Basic definition of probability distribution function (p.d.f.): • And its integral, the cumulative distribution function (c.d.f.):
Overview of pdf and cdf (2) • Corollaries of these definitions:
Mapping x->x using p(x) • Our basic technique is to use a unique y->x • y=x from (0,1) and x from (a,b) • We are going to use the mapping backwards
Mapping (2) Note that: • x(a)=0 • x(b)=1 • Function is non-decreasing over domain (a,b) Our problem reduces to: • Finding x(x) • Inverting to get x(x), a formula for turning pseudo-random numbers into numbers distributed according to desired p(x)
Mapping (3) • We must have:
Resulting general procedure • Form CDF: • Set equal to pseudo-random number: • Invert to get formula that translates from x to x:
Uniform distribution • For our first distribution, pick x uniformly in range (a,b): • Step 1: Form CDF.
Uniform distribution (2) • Step 2: Set pseudo-random number to CDF: • Step 3: Invert to get x(x): • Example: Choose m uniformly in (-1,1):
Discrete distribution • For a discrete distribution, we have N choices of state i, each with probability , so: • Step 1: Form CDF:
Discrete distribution (2) • Step 2: Set pseudo-random number to CDF: • Step 3: Invert to get x(x):
Discrete distribution (3) • Example: Choose among 3 states with relative probabilities of 4, 5, and 6.
Continuous distribution: Direct • This fits the “pure” form developed before. • Form CDF: • Set equal to pseudo-random number: • Invert to get formula that translates from x to x:
Continuous: Direct (2) • Example: Pick x from:
Homework from text • Verify your answers for problems 1-1 through 1-5 by running Monte Carlo estimates of the average (and standard deviation of the population).