Parallel White Noise Generation on a GPU via Cryptographic Hash

Stanley Tzeng, Li-Yi Wei (Microsoft Research Asia)


Presentation Transcript


  1. Parallel White Noise Generation on a GPU via Cryptographic Hash • Stanley Tzeng, Li-Yi Wei • Microsoft Research Asia

  2. What is White Noise? • Spatial domain: uniform random number • Frequency domain: white noise • [Figure panels: spatial domain, frequency domain]

  3. Importance • Mother of all random numbers • Commonly used, e.g. rand() in C/C++ • Major algorithms are sequential • e.g. x_n = (a * x_{n-1} + b) mod c • Processors are becoming parallel • GPU, multi-core CPU, Cell • Sequential algorithms cannot leverage that
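
The recurrence on slide 3 can be sketched in a few lines to show why it is inherently sequential: each value needs the one before it. This is a minimal illustration only; the function name and the default constants are illustrative assumptions, not values taken from the slides.

```python
def lcg_sequence(seed, count, a=1103515245, b=12345, c=2**31):
    """Return `count` values of x_n = (a * x_{n-1} + b) mod c.

    The default a, b, c are illustrative assumptions, not from the slides.
    """
    values = []
    x = seed
    for _ in range(count):
        # Each step reads the previous x, so sample n cannot be
        # computed before sample n-1: the loop cannot be parallelized.
        x = (a * x + b) % c
        values.append(x)
    return values


print(lcg_sequence(seed=1, count=3, a=5, b=3, c=16))  # [8, 11, 10]
```

To obtain the millionth value with this scheme, all earlier values must be generated first, which is the limitation the talk sets out to remove.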

  4. Contribution • Parallel algorithm for white noise • independent evaluation for every sample • easy implementation as a GPU pixel shader • speed: faster than sequential algorithms • quality: same or better • usage: similar to texture mapping

  5. PRNG (Pseudo Random Number Generator) • The main source of randomness in programs • Desirable properties • white noise statistics • repeatable • fast computation • low memory usage

  6. Core Idea • key idea: borrow cryptographic hash! • input: trivially prepared in parallel, e.g. linear ramp • hash: feed input value into hash, independently and in parallel • output: white noise
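
A CPU-side sketch of this idea, assuming MD5 from Python's `hashlib` as the hash (the deck names MD5 as its hash of choice on slide 12). The name `white_noise_sample` and the 64-bit little-endian input encoding are assumptions made for illustration; the actual method runs as a GPU pixel shader.

```python
import hashlib
import struct


def white_noise_sample(index: int) -> tuple:
    """Hash one input value into four 32-bit words (one 128-bit output)."""
    # Assumed input encoding: the index as a 64-bit little-endian integer.
    key = struct.pack("<Q", index)
    return struct.unpack("<4I", hashlib.md5(key).digest())


# The input is a linear ramp: 0, 1, 2, ... Every sample depends only on
# its own index, so each call could run on a separate thread or pixel.
ramp = range(8)
noise = [white_noise_sample(i) for i in ramp]

for i, words in zip(ramp, noise):
    print(i, ["%08x" % w for w in words])
```

The list comprehension is sequential here only because this is plain Python; nothing in `white_noise_sample` refers to any other sample.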

  7. Hash • (however nice) input → (unrecognizable) mess

  8. Cryptographic Hash • A subclass of hash • Commonly used for security applications • e.g. password, digital signature • Properties • irreversible – cannot find input from hash output • decorrelating – similar inputs, dissimilar outputs • uniform probability – all outputs equally likely to occur

  9. Cryptographic Hash - Example • irreversible, decorrelating, uniform probability • CHash("The quick brown fox jumps over the lazy dog") = 9e107d9d372bb6826bd81d3542a419d6 • CHash("The quick brown fox jumps over the lazy eog") = ffd93f16876049265fbaef4da268dd0e
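
The example on slide 9 can be reproduced with a short script, assuming that CHash here stands for MD5 (the first digest on the slide is the well-known MD5 value for that sentence). The helper names `chash` and `differing_bits` are illustrative.

```python
import hashlib


def chash(text: str) -> str:
    """MD5 digest of an ASCII string, as 32 hex characters."""
    return hashlib.md5(text.encode("ascii")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


dog = chash("The quick brown fox jumps over the lazy dog")
eog = chash("The quick brown fox jumps over the lazy eog")

print(dog)
print(eog)
print(differing_bits(dog, eog), "of 128 bits differ")
```

Changing a single input character should flip roughly half of the output bits, which is the decorrelating property the slide refers to.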

  10. Cryptographic Hash as a PRNG • White noise statistics • CHash is cryptographically secure • Repeatable • CHash gives the same output for the same input • Fast computation • CHash is parallel + constant cost • Low memory usage • CHash maintains no state • Order-independent, i.e. randomly accessible • important for parallel GPU applications
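
The repeatability and order-independence claims can be checked directly. The sketch below again assumes MD5 via `hashlib` as a stand-in for the shader; `sample` and its input encoding are hypothetical names and choices.

```python
import hashlib
import random


def sample(index: int) -> int:
    """First 32 bits of the MD5 digest of an index (assumed encoding)."""
    digest = hashlib.md5(index.to_bytes(8, "little")).digest()
    return int.from_bytes(digest[:4], "little")


indices = list(range(16))

# Evaluate the samples in natural order...
in_order = {i: sample(i) for i in indices}

# ...and again in a shuffled order. No state is carried between calls,
# so every index yields the same value regardless of visiting order.
shuffled = indices[:]
random.Random(0).shuffle(shuffled)
out_of_order = {i: sample(i) for i in shuffled}

print(in_order == out_of_order)
```

A stateful generator would fail this check unless it were reseeded and replayed from the start.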

  11. Which Cryptographic Hash? • Many options • MD5, SHA, RIPEMD, Tiger, block cipher, etc • Desirable properties • white noise quality • fast computation • power-of-2 aligned (output & operations) • pure pixel shader, no state maintenance

  12. Our Hash of Choice: MD5 [Rivest 1992] • 128-bit output and 32-bit operations • Small number of constants, fits entirely in shader • Fastest among those satisfying quality criteria • Not 100% secure [Wang and Yu 2005] • but good enough for our goal

  13. MD5 Algorithm Overview • [Diagram: Input → Scrambling (bit op, table, arithmetic), 64 rounds, reading a shift table and a sin table → Output]

  14. Performance Bottlenecks for Pixel Shader • [Diagram: Input → Scrambling (bit op, table, arithmetic), 64 rounds, reading a shift table and a sin table → Output]

  15. Our Optimization • shift table → reduced shift table • sin table → sin function • 64 rounds → loop unrolling • [Diagram: Input → Scrambling (bit op, table, arithmetic) → Output]
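
Two of these optimizations can be illustrated on the CPU. The MD5 specification defines its 64 additive constants as floor(2^32 * |sin(i)|) for i = 1..64 (in radians), so a table lookup can be traded for a sin evaluation. The 64 per-round shift amounts repeat in groups of four, so they fit in a 4 x 4 table. Whether this is exactly what the slide means by "reduced shift table" is an assumption; the names below are illustrative.

```python
import math


def sin_constant(i: int) -> int:
    """MD5 additive constant for round i (0-based), computed instead of stored."""
    return int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF


# The MD5 shift amounts: 16 distinct entries instead of 64, because each
# group of 16 rounds cycles through the same four values.
REDUCED_SHIFTS = [
    [7, 12, 17, 22],   # rounds 0-15
    [5, 9, 14, 20],    # rounds 16-31
    [4, 11, 16, 23],   # rounds 32-47
    [6, 10, 15, 21],   # rounds 48-63
]


def shift_amount(i: int) -> int:
    """Left-rotate amount for round i (0-based), from the reduced table."""
    return REDUCED_SHIFTS[i // 16][i % 4]


print(hex(sin_constant(0)), shift_amount(0))
print(hex(sin_constant(63)), shift_amount(63))
```

On a GPU the trade-off depends on the relative cost of a texture or constant fetch versus a sin instruction; this sketch only shows that both forms yield the same values.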

  16. Previous PRNG (O = strength, X = weakness) • GPU • BBS [Blum et al. 1986, Olano 2005] • O extremely fast • X not good quality • CEICG [Entacher et al. 1998, Sussman et al. 2006] • O decent quality • X processing time varies • AES [NIST 2001, Yamanouchi 2007] • O invertible (not hash) • X not good quality • CPU • rand • O commonly used • X not good quality • drand48 • O better quality • X slower • Mersenne Twister [Matsumoto and Nishimura 1998] • O high quality and fast • X not random accessible

  17. Assessing Quality: DIEHARD [Marsaglia 1995] • De facto standard on measuring PRNG quality • Runs 15 different tests on the bits generated • Outputs a p-value; if p == 0 || p == 1, fail. • Sample output:

      BIRTHDAY SPACINGS TEST, M= 512 N=2**24 LAMBDA= 2.0000
      Results for aes.bin
      For a sample of size 500:            mean
      aes.bin using bits 1 to 24           2.036
      duplicate     number      number
      spacings      observed    expected
      0             66.         67.668
      1             130.        135.335
      2             148.        135.335
      3             80.         90.224
      4             44.         45.112
      5             20.         18.045
      6 to INF      12.         8.282
      Chisquare with 6 d.o.f. = 4.50 p-value= .391147
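
The numbers in the birthday spacings output on slide 17 can be re-derived as a sanity check. The sketch assumes the expected counts are 500 times a Poisson distribution with mean LAMBDA = 2 and that the reported p-value is the chi-square cumulative distribution at the test statistic; both assumptions reproduce the printed figures. Variable and function names are illustrative.

```python
import math

# Observed counts for 0, 1, 2, 3, 4, 5 and "6 to INF" duplicate spacings.
observed = [66, 130, 148, 80, 44, 20, 12]
SAMPLES, LAM = 500, 2.0

# Expected counts: Poisson(LAM) probabilities for k = 0..5, plus the tail.
poisson = [math.exp(-LAM) * LAM**k / math.factorial(k) for k in range(6)]
expected = [SAMPLES * p for p in poisson] + [SAMPLES * (1 - sum(poisson))]

# Pearson chi-square statistic over the 7 bins (6 degrees of freedom).
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_cdf_even_dof(x: float, dof: int) -> float:
    """Chi-square CDF, using the closed form that holds for even dof."""
    half = x / 2
    terms = sum(half**j / math.factorial(j) for j in range(dof // 2))
    return 1 - math.exp(-half) * terms


p_value = chi2_cdf_even_dof(chi_square, 6)
print(round(chi_square, 2), round(p_value, 3))  # 4.5 0.391
```

A p-value near the middle of [0, 1], as here, means the observed counts are unremarkable for a good generator; values pinned at 0 or 1 indicate failure.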

  18. Cumulative Distribution Function • Shows how data is distributed within set • Given x in data, what % of data values are ≤ x • [Figure: cumulative distribution curves for a normal distribution and a uniform distribution; x from 0 to 1, y from 0% to 100%]

  19. Kolmogorov-Smirnov Test • Determines how alike two sets of data are • Looks at max difference D between distribution functions • [Figure: two pairs of cumulative distribution curves, one "not alike" (large D) and one "alike" (small D); x from 0 to 1, y from 0% to 100%]

  20. Assessing Quality: DIEHARD • Run the results of the DIEHARD test (p-values) through a KS test. Look at the D-value. • Smaller D is better quality! • [Figure: cumulative distribution function of the p-values plotted against the uniform distribution curve, with D marked between them]
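
A minimal sketch of the D-value computation, assuming the p-values are compared against the uniform distribution on [0, 1] using the standard one-sample Kolmogorov-Smirnov statistic. The function name is illustrative, and the p-values in the usage lines are made-up inputs, not results from the talk.

```python
def ks_distance_to_uniform(p_values):
    """Max gap D between the empirical CDF of p_values and the uniform CDF.

    Expects a non-empty sequence of numbers in [0, 1].
    """
    xs = sorted(p_values)
    n = len(xs)
    # At each sorted point x, the empirical CDF jumps from i/n to (i+1)/n,
    # while the uniform CDF equals x. D is the largest gap on either side.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


# Evenly spread p-values track the uniform curve closely (small D);
# p-values bunched near 0 do not (large D).
print(ks_distance_to_uniform([0.125, 0.375, 0.625, 0.875]))  # 0.125
print(ks_distance_to_uniform([0.01, 0.02, 0.03, 0.04]))      # 0.96
```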

  21. Assessing Quality: Power Spectrum • Radial mean: should be uniform • Radial variance: should be low & uniform • [Figure panels: power spectrum density, radial mean, radial variance (anisotropy)]

  22. Assessing Speed: Batch Rendering • Clock time to generate random bits • n^2 x 128 bits image, n = 512, 1024, 2048 and 4096 • [Figure: an n x n image]
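
For a sense of scale, the amount of data produced per batch follows directly from the n^2 x 128 bits figure on the slide. The helper name is illustrative.

```python
def batch_size_bytes(n: int, bits_per_sample: int = 128) -> int:
    """Bytes of random data in an n x n image with 128 bits per pixel."""
    return n * n * bits_per_sample // 8


for n in (512, 1024, 2048, 4096):
    print(n, batch_size_bytes(n) // 2**20, "MiB")
# 512 4 MiB
# 1024 16 MiB
# 2048 64 MiB
# 4096 256 MiB
```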

  23. Assessing Speed: Texture Subset (for random accessibility) • A huge virtual texture • clock time for accessing subsets A and B • measure difference • (smaller is better) • [Figure: a virtual texture with sides labeled 2^20, containing subsets A and B]

  24. Test Results: DIEHARD Results • [Charts: one metric where higher is better, one where lower is better]

  25. Test Results: Power Spectrum Tests • [Figure panels: MD5, Mersenne Twister, GPU BBS]

  26. Test Results: Batch Render Speed

  27. Test Results: Texture Subset Speed

  28. Trading Quality for Speed • Reducing # of rounds • O faster speed • X lower quality
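
To make the round count concrete, here is a sketch of MD5 on a single short message with the number of rounds exposed as a parameter. With 64 rounds it matches `hashlib.md5`; with fewer rounds the loop simply stops early. How the authors finalize a reduced-round hash is not stated on the slide, so that part, along with the name `md5_block`, is an assumption.

```python
import hashlib
import math
import struct

SHIFTS = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + \
         [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
SINES = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]
MASK = 0xFFFFFFFF


def rotl(x: int, c: int) -> int:
    """Rotate a 32-bit value left by c bits."""
    return ((x << c) | (x >> (32 - c))) & MASK


def md5_block(msg: bytes, rounds: int = 64) -> bytes:
    """MD5 of a message shorter than 56 bytes, truncated to `rounds` rounds."""
    assert len(msg) < 56, "sketch handles a single 64-byte block only"
    padded = msg + b"\x80" + b"\x00" * (55 - len(msg)) \
        + struct.pack("<Q", 8 * len(msg))
    m = struct.unpack("<16I", padded)
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    a, b, c, d = a0, b0, c0, d0
    for i in range(rounds):
        if i < 16:
            f, g = (b & c) | (~b & d), i
        elif i < 32:
            f, g = (d & b) | (~d & c), (5 * i + 1) % 16
        elif i < 48:
            f, g = b ^ c ^ d, (3 * i + 5) % 16
        else:
            f, g = c ^ (b | ~d), (7 * i) % 16
        f = (f + a + SINES[i] + m[g]) & MASK
        a, d, c = d, c, b
        b = (b + rotl(f, SHIFTS[i])) & MASK
    words = [(a0 + a) & MASK, (b0 + b) & MASK, (c0 + c) & MASK, (d0 + d) & MASK]
    return struct.pack("<4I", *words)


print(md5_block(b"abc").hex() == hashlib.md5(b"abc").hexdigest())  # True
print(md5_block(b"abc", rounds=16).hex())  # cheaper, but weaker mixing
```

The cost of the loop scales linearly with `rounds`, which is the speed side of the trade-off; the quality side would have to be measured with tests such as those on slides 17 to 21.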

  29. Applications • Texture tiling (fragment shader) • Fractal terrain (vertex shader)

  30. Future Work • Implement our method in hardware • very similar to texture unit but much smaller • (no need for cache) • Alternative hashes • ride with advances in cryptographic hash

  31. Thank You!
