Statistical Evaluation Tools in Cryptography: Theory and Applications

Randomness in CRYPTO(proof, estimate and apply) Emil Simion University Politehnica of Bucharest Bucharest, ROMANIA Email: esimion@fmi.unibuc.ro

Motivation

Agenda • Statistical evaluation tools. Examples. • Math. of statistical evaluation: decision & errors. Strong low of large numbers. • Model overview: NIST 800-22 STS. Examples of applications. • Tests improvements & New integration methods. • Work in progress. • Real life: - PlayStation 3 gaming-console; - Dual EC DRBG. • References.

Statistical tools (1) • Used for testing the degree of randomness of functions used in cryptographic applications; • Examples: • A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications, 800-22, 2001:15 statistical tests created by the Computer Security Research Center. It was used for evaluation of the candidates for Advanced Encryption Standard (FIPS PUB 197);

Statistical tools (2) Examples: • The Art of Computer Programming, Seminumerical Algorithms, Donald Knuth: frequency, serial, gap, poker, coupon collector’s, permutation, run, maximum-of-t, collision, birthday spacings, and serial correlation; • Crypt-XSsuite(Information Security Research Centre from Australia):frequency, binary derivative, change point, runs, sequence complexity and linear complexity;

Statistical tools (3) Examples: • DIEHARD suite(George Marsaglia)birthday spacings, overlapping permutations, ranks of 31 x 31, 32 x 32 and 6x8 matrices, monkey tests on 20-bit Words, monkey tests OPSO (Overlapping-Pairs-Sparse-Occupancy), OQSO (Overlapping-Quadruples-Sparse- Occupancy), overlapping sums etc.

Evaluation: Statistical testing • Basic of statistical testing: • decision rule based on a mathematical formula which decides between two or more statistical hypothesis; • two kinds of statistical errors: •  = probability to reject a true hypothesis; •  = probability to accept a false hypothesis; Classification of statistical tests: • Neyman-Pearson which defines the most powerful test, for  fixed is minimized the value of ; • Based on confidence intervals:  is fixed, we build using test function a confidence region and after that we estimate the value of second order risk . • Sequential: three possibilities accept, reject or make a new sample.

Statistical hypothesis Two statistical hypothesis regarding the binary sequence x{0,1}n, which is under test: a) H0: the sequence x is produced by a binary memory less source: Pr(X=1)=p0 and thus Pr(X=0)=1-p0 (in this case we say that the sequence does not present any predictable component) and the alternative: b) H1 the sequence x is produced by a binary memory less source: Pr(X=1)=p1 and Pr(X=0)=1-p1 with p0 p1, (in this case we say that the sequence present a predictable component).

Statistical errors • In statistical testing are two kinds of errors: the first order error denoted by  (also called level of significance, it is the probability of occurrence of a false positive result) and second order of error denoted by  (is the probability of the occurrence of a false negative result). This errors have the following interpretation:  =Pr (reject H0 | H0 is true)=1-Pr (not reject H0|H0 is true) and  =Pr (not reject H0 | H0 is false)=1-Pr (reject H0 | H0 is false). • These two errors can't be minimized simultaneous (Neymann-Pearson tests minimize the value of  for a given ).

Decision rule • Decision based on confidence intervals: • An example of decision rule is the following: for a fixed value of  we find the confidence region for the test statistic and check if the test statistic value is in the confidence region. • The confidence region is computed using the quantiles of order /2 and 1- /2 (for example the quantile u of order  is defined by Pr(X<u )=). • Decision based on P-value: • Let us denote ftest the value of test function. Another equivalent method is to compare the P-value = Pr(X<ftest) with  and decide the randomness if P-value .

Example: Graphical interpretation

Strong low of large numbers Leapunov’s theorem states that if (fn) is a sequence of independent random variables with the same distribution of mean m and variance , then for “large” n values we have:

Strong low of large numbers De Moivre’s theorem states that if (fn) is a sequence of binary independent random variables with Pr(X=1)=p and Pr(X=0)=q, then for “large” n values we have:

Model: NIST STS 800-22 • Statistical testing package designed by NIST (National Institute for Standards and Technologies) published in NIST SP 800-22; • Is composed from 15 statistical tests based on confidence intervals; • In NIST SP 800-22 is presented the pseudo-code and the description of each statistical test. Also there are some testing strategies and interpretation schemas; • The test detects a general class of defects of (pseudo)random generators; • NIST gives the source code and a software which has a GUI (in fact there are some versions available); • Rejection rate was setup at 1%.

NIST evaluation procedure General Evaluation Procedure a. Compute the test statistic. b. Compute its p-value. c. Compare the p-value to const: - SUCCESS whenever p-value > c - FAILURE whenever p-value < c c (0.001, 0.01].

Statistical criteria used to evaluate the AES candidate One of the criteria used to evaluate the Advanced Encryption Standard (AES) candidate algorithms was their demonstrated suitability as random number generators. That is, the evaluation of their outputs utilizing statistical tests should not provide any means by which to computationally distinguish them from truly random sources.

Two fundamental papers Juan Soto and Lawrence Bassham (NIST) IR 6390Randomness Testing of the Advanced Encryption Standard Candidate Algorithms IR 6483 Randomness Testing of the Advanced Encryption Standard Finalist Candidates

Samples constructions • 128-Bit Key Avalanche and Plaintext Avalanche; • Plaintext/Ciphertext Correlation; • Cipher Block Chaining Model; • Random Plaintext/Random 128-Bit Keys; • Low Density Plaintext and Low Density 128-Bit Keys; • High Density Plaintext and High Density 128-Bit Keys.

Example of frequency test improvement Input: binary sequence sn=s1,…,sn. Denote probability p0 of occurrences of symbol 1, and q0=1-p0 the occurrences of symbol 0. Output: The decision of acceptation or rejection of randomness, that the sequence sn is the output of a symmetric binary source with Pr (S=1)=p0, the alternative hypothesis being Pr (S=1)=p1p0. STEP 0. Read the sequence sn and the rejection rate . STEP 1. Compute the test function: STEP 2. If f[u/2, u1-/2] we accept the hypothesis of randomness, else reject it. STEP 3. Compute the second error probability(probability of acceptance a false hypothesis):

Minimum size sample Minimum size sample to achieve, with error >0, a given rejection rate :

New integration methods • NIST 800-22 provides two methods for integrating the 15 tests results, namely: • percentage of passed tests; • uniformity of P values. • New integration methodsfor the tests from the same “world”: • majority decision; • maximum value decision, based on the max value of tests statistics (ref. distribution normal); • sum of square decision, based on the sum of squares of the tests results (ref. distribution 2).

Work in progress • Independence (or correlation) between tests: one solution isby using a validated TRNG for estimating (statistics) the correlation coefficient; The tests are not jointly independent, making it difficult to compute a overall rejection rate (i.e. the power of the test). If the statistical tests would be independent, then the overall rejection rate, would be computed using the probability of the complementary event 1-(1-)15. • Computing the rejection rate of NIST 800-22 (difficult task because tests are not independent). How? Probably by estimation … (statistics).

Real life (extract the signing key used in the PlayStation 3 gaming-console ) Scenario: Crane Comp decides to protect, using ECDSA, the software application. But: use the same ephemeral key in signing two pieces of the software. Attack: Eve: recover the private signing key. The end of the story?

ECDSA How ECDSA works: similarly with El Gamal but with arithmetic over EC. Input: M – message, compute H(M) –hash Step 1: Choose a random k such that 0 < k < p − 1 and gcd(k, p − 1) = 1. Step 2: Compute r=gkmod p. Step 3: Compute: s=(H(M)-xr)k-1mod (p-1). Step 4: If start s=0 over again. Output: signature (r,s) of message M.

Digital signatures - ElGamal vulnerability Use the same k for signing two different messages M1 and M2. Derive the private signing key from signatures (r1,s1) and (r2,s2): r=r1=r2 x=(s1H(M2)-s2H(M1))(r(s1-s2))-1mod (p-1). Similarly vulnerability of ECDSA but with arithmetic over EC.

Lesson learned Security is a process not a product!

DUAL_EC_DRBG – the beginning 2000: NSA drives to include Dual_EC_DRBG in ANSI X9.82, when the standardization process kicks off in the early 2000; 2004: RSA makes Dual_EC_DRBG the default CSPRNG in BSAFE; 2005: The first draft of NIST SP 800-90A is released to the public, includes Dual_EC_DRBG; 2007: Bruce Schneier publishes an article with the title "Did NSA Put a Secret Backdoor in New Encryption Standard?“

DUAL_EC_DRBG – the revealing 2013, june 6th: The first news stories (unrelated to Dual_EC_DRBG) based on Edward Snowden's leak of NSA documents are published; 2013, sept. 5th: Existence of NSA's Bullrun program is revealed, based on the Snowden leaks. One of the purposes of Bullrun is described as being "to covertly introduce weaknesses into the encryption standards followed by hardware and software developers around the world." The New York Times states that "the NSA had inserted a back door into a 2006 standard adopted by NIST... called the Dual EC DRBG standard." 2013, sept. 10th: Gail Porter, director of the NIST Public Affairs Office, released a statement, saying that "NIST would not deliberately weaken a cryptographic standard “; 2013, december: Reuters reports on the existence of a $10 million deal between RSA and NSA to set Dual_EC_DRBG as the default CSPRNG in BSAFE. RSA deny …

DUAL_EC_DRBG – the end 2014, april: Following a public comment period and review, NIST removed Dual_EC_DRBG as a cryptographic algorithm from its draft guidance on random number generators, recommending "that current users of Dual_EC_DRBG transition to one of the three remaining approved algorithms as quickly as possible. "; 2014, august: Checkoway et al publish a research paper analyzing the practicality of using the EC-DRBG to build an asymmetric backdoor into SSL and TLS; 2015 feb: NSA's 'Apology' For Backdooring Crypto Standard (Michael Wertheimer, Director of Research at the National Security Agency).

DUAL_EC_DRBG simplified version attack a simplified version of Dual_EC_DRBG; the algorithm specification specifies an elliptic curve, which is basically just a finite cyclic (and thus Abelian) groupG. The algorithm also specifies two group elementsP,Q (chosen by an employee of the NSA); in the simplified algorithm, the state of the PRNG at timetis some integers.

run the PRNG forward one step we computesP(recall we use additive group notation; this is the same asPs, if you prefer multiplicative notation), convert this to an integer, and call itr. we computerP, convert it to an integer, and call its′(this will become the new state in the next step). we computerQand output it as this step's output from the PRNG. (OK, technically, we convert it to a bitstring in a particular way, but you can ignore that.)

Trapdoor we're pretty much guaranteed thatP=eQfor some integere. we don't know whateis, and it's hard for us to find it (that requires solving the discrete log problem on an elliptic curve, so this is presumably hard). however, since the NSA chose the valuesP, Q, it could have chosen them by pickingQrandomly, pickingerandomly, and settingP=eQ. in particular, the NSA could have chosen them so that they knowe.

Trapdoor usage and here the numbereis a backdoor that lets you break the PRNG. suppose the NSA can observe one output from the PRNG, namely,rQ. they can multiply this bye, to geterQ. Now notice thaterQ=r(eQ)=rP=s′. So, they can infer what the next state of the PRNG will be. this means they learn the state of your PRNG! that's really bad -- after observing just one output from the PRNG, they can predict all future outputs from the PRNG with almost no work. this is just about as bad a break of the PRNG as could possibly happen.

Consequences? of course, only people who knowecan break the PRNG. so this is a special kind of backdoor: the NSA presumably knows how to break Dual_EC_DRBG, but it's very unlikely that anyone else knows how to break it. even though we know now the backdoor exists, it's very hard to recover the secret backdoore, because that requires solving a discrete log problem.

Complicated? an analogous: a PRNG that is just like Dual_EC_PRNG, except that it uses integers, instead of elliptic curves. In particular, everything will be conceptually basically the same -- the backdoor will be the same, the PRNG will be the same -- this will just be easier to understand without any background in elliptic curve cryptography. Hopefully this will give the intuition better. so let’s specify: the PRNG; the backdoor; breaking the PRNG.

The PRNG the algorithm specifies a prime numberp, and two integersg,hthat are both less thanp. The algorithm tells you that the state of the PRNG at each point in time is some numbersthat satisfies1≤s<p. To step the PRNG forward by one iteration, you setr=gsmod p,s′=grmod p, update the state tos′, and outputt=hrmod p.

The backdoor the backdoor is a secret numberesuch that g=hemod p. the NSA, who created the algorithm specification, choseg, hby pickinghrandomly, pickingerandomly, setting g=hemod p, and then publishingg,h,p(but keepingesecret, sinceeis the backdoor).

Breaking the PRNG here's how the NSA can break this PRNG: - suppose they observe one outputtfrom the PRNG; - they computetemod p; - notice that te=(hr)e=hre=(he)r=gr=s′ (mod p). this means they were able to computes′, the next state of the PRNG. So, after observing just one output from the PRNG, they can infer the next state of the PRNG and thereby predict what all future outputs from the PRNG will be.

Lesson learned In security don’t hope or believe. You must know! Don’t use “bad“ pseudorandom generators!

References A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications, 800-22. (2001). NIST Special Publication. E. Simion et. al., Walsh-Hadamard Randomness Test and New Methods of Test Results Integration, Bulletin of Transilvania University of Braşov, vol. 2(51) Series III-2009, pg. 93-106. E. Simion and M. Andraşiu, Evaluation of cryptographic algorithms,Journal of Information Systems & Operation Management, vol. 5, no. 1, 2011, ISSN: 1843 – 4711, pp.51-61. E. Simion, The relevance of statistical tests in cryptography, IEEE Security & Privacy, 1540-7993/15, January/February 2015, pp. 66-70, (ISI Web of Science CPCI-S, Impact Factor 0.721, Influence Score 0.514, IEEEXplore).

Thank you!

Statistical Evaluation Tools in Cryptography: Theory and Applications

Statistical Evaluation Tools in Cryptography: Theory and Applications

Presentation Transcript

Proof, Computation, Randomness Kurt G del John von Neumann and Theoretical Computer Science

Understanding Randomness

Randomness and Repetition

Randomness and PSEudorandomness

Likelihood and Randomness

The Scientific Method and the Cycle of Proof

5.1 Indirect Proof

Randomness

Randomness and probability

Randomness in Computation and Communication

Randomness and Probability

7.1 Triangle Application Theorems and 7.2 Two Proof-Oriented Theorems

RANDOMNESS

Splash Screen

Clustering and randomness

How to get more mileage from randomness extractors

How To Apply Makeup For Oily Skin

Get Instant Funds with Car Title Loans in Saint John

Apply For Marriage Certificate In Mumbai

The Scientific Method and the Cycle of Proof

Apply similarity properties in the coordinate plane.

How to Get a Home Loan in Few Simple Steps