Merkle Tree Traversal in Log Space & Time

Michael Szydlo, RSA • Eurocrypt 2004 • May 6, 2004

Presentation overview • Review of Merkle Authentication Trees • Define the Traversal Problem • Describe classic traversal technique • Present new, space-efficient algorithm • Concluding comments

Merkle trees • Introduced by Ralph Merkle, 1979 • “Classic” cryptographic construction • Involves combining hash functions on a binary tree structure • A public-key authentication scheme • Uses only one-way hash functions as building blocks • No number theory or trapdoor permutations • Also public-key signatures (Lamport’s one-time signatures) • Theoretical and practical contexts • Receive less practical attention today due to number-theoretic schemes (e.g., RSA, DSA) • Not terribly inefficient. No number theory: an advantage? • Our contribution • Re-examine efficiency aspects of the construction • New algorithm: answers an “old question” about Merkle trees

Merkle tree data structure • Binary tree; nodes are assigned (e.g. 160-bit) values • An extra, secret value s_i is associated with each leaf • Interior nodes: v = Hash(v_left || v_right) • Leaves: v_i = Hash(s_i) • [Figure: binary tree showing the interior nodes, the leaves, and the secret s_i below each leaf]

A Public / Private key pair • How to generate a public / private key pair • Select a random (e.g. 160-bit) secret S • Derive leaf secrets s_i = PRF(S || i) • Use the hash function to get leaf / interior node values • Publish the root value as P • Key generation has a cost • A tree of height H has N = 2^H leaves • A node at height h depends on 2^h leaf values • Obtaining P requires calculating all N leaf values plus 2^H - 1 more hash function evaluations
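
The key-generation steps on this slide can be sketched as follows. This is a minimal illustration, not the talk's implementation: SHA-256 stands in for the 160-bit hash, HMAC-SHA256 keyed by S is an assumed choice of PRF, and the function names `leaf_secret` and `keygen` are hypothetical.

```python
import hashlib
import hmac


def H(data: bytes) -> bytes:
    """Hash function (assumption: SHA-256 instead of a 160-bit hash)."""
    return hashlib.sha256(data).digest()


def leaf_secret(seed: bytes, i: int) -> bytes:
    """s_i = PRF(S || i); HMAC-SHA256 keyed by S is an assumed PRF."""
    return hmac.new(seed, i.to_bytes(8, "big"), hashlib.sha256).digest()


def keygen(seed: bytes, height: int):
    """Return (P, leaf_calcs, interior_hashes) for a tree of the given height."""
    # Leaf values v_i = Hash(s_i), one per leaf: N = 2^H leaf calculations.
    level = [H(leaf_secret(seed, i)) for i in range(2 ** height)]
    leaf_calcs = len(level)
    interior_hashes = 0
    # Interior values v = Hash(v_left || v_right), level by level up to the root.
    while len(level) > 1:
        level = [H(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
        interior_hashes += len(level)
    return level[0], leaf_calcs, interior_hashes


P, leaf_calcs, interior_hashes = keygen(b"demo seed", 3)
print(P.hex(), leaf_calcs, interior_hashes)
```

For H = 3 the counters come out as 8 leaf calculations and 7 = 2^3 - 1 interior hashes, matching the cost stated on the slide. Building the tree level by level keeps all N leaf values in memory at once, which is acceptable for one-time key generation but is exactly what the traversal problem later tries to avoid.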

Authenticating a secret • Prover wishes to reveal s_i to identify herself • Prover sends i, s_i (each secret is used just once) • Additional data required: “sibling node” values • Verifier checks s_i against the public key P • Hash s_i first • Hash the result together with its sibling in the tree • Repeat, moving up the tree • Compare the result with the root
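
The verifier's check can be sketched as below. This is a minimal example under stated assumptions: SHA-256 replaces the 160-bit hash, the left child is always hashed first, the number of leaves is a power of two, and the helper names `build_levels`, `auth_path` and `verify` are hypothetical. The prover side here stores the whole tree purely for demonstration.

```python
import hashlib


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(secrets):
    """All node values, bottom level first (demo only: stores the whole tree)."""
    levels = [[H(s) for s in secrets]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([H(prev[j] + prev[j + 1]) for j in range(0, len(prev), 2)])
    return levels


def auth_path(levels, i):
    """Sibling node values for leaf i, from the leaf level up to just below the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[i ^ 1])  # the sibling index differs in the lowest bit
        i //= 2
    return path


def verify(i, secret, siblings, root):
    """Hash the secret, then hash together with each sibling, moving up the tree."""
    node = H(secret)
    for sib in siblings:
        node = H(node + sib) if i % 2 == 0 else H(sib + node)
        i //= 2
    return node == root


secrets = [bytes([k]) * 4 for k in range(8)]
levels = build_levels(secrets)
root = levels[-1][0]
print(verify(5, secrets[5], auth_path(levels, 5), root))
```

The verifier needs only i, s_i, the H sibling values and the public root; it never sees any other leaf secret.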

Sibling node values required • [Figure: Merkle tree. The root value is public; the sibling nodes along the path from the secret s_0 to the root are the ones required to authenticate the secret.] • Verify the secret value by hashing, then hashing together with the sibling, etc. • Accept if the result matches the root value

Digital signatures, too • Use up 1 leaf per authentication • Digital signature: use multiple leaves • Extends Lamport’s one-time signature scheme • Want to sign m = (m_0, m_1, ..., m_159) • Requires 160 pairs of secrets {s_i, t_i} • s_i is included in the signature if m_i = 0; otherwise t_i is • Verification requires sibling nodes, as above • Merkle construction provides signatures • Security is intuitive; how about efficiency?
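
The bit-by-bit selection rule on this slide is Lamport's one-time signature. The sketch below illustrates only that rule, under assumptions not stated in the talk: the 160 message bits are taken from a truncated SHA-256 digest, secrets are 20 random bytes, and the verifier is handed the hashed pairs directly. In the Merkle construction those hashed values would instead be authenticated through tree leaves and sibling nodes. All function names are hypothetical.

```python
import hashlib
import os

NBITS = 160


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def message_bits(message: bytes):
    """m = (m_0, ..., m_159): the first 160 bits of a SHA-256 digest (assumption)."""
    digest = hashlib.sha256(message).digest()
    return [(digest[k // 8] >> (7 - k % 8)) & 1 for k in range(NBITS)]


def keygen():
    """160 pairs of secrets (s_i, t_i) and their hashed counterparts."""
    sk = [(os.urandom(20), os.urandom(20)) for _ in range(NBITS)]
    pk = [(H(s), H(t)) for s, t in sk]
    return sk, pk


def sign(sk, message: bytes):
    """Reveal s_i if m_i = 0, otherwise t_i."""
    return [sk[k][bit] for k, bit in enumerate(message_bits(message))]


def verify(pk, message: bytes, signature):
    """Each revealed secret must hash to the committed value for its bit."""
    bits = message_bits(message)
    return all(H(signature[k]) == pk[k][bit] for k, bit in enumerate(bits))


sk, pk = keygen()
sig = sign(sk, b"hello")
print(verify(pk, b"hello", sig))
```

Each key pair must sign only one message: a second signature would reveal further secrets and let an attacker mix and match revealed values.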

Efficiency questions • Tacit assumption: all node values are saved. • A useful Merkle tree has many leaves! • E.g., N = 2^30 allows many authentications / signatures. • Not practical for a weak prover! • Store all node values? Too much space! • N = 2^H leaves, N-1 interior nodes • Recalculate from scratch? Too much time! • An interior node near the top requires 2^H - 1 hash operations

The traversal problem • Formulate an efficient Prover algorithm. • Must output authentication data for each leaf, in sequence: (on round i, s_i with its associated sibling nodes) • Prover has limited memory • Prover should compute few hash values per round • Metrics • Space: 1 unit = 1 stored node value • Time: 1 unit = 1 leaf calc. or 1 interior node calc. • Note: this analysis fixes the security parameter.

Traversal challenge • Higher node: used for 2^20 rounds, costs ~2^21 • Lower node: used for 2^5 rounds, costs ~2^6 • (Note: the “per round” cost is < 2)

Merkle’s amortization technique • Uses space-efficient node computation • Costly nodes are computed over many rounds • Form of the algorithm, on each round: • Output s_i with sibling values • Discard “expired” sibling values • For each height, work on preparing the “upcoming” sibling • Upcoming values should be ready on time • Merkle’s result for a tree with N = 2^H leaves • O(log(N)) = O(H) time per round. • Space bounded by O(log^2(N)) = O(H^2)

TREEHASH • Calculate a height h node using space = h+1 • Simply erase values that are no longer required • Adding a leaf or an internal node is 1 “unit” of work • The evolving set of stored nodes is called the tail (its members are tail nodes) • Example with h=3
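
A stack-based sketch of TREEHASH, assuming SHA-256 as the hash and a caller-supplied `leaf_value(i)` function (both assumptions; the slide fixes neither). It also reports the peak number of stored values and the units of work, so the space = h+1 claim can be checked for small h.

```python
import hashlib


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def treehash(leaf_value, start: int, height: int):
    """Compute the height-`height` node above leaves start .. start + 2^height - 1.

    Returns (node_value, peak_stored_values, units_of_work).
    """
    stack = []  # the tail: (height, value) pairs, lowest node on top
    peak = 0
    work = 0
    for i in range(start, start + 2 ** height):
        node = (0, leaf_value(i))  # adding a leaf: 1 unit
        work += 1
        peak = max(peak, len(stack) + 1)  # tail plus the fresh leaf
        # Merge while the top of the tail has the same height as the new node.
        while stack and stack[-1][0] == node[0]:
            h, left = stack.pop()
            node = (h + 1, H(left + node[1]))  # interior node: 1 unit
            work += 1
        stack.append(node)
    return stack[0][1], peak, work


leaf = lambda i: H(i.to_bytes(4, "big"))
value, peak, work = treehash(leaf, 0, 3)
print(peak, work)
```

For the slide's example h = 3 this stores at most 4 = h+1 values and spends 15 = 2^(h+1) - 1 units of work (8 leaves plus 7 interior nodes).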

Merkle’s amortization (2) • Prover’s initial internal state • Contains Current and Next sibling value for each height h<H • Prover’s internal state (later points) • Contains Current sibling value for each height h<H • For each height, contains Next sibling, OR a partial TREEHASH computation for Next. • Per-round update procedure • Output leaf secret and Current sibling nodes • Discard “expired” sibling nodes, promote Next to Current • Spend maximum 2 units of work towards the TREEHASH procedure for each height

Merkle’s amortization (3) • Nodes are ready on time • 2 units per round is enough • The cost of 2^(h+1) is spread over 2^h rounds • Time per round is linear in tree height • O(log(N)) = O(H) time per round. • Total space is quadratic in tree height • At each height a TREEHASH may be in progress. • Space for TREEHASH < 1 + 2 + 3 + ... + H • Space bound: O(log^2(N)) = O(H^2)
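
The arithmetic behind these bounds can be written out as follows. This is a worked sketch assuming N = 2^H, one TREEHASH instance per height h < H, and the unit costs defined earlier in the talk.

```latex
% Work to compute one node at height h: 2^h leaf calcs plus 2^h - 1 interior hashes.
\[
  2^{h} + (2^{h} - 1) = 2^{h+1} - 1 < 2 \cdot 2^{h}
\]
% Spread over the 2^h rounds before the node is needed, this is under 2 units
% per round per height, so the total time per round is
\[
  \sum_{h=0}^{H-1} 2 = 2H = O(\log N).
\]
% A TREEHASH for height h stores at most h + 1 values, so the tails together need
\[
  \sum_{h=0}^{H-1} (h + 1) = \frac{H(H+1)}{2} = O(H^{2}) = O(\log^{2} N).
\]
```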

Recap of classic traversal • Merkle’s Solution indeed satisfactory • Medium / Large Merkle trees practical • Less efficient than number theory approaches • Security properties transparent • No random oracles, etc • Conjecture classic traversal is “optimal”?

Related work • Time-space trade-off, RSA’03 • Jakobsson, Micali, Leighton, Szydlo • Idea: use “sub trees” of height T • Speeds up the Prover by a factor of T! • Increases space by a factor of 2^T

This work • New traversal algorithm • Still O(log(N)) time • Space required reduced to O(log(N)) • This is optimal in the following sense • Space must be at least O(log(N)): easy to see • No traversal algorithm has both time < O(log(N)) and space = O(log(N)) • Proof in paper

Motivation for improvement • Tails of concurrent TREEHASH computations • Graphic reminder of why space is O(log^2(N)) • [Figure labels: tail at height h, up to h+1 values; up to h tail pebbles; up to h-1 tail pebbles] • Many tails contain pebbles at the same height. Can this be avoided?

Wasteful concurrent computation • Example: two TREEHASH instances. • Each must compute a node value at height 3 as a sub-goal • Assume they start at the same time • Classic traversal: 2 units of work per round to each • Maximum space 4+4 = 8 • Re-allocate the 4 units per round • Complete the first, then do the second • Maximum space 1+4 = 5 • Rescheduling saves space, and nodes are still completed on time. • Look for a scheduling algorithm that avoids such concurrent node computations.
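
The 8-versus-5 comparison can be reproduced with a small simulation. It is a sketch that tracks only node heights, not hash values, and the names `treehash_sizes` and `peak_space` are hypothetical. Each unit of work either adds a leaf to the tail or combines the two topmost nodes of equal height.

```python
def treehash_sizes(height: int):
    """Yield the number of stored values after each unit of TREEHASH work."""
    stack = []  # heights of the tail nodes, lowest on top
    while stack != [height]:
        if len(stack) >= 2 and stack[-1] == stack[-2]:
            stack.pop()      # combine two equal-height nodes ...
            stack[-1] += 1   # ... into their parent: 1 unit
        else:
            stack.append(0)  # add a new leaf: 1 unit
        yield len(stack)


def peak_space(order, height: int = 3) -> int:
    """Run two instances; `order` lists which instance gets each unit of work."""
    runs = [treehash_sizes(height), treehash_sizes(height)]
    sizes = [0, 0]
    peak = 0
    for k in order:
        sizes[k] = next(runs[k])
        peak = max(peak, sum(sizes))
    return peak


# Each height-3 computation needs 2^(3+1) - 1 = 15 units.
concurrent = [0, 0, 1, 1] * 7 + [0, 1]  # 2 units to each instance per round
sequential = [0] * 15 + [1] * 15        # finish the first, then do the second
print(peak_space(concurrent), peak_space(sequential))
```

Both schedules spend 30 units in total; only the ordering differs, and the sequential one never holds two partially built tails at once.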

New algorithm: “Zipping” up the tails • Apply the budget to meet two kinds of requirements • Avoid working on height h nodes from different tails • Ensure completion of nodes with a short deadline. • Solution: this compromise algorithm satisfies both • Focus computational attention on the nodes with the shortest deadline • Delay beginning a new height h node until the other TREEHASH instances are partially completed, with no tail nodes below height h • So we zip up the tails before diverting attention • Essentially, the schedule is arranged to leave fewer tail nodes • What is the effect of this rescheduling? • Question 1: Are the nodes completed on time? • Question 2: How much space is needed now?

Nodes completed on time • Informal justification • For a node at height h, the delay is < 2^(h+1) • This is only 2 per round over a period of 2^h rounds • Long time to recover from the delay • Formal proof involves computation • Fix any period of 2^h rounds • Identify all “deadlines” and the maximum delay • Tabulate the total required computation units • This is less than the total budget over the period • Experimental verification (via implementation) • The algorithm works in time 2 log(N) per round

Less space is used • Easy to see why space is O(log(N)) • At each height at most 4 values are stored. • Exactly one current sibling value • At most 1 completed next sibling value • At most 2 tail values • Total space required: 3 log(N) • Tail pebbles occur when a sibling is incomplete

Result of new algorithm • Traversal of a Merkle tree with N leaves • Space bounded by 3 log(N) • [ node storage units ] • Time is 2 log(N) • [ leaf calc units, hash evaluation units ] • Answers classic Merkle traversal problem. • Asymptotically optimal
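
To put these bounds in concrete terms, the sketch below compares storing every node with the 3 log(N) space bound for the N = 2^30 tree mentioned earlier. It assumes log means log base 2 (so log(N) = H) and 20-byte (160-bit) node values; the function name `compare_costs` is hypothetical.

```python
def compare_costs(height: int, value_bytes: int = 20) -> dict:
    """Storage and per-round time for a tree with N = 2^height leaves."""
    n_leaves = 2 ** height
    store_all_values = 2 * n_leaves - 1  # N leaves + N-1 interior nodes
    log_space_values = 3 * height        # 3 log(N) bound from the talk
    return {
        "store_all_values": store_all_values,
        "store_all_bytes": store_all_values * value_bytes,
        "log_space_values": log_space_values,
        "log_space_bytes": log_space_values * value_bytes,
        "time_units_per_round": 2 * height,  # 2 log(N) bound from the talk
    }


for name, amount in compare_costs(30).items():
    print(f"{name}: {amount}")
```

With H = 30, storing all nodes takes about 43 GB at 20 bytes per value, while the log-space bound is 90 stored values (1800 bytes) with at most 60 units of work per round.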

Improved constants? • The constants are not optimal • Example: retain left nodes to halve the time • Manuscript on webpage rsasecurity.com / szydlo.com • Can the technique be combined with JMLS’03? • The main focus there was to increase speed, at a cost in space • The zipping technique still always saves some space

Practical ramifications • Merkle authentication & signatures more feasible on space constrained devices • Easy relationship between tree size and speed • Speed up if smaller tree size acceptable • Possible bonus for longer term assurance • hedge against number theory breakthrough

Conclusions • Merkle Trees - interesting after 25 years. • Viable for practical applications? • Need not be only a theoretical construction • More efficient than widely believed. • Further directions • Use as a tool in larger crypto protocols • Improve constants • good implementations, compare speed to RSA • What else can we do without number theory based cryptography?
