Factoring Large Numbers with Programmable Hardware

Hea Joung Kim and William H. Mangione-Smith
UCLA Electrical Engineering Dept.
kimmer,billms@icsl.ucla.edu

Background: Rivest, Shamir, Adleman (RSA)
• RSA is a cryptosystem used for secure internet transactions (many secure websites use it); its security relies on the fact that large numbers are hard to factor
• RSA-N: n = p*q and Y = (p-1)*(q-1) = n+1-p-q
• e is a key that shares no common factors with Y, and d = e^(-1) mod Y (e and d are the keys)
• Enciphering transformation: f(P) = P^e mod n, where P is the plain text
• Deciphering transformation: f^(-1)(C) = C^d mod n, where C is the cipher text
• Why factor large numbers? To verify that RSA is safe
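The key relations on this slide can be checked with a toy round trip. This is only a sketch: the primes 61 and 53, the key e = 17, and the function names are illustrative assumptions, far too small for real use.

```python
# Toy RSA round trip with deliberately tiny, hypothetical primes.
p, q = 61, 53              # illustrative primes; real RSA primes are hundreds of digits long
n = p * q                  # public modulus, n = p*q
Y = (p - 1) * (q - 1)      # Y = (p-1)*(q-1) = n + 1 - p - q
e = 17                     # public key; must share no common factor with Y
d = pow(e, -1, Y)          # private key d = e^(-1) mod Y (requires Python 3.8+)


def encipher(P):
    """f(P) = P^e mod n, where P is the plain text as an integer below n."""
    return pow(P, e, n)


def decipher(C):
    """f^(-1)(C) = C^d mod n, where C is the cipher text."""
    return pow(C, d, n)
```

Anyone who can factor n into p and q can recompute Y and therefore d, which is why the difficulty of factoring matters here.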

Number Factoring
• x^2 - y^2 = (x-y)(x+y) = n
• x^2 = y^2 mod n
• s = b^2 mod n, where s = x^2 and b = y
• n = p*q can be rewritten as n = ((p+q)/2)^2 - ((p-q)/2)^2 = x^2 - y^2
• Search x in the range sqrt(n) < x < n for a value that makes x^2 - n a perfect square y^2
• A factor is then gcd(x-y, n) or gcd(x+y, n)
• For example, n = 23360947609:
• sqrt(n) is approximately 152842.88
• Try x = 152,843 and find that y^2 = x^2 - n is not a perfect square
• Eventually, try x = 152,845 and find that x^2 - n = 646416 = y^2 = 804^2
• Performing gcd(x+y, n) using the Euclidean algorithm, we find p = 153,649 and q = 152,041
• Actual factoring algorithms reduce the guessing time using more sophisticated mathematics
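The search in the example can be sketched in a few lines. The function name `fermat_factor` is hypothetical, and the sketch assumes n is odd and greater than 1, since otherwise the loop may not terminate.

```python
from math import isqrt


def fermat_factor(n):
    """Search x upward from sqrt(n) until x^2 - n is a perfect square y^2.

    Returns (x + y, x - y, x, y). Assumes n is odd and greater than 1.
    """
    x = isqrt(n)
    if x * x < n:
        x += 1                      # start at the first integer >= sqrt(n)
    while True:
        y2 = x * x - n
        y = isqrt(y2)
        if y * y == y2:             # x^2 - n is a perfect square
            return x + y, x - y, x, y
        x += 1


p, q, x, y = fermat_factor(23360947609)
print(p, q, x, y)
```

For the slide's n the loop stops on the third candidate, x = 152845, with y = 804.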

Quadratic Sieve
• N = p*q is the number to factor; try to find x^2 = y^2 mod N
• Generate a large number of values y1, y2, …, ym; compute yi^2 mod N and factor it into a product of primes pj from a base B of small primes (a value that factors completely over B is called smooth)
• Instead of guessing randomly, start with yi near sqrt(N)
• Q(z) = (z + sqrt(N))^2 mod N, where z = 1, 2, …, m
• In non-modular form, Q(z) = (z + sqrt(N))^2 - N
• Solve the quadratic congruence Q(z) = 0 mod pj for two roots r1 and r2; the solutions are r1 + pj*l and r2 + pj*l for l = 0, 1, 2, …
• Initialize a sieve array to zero over [-M, M]; at each root entry and at every successive increment of pj, add log pj
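A small sketch of the polynomial and its roots may help. It assumes the offset inside Q(z) is the integer part of sqrt(N), and it finds the roots by brute force, which is only practical for small primes. The value N = 15347 and the names `Q` and `roots_mod_p` are illustrative.

```python
from math import isqrt


def Q(z, N):
    """Non-modular form Q(z) = (z + floor(sqrt(N)))^2 - N (offset is an assumption)."""
    return (z + isqrt(N)) ** 2 - N


def roots_mod_p(N, p):
    """Brute-force the roots of Q(z) = 0 mod p in the range 0..p-1."""
    return [r for r in range(p) if Q(r, N) % p == 0]


N = 15347                       # hypothetical small number to factor
r1, r2 = roots_mod_p(N, 17)     # two roots for the base prime 17
# Every location r + 17*l is divisible by 17, so the sieve can step by 17.
print(r1, r2, [Q(r1 + 17 * l, N) % 17 for l in range(4)])
```

Because p divides Q(r + p*l) for every l, the sieve never needs a division: it only steps through the array in increments of p.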

Siever's Usage of CPU Time
This table clearly shows that the sieve dominates the runtime of a number factoring program. MPQS.c, a program available on the web, was used to obtain these numbers.

Sieving in Hardware
• Sieving is the core process used in number factoring
• The primary task performs simple additions on an extremely large memory array
• The memory array is addressed in increments of prime numbers
• Each memory location represents an integer that needs to be analyzed: Q(z) for z = 1, 2, …, m
• The prime numbers are selected from a base set B
• The goal is to find integers that have an integer square root, that is, integers composed of primes raised to even powers

Sieving
Sieving is the most compute-intensive step. Its primary purpose is to find numbers that are products of many primes raised to some even power.
s1 = b1^2 mod n = p1^(a11) * p2^(a12) * p3^(a13) * p4^(a14) * … * pj^(a1j)
s2 = b2^2 mod n = p1^(a21) * p2^(a22) * p3^(a23) * p4^(a24) * … * pj^(a2j)
s3 = b3^2 mod n = p1^(a31) * p2^(a32) * p3^(a33) * p4^(a34) * … * pj^(a3j)
:
sk = bk^2 mod n = p1^(ak1) * p2^(ak2) * p3^(ak3) * p4^(ak4) * … * pj^(akj)
Sieving finds numbers that are perfect squares to solve x^2 = y^2 mod n.
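The exponents a11, a12, … in these factorizations can be computed by trial division once a candidate is known. The sketch below is illustrative only: the base [2, 17, 23, 29], the sample values, and the function names are assumptions, and candidates are assumed to be positive integers.

```python
def exponent_vector(s, base):
    """Trial-divide s (assumed >= 1) by each base prime.

    Returns the list of exponents, or None if s does not factor
    completely over the base (that is, s is not smooth).
    """
    exps = []
    for p in base:
        a = 0
        while s % p == 0:
            s //= p
            a += 1
        exps.append(a)
    return exps if s == 1 else None


def all_even(exps):
    """True when every prime appears to an even power, so the value is a perfect square."""
    return all(a % 2 == 0 for a in exps)


base = [2, 17, 23, 29]                  # hypothetical prime base B
for s in (782, 529, 278):               # illustrative candidate values
    print(s, exponent_vector(s, base))
```

Here 529 = 23^2 has an all-even exponent vector and is a perfect square on its own, 782 = 2 * 17 * 23 is smooth but not a square, and 278 = 2 * 139 is rejected because 139 is outside the base.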

Basic Sieving Operation
To find numbers composed of primes raised to some power, we take a root (an integer converted to an array location), address successive locations by incrementing the address by the prime number, and add the log of the prime at each location.
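In software the operation described above amounts to a strided read-add-write loop. This is a minimal sketch with a floating-point array and hypothetical names; the hardware's actual number format for the logs is not stated in the slides.

```python
from math import log


def log_sieve(size, entries):
    """Add log(p) at root, root + p, root + 2p, ... for each (p, root) pair.

    `entries` is an iterable of (prime, root) pairs, where root is already
    an array location in the range 0..size-1.
    """
    sieve = [0.0] * size                     # sieve array initialized to zero
    for p, root in entries:
        for addr in range(root, size, p):    # step through the array by the prime
            sieve[addr] += log(p)            # read contents, add log p, write back
    return sieve


sieve = log_sieve(12, [(3, 2), (5, 1)])
print([round(v, 3) for v in sieve])
```

Locations that accumulate a large sum are the ones divisible by many base primes, so they are the candidates worth factoring completely.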

Significance of Prime Number: Addressing Unit
• Each FPGA has four 256K x 16 SRAMs
• The initial address is some root of the polynomial
• From the root, memory accesses are made to locations at prime intervals in the array
• With 4 independently addressable SRAMs, there is never a memory access conflict
• For example, if the root = 2 and the prime is 3, the 4 memory accesses (locations 2, 5, 8, and 11) all go to different SRAMs
This exploits the unique property of prime numbers: an odd prime stride shares no common factor with the number of banks, so there are no conflicts when accessing multiple independent memory banks.
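The conflict-free claim can be checked with a short sketch. It assumes array locations are interleaved across the four SRAMs so that the bank number is the address mod 4; the slides do not state the mapping, so this is an assumption, and the function names are hypothetical.

```python
NUM_BANKS = 4  # four independently addressable SRAMs per FPGA


def banks_touched(root, p, count=NUM_BANKS):
    """Bank of each of `count` consecutive sieve accesses, assuming bank = address mod 4."""
    return [(root + k * p) % NUM_BANKS for k in range(count)]


def conflict_free(root, p):
    """True when four consecutive accesses land in four different banks."""
    return len(set(banks_touched(root, p))) == NUM_BANKS


print(banks_touched(2, 3))      # the slide's example: root 2, prime 3
print(conflict_free(2, 2))      # the even prime 2 is the one exception
```

Under this mapping every odd prime is conflict free for any root, because the stride shares no factor with 4. The prime 2 alternates between only two banks.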

System Overview: Mojave Board
• Mojave interfaces to the i960 development system for in-house testing
[Figure: system block diagram. Labels: Static FPGA, Bus Connector, Host Processor, Pentium Processor, PCI Slot, PCI Bus]

UCLA Mojave 3.0 Board Architecture
[Figure: board block diagram. Labels: FPGAs F0, F1, F2, F3, F4; 256K x 16-bit asynchronous memory; Ext I/O; i960 Cyclone board w/ 32 MB DRAM; PCI. Bus-width labels: 96, 40, 38]

Implementation on the Mojave Board
• The SRAMs hold the sieve arrays (the large integer set)
• The main FPGA acts as the interface to the i960 processor and the 16MB of DRAM available on the Cyclone
• 6MB is used for the roots and prime base set in the sieving operations
• The main FPGA acts as the master in reading the prime and root numbers
• The FPGA operates on all 4 SRAMs simultaneously
• Read contents, add log p, and write back
• Currently this operates at 16MHz

Parallel Operation with Multiple FPGAs
Further parallel execution can be realized with 4 processing elements, simply because of the prime number addressing.

Performance Comparisons

Conclusion
• The current bottleneck is the memory access time of the 70ns SRAMs (an upgraded board with 8ns SRAMs will improve the performance)
• The unique properties of prime numbers allow highly parallel accesses to memory and thus improve performance
• More I/Os dedicated to the SRAMs on an FPGA will further boost performance
• In this instance, a faster logic device or ASIC is not necessary as long as the memory access time is the bottleneck
• Work on a multiple-FPGA implementation is in progress
