
An Efficient Polynomial Multiplier in GF(2^m) and its Application to ECC Designs

Steffen Peter and Peter Langendörfer

Presentation Transcript


  1. An Efficient Polynomial Multiplier in GF(2^m) and its Application to ECC Designs • Steffen Peter and Peter Langendörfer

  2. Outline • Motivation and introduction to ECC • Basic polynomial multiplication approaches • Combinatorial polynomial multiplier • Iterative polynomial multiplier • Implications for the ECC design

  3. Elliptic Curve Cryptography • Asymmetric cryptography • Trapdoor: elliptic curve point multiplication • one can compute Q = kP • it is infeasible to determine k for given Q and P • Higher security with shorter keys than RSA • Recommended key lengths [Lenstra & Verheul, “Selecting Cryptographic Key Sizes”]

  4. ECC in Software or Hardware? • 233-bit ECC: in software on a MIPS processor, or on an ECC hardware accelerator? • Time for one ECPM (EC point multiplication): MIPS 410 ms, HW 0.4 ms • Energy for one ECPM: MIPS 16.5 mWs, HW 0.03 mWs

  5. ECC Pyramid

  6. EC Cryptographic Operations • Cryptographic protocols • Signature generation/verification • Encryption/decryption • Executed on a CPU • May use the ECC accelerator for subroutines [Figure: CPU (MIPS, ARM, LEON,…) and ECC co-processor]

  7. EC Point Operations • Operations on points on the elliptic curve • Point addition: Point + Point • Point multiplication: integer · Point (Montgomery/López-Dahab point multiplication) • Executed on the co-processor [Figure: CPU and ECC co-processor]
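To make Q = kP concrete, here is a minimal sketch of a Montgomery ladder on a deliberately tiny binary curve. Everything concrete is an illustrative assumption, not taken from the slides: the curve y^2 + xy = x^3 + Ax^2 + B over GF(2^4) with field polynomial x^4 + x + 1, A = B = 1, affine coordinates, and `None` as the point at infinity. The ladder performs one point addition and one doubling per key bit; the López-Dahab variant mentioned on the slide instead works with projective x-coordinates only, which this sketch does not attempt. The names `f_mul`, `f_inv`, `point_add`, `ladder` and `repeated_add` are hypothetical.

```python
# Toy elliptic curve point multiplication Q = kP with a Montgomery ladder.
# Illustrative assumptions (not from the slides): curve y^2 + xy = x^3 + A*x^2 + B
# over GF(2^4) with field polynomial x^4 + x + 1, A = B = 1, affine coordinates,
# and None as the point at infinity.

FIELD_POLY, M = 0b10011, 4
A, B = 1, 1


def f_mul(u: int, v: int) -> int:
    """GF(2^4) multiply: shift-and-XOR with reduction after every shift."""
    r = 0
    while v:
        if v & 1:
            r ^= u
        v >>= 1
        u <<= 1
        if u >> M:  # degree 4 reached: subtract (XOR) the field polynomial
            u ^= FIELD_POLY
    return r


def f_inv(u: int) -> int:
    """Inverse of a nonzero element via Fermat: u^(2^4 - 2) = u^14."""
    r = 1
    for _ in range(14):
        r = f_mul(r, u)
    return r


def on_curve(P) -> bool:
    if P is None:
        return True
    x, y = P
    x2 = f_mul(x, x)
    return (f_mul(y, y) ^ f_mul(x, y)) == (f_mul(x2, x) ^ f_mul(A, x2) ^ B)


def point_add(P, Q):
    """Affine group law: chord, tangent, inverse and infinity cases."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2:
        if y1 != y2 or x1 == 0:  # Q = -P, or P = -P: the sum is infinity
            return None
        lam = x1 ^ f_mul(y1, f_inv(x1))  # tangent slope (doubling)
        x3 = f_mul(lam, lam) ^ lam ^ A
        y3 = f_mul(x1, x1) ^ f_mul(lam, x3) ^ x3
    else:
        lam = f_mul(y1 ^ y2, f_inv(x1 ^ x2))  # chord slope (addition)
        x3 = f_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ A
        y3 = f_mul(lam, x1 ^ x3) ^ x3 ^ y1
    return (x3, y3)


def ladder(k: int, P):
    """Montgomery ladder, MSB first; invariant R1 - R0 = P."""
    R0, R1 = None, P
    for bit in bin(k)[2:]:
        if bit == "1":
            R0, R1 = point_add(R0, R1), point_add(R1, R1)
        else:
            R0, R1 = point_add(R0, R0), point_add(R0, R1)
    return R0


def repeated_add(k: int, P):
    """Reference result: P + P + ... + P (k times)."""
    R = None
    for _ in range(k):
        R = point_add(R, P)
    return R


# A base point with x != 0, found by brute force over all 16 x 16 coordinates.
P = next((x, y) for x in range(1, 16) for y in range(16) if on_curve((x, y)))

if __name__ == "__main__":
    print("P =", P, " 7P =", ladder(7, P))
```

Recovering k from Q and P here is trivial because the group has at most 25 elements; at 233 bits and beyond, the same ladder is cheap while the reverse direction is what the slide calls infeasible.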

  8. EC Point Operations

  9. Finite Field Operations • Operations in the finite field: • Addition/subtraction (m-bit XOR) • Multiplication (m-bit · m-bit) • Squaring (much faster than multiplication) • Division (very expensive) • Each EC point operation requires operations in the finite field • E.g., one 233-bit EC point multiplication requires: • 1200 additions • 1500 multiplications (233-bit) • 800 squarings • 1 division

  10. Basic Field Operations • Prime fields (GF(p)) • p is a very large prime (about 200 bits) • additions require carries • preferred for software implementations • Binary extension fields (GF(2^m)) • m is the bit length of the field (typically 160-283 bits) • easy hardware representation (m-bit array) • no carries (additions are simple XOR operations) • preferred for hardware implementations
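The binary-field operations listed above can be sketched in a few lines, storing an element as an integer whose bit i is the coefficient of x^i. As an assumption (the slides do not name a polynomial), the sketch reduces modulo the NIST B-233/K-233 trinomial f(x) = x^233 + x^74 + 1. Addition is a plain XOR; squaring only spreads the bits apart before reduction, which is why it is much cheaper than a general multiplication; and the naive Fermat inversion below needs 232 squarings and 232 multiplications, which hints at why division is the expensive operation. All function names are hypothetical.

```python
# Sketch of GF(2^m) arithmetic for m = 233; bit i of an int is the coefficient of x^i.
# Assumption: reduction polynomial f(x) = x^233 + x^74 + 1 (NIST B-233/K-233 field).

M = 233
F = (1 << 233) | (1 << 74) | 1


def gf_add(a: int, b: int) -> int:
    """Addition and subtraction coincide: an m-bit XOR without carries."""
    return a ^ b


def clmul(a: int, b: int) -> int:
    """Carry-less polynomial product (degree up to 2m - 2, not yet reduced)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def gf_reduce(c: int) -> int:
    """Reduce modulo f(x): cancel the leading term until deg(c) < m."""
    while c.bit_length() > M:
        c ^= F << (c.bit_length() - 1 - M)
    return c


def gf_mul(a: int, b: int) -> int:
    return gf_reduce(clmul(a, b))


def gf_sqr(a: int) -> int:
    """Squaring is linear over GF(2): coefficient a_i simply moves to x^(2i)."""
    r, i = 0, 0
    while a:
        if a & 1:
            r |= 1 << (2 * i)
        a >>= 1
        i += 1
    return gf_reduce(r)


def gf_inv(a: int) -> int:
    """Inverse of nonzero a via Fermat: a^(2^m - 2) = product of a^(2^i), i = 1..m-1."""
    r, s = 1, a
    for _ in range(M - 1):
        s = gf_sqr(s)  # s = a^(2^i)
        r = gf_mul(r, s)
    return r


def gf_div(a: int, b: int) -> int:
    """Division a / b = a * b^(-1); b must be nonzero."""
    return gf_mul(a, gf_inv(b))
```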

  11. Utilization/Area of Functional Blocks [Table: utilization and silicon area per functional block; block names not recoverable; utilization 15%, 95%, 50%; area 70%, 5%, 20%]

  12. Classic (school) Polynomial Multiplication • $c(x) = a(x) \cdot b(x) = \sum_{i=0}^{m-1} \big(a(x) \,\&\, b_i\big)\, x^i$, where $b_i$ is the coefficient of $x^i$ in $b(x)$: the partial products $a(x) \,\&\, b_0, \dots, a(x) \,\&\, b_{m-1}$ are shifted and added (XORed)

  13. Classic Polynomial Multiplication • Gate count: $m^2$ AND gates, $(m-1)^2$ XOR gates • Longest path: 1 AND + $\log_2 m$ XOR
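The classic method can be modelled as the sketch below (hypothetical names `schoolbook_mul` and `cpm_gate_count`). The loop is sequential in software, whereas the hardware forms all m partial products at once with AND gates and sums each output bit in an XOR tree, which gives the 1 AND + log2 m XOR longest path. The gate-count helper simply evaluates the formulas from the slide.

```python
def schoolbook_mul(a: int, b: int, m: int) -> int:
    """Classic polynomial multiplication over GF(2) for m-bit operands.

    For each bit b_i, AND every bit of a(x) with b_i (one partial product),
    shift it by i and XOR it into the result. No reduction is applied, so
    the result has up to 2m - 1 bits.
    """
    c = 0
    for i in range(m):
        b_i = (b >> i) & 1
        partial = a if b_i else 0  # a(x) & b_i: m AND gates in hardware
        c ^= partial << i          # accumulate with XOR (no carries)
    return c


def cpm_gate_count(m: int) -> tuple[int, int]:
    """Gate count from the slide: (AND gates, XOR gates) = (m^2, (m-1)^2)."""
    return m * m, (m - 1) * (m - 1)
```

For example, (x + 1)(x + 1) = x^2 + 1 because the two middle terms cancel under XOR, so `schoolbook_mul(0b11, 0b11, 2)` gives `0b101`.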

  14. Classic Karatsuba Multiplication  a(x) A1 A0  b(x) B1 B0 A0∙B0 + A0∙B0 + (A1+ A0) ∙ (B1+ B0) + A1∙B1 + A1∙B1 c(x) = a(x) ∙ b(x) 4 additions (XOR) + 3 multiplications per level (CPM: 3 additions + 4 multiplications)

  15. Classic Karatsuba Multiplication • Gate count: AND gates, XOR gates • Longest path: 1 AND + $3 \log_2 m$ XOR (3 XORs per recursion level)
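A minimal sketch of the recursive two-way Karatsuba split follows, assuming the operand width n is a power of two and the recursion runs all the way down to 1-bit products. The hypothetical `counter` argument counts those 1-bit products, i.e. the AND gates of a fully unrolled circuit: three half-size products per level give 3^k ANDs for n = 2^k, instead of the n^2 of the classic method. `clmul` is included only as a reference for checking.

```python
def clmul(a: int, b: int) -> int:
    """Reference carry-less product (shift-and-XOR)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def karatsuba_mul(a: int, b: int, n: int, counter: list[int]) -> int:
    """Recursive Karatsuba over GF(2) for n-bit operands (n a power of two).

    counter[0] is incremented once per 1-bit base product (one AND gate).
    """
    if n == 1:
        counter[0] += 1
        return a & b
    h = n // 2
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h
    b0, b1 = b & mask, b >> h
    lo = karatsuba_mul(a0, b0, h, counter)             # A0*B0
    hi = karatsuba_mul(a1, b1, h, counter)             # A1*B1
    mid = karatsuba_mul(a0 ^ a1, b0 ^ b1, h, counter)  # (A1+A0)(B1+B0)
    mid ^= lo ^ hi                                     # middle coefficient
    return lo ^ (mid << h) ^ (hi << n)
```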

  16. Iterative Karatsuba Multiplication • Split the factors into 4 segments: A(x) = a3…a0, B(x) = b3…b0 • Perform 9 partial multiplications • The result has 8 segments: C(x) = c7…c0

  17. Iterative Karatsuba Multiplication (2) • Optimized aggregation plan: reduces the number of XOR operations to 34 (instead of 40 for classic Karatsuba) • Without additional costs: • unchanged number of ANDs • unchanged longest path • Can be applied recursively: • 256-bit mul = 9 × 64-bit mul • 64-bit mul = 9 × 16-bit mul • 16-bit mul = 9 × 4-bit mul
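The four-segment scheme and its recursive application can be sketched as below. The nine partial products come from applying the two-way Karatsuba split twice: (a1 a0)·(b1 b0), (a3 a2)·(b3 b2), and the product of the half-sums. The aggregation here is the straightforward two-level expansion, not the optimized 34-XOR plan from the slide, whose exact order is not given. `rk4_mul` and its `base` parameter (the width at which recursion stops) are hypothetical names; with n = 256 and base = 4 the recursion follows the 256 → 64 → 16 → 4 chain above, so it should use 9^3 = 729 base multiplications.

```python
def clmul(a: int, b: int) -> int:
    """Reference carry-less product, also used as the base multiplier."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def rk4_mul(a: int, b: int, n: int, base: int, counter: list[int]) -> int:
    """Four-segment Karatsuba with 9 partial products, applied recursively.

    n must equal base * 4**k. counter[0] counts base-size multiplications.
    """
    if n <= base:
        counter[0] += 1
        return clmul(a, b)
    s = n // 4
    mask = (1 << s) - 1
    a0, a1, a2, a3 = [(a >> (i * s)) & mask for i in range(4)]
    b0, b1, b2, b3 = [(b >> (i * s)) & mask for i in range(4)]

    def mul(x: int, y: int) -> int:
        return rk4_mul(x, y, s, base, counter)

    # The 9 partial multiplications.
    p0, p1, p2, p3 = mul(a0, b0), mul(a1, b1), mul(a2, b2), mul(a3, b3)
    p01 = mul(a0 ^ a1, b0 ^ b1)
    p23 = mul(a2 ^ a3, b2 ^ b3)
    q0 = mul(a0 ^ a2, b0 ^ b2)
    q1 = mul(a1 ^ a3, b1 ^ b3)
    q01 = mul(a0 ^ a1 ^ a2 ^ a3, b0 ^ b1 ^ b2 ^ b3)

    # Straightforward two-level aggregation (not the optimized plan).
    low = p0 ^ ((p01 ^ p0 ^ p1) << s) ^ (p1 << 2 * s)   # (a1 a0)(b1 b0)
    high = p2 ^ ((p23 ^ p2 ^ p3) << s) ^ (p3 << 2 * s)  # (a3 a2)(b3 b2)
    mid = q0 ^ ((q01 ^ q0 ^ q1) << s) ^ (q1 << 2 * s)   # product of half-sums
    return low ^ ((mid ^ low ^ high) << 2 * s) ^ (high << 4 * s)
```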

  18. Comparison • Hybrid RAIK is the smallest polynomial multiplication unit • BUT: CPM is faster

  19. Recursive combinatorial multiplication units • Perform a multiplication within one clock cycle • Do not need state information • Technically feasible up to 256 bit, but: • huge complexity • high latency • Practically questionable: • data transport/bus becomes the bottleneck [Figure: 256-bit combinatorial multiplier MUL computing C = A·B in 16 ns]

  20. Iterative multiplication units • More than one clock cycle per multiplication • The iterative unit embeds a smaller recursive unit • Highly regular structure: • flexible • little overhead [Figure: iterative unit with Control, Selection, Partial Multiplier (run 9 times) and Aggregation blocks; widths shown: A and B 256 bit, 64 bit, 128 bit, result C 511 bit]
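The iterative unit can be modelled in software as a nine-step loop. In this sketch (our own illustration, not the authors' design), each step XORs together selected 64-bit segments of A and B (Selection), multiplies them with a smaller multiplier (Partial Multiplier), and XORs the product into C at fixed segment offsets (Aggregation). The offset table is derived from the straightforward two-level Karatsuba expansion, not from the optimized aggregation plan. `SCHEDULE`, `partial_mul` and `iterative_mul` are hypothetical names, and `partial_mul` merely stands in for the embedded combinatorial multiplier.

```python
# Each row: (segment indices XORed together for both A and B,
#            segment offsets at which the partial product is XORed into C).
SCHEDULE = [
    ((0,), (0, 1, 2, 3)),        # a0*b0
    ((1,), (1, 2, 3, 4)),        # a1*b1
    ((0, 1), (1, 3)),            # (a0+a1)(b0+b1)
    ((2,), (2, 3, 4, 5)),        # a2*b2
    ((3,), (3, 4, 5, 6)),        # a3*b3
    ((2, 3), (3, 5)),            # (a2+a3)(b2+b3)
    ((0, 2), (2, 3)),            # (a0+a2)(b0+b2)
    ((1, 3), (3, 4)),            # (a1+a3)(b1+b3)
    ((0, 1, 2, 3), (3,)),        # (a0+a1+a2+a3)(b0+b1+b2+b3)
]


def partial_mul(x: int, y: int) -> int:
    """Stand-in for the embedded s-bit combinatorial multiplier (carry-less)."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        x <<= 1
        y >>= 1
    return r


def iterative_mul(a: int, b: int, n: int = 256) -> tuple[int, int]:
    """Multiply two n-bit polynomials in 9 steps; returns (product, cycles)."""
    s = n // 4
    mask = (1 << s) - 1
    A = [(a >> (i * s)) & mask for i in range(4)]
    B = [(b >> (i * s)) & mask for i in range(4)]
    c, cycles = 0, 0
    for segments, offsets in SCHEDULE:  # one partial multiplication per cycle
        x = y = 0
        for i in segments:              # selection: XOR operand segments
            x ^= A[i]
            y ^= B[i]
        p = partial_mul(x, y)           # (2s - 1)-bit partial product
        for k in offsets:               # aggregation: XOR into C at offset k*s
            c ^= p << (k * s)
        cycles += 1
    return c, cycles
```

Keeping one small multiplier busy for nine cycles trades latency for area, which matches the slide's point that the iterative unit is regular and has little overhead.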

  21. Iterative multiplication units • 256 bit polynomial multipliers

  22. Set up an ECC accelerator design • 283 bit • Bus • Registers • ALU • Speed requirements • 4-segment multiplier (72-bit embedded multiplier) • Adapt control logic
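A quick arithmetic check of the segment sizing for the 283-bit design: four 72-bit segments cover the operand, each partial product of the embedded multiplier is at most 143 bits wide, and the unreduced full product has at most 565 bits.

$$\left\lceil \tfrac{283}{4} \right\rceil = 71 \le 72, \qquad 4 \times 72 = 288 \ge 283, \qquad 2 \cdot 72 - 1 = 143, \qquad 2 \cdot 283 - 1 = 565$$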

  23. ECC designs 163 – 571 bit • Time per ECPM

  24. ECC designs 163 – 571 bit • Energy per ECPM and silicon area (IHP 0.25 µm CMOS)

  25. Conclusions • Polynomial multiplication is the most challenging operation in the finite field: • executed 1500 times for one 233-bit ECPM • most silicon area (70%) • highest utilization (95%) • Large combinatorial multipliers are feasible: • hRAIK is the smallest • classic polynomial multiplication is the fastest • For ECC designs, iterative Karatsuba approaches are well suited: • adaptable • small • energy efficient

  26. Thank You Questions? peter@ihp-microelectronics.com
