1 / 1

Fault-tolerant Typed Assembly Language

Fault-tolerant Typed Assembly Language. Q. Frances Perry, Lester Mackey, George A. Reis, Jay Ligatti, David I. August, and David Walker Princeton University. Transient Faults. TAL FT : An Idealized Fault-tolerant System. Formalizing the TAL FT System. Performance Results.

tyson
Télécharger la présentation

Fault-tolerant Typed Assembly Language

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Fault-tolerant Typed Assembly Language Q Frances Perry, Lester Mackey, George A. Reis, Jay Ligatti, David I. August, and David Walker Princeton University Transient Faults TALFT: An Idealized Fault-tolerant System Formalizing the TALFT System Performance Results • Operational semantics !’ specifies how a program executes. • is a machine state (R,C,M,Q,ir). • Special operational rules randomly introduce faults by arbitrarily modifying a value in the register file R or the queue Q • Judgment `Z states that machine state  is well-typed under zap tag Z. • Z is either empty or a color c, representing the color of the computation that has been corrupted. • If Z is empty then all standard typing invariants must hold. • If Z is a color c, then values colored c do not need to conform to their given types. • We simulated the TALFT hardware using the Velocity Research Compiler. • Despite essentially doubling the number of instructions, TALFT is only 34% slower than the equivalent fault-intolerant code. • Goal: Detect all faults that change a program’s observable behavior and prove that well-typed programs always detect a single fault. • General Strategy: Two redundant and independent copies of each computation (labeled green and blue) that are compared before stores and control flow transfers. • Transient faults (also known as soft fails) occur when an energetic particle strikes a transistor, causing it to change state. • Does not permanently damage hardware. • May corrupt computation by altering stored values and signal transfers. 2014 (estimated) 2006 1 + 1 = 18 • In 2004, a typical laptop with 1GB RAM had 1 soft failure per year. • Fault rates increase with altitude. • Fault rates on a trans-pacific flight increase 300x to nearly 1 fault per roundtrip. The TALFT Machine • A machine state consists of a code memory C, a value memory M, a register file R, a current instruction ir, and a store queue Q. Conclusion • Sun, HP, and Cypress Semiconductor have admitted that transient faults caused crashes and problems at eBay, AOL, Los Alamos, and other major sites. • Faster clock rates, increasing transistor density, decreasing voltages, and smaller feature size all contribute to increasing fault rates of approximately 8% per generation. • Transient faults are already a significant concern, and their impact is increasing. • TALFT formalizes idealized hardware for a hybrid hardware-software fault-tolerance scheme. • TALFT’s type system captures the invariants necessary to reason about the system. • We formally proved that all well-typed programs are fault-tolerant relative to the fault model. . . . 0x0393 mov r1 4 0x0394 mov r3 4 0x0395 stG r2 r1 0x0396 stB r4 r3 . . . C The Fault-tolerance Theorem • A program is fault tolerant when all the faulty executions simulate fault-free executions of the program. • The relationship 1simc2 says that 1 and 2 are identical modulo values colored c. • (Simplified) Fault-Tolerance Theorem: Either a faulty execution is indistinguishable from the equivalent fault-free execution, or it terminates in the fault state. • If `  and  simc 1 and  ! ’ • then either 1! 2 and ’ simc 2 • or 1!fault. • A value is observed when it is stored to memory. • Q buffers a store from the green computation until the corresponding blue store is performed. • Special hardware is used by memory and control flow instructions to detect mismatches between the computations and trigger the special fault state. For More Information… • Transient faults are a well-known problem in the architecture community. • There are many existing solutions, but do these solutions actually work? • Solutions generally consist of algorithms stated in English, and it is left to the audience to judge correctness. • See our upcoming paper in PLDI ’07 • http://www.cs.princeton.edu/~frances/tal_ft-pldi07.pdf • Visit the Project ap Homepage • http://www.cs.princeton.edu/sip/proj/zap Frances Perry. CRA-W/CDC Programming Languages Summer School. May 9-11, 2007.

More Related