Fault Diagnosis Overview

David Lavo, UC Santa Cruz, January 13, 2005

Outline
• Introduction: What is Fault Diagnosis?
• Components: What’s involved?
• Algorithm details: How does it work?
• Diagnosis in practice: How does it really work?
• Research: Why does (or doesn’t) it work? How should it work?

What is Fault Diagnosis?
• A guess as to what’s wrong with a malfunctioning circuit
• Narrows the search for physical root cause
• Makes inferences based on observed behavior
• Usually based on the logical operation of the circuit

VLSI Fault Diagnosis (in One Slide)
[Figure: flow diagram with the labels Tests, Defective Circuit, Observed Behavior, Diagnosis Algorithm, Diagnosis (Location or Fault), Physical Analysis]

Two Types of Diagnosis
• Circuit Partitioning (“Effect-Cause” Diagnosis)
  • Identify fault-free or possibly-faulty portions
  • Identify suspect components, logic blocks, interconnects
• Model-Based Diagnosis (“Cause-Effect” Diagnosis)
  • Assume one or more specific fault models
  • Compare behavior to fault simulations

Circuit Partitioning
• Separate known-good portions of circuit from likely areas of failure
• Simplest method: identify failing flip-flops
  • Tester can identify failing flops or outputs
  • Input cone of logic is suspect
  • Intersection of multiple cones is highly suspect
• Single clock pulse with scan can be used for sequential/functional fails
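The cone-intersection idea can be sketched in a few lines. This is only an illustration: the fan-in table `FANIN` and the node names are hypothetical, and a real flow would read them from the netlist.

```python
# Hypothetical netlist: each node maps to the nodes that drive it (its fan-in).
FANIN = {
    "ff1": ["g1"],
    "ff2": ["g2"],
    "g1": ["a", "n1"],
    "g2": ["n1", "b"],
    "n1": ["c", "d"],
}

def input_cone(node, fanin):
    """Return every node that can influence `node`, including itself."""
    cone, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current not in cone:
            cone.add(current)
            stack.extend(fanin.get(current, []))
    return cone

def suspects(failing, fanin):
    """Intersect the input cones of all failing flip-flops/outputs."""
    cones = [input_cone(f, fanin) for f in failing]
    return set.intersection(*cones) if cones else set()

print(sorted(suspects(["ff1", "ff2"], FANIN)))  # ['c', 'd', 'n1']
```

Both failing flops share the logic behind `n1`, so that region is the most suspect. The intersection is only meaningful if a single defect is assumed to cause all the failures.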

Back-Tracing Failures

Circuit Partitioning, aka Effect-Cause Diagnosis
• Reasoning based on observed behavior and expected (good-circuit) functions
• Commonly used at system and board levels
• Tries to separate good and suspect areas
• Advantage: Simple and general
• Disadvantage: Not very precise, often gives no indication of defect mechanism

Cause-Effect Diagnosis
• Start from possible causes (fault models), compare to observed effects
• A simulator is used to predict behavior of the circuit in the presence of various faults
• Match prediction(s) against observed behavior
• Advantage: Implicates a mechanism as well as a location
• Disadvantage: Can be fooled by unmodeled defects

Cause-Effect Diagnosis
[Figure: tests are applied to the defective circuit, producing a behavior signature (e.g. 010001010100010101010 …). The fault simulator produces candidate signatures (e.g. 010100110000101010100 …). The diagnosis algorithm compares the two and draws a conclusion.]
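In its simplest form, the comparison step is a lookup of the observed signature among the simulated ones. The sketch below assumes hypothetical fault names and signatures (one bit per test, 1 = fail). It shows only exact matching, not any particular tool's algorithm.

```python
# Hypothetical candidate signatures, as a fault simulator might produce them.
# Each string holds one bit per test: 1 = test fails, 0 = test passes.
CANDIDATES = {
    "A stuck-at-1": "0101001100",
    "B stuck-at-0": "1010001000",
    "C stuck-at-1": "0101000101",
}

def exact_matches(observed, candidates):
    """Return the candidates whose predicted signature equals the observed one."""
    return [name for name, sig in candidates.items() if sig == observed]

print(exact_matches("0101001100", CANDIDATES))  # ['A stuck-at-1']
print(exact_matches("1111111111", CANDIDATES))  # []
```

The empty result in the second call is the "fooled by unmodeled defects" case: no modeled fault explains the behavior, so a scoring method for inexact matches is needed.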

Components of Fault Diagnosis
• Fault models
• Fault simulators
• Fault dictionaries
• Diagnosis algorithms

Fault Models
• A fault model is an abstraction of a type of defect behavior
• A fault instance is the application of a model to a circuit wire, node, gate, etc.
• Used to create and evaluate test sets
• For diagnosis, they can be used to simulate and predict faulty behaviors

Stuck-at Fault Model
• The most-used fault model (by far)
• Simple to simulate and enumerate
• Effective for testing, fault grading, and diagnosis of some defects
• Many defects are not well represented by the stuck-at model
[Figure: node A stuck-at 1. Signals are annotated with fault-free/faulty logic values, e.g. 0/1 on A.]
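A stuck-at fault can be simulated by forcing one node to a constant and comparing the output against the fault-free circuit. The sketch below assumes a made-up three-input circuit, y = (a AND b) OR c, with an internal node `n1`. The circuit and node names are illustrative only.

```python
from itertools import product

def simulate(a, b, c, stuck=None):
    """Evaluate y = (a AND b) OR c; `stuck` is (node, value) or None."""
    def force(node, value):
        # Replace the node's value if the fault sits on this node.
        return stuck[1] if stuck and stuck[0] == node else value
    a, b, c = force("a", a), force("b", b), force("c", c)
    n1 = force("n1", a & b)
    return force("y", n1 | c)

def detecting_vectors(stuck):
    """List the input vectors whose output differs from the fault-free circuit."""
    return [v for v in product((0, 1), repeat=3)
            if simulate(*v) != simulate(*v, stuck=stuck)]

print(detecting_vectors(("a", 1)))  # [(0, 1, 0)]
```

Only the vector a=0, b=1, c=0 exposes "a stuck-at 1" here: it sets A to the opposite value and propagates the difference to the output.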

Bridging Fault Model
• Shorts are a common defect type in CMOS
• Different bridging fault models have varying accuracy and precision, from simplistic to very sophisticated
• Difficult or impractical to enumerate
[Figure: nodes X and Y bridged. Node X (value 0) forces Y to a value of 0, so Y reads 1/0 (fault-free/faulty).]
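One of the simpler bridging models is the dominance bridge pictured on this slide, where X always wins. The sketch below assumes hypothetical driving logic for X and Y. Other models (wired-AND, wired-OR, voting) behave differently and are not shown.

```python
from itertools import product

def simulate_bridge(a, b, c, d, bridged=False):
    """Return (x, y) for x = a AND b, y = c OR d, optionally with X dominating Y."""
    x = a & b          # dominant node
    y = c | d          # victim node
    if bridged:
        y = x          # dominance bridge: X's driver overpowers Y's
    return x, y

# The bridge changes behavior only when X and Y carry opposite fault-free values.
activating = [v for v in product((0, 1), repeat=4)
              if simulate_bridge(*v) != simulate_bridge(*v, bridged=True)]

print(simulate_bridge(0, 1, 1, 0))                # (0, 1)
print(simulate_bridge(0, 1, 1, 0, bridged=True))  # (0, 0)
print(len(activating))                            # 10
```

Unlike a stuck-at fault, the faulty value on Y depends on what X is doing. With n nodes there are on the order of n² possible pairs, which is why enumeration becomes impractical.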

Some Diagnostic Fault Models
[Figure panels: Gate Fault, Net Fault, Bridging Fault, Path Fault]

Fault Simulators
• A fault simulator can simulate instances of a particular fault model
• Inputs:
  • Circuit (netlist)
  • Test set
  • Fault list (list of fault instances)
• Output: circuit response
• Usually, simulates the presence of a single fault instance (“single-fault assumption”)

Fault Dictionaries
• A fault dictionary is a database of the simulated responses for all faults in the fault list
• Used by some diagnosis algorithms for convenience:
  • Fast: no simulation at time of diagnosis
  • Self-contained: netlist, simulator, and test set not needed after dictionary creation
• Can be very large, however!
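As a rough illustration, a dictionary can be built by running every fault in the fault list through the simulator once and storing the results. The circuit, test set, and fault list below are made up for the example. Because the toy circuit has a single output, each entry reduces to one pass/fail bit per test.

```python
from itertools import product

def circuit(a, b, c, fault=None):
    """y = (a AND b) OR c, with an optional (node, stuck_value) fault."""
    def force(node, value):
        return fault[1] if fault and fault[0] == node else value
    a, b, c = force("a", a), force("b", b), force("c", c)
    return force("y", force("n1", a & b) | c)

TESTS = list(product((0, 1), repeat=3))                        # hypothetical test set
FAULTS = [(n, v) for n in ("a", "b", "c", "n1", "y") for v in (0, 1)]

def build_dictionary(tests, faults):
    """Map each fault to a bit string: 1 where the test fails, 0 where it passes."""
    return {
        fault: "".join(str(int(circuit(*t) != circuit(*t, fault=fault))) for t in tests)
        for fault in faults
    }

DICTIONARY = build_dictionary(TESTS, FAULTS)

def lookup(observed):
    """Diagnosis without simulation: find faults whose stored response matches."""
    return [fault for fault, sig in DICTIONARY.items() if sig == observed]

print(lookup("00100000"))  # [('a', 1)]
print(lookup("10101000"))  # [('c', 1), ('n1', 1), ('y', 1)]
```

The second lookup returns three faults with identical responses. Such equivalent faults cannot be told apart by any test, so a diagnosis has to report them together.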

The Full-Response Dictionary
• For each fault ( f ), store the response to each test vector ( v )
• For each vector, store the response at every output ( o )
• One bit per output per vector, pass ( 0 ) or fail ( 1 )
• Total storage requirement: f × v × o bits

The Pass-Fail Dictionary
• For each fault, store only whether each test vector passes or fails
• One bit per vector, pass ( 0 ) or fail ( 1 )
• Total storage requirement: f × v bits
• Much smaller than full-response, and often practical for even very large circuits
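A quick calculation shows the size gap between the two dictionary types. The fault, vector, and output counts below are hypothetical round numbers, not measurements from any real design.

```python
def full_response_bits(f, v, o):
    """Full-response dictionary: one bit per fault, per vector, per output."""
    return f * v * o

def pass_fail_bits(f, v):
    """Pass-fail dictionary: one bit per fault, per vector."""
    return f * v

# Hypothetical design: 100,000 faults, 1,000 vectors, 500 observed outputs.
f, v, o = 100_000, 1_000, 500

print(full_response_bits(f, v, o) // 8)  # 6250000000 bytes (6.25 GB)
print(pass_fail_bits(f, v) // 8)         # 12500000 bytes (12.5 MB)
```

The pass-fail dictionary is smaller by exactly the factor o, at the cost of discarding which outputs failed on each vector.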

Dynamic Diagnosis
• Alternative to dictionary-based diagnosis
• Fault simulation is only done for certain faults, based on test results
• Only simulate faults in input cones of failing flip-flops/outputs
• Dictionary is eliminated, but requires complete netlist and test pattern file
• Used by most commercial ATPG tools: Mentor Fastscan, Synopsys, Cadence, etc.
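The fault-list pruning step might look like the sketch below. The fan-in table and fault list are hypothetical, and the union of cones is used as the conservative choice. Under a strict single-fault assumption the intersection would prune further.

```python
# Hypothetical netlist: node -> list of driving nodes.
FANIN = {
    "out1": ["g1"], "out2": ["g2"], "out3": ["g3"],
    "g1": ["a", "b"], "g2": ["b", "c"], "g3": ["d", "e"],
}
NODES = ["out1", "out2", "out3", "g1", "g2", "g3", "a", "b", "c", "d", "e"]
FAULT_LIST = [(node, value) for node in NODES for value in (0, 1)]

def input_cone(node, fanin):
    """Return every node that can influence `node`, including itself."""
    cone, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current not in cone:
            cone.add(current)
            stack.extend(fanin.get(current, []))
    return cone

def faults_to_simulate(failing_outputs, fault_list, fanin):
    """Keep only faults located in the input cone of some failing output."""
    suspect_nodes = set().union(*(input_cone(o, fanin) for o in failing_outputs))
    return [fault for fault in fault_list if fault[0] in suspect_nodes]

pruned = faults_to_simulate(["out1", "out2"], FAULT_LIST, FANIN)
print(len(FAULT_LIST), len(pruned))  # 22 14
```

Only the pruned list is then passed to the fault simulator, which is why the netlist and test patterns must be available at diagnosis time.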

Algorithm Details
• Role of a diagnosis algorithm
• Scoring methods
• Types of diagnosis algorithms

Diagnosis Algorithms
• Algorithms compare observed behavior to predicted behaviors
• An algorithm attempts to “explain” the observed failures with fault candidates
• The job of a diagnosis algorithm is to report the best fault candidate(s)
• “Best” is determined by the scoring method

Fault Candidate Scoring
• Two common scoring methods
  • Match/mismatch points
  • Fault candidate probability
• Other common scorings:
  • Hamming distance
  • Set intersection/overlap
  • Nearest neighbor
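The three "other" scorings can be sketched on pass/fail signatures as follows. The signatures and candidate names are hypothetical. The overlap measure uses the Jaccard ratio, which is one reasonable choice rather than a standard the slides prescribe.

```python
def hamming(a, b):
    """Number of tests on which two pass/fail signatures disagree."""
    if len(a) != len(b):
        raise ValueError("signatures must have equal length")
    return sum(x != y for x, y in zip(a, b))

def failing_tests(signature):
    """Indices of the tests marked as failing (bit '1')."""
    return {i for i, bit in enumerate(signature) if bit == "1"}

def overlap(a, b):
    """Shared failing tests divided by all failing tests (Jaccard ratio)."""
    fa, fb = failing_tests(a), failing_tests(b)
    union = fa | fb
    return len(fa & fb) / len(union) if union else 1.0

def nearest(observed, candidates):
    """Nearest neighbor: the candidate with the smallest Hamming distance."""
    return min(candidates, key=lambda name: hamming(observed, candidates[name]))

OBSERVED = "0101001100"
CANDIDATES = {"A": "0101000100", "B": "1010001000", "C": "0101011101"}

print({n: hamming(OBSERVED, s) for n, s in CANDIDATES.items()})  # {'A': 1, 'B': 5, 'C': 2}
print(overlap(OBSERVED, CANDIDATES["A"]))                         # 0.75
print(nearest(OBSERVED, CANDIDATES))                              # A
```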

Match/mismatch Point Scoring
• Award points for matching observed failures
• Optionally deduct points for mismatches
  • Nonprediction: an observed failure that the candidate did not predict
  • Misprediction: a predicted failure that was not observed
• Commercial tools (e.g. Fastscan) are usually biased to lowest nonprediction
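A minimal sketch of point scoring follows, treating each candidate as the set of tests it predicts will fail. The point values, penalty, and candidate sets are assumptions for illustration. The ranking key only mimics the "lowest nonprediction first" bias and is not the actual ordering used by any commercial tool.

```python
def classify(observed, predicted):
    """Split failing tests into matches, nonpredictions, and mispredictions."""
    return {
        "matches": observed & predicted,
        "nonpredictions": observed - predicted,   # observed but not predicted
        "mispredictions": predicted - observed,   # predicted but not observed
    }

def points(observed, predicted, penalty=1):
    """One point per match, minus `penalty` per mismatch of either kind."""
    c = classify(observed, predicted)
    return len(c["matches"]) - penalty * (len(c["nonpredictions"]) + len(c["mispredictions"]))

def rank(observed, candidates):
    """Order candidates by fewest nonpredictions, then fewest mispredictions."""
    def key(name):
        c = classify(observed, candidates[name])
        return (len(c["nonpredictions"]), len(c["mispredictions"]))
    return sorted(candidates, key=key)

OBSERVED = {1, 3, 6, 7}                      # indices of failing tests (hypothetical)
CANDIDATES = {"A": {1, 3, 7}, "B": {1, 3, 6, 7, 9}, "C": {2, 6}}

print({n: points(OBSERVED, p) for n, p in CANDIDATES.items()})  # {'A': 2, 'B': 3, 'C': -3}
print(rank(OBSERVED, CANDIDATES))                               # ['B', 'A', 'C']
```

Candidate B explains every observed failure and so ranks first, even though it predicts one failure that never appeared.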

Probabilistic Scoring
• Probability score based on matches and mismatches and error assumptions
  • Weights for non- and mis-prediction
  • Different prediction probabilities for different fault candidates (bridges vs. stuck-at)
• Usually normalized so that total of all candidates equals 1.0
• UCSC method uses probabilities to compare stuck-at candidates to bridges in same diagnosis
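The sketch below shows one way such a score could be formed. It is not the UCSC method: the per-model probabilities, the candidate names, and the simple product form are all assumptions chosen to illustrate weighting and normalization.

```python
import math

# Assumed probability of tolerating one nonprediction / misprediction, per fault model.
# A bridge is given a higher misprediction tolerance here purely for illustration.
MODEL_WEIGHTS = {
    "stuck-at": {"p_non": 0.1, "p_mis": 0.1},
    "bridge":   {"p_non": 0.1, "p_mis": 0.5},
}

def likelihood(observed, predicted, model):
    """Multiply in one penalty factor per nonprediction and per misprediction."""
    w = MODEL_WEIGHTS[model]
    nonpredictions = len(observed - predicted)
    mispredictions = len(predicted - observed)
    return (w["p_non"] ** nonpredictions) * (w["p_mis"] ** mispredictions)

def normalized_scores(observed, candidates):
    """Scale the likelihoods so that all candidates sum to 1.0."""
    raw = {name: likelihood(observed, pred, model)
           for name, (model, pred) in candidates.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}

OBSERVED = {1, 3, 6, 7}
CANDIDATES = {                                  # hypothetical candidates
    "A stuck-at-1": ("stuck-at", {1, 3, 6, 7, 9}),
    "B stuck-at-0": ("stuck-at", {1, 3, 7}),
    "X-Y bridge":   ("bridge",   {1, 3, 6, 7, 9}),
}

scores = normalized_scores(OBSERVED, CANDIDATES)
print(max(scores, key=scores.get))              # X-Y bridge
print(math.isclose(sum(scores.values()), 1.0))  # True
```

Because the scores share one scale, a bridge candidate and a stuck-at candidate can be ranked against each other in the same diagnosis.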

Types of Diagnosis Algorithms
• Stuck-at
  • Most common, best supported by tools
  • Surprisingly effective (~60% exact matches)
  • Very fast
• IDDQ
  • Orthogonal set of failing data
  • Requires interpretation of tester results
  • Not well supported by tools

IDDQ Threshold Setting

Types of Diagnosis Algorithms (Cont)
• Bridging-fault
  • May better represent common CMOS faults
  • More complicated fault model
  • Biggest problem: candidate selection
• Other possible (future) directions:
  • Functional fails
  • Delay fails
  • Parametric failures

Diagnosis in Practice
• Using a diagnosis
• Translating the results: circuit navigation
• Evaluating diagnosis quality
• Commercial diagnosis tools

Using a Diagnosis
• Fault diagnosis is used to aid physical inspection and root-cause identification
• Diagnosis output is logical, not physical:
  • Abstract faults (such as stuck-at)
  • Gates, ports (nodes), and nets
  • No information about location or size
• Translation to physical location requires navigation of circuit

Types of Circuit Navigation
• Netlist
  • Examine RTL (Verilog/VHDL etc.) for gates and data paths
• Schematic
  • Symbolic view of gates and wires
• Layout/artwork
  • Graphical view of metal lines, poly, vias, cell boundaries, etc.

Circuit Netlist

    module TOP (CLK, Reset, StartOut, SiReady, Rst_CntN, Up_DnN, Wr, SDin,
                Wr_RAM, Wr_Rreg, RAM_Addr, ATG_TESTMODE, BIST_TESTMODE, SDout,
                TwoOnes, OneOne, NoOnes, TwoZeros, OneZero, NoZeros);
      input CLK;
      inout Reset, StartOut, SiReady, Rst_CntN, Up_DnN, Wr, SDin, Wr_RAM;
      inout [2:0] RAM_Addr;
      inout ATG_TESTMODE;
      inout BIST_TESTMODE;
      inout SDout, OneZero, NoZeros;
      inout TwoOnes, OneOne, NoOnes, TwoZeros, Wr_Rreg;

      // Tie off cells
      TLOW tielow1 (.Q(tielow));
      THIGH tiehigh1 (.Q(tiehigh));

      // Inverted CLK
      wire CLK_N;
      INVFF clkinv (.Q(CLK_N), .A(CLK));

      // PADS
      PADNMIOSCM0H08N05B50 PAD001_StartOut (
        .PUEN(tiehigh), .PDE(tielow), .IEN(tielow), .I(StartOut_I),
        .SIGNAME(StartOut), .INMODE(in_mode_avail), .TESTI(jumper001),
        .TESTIEN(tiehigh), .SCANIN(jumper001), .OUTMODE(out_mode_avail),
        .TESTO(tiehigh), .TESTOEN(tiehigh), .O(tielow), .OEN(tiehigh));

Netlist Navigation
• Either use text editor on netlist, or use browser function in simulator
• Browsers allow you to trace forward and backward and see logic values
• Can be used to view hierarchy and functional blocks
• Can be tedious

Circuit Schematic

Schematic Navigation
• Either hand-drawn (from netlist navigation) or tool-generated gate symbols and wires
• Schematic tools in simulators also allow forward and backward traversal and display of logic values
• Used to verify fault propagation
• Does not reflect physical distances

Circuit Artwork

Layout (Artwork) Navigation
• Use routing/floorplanning tools to view artwork
• Can usually input cell or wire name and tool will highlight the object
• Useful for determining (x,y) values
• Also good for evaluating physical implications of a set of fault candidates
  • Faults clustered in a small area are good
  • Faults/nets spread around large die areas are bad

Fault Proximity
[Figure: two layout views]
• Net runs across die: physical examination is almost impossible
• Faults contained in small area: physical examination is possible
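Once (x,y) values have been read from the layout tool, the spread of a candidate set can be estimated with a bounding box. The coordinates and die size below are hypothetical (in microns), and the bounding-box fraction is just one simple proxy for "clustered" versus "spread out".

```python
def bounding_box(points):
    """Return (x_min, y_min, x_max, y_max) for a list of (x, y) locations."""
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)

def die_fraction(points, die_width, die_height):
    """Fraction of the die area covered by the candidates' bounding box."""
    x0, y0, x1, y1 = bounding_box(points)
    return (x1 - x0) * (y1 - y0) / (die_width * die_height)

DIE_W, DIE_H = 5000, 5000                               # hypothetical die, microns
clustered = [(100, 100), (120, 110), (110, 130)]        # candidates close together
spread = [(100, 100), (4800, 4900)]                     # a net running across the die

print(die_fraction(clustered, DIE_W, DIE_H))  # 2.4e-05
print(die_fraction(spread, DIE_W, DIE_H))     # 0.9024
```

A small fraction suggests a region that is practical to inspect physically. A value near 1.0 corresponds to the "net runs across die" case.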

Evaluating a Diagnosis
• A diagnosis without one or a few strong (high-scoring) candidates is usually poor
• Can indicate:
  • Multiple defects
  • Unmodeled (complex) behavior
  • Inappropriate algorithm
• If the diagnosis is poor, either try another algorithm or look for more data (failures)

Evaluating a Diagnosis (cont)
• Many diagnoses (~60%) implicate a single stuck-at fault
• Usually a good sign, but you must consider equivalent faults
• Many defects can mimic a stuck-at fault, without being a short to Vdd or Gnd
• Consider nearby nodes also, if practical

Dominance Bridging Fault
[Figure: a FIB short between a strong inverter and a weak inverter. Annotation on one node: “Top candidate is stuck-at fault on this node.”]

Candidate #2 is Best
[Figure: locations of Candidate #1, Candidate #2, and Candidate #3 relative to the FIB short]

Commercial Tool: Mentor Graphics
• ATPG tool: Fastscan
• Stuck-at diagnosis only
• No IDDQ capability
• Orders candidates by number of matched failures (biased to lowest non-prediction)
• Also has netlist & schematic browser
• Based on Waicukauski & Lindbloom (D&T ’89)

Commercial Tool: Synopsys
• ATPG tool: TetraMAX
• J. Waicukauski moved to Synopsys after writing Fastscan
• Diagnosis capability unknown: assumed to be similar to Fastscan

Commercial Tool: Cadence
• ATPG tool: Encounter Test
  • Test and diagnosis tools purchased from IBM
  • IBM has had good diagnosis research, but Encounter’s capabilities are unknown
• Also of interest: Silicon Ensemble (routing tool)
  • Graphical artwork viewer
  • Good for highlighting nets and cells based on diagnosis results
  • Good for determining (x,y) and producing screen shots

Prior Art
• Waicukauski & Lindbloom, IEEE Design & Test, Aug. ’89
  • Most widely-used algorithm for commercial tools
  • Finds candidates to match individual tests, attempts to “explain” all failing tests
• Abramovici & Breuer, IEEE Trans. Computers, June ’80
  • Effect-cause diagnosis
  • Permanent stuck-at fault assumption
• Aitken & Maxwell, HP Journal, Feb. ’95
  • Analysis of relative importance of models vs. algorithms
• Lavo, Larrabee, et al., Proceedings of ITC ’98
  • Probabilistic scoring
  • Mixed-model diagnosis
• Bartenstein et al., Proceedings of ITC ’01
  • SLAT: Single Location At-a-Time diagnosis
  • Focus on matching per-vector results
