Data Mining Applied To Fault Detection

Shinho Jeong, Jaewon Shim, Hyunsoo Lee
{cinooco, poohut, darth7}@icu.ac.kr
Digital Media Lab

Introduction
• Aims of work
  • Neural network implementation of a non-linear PCA model based on the Principal Curve algorithm, to improve both the rapidity and the accuracy of fault detection.
• Data mining?
  • Extracting useful information from raw data using statistical methods and/or AI techniques.
• Characteristics
  • Makes maximum use of the available data.
  • Does not require rigorous theoretical knowledge of the process.
  • Effective for systems where the actual process deviates from the first-principles model.
• Applications
  • Process monitoring
  • Fault detection/diagnosis/isolation
  • Process estimation
  • Soft sensors

Fault Detection?
[Figure: point of fault introduction]

Issues
• Major concerns
  • Rapidity
    • Ability to detect a fault at an early stage after it is introduced.
  • Accuracy
    • Ability to distinguish a fault from normal process variations.
• Trade-off between rapidity and accuracy
  • Addressed through
    • Frequent acquisition of process data.
    • Derivation of an efficient process model through data analysis using data mining methodologies.

Inherent Problems
• Multicollinearity problem
  • Due to high correlation among variables.
  • Likely to cause redundancy.
  • Requires the derivation of new, uncorrelated feature variables.
• Dimensionality problem
  • Due to having more variables than observations.
  • Likely to cause over-fitting in the model-building phase.
  • Requires dimensionality reduction.
• Non-linearity problem
  • Due to non-linear relations among variables.
  • Requires prior determination of the degree of non-linearity.
  • Requires a non-linear model.
• Process dynamics problem
  • Due to changes in operating conditions over time.
  • Likely to change the correlation structure among variables.

Statistical Approach
• Statistical data analysis
  • Univariate SPC
    • Conventional Shewhart, CUSUM, EWMA, etc.
    • Limitations
      • Monitoring is performed separately for each process variable.
      • Inefficient for multivariate systems, where the main concern is how the variables co-vary.
  • Need for multivariate data analysis
  • Multivariate SPC
    • PCA
      • The most popular multivariate data analysis method.
      • Basis for regression models (PLS, PCR, etc.).
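Why univariate charts miss how variables co-vary can be illustrated with a minimal sketch, assuming two synthetic, strongly correlated variables. It compares per-variable 3σ Shewhart limits with a Hotelling-style T² statistic (squared Mahalanobis distance). T² is a common multivariate SPC statistic that the slide does not name, and the empirical 99th-percentile control limit is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical normal-operation data: two strongly correlated variables.
n = 500
x1 = rng.normal(0.0, 1.0, n)
x2 = 0.95 * x1 + rng.normal(0.0, np.sqrt(1 - 0.95 ** 2), n)
noc = np.column_stack([x1, x2])

mean = noc.mean(axis=0)
std = noc.std(axis=0, ddof=1)
cov_inv = np.linalg.inv(np.cov(noc, rowvar=False))

def shewhart_alarm(x, k=3.0):
    """Univariate check: alarm if any single variable leaves mean +/- k*std."""
    return bool(np.any(np.abs(x - mean) > k * std))

def t2(x):
    """Squared Mahalanobis distance of x from the normal-operation mean."""
    d = x - mean
    return float(d @ cov_inv @ d)

# Assumed control limit: empirical 99th percentile of T^2 on normal data.
t2_limit = np.percentile([t2(row) for row in noc], 99)

# Each variable is individually within its limits, but the two move in
# opposite directions, which breaks their normal correlation.
sample = np.array([1.5, -1.5])
print("Shewhart alarm:", shewhart_alarm(sample))
print("T^2 alarm:", t2(sample) > t2_limit)
```

The univariate check passes this sample because it looks at each variable in isolation. The covariance-aware statistic flags it because it measures distance relative to the joint correlation structure.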

Linear PCA(1)
• Features
  • Creation of new feature variables (principal components) as linear combinations of the original variables. The new variables are:
    • Fewer => solves the ‘Dimensionality problem’
    • Orthogonal => solves the ‘Multicollinearity problem’
  • Also performs noise reduction.
  • Basis for PCR and PLS.
• Limitation
  • Linear model => inefficient for non-linear processes.
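A minimal sketch of linear PCA via the singular value decomposition (NumPy), assuming 120 synthetic observations of 6 correlated variables generated from 2 hidden factors. The data, `k=2`, and the function names are hypothetical. The sketch shows the two properties listed above: the scores are fewer than the original variables, and they are mutually orthogonal.

```python
import numpy as np

def fit_pca(X, k):
    """Linear PCA via SVD of the mean-centred data matrix."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    W = vt[:k].T                  # loadings: k orthonormal directions
    return mean, W

def encode(X, mean, W):
    return (X - mean) @ W         # principal component scores

def decode(T, mean, W):
    return T @ W.T + mean         # reconstruction in the original variables

rng = np.random.default_rng(1)
latent = rng.normal(size=(120, 2))                       # 2 hidden factors
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(120, 6))   # 6 correlated variables

mean, W = fit_pca(X, k=2)
T = encode(X, mean, W)            # 120 x 2 scores instead of 120 x 6 variables
X_hat = decode(T, mean, W)
```

In a monitoring setting, `decode(encode(x))` is the model's estimate of a new sample. A large reconstruction error suggests that the sample no longer follows the normal-operation correlation structure.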

Linear PCA(2)
• Theory
[Figure: encoding mapping and decoding mapping]

Linear PCA(3)
• ERM inductive principle
• Limitation
• Alternatives
  • Extension of the linear functions to non-linear ones using…
    • Neural networks.
    • Statistical methods.
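For reference, below is one standard way to write the encoding and decoding mappings of linear PCA and the empirical risk that the ERM principle minimizes. The notation ($G$, $F$, $W$, $\mu$, $k$) is ours, not the slides'.

```latex
\begin{aligned}
\text{Encoding:}\quad & t = G(x) = W^{\top}(x - \mu), \qquad W \in \mathbb{R}^{m \times k},\ W^{\top}W = I_k \\
\text{Decoding:}\quad & \hat{x} = F(t) = \mu + W t \\
\text{ERM:}\quad & \min_{W}\ R_{\mathrm{emp}}(W) = \frac{1}{n}\sum_{i=1}^{n}\bigl\lVert x_i - F\bigl(G(x_i)\bigr)\bigr\rVert^{2}
\end{aligned}
```

Here $\mu$ is the sample mean of the $n$ observations $x_i \in \mathbb{R}^m$. The minimizing $W$ spans the subspace of the $k$ leading eigenvectors of the sample covariance matrix. Because $G$ and $F$ are linear, the alternatives listed on this slide replace them with non-linear functions.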

Kramer’s Approach
• Limitations
  • Difficult to train a network with 3 hidden layers.
  • Difficult to determine the optimal number of hidden nodes.
  • Difficult to interpret the meaning of the bottleneck layer.
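To make the three-hidden-layer structure concrete, here is a minimal NumPy sketch of the forward pass of an auto-associative network of the kind Kramer proposed. It has a non-linear mapping layer, a linear bottleneck, a non-linear demapping layer, and a linear output layer. The layer sizes, sigmoid activations, and random initialization are illustrative assumptions. Training, which requires back-propagation through all four weight layers, is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_kramer_net(m, h, k, rng):
    """Weights for m inputs -> h (mapping) -> k (bottleneck) -> h (demapping) -> m outputs."""
    sizes = [m, h, k, h, m]
    return [(rng.normal(scale=0.1, size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    """Return the bottleneck scores and the reconstruction of X."""
    (W1, b1), (W2, b2), (W3, b3), (W4, b4) = params
    H1 = sigmoid(X @ W1 + b1)     # hidden layer 1: non-linear mapping
    T = H1 @ W2 + b2              # hidden layer 2: linear bottleneck (non-linear scores)
    H3 = sigmoid(T @ W3 + b3)     # hidden layer 3: non-linear demapping
    X_hat = H3 @ W4 + b4          # linear output layer reproduces the inputs
    return T, X_hat

rng = np.random.default_rng(0)
params = init_kramer_net(m=6, h=10, k=1, rng=rng)   # hypothetical layer sizes
X = rng.normal(size=(120, 6))
T, X_hat = forward(params, X)
```

The hidden size `h` must be chosen by hand, and the single bottleneck output `T` has no predefined meaning. These are the second and third limitations listed on the slide.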

Non-linear PCA(1)
• Principal curve (Hastie et al. 1989)
  • A statistical, non-linear generalization of the first linear principal component.
  • Self-consistency principle
    • Projection step (encoding)
    • Conditional averaging (decoding)
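The two steps of the self-consistency principle can be written with a projection index $\lambda_f$ and a curve $f$, following a standard form of Hastie and Stuetzle's definition (notation ours):

```latex
\begin{aligned}
\text{Projection (encoding):}\quad & \lambda_f(x) = \sup\Bigl\{\lambda : \lVert x - f(\lambda)\rVert = \inf_{\nu}\lVert x - f(\nu)\rVert\Bigr\} \\
\text{Conditional averaging (decoding):}\quad & f(\lambda) = \mathrm{E}\bigl[X \mid \lambda_f(X) = \lambda\bigr]
\end{aligned}
```

A curve is self-consistent when each point on it equals the average of the data that project onto that point.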

Non-linear PCA(2)
• Limitations
  • Finiteness of data.
  • Unknown density distribution.
  • No a priori information about the data.
• Additional considerations
  • Conditional averaging => locally weighted regression, kernel regression
  • Increasing flexibility (decreasing the span)
    • Span: the fraction of the data considered to be in the neighborhood. It governs the smoothness of the fit and hence the generalization capacity.
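The iteration implied by these slides can be sketched in NumPy under simplifying assumptions:

- The curve is a polyline.
- The projection step projects each point onto the polyline and records its arc length as the score.
- Conditional averaging uses a simple running-mean smoother over the `span * n` nearest points in score order. This stands in for the locally weighted or kernel regression named above.

The span schedule, iteration count, and function names are hypothetical.

```python
import numpy as np

def project_to_polyline(X, P):
    """Projection step (encoding): project each row of X onto the polyline
    through vertices P; return arc-length scores and projected points."""
    seg = P[1:] - P[:-1]                                   # segment vectors
    seg_len2 = np.maximum((seg ** 2).sum(axis=1), 1e-12)   # guard zero-length segments
    seg_len = np.sqrt(seg_len2)
    start = np.concatenate([[0.0], np.cumsum(seg_len)[:-1]])  # arc length at segment starts
    diff = X[:, None, :] - P[None, :-1, :]                 # (points, segments, dims)
    u = np.clip(np.einsum('nsd,sd->ns', diff, seg) / seg_len2, 0.0, 1.0)
    proj = P[None, :-1, :] + u[..., None] * seg[None, :, :]
    best = ((X[:, None, :] - proj) ** 2).sum(axis=2).argmin(axis=1)
    rows = np.arange(len(X))
    return start[best] + u[rows, best] * seg_len[best], proj[rows, best]

def conditional_average(lam, X, span):
    """Conditional averaging (decoding): each curve vertex is the mean of the
    span*n points nearest in score order (a running-mean smoother)."""
    n = len(lam)
    k = min(n, max(2, int(round(span * n))))
    Xs = X[np.argsort(lam)]
    P = np.empty_like(Xs)
    for r in range(n):
        lo = min(max(0, r - k // 2), n - k)                # window of k points around r
        P[r] = Xs[lo:lo + k].mean(axis=0)
    return P                                               # vertices ordered by score

def principal_curve(X, spans=(0.5, 0.3, 0.2, 0.1), n_iter=5):
    """Alternate conditional averaging and projection, decreasing the span to
    increase flexibility; start from the first linear principal component."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    lam = Xc @ vt[0]                                       # initial scores on PC1
    for span in spans:
        for _ in range(n_iter):
            P = conditional_average(lam, Xc, span)
            lam, proj = project_to_polyline(Xc, P)
    return lam, proj + mean

# Hypothetical test data: noisy points along a half circle.
rng = np.random.default_rng(2)
t = rng.uniform(0.0, np.pi, 200)
X = np.column_stack([np.cos(t), np.sin(t)]) + 0.05 * rng.normal(size=(200, 2))
scores, fitted = principal_curve(X)
```

On curved data like this, the distances from the points to the fitted curve are much smaller than the residuals left by the first linear principal component.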

Proposed Approach(1)
• LPCA vs. NLPCA

Proposed Approach(1)
• Creation of Non-linear principal scores

Proposed Approach(2)
• Implementation of Auto-associative N.N.
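One reading of this design is consistent with the model-building slide (an auto-associative network made of two MLPs, plus principal curve fitting). Under that reading, the principal-curve scores serve as training targets. A first MLP learns the mapping from data to scores (encoding), and a second MLP learns the mapping from scores back to data (decoding), so neither network needs a hard-to-train three-hidden-layer structure.

The NumPy sketch below assumes that reading. The following are all hypothetical:

- the one-hidden-layer tanh networks;
- plain gradient descent and its hyperparameters;
- synthetic arc data with known scores, standing in for fitted principal-curve scores.

```python
import numpy as np

def train_mlp(X, Y, hidden=10, lr=0.05, epochs=2000, seed=0):
    """Fit a one-hidden-layer MLP (tanh hidden units, linear outputs) by
    full-batch gradient descent on the squared error averaged over samples."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    n = len(X)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        E = H @ W2 + b2 - Y                  # output error
        dH = (E @ W2.T) * (1.0 - H ** 2)     # back-propagate through tanh
        W2 -= lr * (2.0 / n) * (H.T @ E)
        b2 -= lr * (2.0 / n) * E.sum(axis=0)
        W1 -= lr * (2.0 / n) * (X.T @ dH)
        b1 -= lr * (2.0 / n) * dH.sum(axis=0)
    return lambda Z: np.tanh(Z @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(3)
t = rng.uniform(0.0, np.pi, 120)
X = np.column_stack([np.cos(t), np.sin(t)]) + 0.02 * rng.normal(size=(120, 2))
scores = (2.0 * t / np.pi - 1.0).reshape(-1, 1)   # stand-in for principal-curve scores

encoder = train_mlp(X, scores)            # 1st MLP: data -> non-linear score
decoder = train_mlp(scores, X, seed=1)    # 2nd MLP: score -> reconstructed data
X_hat = decoder(encoder(X))               # combined auto-associative mapping
```

Training the two halves separately turns one deep, unsupervised problem into two shallow regression problems with explicit targets.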

Case Study
• Objective
  • Fault detection during an operating-mode change, using 6 variables.
• Data acquisition & model building
  • NOC (normal operating condition) data: 120 observations => NLPCA model building
  • Fault data: another 120 observations
[Figure label: drift]
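The monitoring step of such a case study can be sketched with a squared prediction error (SPE) statistic. Fit a model on the NOC data, set a control limit from the NOC residuals, and flag test samples whose residual exceeds it. In this NumPy sketch, the following are all assumptions, not the authors' data or settings:

- the 6-variable synthetic data;
- the slow drift injected into the first variable from sample 40 onward;
- the 2-component linear PCA model, standing in for either the LPCA or the NLPCA model;
- the empirical 99th-percentile limit.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical loadings: 2 latent factors drive 6 measured variables.
mixing = np.array([[1.0, 0.8, 0.6, 0.0, 0.3, 0.5],
                   [0.0, 0.5, 0.7, 1.0, 0.9, 0.2]])

def simulate(n):
    return rng.normal(size=(n, 2)) @ mixing + 0.1 * rng.normal(size=(n, 6))

noc = simulate(120)                       # model-building (NOC) data
fault = simulate(120)                     # test data
drift_start = 40
fault[drift_start:, 0] += 0.05 * np.arange(120 - drift_start)   # slow drift, first variable

# 2-component linear PCA model of the NOC data.
mean = noc.mean(axis=0)
_, _, vt = np.linalg.svd(noc - mean, full_matrices=False)
W = vt[:2].T

def spe(X):
    """Squared prediction error: squared distance between samples and their reconstruction."""
    Xc = X - mean
    R = Xc - Xc @ W @ W.T
    return (R ** 2).sum(axis=1)

limit = np.percentile(spe(noc), 99)       # assumed empirical 99% control limit
alarms = spe(fault) > limit
first_alarm = int(np.argmax(alarms)) if alarms.any() else None
```

Replacing `spe` with the reconstruction error of a non-linear model gives the corresponding NLPCA chart. The delay between `drift_start` and the first alarm is one way to quantify rapidity, and the alarm rate before `drift_start` is one way to quantify accuracy.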

Model Building
• Auto-associative N.N. using 2 MLPs
• Principal curve fitting
[Figure panels: 5 iterations; 1st MLP N.N., 30 iterations; 2nd MLP N.N., 50 iterations]

Monitoring Result
• The NLPCA model was more efficient than the LPCA model.
[Figure: point of fault introduction]

Conclusion
• Result
  • Fault detection performance improved in both speed and accuracy when the method was applied to a test case.
• Future work
  • Integration of ‘Fault Diagnosis’ and ‘Fault Isolation’ methods to perform complete process monitoring on a single platform.
