Making the Most of Process Information via Multiscale and Bayesian Methods • Bhavik R. Bakshi, Department of Chemical Engineering, Ohio State University, Columbus, OH 43210 • CPACT Conference, Edinburgh, April 25-26, 2002
Overview of Research Group • Goal: Develop tools and techniques for efficient and sustainable process engineering • Projects focus on process and global scales • Process scale • Multiscale and Bayesian methods for extracting knowledge from process data • Global scale • Economically and ecologically conscious process engineering • Develop rigorous and systematic methods and explore their applications
Motivation for Multiscale and Bayesian Methods • Processes and data are usually multiscale in nature • Events and features at multiple scales • Multirate measurements • Autocorrelated stochastic processes • Variety of process knowledge and information available • Measured data • Fundamental, empirical or heuristic knowledge • Single-scale and non-Bayesian methods lead to • Inferior analysis and modeling • Inefficient computation and use of available information • Disintegrated operation • Multiscale and Bayesian methods can perform better
Multiphase Flow • Flow regimes in fluidized bed • Partial models and data are available for each regime [Figure: intensity versus time traces for homogeneous flow, heterogeneous flow, and slug flow]
Sheet and Film Manufacturing • Different sampling interval in each channel • Dynamic models are also available [Figure: sheet measurement pattern; axes: sensor direction and machine direction]
Chemical Process Operation • Efficient operation requires reasoning at different scales • Process data and knowledge are available [Figure: operation hierarchy: Planning, Scheduling, Supervisory Control, Monitoring and Diagnosis, Regulatory Control, Data Acquisition, Process]
Objectives • Develop methods for efficient process operation that can exploit • Multiscale nature of processes • All available process data and knowledge • Focus on the following tasks • Process Monitoring • Fault Diagnosis • Empirical Modeling • Data Rectification and Estimation • Analysis of complex chemical and biological systems • Integrate process operation tasks
Outline • Introduction to • Bayesian methods • Wavelet analysis • General Approach for Multiscale Methods • Fault Detection and Diagnosis • MSPCA, MSART • Empirical Modeling • Bayesian PCA, Bayesian Latent Variable Regression • Dynamic Data Rectification • Linear systems with and without accurate models • Nonlinear systems • Approaches are general and broadly applicable to a variety of modeling and analysis tasks
Bayesian Estimation • Statistical framework for combining prior knowledge with empirical observations • Posterior becomes prior at next time • Bayes' rule: $P(H \mid D) = \dfrac{P(D \mid H)\,P(H)}{P(D)}$ [Figure: prior knowledge $P(H)$ (current belief) and information from data $P(D \mid H)$ (new information) combine into the posterior $P(H \mid D)$ (new belief); a loss function selects a sample from the posterior as the Bayesian estimate $\hat{H}$. Portrait: Rev. Thomas Bayes, 1702-1761]
Illustration of Bayesian Estimation • A newly born baby sees the sun setting and wonders, “Will it be back?” (Malakoff, 1999) • Prior knowledge: sun may or may not rise, P(H) = 0.5 • Data obtained every day = Sun rises • Posterior at t=k becomes prior at t=k+1 • $P(H \mid D) \to 1$ as $t \to \infty$ [Figure: timeline t=1, 2, 3, ...: at each step the data update the prior to a posterior, which serves as the prior for the next step]
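The sunrise illustration can be traced numerically. The slide does not state a likelihood model, so this minimal sketch assumes a uniform Beta(1, 1) prior on the unknown sunrise probability (Laplace's rule of succession); the function name `sunrise_posterior` is hypothetical.

```python
from fractions import Fraction

def sunrise_posterior(days_observed):
    """Return the belief that the sun rises tomorrow, before and after each day.

    Assumption (not from the slide): a uniform Beta(1, 1) prior on the
    sunrise probability, which gives the slide's starting belief of 0.5.
    """
    a, b = 1, 1                      # Beta pseudo-counts: rises, non-rises
    beliefs = [Fraction(a, a + b)]   # belief before any data
    for _ in range(days_observed):
        a += 1                       # one more sunrise observed
        # The posterior Beta(a, b) acts as the prior for the next day.
        beliefs.append(Fraction(a, a + b))
    return beliefs

beliefs = sunrise_posterior(98)
print([float(p) for p in (beliefs[0], beliefs[1], beliefs[-1])])
```

Under this assumed model the belief after n sunrises is (n + 1)/(n + 2), which approaches 1 as the number of observed days grows.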
Challenges in Bayesian Analysis • Need distributions for prior and likelihood • Bad prior can give slow convergence and misleading answer • Gaussian densities are mathematically convenient but may not represent reality • Can be computationally expensive, particularly for non-Gaussian densities • Potential solutions • Use Empirical Bayes methods - estimate prior from measured data • Combine Bayesian analysis with Multiscale analysis • Markov Chain Monte Carlo methods
Multiscale Nature of Variables • Delta functions • Fourier Transform • Linear Filters • Wavelet Transform [Figure: process signal versus time (0 to 100), and the location of equipment degradation, sensor failure, disturbance, equipment failure, and noise in the time-frequency (t, ω) plane]
Wavelets • Family of basis functions of fixed shape • Translations and dilations of mother wavelet: $\psi_{mk}(x) = 2^{-m/2}\,\psi(2^{-m}x - k)$, where m, k are integers [Figure: Haar scaling function and Haar wavelet at (m=1, k=0) and (m=2, k=4); Daubechies-6 scaling function $\phi(x)$ and wavelet $\psi(x)$ versus x]
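A minimal sketch of the dilation and translation rule for the Haar case. The helper names (`haar_mother`, `haar_mk`, `inner_product`) are hypothetical, and the integrals are approximated with a midpoint rule on a grid aligned with the wavelet breakpoints.

```python
def haar_mother(x):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0

def haar_mk(x, m, k):
    """Dilated and translated wavelet: 2^(-m/2) * psi(2^(-m) * x - k)."""
    return 2.0 ** (-m / 2.0) * haar_mother(2.0 ** (-m) * x - k)

def inner_product(f, g, lo=0.0, hi=32.0, n=2048):
    """Midpoint-rule approximation of the integral of f(x) * g(x) on [lo, hi]."""
    dx = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * dx) * g(lo + (i + 0.5) * dx)
               for i in range(n)) * dx

psi_10 = lambda x: haar_mk(x, 1, 0)   # support [0, 2)
psi_20 = lambda x: haar_mk(x, 2, 0)   # support [0, 4), overlaps psi_10
psi_24 = lambda x: haar_mk(x, 2, 4)   # support [16, 20)

print(inner_product(psi_10, psi_10), inner_product(psi_10, psi_20))
```

Each member has unit norm, and members at different scales or positions are orthogonal, which is what makes the family usable as an orthonormal basis.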
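Wavelet Decomposition [Figure: filter bank in the time-frequency (t, ω) plane: the original signal (m=0) is passed through filters H and G at m=1, and the H output is filtered again at m=2; H yields the scaled signals $y_m$ and G yields the wavelet transform/detail signals $d_m$]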
Properties of Wavelets • Represents signals and functions as $y(t) = \sum_{m=1}^{L}\sum_{k} d_{mk}\,\psi_{mk}(t) + \sum_{k} y_{Lk}\,\phi_{Lk}(t)$ • Localized in time and frequency • Deterministic features are captured by few large coefficients • Approximate eigenfunctions • Stochastic processes are approximately decorrelated • Can be orthonormal • Fast computation, O(N) • Extended to libraries of basis functions • Wavelet packets, cosine packets, etc.
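The orthonormality and O(N) cost can be seen in a small Haar transform. This is a sketch for the Haar basis only, assuming the signal length is divisible by 2 to the number of levels; the function names are hypothetical.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform.

    Returns (approx, detail): the scaled signal and the detail signal,
    each half the length of the input.
    """
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail

def haar_dwt(signal, levels):
    """Decompose into details d_1..d_L plus the last scaled signal y_L."""
    details, approx = [], list(signal)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return details, approx

def haar_idwt(details, approx):
    """Invert haar_dwt by undoing one level at a time, coarsest first."""
    s = 1.0 / math.sqrt(2.0)
    current = list(approx)
    for d in reversed(details):
        rebuilt = []
        for a, c in zip(current, d):
            rebuilt.extend([s * (a + c), s * (a - c)])
        current = rebuilt
    return current

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
details, approx = haar_dwt(signal, 3)
rebuilt = haar_idwt(details, approx)
energy_in = sum(v * v for v in signal)
energy_out = sum(c * c for d in details for c in d) + sum(a * a for a in approx)
print(rebuilt, energy_in, energy_out)
```

Each level touches half as many samples as the previous one, so the total work is proportional to N, and the transform preserves signal energy because the basis is orthonormal.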
Multiscale Feature Extraction [Figure: original signal; wavelet coefficients at m=1, m=2, m=3; scaled coefficients at m=3; signal after thresholding and reconstruction]
Analysis of Stochastic Processes • Wavelet coefficients are approximately uncorrelated and Gaussian [Figure: autocorrelation function (ACF) and probability density function (PDF) for an ARIMA process: original signal $y_0$, wavelet coefficients $d_1$, $d_2$, and last scaled signal $y_2$]
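A minimal sketch of the threshold-and-reconstruct step, assuming a Haar basis and hard thresholding with a user-chosen threshold. Both choices are illustrative assumptions; the slide does not specify the wavelet or the thresholding rule.

```python
import math

S = 1.0 / math.sqrt(2.0)

def haar_dwt(x, levels):
    """Orthonormal Haar decomposition: list of detail signals and last scaled signal."""
    details, approx = [], list(x)
    for _ in range(levels):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([S * (a - b) for a, b in pairs])
        approx = [S * (a + b) for a, b in pairs]
    return details, approx

def haar_idwt(details, approx):
    """Inverse of haar_dwt."""
    current = list(approx)
    for d in reversed(details):
        current = [v for a, c in zip(current, d)
                   for v in (S * (a + c), S * (a - c))]
    return current

def threshold_and_reconstruct(x, levels, threshold):
    """Zero every detail coefficient with magnitude <= threshold, then invert."""
    details, approx = haar_dwt(x, levels)
    kept = [[c if abs(c) > threshold else 0.0 for c in d] for d in details]
    return haar_idwt(kept, approx)

noisy_step = [1.1, 0.9, 1.0, 1.0, 5.0, 5.2, 4.9, 5.0]
cleaned = threshold_and_reconstruct(noisy_step, levels=3, threshold=0.5)
print(cleaned)
```

In this example only the coarse coefficient that carries the step survives the threshold, so the small fluctuations are removed while the step itself is retained.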
Process Operation Tasks • Process Monitoring / Fault Detection • Detect abnormal operation from measured data • Empirical Modeling • Determine relationship between variables based on measured data • Data Rectification • Clean measured data by removing errors and satisfying process models
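General Multiscale Methodology • Convert traditional to multiscale methods (Bakshi, 1999) • Can use models at each scale and across scales [Figure: the data X are decomposed by the wavelet transform W; the traditional method operates on $H_L X$ (coarsest scaled signal) and on $G_L X, \ldots, G_m X, \ldots, G_1 X$ (details, coarse to fine); the results are reconstructed by $W^T$ to give $\hat{X}$, $\hat{\theta}$]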
Multiscale Statistical Process Control (Bakshi, 1998; Aradhye et al., 2000a, b) • SPC detects abnormal behavior from measured data • Lacks generality, best for certain types of changes • Shewhart charts for large shifts • CUSUM, EWMA for small shifts • Assumes uncorrelated measurements • Multivariate SPC reduces dimensionality by linear or nonlinear modeling • Normal and abnormal behavior usually occur at different scales • MSSPC should perform better
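The contrast between Shewhart and EWMA charts can be sketched as follows. The chart parameters (3-sigma width, EWMA weight 0.2, asymptotic EWMA limit) are common textbook defaults assumed here for illustration, not values taken from the slides.

```python
import math

def shewhart_alarms(x, mu, sigma, width=3.0):
    """Indices where a single sample falls outside mu +/- width * sigma."""
    return [i for i, v in enumerate(x) if abs(v - mu) > width * sigma]

def ewma_alarms(x, mu, sigma, lam=0.2, width=3.0):
    """Indices where the EWMA statistic leaves its asymptotic control limit."""
    limit = width * sigma * math.sqrt(lam / (2.0 - lam))
    z, alarms = mu, []
    for i, v in enumerate(x):
        z = lam * v + (1.0 - lam) * z   # exponentially weighted average
        if abs(z - mu) > limit:
            alarms.append(i)
    return alarms

# Noise-free illustration: a sustained shift of 1.5 sigma starting at index 10.
data = [0.0] * 10 + [1.5] * 20
print(shewhart_alarms(data, mu=0.0, sigma=1.0))
print(ewma_alarms(data, mu=0.0, sigma=1.0)[:1])
```

The Shewhart chart never alarms because no single sample exceeds 3 sigma, while the EWMA accumulates the small shift and alarms a few samples after it starts.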
Detecting Mean Shift by MSSPC • Uncorrelated data with mean shift of $2\sigma$ • First shift detection at scale m=2 • Current shift detection in last scaled signal [Figure: time series (0 to 140) of the original signal, its coefficients at each scale after decomposition by W, and the signal reconstructed by $W^T$]
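One way to see why a small sustained shift becomes visible at coarser scales: with an orthonormal Haar transform of uncorrelated data, a scaled coefficient at scale m is $2^{m/2}$ times a block mean, so a sustained mean shift is amplified by $2^{m/2}$ while the noise standard deviation stays at $\sigma$. The sketch below only does this arithmetic for the expected values, assuming 3-sigma limits at every scale; it is not the full MSSPC procedure, and detection in real data also depends on the noise realization and on where the shift falls within the dyadic blocks.

```python
def first_detecting_scale(shift_in_sigmas, limit_in_sigmas=3.0, max_scale=10):
    """Smallest scale m at which the expected shift in a Haar scaled
    coefficient, shift * 2^(m/2), exceeds the control limit.

    Assumes uncorrelated data, an orthonormal Haar basis, and the same
    limit (in units of sigma) at every scale. Returns None if no scale
    up to max_scale qualifies.
    """
    for m in range(max_scale + 1):
        if shift_in_sigmas * 2.0 ** (m / 2.0) > limit_in_sigmas:
            return m
    return None

for shift in (0.5, 2.0, 5.0):
    print(shift, first_detecting_scale(shift))
```

For a shift of 2 sigma the amplified shift is about 2.83 sigma at m=1 and 4 sigma at m=2, so under these assumptions the 3-sigma limit is first crossed at m=2.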
Example of Univariate MSSPC • Mean shift of size 5 in iid Gaussian measurements • MSSPC detection limits adapt to signal features [Figure: control charts for SPC and for MSSPC]
General Framework for SPC • Existing SPC filters operate at different fixed scales • MSSPC subsumes existing methods [Figure: filter shapes for Shewhart, MA, EWMA, and CUSUM charts compared with Haar and boundary-corrected Daubechies-4 wavelets]
Library of MSSPC Filters [Figure: filter panels labeled Moving Avg., CUSUM, and Shewhart]
Multivariate SPC • Univariate charts are inconvenient for multivariate tasks • Multivariate modeling reduces dimensionality • Linear modeling (PCA, PLS) • Nonlinear (clustering, NLPCA) • Detect changes in transformed space [Figure: scatter of normal (*) and abnormal (+) data in the X1-X2 plane with principal component axes PC1 and PC2]
Clustering with ART • Features of Adaptive Resonance Theory (ART) • Adaptive clustering • Inspired by neural networks (Carpenter and Grossberg) • Useful for change detection and diagnosis [Figure: typical process data in the X1-X2 plane, with a cluster of normal data (*) and a cluster for a known operational event (+)]
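A minimal sketch of detecting changes in the transformed space with PCA, using the Hotelling $T^2$ statistic on the retained scores and the squared residual (often called Q or SPE). The synthetic data and the function names are assumptions for illustration; control limits are omitted.

```python
import numpy as np

def fit_pca_monitor(X, n_components):
    """Fit PCA on normal data: returns mean, loadings, and score variances."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    loadings = vt[:n_components].T
    variances = s[:n_components] ** 2 / (X.shape[0] - 1)
    return mean, loadings, variances

def t2_and_q(x, mean, loadings, variances):
    """T^2 measures distance within the PCA model; Q measures distance from it."""
    xc = np.asarray(x, dtype=float) - mean
    scores = xc @ loadings
    t2 = float(np.sum(scores ** 2 / variances))
    residual = xc - scores @ loadings.T
    return t2, float(residual @ residual)

# Synthetic normal data: x2 is close to 2 * x1, with a small off-line wiggle.
t = np.linspace(-1.0, 1.0, 21)
wiggle = 0.01 * (-1.0) ** np.arange(21)
X = np.column_stack([t - 2.0 * wiggle, 2.0 * t + wiggle])

model = fit_pca_monitor(X, n_components=1)
t2_normal, q_normal = t2_and_q([0.5, 1.0], *model)     # follows the correlation
t2_fault, q_fault = t2_and_q([1.0, -0.5], *model)      # breaks the correlation
print(q_normal, q_fault)
```

The second point lies within the range of each variable taken separately, so univariate charts would not flag it, but its residual from the one-component model is large.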
MSSPC - Industrial Validation • Case Studies • Change in Furnace Feed • Valve Leak Malfunction • Cold Weather Malfunction • Feed Malfunction • Event start and end determined with operator input • Cannot perform ARL analysis • Plot “Missed Alarm Rate” versus “False Alarm Rate” for different detection parameters • Better method has smaller missed alarm rate for same number of false alarms
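The comparison plot can be sketched as below. The definitions are assumptions, since the slide does not spell them out: missed alarm rate is taken as the fraction of abnormal samples not flagged, and false alarm rate as the fraction of normal samples flagged. The monitoring statistic, labels, and thresholds are made-up illustrative values.

```python
def alarm_rates(alarms, abnormal):
    """Return (missed_alarm_rate, false_alarm_rate).

    alarms[i] is True if the method flagged sample i; abnormal[i] is True
    if sample i lies inside the event window (for example, as marked with
    operator input).
    """
    flagged_abnormal = [a for a, t in zip(alarms, abnormal) if t]
    flagged_normal = [a for a, t in zip(alarms, abnormal) if not t]
    missed = sum(1 for a in flagged_abnormal if not a) / len(flagged_abnormal)
    false = sum(1 for a in flagged_normal if a) / len(flagged_normal)
    return missed, false

def operating_curve(statistic, abnormal, thresholds):
    """One (false alarm rate, missed alarm rate) point per detection threshold."""
    points = []
    for th in thresholds:
        missed, false = alarm_rates([s > th for s in statistic], abnormal)
        points.append((false, missed))
    return points

statistic = [0.2, 1.4, 0.3, 0.5, 0.9, 2.1, 2.5, 3.0]   # hypothetical values
abnormal = [False, False, False, False, True, True, True, True]
print(operating_curve(statistic, abnormal, thresholds=[0.4, 1.0, 2.0]))
```

Raising the threshold trades false alarms for missed alarms, so a method whose curve lies closer to the origin is better over the whole range of detection parameters.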
Data - Valve Leak Malfunction • Three redundant sensors
Performance - Valve Leak • Multiscale methods do better [Figure: missed alarm rate versus false alarm rate for ART, PCA, MSART, and MSPCA]
MSART vs. Operator - Valve Leak • MSART detects leak about 200 minutes before operator [Figure: normal/abnormal classification by MSART and by the operator versus time step (minutes)]
Data - Cold Weather Event • Valve failure due to low ambient temperature • Single measured variable
Performance - Cold Weather Event • Approximately stationary and Gaussian data • MSPCA does best [Figure: missed alarm rate versus false alarm rate for ART, MSART, PCA, and MSPCA]
MSSPC - Summary • MSSPC provides better average performance for a variety of types and magnitudes of faults • Recommended when nature of features representing process change is unknown • If type of feature to be detected is known a priori, better to use traditional methods • Extension to reduce user-defined parameters, and to bigger library of basis functions is in progress • Bayesian MSSPC can do better, but requires probability of faults
Linear Regression • All methods determine a model of the form $Y = Zb$ • Inputs, Z, may be combined to form latent variables, T, in reduced dimension space (PCA, PLS): $T = ZP$ • Latent Variable Regression (LVR) model: $Y = ZPb$ • Ideal method • Handles collinear variables • Accounts for errors in both input and output variables • Integrates regression and filtering • Incorporates external information and multiscale behavior
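A minimal sketch of a conventional latent variable regression, here principal component regression: the loadings P come from PCA of the inputs and b from least squares on the scores. This is the non-Bayesian baseline only, and the synthetic collinear data and function names are assumptions for illustration.

```python
import numpy as np

def fit_lvr(Z, y, rank):
    """Principal component regression: y ~ y_mean + (Z - z_mean) P b."""
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc, yc = Z - z_mean, y - y_mean
    _, _, vt = np.linalg.svd(Zc, full_matrices=False)
    P = vt[:rank].T                      # loadings of the retained components
    T = Zc @ P                           # latent variables (scores)
    b, *_ = np.linalg.lstsq(T, yc, rcond=None)
    return z_mean, y_mean, P, b

def predict_lvr(Z, model):
    z_mean, y_mean, P, b = model
    return y_mean + (Z - z_mean) @ P @ b

# Synthetic data: the third input is the sum of the first two, so the
# centered inputs have rank 2 and Z^T Z cannot be inverted directly.
rng = np.random.default_rng(0)
u1 = rng.normal(3.0, 1.0, size=50)
u2 = rng.normal(1.0, 2.0, size=50)
Z = np.column_stack([u1, u2, u1 + u2])
y = 0.8 * u1 + 0.8 * u2

model = fit_lvr(Z, y, rank=2)
max_error = float(np.max(np.abs(predict_lvr(Z, model) - y)))
print(max_error)
```

Regressing on two latent variables sidesteps the collinearity while still reproducing the noise-free output; handling errors in both inputs and outputs is what the ideal method adds beyond this baseline.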
Bayesian PCA and LVR • Maximize posterior $P(T, P, r, b \mid Z, Y) \propto P(Z, Y \mid T, P, b, r)\,P(T, P, r, b)$ • Approach • Solve conventional regression problem • Estimate prior from conventional solution • Solve Bayesian regression problem by iterating between • Rectification to estimate T, P • Parameter estimation to obtain b • Assumptions • Noise and underlying measurements are Gaussian • Regression parameters are Gaussian • Rank is known
BPCA - Example • Three correlated variables: $u_3 = u_1 + u_2$; $u_1 \sim N(3,1)$; $u_2 \sim N(1,4)$ • Measurements corrupted by additive Gaussian noise: $Z = U + e$ • MSE for 100 realizations • Smaller coeffs. MSE for higher dimensional problems
Method   Prior       Inputs MSE   Coeffs. MSE
PCA      uniform     2.72         0.187
MLPCA    uniform     2.11         0.093
BPCA     empirical   1.40         0.092
BPCA     exact       1.22         0.000
BLVR - Example • Three correlated variables: $u_3 = u_1 + u_2$; $u_1 \sim N(3,2)$; $u_2 \sim N(1,4)$ • Noise-free output: $x = 0.8u_1 + 0.8u_2$ • Measurements corrupted by additive Gaussian noise: $y = x + e_x$; $Z = U + e_u$ • MSE for 100 realizations
Method   Prior       Inputs MSE   Outputs MSE   Coeffs. MSE
OLS      uniform     1.32         0.66          0.010
PLS      uniform     1.18         0.71          0.012
BLVR     empirical   0.69         0.60          0.007
BLVR     exact       0.66         0.55          0.000
Bayesian Regression - Summary • Bayesian approach can improve PCA and LVR without additional data • Can deal with • Errors in all variables • Correlated variables • External information • Prior knowledge may be obtained from • Data being modeled, via empirical Bayes approach • Historical data • Many opportunities for further work
Data Rectification and Estimation • Estimate measured variables and unknown quantities • Bayesian problem formulation: given $y_{1:k} = \{y_1, y_2, \ldots, y_k\}$, maximize $P(x_k \mid y_{1:k})$ subject to
$x_k = f_{k-1}(x_{k-1}, w_{k-1})$ (state equation)
$y_k = h_k(x_k, v_k)$ (measurement equation)
$g_1(x_k) = 0$ (equality constraint)
$g_2(x_k) \ge 0$ (inequality constraint)
• Existing methods rely on many assumptions
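A sampling-based treatment of this formulation can be sketched with a generic bootstrap particle filter. This is an illustration under stated assumptions, not necessarily the algorithm behind the talk: the state equation is a hypothetical scalar random walk, the measurement is the state plus Gaussian noise, the inequality constraint is x >= 0 (violating particles get zero weight), and the reported estimate is the posterior mean rather than the posterior maximum.

```python
import math
import random

def bootstrap_particle_filter(measurements, n_particles=500, seed=0,
                              process_std=0.1, meas_std=0.5):
    """Recursive Monte Carlo approximation of P(x_k | y_1:k)."""
    rng = random.Random(seed)
    # Hypothetical prior on the initial state: N(5, 1).
    particles = [rng.gauss(5.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in measurements:
        # Propagate each sample through the (assumed) state equation.
        particles = [x + rng.gauss(0.0, process_std) for x in particles]
        # Weight by the likelihood; zero weight if x >= 0 is violated.
        weights = [0.0 if x < 0.0
                   else math.exp(-0.5 * ((y - x) / meas_std) ** 2)
                   for x in particles]
        total = sum(weights)
        if total == 0.0:                 # degenerate case: fall back to uniform
            weights = [1.0] * n_particles
            total = float(n_particles)
        estimates.append(sum(w * x for w, x in zip(weights, particles)) / total)
        # Resample so the posterior samples become the next prior samples.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates

estimates = bootstrap_particle_filter([5.0] * 30)
print(estimates[-1])
```

Because the posterior is carried by samples, no Gaussian shape is imposed on it, and the solution is recursive: each step uses only the current measurement and the previous samples.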
Existing Methods for Nonlinear Dynamic Data Rectification (NDDR) • Extended Kalman Filtering (Jazwinski, 1970) • Assumes fixed Gaussian distributions • Uses linearized models • Cannot satisfy constraints • Moving Horizon Estimation (Robertson, Lee and Rawlings, 1996; Rao and Rawlings, 2002) • Satisfies constraints • Assumes fixed Gaussian distributions • Computationally expensive due to non-recursive solution • Existing methods solve the convenient NDDR problem, not the real one • Actual probability distributions are infinite dimensional and change in size and shape
Evolution of Probability Distributions • Evolution of posterior for popular adiabatic CSTR • Gaussian approximation is even more inaccurate with constraints
Results of CSTR Example • Perfect initial guess • 100 realizations, 1600 measurements /realization, 500 samples/realization • Work in progress • Relevant to model predictive control, Bayesian neural networks, etc.
Rectification without Accurate Models • Most processes are dynamic but lack accurate models • Wavelet representation captures dynamics in variation of variance across scales: $w(m) \sim N(0, P_{d_m})$; $Ax(m) = H_m$; $Bx(m) = G_m$ • Rectify coefficients at each scale (Bakshi et al., 2001): $\hat{d}_m = K_m\left(C^T R_m^{-1} d_m + P_{d_m}^{-1}\mu_{d_m}\right)$ • Features of multiscale approach • More accurate than single-scale approaches • More computationally efficient since scales with less information can be identified before rectification
Example • Level control process (Bellingham and Lees, 1977): $\begin{bmatrix} h_{k+1} \\ x_{k+1} \end{bmatrix} = \begin{bmatrix} 0.995 & -0.1373 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} h_k \\ x_k \end{bmatrix} + \begin{bmatrix} 0.00012 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} F_{3k} \\ e_k \end{bmatrix}$ • $F_{3k}$ and $e_k$ are iid Gaussian • $[h_k \; x_k \; F_{3k} \; e_k]$ are corrupted by iid Gaussian noise
Method             Model          MSE
None               None           1.00
Max. Likelihood    Steady state   0.67
Single scale Bayes Steady state   0.40
Multiscale Bayes   Steady state   0.06
Single scale Bayes Dynamic        0.05
Multiscale Bayes   Dynamic        0.03
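A minimal sketch for simulating the level control model. The matrix entries are read from the slide as a 2x2 state matrix and a 2x2 input matrix; the input standard deviations, the initial state, and the function names are assumptions, since the slide does not give them.

```python
import random

# State matrix and input matrix as read from the slide (layout assumed).
A = [[0.995, -0.1373],
     [0.0, 1.0]]
B = [[0.00012, 0.0],
     [0.0, 1.0]]

def step(state, inputs):
    """One step of [h, x]_{k+1} = A [h, x]_k + B [F3, e]_k."""
    h, x = state
    f3, e = inputs
    return (A[0][0] * h + A[0][1] * x + B[0][0] * f3 + B[0][1] * e,
            A[1][0] * h + A[1][1] * x + B[1][0] * f3 + B[1][1] * e)

def simulate(n_steps, state=(0.0, 0.0), input_std=(1.0, 1.0), seed=0):
    """Drive the model with iid Gaussian F3 and e (unit std is an assumption)."""
    rng = random.Random(seed)
    path = [state]
    for _ in range(n_steps):
        inputs = (rng.gauss(0.0, input_std[0]), rng.gauss(0.0, input_std[1]))
        state = step(state, inputs)
        path.append(state)
    return path

path = simulate(10)
print(path[-1])
```

Adding iid Gaussian noise to the simulated variables would give measurements of the kind the rectification methods in the table are compared on.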
Data Rectification - Summary • Existing approaches to nonlinear estimation and rectification require assumptions • Gaussian noise, prior • Non-time-varying distributions • Assumptions are readily violated • Proposed approach relies on Monte Carlo sampling • More accurate than existing methods • Computationally less expensive than MHE • Many opportunities for further work
Summary • Large amounts of measured data and process knowledge are available • Existing methods do not make the most of available data and knowledge • Processes are multiscale, but methods are single-scale • Fundamental models and partial knowledge are underutilized • Developed new multiscale and Bayesian methods for • Fault detection and diagnosis • Dynamic data rectification • Empirical modeling • Significant opportunities for future research and applications
Future Work • Nonlinear dynamic data rectification • Bayesian nonlinear regression/neural networks • Estimation of multirate systems and missing data • Integrated rectification, monitoring, diagnosis, and supervision • Bioinformatics and genomics • Process scale-up
Acknowledgments • Graduate students and post-docs • Prof. Sridhar Ungarala • Dr. Hrishikesh Aradhye • Collaborators • Prof. Prem K. Goel • Dr. Manabu Kano • Financial Support • National Science Foundation (CTS 9733627) • Abnormal Situation Management Consortium • Du Pont Education Fund • Technical Association of Pulp and Paper Industry • American Chemical Society - Petroleum Research Fund • Dr. Mohamed Nounou • Mr. Wen-Shiang Chen • Prof. Xiaotong Shen