This thesis explores the modeling of activities performed by groups of interacting objects, such as people, vehicles, or parts of the human body, using stochastic shape dynamical models. It focuses on abnormal activity detection and tracking by treating objects as point landmarks and capturing changes in shape dynamics. The approach uses a continuous-state Hidden Markov Model (HMM) to define shape deformations and tracks observation sequences with particle filters. The work emphasizes slow and drastic change detection, important for activity recognition applications, and proposes statistics for identifying abnormal behavior.
Change Detection in Stochastic Shape Dynamical Models with Application to Activity Recognition Namrata Vaswani Thesis Advisor: Prof. Rama Chellappa
Problem Formulation • The Problem: Model activities performed by a group of moving and interacting objects (which can be people, vehicles, robots, or different parts of the human body). Use the models for abnormal activity detection and tracking • Our Approach: Treat objects as point objects: "landmarks" • Changing configuration of objects: deforming shape • 'Abnormality': change from learnt shape dynamics • Related Approaches for Group Activity: Co-occurrence statistics, Dynamic Bayes Nets
The Framework • Define a Stochastic State-Space Model (a continuous-state HMM) for shape deformations in a given activity, with shape & scaled Euclidean motion forming the hidden state vector and the configuration of objects forming the observation. • Use a particle filter to track a given observation sequence, i.e. estimate the hidden state given the observations. • Define Abnormality as a slow or drastic change in the shape dynamics with unknown change parameters. We propose statistics for slow & drastic change detection.
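As a rough illustration of the tracking step (not the thesis code), the sketch below implements one generic bootstrap particle-filter iteration; `propagate` (the system model Q_t) and `log_likelihood` (the observation model ψ_t) are assumed placeholders for the shape and motion models defined later.

```python
import numpy as np

def particle_filter_step(particles, weights, y, propagate, log_likelihood, rng):
    """One predict-update-resample step of a generic bootstrap particle filter.

    particles      : (N, d) array of state samples approximating p(X_{t-1} | Y_{1:t-1})
    weights        : (N,) normalized importance weights
    y              : current observation Y_t
    propagate      : samples X_t ~ Q_t(. | X_{t-1})  (system model, assumed supplied)
    log_likelihood : log psi_t(Y_t | X_t)            (observation model, assumed supplied)
    """
    # Prediction: sample from the state transition kernel
    particles = propagate(particles, rng)

    # Update: reweight by the observation likelihood
    log_w = np.log(weights + 1e-300) + log_likelihood(y, particles)
    log_w -= log_w.max()                      # numerical stabilization
    weights = np.exp(log_w)
    weights /= weights.sum()

    # Resample (multinomial) when the effective sample size gets low
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(weights):
        idx = rng.choice(len(weights), size=len(weights), p=weights)
        particles = particles[idx]
        weights = np.full(len(weights), 1.0 / len(weights))
    return particles, weights
```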
Overview • The Group Activity Recognition Problem • Slow and Drastic Change Detection • Landmark Shape Dynamical Models • Applications, Experiments and Results • Principal Components Null Space Analysis • Future Directions & Summary of Contributions
A Group of People: Abnormal Activity Detection [images: normal activity vs. abnormal activity]
Human Action Tracking [figure] Cyan: Observed, Green: Ground Truth, Red: SSA, Blue: NSSA
Overview • The Group Activity Recognition Problem • Slow and Drastic Change Detection • Landmark Shape Dynamical Models • Applications, Experiments and Results • Principal Components Null Space Analysis
The Problem • General Hidden Markov Model (HMM): Markov state sequence {X_t}, observation sequence {Y_t} • Finite-duration change in the system model which causes a permanent change in the probability distribution of the state • Change is slow or drastic: Tracking Error & Observation Likelihood do not detect slow changes. Use the distribution of X_t • Change parameters unknown: use Log-Likelihood(X_t) • State is partially observed: use the MMSE estimate of LL(X_t) given the observations, E[LL(X_t) | Y_{1:t}] = ELL • Nonlinear dynamics: Particle filter to estimate ELL
The General HMM [diagram: Markov state sequence X_t → X_{t+1} → X_{t+2} with transition kernel Q_t(X_{t+1} | X_t); each state X_t emits an observation Y_t with likelihood ψ_t(Y_t | X_t)]
Related Work • Change detection using particle filters, with unknown change parameters • CUSUM on the Generalized LRT (Y_{1:t}): assumes a finite parameter set • Modified CUSUM statistic on the Generalized LRT (Y_{1:t}) • Testing if {u_j = Pr(Y_t < y_j | Y_{1:t-1})} are uniformly distributed • Tracking Error (TE): error between Y_t & its prediction based on the past • Thresholding the negative log likelihood of observations (OL) • All of these approaches use observation statistics: they do not detect slow changes • The PF is stable and hence is able to track a slow change • The average log likelihood of i.i.d. observations is used often • But ELL = E[-LL(X_t) | Y_{1:t}] (the MMSE estimate of LL given observations) in the context of general HMMs is new
Change Detection Statistics • Slow Change: Propose Expected Log Likelihood (ELL) • ELL = Kerridge Inaccuracy between π_t (posterior) and p_t^0 (prior): ELL(Y_{1:t}) = E[-log p_t^0(X_t) | Y_{1:t}] = E_π[-log p_t^0(X_t)] = K(π_t : p_t^0) • A sufficient condition for "detectable changes" using ELL: E[ELL(Y_{1:t}^0)] = K(p_t^0 : p_t^0) = H(p_t^0), E[ELL(Y_{1:t}^c)] = K(p_t^c : p_t^0) • Chebyshev Inequality: With false-alarm & miss probabilities of 1/9, ELL detects all changes s.t. K(p_t^c : p_t^0) - H(p_t^0) > 3 [√Var{ELL(Y_{1:t}^c)} + √Var{ELL(Y_{1:t}^0)}] • Drastic Change: ELL does not work, use OL or TE • OL: Negative log of the current observation likelihood given the past: OL = -log Pr(Y_t | Y_{0:t-1}, H_0) = -log <p_{t|t-1}, ψ_t> • TE: Tracking Error. If white Gaussian observation noise, TE ≈ OL
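Both statistics can be approximated directly from the particle cloud. A sketch, assuming the learned prior p_t^0 and the observation likelihood ψ_t are available as functions `log_prior_pdf` and `log_obs_likelihood` (names are illustrative, not from the thesis):

```python
import numpy as np

def ell_statistic(particles, weights, log_prior_pdf):
    """ELL(Y_{1:t}) = E_pi[-log p_t^0(X_t)], approximated over the weighted posterior particles."""
    return float(np.sum(weights * (-log_prior_pdf(particles))))

def ol_statistic(pred_particles, y, log_obs_likelihood):
    """OL = -log <p_{t|t-1}, psi_t>, approximated with the predicted (pre-update) particles
    (equal predicted weights assumed, e.g. right after resampling)."""
    log_psi = log_obs_likelihood(y, pred_particles)
    m = np.max(log_psi)
    return float(-(m + np.log(np.mean(np.exp(log_psi - m)))))  # -log of a mean, computed stably
```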
ELL & OL: Slow & Drastic Change • ELL fails to detect drastic changes • Approximating the posterior for changed-system observations using a PF optimal for the unchanged system: the error is large for drastic changes • OL relies on the error introduced due to the change to detect it • OL fails to detect slow changes • The Particle Filter tracks slow changes "correctly" • Assuming the change till t-1 was tracked "correctly" (error in the posterior small), OL only uses the change introduced at t, which is also small • ELL uses the total change in the posterior till time t & the posterior is approximated "correctly" for a slow change: so ELL detects a slow change when its total magnitude becomes "detectable" • ELL detects the change before loss of track, OL detects it after
A Simulated Example • Change introduced in the system model from t=5 to t=15 [plots: OL and ELL statistics over time]
Practical Issues • Defining p_t^0(x): Use the part of the state vector which has linear Gaussian dynamics, so that p_t^0(x) can be defined in closed form, OR assume a parametric family for p_t^0(x) and learn its parameters from training data • Declare a change when either ELL or OL exceeds its respective threshold • Set the ELL threshold a little above H(p_t^0) • Set the OL threshold a little above E[OL^{0,0}] = H(Y_t | Y_{1:t-1}) • Single-frame estimates of ELL or OL may be noisy: average the statistic, or average the number of detects, or use a modified CUSUM
Change Detection [block diagram: observation Y_t enters the particle filter (PF), which outputs the prediction p_{t|t-1}^N and the posterior p_{t|t}^N = p_t^N; if ELL = E_π[-log p_t^0(X_t)] > threshold, declare a change (slow); if OL = -log <p_{t|t-1}, ψ_t> > threshold, declare a change (drastic)]
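The decision logic in the diagram reduces to two threshold tests; a minimal sketch, with thresholds set as on the Practical Issues slide:

```python
def detect_change(ell_value, ol_value, ell_threshold, ol_threshold):
    """Declare a change when either statistic exceeds its threshold:
    ELL flags slow changes, OL flags drastic ones."""
    if ell_value > ell_threshold:
        return "slow change detected (ELL)"
    if ol_value > ol_threshold:
        return "drastic change detected (OL)"
    return None  # no change declared at this time step
```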
Approximation Errors • Total error < Bounding error + Exact filtering error + PF error • Bounding error: Stability results hold only for bounded functions but LL is unbounded, so approximate LL by min{-log p_t^0(X_t), M} • Exact filtering error: Error between exact filtering with the changed system model & with the original model, i.e. evaluating π_t^{c,0} (using Q_t^0) instead of π_t^{c,c} (using Q_t^c) • PF error: Error between exact filtering with the original model & particle filtering with the original model, i.e. evaluating π_t^{c,0,N}, which is a Monte Carlo estimate of π_t^{c,0}
Stability / Asymptotic Stability • The ELL approximation error averaged over observation sequences & PF realizations is eventually monotonically decreasing (& hence stable), for large enough N, if • the change lasts for a finite time • the "unnormalized filter kernels" are mixing • certain boundedness (or uniform convergence of bounded approximation) assumptions hold • Asymptotically stable if the kernels are uniformly "mixing" • Uses stability results of [LeGland & Oudjane] • The analysis generalizes to errors in the MMSE estimate of any function of the state evaluated using a PF with system model error
"Unnormalized filter kernel", "mixing" • The "unnormalized filter kernel", R_t, is the state transition kernel, Q_t, weighted by the likelihood of the observation given the state • "Mixing" measures the rate at which the transition kernel "forgets" its initial condition, or equivalently how quickly the state sequence becomes ergodic (a precise statement is sketched below) • Example [LeGland et al.]: The state transition X_t = X_{t-1} + n_t is not mixing. But if Y_t = h(X_t) + w_t, where w_t is truncated noise, then R_t is mixing
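The formula that followed "Mathematically" did not survive extraction. Stated from memory, in the spirit of [LeGland & Oudjane] (a paraphrase, not the thesis' exact statement), the mixing condition reads:

```latex
% A nonnegative kernel R_t is mixing if there exist \varepsilon_t \in (0, 1] and a
% nonnegative measure \lambda_t such that, for every state x and measurable set A,
\varepsilon_t \, \lambda_t(A) \;\le\; R_t(x, A) \;\le\; \frac{1}{\varepsilon_t} \, \lambda_t(A)
```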
Complementary Behavior of ELL & OL • The ELL approximation error, e_t^{c,0}, is upper bounded by an increasing function of OL_k^{c,0}, t_c < k < t • Implication: Assume a "detectable" change, i.e. ELL^{c,c} large • OL fails => OL_k^{c,0}, t_c < k < t, small => ELL error e_t^{c,0} small => ELL^{c,0} large => ELL detects • ELL fails => ELL^{c,0} small => ELL error e_t^{c,0} large => at least one of OL_k^{c,0}, t_c < k < t, large => OL detects
"Rate of Change" Bound • The total error in ELL estimation is upper bounded by increasing functions of the "rate of change" (or "system model error per time step") with all derivatives increasing • OL^{c,0} is upper bounded by an increasing function of the "rate of change" • The metric used for the "rate of change" (or equivalently the "system model error per time step") for a given observation Y_t is denoted D_{Q,t}
The Bound • Assumptions: the change lasts for a finite time, the unnormalized filter kernels are mixing, and the posterior state space is bounded
Implications • If the change is slow, ELL works and OL does not • The ELL error can blow up very quickly as the rate of change increases (its upper bound blows up) • A small error in both the normal & changed system models introduces less total error than a perfect transition kernel for the normal system & a large error in the changed system • A sequence of small changes introduces less total error than one drastic change of the same magnitude
Possible Applications • Abnormal activity detection, detecting motion disorders in human actions, activity segmentation • Neural signal processing: detecting changes in stimuli • Congestion detection • Video shot change or background model change detection • System model change detection in target tracking problems, without the tracker losing track
Overview • The Group Activity Recognition Problem • Slow and Drastic Change Detection • Landmark Shape Dynamical Models • Applications, Experiments and Results • Principal Components Null Space Analysis
What is Shape? • Shape is the geometric information that remains when location, scale and rotation effects are filtered out [Kendall] • Shape of k landmarks in 2D • Represent the X & Y coordinates of the k points as a k-dimensional complex vector: Configuration • Translation Normalization: Centered Configuration • Scale Normalization: Pre-shape • Rotation Normalization: Shape
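A minimal sketch of this normalization chain for landmarks stored as a complex k-vector; the rotation step aligns against an arbitrary reference pre-shape, which in practice would be a mean shape:

```python
import numpy as np

def preshape(config):
    """Map a raw configuration (complex k-vector of landmarks) to its pre-shape:
    translation removed by centering, scale removed by unit-norm normalization."""
    config = np.asarray(config, dtype=complex)
    centered = config - config.mean()            # translation normalization
    return centered / np.linalg.norm(centered)   # scale normalization

def align_rotation(z, z_ref):
    """Remove rotation by optimally rotating pre-shape z onto the reference z_ref;
    the optimal angle is the phase of the complex inner product <z_ref, z>."""
    theta = np.angle(np.vdot(z_ref, z))          # vdot conjugates its first argument
    return z * np.exp(-1j * theta)
```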
Related Work • Related Approaches for Group Activity • Co-occurrence Statistics • Dynamic Bayesian Networks • Shape for robot formation control • Shape Analysis/Deformation: • Pairs of Thin plate splines, Principal warps • Active Shape Models: affine deformation in configuration space • ‘Deformotion’: scaled Euclidean motion of shape + deformation • Piecewise geodesic models for tracking on Grassmann manifolds • Particle Filters for Multiple Moving Objects: • JPDAF (Joint Probability Data Association Filter): for tracking multiple independently moving objects
Motivation • A generic and sensor-invariant approach for "activity" • Only need to change the observation model depending on the "landmark", the landmark extraction method and the sensor used • Easy to fuse sensors in a Particle filtering framework • "Shape": invariant to translation, zoom, in-plane rotation • Single global framework for modeling and tracking independent motion + interactions of groups of objects • Co-occurrence statistics: requires individual & joint histograms • JPDAF: cannot model object interactions for tracking • Active Shape Models: good only for approximately rigid objects • The Particle Filter is better than the Extended Kalman Filter: able to get back on track after loss of track due to outliers, and can handle a multimodal system or observation process
Hidden Markov Shape Model • State: X_t = [Shape (z_t), Shape Velocity (c_t), Scale (s_t), Rotation (θ_t)], with state dynamics X_t → X_{t+1} • Observation Model (complex notation): Y_t = h(X_t) + w_t = z_t s_t e^{jθ_t} + w_t, where w_t is i.i.d. observation noise • Observation Y_t = Centered Configuration • Shape × Rotation (SO(2)) × Scale (R+) = Centered Configuration space, C^{k-1}
State Dynamics • Shape Dynamics: Linear Markov model on shape velocity • Shape "velocity" at t lives in the tangent space w.r.t. the shape at t-1, z_{t-1} • Orthogonal basis of the tangent space, U(z_{t-1}) • Linear Gauss-Markov model for the shape velocity • "Move" z_{t-1} by an amount v_t on the shape manifold to get z_t • Motion (Scale, Rotation): Linear Gauss-Markov dynamics for log s_t and the unwrapped θ_t
The HMM • Observation Model: [Shape, Motion] → Centered Configuration • System Model: Shape and Motion Dynamics • Motion Dynamics: Linear Gauss-Markov models for log s_t and θ_t • Shape Dynamics: can be stationary or non-stationary
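A generative sketch of one time step of such a model, under simplifying assumptions of my own (an AR(1) shape-velocity model, a simple re-projection back to the pre-shape sphere in place of the exact manifold "move", scalar AR(1) models for log-scale and unwrapped rotation, and illustrative parameter names such as `a_c`, `sig_c`):

```python
import numpy as np

def tangent_basis(z):
    """Orthonormal basis U(z) of the orthogonal complement of pre-shape z:
    eigenvectors of the projector I - z z^H with unit eigenvalue."""
    P = np.eye(z.size, dtype=complex) - np.outer(z, z.conj())
    _, vecs = np.linalg.eigh(P)          # eigenvalues ascending: ~0 first, then ~1's
    return vecs[:, 1:]                   # drop the direction along z itself

def shape_hmm_step(z_prev, c_prev, log_s_prev, theta_prev, p, rng):
    """Simulate one time step of a (simplified) non-stationary shape HMM."""
    U = tangent_basis(z_prev)
    # AR(1) shape velocity in the tangent space at z_prev
    noise = rng.standard_normal(c_prev.shape) + 1j * rng.standard_normal(c_prev.shape)
    c = p["a_c"] * c_prev + p["sig_c"] * noise
    # "Move" z_prev by U c, then re-project to the centered unit-norm pre-shape sphere
    z = z_prev + U @ c
    z -= z.mean()
    z /= np.linalg.norm(z)
    # AR(1) dynamics for log-scale and unwrapped rotation
    log_s = p["a_s"] * log_s_prev + p["sig_s"] * rng.standard_normal()
    theta = p["a_th"] * theta_prev + p["sig_th"] * rng.standard_normal()
    # Observation: centered configuration = shape * scale * rotation + noise
    w = p["sig_obs"] * (rng.standard_normal(z.shape) + 1j * rng.standard_normal(z.shape))
    y = z * np.exp(log_s) * np.exp(1j * theta) + w
    return z, c, log_s, theta, y
```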
Three Cases • Non-Stationary Shape Activity (NSSA) • Tangent space, U(zt-1), changes at every t • Most flexible: Detect abnormality and also track it • Stationary Shape Activity (SSA) • Tangent space, U(μ), is constant (μ is a mean shape) • Track normal behavior, detect abnormality • Piecewise Stationary Shape Activity (PSSA) • Tangent space is piecewise constant, U(μk) • Change time: fixed or decided on the fly using ELL • PSSA + ELL: Activity Segmentation
Stationary, Non-Stationary [figures: stationary shape vs. non-stationary shape]
Learning Procrustes' Mean • Procrustes mean of a set of pre-shapes w_i [Dryden, Mardia]; a computational sketch follows below
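A minimal sketch, using the standard result that the full Procrustes mean of unit-norm pre-shapes is the dominant eigenvector of Σ_i w_i w_i^H [Dryden & Mardia]:

```python
import numpy as np

def procrustes_mean(preshapes):
    """Full Procrustes mean of pre-shapes (rows of a complex (n, k) array):
    the dominant eigenvector of S = sum_i w_i w_i^H."""
    W = np.asarray(preshapes, dtype=complex)
    S = W.T @ W.conj()                    # S[a, b] = sum_i w_i[a] * conj(w_i[b])
    _, vecs = np.linalg.eigh(S)           # Hermitian eigendecomposition, ascending eigenvalues
    mu = vecs[:, -1]                      # eigenvector of the largest eigenvalue
    return mu / np.linalg.norm(mu)
```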
Overview • The Group Activity Recognition Problem • Slow and Drastic Change Detection • Landmark Shape Dynamical Models • Applications, Experiments and Results • Principal Components Null Space Analysis
Abnormal Activity Detection • Define an abnormal activity as a slow or drastic change in the shape statistics, with change parameters unknown • The system is a nonlinear HMM, tracked using a PF • This motivated research on slow & drastic change detection in general HMMs • Tracking Error detects drastic changes. We proposed a statistic called ELL for slow changes • Use a combination of ELL & Tracking Error and declare a change if either exceeds its threshold
Tracking to obtain observations • CONDENSATION tracker framework • State: Shape, shape velocity, scale, rotation, translation; Observation: Configuration vector • Measurement model: Motion detection locally around the predicted object locations to obtain the observation • The predicted object configuration is obtained from the prediction step of the Particle filter • The predicted motion information can be used to move the camera (or any other sensor) • Combine with abnormality detection: for drastic abnormalities the observation is missing for a set of frames; if it is an outlier, only for 1-2 frames
Activity Segmentation • Use the PSSA model for tracking • At time t, let the current mean shape = μ_k • Use ELL w.r.t. μ_k to detect the change time, t_{k+1} (segmentation boundary) • At t_{k+1}, set the current mean shape to the posterior Procrustes mean of the current shape, i.e. μ_{k+1} = largest eigenvector of E_π[z_t z_t^*] = Σ_{i=1}^N z_t^{(i)} z_t^{(i)*} (see the sketch below) • Setting the current mean this way is valid only if the tracking error (or OL) has not exceeded its threshold (the PF is still in track)
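The posterior mean is just the weighted analogue of the Procrustes-mean sketch above; the slide's sum corresponds to equal particle weights (as after resampling), while the sketch below allows general weights.

```python
import numpy as np

def posterior_procrustes_mean(shape_particles, weights):
    """Dominant eigenvector of E_pi[z z^H], approximated by the weighted particle sum
    sum_i w^(i) z^(i) z^(i)H over the current shape particles."""
    Z = np.asarray(shape_particles, dtype=complex)   # (N, k): one pre-shape per particle
    S = (Z.T * weights) @ Z.conj()                   # = sum_i w_i z_i z_i^H
    _, vecs = np.linalg.eigh(S)
    return vecs[:, -1]                               # already unit norm
```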
A Common Framework for… • Tracking • Groups of people or vehicles • Articulated human body tracking • Abnormal Activity Detection / Activity Id • Suspicious behavior, Lane change detection • Abnormal action detection, e.g. motion disorders • Human Action Classification, Gait recognition • Activity Sequence Segmentation • Fusing different sensors • Video, Audio, Infra-Red, Radar
Experiments • Group Activity: • Normal activity: a group of people deplaning & walking towards the airport terminal: used the SSA model • Abnormality: a person walks away in an un-allowed direction, distorting the normal shape • Simulated walking speeds of 1, 2, 4, 16, 32 pixels per time step (slow to drastic distortion in shape) • Compared detection delays using TE and ELL • Plotted ROC curves to compare performance • Human actions: • Defined an NSSA model for tracking a figure skater • Abnormality: abnormal motion of one body part • Able to detect as well as track a slow abnormality
Abnormality • Abnormality introduced at t=5 • Observation noise variance = 9 • OL plot very similar to TE plot (both same to first order) [plots: Tracking Error (TE) and ELL]
ROC: ELL • Plot of Detection delay against Mean Time Between False Alarms (MTBFA) for varying detection thresholds • Plots for increasing observation noise [panels: Drastic Change: ELL fails; Slow Change: ELL works]
ROC: Tracking Error (TE) • ELL: Detection delay = 7 for slow change, Detection delay = 60 for drastic • TE: Detection delay = 29 for slow change, Detection delay = 4 for drastic [panels: Slow Change: TE fails; Drastic Change: TE works]
ROC: Combined ELL-TE • Plots for observation noise variance = 81 (maximum) • Detection Delay < 8 achieved for all rates of change [panels: Slow Change: works; Drastic Change: works]
Human Action Tracking [panels: Normal Action: SSA better than NSSA; Abnormality: NSSA works, SSA fails] Green: Observed, Magenta: SSA, Blue: NSSA
NSSA Tracks and Detects Abnormality • Abnormality introduced at t=20 [plots: Tracking Error and ELL] Red: SSA, Blue: NSSA
Overview • The Group Activity Recognition Problem • Slow and Drastic Change Detection • Landmark Shape Dynamical Models • Applications, Experiments and Results • Principal Components Null Space Analysis
Typical Data Distributions ‘Apples from Apples’ problem: All algorithms work well ‘Apples from Oranges’ problem: Worst case for SLDA, PCA
PCNSA Algorithm • Subtract the common mean μ, obtain the PCA space • Project all training data into the PCA space, evaluate the class mean and covariance in PCA space: μ_i, Σ_i • Obtain the class Approximate Null Space (ANS) for each class: the M_i trailing eigenvectors of Σ_i • Keep the valid classification directions in the ANS (those along which the distance between class means is "significant"): W_i^{NSA} • Classification: Project the query Y into PCA space, X = W_PCA^T (Y - μ), and choose the most likely class, c, as the one whose mean is closest to X along its valid null-space directions (a sketch follows below)
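A simplified sketch of the procedure described above; for brevity it keeps all trailing eigenvectors as the approximate null space rather than filtering for the "significant" directions, so treat it as an illustration of the structure, not the exact thesis algorithm.

```python
import numpy as np

def pcnsa_train(X, y, n_pca, n_null):
    """Fit a simplified PCNSA model.  X: (n, d) training data, y: integer class labels."""
    mu = X.mean(axis=0)
    # Global PCA space from the centered training data
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W_pca = Vt[:n_pca].T                              # (d, n_pca)
    Xp = (X - mu) @ W_pca
    model = {"mu": mu, "W_pca": W_pca, "classes": [], "class_means": [], "W_nsa": []}
    for c in np.unique(y):
        Xc = Xp[y == c]
        cov_c = np.cov(Xc, rowvar=False)
        _, vecs = np.linalg.eigh(cov_c)               # eigenvalues ascending
        model["classes"].append(c)
        model["class_means"].append(Xc.mean(axis=0))
        model["W_nsa"].append(vecs[:, :n_null])       # trailing eigenvectors = approx. null space
    return model

def pcnsa_classify(model, query):
    """Classify a query by the smallest distance to a class mean, measured inside
    that class's approximate null space."""
    x = (query - model["mu"]) @ model["W_pca"]
    dists = [np.linalg.norm(W.T @ (x - m))
             for m, W in zip(model["class_means"], model["W_nsa"])]
    return model["classes"][int(np.argmin(dists))]
```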
Classification Error Probability • Two-class problem; assumes a 1-dimensional ANS and 1 LDA direction • Generalizes to an M-dimensional ANS and to non-Gaussian but unimodal & symmetric distributions