Activity Analysis in Video

Activity Analysis in Video • Spring 2005 Computational Intelligence Seminar Series • Partial review of the paper “Discovery and Segmentation of Activities in Video” by Matthew Brand (MERL) • Presented by Derek Anderson

Topics • TigerPlace Project • Monitoring Silhouette Activity • Monitoring Object Activity • Monitoring both (separate or combined) • Hidden Markov Models (Brief Introduction) • Evolutionary Computing for Structure Discovery • Matthew Brand’s Approach to Activity Recognition

Context for this Presentation • TigerPlace Project • One component of our system will involve analyzing video (in real time) and recognizing an important set of “short-term” activities • [Figure: system overview with panels (a), (b), (c). Labels: sensor network (gait monitor, stove temp. sensor, motion sensor, sensor mat, bed sensors), video sensor network (anonymized video), sensor events, video events, Data Manager, Activity Analysis, physical and video activity descriptors, Behavior Reasoning, Alarm Filter, Alerts, Caregivers and Residents]

Sensor and Video Networks • We are doing the research for the video sensor network • iPAQ hx4700 series PDA with HP PhotoSmart digital cameras • The results from the video network can be combined with other sources of information from the sensor network (gait monitor, bed sensors, …) to reduce false alarm rates and increase the overall confidence that the activities occurred • Is this going to be handled inside the behavior reasoning component of the system … (fuzzy rules)? • Fuzzy integrals? • Fuzzy integral: use each source of information in the sensor and video networks, taking into account how reliable each one is individually (possibly for different kinds of tasks), and assess our confidence in a particular hypothesis, which is an individual activity?
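To make the fuzzy integral idea concrete, here is a minimal sketch using the Choquet integral, which is one common choice (the Sugeno integral is another). The source names, the support values, and the fuzzy measure below are illustrative assumptions, not values from the TigerPlace system.

```python
def choquet_integral(support, measure):
    """Fuse per-source confidences with a Choquet fuzzy integral.

    support: dict mapping source name -> confidence in [0, 1] for one hypothesis.
    measure: dict mapping frozenset of source names -> worth of that subset
             (a monotone fuzzy measure with measure[all sources] == 1).
    """
    # Visit sources from the most to the least supportive.
    ordered = sorted(support, key=support.get, reverse=True)
    total, prev_g = 0.0, 0.0
    chosen = set()
    for src in ordered:
        chosen.add(src)
        g = measure[frozenset(chosen)]
        # Weight each confidence by the worth gained when its source joins.
        total += support[src] * (g - prev_g)
        prev_g = g
    return total


# Hypothetical reliabilities for three sources (assumed values).
measure = {
    frozenset({"video"}): 0.5,
    frozenset({"gait"}): 0.3,
    frozenset({"bed"}): 0.4,
    frozenset({"video", "gait"}): 0.7,
    frozenset({"video", "bed"}): 0.8,
    frozenset({"gait", "bed"}): 0.6,
    frozenset({"video", "gait", "bed"}): 1.0,
}
# Hypothetical confidences that a "fall" occurred, one per source.
support = {"video": 0.9, "gait": 0.3, "bed": 0.6}
fused = choquet_integral(support, measure)
```

The fused value always lies between the smallest and the largest individual confidence, and sources with a larger measure pull the result toward their own opinion.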

Important Elderly Activities • What kind of activities to recognize? • Presently, we are deciding on an initial set to study • A few possibilities include • Total body motion • Falling down (and not being able to get up) • Someone entering and leaving their bed • Sitting and getting up from a chair • Partial body motion • Taking their medicine • Drinking

What features for the video system? • Common approach: silhouettes • A silhouette is an image-based representation of an individual with nearly all personal and distinguishing information removed • Features from silhouettes will be used to monitor an individual’s activity • These silhouettes will initially be extracted through image subtraction against a known, stationary background (cleaned up with binary morphology, reconstruction operator) • Monitoring while ensuring privacy
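The extraction step can be sketched as follows, assuming grayscale frames stored as NumPy arrays and a fixed difference threshold. The function names and the threshold are hypothetical, and the cleanup shown is a plain 3x3 morphological opening rather than the reconstruction operator.

```python
import numpy as np


def _shifted_windows(mask):
    """Yield the nine 3x3-neighbourhood shifts of a boolean mask (zero padded)."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    h, w = mask.shape
    for dy in range(3):
        for dx in range(3):
            yield padded[dy:dy + h, dx:dx + w]


def erode(mask):
    """Binary erosion with a 3x3 square structuring element."""
    out = np.ones_like(mask, dtype=bool)
    for win in _shifted_windows(mask):
        out &= win
    return out


def dilate(mask):
    """Binary dilation with a 3x3 square structuring element."""
    out = np.zeros_like(mask, dtype=bool)
    for win in _shifted_windows(mask):
        out |= win
    return out


def extract_silhouette(frame, background, threshold=30):
    """Subtract a known background, threshold, then open to remove speckle."""
    diff = np.abs(frame.astype(np.int32) - background.astype(np.int32))
    mask = diff > threshold
    return dilate(erode(mask))  # morphological opening


# Synthetic example: a 4x4 "person" plus one noisy pixel.
background = np.zeros((10, 10), dtype=np.uint8)
frame = background.copy()
frame[3:7, 3:7] = 255
frame[0, 9] = 255  # isolated noise pixel
silhouette = extract_silhouette(frame, background)
```

The opening removes the isolated pixel while leaving the 4x4 block intact, which is the kind of cleanup the binary morphology step is meant to provide.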

What the silhouettes really look like (still a very ideal setting) • Conventional morphological opening of extracted silhouette (left) • Morphological reconstruction operation on extracted silhouette (right)

Silhouette motion over time (identification of activity regions) • Consecutive silhouette subtraction (left) and after an additional erosion operation (right)

New Application? • Do not necessarily focus on the silhouettes, but rather the objects in the environment (or the co-interaction of the two) • Object or interesting landmark identification • SIFT (Scale-Invariant Feature Transform) • Interesting enough texture on everything? • Where are the cameras placed? • Too complex to apply at first? • Will it run in real time (present equation, Bob = NO) • Low-level simple image processing techniques • Have to see what the resolution and quality of the images are • Use simpler image processing techniques to recognize particular objects • How to deal with some occlusion (why co-interaction might be helpful) • Used the YUV color space to help identify skin regions that helped in dealing with occlusion for objects the individual would interact with (tracked the hands) • NLM Short-Term Fellowship (Summer 2004) • At the end of the summer, I used Bob’s SIFT implementation to identify key points from a pill bottle (used the minimum spanning tree and density measure) • Helped reduce some of the false alarms (in the pill-taking activity)

Activity Recognition • I don’t think that we have decided on the exact approach to use yet • Some form of HMM looks like as good a place as any to start • Simple • DOHMMs, COHMMs, or MDCOHMMs • HHMMs (Hierarchical) • Learning Hierarchical Hidden Markov Models for Video Structure Discovery • Entropic HMMs (Structure discovery) • Discovery and Segmentation of Activities in Video

Temporal Pattern Recognition • Hidden Markov Models (HMMs) are statistical methods (stochastic networks) that model sequential patterns that arise from a set of observation sequences which are believed to have come from the process of interest. • HMMs are known for their application in areas such as natural speech recognition, word and symbol recognition, etc. • HMMs are a doubly embedded stochastic process with an underlying process that is not observable (hidden), but can only be observed through another set of stochastic processes that produce the sequence of observations. • [Figure: HMM state trellis with states 1, 2, …, K unrolled over the observations x1, x2, x3, …, xK]

Mixture Density Continuous Observation HMM

HMM Problems • Given the observation sequence O = O1O2O3…Ot, and a model m = (A, B, π), how do we efficiently compute P(O | m)? • Given the observation sequence O and a model m, how do we choose a corresponding state sequence Q = q1q2q3…qt which is optimal in some meaningful sense? • How do we adjust the model parameters to maximize P(O | m)?
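The first two problems have standard dynamic-programming answers: the forward algorithm for P(O | m) and the Viterbi algorithm for the best state sequence. Below is a minimal sketch for a discrete-observation HMM, where `A` is the transition matrix, `B` the emission matrix, and `pi` the initial state distribution. The toy numbers are assumptions, and the sketch omits the scaling needed for long sequences. The third problem is usually handled with Baum-Welch, which is not shown here.

```python
import numpy as np


def forward_likelihood(A, B, pi, obs):
    """Problem 1: P(O | m) via the forward recursion (unscaled)."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())


def viterbi(A, B, pi, obs):
    """Problem 2: most likely state sequence, computed in the log domain."""
    T = len(obs)
    log_delta = np.log(pi) + np.log(B[:, obs[0]])
    back = np.zeros((T, A.shape[0]), dtype=int)
    for t in range(1, T):
        scores = log_delta[:, None] + np.log(A)  # scores[i, j]: move i -> j
        back[t] = scores.argmax(axis=0)
        log_delta = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(log_delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


# Toy two-state model with two observation symbols (assumed values).
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.8, 0.2], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
obs = [0, 0, 1, 1]
likelihood = forward_likelihood(A, B, pi, obs)
best_path = viterbi(A, B, pi, obs)
```

Both routines cost O(N²T) for N states and T observations, compared with the O(N^T) cost of enumerating every state sequence.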

Structure Discovery • A serious problem related to the deployment of HMMs involves how to specify or learn the HMM model structure • Matthew Brand has proposed a method based on entropy to learn an “optimal” model structure • We might look at identifying a general way to learn the model structure in a simpler fashion, independent of the HMM type, since this will be used not just in a “lab” setting • I am presently looking into using Evolutionary Computing (EC) techniques to evolve and learn the HMM structure automatically • The difference would be related to the “compression” aspect and the small number of observation samples that Brand claims is sufficient

EP Overview • [Figure: evolutionary programming loop over HMM structures. Generation t holds parents P1, P2, P3, each an HMM with states such as S1 to S4; mutation produces offspring O1, O2, O3; fitness F(Pi) and F(Oi) is evaluated; selection over {P1, P2, P3, O1, O2, O3} yields generation t+1]

Walk before we start running • Initially • Test how well the procedure works on a fully connected DOHMM when we only mutate the states (add and remove operators) • Test a few different measures of complexity (the different fitness functions) • Each chromosome in a generation acts like a seed to the next iteration’s Baum-Welch algorithm • Later • Consider a more complicated MDCOHMM model • Try to derive a series of equations and mutation operators that can take an initial population estimated by Baum-Welch and evolve what was found (I believe that this would be a completely new technique)
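As a rough illustration of the add and remove operators on a fully connected DOHMM, here is one possible sketch. The function names, the random initialization of a new state, and the `eps` scaling are assumptions made for this example. Each mutated model would then be handed to Baum-Welch as a seed, and neither that step nor the fitness function is shown.

```python
import numpy as np


def _normalize_rows(M):
    """Rescale every row so that it sums to one."""
    return M / M.sum(axis=1, keepdims=True)


def add_state(A, B, pi, rng, eps=0.1):
    """Append one state with random parameters and renormalize."""
    n = A.shape[0]
    A_new = np.zeros((n + 1, n + 1))
    A_new[:n, :n] = A
    A_new[:n, n] = eps * rng.random(n)   # small chance of entering the new state
    A_new[n, :] = rng.random(n + 1)      # outgoing transitions of the new state
    B_new = np.vstack([B, rng.random(B.shape[1])])
    pi_new = np.append(pi, eps * rng.random())
    return _normalize_rows(A_new), _normalize_rows(B_new), pi_new / pi_new.sum()


def remove_state(A, B, pi, k):
    """Delete state k and renormalize (assumes no row had all its mass on k)."""
    keep = [i for i in range(A.shape[0]) if i != k]
    A_new = _normalize_rows(A[np.ix_(keep, keep)])
    pi_new = pi[keep] / pi[keep].sum()
    return A_new, B[keep], pi_new


def mutate(A, B, pi, rng, min_states=2):
    """Randomly add or remove a single state."""
    if A.shape[0] > min_states and rng.random() < 0.5:
        return remove_state(A, B, pi, int(rng.integers(A.shape[0])))
    return add_state(A, B, pi, rng)


rng = np.random.default_rng(0)
A = np.full((3, 3), 1 / 3)   # fully connected three-state model
B = np.full((3, 4), 0.25)    # four discrete observation symbols
pi = np.full(3, 1 / 3)
A_big, B_big, pi_big = add_state(A, B, pi, rng)
A_small, B_small, pi_small = remove_state(A_big, B_big, pi_big, 1)
```

Renormalizing after each mutation keeps every offspring a valid HMM, so it can be evaluated or re-estimated without further repair.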

Matthew Brand’s Approach • The principle of maximum likelihood is not valid for small data sets; the training data is rarely enough to wash out the sampling artifacts (i.e. noise) • He also leaves out the obvious question of whether we have enough observations to estimate all the different parameters in the network (the degrees of freedom) • We may only have a small number of observations with a few “reflective” sub-observation sequences • He advocates replacing the Baum-Welch formulae with parameter estimators that minimize entropy • The claim is that this exploits the duality between learning and compression

Entropy Minimization
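As a small numerical illustration, the sketch below evaluates the entropy of a multinomial parameter vector and the log of an entropic posterior of the form ∏ θ_i^(θ_i + ω_i), where ω_i are evidence counts. This form is my reading of Brand's entropic prior, P(θ) ∝ exp(−H(θ)), and the example counts are assumptions. The closed-form MAP estimator from the paper is not reproduced here.

```python
import numpy as np


def entropy(theta):
    """Shannon entropy H(theta) in nats, treating 0 * log(0) as 0."""
    theta = np.asarray(theta, dtype=float)
    nz = theta[theta > 0]
    return float(-(nz * np.log(nz)).sum())


def log_entropic_posterior(theta, counts):
    """Unnormalized log posterior: sum_i (counts_i + theta_i) * log(theta_i).

    Requires theta_i > 0 for every i.
    """
    theta = np.asarray(theta, dtype=float)
    counts = np.asarray(counts, dtype=float)
    return float(((counts + theta) * np.log(theta)).sum())


counts = np.array([8.0, 1.0, 1.0])   # hypothetical transition counts
uniform = np.full(3, 1 / 3)
ml = counts / counts.sum()           # maximum likelihood estimate
score_uniform = log_entropic_posterior(uniform, counts)
score_ml = log_entropic_posterior(ml, counts)
```

The prior term −H(θ) is largest for sparse, low-entropy parameter vectors, so the posterior favors them more strongly than the likelihood alone does.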

First Setup • Variety of activities, from picking up the phone (a few seconds) to activities such as writing (could take up to hours) • Used a “blob” representation consisting of ellipse parameters fitting the single largest connected set of active pixels • Background subtraction through identifying a statistical model of the background and an adaptive Gaussian color/location model (pixels that have changed and others due to motion) • Cleaned up the “blob” through dilation (he makes reference to using a seed from the previous frame) • Observation vector uses high-level geometric features, calculated from the mean and eigenvectors of a 2D Gaussian fitted to the foreground pixels • 30 minutes of data taken at random • Removed frames when no one is in the video • Roughly 21 minutes after this
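The geometric features can be sketched by fitting a 2D Gaussian to the foreground pixel coordinates and reading the ellipse off its mean and eigenvectors. The helper below assumes the mask already contains only the largest connected blob (connected-component labeling is not shown), and the feature names are hypothetical rather than the exact observation vector from the paper.

```python
import numpy as np


def blob_features(mask):
    """Ellipse-style features from a boolean foreground mask (>= 2 pixels)."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    center = pts.mean(axis=0)                # (x, y) centroid
    cov = np.cov(pts, rowvar=False)          # 2x2 covariance of pixel positions
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    major = eigvecs[:, 1]                    # direction of largest spread
    return {
        "center": center,
        "axes": np.sqrt(eigvals[::-1]),      # std. dev. along major, minor axis
        "angle": float(np.arctan2(major[1], major[0])),
    }


# Synthetic blob: a wide rectangle, 5 rows by 20 columns.
mask = np.zeros((20, 40), dtype=bool)
mask[5:10, 10:30] = True
features = blob_features(mask)
```

Because the sign of an eigenvector is arbitrary, the angle is only defined up to a half turn, so a downstream model would need to treat θ and θ + π as the same orientation.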

Training • Only three sequences used for training • Varied from 100 to 1,900 frames in length • # states = {12, 16, 20, 25, and 30}

Procedure 1: Model Activity

Procedure 2: Monitoring Traffic

Monitoring Simultaneous Processes • HMMs are traditionally used to model a single hidden process • Brand modified HMMs to take a varying number of observations per time step (I don’t know if he is the first; he claims this is novel) • The new image representation is a variable-length list of flow vectors between two subsequent images • Flow vectors that are smaller than some predefined threshold are disregarded • The model learns the typical locations and directions of the moving pixels, and the dynamic changes of these patterns
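The thresholding step can be sketched as below, assuming the flow has already been computed elsewhere and is given as pixel positions plus displacement vectors. The function name, the array layout, and the threshold value are assumptions for illustration.

```python
import numpy as np


def significant_flow(points, flow, min_magnitude=1.0):
    """Keep flow vectors whose magnitude reaches the threshold.

    points: (N, 2) array of (x, y) pixel positions.
    flow:   (N, 2) array of (dx, dy) displacements between two frames.
    Returns an (M, 4) array with rows (x, y, dx, dy), where M varies per frame.
    """
    points = np.asarray(points, dtype=float)
    flow = np.asarray(flow, dtype=float)
    keep = np.hypot(flow[:, 0], flow[:, 1]) >= min_magnitude
    return np.hstack([points[keep], flow[keep]])


points = [[10, 12], [40, 8], [25, 30]]
flow = [[0.1, 0.0], [3.0, 4.0], [0.0, 2.0]]
flow_list = significant_flow(points, flow)
```

The number of rows changes from frame to frame, which is why a standard fixed-dimension HMM observation model does not apply directly.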

Internals • Brand uses a modified version of a multivariate Gaussian mixture model • He deals with multiple observations per time step by treating each frame’s flow-list as an observation sequence for a mixture model at one time step

Multi-observation-mixture+counter (MOMC) HMM • First term is a distribution on the observation count • The mixture Gaussians are 4D, observing flow vectors in (x, y, dx, dy) space • The mixture components model motion in particular directions and locations • The counter variable essentially models the combined surface area of the moving objects
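A per-state emission score of this shape can be sketched as a count term plus one Gaussian-mixture term per flow vector. In the sketch below the count distribution is a Poisson, which is purely an assumption for illustration and not necessarily the form used in the paper. All function and parameter names are hypothetical.

```python
import math

import numpy as np


def log_gaussian(x, mean, cov):
    """Log density of a multivariate Gaussian at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(mean) * math.log(2 * math.pi) + logdet + maha)


def log_mixture(x, weights, means, covs):
    """Log density of a Gaussian mixture at x (log-sum-exp for stability)."""
    logs = np.array([math.log(w) + log_gaussian(x, m, c)
                     for w, m, c in zip(weights, means, covs)])
    top = logs.max()
    return float(top + np.log(np.exp(logs - top).sum()))


def log_poisson(n, rate):
    """Log probability of observing n flow vectors (assumed count model)."""
    return n * math.log(rate) - rate - math.lgamma(n + 1)


def momc_log_emission(flow_list, rate, weights, means, covs):
    """Count term plus one mixture term per (x, y, dx, dy) flow vector."""
    flows = [np.asarray(v, dtype=float) for v in flow_list]
    total = log_poisson(len(flows), rate)
    for v in flows:
        total += log_mixture(v, weights, means, covs)
    return total


# One hypothetical state: rightward motion near pixel (20, 10).
weights = [1.0]
means = [np.array([20.0, 10.0, 3.0, 0.0])]
covs = [np.eye(4)]
score_right = momc_log_emission([[20, 10, 3, 0]], 2.0, weights, means, covs)
score_left = momc_log_emission([[20, 10, -3, 0]], 2.0, weights, means, covs)
```

Treating the vectors within a frame as independent draws from the state's mixture is what lets the list length vary from one time step to the next.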

Any Questions?

HMM Links • Hidden Markov Models (General Introductions) • http://uirvli.ai.uiuc.edu/dugad/hmm_tut.html • http://www.cse.ucsc.edu/research/compbio/html_format_papers/hughkrogh96/cabios.html • Baum-Welch algorithm and the EM (Simpler math derivation) • (Bilmes) http://citeseer.ist.psu.edu/bilmes98gentle.html • Entropic Hidden Markov Models (Matthew Brand) • Discovery and Segmentation of Activities in Video (IEEE Transactions on pattern analysis and machine intelligence, Vol 22, No. 8, Aug 2000) • Fuzzy Hidden Markov Models (Gader and Mohammed) • Generalized Hidden Markov Models – Part I: Theoretical Frameworks (IEEE Transactions on Fuzzy Systems, Vol 8, No 1, Feb 2000)
