
Presentation Transcript


  1. Unsupervised Mining of Statistical Temporal Structures in Video Hyeonsoo, Kang

  2. ▫ Introduction ▫ Structure of the algorithm: model learning algorithm [Review: HMM], feature selection algorithm ▫ Results

  3. What is “supervised learning?”

  4. What is “supervised learning?”  It is the approach in which the algorithm designers manually identify important structures, collect labeled data for training, and apply supervised learning tools to learn the classifiers.

  5. Good: works for domain-specific problems at a small scale. Bad: burden of labeling and training; cannot be readily extended to diverse new domains at a large scale.

  6. Good: works for domain-specific problems at a small scale. Bad: burden of labeling and training; cannot be readily extended to diverse new domains at a large scale. → Let’s aim at an automated method that works just as well for domain-specific problems but is also flexible & scalable!

  7. Good: works for domain-specific problems at a small scale. Bad: burden of labeling and training; cannot be readily extended to diverse new domains at a large scale. → Let’s aim at an automated method that works just as well for domain-specific problems but is also flexible & scalable! But is that possible …?

  8. Observations? A temporal sequence of nine shots, each one second apart

  9. Similar color & movements. A temporal sequence of nine shots, each one second apart

  10. Observations? A temporal sequence of nine shots, each one second apart

  11. Different color. A temporal sequence of nine shots, each one second apart

  12. Observations? A temporal sequence of nine shots, each one second apart

  13. Different camera work. A temporal sequence of nine shots, each one second apart

  14. Let’s focus on a particular domain of videos, such that: the video structure is in a discrete state space; the features, i.e., observations taken from the data, are stochastic (small statistical variations on the raw features); and the sequence is highly correlated in time.

  15. The unsupervised learning approach is chiefly twofold: (a) a model learning algorithm and (b) a feature selection algorithm

  16. (a) Model learning algorithm: using a fixed feature set manually selected based on heuristics, builds a model that performs well at distinguishing the high-level structures of the given video. (b) Feature selection algorithm: using both the model learning algorithm and the feature selection algorithm results in a model and a set of features that together distinguish the high-level structures of the given video well.

  17. (a) Model learning algorithm: using a fixed feature set manually selected based on heuristics, builds a model that performs well at distinguishing the high-level structures of the given video. (b) Feature selection algorithm: using both the model learning algorithm and the feature selection algorithm results in a model and a set of features that together distinguish the high-level structures of the given video well.

  18. (a) Model learning algorithm 1. Baseline: uses a two-level HHMM to model structures in video. 2. HHMM ::= Hierarchical Hidden Markov Model. The Hierarchical Hidden Markov Model is a statistical model derived from the Hidden Markov Model (HMM). The HHMM exploits its hierarchical structure to solve a subset of problems more efficiently, but it can be transformed into a standard HMM. Therefore the coverage of HHMM and HMM is the same, but their performance differs.

  19. (a) Model learning algorithm 1. Baseline: uses a two-level HHMM to model structures in video. 2. HHMM ::= Hierarchical Hidden Markov Model. The Hierarchical Hidden Markov Model is a statistical model derived from the Hidden Markov Model (HMM). The HHMM exploits its hierarchical structure to solve a subset of problems more efficiently, but it can be transformed into a standard HMM. Therefore the coverage of HHMM and HMM is the same, but their performance differs. Wait, what is HMM then?

  20. [Quick Review: HMM] • Consider a simple 3-state Markov model of the weather. We assume that once a day (e.g., at noon) the weather is observed as being one of the following: • (S1) State 1: rain (or snow) • (S2) State 2: cloudy • (S3) State 3: sunny • We postulate that the weather on day t is characterized by a single one of the three states above, and that the matrix A of state transition probabilities is A = {aij} = [0.4 0.3 0.3; 0.2 0.6 0.2; 0.1 0.1 0.8] (rows and columns ordered rain, cloudy, sunny). • Given that the weather on day 1 (t = 1) is sunny (state 3), we can ask the question: what is the probability (according to the model) that the weather for the next 7 days will be “sunny-sunny-rain-rain-sunny-cloudy-sunny”?

  21. [Quick Review: HMM] • Stated more formally, we define the observation sequence O as • O = {S3, S3, S3, S1, S1, S3, S2, S3} • (“sunny-sunny-rain-rain-sunny-cloudy-sunny” for days 2–8) • corresponding to t = 1, 2, …, 8, and we wish to determine the probability of O, given the model. • This probability can be expressed (and evaluated) as • P(O|Model) • = P[S3, S3, S3, S1, S1, S3, S2, S3 | Model]

  22. [Quick Review: HMM] • Stated more formally, we define the observation sequence O as • O = {S3, S3, S3, S1, S1, S3, S2, S3} • (“sunny-sunny-rain-rain-sunny-cloudy-sunny” for days 2–8) • corresponding to t = 1, 2, …, 8, and we wish to determine the probability of O, given the model. • This probability can be expressed (and evaluated) as • P(O|Model) • = P[S3, S3, S3, S1, S1, S3, S2, S3 | Model] • = P[S3] · P[S3|S3] · P[S3|S3] · P[S1|S3] · P[S1|S1] · P[S3|S1] · P[S2|S3] · P[S3|S2]

  23. [Quick Review: HMM] • Stated more formally, we define the observation sequence O as • O = {S3, S3, S3, S1, S1, S3, S2, S3} • (“sunny-sunny-rain-rain-sunny-cloudy-sunny” for days 2–8) • corresponding to t = 1, 2, …, 8, and we wish to determine the probability of O, given the model. • This probability can be expressed (and evaluated) as • P(O|Model) • = P[S3, S3, S3, S1, S1, S3, S2, S3 | Model] • = P[S3] · P[S3|S3] · P[S3|S3] · P[S1|S3] · P[S1|S1] · P[S3|S1] · P[S2|S3] · P[S3|S2] • = π3 · a33 · a33 · a31 · a11 · a13 · a32 · a23 (with A = {aij} the transition matrix from slide 20)

  24. [Quick Review: HMM] • Stated more formally, we define the observation sequence O as • O = {S3, S3, S3, S1, S1, S3, S2, S3} • (“sunny-sunny-rain-rain-sunny-cloudy-sunny” for days 2–8) • corresponding to t = 1, 2, …, 8, and we wish to determine the probability of O, given the model. • This probability can be expressed (and evaluated) as • P(O|Model) • = P[S3, S3, S3, S1, S1, S3, S2, S3 | Model] • = P[S3] · P[S3|S3] · P[S3|S3] · P[S1|S3] · P[S1|S1] · P[S3|S1] · P[S2|S3] · P[S3|S2] • = π3 · a33 · a33 · a31 · a11 · a13 · a32 · a23 • = 1 · (0.8)(0.8)(0.1)(0.4)(0.3)(0.1)(0.2)

  25. [Quick Review: HMM] • Stated more formally, we define the observation sequence O as • O = {S3, S3, S3, S1, S1, S3, S2, S3} • (“sunny-sunny-rain-rain-sunny-cloudy-sunny” for days 2–8) • corresponding to t = 1, 2, …, 8, and we wish to determine the probability of O, given the model. • This probability can be expressed (and evaluated) as • P(O|Model) • = P[S3, S3, S3, S1, S1, S3, S2, S3 | Model] • = P[S3] · P[S3|S3] · P[S3|S3] · P[S1|S3] · P[S1|S1] · P[S3|S1] · P[S2|S3] · P[S3|S2] • = π3 · a33 · a33 · a31 · a11 · a13 · a32 · a23 • = 1 · (0.8)(0.8)(0.1)(0.4)(0.3)(0.1)(0.2) • = 1.536 × 10⁻⁴ • where we use the notation πi = P[q1 = Si], 1 <= i <= N, to denote the initial state probabilities. • (This is an observable Markov model: each state corresponds directly to an observable event.)
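The arithmetic above can be checked with a few lines of Python. This is a minimal sketch of the observable Markov chain calculation; the transition matrix is the one from slide 20, with zero-based state indices (0 = rain, 1 = cloudy, 2 = sunny):

```python
import numpy as np

# Transition matrix A from slide 20: a_ij = P[state j at t+1 | state i at t],
# states ordered (rain, cloudy, sunny).
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

# Day 1 is known to be sunny, so the initial distribution puts all mass on state 2.
pi = np.array([0.0, 0.0, 1.0])

# O = {S3, S3, S3, S1, S1, S3, S2, S3} in zero-based indices
O = [2, 2, 2, 0, 0, 2, 1, 2]

p = pi[O[0]]
for prev, curr in zip(O[:-1], O[1:]):
    p *= A[prev, curr]          # multiply one transition probability per step

print(p)   # ≈ 1.536e-04, matching the hand calculation above
```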

  26. [Quick Review: HMM] • A Hidden Markov Model is not too different from the observable MM, except that each state no longer corresponds to an observable (physical) event. • For example, assume the following scenario: you are in a room with a curtain through which you cannot see what is happening. On the other side of the curtain, another person is performing a coin (or multiple-coin) tossing experiment. The other person will not tell you anything about what exactly he is doing; he will only tell you the result of each coin flip. • An HMM is characterized by the following: • 1) N, the number of states in the model • 2) M, the number of distinct observation symbols per state • 3) The state transition probability distribution A = {aij} • 4) The observation symbol probability distribution in state j, B = {bj(k)}, where bj(k) = P[vk at t | qt = Sj], 1 <= j <= N, 1 <= k <= M • 5) The initial state distribution π = {πi}, where πi = P[q1 = Si], 1 <= i <= N.

  27. [Quick Review: HMM] An HMM requires specification of two model parameters (N and M), specification of the observation symbols, and specification of the three probability measures A, B, and π. Since N and M are implicit in the other variables, we can use the compact notation λ = (A, B, π).
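When the states are hidden, P(O | λ) can no longer be read off a single state path; it is a sum over all paths, which the forward algorithm computes efficiently. Below is a minimal sketch under the compact notation λ = (A, B, π); the emission matrix B and the toy observation sequence are made-up values for illustration, not taken from the slides:

```python
import numpy as np

def forward_prob(A, B, pi, obs):
    """P(O | lambda) via the forward algorithm.
    A: (N, N) transition matrix, B: (N, M) emission matrix,
    pi: (N,) initial state distribution, obs: list of symbol indices."""
    alpha = pi * B[:, obs[0]]           # initialization: alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # induction: sum over predecessors, then emit
    return alpha.sum()                  # termination: sum over final states

# Illustrative 3-state, 2-symbol model (placeholder values, not from the paper)
A  = np.array([[0.4, 0.3, 0.3], [0.2, 0.6, 0.2], [0.1, 0.1, 0.8]])
B  = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
pi = np.array([1/3, 1/3, 1/3])
print(forward_prob(A, B, pi, [1, 1, 0, 1]))
```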

  28. (a) Model learning algorithm 1. Baseline: uses HHMM. 2. HHMM ::= Hierarchical Hidden Markov Model. The Hierarchical Hidden Markov Model is a statistical model derived from the Hidden Markov Model (HMM). The HHMM exploits its hierarchical structure to solve a subset of problems more efficiently, but it can be transformed into a standard HMM. Therefore the coverage of HHMM and HMM is the same, but their performance differs. Wait, what is HMM then? → Now that we have reviewed the HMM, to build an HHMM model we need to estimate its parameters, as we did for the HMM.

  29. (a) Model learning algorithm • To build an HHMM model we need to estimate its parameters, as we did for the HMM. • We model the recurring events in each video as HMMs, and the higher-level transitions between these events as another level of Markov chain. • In this two-level HHMM, the lower-level states represent variations that can occur within the same event (the observations, i.e., measurements taken from the raw video, are modeled with mixtures of Gaussians). • The higher-level structure elements usually correspond to semantic events.

  30. An example of HHMM

  31. An example of HHMM: the higher-level states are rain, sunny, and cloudy, and the lower-level nodes represent variations within each state …
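To make the two-level idea concrete, here is a small generative sketch in Python: a top-level Markov chain over events, and per-event lower-level states that emit Gaussian observations. All probabilities and means are illustrative assumptions, and the control-return mechanism of a real HHMM (where the child chain reaching its exit state triggers the parent transition) is simplified to an occasional random switch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-level HHMM: two top-level events (e.g. "play" / "break"),
# each with two lower-level variation states emitting 1-D Gaussian observations.
# All numbers below are illustrative, not taken from the paper.
top_A = np.array([[0.9, 0.1],
                  [0.2, 0.8]])                      # event-to-event transitions
sub_A = [np.array([[0.7, 0.3], [0.3, 0.7]]),        # variations within event 0
         np.array([[0.6, 0.4], [0.4, 0.6]])]        # variations within event 1
means = [np.array([0.0, 1.0]),                      # emission means, event 0
         np.array([4.0, 5.0])]                      # emission means, event 1

def sample(T):
    event, sub, out = 0, 0, []
    for _ in range(T):
        out.append(rng.normal(means[event][sub], 0.3))   # Gaussian emission
        if rng.random() < 0.05:                          # simplified "exit":
            event = rng.choice(2, p=top_A[event])        # parent picks next event
            sub = 0
        else:
            sub = rng.choice(2, p=sub_A[event][sub])     # child-level transition
    return np.array(out)

print(sample(10))
```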

  32. (a) Model learning algorithm • 3. To estimate parameters we use • the Expectation Maximization (EM) algorithm • Bayesian learning techniques • Reverse-Jump Markov Chain Monte Carlo (RJ-MCMC) • the Bayesian Information Criterion (BIC)

  33. (a) Model learning algorithm • 3. To estimate parameters we use • the Expectation Maximization (EM) algorithm • Bayesian learning techniques • Reverse-Jump Markov Chain Monte Carlo (RJ-MCMC) • the Bayesian Information Criterion (BIC) • Model parameters are updated using EM. • Model structure learning uses MCMC; parameter learning for the HHMM using EM is known to converge only to a local maximum of the data likelihood, since EM is a hill-climbing algorithm. But searching for the global maximum of the likelihood landscape is intractable → we adopt a randomized search.

  34. (a) Model learning algorithm • 3. To estimate parameters we use • the Expectation Maximization (EM) algorithm • Bayesian learning techniques • Reverse-Jump Markov Chain Monte Carlo (RJ-MCMC) • the Bayesian Information Criterion (BIC) • Model parameters are updated using EM. • Model structure learning uses MCMC; parameter learning for the HHMM using EM is known to converge only to a local maximum of the data likelihood, since EM is a hill-climbing algorithm. But searching for the global maximum of the likelihood landscape is intractable → we adopt a randomized search. • However, I will not go through these one by one; if you are interested, you can find the details in the paper by Xie, Lexing, et al. [1].
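As a rough illustration of how BIC trades data fit against model complexity when comparing candidate model structures, here is a tiny sketch. The sign convention (higher is better) and the optional weight on the penalty term are common choices rather than the paper's exact formulation, and the numbers are made up:

```python
import numpy as np

def bic_score(log_likelihood, num_params, num_samples, weight=1.0):
    """BIC with the convention 'higher is better':
    log P(X | model) minus a penalty that grows with model size."""
    return log_likelihood - 0.5 * weight * num_params * np.log(num_samples)

# Hypothetical comparison of a small vs. a large HHMM on the same data
print(bic_score(log_likelihood=-1250.0, num_params=40, num_samples=3000))  # ≈ -1410
print(bic_score(log_likelihood=-1230.0, num_params=90, num_samples=3000))  # ≈ -1590
# The larger model fits slightly better but pays a much larger complexity
# penalty, so the smaller structure wins under BIC.
```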

  35. (a) Model learning algorithm: using a fixed feature set manually selected based on heuristics, builds a model that performs well at distinguishing the high-level structures of the given video. (b) Feature selection algorithm: using both the model learning algorithm and the feature selection algorithm results in a model and a set of features that together distinguish the high-level structures of the given video well.

  36. (a) Model learning algorithm: using a fixed feature set manually selected based on heuristics, builds a model that performs well at distinguishing the high-level structures of the given video. (b) Feature selection algorithm: using both the model learning algorithm and the feature selection algorithm results in a model and a set of features that together distinguish the high-level structures of the given video well.

  37. Into what aspects can feature selection be divided, and why?

  38. Into what aspects can feature selection be divided, and why? • Feature selection is divided into two aspects: • Eliminating irrelevant features – irrelevant features usually disturb the classifier and degrade classification accuracy. • Eliminating redundant features – redundant features add to the computational cost without bringing in new information.

  39. (b) Feature selection algorithm • 1. We use filter-wrapper methods: the wrapper step corresponds to eliminating irrelevant features, and the filter step corresponds to eliminating redundant ones. • Wrapper step – partitions the feature pool into consistent groups • Filter step – eliminates redundant dimensions • 2. For example, the features include Dominant Color Ratio (DCR), Motion Intensity (MI), the least-squares estimates of camera translation (MX, MY), and five audio features – Volume, Spectral roll-off (SR), Low-band energy (LE), High-band energy (HE), and Zero-crossing rate (ZCR). (A sketch of the wrapper step's relevance test is shown below.)
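A minimal sketch of the wrapper step's relevance test referenced above: quantize a candidate feature and measure its information gain about the Viterbi state sequence obtained from the reference feature set. The quantile binning, the helper names, and the toy data are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(state_seq, feature, bins=4):
    """Information gain of a quantized candidate feature about the
    reference state labels: H(S) - H(S | F)."""
    edges = np.quantile(feature, np.linspace(0, 1, bins + 1)[1:-1])
    f = np.digitize(feature, edges)                 # quantize the raw feature
    s = np.asarray(state_seq)
    h_cond = 0.0
    for v in np.unique(f):
        mask = f == v
        h_cond += mask.mean() * entropy(s[mask])    # weighted conditional entropy
    return entropy(s) - h_cond

# Toy example: a feature that tracks the state sequence has high gain
states  = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1]
feature = np.array([0.1, 0.2, 0.1, 0.9, 0.8, 0.85, 0.15, 0.2, 0.9, 0.95])
print(information_gain(states, feature))   # close to H(S) = 1 bit
```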

  40. (b) Feature selection algorithm 3. Algorithm structure. The big picture: HHMM learning → Viterbi state sequence → information gain → Markov blanket filtering → BIC fitness.

  41. (b) Feature selection algorithm 3. Algorithm structure. The big picture: HHMM learning → Viterbi state sequence → information gain → Markov blanket filtering → BIC fitness. In detail: [detailed flow diagram of the feature selection loop]. (A sketch of the filter step follows below.)
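The filter step referenced above can be sketched in a similar spirit with conditional mutual information: a feature is redundant if, given some other feature (its approximate Markov blanket), it carries essentially no additional information about the state labels. This is a simplified stand-in for the paper's Markov blanket filtering, operating on already-discretized toy sequences:

```python
import numpy as np

def entropy(x):
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def cond_mutual_info(x, s, z):
    """I(X; S | Z) for discrete sequences: how much feature X still says about
    the state labels S once feature Z is known. A value near zero means X is
    redundant given Z, i.e. Z acts as an (approximate) Markov blanket for X."""
    total = 0.0
    for v in np.unique(z):
        m = z == v
        pair = x[m] * (s.max() + 1) + s[m]     # encode joint (x, s) symbols
        total += m.mean() * (entropy(x[m]) + entropy(s[m]) - entropy(pair))
    return total

# Toy check: x2 is a copy of x1, so it adds nothing about s once x1 is known
s  = np.array([0, 0, 1, 1, 0, 1, 1, 0])
x1 = np.array([0, 0, 1, 1, 0, 1, 1, 0])   # informative feature
x2 = x1.copy()                             # redundant duplicate
print(cond_mutual_info(x2, s, x1))         # ~0.0  ->  x2 can be dropped
```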

  42. Experiments and Results For soccer videos, the main evaluation focused on distinguishing the two semantic events, play and break. (a) Model learning algorithm

  43. Experiments and Results For soccer videos, the main evaluation focused on distinguishing the two semantic events, play and break. (a) Model learning algorithm We use a fixed set of features manually selected based on heuristics (dominant color ratio and motion intensity) (Xu et al., 2001; Xie et al., 2002b).

  44. Experiments and Results For soccer videos, the main evaluation focused on distinguishing the two semantic events, play and break. (a) Model learning algorithm • We use a fixed set of features manually selected based on heuristics (dominant color ratio and motion intensity) (Xu et al., 2001; Xie et al., 2002b). • We built four different learning schemes and evaluated them against the ground truth: • Supervised HMM • Supervised HHMM • Unsupervised HHMM without model adaptation • Unsupervised HHMM with model adaptation
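One practical detail: the unsupervised HHMM produces anonymous state labels rather than "play"/"break" directly, so evaluation against the ground truth first needs a mapping from learned clusters to semantic labels. A hedged sketch of one simple way to do this for the two-class case (the function name and toy data are hypothetical):

```python
import numpy as np
from itertools import permutations

def best_mapped_accuracy(pred_clusters, true_labels):
    """Accuracy of unsupervised segment labels against ground truth: since
    cluster ids are arbitrary, try every cluster -> label assignment and
    report the best agreement (fine for the two-class play/break case)."""
    pred = np.asarray(pred_clusters)
    true = np.asarray(true_labels)
    best = 0.0
    for perm in permutations(np.unique(true)):
        mapped = np.array([perm[c] for c in pred])   # relabel clusters
        best = max(best, float((mapped == true).mean()))
    return best

# Toy example (values are illustrative, not the paper's experimental data)
pred = [0, 0, 1, 1, 1, 0, 0, 1]
true = [1, 1, 0, 0, 0, 1, 0, 0]
print(best_mapped_accuracy(pred, true))   # 0.875
```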

  45. Experiments and Results

  46. Experiments and Results For soccer videos, the main evaluation focused on distinguishing the two semantic events, play and break. (b) Feature selection algorithm • Based on the good performance of the model parameter and structure learning algorithm, we test the performance of the automatic feature selection method that iteratively wraps around and filters. • A 9-dimensional feature vector sampled every 0.1 seconds, including: • Dominant Color Ratio (DCR), Motion Intensity (MI), the least-squares estimates of camera translation (MX, MY), and five audio features – Volume, Spectral roll-off (SR), Low-band energy (LE), High-band energy (HE), and Zero-crossing rate (ZCR)

  47. Experiments and Results • Evaluation against the play/break labels showed 74.8% accuracy. • For the clip Spain, the final selected feature set was {DCR, Volume}, with 74.8% accuracy. • For the clip Korea, the final selected feature set was {DCR, MX}, with 74.5% accuracy. • [Testing on the baseball video] • Yielded three consistent compact feature groups: {HE, LE, ZCR}, {DCR, MX}, {Volume, SR}. • The resulting segments have consistent perceptual properties: one cluster of segments mostly corresponds to pitching shots and other field shots when the game is in play, while the other cluster contains most of the cutaway shots, scoreboards, and game breaks.

  48. Summary With a specific domain of videos (sports; soccer and baseball), our unsupervised learning method can perform well. Our method was chiefly twofold: a model learning algorithm and a feature selection algorithm. In the model learning algorithm, we used the HHMM as the basic model and used other techniques such as the Expectation Maximization (EM) algorithm, Bayesian learning techniques, Reverse-Jump Markov Chain Monte Carlo (RJ-MCMC), and the Bayesian Information Criterion (BIC) to set the parameters for the model. In the feature selection algorithm, together with a model of good performance, we used filter-wrapper methods to eliminate irrelevant and redundant features.

  49. Questions 1. What is supervised learning? 2. What is the benefit of using unsupervised learning? 3. Into what aspects can feature selection be divided, and why?

  50. Questions 1. What is supervised learning? → The algorithm designers manually identify important structures, collect labeled data for training, and apply supervised learning tools to learn the classifiers. 2. What is the benefit of using unsupervised learning? → (A) It alleviates the burden of labeling and training. (B) It also provides a scalable solution for generalizing video indexing techniques. 3. Into what aspects can feature selection be divided, and why? → Feature selection is divided into two aspects: (1) eliminating irrelevant features – irrelevant features usually disturb the classifier and degrade classification accuracy; (2) eliminating redundant ones – redundant features add to computational cost without bringing in new information.
