
Human Detection under Partial Occlusions using Markov Logic Networks






Presentation Transcript


  1. Human Detection under Partial Occlusions using Markov Logic Networks Raghuraman Gopalan and William Schwartz Center for Automation Research University of Maryland, College Park

  2. Human Detection

  3. Human Detection • Holistic window-based: • Dalal and Triggs CVPR (2005) • Tuzel et al CVPR (2007) • Part-based: • Wu and Nevatia ICCV (2005) • Mikolajczyk et al ECCV (2004) • Scene-related cues: • Torralba et al IJCV (2006)

  4. The occlusion challenge • Body parts occluded by objects • Person occluded by another person [Figure: example detection windows with human-presence probabilities 0.1452, 0.1272, 0.0059*, 0.0816] * Probability of presence of a human obtained from Schwartz et al ICCV (2009)

  5. Related work Integrating probability of human parts using first-order logic (FOL): Schwartz et al ICB (2009) Bilattice-based logical reasoning: Shet et al CVPR (2007)

  6. Our approach: Motivation A data-driven, part-based method • Probabilistic logical inference using Markov logic networks (MLN) [Domingos et al, Machine Learning (2006)] • Representing 'semantic context' between the detection probabilities of parts: • within-window and between-windows • with and without occlusions

  7. Our approach: An overview • Pipeline: multiple detection windows → part detectors' outputs + face detector outputs → learning contextual rules → instantiation of the MLN → inference → final result • Queries: person(d1)? occluded(d1)? occludedby(d1,d2)?

  8. Main questions • How to integrate the detectors' outputs to detect people under occlusion? • Enforce consistency according to the spatial locations of the detectors → removal of false alarms. • Exploit relations between persons to solve inconsistencies → explain occlusions. • Both using MLN, which combines FOL and graphical models in a single representation → avoids contradictions.

  9. Our approach: An overview • Pipeline: multiple detection windows → part detectors' outputs + face detector outputs → learning contextual rules → instantiation of the MLN → inference → final result • Queries: person(d1)? occluded(d1)? occludedby(d1,d2)?

  10. Part-based detectors • To handle human detection under occlusion, our original detector is split into parts, and the MLN is used to integrate their outputs. • Parts: top, top-torso, torso, torso-legs, legs, top-legs (plus the original full-body detector)
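The split of a full-body window into part sub-windows can be sketched as simple rectangle geometry. A minimal illustration, assuming a 64×128 detection window; the height fractions below are made up for illustration, since the slide does not give the authors' exact part extents:

```python
# Hypothetical sketch: splitting a full-body detection window into the
# six part regions named on the slide. Fractions are illustrative
# assumptions, not the authors' values.

def part_windows(x, y, w, h):
    """Return {part_name: (x, y, w, h)} sub-windows of a full-body window."""
    return {
        "top":        (x, y,                w, int(0.4 * h)),
        "top-torso":  (x, y,                w, int(0.6 * h)),
        "torso":      (x, y + int(0.3 * h), w, int(0.4 * h)),
        "torso-legs": (x, y + int(0.3 * h), w, int(0.7 * h)),
        "legs":       (x, y + int(0.6 * h), w, int(0.4 * h)),
        "top-legs":   (x, y,                w, h),  # full window
    }

parts = part_windows(0, 0, 64, 128)
```

Each part detector is then trained and evaluated only on its sub-window, so an occluder that hides the legs leaves the top and torso responses intact.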

  11. Detector – An overview • Exploit more representative features to provide a richer set of descriptors and improve detection results – edges, textures, and color. • Consequences of the feature augmentation: • an extremely high-dimensional feature space (>170,000) • the number of samples in the training dataset is smaller than the dimensionality • These characteristics prevent the use of classical machine learning techniques such as SVM, but make an ideal setting for Partial Least Squares (PLS)*. * H. Wold, Partial Least Squares, Encyclopedia of Statistical Sciences, 6:581-591 (1985)

  12. Detection using PLS • PLS models relations between predictor variables in a matrix X (n × p) and response variables in a vector y (n × 1), where n denotes the number of samples and p the number of features: X = T P^T + E, y = U q^T + f. • T and U are (n × h) matrices of the h extracted latent vectors, P (p × h) and q (1 × h) are the loading matrices, and E (n × p) and f (n × 1) are the residuals of X and y, respectively. • The PLS method NIPALS (nonlinear iterative partial least squares) finds the set of weight vectors W (p × h) = {w1, w2, …, wh} such that [cov(t_i, u_i)]^2 = max_{|w_i|=1} [cov(X w_i, y)]^2, i.e., each weight vector maximizes the sample covariance between the X-scores and the response.
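The NIPALS iteration just described can be sketched in a few lines of NumPy. This is a hedged illustration of the univariate-response (PLS1) case with deflation, not the authors' implementation:

```python
import numpy as np

def nipals_pls(X, y, h):
    """PLS1 NIPALS sketch: extract h latent vectors.

    Returns W (p x h) weight vectors and T (n x h) score vectors."""
    X, y = X.astype(float).copy(), y.astype(float).copy()
    n, p = X.shape
    W, T = np.zeros((p, h)), np.zeros((n, h))
    for i in range(h):
        w = X.T @ y
        w /= np.linalg.norm(w)         # weight maximizing cov(Xw, y)
        t = X @ w                      # latent score vector
        W[:, i], T[:, i] = w, t
        p_load = X.T @ t / (t @ t)     # X loading
        q = y @ t / (t @ t)            # y loading
        X -= np.outer(t, p_load)       # deflate X
        y -= q * t                     # deflate y
    return W, T
```

The deflation step makes successive score vectors t_i mutually orthogonal, which is what allows a low-dimensional regression in the latent space despite p >> n.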

  13. Our approach: An overview • Pipeline: multiple detection windows → part detectors' outputs + face detector outputs → learning contextual rules → instantiation of the MLN → inference → final result • Queries: person(d1)? occluded(d1)? occludedby(d1,d2)?

  14. Context: Consistency between the detector outputs • Each detector acts in a specific region of the body. One can look at the outputs of detectors acting in the same spatial location to check for consistency: similar responses are expected. • Example: given that the top-torso detector outputs a high probability, the top and torso detectors need to output high probabilities as well, since they intersect the region covered by top-torso. • First-order logic rules: topTorso(d1) ^ top(d1) ^ torso(d1) → person(d1) (consistent); topTorso(d1) ^ (¬top(d1) v ¬torso(d1)) → ¬person(d1) (false alarm)
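Thresholding the part probabilities into boolean predicates, the consistency rule can be sketched as follows. The 0.5 threshold and the return labels are illustrative assumptions, not values from the paper:

```python
# Minimal sketch of the within-window consistency rule, with detector
# probabilities thresholded into boolean predicates. Threshold and
# labels are illustrative assumptions.

def consistent_person(probs, thresh=0.5):
    """Evaluate: topTorso(d) ^ top(d) ^ torso(d) -> person(d),
    and flag a false alarm when intersecting detectors disagree."""
    top_torso = probs["top-torso"] >= thresh
    top = probs["top"] >= thresh
    torso = probs["torso"] >= thresh
    if top_torso and top and torso:
        return "person"          # consistent responses
    if top_torso and (not top or not torso):
        return "false alarm"     # overlapping detectors disagree
    return "undecided"           # rule does not fire
```

In the actual system this hard thresholding is not needed: the MLN attaches weights to the rules and reasons directly over the soft evidence.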

  15. Context: Understanding the relationship between different windows • A low response given by a detector might be caused by a second detection window (a person may be occluding another, causing low responses of the detectors). • First-order logic rule: intersect(d1,d2) ^ person(d1) ^ matching(d1,d2) → person(d2) ^ occluded(d2) ^ occludedby(d2,d1) • matching(d1,d2) is true if: detectors at visible parts of d2 have high responses; detectors at occluded parts of d2 have low responses while detectors located at the corresponding positions of d1 have high responses; d1 and d2 are persons; and d1 and d2 intersect.
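A minimal sketch of the intersect and matching tests, assuming axis-aligned windows given as (x, y, w, h) tuples and a 0.5 probability threshold; the part names and threshold are illustrative assumptions:

```python
# Hedged sketch of the between-window occlusion rule: d2 is explained as
# occluded by d1 if the windows intersect, d2's visible parts respond
# strongly, and its occluded parts respond weakly while the overlapping
# parts of d1 respond strongly. Names and thresholds are illustrative.

def intersect(w1, w2):
    """True if axis-aligned windows (x, y, w, h) overlap."""
    x1, y1, a1, b1 = w1
    x2, y2, a2, b2 = w2
    return x1 < x2 + a2 and x2 < x1 + a1 and y1 < y2 + b2 and y2 < y1 + b1

def matching(d1_probs, d2_probs, occluded_parts, thresh=0.5):
    """True if d2's part responses fit the 'occluded by d1' pattern."""
    visible = [p for p in d2_probs if p not in occluded_parts]
    return (all(d2_probs[p] >= thresh for p in visible) and
            all(d2_probs[p] < thresh and d1_probs[p] >= thresh
                for p in occluded_parts))

def occluded_by(win1, win2, d1_probs, d2_probs, occluded_parts):
    """Antecedent of the rule: intersect(d1,d2) ^ matching(d1,d2)."""
    return intersect(win1, win2) and matching(d1_probs, d2_probs, occluded_parts)
```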

  16. Our approach: An overview • Pipeline: multiple detection windows → part detectors' outputs + face detector outputs → learning contextual rules → instantiation of the MLN → inference → final result • Queries: person(d1)? occluded(d1)? occludedby(d1,d2)?

  17. 3. Inference using MLN* - The basic idea • A logical knowledge base (KB) is a set of hard constraints (Fi) on the set of possible worlds. • Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible. • Give each formula a weight wi (higher weight → stronger constraint). * Contents of the next three slides are partially adapted from the Markov Logic Networks tutorial by Domingos et al, ICML (2007)

  18. MLN – At a Glance • Logical language: First-order logic • Probabilistic language: Markov networks • Syntax: First-order formulas with weights • Semantics: Templates for Markov net features • Learning: • Parameters: Generative or discriminative • Structure: ILP with arbitrary clauses and MAP score • Inference: • MAP: Weighted satisfiability • Marginal: MCMC with moves proposed by SAT solver • Partial grounding + Lazy inference / Lifted inference

  19. MLN- Definition • A Markov Logic Network (MLN) is a set of pairs (Fi, wi) where • Fi is a formula in first-order logic • wi is a real number

  20. Example: Humans & Occlusions

  21. Example: Humans & Occlusions Two constants: Detection window 1 (D1) and Detection window 2 (D2) D1 D2

  22. Example: Humans & Occlusions Two constants: Detection window 1 (D1) and Detection window 2 (D2) One node for each grounding of each predicate in the MLN Human(D1) Human(D2) Parts(D1) Parts(D2)

  23. Example: Humans & Occlusions Two constants: Detection window 1 (D1) and Detection window 2 (D2) Occlusion(D1,D2) Occlusion(D1,D1) Human(D1) Human(D2) Occlusion(D2,D2) Parts(D1) Parts(D2) Occlusion(D2,D1)

  24. Example: Humans & Occlusions Two constants: Detection window 1 (D1) and Detection window 2 (D2) Occlusion(D1,D2) Occlusion(D1,D1) Human(D1) Human(D2) Occlusion(D2,D2) Parts(D1) Parts(D2) Occlusion(D2,D1) One feature for each grounding of each formula Fi in the MLN, with the corresponding weight wi

  25. Example: Humans & Occlusions Two constants: Detection window 1 (D1) and Detection window 2 (D2) Occlusion(D1,D2) Occlusion(D1,D1) Human(D1) Human(D2) Occlusion(D2,D2) Parts(D1) Parts(D2) Occlusion(D2,D1)

  26. Example: Humans & Occlusions Two constants: Detection window 1 (D1) and Detection window 2 (D2) Occlusion(D1,D2) Occlusion(D1,D1) Human(D1) Human(D2) Occlusion(D2,D2) Parts(D1) Parts(D2) Occlusion(D2,D1)

  27. Instantiation • An MLN is a template for ground Markov networks. • Probability of a world x: P(X = x) = (1/Z) exp( Σ_i w_i n_i(x) ), where n_i(x) is the number of true groundings of formula Fi in x, w_i is the weight of formula Fi, and Z is the normalizing partition function. • Learning of weights and inference are performed using the open-source Alchemy system [Domingos et al (2006)]
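On a toy ground network, the distribution P(X = x) = exp(Σ_i w_i n_i(x)) / Z can be computed by brute-force enumeration of worlds. The formula Parts(d) ⇒ Human(d) and its weight are illustrative, and this sketch is not the Alchemy system itself:

```python
# Toy ground-MLN distribution for one weighted formula, Parts(d) => Human(d)
# with weight 1.5, over two detection-window constants D1, D2. A world is a
# truth assignment to the four ground atoms. All names/weights illustrative.

from itertools import product
from math import exp

W = 1.5
ATOMS = ["Parts(D1)", "Human(D1)", "Parts(D2)", "Human(D2)"]

def n_true(world):
    """n_i(x): number of true groundings of Parts(d) => Human(d) in world x."""
    return sum(1 for d in ("D1", "D2")
               if (not world[f"Parts({d})"]) or world[f"Human({d})"])

# Enumerate all 2^4 worlds and compute the partition function Z.
worlds = [dict(zip(ATOMS, vals)) for vals in product([False, True], repeat=4)]
Z = sum(exp(W * n_true(w)) for w in worlds)

def prob(world):
    """P(X = x) = exp(W * n_true(x)) / Z."""
    return exp(W * n_true(world)) / Z
```

Note that a world violating one grounding is exp(W) times less probable than one satisfying both, not impossible: exactly the "soft constraint" idea from slide 17.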

  28. Our approach: An overview • Pipeline: multiple detection windows → part detectors' outputs + face detector outputs → learning contextual rules → instantiation of the MLN → inference → final result • Queries: person(d1)? occluded(d1)? occludedby(d1,d2)?

  29. Results

  30. Results

  31. Results

  32. Comparisons • Dataset details: • 200 images • 5 to 15 humans per image • Occluded humans ~ 35%

  33. Comparisons

  34. Conclusions • A data-driven approach to detect humans under occlusions • Modeling semantic context of detector probabilities across spatial locations • Probabilistic contextual inference using Markov logic networks • Question of interest: Integrating analytical models for occlusions and context with this data-driven method

  35. Questions?
