WP 4: Context Aware Video Acquisition • James L. Crowley • Professor, I.N.P. Grenoble • Project PRIMA, Laboratoire GRAVIR • INRIA Rhône Alpes • Plan • Situations and Actions for the Lecture Scenario • Assembling a Process Federation
WP 4: Context Aware Video Acquisition • James L. Crowley • Professor, I.N.P. Grenoble • Project PRIMA, Laboratoire GRAVIR • INRIA Rhône Alpes
WP 4: Context Aware Video Acquisition • Plan • Situations and Actions for the Lecture Scenario • Assembling a Process Federation • Version 1.0 of the Intelligent Camera Man • Current Work
FAME Augmented Meeting Environment • 5 Sony Steerable Cameras • Wide Angle Camera • Microphone Array • 3 Video Interaction Devices: • Vertical • Horizontal • Steerable
WP 4: Context Aware Video Acquisition • Plan • Situations and Actions for the Lecture Scenario • Process Federation for CA Video Acquisition v1.0 • Task, Situation and Context • Compiling the Situation Graph • Assembling a Process Federation • Version 1.0 of the Intelligent Camera Man • Current Work
Camera Control Federation • [Figure: Supervisory Process, Event Bus, three Agent Trackers (each with a camera input, all labelled Camera 1), Speech Detection, Speech Location, Microphone Array]
Task, Situation and Context • Situation: A configuration of roles and relations. • Role: Interpretation of an entity or agent • Relation: A predicate over entities and agents • A task model describes the state space of situations and the actions of the system for each situation • Approach: Compile a federation of processes to observe the agents and entities that define situations.
Task, Situation and Context • Basic Concepts: • Property: Any value observed by a process • Entity: A “correlated” set of properties • Composite entity: A composition of entities • Relation: A predicate defined over entities • Agent: An entity that can act. • Situation: A configuration of roles and relations. • Context: A network of situations
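The vocabulary above can be pictured as a small data model. The following is a minimal sketch in Python, not the project's implementation. All class and field names (`Entity`, `Relation`, `Situation`, `holds`, `can_act`) are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Entity:
    """A correlated set of properties observed by a process."""
    name: str
    properties: Dict[str, float] = field(default_factory=dict)
    can_act: bool = False  # an agent is an entity that can act


@dataclass
class Relation:
    """A predicate defined over entities."""
    name: str
    predicate: Callable[..., bool]


@dataclass
class Situation:
    """A configuration of roles and relations (hypothetical structure)."""
    name: str
    # role name -> test deciding whether an entity may play that role
    roles: Dict[str, Callable[[Entity], bool]]
    # (relation, names of the roles whose players it is evaluated over)
    relations: List[Tuple[Relation, Tuple[str, ...]]] = field(default_factory=list)

    def holds(self, assignment: Dict[str, Entity]) -> bool:
        """True if every role is filled acceptably and every relation is satisfied."""
        for role, accepts in self.roles.items():
            if role not in assignment or not accepts(assignment[role]):
                return False
        return all(
            relation.predicate(*(assignment[r] for r in role_names))
            for relation, role_names in self.relations
        )


# A context is a network of situations: situation name -> reachable situations.
Context = Dict[str, List[str]]
```

A situation is then checked against a candidate assignment of entities to roles, for example `situation.holds({"speaker": some_agent})`.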
Context Aware Video Acquisition • Design Method: 1) Define actions to be taken by the system 2) Define situations for each action 3) Define roles and relations 4) Define observation processes 5) Compile situation graph into supervisor rules.
Actions to be taken by Context Aware Video Acquisition System v1.0 • Record Shots: • A1 Record wide angle view of the scene • A2 Record the speaker • A3 Record the audience
Camera Angles • [Figure: views from Camera 1, Camera 2 and Camera 3]
Situations for the Context Aware Video Acquisition System • Situations: • S0 Empty room → A1 • S1 Actor enters the room → A1 • S2 Speaker (actor) speaks → A2 • S3 Audience (actor) asks a question → A3
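The situation-to-action assignment above amounts to a lookup table. A minimal Python sketch follows. The identifiers `Action`, `ACTION_FOR_SITUATION` and `select_action` are illustrative assumptions; only the S0..S3 and A1..A3 labels come from the slides.

```python
from enum import Enum


class Action(Enum):
    A1 = "Record wide angle view of the scene"
    A2 = "Record the speaker"
    A3 = "Record the audience"


# Situation -> action, as listed on the slide.
ACTION_FOR_SITUATION = {
    "S0": Action.A1,  # empty room
    "S1": Action.A1,  # actor enters the room
    "S2": Action.A2,  # speaker (actor) speaks
    "S3": Action.A3,  # audience (actor) asks a question
}


def select_action(situation: str) -> Action:
    """Return the shot to record for the current situation."""
    return ACTION_FOR_SITUATION[situation]
```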
Process Federation Tool • JESS (CLIPS in Java) Environment sends messages to processes • [Figure: JESS (CLIPS in Java) linked to Process 1, Process 2 and Process 3; link labels: Events, Data, Properties]
Define the roles and relations for Context Aware Video Acquisition System • Roles and relations for camera controller • R1: Agent in audience asks a question • (Agent in audience is speaking) • R2: Lecturer (Agent at lecture-position) • R3: Arriver (Agent at door) • R4: Audience (Agents in audience region) • R5: Speaker (Agent currently speaking)
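Each role above is a test on an agent's position or speech state. The sketch below shows one possible reading in Python. The rectangular regions and their coordinates are hypothetical, since the slides give no room geometry, and the names `Agent`, `REGIONS`, `inside` and `roles_of` are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Region = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

# Hypothetical floor-plan regions in metres; not taken from the slides.
REGIONS: Dict[str, Region] = {
    "door": (0.0, 0.0, 1.0, 1.0),
    "lecture-position": (4.0, 0.0, 6.0, 1.5),
    "audience": (1.0, 2.0, 6.0, 6.0),
}


@dataclass
class Agent:
    name: str
    x: float
    y: float
    speaking: bool = False


def inside(agent: Agent, region_name: str) -> bool:
    """True if the agent's position lies within the named region."""
    x_min, y_min, x_max, y_max = REGIONS[region_name]
    return x_min <= agent.x <= x_max and y_min <= agent.y <= y_max


def roles_of(agent: Agent) -> List[str]:
    """Return the role labels (R1..R5) that the agent currently plays."""
    roles: List[str] = []
    in_audience = inside(agent, "audience")
    if in_audience and agent.speaking:
        roles.append("R1")  # agent in audience asks a question
    if inside(agent, "lecture-position"):
        roles.append("R2")  # lecturer
    if inside(agent, "door"):
        roles.append("R3")  # arriver
    if in_audience:
        roles.append("R4")  # audience
    if agent.speaking:
        roles.append("R5")  # speaker
    return roles
```

An agent can hold several roles at once: a speaking agent at the lecture position is both R2 and R5, which is the condition that distinguishes "speaker speaks" from "audience asks a question".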
Compiling Situations to Rules • XML:
<role name="lecturer" arity="1">
  <description> The person giving a lecture </description>
</role>
Situations -> Petri Net -> XML Description -> Rules
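As an illustration of the first step of such a compilation, the role description can be read with Python's standard `xml.etree.ElementTree` module. This is a sketch only: the function name `parse_role` and the dictionary layout are assumptions, and the project's actual compiler is not shown in the slides.

```python
import xml.etree.ElementTree as ET

# The role element from the slide.
ROLE_XML = """
<role name="lecturer" arity="1">
  <description> The person giving a lecture </description>
</role>
"""


def parse_role(xml_text: str) -> dict:
    """Extract name, arity and description from a <role> element."""
    element = ET.fromstring(xml_text.strip())
    return {
        "name": element.get("name"),
        "arity": int(element.get("arity")),
        "description": (element.findtext("description") or "").strip(),
    }
```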
Compiling Situations to Rules • CLIPS:
(defrule t2EventTransition
  ?tr <- (transitionTrigger (name "t2") (lock ?l&:(neq ?l 0)))
  ?pre_S1 <- (situation (name "S1") (entities ?newComer))
  (situationPlace (name "S1") (mark ?m_S1&:(neq ?m_S1 0)))
  (or (not (newComer (isPlayedBy ?newComer))) )
  (lecturer (isPlayedBy ?new_lecturer))
  (speaker (isPlayedBy ?new_speaker))
  (isSameAs (isVerifiedBy ?new_speaker ?new_lecturer))
  ?post_S2 <- (situation (name "S2"))
  =>
  (modify ?tr (lock (- ?l 1)))
  (assert (event (name "t2")))
  (modify ?pre_S1 (entities))
  (modify ?post_S2 (entities ?new_lecturer ?new_speaker))
)
Situations -> Petri Net -> XML Description -> Rules
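For readers unfamiliar with CLIPS, the effect of the t2 rule can be paraphrased in Python. This is a simplified sketch: the `Net` class and `fire_t2` function are hypothetical, the `newComer` condition is omitted, and the rule engine's pattern matching is replaced by explicit arguments. Like the rule, the sketch leaves the place marks untouched.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Net:
    marks: Dict[str, int]           # tokens per situation place
    entities: Dict[str, List[str]]  # entities bound to each situation
    lock: int = 1                   # lock of the transition trigger for t2
    events: List[str] = field(default_factory=list)


def fire_t2(net: Net, lecturer: Optional[str], speaker: Optional[str]) -> bool:
    """Fire t2 (S1 -> S2) when the lecturer is the agent who is speaking."""
    enabled = (
        net.lock != 0                   # (lock ?l&:(neq ?l 0))
        and net.marks.get("S1", 0) != 0  # place S1 is marked
        and lecturer is not None
        and lecturer == speaker         # isSameAs speaker lecturer
    )
    if not enabled:
        return False
    net.lock -= 1                       # (modify ?tr (lock (- ?l 1)))
    net.events.append("t2")             # (assert (event (name "t2")))
    net.entities["S1"] = []             # (modify ?pre_S1 (entities))
    net.entities["S2"] = [lecturer, speaker]
    return True
```

Once the lock reaches zero the transition cannot fire again until something resets it, which is how the rule avoids re-firing on the same facts.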
WP 4: Context Aware Video Acquisition • Plan • Situations and Actions for the Lecture Scenario • Assembling a Process Federation • Perceptual Processes • Tracking Bodies Hands and Faces • Recognizing and Locating Speech Sounds • Version 1.0 of the Intelligent Camera Man • Current Work
Video Acquisition Process Federation • [Figure: Supervisory Process, Event Bus, three Agent Trackers (each with a camera input, all labelled Camera 1), Speech Detection, Speech Location, Microphone Array]
Processes to observe agents, entities and relations • P1: Supervisory Controller • P2: Visual Tracking Process for agents with Camera 1 (Wide Angle camera) • P3: Visual Tracking Process for agents with Camera 2 (Audience region) • P4: Visual Tracking Process for agents with Camera 3 (lecturer region) • P5: Speech preprocessing and detection • P6: Speech position estimation.
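A federation of this kind, in which perceptual processes report to a supervisory controller over a shared bus, can be sketched as a publish/subscribe pattern. The Python below is an in-process illustration only. The topic names (`video.targets`, `speech.activity`, `speech.position`) and the classes `EventBus` and `Supervisor` are assumptions, and the real system's processes communicate across process boundaries.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[dict], None]


class EventBus:
    """Minimal in-process publish/subscribe bus."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Copy the list so handlers may subscribe while we iterate.
        for handler in list(self._subscribers[topic]):
            handler(message)


class Supervisor:
    """Keeps the latest observation from each kind of perceptual process."""

    TOPICS = ("video.targets", "speech.activity", "speech.position")

    def __init__(self, bus: EventBus) -> None:
        self.latest: Dict[str, dict] = {}
        for topic in self.TOPICS:
            # Bind topic as a default argument so each handler keeps its own.
            bus.subscribe(topic, lambda msg, t=topic: self._store(t, msg))

    def _store(self, topic: str, message: dict) -> None:
        self.latest[topic] = message
```

A tracking process (P2 to P4) would publish on `video.targets`, and the speech processes (P5, P6) on the two speech topics.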
Agent Detection and Tracking Process • Observation Modules: • Color Histogram Ratio • Background Difference • Receptive Field Histograms • Motion History Image
Agent Detection and Tracking Process • Process Phases: • While True Do • Acquire next image • Calculate ROI for targets • Verify and update targets • Detect new targets • Regulate module parameters • Interpret entities • Process messages
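The cycle above can be laid out as a loop skeleton. The Python below is a sketch under stated assumptions: the phase names are transliterated from the slide, while `TrackingProcess`, its `handlers` dictionary and the `trace` list are hypothetical scaffolding. A finite sequence of frames stands in for "While True" with a live camera.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Phase order as given on the slide.
PHASES = (
    "acquire_next_image",
    "calculate_roi_for_targets",
    "verify_and_update_targets",
    "detect_new_targets",
    "regulate_module_parameters",
    "interpret_entities",
    "process_messages",
)


class TrackingProcess:
    def __init__(self, handlers: Optional[Dict[str, Callable[[dict], None]]] = None) -> None:
        self.handlers = handlers or {}  # phase name -> implementation
        self.trace: List[str] = []      # phases executed, for inspection

    def step(self, frame) -> dict:
        """Run one full cycle on a single frame."""
        state = {"frame": frame}
        for phase in PHASES:
            handler = self.handlers.get(phase)
            if handler is not None:
                handler(state)
            self.trace.append(phase)
        return state

    def run(self, frames: Iterable) -> None:
        for frame in frames:
            self.step(frame)
```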
Agent Detection and Tracking • Actors: Composite Entities. • Entity Tracker: Background difference, motion and color • Entity Grouper: Assigns roles to blobs as body, hands, face or eyes
Acoustic Perception Processes • Audio preprocessing: hardware offset, echo cancellation • Audio Router: synchronized audio channels (Channel 1, Channel 2, Channel 3, Channel 4, ..., Channel n) • Audio Localisation: Time Difference Of Arrival (TDOA), geometric coordinates evaluation (4 microphones => 3D localisation) • Voice activity detection: energy analysis, speech signal recognition • Speech recognition • [Figure labels also include: Software bus, TCP/IP server, TCP/IP clients]
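The time difference of arrival between two microphones is commonly estimated as the lag that maximises the cross-correlation of their signals. The sketch below uses that textbook method in plain Python. It is not necessarily the estimator used in the project, and the function names and the speed-of-sound constant are assumptions. With four microphones, three such differences relative to a reference microphone are available, which is the minimum generally needed to solve for a 3D position.

```python
from typing import Sequence

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def tdoa_samples(a: Sequence[float], b: Sequence[float], max_lag: int) -> int:
    """Lag d (in samples) maximising sum_n a[n] * b[n + d].

    A positive result means the sound reached microphone b later than a.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, value in enumerate(a):
            m = n + lag
            if 0 <= m < len(b):
                score += value * b[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def path_difference_m(lag_samples: int, sample_rate_hz: float) -> float:
    """Convert a lag in samples into a difference in path length (metres)."""
    return lag_samples / sample_rate_hz * SPEED_OF_SOUND
```

Each path difference constrains the source to a hyperboloid with the two microphones as foci; the geometric step intersects these surfaces.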
Acoustic Process Federation • [Figure: Supervisory Controller, Audio Router, Audio Localisation, Context tracker, Software bus. Message lists shown in the figure: (a) Receive: configuration messages, commands; Send: position of video targets. (b) Send: speech activity messages, position of audio targets; Receive: configuration messages, position of video targets. (c) Send: configuration messages; Receive: position of video targets, position of audio localisation targets, speech activity. (d) Send: speech activity messages; Receive: configuration messages]
Multi channel Acoustic Server Process • [Screenshot labels: Channel selection; Speech Waveform and Spectrogram; Remove soundcard offset; Usage; Adaptive cepstral echo cancellation; Temporal energy analysis; Voice detection; Final voice activity detection (using energy and neural net results)]
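Two of the stages named above, offset removal and temporal energy analysis, are simple enough to sketch. The Python below thresholds short-time energy after subtracting the mean. The neural-net stage and the echo cancellation are not reproduced, and the function names, frame length and threshold are assumptions.

```python
from typing import List, Sequence


def remove_offset(samples: Sequence[float]) -> List[float]:
    """Subtract the mean so a constant soundcard offset does not count as energy."""
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def voice_activity(samples: Sequence[float], frame_len: int, threshold: float) -> List[bool]:
    """Flag frames whose energy exceeds the threshold (energy cue only)."""
    centred = remove_offset(samples)
    return [energy > threshold for energy in frame_energies(centred, frame_len)]
```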
Acoustic Position Estimation • [Screenshot labels: Microphones; Room map; Current target; Software bus status; Connection status; Processing flag; Configuration; Confidence]
WP 4: Context Aware Video Acquisition • Plan • Situations and Actions for the Lecture Scenario • Assembling a Process Federation • Version 1.0 of the Intelligent Camera Man • Current Work
WP 4: Context Aware Video Acquisition • Plan • Situations and Actions for the Lecture Scenario • Assembling a Process Federation • Version 1.0 of the Intelligent Camera Man • Current Work • Adding Camera Pan-Tilt Control • Adding New Cameras • Estimating Face Orientation • Integration of Topic Spotter
WP 4: Context Aware Video Acquisition • PRIMA Group, Laboratoire GRAVIR (UMR) • INPG (P2), INRIA (P8), UJF (P9), CNRS (P10) • Personnel contributing during period: • James L. Crowley (Prof. INPG) • Augustin Lux (Prof. INPG) • Patrick Reignier (MdC UJF) • Dominique Vaufreydaz (Post Doc INPG) • Alban Caparossi (Engineer UJF) • Stan Borkowski (Doctoral Student, INPG) • Hai Tranh (Doctoral Student, INPG) • Nicolas Gourier (Doctoral Student, INRIA)