Smart Meeting Systems Josh Reilly
Objectives of a Smart Meeting System • Improves the productivity of a team by automating the: • Capture of the meeting • Processing of the meeting for valuable information • Displaying of that information accurately and effectively to the end user through a client application
Organization of Smart Meeting System Processes A smart meeting system can be decomposed into three sets of processes • Meeting Capture • Meeting Recognition • Semantic Processing
Meeting Capture • Gathering raw inputs from the meeting • Video Capture • Audio Capture • Other Context
Video Capture • Video feeds from: • Cameras for the attendees • Could use a single static camera • Could use a single camera with pan, tilt, zoom (PTZ) capabilities • Recommend camera view of every contributor's face • Visual Aids • Separate camera • Digital feed from device
Microsoft Distributed Meetings Project: Video Capture • RingCam • Array of 90° Cameras • 360° Panoramic view
Audio Capture • Use an array of microphones • Placed on the table • Placed on the ceiling • Worn on the person • Levels need to be controlled so that each contributor is recorded at a similar volume
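The level-matching requirement above can be illustrated with a simple RMS normalization. This is a minimal sketch, assuming each contributor's microphone delivers a list of float samples; the function names and the `target_rms` value are illustrative and do not come from the slides.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def normalize_levels(channels, target_rms=0.1):
    """Scale each channel so that its RMS level matches target_rms.

    channels maps a contributor name to that contributor's samples.
    Silent channels (RMS of zero) are returned unchanged.
    """
    normalized = {}
    for name, samples in channels.items():
        level = rms(samples)
        gain = target_rms / level if level > 0 else 1.0
        normalized[name] = [s * gain for s in samples]
    return normalized
```

A real system would apply the gain over short windows rather than over the whole recording, so that a contributor who moves away from the microphone is still corrected.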
Microsoft Distributed Meetings Project: Audio Capture • RingCam • Has an array of microphones on its base.
Other Context Capture • RFID to track attendees • Attendees swipe their RFID cards when they enter the meeting to add their ID to the list of people attending this meeting • Motion Detectors • to track the locations of attendees within the room
Meeting Recognition • The processing of the raw capture before it is organized into something useful • Steps: • Person Identification • Attention Detection • Activity Recognition • Hot Spot Recognition • Summarization
Person Identification • Person Identification is associating sections of video, audio, and the visual aids that were captured from the meeting with the attendee(s) that they belong to • Face Recognition • Face Tracking • Speech Recognition • SSL • Beamforming
Person Identification: Face Recognition • Facial Recognition • Identify the person speaking from a list of attendees • Eigenface Approach • Challenges • Poor Quality Images • Poor Room Lighting • Continuously changing facial expressions • Occlusion
Face Recognition: The Eigenface Approach • Every face is assumed to be a weighted combination of eigenfaces • A set of eigenfaces is a set of very generalized face images, each capturing a basic ingredient that can be mixed with the others to make a face • Figure: Eigenfaces from AT&T Laboratories Cambridge
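The idea of building each face from weighted eigenfaces can be sketched on toy data. The sketch below assumes faces are flattened into short lists of pixel values and uses power iteration with deflation to find the principal components, a choice made here only to avoid a linear-algebra library. All function names are hypothetical, and forming the pixel-by-pixel covariance matrix directly is practical only for tiny images.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def subtract(a, b):
    return [x - y for x, y in zip(a, b)]

def top_eigenvector(matrix, iterations=200):
    """Power iteration: dominant eigenvector of a symmetric matrix."""
    rng = random.Random(0)  # fixed seed keeps the sketch reproducible
    vec = [rng.uniform(-1.0, 1.0) for _ in matrix]
    for _ in range(iterations):
        nxt = [dot(row, vec) for row in matrix]
        norm = math.sqrt(dot(nxt, nxt))
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    return vec

def train_eigenfaces(faces, k):
    """Return the mean face and the k leading eigenfaces."""
    n = len(faces)
    mean = [sum(col) / n for col in zip(*faces)]
    centered = [subtract(f, mean) for f in faces]
    d = len(mean)
    # Unnormalized covariance of the pixel values.
    cov = [[sum(c[i] * c[j] for c in centered) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(k):
        vec = top_eigenvector(cov)
        eigval = dot(vec, [dot(row, vec) for row in cov])
        components.append(vec)
        # Deflation: remove the component just found from the matrix.
        cov = [[cov[i][j] - eigval * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    return mean, components

def project(face, mean, components):
    """Weights of each eigenface in the given face."""
    centered = subtract(face, mean)
    return [dot(centered, c) for c in components]

def identify(face, gallery, mean, components):
    """Name of the gallery entry whose weights are closest to the face."""
    weights = project(face, mean, components)
    return min(gallery, key=lambda name: sum(
        (a - b) ** 2 for a, b in zip(weights, gallery[name])))
```

In use, `gallery` would hold the `project` output for one reference image per attendee, and `identify` would be called on each face cropped from the meeting video.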
Person Identification: Speech Recognition • Speech Recognition • Match the voice of the person speaking to someone on the list of attendees • Using voice recognition in conjunction with face recognition allows for an accurate identification of the speaker • Sound Source Localization (SSL) • Used to determine which camera is pointed at the speaker • Could be used to point PTZ camera • Beamforming
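The slides do not say how SSL or beamforming is implemented. One common approach, sketched here for two microphones, is to estimate the time difference of arrival by cross-correlation, convert it to a bearing, and then align and average the channels (delay-and-sum). The function names, the far-field assumption, and the default speed of sound are assumptions of this sketch.

```python
import math

def estimate_delay(mic_a, mic_b, max_lag):
    """Lag in samples that best aligns mic_b with mic_a.

    A positive lag means the sound reached mic_b later than mic_a.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(mic_a):
            j = i + lag
            if 0 <= j < len(mic_b):
                score += a * mic_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def delay_to_angle(lag, sample_rate_hz, mic_spacing_m, speed_of_sound=343.0):
    """Bearing of the source in degrees, relative to broadside."""
    ratio = speed_of_sound * (lag / sample_rate_hz) / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding overshoot
    return math.degrees(math.asin(ratio))

def delay_and_sum(mic_a, mic_b, lag):
    """Shift mic_b by the estimated lag and average it with mic_a."""
    out = []
    for i, a in enumerate(mic_a):
        j = i + lag
        b = mic_b[j] if 0 <= j < len(mic_b) else 0.0
        out.append((a + b) / 2)
    return out
```

The bearing could be used to select the camera facing the speaker or to steer a PTZ camera, while the summed signal reinforces the speaker relative to noise arriving from other directions.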
Person Identification: Writer Recognition • Writer Recognition • When someone writes on the whiteboard, they may not be in clear view of the cameras • Writing recognition algorithms can be used to identify who wrote what during a meeting
Attention Detection • Attention Detection • Attempt to determine who is looking at whom during a meeting. • Provides information used for activity recognition and hot spot recognition • Done using: • Hidden Markov Models (HMM) • Sound Source Localization (SSL) • Known layout of room
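As an illustration of how an HMM could be applied here, the sketch below treats the person an attendee is looking at as the hidden state and a coarse head-direction reading as the observation, then recovers the most likely sequence of gaze targets with the Viterbi algorithm. The state names, observation labels, and probabilities in the usage example are invented for illustration and are not taken from the slides.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for a list of observations."""
    # best[t][s]: probability of the best path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] * trans_p[p][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the most likely final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical model: the attendee looks at Alice (seated left) or Bob (right).
states = ["alice", "bob"]
start_p = {"alice": 0.5, "bob": 0.5}
trans_p = {"alice": {"alice": 0.9, "bob": 0.1},
           "bob": {"alice": 0.1, "bob": 0.9}}
emit_p = {"alice": {"left": 0.8, "right": 0.2},
          "bob": {"left": 0.2, "right": 0.8}}
gaze = viterbi(["left", "left", "right", "right", "right"],
               states, start_p, trans_p, emit_p)
```

The high self-transition probability smooths over isolated noisy readings, which matters because head-pose estimates from meeting video are unreliable frame to frame.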
Activity Recognition • Determine what is happening during the meeting • Step 1: • Determine what each individual is doing at each point during the meeting • Person Identification, Attention Detection, SSL, Gesture Recognition • Step 2: • Take that information to determine what activity the entire group is engaging in at each point during the meeting
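The two steps above can be sketched with a simple rule-based aggregator. This assumes Step 1 has already produced one action label per attendee per time slice; the labels and the rules are hypothetical, and a real system would more likely learn this mapping from data.

```python
from collections import Counter

def group_activity(individual_actions):
    """Step 2: derive a group activity from per-person actions.

    individual_actions maps an attendee name to that attendee's
    action label for a single time slice.
    """
    counts = Counter(individual_actions.values())
    if counts["writing_on_whiteboard"] > 0:
        return "whiteboard session"
    if counts["speaking"] == 1:
        return "presentation"
    if counts["speaking"] >= 2:
        return "discussion"
    return "idle"

def label_meeting(timeline):
    """Apply group_activity to every time slice of the meeting."""
    return [group_activity(frame) for frame in timeline]
```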
Hot Spot Recognition • Find the important parts of the meeting • Using sound cues • Ex: Changes in pitch • Using activity recognition • When people are nodding • When their focus changes
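A hot spot detector combining the two kinds of cue could look like the sketch below. It assumes a per-frame pitch estimate in Hz and a per-frame count of nodding attendees are already available from earlier stages; the thresholds and the function name are illustrative guesses, not values from the slides.

```python
def find_hot_spots(pitch_hz, nod_counts, pitch_jump=0.2, min_nods=2):
    """Indices of frames flagged as hot spots.

    A frame is flagged when the speaker's pitch changes by more than
    pitch_jump (as a fraction of the previous frame's pitch) or when
    at least min_nods attendees are nodding.
    """
    hot = []
    for t in range(len(pitch_hz)):
        jumped = False
        if t > 0 and pitch_hz[t - 1] > 0:
            change = abs(pitch_hz[t] - pitch_hz[t - 1]) / pitch_hz[t - 1]
            jumped = change > pitch_jump
        if jumped or nod_counts[t] >= min_nods:
            hot.append(t)
    return hot
```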
Summarization • Takes all of the information that the smart meeting system has learned about the meeting and creates a quick overview of the events that took place during that meeting. • This information will be used in the semantic processing stage
Semantic Processing • Takes the information from the meeting recognition step and makes it usable by the end user. • Meeting Annotation • Meeting Indexing • Meeting Browsing
Meeting Annotation • Describe the raw data from the meeting from each viewpoint • Attempt to label all meeting segments • Implicitly • Automatically • Explicitly • By Hand
Meeting Annotation: Implicit • Automated Annotation • Assumes that the meeting recognition processes performed with relatively high accuracy • Tags every person in the video • Narrates what was happening during the meeting • Has not been achieved
Meeting Annotation: Explicit • Annotation By Hand • Needed when the recognition processes fail to gather enough correct information from the raw data • Users have to go through the meeting, tag the people attending, and indicate what events are happening throughout the meeting
Meeting Indexing • Indexing is done at all levels of data, from the raw audio feed to the annotations • Event-based indexing is the best form of indexing to use • An index entry is created every time an event occurs • This is the best way for users to find a specific spot in the meeting when performing a query
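Event-based indexing can be sketched as a time-sorted list of events that supports lookups by kind and by time range. The class name, the event kinds, and the tuple layout are assumptions made for this illustration.

```python
import bisect

class EventIndex:
    """Time-ordered index of meeting events."""

    def __init__(self):
        self._times = []   # event timestamps in seconds, kept sorted
        self._events = []  # (time_s, kind, detail) tuples, same order

    def add(self, time_s, kind, detail):
        """Create an index entry at the moment an event occurs."""
        pos = bisect.bisect_right(self._times, time_s)
        self._times.insert(pos, time_s)
        self._events.insert(pos, (time_s, kind, detail))

    def find(self, kind=None, start=0.0, end=float("inf")):
        """Events between start and end (inclusive), optionally of one kind."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_right(self._times, end)
        return [e for e in self._events[lo:hi]
                if kind is None or e[1] == kind]
```

For example, after `index.add(12.5, "speaker_change", "alice")`, a query such as `index.find(kind="speaker_change")` returns the timestamps a browser can jump to in the recording.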
Meeting Browsing • The interface that the end user uses to retrieve information from the meetings • Functions: • Can browse/search a list of all meetings for a specific meeting • Can browse/search the contents of the chosen meeting • Aided by tools like bookmarks, a meeting outline, and queries (content, people, camera angles, visual aids, etc...)
Remote Attendee • Use the smart meeting system as the attendee's eyes and ears • Microsoft's PING project • Uses a monitor and speaker to present the remote attendee's video and audio during the meeting • However, the remote attendee is often ignored
Carnegie Mellon University's Meeting System Architecture • Lacks: • Activity Recognition • Hot Spot Recognition • Annotations
University of California, San Diego: AVIARY System Architecture • 2 PCs • 4 Static Cameras • 4 PTZ Cameras • No SSL
Technology Limitations • Speech recognition and facial recognition algorithms are not yet accurate enough for a smart meeting system to perform reliably
Workspace Limitations • Cameras and microphones can block the view of, distract, or intimidate attendees during the meeting • Security and privacy need to be addressed
References
[1] Zhiwen Yu and Yuichi Nakamura. 2010. Smart meeting systems: A survey of state-of-the-art and open issues. ACM Comput. Surv. 42, 2, Article 8 (March 2010), 20 pages. DOI=10.1145/1667062.1667065 http://doi.acm.org/10.1145/1667062.1667065
[2] Ross Cutler, Yong Rui, Anoop Gupta, Jj Cadiz, Ivan Tashev, Li-wei He, Alex Colburn, Zhengyou Zhang, Zicheng Liu, Steve Silverberg. 2002. Distributed Meetings: A Meeting Capture and Broadcasting System. 10 pages. http://research.microsoft.com/en-us/um/people/yongrui/ps/mm02.pdf
[3] Harold Fox. 2004. The eFacilitator: A Meeting Capture Application and Infrastructure. 89 pages. http://hdl.handle.net/1721.1/17672
[4] Yong Rui, Eric Rudolph, Li-wei He, Rico Malvar, Michael Cohen, Ivan Tashev. 2006. Ping: A Group-To-Individual Distributed Meeting System. 4 pages. http://research.microsoft.com/apps/pubs/default.aspx?id=76779
[5] Dar-Shyang Lee, Berna Erol, Jamey Graham, Jonathan Hull, Norihiko Murata. 2011. Portable Meeting Recorder. 10 pages. http://rii.ricoh.com/sites/default/files/Portable_Meeting_Recorder.pdf