
Human-Centered Computing



Presentation Transcript


  1. Human-Centered Computing Frank Shipman, Professor, Department of Computer Science and Engineering; Associate Director, Center for the Study of Digital Libraries; Texas A&M University

  2. Outline • Short discussion of research area • Supporting access to sign language video • Observations of a potential user community cause redefinition of the problem • Multi-application user interest modeling • Iterative design moving from concept to a relatively complete system

  3. Research “Area” • Many interests • Multimedia • New Media • Computers and Education • Computers and Design • Software Engineering • Computer-Supported Cooperative Work • Human-Computer Interaction • Knowledge-Based Systems • Best descriptions I have come up with: • Cooperative Problem Solving Systems – systems where humans & computers cooperatively solve problems (humans are part of the overall system) • Intelligent User Interfaces – interactive systems that process information in non-trivial ways • [Slide diagram: this work sits at the intersection of AI, IR, HCI, and MM]

  4. What is human-centered computing? • Developing software or computational techniques with a deep understanding of the human activities they will support • Implications • Most often need to study the human activity before designing the software • Design may be (likely will be) a cooperative problem solving system rather than a software system

  5. Cooperative Problem Solving System • What is a cooperative problem solving system? • A system that includes human and software components to perform a task or solve a problem • Implications • Take advantage of the asymmetry of partners in system design • Evaluation of overall system involves humans

  6. First Example: Supporting Access to Sign Language Video

  7. Sharing Sign Language Video • Opportunity • Cameras in laptops and attached to computers enable easy capture of sign language video • Video sharing sites (e.g. YouTube) allow the publication of such expressions • Practice • Pointers to the videos are passed around in other media (e.g. email, Facebook) • Some sites specifically support the sign language community

  8. Sharing Sign Language Video • Locating a sign language video on a particular topic is still difficult • The community-specific sites have limited collections • People must upload each video to the site or add a pointer for it to the site • Locating desired videos within the large video sharing sites relies on metadata (e.g. tags) • Tags must be accurately applied, indicating both the language and the topic

  9. How Good is Text-based Search? • Searched for sign language discussions of Yahoo!’s top 10 news queries for 2011 • Queries were performed with “ASL” and “sign language” added

  10. Why Tags Are Not Enough • Consider the first page of results for the query “sign language” • Tags are ambiguous • In sign language vs. about sign language • Different meanings of “sign language” • “Sign Language” as a song title (Duarte, Gutierrez-Osuna, and Shipman, Texas A&M University)

  11. Automatic Identification of SL Video • Our approach is to develop a technique that can automatically identify whether a video is in sign language • To run on a site the size of YouTube, it • Should be accurate enough to run without human verification of results • Should be efficient enough to run during video upload without significant extra resources

  12. What is Sign Language Video? • We decided to scope the problem by focusing on the equivalent of sign language documents • Recorded by an individual with the intent of being watched • What we are not trying to identify (yet) • Videos of sign language conversations • Sign language translations

  13. Related and Prior Work • Work on sign language recognition • Recognizing what is being said in sign language • Often assumes the video is in sign language • Too heavyweight for our purpose • Detecting sign language • Recognizing when a person starts signing for more efficient resource utilization • Not designed to work on likely false positives

  14. Designing a SL-Video Classifier • Our classifier • processes a randomly selected 1-minute segment from the middle of the video • returns a yes/no decision on whether it is a SL video • Design method • Use standard video processing techniques • Five video features selected based on their expected relation to SL video • Test classifiers provided with one or more of the features

  15. Video Processing • Background Modeling • Convert to greyscale • Dynamic model (to cope with changes in signer body position and lighting): BP_t = 0.96 * BP_(t-1) + 0.04 * P_t, where BP_t is a pixel’s background value at frame t and P_t is its current value • Foreground object detection • Pixels that differ from the background model by more than a threshold are foreground pixels • A spatial filter removes regions of foreground pixels smaller than a minimum threshold • Face location to determine the position of foreground relative to the face • Videos without a single main face are not considered as potential SL videos
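A minimal sketch of this per-frame pipeline in Python, assuming OpenCV and NumPy; the thresholds and function names are illustrative assumptions, not the published implementation, and the face-location step is omitted for brevity.

```python
# Illustrative sketch of the background model and foreground detection
# described above (OpenCV + NumPy). Thresholds and names are assumptions.
import cv2
import numpy as np

ALPHA = 0.04            # update rate from BP_t = 0.96 * BP_(t-1) + 0.04 * P_t
DIFF_THRESHOLD = 25     # assumed greyscale distance marking a foreground pixel
MIN_REGION_AREA = 50    # assumed minimum region size kept by the spatial filter

def update_background(background, frame):
    """Dynamic background model: exponential moving average of grey frames."""
    grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    if background is None:
        return grey
    return (1.0 - ALPHA) * background + ALPHA * grey

def foreground_mask(background, frame):
    """Mark pixels far from the background model, dropping small regions."""
    grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    mask = (np.abs(grey - background) > DIFF_THRESHOLD).astype(np.uint8)
    # Spatial filter: remove connected foreground regions below the minimum area.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    for i in range(1, n):
        if stats[i, cv2.CC_STAT_AREA] < MIN_REGION_AREA:
            mask[labels == i] = 0
    return mask
```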

  16. Five Visual Features • VF1: overall amount of activity • VF2: distribution of activity in camera view • VF3: rate of change in activity • VF4: symmetry of motion • VF5: non-facial movement • SVM classifier worked best
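In code, the classification step might look like the following sketch; scikit-learn’s SVC stands in for whatever SVM implementation was actually used, and the kernel choice is an assumption.

```python
# Sketch of the classification step: each video is reduced to a vector of the
# five visual features (VF1-VF5) and fed to an SVM.
from sklearn.svm import SVC

def classify_videos(train_features, train_labels, test_features):
    """train_features: per-video [VF1..VF5] vectors; labels: 1 = SL, 0 = non-SL."""
    clf = SVC(kernel="rbf")   # kernel choice is an assumption
    clf.fit(train_features, train_labels)
    return clf.predict(test_features)
```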

  17. Corpus for Evaluation • Created a corpus of 98 SL videos and 94 likely false-positive (non-SL) videos • The majority of non-SL videos were likely false positives based on visual analysis • A person facing the camera and moving their hands and arms (e.g. gesturing presenter, weather forecaster) • A small number of non-SL videos were false positives selected from tag-based search • This number was kept small because these are likely easier to detect than the others

  18. Evaluation Method • Common method for testing classifiers • Each classifier tested on 1000 executions in each context • Training and testing sets randomly selected for each execution • Metrics • Precision – % of videos classified as SL videos that really are SL videos • Recall – % of SL videos correctly classified as SL videos • F1 score – harmonic mean of precision and recall
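The protocol above might be implemented as in the following sketch, reusing the hypothetical classify_videos() from the previous example; the split sizes and the use of scikit-learn’s metrics are assumptions.

```python
# Sketch of the evaluation protocol: 1000 runs, each with a fresh random
# train/test split, reporting mean precision, recall, and F1.
import random
from statistics import mean
from sklearn.metrics import precision_score, recall_score, f1_score

def evaluate(features, labels, train_per_class=15, runs=1000):
    by_class = {c: [i for i, y in enumerate(labels) if y == c] for c in set(labels)}
    scores = []
    for _ in range(runs):
        train = [i for idx in by_class.values()
                 for i in random.sample(idx, train_per_class)]
        train_set = set(train)
        test = [i for i in range(len(labels)) if i not in train_set]
        pred = classify_videos([features[i] for i in train],
                               [labels[i] for i in train],
                               [features[i] for i in test])
        truth = [labels[i] for i in test]
        scores.append((precision_score(truth, pred),
                       recall_score(truth, pred),
                       f1_score(truth, pred)))
    return tuple(mean(s[k] for s in scores) for k in range(3))  # (P, R, F1)
```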

  19. Overall Results • All five features, varying size of training set • While larger training sets improve recall, the effect is fairly small • Later results are with 15 training videos per class.

  20. All But One Feature • Comparing the results when one feature is removed from the classifier • Removing VF4 (symmetry of motion) has the largest effect, meaning it carries the most useful information not found in the other features

  21. Only One Feature • Comparing the results when only one feature is provided to the classifier • Again, VF4 (symmetry of motion) has the most valuable information • VF4 alone does better than the other four features combined

  22. Discussion of Failures (False Positives) • Our non-SL videos were chosen to be hard • Precision of ~80% means about one in five videos identified as sign language was actually one of these hard negatives • Performance on a typical video sharing site would be much better because most non-SL videos would be easy to classify • We are happy with this performance

  23. Discussion of Failures (False Negatives) • Examining the SL videos not recognized by the classifier • Some failures were due to signers frequently turning away from the camera • Others were due to the background being similar in color to the signer’s skin tone • Still others were due to movement in the background • Relaxing our requirement that the signer face the camera and improving our background model would help in many of these cases

  24. HCC Conclusions • Examined current practice to determine need for system • Identified new problem of locating SL videos • Quantified the difficulty with existing tools • Developed method • Tested with real world data • Future work • Deploy system to test if it meets the need

  25. Example 2: Multi-Application User Interest Modeling

  26. Task: Information Triage • Many tasks involve selecting and reading more than one document at once • Information triage places different demands on attention than single-document reading activities • Continuum of types of reading: • working in overview (metadata), • reading at various levels of depth (skimming), • reading intensively • How can we bring the user’s attention to content they will find valuable?

  27. User Interest Modeling • User model – a system’s representation of characteristics of its user • Generally used to adapt/personalize the system • Can include preferences, accessibility issues, etc. • User interest model – a representation of the user’s interests • Motivation: information overload • History: many of the concepts appear in work on information filtering (early 1990s)

  28. Interest Modeling for Information Triage • Prior interest models tend to assume one application • Example: browser observing page views and time on page • Multiple applications are involved in information triage (searching, reading, and organizing) • When applications do share a user model, it is with regard to a well-known domain model • Example: knowledge models shared by educational applications • A shared domain model is not possible here, since triage involves decisions about relative value among documents that are all of likely value

  29. Acquiring a User Interest Model • Explicit methods • Users tend not to provide explicit feedback • Long-tail assumptions are not applicable • Implicit methods • Reading time has been used in many cases • Scrolling and mouse events have been shown to be somewhat predictive • Annotations have been used to identify passages of interest • Problem: individuals vary greatly and have idiosyncratic work practices
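As a rough illustration of the implicit approach, an interest score can be a weighted combination of implicit signals; the signal names and weights below are hypothetical, and per-user normalization is one plausible way to address the idiosyncrasy problem just noted.

```python
# Hypothetical sketch of an implicit interest score: a weighted sum of
# per-document activity signals, normalized per user so that idiosyncratic
# work practices (heavy scrollers, heavy annotators) do not dominate.
def interest_scores(docs_events, weights=None):
    """docs_events: {doc_id: {"read_seconds": x, "scrolls": y, "annotations": z}}"""
    weights = weights or {"read_seconds": 0.5, "scrolls": 0.2, "annotations": 0.3}
    # Per-user normalization: divide each signal by its maximum over documents.
    maxima = {k: max(ev.get(k, 0.0) for ev in docs_events.values()) or 1.0
              for k in weights}
    return {doc: sum(w * ev.get(k, 0.0) / maxima[k] for k, w in weights.items())
            for doc, ev in docs_events.items()}
```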

  30. Potential Value?: A First Study • Study designed to look at: • deciding what to keep • expressing an initial view of relationships • Part of a larger study: • 8 subjects in the role of a reference librarian, selecting and organizing information on ethnomathematics for a teacher • Setting: top 20 search results from NSDL & top 20 search results from Google presented in VKB 2 • Subjects used VKB 2 to organize and a Web browser to read • After the task, subjects were asked to identify: • 5 documents they found most valuable • 5 documents they found least valuable

  31. Many User Actions Anticipate Document Assessment • Correlated actions (p < .01), from most to least correlated: • Number of object moves • Scroll offset • Number of scrolls • Number of border color changes • Number of object resizes • Total number of scroll groups • Number of scrolling direction changes • Number of background color changes • Time spent in document • Number of border width changes • Number of object deletions • Number of document accesses • Length of document in characters • [On the original slide, actions logged by VKB were shown in blue and actions logged by the browser in white]
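The analysis behind a list like this can be sketched as follows; Spearman correlation is an assumption, since the slide does not name the statistic used.

```python
# Sketch of the analysis behind the list above: correlate each logged action
# count with the document assessments and keep actions significant at p < .01.
from scipy.stats import spearmanr

def correlated_actions(action_counts, assessments, alpha=0.01):
    """action_counts: {action_name: [count per document]};
       assessments: [per-document value judgment, e.g. kept vs. discarded]."""
    results = []
    for name, counts in action_counts.items():
        rho, p = spearmanr(counts, assessments)
        if p < alpha:
            results.append((name, rho, p))
    return sorted(results, key=lambda r: -abs(r[1]))  # most to least correlated
```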

  32. Interest Models • Based on the data from the first study, we developed four interest models • Three were mathematically derived • Reading-Activity Model • Organizing-Activity Model • Combined Model • One hand-tuned model included human assessment based on observations of user activity and interviews with users
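A hedged sketch of the combined model’s likely form: a linear combination of the reading-activity and organizing-activity scores. The weights below are placeholders, not the mathematically derived coefficients.

```python
# Sketch of the combined model's form; the real coefficients were derived
# from the first study's data and are not shown on the slide.
def combined_model(reading_score, organizing_score, w_read=0.5, w_org=0.5):
    """Predicted interest in a document; w_read/w_org are placeholder weights."""
    return w_read * reading_score + w_org * organizing_score
```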

  33. Evaluation of Models • 16 subjects with the same: • Task (collecting information on ethnomathematics for a teacher) and • Setting (20 NSDL and 20 Google results) • Different rating of documents • Subjects rated all documents on a 5-point Likert scale (with 1 meaning “not useful” and 5 meaning “very useful”)

  34. Predictive Power of Models • Models limited due to data from the original study • Used aggregated user activity and user evaluations to evaluate the models • Lower residue indicates better predictions • Combined model better than reading-activity model (p=0.02) and organizing-activity model (p=0.07)

  Model                      Avg. Residue   Std. Dev.
  Reading-activity model     0.258          0.192
  Organizing-activity model  0.216          0.146
  Combined model             0.175          0.138
  Hand-tuned model           0.197          0.134
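A sketch of this comparison, assuming “residue” means the absolute difference between a model’s normalized prediction and the user’s normalized rating, averaged over documents; the slide does not spell out the definition, and the paired t-test stands in for whatever significance test was used.

```python
# Sketch of the model comparison above under an assumed residue definition.
import numpy as np
from scipy.stats import ttest_rel

def avg_residue(predicted, ratings):
    p = np.asarray(predicted, dtype=float)
    r = np.asarray(ratings, dtype=float)
    p = (p - p.min()) / ((p.max() - p.min()) or 1.0)  # scale both to [0, 1]
    r = (r - r.min()) / ((r.max() - r.min()) or 1.0)
    return float(np.mean(np.abs(p - r)))

# Significance between two models' per-subject residues, e.g.:
# ttest_rel(residues_combined, residues_reading)
```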

  35. Architecture for Interest Modeling • Results of the study motivated development of an infrastructure for multi-application interest modeling • [Slide diagram: several reading applications, an organizing application, and a location/overview application all feed an interest estimation engine, which maintains the user interest profile through a profile manager]
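A hypothetical sketch of how such an architecture could be wired: each application reports user events to a shared estimation engine, which maintains one interest profile any application can query. All names are illustrative, not the actual infrastructure’s API.

```python
# Hypothetical sketch of the multi-application interest modeling architecture.
from collections import defaultdict

class InterestEstimationEngine:
    def __init__(self):
        self.profile = defaultdict(float)   # the shared user interest profile

    def report(self, app_name, doc_id, event, weight=1.0):
        """Called by each application (reading, organizing, overview) as the
        user works; a real engine would weight events by type and recency."""
        self.profile[doc_id] += weight

    def interest_in(self, doc_id):
        """Queried by applications to adapt their displays."""
        return self.profile[doc_id]
```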

  36. New Tools: VKB 3 • New document object • User expression via coloring a document object’s user layer informs the model of user interests • System layer used to indicate documents’ relations to inferred interests • [Slide figure: a document object with its main layer and system layer]
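An illustrative sketch of the layered document object: the user layer records explicit colorings, the system layer displays inferred interest, and user actions feed the engine sketched above. Class and attribute names are assumptions, not VKB 3’s actual API.

```python
# Illustrative sketch of VKB 3's layered document object (names assumed).
class DocumentObject:
    def __init__(self, doc_id):
        self.doc_id = doc_id
        self.user_layer_color = None     # set directly by the user
        self.system_layer_color = None   # set from the inferred interest model

    def set_user_color(self, color, engine):
        self.user_layer_color = color
        engine.report("VKB3", self.doc_id, "color_change")  # feeds the model
```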

  37. New Tools: WebAnnotate

  38. Annotation-based Visualizations

  39. Evaluation of the New Design • 20 subjects organized 40 documents about “antimatter” returned by Yahoo! search • Subjects assessed the relevance of each document at the end of the task • 10 worked with and 10 without suggestions/thumbnails • Measured • Task switching • Time on documents

  40. Results • Task Switching • Fewer but longer reading sessions with new interface • Average reading time • 10.7 seconds with new features • 4.3 seconds without • p < 0.0001 • Interpretation: People are doing more in-depth reading

  41. Results • Document Attention • 6 of 10 subjects with new interface had correlations between reading time and document value • Only 2 subjects with old interface had significant correlations • Interpretation: New interface users located and spent more time on documents of value to their task

  42. HCC Conclusions • Question simplifying assumptions • Recognized that users are engaged with multiple documents and multiple applications simultaneously • Iterate between design and user studies • Design software as an extensible environment enabling easier redesign • New system resulted in more in-depth reading and more time spent on relevant documents

  43. Broad View of Computer Science • Many really important problems require cooperative problem solving systems • Solutions that assume we can vary the behavior of only the computer or only the user are less likely to succeed • Need CPS design, development, and evaluation skills • Recognize whether the problem is one of computation, representation, or interaction • You can be part of solving big problems

  44. Contact Information Email: shipman@cse.tamu.edu Web: www.csdl.tamu.edu/~shipman
