
Framework for Inferring Ongoing Activities of Workstation Users

Yifen Huang, Sophie Wang and Tom Mitchell. School of Computer Science, Carnegie Mellon University.


Presentation Transcript


  1. Framework for Inferring Ongoing Activities of Workstation Users. Yifen Huang, Sophie Wang and Tom Mitchell, School of Computer Science, Carnegie Mellon University

  2. Activity Example: Learned Activity Frame from TM email corpus [1448 msgs, Feb 2004]
     • ActivityCluster4 (105 emails)
     • Keywords: CALO, TFC, SRI, examples, heads, labeled, Leslie, HMM, contacts, email, task, estimates, zero, reschedule, baseline, Rebecca
     • PrimarySenders: Mitchell(39), Kaelbling(7), McCallum(6), Perrault(4),
     • UserActivityFraction: 105/1448 = .072 of total emails
     • IntensityOfUserInvolvement: created 37% of traffic (default 31%)
     • ExtractedNames: Leslie(23), Rebecca(21), Carlos(12), Ray(10), Stuart(9), William(9), April(9), …
     • ExtractedDates: Wed(39), Tues(33), Fri(25), Mon(23), Thurs(20), … Feb 18(16)
     • ExtractedTimes: 5pm(24), noon(14), morning(8), 8am(7), before 5pm(7), ...
     • RequestEmails: <emailA>, <emailB>, …

  3. Activity Example: Learned Activity Frame from TM email corpus [1448 msgs, Feb 2004]
     • ActivityCluster5 (105 emails)
     • Keywords: CALO, TFC, SRI, examples, heads, labeled, Leslie, HMM, contacts, email, task, estimates, zero, reschedule, baseline, Rebecca
     • PrimarySenders: Mitchell(39), Kaelbling(7), McCallum(6), Perrault(4),
     • UserActivityFraction: 105/1448 = .072 of total email
     • IntensityOfUserInvolvement: created 37% of traffic (default 31%)
     • ExtractedNames: Leslie(23), Rebecca(21), Carlos(12), Ray(10), Stuart(9), William(9), April(9), …
     • ExtractedDates: Wed(39), Tues(33), Fri(25), Mon(23), Thurs(20), … Feb 18(16)
     • ExtractedTimes: 5pm(24), noon(14), morning(8), 8am(7), before 5pm(7), ...
     • RequestEmails: <emailA>, <emailB>, …

  4. Activity Example: Learned Activity Frame from TM email corpus [1448 msgs, Feb 2004]
     • ActivityCluster5 (105 emails)
     • Keywords: CALO, TFC, SRI, examples, heads, labeled, Leslie, HMM, contacts, email, task, estimates, zero, reschedule, baseline, Rebecca
     • PrimarySenders: Mitchell(39), Kaelbling(7), McCallum(6), Perrault(4),
     • UserActivityFraction: 105/1448 = .072 of total email
     • IntensityOfUserInvolvement: created 37% of traffic (default 31%)
     • ExtractedNames: Leslie(23), Rebecca(21), Carlos(12), Ray(10), Stuart(9), William(9), April(9), …
     • ExtractedDates: Wed(39), Tues(33), Fri(25), Mon(23), Thurs(20), … Feb 18(16)
     • ExtractedTimes: 5pm(24), noon(14), morning(8), 8am(7), before 5pm(7), ...
     • RequestEmails: <emailA>, <emailB>, …

  5. Activity Example: Learned Activity Frame from TM email corpus [1448 msgs, Feb 2004]
     • ActivityCluster5 (105 emails)
     • Keywords: CALO, TFC, SRI, examples, heads, labeled, Leslie, HMM, contacts, email, task, estimates, zero, reschedule, baseline, Rebecca
     • PrimarySenders: Mitchell(39), Kaelbling(7), McCallum(6), Perrault(4),
     • UserActivityFraction: 105/1448 = .072 of total email
     • IntensityOfUserInvolvement: created 37% of traffic (default 31%)
     • ExtractedNames: Leslie(23), Rebecca(21), Carlos(12), Ray(10), Stuart(9), William(9), April(9), …
     • ExtractedDates: Wed(39), Tues(33), Fri(25), Mon(23), Thurs(20), … Feb 18(16)
     • ExtractedTimes: 5pm(24), noon(14), morning(8), 8am(7), before 5pm(7), ...
     • RequestEmails: <emailA>, <emailB>, …
     "I need to get to DARPA by COB tomorrow a list of CALO participants who need access to the IPTO booth. It seems to me we should ask for this for any of you who is likely to be there. Could you let me know asap if you *might* be there? No big deal if you end up not going. THanks, --r"
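The two ratios in the activity frame can be checked with a few lines of Python. The sketch below is only an illustration: the `ActivityFrame` class and its method names are hypothetical and do not come from the slides. It also rests on an assumption: the 37% involvement figure happens to equal 39/105, so the sketch treats IntensityOfUserInvolvement as the share of the cluster's emails sent by the user, which the slides do not state explicitly.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ActivityFrame:
    """Hypothetical container mirroring a few fields of the slide's frame."""
    name: str
    email_ids: list
    keywords: list = field(default_factory=list)
    sender_counts: Counter = field(default_factory=Counter)

    def user_activity_fraction(self, corpus_size):
        # Share of the whole corpus that belongs to this activity.
        return len(self.email_ids) / corpus_size

    def user_involvement(self, user):
        # Assumption: involvement = emails sent by the user / cluster size.
        return self.sender_counts[user] / len(self.email_ids)


frame = ActivityFrame(
    name="ActivityCluster5",
    email_ids=list(range(105)),  # placeholder ids; only the count matters here
    keywords=["CALO", "TFC", "SRI"],
    sender_counts=Counter(
        {"Mitchell": 39, "Kaelbling": 7, "McCallum": 6, "Perrault": 4}
    ),
)

print(round(frame.user_activity_fraction(1448), 4))  # 0.0725
print(round(frame.user_involvement("Mitchell"), 2))  # 0.37
```

The slide reports the first ratio as .072, which is the same value truncated to three decimals.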

  6. Content
     • Inferring ongoing activities by clustering, social network filtering and information extraction
     • Getting information from the whole workstation
     • Accepting user's feedback
     • Future work

  7. Inferring Activities Using Emails
     [Diagram: Clustering, Social network filtering, Information extraction; output: Activity clusters and descriptions]

  8. Unsupervised Learning of Activities
     • Cluster emails
       • (Text) We use a multinomial Naïve Bayes model and refine the clusters by applying the EM algorithm
       • Represent each email by the bag of words in its subject and body
     • (Social network) Subdivide each cluster based on the graph of email co-recipients
       • Make each clique of co-recipients a subcluster
     • For each cluster, extract information from the email text and headers
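The two clustering steps on this slide can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the Laplace smoothing constant `alpha`, the random soft initialization and the fixed iteration count are all choices made here. The second function is a deliberate simplification: it groups emails with identical participant sets rather than searching the co-recipient graph for cliques.

```python
import math
import random
from collections import Counter, defaultdict


def em_naive_bayes(docs, k, iters=25, alpha=1.0, seed=0):
    """Soft-cluster tokenized emails with a multinomial Naive Bayes mixture.

    Returns resp, where resp[i][c] estimates p(cluster c | email i).
    Assumes at least one non-empty document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    counts = [Counter(doc) for doc in docs]  # bag of words per email

    # Random soft initialization of the responsibilities.
    resp = []
    for _ in docs:
        row = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(row)
        resp.append([r / total for r in row])

    for _ in range(iters):
        # M-step: cluster priors and Laplace-smoothed word distributions.
        log_prior = [
            math.log(max(sum(r[c] for r in resp) / len(docs), 1e-12))
            for c in range(k)
        ]
        log_word = []
        for c in range(k):
            mass = {w: alpha for w in vocab}
            for r, cnt in zip(resp, counts):
                for w, n in cnt.items():
                    mass[w] += r[c] * n
            z = sum(mass.values())
            log_word.append({w: math.log(m / z) for w, m in mass.items()})

        # E-step: posterior over clusters, normalized in log space.
        for i, cnt in enumerate(counts):
            logs = [
                log_prior[c] + sum(n * log_word[c][w] for w, n in cnt.items())
                for c in range(k)
            ]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp[i] = [e / total for e in exps]
    return resp


def subdivide_by_participants(email_ids, participants):
    """Split one cluster into groups of emails sharing the same participant set.

    Simplified stand-in for clique detection on the co-recipient graph.
    """
    groups = defaultdict(list)
    for email_id in email_ids:
        groups[frozenset(participants[email_id])].append(email_id)
    return list(groups.values())
```

A caller would tokenize each email's subject and body into a word list, run `em_naive_bayes`, assign each email to its highest-probability cluster, and then pass each cluster's email ids to `subdivide_by_participants`.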

  9. [Diagram labels: Web, Activity, Directories, Calendar, Email]
     Email: To: Bill@cmu.edu. Subj: fMRI meeting. "We need to meet soon to discuss the paper deadline."
     Email: To: Sue@cmu.edu. Subj: Re: fMRI meeting. "Ok, I suggest Wednesday at 4pm."
     Email: To: Bill@cmu.edu. Subj: Re: fMRI meeting. "See you then. Attached is the current draft."
     Activity: fMRI paper writing
       People: Sue, Bill
       Document: <fileptr>
       Meetings: Aug 24,
       Emails: 1423, 1644,
       Leader: Bill
       Deadline: Jan 15

  10. [Diagram labels: Web, Activity, Directories, Calendar, Email]
     Activity: fMRI paper writing
       People: Sue, Bill
       Document: <fileptr>
       Meetings: Aug 24,
       Emails: 1423, 1644,
       Leader: Bill
       Deadline: Jan 15
     Email: To: Bill@cmu.edu. Subj: fMRI meeting. "We need to meet soon to discuss the paper deadline."
     Email: To: Sue@cmu.edu. Subj: Re: fMRI meeting. "Ok, I suggest Wednesday at 4pm."
     Email: To: Bill@cmu.edu. Subj: Re: fMRI meeting. "See you then. Attached is the current draft."

  11. Getting Information from the Whole Workstation
     • Bag-of-words features for any query, using Google desktop search
       • We can produce feature vectors for meetings, person names, and project keywords.
     • Cluster initialization using project keywords
     • Co-clustering meetings and emails
     • Inferring activities for arbitrary queries

  12. Cluster Initialization Using Bag of Features of Project Keywords, from YH email corpus [623 msgs, 2004]
     • DI: an improved version of random initialization (0.46)
     • GI: bag of features from Google desktop search for user-provided keywords (0.44)
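One way to picture keyword-based initialization is to seed each cluster with the bag of words retrieved for one project keyword and to give every email an initial soft assignment based on word overlap. The sketch below is an assumption about how such seeding could look, not the scheme evaluated on the slide: the function name, the overlap score and the `smoothing` constant are hypothetical, and the desktop-search retrieval itself is not reproduced (the bags are passed in directly).

```python
def seed_responsibilities(docs, keyword_bags, smoothing=0.5):
    """Initial p(cluster | email) from overlap with per-cluster keyword bags.

    docs: list of token lists, one per email.
    keyword_bags: list of word sets, one per cluster (hypothetical input that
    stands in for the features returned by a desktop search).
    """
    resp = []
    for doc in docs:
        # Count the email's tokens that also occur in each cluster's bag.
        scores = [
            smoothing + sum(1 for w in doc if w in bag) for bag in keyword_bags
        ]
        total = sum(scores)
        resp.append([s / total for s in scores])
    return resp


docs = [["fmri", "draft", "deadline"], ["calo", "booth"]]
bags = [{"fmri", "draft"}, {"calo", "darpa"}]
print(seed_responsibilities(docs, bags)[1])  # [0.25, 0.75]
```

The resulting rows could replace a random starting point for EM; the smoothing term keeps every cluster possible for emails that match no keyword.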

  13. Content
     • Inferring ongoing activities by clustering, social network filtering and information extraction
     • Getting information from the whole workstation
     • Accepting user's feedback
     • Future work

  14. Collecting User’s Feedback

  15. Speclustering Model: split specific topics from general topics
     [Graphical model figure; labels: X, W, β, S, ξ, π, G, N, M, Activity]
     • Each document has a cluster label S.
     • For each word in a document, there is a hidden variable X that indicates whether the word is generated by the cluster-specific topic S or by the general topic G.
     • Parameters can be estimated using the EM algorithm.
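The description above can be written as a generative process. The notation below is a hedged reading of the slide, not a transcription of it: the indices d (document) and n (word position) are introduced here, and it is an assumption that π is the cluster prior, that ξ is a per-cluster probability of drawing a word from the specific topic, and that β holds the word distributions.

```latex
\begin{align*}
S_d &\sim \mathrm{Multinomial}(\pi) \\
X_{dn} \mid S_d = s &\sim \mathrm{Bernoulli}(\xi_s) \\
W_{dn} \mid X_{dn} = 1,\, S_d = s &\sim \mathrm{Multinomial}(\beta_s) \\
W_{dn} \mid X_{dn} = 0 &\sim \mathrm{Multinomial}(\beta_G) \\[4pt]
p(w_{d1}, \dots, w_{dN_d}) &= \sum_{s} \pi_s \prod_{n=1}^{N_d}
  \bigl[\, \xi_s\, \beta_{s, w_{dn}} + (1 - \xi_s)\, \beta_{G, w_{dn}} \bigr]
\end{align*}
```

Under this reading, EM would alternate between computing posteriors over S and X for each document and word, and re-estimating π, ξ and β from those posteriors.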

  16. EM Modification with User's Feedback
     [Graphical model figure; labels: X, W, β, S, ξ, π, G, N, M]
     • Email-cluster association
       • Re-assign the posterior probability p(cluster|email) according to the user's approval or disapproval.
     • Keyword-cluster association
       • Re-assign … if the keyword is confirmed by the user and … if the keyword is removed by the user.
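The email-cluster part of this modification can be sketched as an override applied to the E-step posteriors. The slide does not give the values that are assigned, so the rule below is an assumption: an approved pair gets probability 1, a disapproved pair gets probability 0, and the remaining mass is renormalized. The function name and the feedback tuple format are hypothetical.

```python
def apply_email_feedback(posteriors, feedback):
    """Override p(cluster | email) with user feedback (assumed rule).

    posteriors: dict mapping email id -> list of cluster probabilities.
    feedback: list of (email_id, cluster_index, approved) tuples.
    Assumes at least two clusters.
    """
    updated = {eid: list(p) for eid, p in posteriors.items()}
    for email_id, cluster, approved in feedback:
        p = updated[email_id]
        k = len(p)
        if approved:
            # Approval: put all probability mass on the confirmed cluster.
            updated[email_id] = [1.0 if c == cluster else 0.0 for c in range(k)]
            continue
        # Disapproval: zero the rejected cluster and renormalize the rest.
        rest = sum(v for c, v in enumerate(p) if c != cluster)
        if rest > 0:
            updated[email_id] = [
                0.0 if c == cluster else v / rest for c, v in enumerate(p)
            ]
        else:
            updated[email_id] = [
                0.0 if c == cluster else 1.0 / (k - 1) for c in range(k)
            ]
    return updated


posteriors = {"e1": [0.5, 0.25, 0.25], "e2": [0.6, 0.4]}
feedback = [("e1", 0, False), ("e2", 1, True)]
print(apply_email_feedback(posteriors, feedback))
# {'e1': [0.0, 0.5, 0.5], 'e2': [0.0, 1.0]}
```

Applying the override after every E-step would keep the user's decisions fixed while the rest of the model continues to adapt.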

  17. Folder Reconstruction Accuracy Using Speclustering Algorithm
     [Plot: accuracy vs. iteration]
     149 feedback entries (76 keyword-cluster pairs and 73 email-cluster pairs)

  18. Future Work
     • Jointly cluster meetings, people, files and other interesting entities.
       • Preliminary results of jointly clustering emails and meetings:
         • Found a good match between emails and meetings
         • Didn't visibly improve cluster quality
     • Allow richer user feedback.
     • Move from bag of features to structural data.
