Topic and Role Discovery in Social Networks

Topic and Role Discovery in Social Networks. Andrew McCallum, Andre Corrada-Emmanuel, Xuerui Wang. Computer Science Department, University of Massachusetts Amherst. Also including joint work with Natasha Mohanty.

Presentation Transcript


  1. Topic and Role Discovery in Social Networks. Andrew McCallum, Andre Corrada-Emmanuel, Xuerui Wang. Computer Science Department, University of Massachusetts Amherst. Also including joint work with Natasha Mohanty.

  2. The #1 computer application: Email.

  3. Managing and Understanding Connections of People in our Email World • Workplace effectiveness ~ ability to leverage a network of acquaintances • But filling the Contacts DB by hand is tedious, and incomplete. • Diagram labels: Contacts DB, Email Inbox, Automatically, WWW

  4. System Overview (diagram labels: Email, WWW, CRF, names, Contact Info and Person Name Extraction, Person Name Extraction, Name Coreference, Homepage Retrieval, Keyword Extraction, Social Network Analysis)

  5. An Example To: “Andrew McCallum” mccallum@cs.umass.edu Subject ... Search for new people

  6. Example keywords extracted. Summary of Results: contact info and name extraction performance (25 fields) • Expert Finding: When solving some task, find friends-of-friends with relevant expertise. Avoid “stove-piping” in large orgs by automatically suggesting collaborators. Given a task, automatically suggest the right team for the job. (Hiring aid!) • Social Network Analysis: Understand the social structure of your organization. Suggest structural changes for improved efficiency.

  7. Outline • Email, motivation • ART Graphical Model • Experimental Results • Enron Email (corpus) • Academic Email (one person) • RART: Roles for ART • Group-Topic Model • Experiments on voting data • Voting data from U.S. Senate and the U.N.

  8. Clustering words into topics with Latent Dirichlet Allocation [Blei, Ng, Jordan 2003]. Generative process: For each document: • Sample a distribution over topics, θ • For each word in the document: sample a topic, z, then sample a word from the topic, w. Example: a document is 70% Iraq war, 30% US election; the sampled topic is Iraq war; the sampled word is “bombing”.
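
The generative process on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the two toy topics, the four-word vocabulary, the symmetric Dirichlet prior and the value `alpha=0.5` are all assumptions made for the example, and the names `sample_dirichlet` and `generate_document` are hypothetical.

```python
import random

# Toy vocabulary and topics (assumed for illustration only).
VOCAB = ["bombing", "troops", "ballot", "campaign"]
PHI = [
    [0.5, 0.5, 0.0, 0.0],  # toy "Iraq war" topic: distribution over VOCAB
    [0.0, 0.0, 0.5, 0.5],  # toy "US election" topic
]

def sample_dirichlet(alpha, k, rng):
    """Draw from a symmetric k-dimensional Dirichlet(alpha) via normalized gammas."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(phi, alpha, n_words, rng):
    """Generate one document: theta, then (topic z, word w) for each position."""
    n_topics = len(phi)
    vocab_size = len(phi[0])
    # Sample the document's distribution over topics, theta.
    theta = sample_dirichlet(alpha, n_topics, rng)
    topics, words = [], []
    for _ in range(n_words):
        z = rng.choices(range(n_topics), weights=theta)[0]     # sample a topic z
        w = rng.choices(range(vocab_size), weights=phi[z])[0]  # sample a word w from topic z
        topics.append(z)
        words.append(w)
    return theta, topics, words

rng = random.Random(0)
theta, topics, words = generate_document(PHI, alpha=0.5, n_words=10, rng=rng)
print([VOCAB[w] for w in words])
```

Each generated word comes from exactly one topic, while the document as a whole mixes topics in the proportions given by theta.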

  9. Example topics induced from a large collection of text [Tenenbaum et al.] • JOB WORK JOBS CAREER EXPERIENCE EMPLOYMENT OPPORTUNITIES WORKING TRAINING SKILLS CAREERS POSITIONS FIND POSITION FIELD OCCUPATIONS REQUIRE OPPORTUNITY EARN ABLE • SCIENCE STUDY SCIENTISTS SCIENTIFIC KNOWLEDGE WORK RESEARCH CHEMISTRY TECHNOLOGY MANY MATHEMATICS BIOLOGY FIELD PHYSICS LABORATORY STUDIES WORLD SCIENTIST STUDYING SCIENCES • BALL GAME TEAM FOOTBALL BASEBALL PLAYERS PLAY FIELD PLAYER BASKETBALL COACH PLAYED PLAYING HIT TENNIS TEAMS GAMES SPORTS BAT TERRY • FIELD MAGNETIC MAGNET WIRE NEEDLE CURRENT COIL POLES IRON COMPASS LINES CORE ELECTRIC DIRECTION FORCE MAGNETS BE MAGNETISM POLE INDUCED • STORY STORIES TELL CHARACTER CHARACTERS AUTHOR READ TOLD SETTING TALES PLOT TELLING SHORT FICTION ACTION TRUE EVENTS TELLS TALE NOVEL • MIND WORLD DREAM DREAMS THOUGHT IMAGINATION MOMENT THOUGHTS OWN REAL LIFE IMAGINE SENSE CONSCIOUSNESS STRANGE FEELING WHOLE BEING MIGHT HOPE • DISEASE BACTERIA DISEASES GERMS FEVER CAUSE CAUSED SPREAD VIRUSES INFECTION VIRUS MICROORGANISMS PERSON INFECTIOUS COMMON CAUSING SMALLPOX BODY INFECTIONS CERTAIN • WATER FISH SEA SWIM SWIMMING POOL LIKE SHELL SHARK TANK SHELLS SHARKS DIVING DOLPHINS SWAM LONG SEAL DIVE DOLPHIN UNDERWATER

  11. From LDA to Author-Recipient-Topic (ART)
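
The slide shows only the model's name, so the following is a hedged sketch of how an author-recipient-topic generative step is usually described: for each word, a recipient is picked uniformly from the message's recipients, a topic is drawn from a distribution specific to that author-recipient pair, and the word is drawn from the topic. The function name `generate_message`, the toy people "a", "b", "c", and the deterministic toy distributions are assumptions for illustration, not taken from the slides.

```python
import random

def generate_message(author, recipients, theta, phi, n_words, rng):
    """Generate (recipient, topic, word) triples for one message.

    theta: dict mapping (author, recipient) to a distribution over topics.
    phi:   list of per-topic distributions over word ids.
    """
    triples = []
    for _ in range(n_words):
        x = rng.choice(recipients)        # pick one recipient uniformly
        topic_dist = theta[(author, x)]   # topics for this author-recipient pair
        z = rng.choices(range(len(phi)), weights=topic_dist)[0]
        w = rng.choices(range(len(phi[z])), weights=phi[z])[0]
        triples.append((x, z, w))
    return triples

# Toy parameters (assumed): author "a" discusses topic 0 with "b", topic 1 with "c".
THETA = {("a", "b"): [1.0, 0.0], ("a", "c"): [0.0, 1.0]}
PHI_ART = [[1.0, 0.0], [0.0, 1.0]]

message = generate_message("a", ["b", "c"], THETA, PHI_ART, n_words=8, rng=random.Random(0))
print(message)
```

The difference from the LDA process is that the topic distribution is indexed by a pair of people rather than by a document.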

  12. Inference and Estimation • Gibbs Sampling: • Easy to implement • Reasonably fast
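
To illustrate why Gibbs sampling is considered easy to implement, here is a minimal collapsed Gibbs sampler for plain LDA, not for ART itself (the ART sampler would also resample a recipient for each word). The function name `gibbs_lda`, the hyperparameter values and the toy corpus are assumptions made for this sketch.

```python
import random

def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA over documents given as lists of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # count of topic k in doc d
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # count of word w in topic k
    topic_total = [0] * n_topics                              # total words in topic k
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                # Record the new assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return assignments, doc_topic, topic_word

# Toy corpus (assumed): word ids 0-1 and 2-3 tend to co-occur.
DOCS = [[0, 1, 0, 1, 0], [2, 3, 2, 3], [0, 1, 1, 0], [3, 2, 3]]
assignments, doc_topic, topic_word = gibbs_lda(DOCS, n_topics=2, vocab_size=4)
print(doc_topic)
```

The whole sampler is a loop that decrements counts, draws a topic from a simple ratio of counts, and increments counts again.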

  13. Outline • Email, motivation • ART Graphical Model • Experimental Results • Enron Email (corpus) • Academic Email (one person) • RART: Roles for ART • Group-Topic Model • Experiments on voting data • Voting data from U.S. Senate and the U.N.

  14. Enron Email Corpus • 250k email messages • 23k people. Date: Wed, 11 Apr 2001 06:56:00 -0700 (PDT) From: debra.perlingiere@enron.com To: steve.hooser@enron.com Subject: Enron/TransAlta Contract dated Jan 1, 2001 Please see below. Katalin Kiss of TransAlta has requested an electronic copy of our final draft? Are you OK with this? If so, the only version I have is the original draft without revisions. DP Debra Perlingiere Enron North America Corp. Legal Department 1400 Smith Street, EB 3885 Houston, Texas 77002 dperlin@enron.com

  15. Topics, and prominent senders / receivers discovered by ART. Topic names, by hand

  16. Topics, and prominent senders / receivers discovered by ART. Beck = “Chief Operations Officer”; Dasovich = “Government Relations Executive”; Shapiro = “Vice President of Regulatory Affairs”; Steffes = “Vice President of Government Affairs”

  17. Comparing Role Discovery: connection strength (A,B) = • Traditional SNA: distribution over recipients • ART: distribution over authored topics • Author-Topic: distribution over authored topics
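
Each method on this slide represents a person by a probability distribution, so comparing two people reduces to comparing two distributions. The slide does not say which measure is used; the sketch below assumes Jensen-Shannon divergence, a common symmetric choice, and the example distributions are made up for illustration.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats, skipping terms where p is zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: 0 for identical distributions, log(2) at most."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical topic distributions for three people.
person_a = [0.7, 0.2, 0.1]
person_b = [0.6, 0.3, 0.1]
person_c = [0.0, 0.1, 0.9]

print(js_divergence(person_a, person_b))  # small: similar distributions
print(js_divergence(person_a, person_c))  # larger: different distributions
```

A smaller divergence means the two people have more similar distributions, which the slide interprets as more similar roles.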

  18. Comparing Role Discovery: Tracy Geaconne vs. Dan McCarty. Methods compared: Traditional SNA, ART, Author-Topic. Labels (in extraction order): Different roles, Different roles, Similar roles. Geaconne = “Secretary”; McCarty = “Vice President”

  19. Comparing Role Discovery: Tracy Geaconne vs. Rod Hayslett. Methods compared: Traditional SNA, ART, Author-Topic. Labels (in extraction order): Not very similar, Very similar, Different roles. Geaconne = “Secretary”; Hayslett = “Vice President & CTO”

  20. Comparing Role Discovery: Lynn Blair vs. Kimberly Watson. Methods compared: Traditional SNA, ART, Author-Topic. Labels (in extraction order): Very similar, Very different, Different roles. Blair = “Gas pipeline logistics”; Watson = “Pipeline facilities planning”

  21. McCallum Email Corpus 2004 • January - October 2004 • 23k email messages • 825 people From: kate@cs.umass.edu Subject: NIPS and .... Date: June 14, 2004 2:27:41 PM EDT To: mccallum@cs.umass.edu There is pertinent stuff on the first yellow folder that is completed either travel or other things, so please sign that first folder anyway. Then, here is the reminder of the things I'm still waiting for: NIPS registration receipt. CALO registration receipt. Thanks, Kate

  22. McCallum Email Blockstructure

  23. Four most prominent topics in discussions with ____?

  24. Two most prominent topics in discussions with ____?

  25. Outline • Email, motivation • ART Graphical Model • Experimental Results • Enron Email (corpus) • Academic Email (one person) • RART: Roles for ART • Group-Topic Model • Experiments on voting data • Voting data from U.S. Senate and the U.N.

  26. Role-Author-Recipient-Topic Models

  27. Results with RART: People in “Role #3” in Academic Email • olc: lead Linux sysadmin • gauthier: sysadmin for CIIR group • irsystem: mailing list, CIIR sysadmins • system: mailing list for dept. sysadmins • allan: Prof., chair of “computing committee” • valerie: second Linux sysadmin • tech: mailing list for dept. hardware • steve: head of dept. I.T. support

  28. Roles for allan (James Allan): • Role #3: I.T. support • Role #2: Natural Language researcher. Roles for pereira (Fernando Pereira): • Role #2: Natural Language researcher • Role #4: SRI CALO project participant • Role #6: Grant proposal writer • Role #10: Grant proposal coordinator • Role #8: Guests at McCallum’s house

  29. Outline • Email, motivation • ART Graphical Model • Experimental Results • Enron Email (corpus) • Academic Email (one person) • RART: Roles for ART • Group-Topic Model • Experiments on voting data • Voting data from U.S. Senate and the U.N.

  30. ART & RART: Roles but not Groups. Enron TransWestern Division. Methods compared: Traditional SNA, ART, Author-Topic. Labels (in extraction order): Not, Not, Block structured

  31. A Group Model:“Stochastic Blockstructures Model”
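
Only the model's name survives on this slide, so the sketch below illustrates the general idea of a stochastic blockstructures model under stated assumptions: every entity belongs to one latent group, and whether two entities are related depends only on their groups. The function name `generate_relations`, the symmetric binary relation, and the toy link probabilities are assumptions for illustration.

```python
import random

def generate_relations(groups, link_prob, rng):
    """Sample a symmetric binary relation for each pair of entities.

    groups:    list where groups[i] is the group id of entity i.
    link_prob: matrix where link_prob[g][h] is the probability that a
               member of group g is related to a member of group h.
    """
    relations = {}
    n = len(groups)
    for i in range(n):
        for j in range(i + 1, n):
            p = link_prob[groups[i]][groups[j]]
            relations[(i, j)] = 1 if rng.random() < p else 0
    return relations

# Toy setup (assumed): two groups, dense links within a group, sparse across.
GROUPS = [0, 0, 0, 1, 1]
LINK_PROB = [[0.9, 0.1],
             [0.1, 0.9]]
relations = generate_relations(GROUPS, LINK_PROB, random.Random(0))
print(relations)
```

Sorting the entities by group and drawing the relation matrix would show the block structure that gives the model its name.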

  32. Group-Topic Model [Wang, Mohanty, McCallum 2005]

  33. U.S. Senate Data Set • 3426 bills from 16 years of voting records from the U.S. Senate • Yea / Nay / Abstain (absent) • Each bill comes with an abstract (text describing the contents of the bill).

  34. Topics Discovered: Traditional “Mixtures of Unigrams” vs. Group-Topic Model

  35. Groups Discovered: groups from topic Education + Domestic; Agreement Index

  36. Senators Who Change Coalition Dependent on Topic: e.g. Senator Shelby (D-AL) votes • with the Republicans on Economic • with the Democrats on Education + Domestic • with a small group of maverick Republicans on Social Security + Medicaid

  37. U.N. Data Set • 931 U.N. Resolutions, voted on by 192 countries, from 1990-2003. • Yes / No / Abstain votes • List of keywords summarizes the content of the resolution. • Also experiments later with resolutions from 1960-2003

  38. Topics Discovered: Traditional mixture of unigrams vs. Group-Topic Model

  39. Groups Discovered

  40. Groups and Topics, Trends over Time

  41. Summary • Traditionally, SNA examines links, but not the language content on those links. • Presented ART, a Bayesian network for messages sent in a social network: captures topics and role-similarity. • RART explicitly represents roles. • Additional work • Group-Topic model discovers groups and clusters attributes of relations. [Wang, Mohanty, McCallum, LinkKDD 2005]

  42. Outline • Assume you already understand Graphical Models & CRFs. • Intro to the importance of joint inference • Review of previous examples • Joint segmentation & coreference for citations • Inference: Sparse Belief Propagation • Learning: Piecewise Training • Social Network Analysis in Email • Author-Recipient-Topic Model • Enron and Academic Email • Group-Topic Model • Voting data from U.S. Senate and the U.N. • Demo of New Research Paper Search Engine

  43. Previous Systems

  44. Previous Systems (diagram labels: Cites, Research Paper)

  45. More Entities and Relations (diagram labels: Expertise, Cites, Grant, Research Paper, Person, Venue, University, Groups)
