Extracting Academic Affiliations

Alicia Tribble, Einat Minkov, Andy Schlaikjer, Laura Kieras. The problem: determine the academic institutions with which a professor is or has been affiliated, including where degrees were earned, previous affiliations (including post-docs), and the current affiliation.

Presentation Transcript


  1. Extracting Academic Affiliations
     Alicia Tribble, Einat Minkov, Andy Schlaikjer, Laura Kieras

  2. The Problem
     • Determine the academic institutions with which a professor is or has been affiliated:
        • Where degrees were earned
        • Previous affiliations, including post-docs
        • Current affiliation
     • Why would this be useful?
        • Studying social networks in academia
        • Person entity disambiguation

  3. Knowledge We Will Learn
     • Example text rules to be learned:
        • If string = "<person> received his <degree> in <department> from <institution>", then: Affiliated(<person>, <institution>)
        • If string = "<degree>, <department>, <institution>" appears on <person>'s home page, then: Affiliated(<person>, <institution>)
     • Class of beliefs to be learned:
        • Affiliated(<person>, <institution>)
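The rule-and-belief structure above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Affiliated` type, the function name `fire`, and the regular expressions chosen for each slot (a capitalized name of two or more words for `<person>`, capitalized words optionally joined by "of"/"at" for `<institution>`) are all hypothetical.

```python
import re
from typing import NamedTuple


class Affiliated(NamedTuple):
    """The single class of beliefs to be learned: Affiliated(person, institution)."""
    person: str
    institution: str


# Assumed slot sub-patterns (not from the slides):
# person = capitalized first name, optional middle initial, capitalized surname(s);
# institution = capitalized words, optionally linked by "of" or "at".
NAME = r"[A-Z][a-z'-]+(?: [A-Z]\.)?(?: [A-Z][a-z'-]+)+"
INSTITUTION = r"[A-Z]\w*(?: (?:of |at )?[A-Z]\w*)*"

# Hand-written regex for the first example rule:
# "<person> received his <degree> in <department> from <institution>"
RULE = re.compile(
    rf"(?P<person>{NAME}) received his (?P<degree>[\w.' ]+?) "
    rf"in (?P<department>[A-Z]\w*(?: [A-Z]\w*)*) from (?P<institution>{INSTITUTION})"
)


def fire(text):
    """Return every Affiliated belief the rule supports in the given text."""
    return {Affiliated(m["person"], m["institution"]) for m in RULE.finditer(text)}
```

Because the slot patterns depend on capitalization, this hand-written rule misses lowercase or abbreviated names; it only shows how one rule maps a string to a belief.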

  4. Sources of redundant information
     • URL of the professor's personal home page (e.g., www.cmu.edu/~xxx)
     • Text found on multiple web pages, especially in the resume, CV, or biography section of personal home pages
     • Incoming and outgoing links of personal home pages
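One way to exploit this redundancy, sketched here under assumptions the slides do not state, is to count how many independent sources support each candidate affiliation and keep only well-supported pairs. The source labels, the function name `supported_beliefs`, and the default threshold of two sources are hypothetical.

```python
from collections import defaultdict


def supported_beliefs(evidence, min_sources=2):
    """Keep (person, institution) pairs backed by at least `min_sources`
    distinct sources; repeated hits from the same source count once.

    evidence: iterable of (source, person, institution) triples, where
    source is a hypothetical label such as "url", "cv", or "bio".
    """
    support = defaultdict(set)
    for source, person, institution in evidence:
        support[(person, institution)].add(source)
    return {pair: len(srcs) for pair, srcs in support.items()
            if len(srcs) >= min_sources}
```

Counting distinct sources rather than raw hits keeps a single page that repeats a name many times from outweighing agreement across the URL, the CV text, and the links.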

  5. Additional information
     • Dictionary of institution names
     • Dictionary of degrees
        • e.g., Ph.D., B.S., B.Tech., etc.
     • Map of domain names to institution names
        • e.g., cmu.edu -> Carnegie Mellon University
        • This could be learned, but we will leave that for another group!
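A minimal sketch of the degree dictionary and the domain-to-institution map follows. Both dictionaries are hypothetical stand-ins with only a few entries, and the suffix walk, which lets a departmental host such as `www.cs.cmu.edu` resolve through its parent domain `cmu.edu`, is a design choice assumed here rather than taken from the slides.

```python
from urllib.parse import urlparse

# Hypothetical miniature dictionaries; real ones would be far larger.
DEGREES = {"ph.d.", "phd", "b.s.", "m.s.", "b.tech."}
DOMAIN_TO_INSTITUTION = {"cmu.edu": "Carnegie Mellon University"}


def is_degree(token):
    """Degree-dictionary lookup, ignoring case and spaces ('B. Tech.' -> 'b.tech.')."""
    return token.lower().replace(" ", "") in DEGREES


def institution_from_url(url):
    """Map a home-page URL to an institution via the domain-name map."""
    if "://" not in url:
        url = "http://" + url  # without a scheme, urlparse treats the host as part of the path
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    # Try progressively shorter suffixes: www.cs.cmu.edu, cs.cmu.edu, cmu.edu, edu.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in DOMAIN_TO_INSTITUTION:
            return DOMAIN_TO_INSTITUTION[suffix]
    return None
```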

  6. Bootstrapping Logistics
     • Start with a few seed rules and seed facts
     • Use these rules to learn more facts, use those facts to learn more rules, and so on!
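The bootstrapping loop can be sketched as a generic fixed-point iteration. `learn_facts` and `learn_rules` are hypothetical callables standing in for rule application and pattern induction, and the `max_iters` cap is an assumed safeguard. A practical system would usually also score and filter new rules and facts each round to limit drift; this sketch omits that.

```python
def bootstrap(seed_rules, seed_facts, learn_facts, learn_rules, max_iters=10):
    """Alternate rules -> facts and facts -> rules until nothing new appears.

    learn_facts(rules) and learn_rules(facts) are caller-supplied functions
    that return sets; they are hypothetical stand-ins for the real steps.
    """
    rules, facts = set(seed_rules), set(seed_facts)
    for _ in range(max_iters):
        new_facts = learn_facts(rules) - facts
        facts |= new_facts
        new_rules = learn_rules(facts) - rules
        rules |= new_rules
        if not new_facts and not new_rules:
            break  # fixed point: another round would learn nothing new
    return rules, facts
```

With deterministic learners, stopping at the first round that adds neither a fact nor a rule is safe, because the next round would see exactly the same inputs.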

  7. Our seed facts
     • Affiliated(<Tom M. Mitchell>, <Stanford University>)
     • Affiliated(<Tom Mitchell>, <Carnegie Mellon University>)
     • Affiliated(<William Cohen>, <Duke University>)

  8. Our seed rules
     • If the URL of the personal web page is in the academic URL dictionary, then believe Affiliated(<person>, <institution>)
     • If looking at a resume or personal web page and any of the patterns below is found, then believe Affiliated(<person>, <institution>):
        • "<degree>.<department> <institution>"
        • "<degree>.<institution> <department>"
        • "<position>,<department> <institution>"
        • "<person> received <pronoun> <degree> from <institution>"
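The pattern-based seed rules can be sketched by compiling each template into a regular expression with one named group per slot. This is an illustrative assumption, not the authors' code: the slot sub-patterns, the comma separators (borrowed from the example rule on slide 3, since the separators in the list above are ambiguous), and the convention that a template without `<person>` refers to the page owner are all hypothetical.

```python
import re

# Hypothetical slot sub-patterns; the slides do not say how slots are matched.
SLOTS = {
    "person": r"[A-Z][a-z'-]+(?: [A-Z]\.)?(?: [A-Z][a-z'-]+)+",
    "pronoun": r"(?:his|her|their)",
    "degree": r"\b(?:Ph\.?D\.?|M\.?S\.?|B\.?S\.?|B\.? ?Tech\.?)",
    "department": r"[A-Z]\w*(?: (?:of |and )?[A-Z]\w*)*",
    "institution": r"[A-Z]\w*(?: (?:of |at )?[A-Z]\w*)*",
}

# Two seed patterns; the comma separators are an assumption.
SEED_TEMPLATES = [
    "<degree>, <department>, <institution>",
    "<person> received <pronoun> <degree> from <institution>",
]


def compile_template(template):
    """'<degree>, <department>, <institution>' -> regex with one named group per slot."""
    parts = re.split(r"<(\w+)>", template)  # odd indices hold slot names
    return re.compile("".join(
        re.escape(p) if i % 2 == 0 else f"(?P<{p}>{SLOTS[p]})"
        for i, p in enumerate(parts)
    ))


def apply_seed_rules(page_owner, page_text):
    """Fire every seed pattern on a home page. Templates without a <person>
    slot are attributed to the page owner."""
    beliefs = set()
    for template in SEED_TEMPLATES:
        for m in compile_template(template).finditer(page_text):
            person = m.groupdict().get("person") or page_owner
            beliefs.add(("Affiliated", person, m["institution"]))
    return beliefs
```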

  9. Algorithm walk-through
     • Start with the known belief Affiliated(William Cohen, Duke University)
     • Extract sentences from William Cohen's web page that contain "William Cohen" and "Duke"
     • Found the sentence: "William Cohen received his bachelor's degree in Computer Science from Duke University in 1984"
     • Learned new pattern: "received <pronoun> <degree> from <institution>"
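The pattern-learning step can be sketched as: find sentences containing both the person and the institution, take the words between them, and replace known entities with slot names. This is a hypothetical simplification. The pronoun list and degree lexicon are assumed, it matches the full institution name rather than a fragment such as "Duke", and it abstracts "in Computer Science" into a `<department>` slot instead of dropping it as the slide's learned pattern does.

```python
import re

# Hypothetical lexicons standing in for the degree dictionary and a pronoun list.
DEGREES = ["bachelor's degree", "master's degree", "Ph.D.", "PhD", "B.S.", "M.S."]
PRONOUN = re.compile(r"\b(?:his|her|their)\b")
DEPARTMENT = re.compile(r"\bin [A-Z]\w*(?: [A-Z]\w*)*")


def split_sentences(text):
    # Split after . ! ? only when the next sentence starts with a capital letter,
    # so "Ph.D. from" is not split.
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)


def induce_patterns(person, institution, text):
    """Generalize the words between a known person and institution into patterns."""
    patterns = set()
    for s in split_sentences(text):
        if person not in s or institution not in s:
            continue
        start, end = s.index(person) + len(person), s.index(institution)
        middle = s[start:end].strip()  # empty if the institution comes first
        if not middle:
            continue
        for degree in DEGREES:
            middle = middle.replace(degree, "<degree>")
        middle = DEPARTMENT.sub("in <department>", middle)
        middle = PRONOUN.sub("<pronoun>", middle)
        patterns.add(f"{middle} <institution>")
    return patterns
```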

  10. Walk-through continued
     • Search for new web pages matching our pattern "received his degree from"
     • Found example: "Adnan Darwiche is an Associate Professor of Computer Science at UCLA, having received his PhD and MS degrees in Computer Science from Stanford University"
     • Extracted belief: Affiliated(Adnan Darwiche, Stanford University)
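Applying a learned pattern to new text can be sketched as below. The loose `<degree>` slot, which here absorbs "PhD and MS degrees in Computer Science", and the heuristic that the affiliated person is the capitalized name opening the sentence are both assumptions made for this illustration, not methods described in the slides.

```python
import re

# Assumed heuristic: the affiliated person is the capitalized name opening the sentence.
SUBJECT = re.compile(r"^([A-Z][a-z'-]+(?: [A-Z]\.)?(?: [A-Z][a-z'-]+)+)")


def pattern_to_regex(pattern):
    """Compile a learned pattern; slots become loose, lazily matched groups."""
    slot = {
        "pronoun": r"(?:his|her|their)",
        "degree": r"[\w.' ]+?",  # loose: also absorbs "... degrees in <dept>"
        "institution": r"[A-Z]\w*(?: (?:of |at )?[A-Z]\w*)*",
    }
    parts = re.split(r"<(\w+)>", pattern)  # odd indices hold slot names
    return re.compile("".join(
        re.escape(p) if i % 2 == 0 else f"(?P<{p}>{slot[p]})"
        for i, p in enumerate(parts)
    ))


def extract_beliefs(pattern, text):
    """Return Affiliated beliefs for sentences where the pattern matches."""
    rx = pattern_to_regex(pattern)
    beliefs = set()
    for s in re.split(r"(?<=[.!?])\s+(?=[A-Z])", text):
        m, subj = rx.search(s), SUBJECT.match(s)
        if m and subj:
            beliefs.add(("Affiliated", subj.group(1), m["institution"]))
    return beliefs
```

A loose slot trades precision for recall: it lets one learned pattern cover phrasings like the Darwiche example, but it would also accept unrelated text between "received his" and "from".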
