Extracting Academic Affiliations • Alicia Tribble, Einat Minkov, Andy Schlaikjer, Laura Kieras
The Problem • Determine the academic institutions with which a professor is or has been affiliated • Where degrees were earned • Previous affiliations, including post-doc positions • Current affiliation • Why would this be useful? • Studying social networks in academia • Person entity disambiguation
Knowledge We Will Learn • Example text rules to be learned: • If string = "<person> received his <degree> in <department> from <institution>", then: Affiliated(<person>, <institution>) • If string = "<degree>, <department>, <institution>" on <person>'s home page, then: Affiliated(<person>, <institution>) • Class of beliefs to be learned: • Affiliated(<person>, <institution>)
Sources of redundant information • URL of professor's personal home page (e.g., www.cmu.edu/~xxx) • Text found on multiple web pages, especially in the resume, CV, or biography section of personal home pages • Links incoming to and outgoing from personal home pages
Additional information • Dictionary of institution names • Dictionary of degrees • E.g., Ph.D., B.S., B. Tech., etc. • Map of domain names to institution names • E.g., cmu.edu -> Carnegie Mellon University • This could be learned, but we will leave that for another group!
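The domain-name map can be sketched as a suffix lookup on the host part of a home-page URL. This is a minimal illustration, not the authors' implementation: only the cmu.edu entry comes from the slide, the other entries and the function name `institution_for_url` are assumptions made for the example.

```python
from urllib.parse import urlparse

# Only the cmu.edu entry is from the slide; the others are illustrative.
DOMAIN_TO_INSTITUTION = {
    "cmu.edu": "Carnegie Mellon University",
    "stanford.edu": "Stanford University",
    "duke.edu": "Duke University",
}


def institution_for_url(url):
    """Return the institution whose domain is a suffix of the URL's host, or None."""
    # urlparse only recognizes the host when a scheme or leading "//" is present.
    if "//" not in url:
        url = "//" + url
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    # Try progressively shorter suffixes, e.g. www.cs.cmu.edu, cs.cmu.edu, cmu.edu.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in DOMAIN_TO_INSTITUTION:
            return DOMAIN_TO_INSTITUTION[candidate]
    return None


print(institution_for_url("www.cmu.edu/~xxx"))
```

Matching on suffixes rather than the full host lets department subdomains resolve to the same institution without extra dictionary entries.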
Bootstrapping Logistics • Start with a few seed rules and seed facts • Use these rules to learn more facts, then use these facts to learn more rules, and so on
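The alternation between rules and facts can be sketched as a small fixed-point loop. This is a toy version under strong assumptions: the corpus sentences are simplified, the name and institution regexes (`NAME`, `INST`) are crude stand-ins for real entity recognition, and a "rule" is just the literal text between a known person and a known institution. A real system would generalize and score patterns rather than keep them verbatim.

```python
import re

# Crude, hypothetical entity regexes used only for this sketch.
NAME = r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)+)"
INST = r"(?P<institution>[A-Z][a-z]+(?: [A-Z][a-z]+)* University)"


def generalize(sentence, person, institution):
    """Turn a sentence mentioning a known fact into a regex, or None."""
    start = sentence.find(person)
    if start < 0:
        return None
    end = sentence.find(institution, start + len(person))
    if end < 0:
        return None
    middle = sentence[start + len(person):end]
    return NAME + re.escape(middle) + INST


def bootstrap(corpus, seed_facts, seed_patterns, max_rounds=5):
    facts, patterns = set(seed_facts), set(seed_patterns)
    for _ in range(max_rounds):
        # Rules -> facts: apply every known pattern to every sentence.
        new_facts = set()
        for pattern in patterns:
            for sentence in corpus:
                for m in re.finditer(pattern, sentence):
                    new_facts.add((m.group("person"), m.group("institution")))
        # Facts -> rules: generalize sentences that mention a known fact.
        new_patterns = set()
        for person, institution in facts | new_facts:
            for sentence in corpus:
                pattern = generalize(sentence, person, institution)
                if pattern is not None:
                    new_patterns.add(pattern)
        if new_facts <= facts and new_patterns <= patterns:
            break  # nothing new was learned in this round
        facts |= new_facts
        patterns |= new_patterns
    return facts, patterns


# Simplified, illustrative sentences (not real page text).
CORPUS = [
    "William Cohen received his degree from Duke University.",
    "Adnan Darwiche received his degree from Stanford University.",
    "Adnan Darwiche is an Associate Professor at UCLA.",
]
facts, patterns = bootstrap(CORPUS, {("William Cohen", "Duke University")}, set())
print(sorted(facts))
```

Starting from one seed fact and no patterns, the first round learns a pattern from the William Cohen sentence and the second round uses it to pick up the Adnan Darwiche fact.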
Our seed facts • Affiliated(<Tom M. Mitchell>, <Stanford University>) • Affiliated(<Tom Mitchell>, <Carnegie Mellon University>) • Affiliated(<William Cohen>, <Duke University>)
Our seed rules • If the URL of a personal web page is in the academic URL dictionary, then believe Affiliated(<person>, <institution>) • If looking at a resume or personal web page and any of the patterns below are found, then believe Affiliated(<person>, <institution>): • "<degree>.<department> <institution>." • "<degree>.<institution> <department>" • "<position>,<department> <institution>" • "<person> received <pronoun> <degree> from <institution>"
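One way to make such templates executable is to expand each slot into a regular-expression group backed by a dictionary. The sketch below covers only the last seed pattern. The dictionary contents, the person-name regex, and the helper names (`alternation`, `compile_template`) are assumptions for illustration, and the `<department>` and `<position>` slots would need dictionaries of their own.

```python
import re

# Illustrative dictionaries; a real system would load much larger ones.
DEGREES = ["Ph.D.", "PhD", "M.S.", "MS", "B.S.", "B. Tech."]
INSTITUTIONS = ["Stanford University", "Carnegie Mellon University", "Duke University"]
PRONOUNS = ["his", "her", "their"]


def alternation(name, options):
    """Build a named group matching any dictionary entry, longest first."""
    ordered = sorted(options, key=len, reverse=True)
    return "(?P<%s>%s)" % (name, "|".join(re.escape(o) for o in ordered))


SLOTS = {
    "<degree>": alternation("degree", DEGREES),
    "<institution>": alternation("institution", INSTITUTIONS),
    "<pronoun>": alternation("pronoun", PRONOUNS),
    # Crude name shape: first name, optional middle initial, one or more surnames.
    "<person>": r"(?P<person>[A-Z][a-z]+(?: [A-Z]\.?)?(?: [A-Z][a-z]+)+)",
}


def compile_template(template):
    """Compile a slot template into a regex; unknown slots raise KeyError."""
    parts = re.split(r"(<[a-z]+>)", template)
    pieces = []
    for part in parts:
        if re.fullmatch(r"<[a-z]+>", part):
            pieces.append(SLOTS[part])
        else:
            pieces.append(re.escape(part))
    return re.compile("".join(pieces))


rule = compile_template("<person> received <pronoun> <degree> from <institution>")
# "Jane Doe" is a made-up example sentence.
match = rule.search("Jane Doe received her Ph.D. from Duke University in 2001.")
if match:
    print("Affiliated(%s, %s)" % (match.group("person"), match.group("institution")))
```

Raising on unknown slots, rather than treating them as literal text, keeps a template with an unsupported slot from silently never matching.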
Algorithm walk-through • Start with known belief Affiliated(William Cohen, Duke University) • Extract sentences from William Cohen's web page that contain "William Cohen" and "Duke" • Found pattern "William Cohen received his bachelor's degree in Computer Science from Duke University in 1984" • Learned new pattern "received <pronoun> <degree> from <institution>"
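The generalization step in this walk-through can be sketched as slot substitution: replace each known string in the sentence with its slot name, then keep the span that ends at the institution. The `fillers` mapping and the function name `induce_template` are assumptions for illustration. Dropping the "in <department>" phrase to reach the pattern shown on the slide is done here as an explicit extra step, since the slide does not say how that simplification is chosen.

```python
import re


def induce_template(sentence, fillers):
    """Replace known filler strings with slot names and crop to the useful span."""
    template = sentence
    # Replace longer fillers first so shorter ones cannot split them.
    for text in sorted(fillers, key=len, reverse=True):
        template = re.sub(r"\b%s\b" % re.escape(text), fillers[text], template)
    start = template.find("<person>")
    end = template.find("<institution>")
    if start < 0 or end < start:
        return None
    # Keep everything after the person slot, up to and including the institution.
    return template[start + len("<person>"):end + len("<institution>")].strip()


SENTENCE = ("William Cohen received his bachelor's degree in Computer Science "
            "from Duke University in 1984")
# Strings assumed to be recognized via the seed fact and the dictionaries.
FILLERS = {
    "William Cohen": "<person>",
    "Duke University": "<institution>",
    "his": "<pronoun>",
    "bachelor's degree": "<degree>",
    "Computer Science": "<department>",
}

full = induce_template(SENTENCE, FILLERS)
print(full)        # received <pronoun> <degree> in <department> from <institution>
# Assumed extra step: treat the department phrase as optional and drop it.
simplified = full.replace(" in <department>", "")
print(simplified)  # received <pronoun> <degree> from <institution>
```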
Walk-through continued • Search for new web pages matching our pattern "received his degree from" • Found example: "Adnan Darwiche is an Associate Professor of Computer Science at UCLA, having received his PhD and MS degrees in Computer Science from Stanford University" • Extracted belief Affiliated(Adnan Darwiche, Stanford University)
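Note that the found sentence does not contain the learned pattern word for word: several tokens sit between the degree and "from", and the person is named at the start of the sentence rather than next to "received". A minimal sketch of a tolerant matcher follows. It assumes a bounded gap of up to eight tokens, a small hard-coded degree list, and that the person is the capitalized name opening the sentence; none of these choices come from the slides.

```python
import re

# Learned pattern with a bounded, lazy gap between the degree and "from".
PATTERN = re.compile(
    r"received (?:his|her|their) "
    r"(?P<degree>PhD|Ph\.D\.|MS|M\.S\.|BS|B\.S\.)"
    r"(?: \S+){0,8}? from "
    r"(?P<institution>[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*)"
)
# Assumption: the sentence subject is a capitalized name at the very start.
PERSON = re.compile(r"^(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)+)")


def extract(sentence):
    """Return (person, institution) if the pattern applies, else None."""
    pattern_match = PATTERN.search(sentence)
    person_match = PERSON.search(sentence)
    if pattern_match is None or person_match is None:
        return None
    return person_match.group("person"), pattern_match.group("institution")


SENTENCE = ("Adnan Darwiche is an Associate Professor of Computer Science at UCLA, "
            "having received his PhD and MS degrees in Computer Science "
            "from Stanford University")
print(extract(SENTENCE))  # ('Adnan Darwiche', 'Stanford University')
```

Bounding the gap keeps the pattern from pairing a degree with an institution mentioned much later in a long sentence.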