
The Descent of Hierarchy, and Selection in Relational Semantics*

Barbara Rosario, Marti Hearst, Charles Fillmore, UC Berkeley. *With apologies to Charles Darwin.



Presentation Transcript


  1. The Descent of Hierarchy, and Selection in Relational Semantics* • Barbara Rosario, Marti Hearst, Charles Fillmore, UC Berkeley • *with apologies to Charles Darwin

  2. Noun Compounds (NCs) • Technical text is rich with NCs, e.g., “Open-labeled long-term study of the subcutaneous sumatriptan efficacy and tolerability in acute migraine treatment.” • Any sequence of nouns that itself functions as a noun • asthma hospitalizations • health care personnel hand wash

  3. NCs: 3 computational tasks • Identification • Syntactic analysis (attachments) • [Baseline [headache frequency]] • [[Tension headache] patient] • Our goal: semantic analysis • Headache treatment → treatment for headache • Corticosteroid treatment → treatment that uses corticosteroid

  4. Descent of Hierarchy • Idea: • Use the top levels of a lexical hierarchy to identify semantic relations • Hypothesis: • A particular semantic relation holds between all 2-word NCs that can be categorized by a lexical category pair.

  5. Outline • Related work • Linguistic motivation • Lexical Hierarchy: MeSH • Labeling NC relations • Method and results • Discussion of ambiguity

  6. Related work (Semantic analysis of NCs) • Rule-based • Finin (1980) • Detailed AI analysis, hand-coded • Vanderwende (1994) • Automatically extracts semantic information from an online dictionary and manipulates a set of handwritten rules; 13 classes, 52% accuracy • Probabilistic • Lauer (1995) • Probabilistic model; 8 classes, 47% accuracy • Lapata (2000) • Classifies nominalizations into subject/object; 2 classes, 80% accuracy

  7. Related work (Semantic analysis of NCs) • Lexical Hierarchy • Barrett et al. (2001) • WordNet; heuristics to classify an NC given its similarity to a known NC • Rosario and Hearst (2001) • MeSH, neural network; 18 classes, 60% accuracy • Relations pre-defined

  8. Linguistic Motivation • Semantics of the NCs: head-modifier relationship • Head noun has argument structure • Meaning of the head noun determines what kinds of things can be done to it, what it is made of, what it is a part of…

  9. Linguistic Motivation (cont.) • Material + Cutlery → Made of • steel knife, plastic fork, wooden spoon • Food + Cutlery → Used on • meat knife, dessert spoon, salad fork • Profession + Cutlery → Used by • chef's knife, butcher's knife

  10. Outline • Related work • Linguistic motivation • Lexical Hierarchy: MeSH • Labeling NC relations • Method and results • Discussion of ambiguity

  11. The lexical Hierarchy: MeSH Tree Structures 1. Anatomy [A] 2. Organisms [B] 3. Diseases [C] 4. Chemicals and Drugs [D] 5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E] 6. Psychiatry and Psychology [F] 7. Biological Sciences [G] 8. Physical Sciences [H] 9. Anthropology, Education, Sociology and Social Phenomena [I] 10. Technology and Food and Beverages [J] 11. Humanities [K] 12. Information Science [L] 13. Persons [M] 14. Health Care [N] 15. Geographic Locations [Z]

  12. The lexical Hierarchy: MeSH • Top level: Anatomy [A], [B] … [G], Physical Sciences [H], [I] … [M] • Under Anatomy [A]: Body Regions [A01], Musculoskeletal System [A02], Digestive System [A03], Respiratory System [A04], Urogenital System [A05], …

  13. Descending the Hierarchy • Top level: Anatomy [A], [B] … [G], Physical Sciences [H], [I] … [M] • Under Anatomy [A]: Body Regions [A01], Musculoskeletal System [A02], Digestive System [A03], Respiratory System [A04], Urogenital System [A05], … • Under Body Regions [A01]: Abdomen [A01.047], Back [A01.176], Breast [A01.236], Extremities [A01.378], Head [A01.456], Neck [A01.598], …

  14. Descending the Hierarchy • Top level: Anatomy [A], [B] … [G], Physical Sciences [H], [I] … [M] • Under Anatomy [A]: Body Regions [A01], Musculoskeletal System [A02], Digestive System [A03], Respiratory System [A04], Urogenital System [A05], … • Under Body Regions [A01]: Abdomen [A01.047], Back [A01.176], Breast [A01.236], Extremities [A01.378], Head [A01.456], Neck [A01.598], … • Under Physical Sciences [H]: Electronics, Astronomy, Nature, Time, Weights and Measures, …

  15. Descending the Hierarchy • Top level: Anatomy [A], [B] … [G], Physical Sciences [H], [I] … [M] • Under Anatomy [A]: Body Regions [A01], Musculoskeletal System [A02], Digestive System [A03], Respiratory System [A04], Urogenital System [A05], … • Under Body Regions [A01]: Abdomen [A01.047], Back [A01.176], Breast [A01.236], Extremities [A01.378], Head [A01.456], Neck [A01.598], … • Under Physical Sciences [H]: Electronics, Astronomy, Nature, Time, Weights and Measures, … • Under Electronics: Amplifiers; Electronics, Medical; Transducers

  16. Descending the Hierarchy • Top level: Anatomy [A], [B] … [G], Physical Sciences [H], [I] … [M] • Under Anatomy [A]: Body Regions [A01], Musculoskeletal System [A02], Digestive System [A03], Respiratory System [A04], Urogenital System [A05], … • Under Body Regions [A01]: Abdomen [A01.047], Back [A01.176], Breast [A01.236], Extremities [A01.378], Head [A01.456], Neck [A01.598], … • Under Physical Sciences [H]: Electronics, Astronomy, Nature, Time, Weights and Measures, … • Under Electronics: Amplifiers; Electronics, Medical; Transducers • Under Weights and Measures: Calibration; Metric System; Reference Standard

  17. Descending the Hierarchy • Top level: Anatomy [A], [B] … [G], Physical Sciences [H], [I] … [M] • Under Anatomy [A]: Body Regions [A01], Musculoskeletal System [A02], Digestive System [A03], Respiratory System [A04], Urogenital System [A05], … • Under Body Regions [A01]: Abdomen [A01.047], Back [A01.176], Breast [A01.236], Extremities [A01.378], Head [A01.456], Neck [A01.598], … • Under Physical Sciences [H]: Electronics, Astronomy, Nature, Time, Weights and Measures, … • Under Electronics: Amplifiers; Electronics, Medical; Transducers • Under Weights and Measures: Calibration; Metric System; Reference Standard • Body Regions subtree: Homogeneous • Physical Sciences subtree: Heterogeneous

  18. Mapping Nouns to MeSH Concepts • headache recurrence → C23.888.592.612.441 C23.550.291.937 • headache pain → C23.888.592.612.441 G11.561.796.444 • breast cancer cells → A01.236 C04 A11

  19. Levels of Description (headache pain) • Level 0: C23 G11 • Level 1: C23.888 G11.561 • Level 2: C23.888.592 G11.561.796 • … • Original: C23.888.592.612.441 G11.561.796.444
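
The levels on this slide can be read as prefixes of a MeSH tree number: level 0 keeps the first dot-separated component and each further level keeps one more. A minimal Python sketch under that reading, assuming a toy lexicon that holds only the codes from the "Mapping Nouns to MeSH Concepts" slide (a real system would query the full MeSH vocabulary); `at_level` and `category_pair` are hypothetical helper names:

```python
# Toy lexicon (an assumption, not the authors' data structure): only the
# MeSH tree numbers shown on the "Mapping Nouns to MeSH Concepts" slide.
MESH_CODES = {
    "headache": "C23.888.592.612.441",
    "recurrence": "C23.550.291.937",
    "pain": "G11.561.796.444",
    "breast": "A01.236",
    "cancer": "C04",
    "cells": "A11",
}


def at_level(code: str, level: int) -> str:
    """Truncate a tree number to a level of description.

    Level 0 keeps the first dot-separated component ("C23"), level 1 keeps
    two components ("C23.888"), and so on. Codes shorter than the requested
    level are returned unchanged.
    """
    parts = code.split(".")
    return ".".join(parts[: level + 1])


def category_pair(nc: str, level1: int = 0, level2: int = 0) -> tuple[str, str]:
    """Return the category pair (CP) of a two-word NC at the given levels."""
    modifier, head = nc.split()
    return (at_level(MESH_CODES[modifier], level1),
            at_level(MESH_CODES[head], level2))


if __name__ == "__main__":
    # Walk "headache pain" down the hierarchy, one level at a time.
    for level in range(4):
        print(level, category_pair("headache pain", level, level))
```

Levels 0 through 2 reproduce the rows on the slide; level 0 of "breast cancer" gives the CP (A01, C04).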

  20. Outline • Related work • Linguistic motivation • Lexical Hierarchy: MeSH • Labeling NC relations • Method and results • Discussion of ambiguity

  21. Descent of Hierarchy • Idea: • Words falling in homogeneous MeSH subhierarchies behave “similarly” with respect to relation assignment • Hypothesis: • A particular semantic relation holds between all 2-word NCs that can be categorized by a MeSH category pair

  22. Grouping the NCs • CP: A02 C04 (Musculoskeletal System, Neoplasms) • skull tumors, bone cysts, bone metastases, skull osteosarcoma… • CP: C04 M01 (Neoplasms, Person) • leukemia survivor, lymphoma patients, cancer physician, cancer nurses…
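
Grouping amounts to bucketing each two-word NC under the level-0 category pair of its two nouns. A minimal sketch, assuming a hypothetical lexicon that stores only the code prefixes implied by the slides (skull/bone under A02, the disease nouns under C04, and the person nouns under M01.643 or M01.526 as on the later C04 M01 slide); `level0` and `group_by_cp` are illustrative names:

```python
from collections import defaultdict

# Hypothetical stand-in for a MeSH lookup: only the code prefixes that the
# slides imply for these nouns, not their full tree numbers.
MESH_CODES = {
    "skull": "A02", "bone": "A02",
    "tumors": "C04", "cysts": "C04", "metastases": "C04", "osteosarcoma": "C04",
    "leukemia": "C04", "lymphoma": "C04", "cancer": "C04",
    "survivor": "M01.643", "patients": "M01.643",
    "physician": "M01.526", "nurses": "M01.526",
}


def level0(code: str) -> str:
    """Keep only the first component of a tree number, e.g. 'M01.643' -> 'M01'."""
    return code.split(".")[0]


def group_by_cp(ncs: list[str]) -> dict[tuple[str, str], list[str]]:
    """Group two-word NCs by the level-0 category pair of (modifier, head)."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for nc in ncs:
        modifier, head = nc.split()
        cp = (level0(MESH_CODES[modifier]), level0(MESH_CODES[head]))
        groups[cp].append(nc)
    return dict(groups)


if __name__ == "__main__":
    ncs = ["skull tumors", "bone cysts", "bone metastases", "skull osteosarcoma",
           "leukemia survivor", "lymphoma patients", "cancer physician", "cancer nurses"]
    for cp, members in group_by_cp(ncs).items():
        print(" ".join(cp), members)
```

Run on the slide's examples, this yields the two groups shown above: A02 C04 and C04 M01.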

  23. Distribution of Category Pairs

  24. Collection • ~70,000 NCs extracted from titles and abstracts of Medline • 2,627 CPs at level 0 (with at least 10 unique NCs) • We analyzed • 250 CPs with Anatomy (A) • 21 CPs with Natural Science (H01) • 3 CPs with Neoplasm (C04) • This represents 10% of total CPs and 20% of total NCs
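
The "(with at least 10 unique NCs)" filter counts distinct NCs per category pair, not raw occurrences. A minimal sketch, assuming the corpus has already been reduced to (NC, CP) pairs; `frequent_cps` is a hypothetical name, and the data in the usage block are synthetic placeholders, not Medline counts:

```python
from collections import defaultdict
from typing import Iterable

MIN_UNIQUE_NCS = 10  # threshold from the slide: "at least 10 unique NCs"


def frequent_cps(
    pairs: Iterable[tuple[str, tuple[str, str]]],
    min_unique: int = MIN_UNIQUE_NCS,
) -> dict[tuple[str, str], int]:
    """Map each CP with at least `min_unique` distinct NCs to its NC count.

    `pairs` holds one (nc, cp) tuple per NC occurrence in the corpus;
    repeated occurrences of the same NC are counted once per CP.
    """
    unique: dict[tuple[str, str], set[str]] = defaultdict(set)
    for nc, cp in pairs:
        unique[cp].add(nc)
    return {cp: len(ncs) for cp, ncs in unique.items() if len(ncs) >= min_unique}


if __name__ == "__main__":
    # Synthetic data: 12 distinct NCs under one CP (one of them repeated),
    # and only 3 distinct NCs under another.
    pairs = [(f"nc{i}", ("A01", "C04")) for i in range(12)]
    pairs += [("nc0", ("A01", "C04"))] * 5
    pairs += [(f"other{i}", ("B06", "B06")) for i in range(3)]
    print(frequent_cps(pairs))
```

Counting sets rather than occurrences keeps one very frequent compound from pushing a sparse CP over the threshold.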

  25. Classification Method • For each CP • Divide its NCs into “training” and “testing” sets • “Training”: inspect NCs by hand • Start from level 0 • While NCs are not all similar • descend one level of the hierarchy • Repeat until all NCs for that CP are similar
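
The descent loop can be sketched recursively. On the slides the similarity check is done by hand and the authors choose which noun to descend; this sketch replaces the human judgment with a hypothetical `relation_of` callback and always descends the head (second) noun, as in the C04 M01 example. `descend`, `at_level`, and `max_level` are illustrative names, not the authors' code:

```python
from collections import defaultdict
from typing import Callable


def at_level(code: str, level: int) -> str:
    """Truncate a dotted MeSH tree number to `level` (0 = first component)."""
    return ".".join(code.split(".")[: level + 1])


def descend(
    ncs: list[str],
    codes: dict[str, str],
    relation_of: Callable[[str], str],
    level2: int = 0,
    max_level: int = 5,
) -> list[tuple[str, str]]:
    """Return classification decisions (category pairs) for a set of NCs.

    NCs are grouped by (modifier at level 0, head at `level2`). A group whose
    NCs all share one relation becomes a decision; otherwise the head noun is
    descended one more level and the group is split again. `max_level` stops
    the recursion when the codes cannot separate the NCs any further.
    """
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for nc in ncs:
        modifier, head = nc.split()
        groups[(at_level(codes[modifier], 0), at_level(codes[head], level2))].append(nc)

    decisions: list[tuple[str, str]] = []
    for cp, members in groups.items():
        similar = len({relation_of(nc) for nc in members}) == 1
        if similar or level2 >= max_level:
            decisions.append(cp)
        else:
            decisions.extend(descend(members, codes, relation_of, level2 + 1, max_level))
    return decisions


if __name__ == "__main__":
    # Codes and relation labels taken from the C04 M01 slides.
    codes = {"leukemia": "C04", "lymphoma": "C04", "cancer": "C04",
             "survivor": "M01.643", "patients": "M01.643",
             "physician": "M01.526", "nurses": "M01.526"}
    labels = {"leukemia survivor": "Person afflicted by Disease",
              "lymphoma patients": "Person afflicted by Disease",
              "cancer physician": "Person who treats Disease",
              "cancer nurses": "Person who treats Disease"}
    print(descend(list(labels), codes, labels.__getitem__))
```

With the slide's examples, the mixed group C04 M01 splits into C04 M01.643 and C04 M01.526. The sketch does not reproduce every detail of the slides' decision list; for example, the list on slide 33 also keeps the parent pair C04 M01.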

  26. Using the CPs for classification • CP: A02 C04 (Musculoskeletal System, Neoplasms) • skull tumors, bone cysts, bone metastases, skull osteosarcoma

  27. Using the CPs for classification • CP: A02 C04 (Musculoskeletal System, Neoplasms) • skull tumors, bone cysts, bone metastases, skull osteosarcoma • Similar NCs • All NCs under the CP A02 C04 have the same semantic relationship • Location of disease? Disease in Anatomy?

  28. Using the CPs for classification • CP: A02 C04 (Musculoskeletal System, Neoplasms) • skull tumors, bone cysts, bone metastases, skull osteosarcoma • Similar NCs • All NCs under the CP A02 C04 have the same semantic relationship • Location of disease? Disease in Anatomy? • Add CP A02 C04 to the list of classification decisions • Classification decisions: A02 C04

  29. Using the CPs for classification • CP: B06 B06 (Plants, Plants) • eucalyptus trees, apple fruits, rice grains, potato plants • Classification decisions: A02 C04

  30. Using the CPs for classification • CP: B06 B06 (Plants, Plants) • eucalyptus trees, apple fruits, rice grains, potato plants • Similar • Same relationship • Add CP B06 B06 • Classification decisions: A02 C04, B06 B06

  31. Using the CPs for classification • CP: C04 M01 (Neoplasms, Person) • leukemia survivor, lymphoma patients, cancer physician, cancer nurses… • Person afflicted by Disease? Person who treats Disease? • Too different! • The second noun needs to be more specific: descend one level for the second noun (Person) • Classification decisions: A02 C04, B06 B06

  32. Using the CPs for classification • CP: C04 M01 (Neoplasms, Person) • leukemia survivor, lymphoma patients, cancer physician, cancer nurses… → Too different! • CP: C04 M01.643 (Neoplasms, Patients) • leukemia survivor, lymphoma patients • Person afflicted by Disease • CP: C04 M01.526 (Neoplasms, Occupational Groups) • cancer physician, cancer nurses… • Person who treats Disease • OK • Classification decisions: A02 C04, B06 B06, C04 M01 → C04 M01.643, C04 M01.526

  33. Classification Decisions • A02 C04 • B06 B06 • C04 M01 • C04 M01.643 • C04 M01.526 • A01 H01 • A01 H01.770 • A01 H01.671 • A01 H01.671.538 • A01 H01.671.868 • A01 M01 • A01 M01.643 • A01 M01.526 • A01 M01.898

  34. Classification Decisions + Relations (future work) • A02 C04 → Location of Disease • B06 B06 → Kind of Plants • C04 M01 • C04 M01.643 → Person afflicted by Disease • C04 M01.526 → Person who treats Disease • A01 H01 • A01 H01.770 • A01 H01.671 • A01 H01.671.538 • A01 H01.671.868 • A01 M01 • A01 M01.643 • A01 M01.526 • A01 M01.898

  35. Classification Decisions + Relations (future work) • A02 C04 → Location of Disease • B06 B06 → Kind of Plants • C04 M01 • C04 M01.643 → Person afflicted by Disease • C04 M01.526 → Person who treats Disease • A01 H01 • A01 H01.770 • A01 H01.671 • A01 H01.671.538 • A01 H01.671.868 • A01 M01 • A01 M01.643 → Person afflicted by Disease • A01 M01.526 • A01 M01.898
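
The slides list the decisions but do not say how they would be applied to an unseen NC. One plausible reading, sketched here purely as an assumption, is to choose the most specific decision whose two codes are component-wise prefixes of the NC's MeSH codes. The decision table copies the slide (None marks a decision the slide leaves unlabeled); `is_prefix` and `lookup` are hypothetical names, and the deeper input codes in the tests are made up:

```python
from typing import Optional

# Decisions and relation labels copied from the slide above; None marks a
# decision whose relation is still unlabeled ("future work").
DECISIONS: dict[tuple[str, str], Optional[str]] = {
    ("A02", "C04"): "Location of Disease",
    ("B06", "B06"): "Kind of Plants",
    ("C04", "M01"): None,
    ("C04", "M01.643"): "Person afflicted by Disease",
    ("C04", "M01.526"): "Person who treats Disease",
    ("A01", "H01"): None,
    ("A01", "H01.770"): None,
    ("A01", "H01.671"): None,
    ("A01", "H01.671.538"): None,
    ("A01", "H01.671.868"): None,
    ("A01", "M01"): None,
    ("A01", "M01.643"): "Person afflicted by Disease",
    ("A01", "M01.526"): None,
    ("A01", "M01.898"): None,
}


def is_prefix(prefix: str, code: str) -> bool:
    """True if `prefix` is `code` or one of its ancestors in the tree.

    Compares dot-separated components, so 'M01.6' is not a prefix of 'M01.643'.
    """
    p, c = prefix.split("."), code.split(".")
    return c[: len(p)] == p


def lookup(code1: str, code2: str) -> Optional[tuple[tuple[str, str], Optional[str]]]:
    """Return (decision CP, relation) for the most specific matching decision,
    or None if no decision covers this pair of codes."""
    matches = [cp for cp in DECISIONS
               if is_prefix(cp[0], code1) and is_prefix(cp[1], code2)]
    if not matches:
        return None
    # Most specific = most components across both codes.
    best = max(matches, key=lambda cp: cp[0].count(".") + cp[1].count("."))
    return best, DECISIONS[best]


if __name__ == "__main__":
    print(lookup("C04", "M01.643"))
    print(lookup("C23", "G11"))
```

Matching on whole components rather than raw string prefixes avoids treating sibling codes that share leading digits as ancestors of each other.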

  36. Classification Decision Levels • Anatomy: 250 CPs • 187 (75%) remain first level • 56 (22%) descend one level • 7 (3%) descend two levels • Natural Science (H01): 21 CPs • 1 (4%) remain first level • 8 (39%) descend one level • 12 (57%) descend two levels • Neoplasms (C04) 3 CPs: • 3 (100%) descend one level

  37. Evaluation • Test the decisions on the “testing” set • Count how many of the NCs that fall into each group defined by the classification decisions are similar to each other • Accuracy: • Anatomy: 91% • Natural Science: 79% • Neoplasm: 100% • Total accuracy: 90.8% • Generalization: our 415 classification decisions cover ~46,000 possible CPs

  38. Outline • Related work • Linguistic motivation • Lexical Hierarchy: MeSH • Labeling NC relations • Method and results • Discussion of ambiguity

  39. Ambiguity – Two Types • Lexical ambiguity: • mortality • state of being mortal • death rate • Relationship ambiguity: • bacteria mortality • death of bacteria • death caused by bacteria

  40. Lexical Ambiguity vs. Multiple MeSH Senses • Lexical ambiguity is different from having multiple MeSH senses • Example: mortality has 4 MeSH senses • Public Health (G) → Data Collection → Vital Statistics → Mortality • Investigative Techniques (E) → Data Collection → Vital Statistics → Mortality • Information Science (L) → Data Collection → Vital Statistics → Mortality • Population Characteristics (N) → Demography → Vital Statistics → Mortality • On average, the nouns in our collection have 1.5 MeSH senses each
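
Multiple MeSH senses mean that one NC can fall under several category pairs: every sense of the first noun pairs with every sense of the second. A minimal sketch, assuming a hypothetical sense inventory that records only top-level categories; the four mortality senses come from the slide, while the single Organisms [B] sense for bacteria is an assumption. `candidate_cps` and `mean_senses` are illustrative names, and the toy average below is not the collection-wide 1.5 figure:

```python
from itertools import product

# Hypothetical sense inventory keyed by top-level MeSH category only.
SENSES: dict[str, list[str]] = {
    "bacteria": ["B"],                   # assumed: one sense, Organisms [B]
    "mortality": ["G", "E", "L", "N"],   # the four senses listed on the slide
}


def candidate_cps(nc: str) -> list[tuple[str, str]]:
    """All category pairs an NC could fall under, one per sense combination."""
    modifier, head = nc.split()
    return list(product(SENSES[modifier], SENSES[head]))


def mean_senses(words: list[str]) -> float:
    """Average number of MeSH senses per word."""
    return sum(len(SENSES[w]) for w in words) / len(words)


if __name__ == "__main__":
    print(candidate_cps("bacteria mortality"))
    print(mean_senses(["bacteria", "mortality"]))
```

Each candidate pair may carry its own classification decision, which is why words with many senses need a way to pick the right pair before a relation can be assigned.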

  41. Four Cases • Single MeSH sense, only one possible relationship: abdomen radiography, aciclovir treatment • Multiple MeSH senses, only one possible relationship: alcoholism treatment • Single MeSH sense, multiple relationships: hospital databases, education efforts, kidney metabolism • Multiple MeSH senses, multiple relationships: bacteria mortality • Multiple relationships = ambiguity of relationship

  42. Four Cases • Single MeSH sense, only one possible relationship: abdomen radiography, aciclovir treatment • Multiple MeSH senses, only one possible relationship: alcoholism treatment • Single MeSH sense, multiple relationships: hospital databases, education efforts, kidney metabolism • Multiple MeSH senses, multiple relationships: bacteria mortality • Multiple relationships = ambiguity of relationship • Multiple MeSH senses with multiple relationships: the most problematic cases … but rare!

  43. Conclusions • A very simple method for assigning semantic relations to two-word technical NCs • 90.8% accuracy • Groups the NCs with respect to their semantic descriptors • A lexical resource (MeSH) is useful for this task • Using the upper levels of the lexical hierarchy gives accurate classification, thereby reducing the problem space

  44. Future work • Analyze full spectrum of hierarchy • NCs with > 2 terms • [[growth hormone] deficiency] • Other syntactic structures • Non-biomedical words • Other ontologies (e.g., WordNet)?

  45. And given enough data… • skull character • jaw depression • nose resuscitation • cadaver motion

  46. Thanks! For more information: http://bailando.sims.berkeley.edu/lindi/
