Classifying Semantic Relations in Bioscience Texts


Presentation Transcript


  1. Classifying Semantic Relations in Bioscience Texts Barbara Rosario Marti Hearst SIMS, UC Berkeley http://biotext.berkeley.edu Supported by NSF DBI-0317510 and a gift from Genentech

  2. Problem: Which relations hold between two entities (Treatment, Disease)? Cure? Prevent? Side Effect?

  3. Hepatitis Examples • Cure • These results suggest that con A-induced hepatitis was ameliorated by pretreatment with TJ-135. • Prevent • A two-dose combined hepatitis A and B vaccine would facilitate immunization programs • Vague • Effect of interferon on hepatitis B

  4. Two tasks • Relationship extraction: • Identify the semantic relations that can occur between the entities disease and treatment in bioscience text • Entity extraction: • Related problem: identify such entities

  5. The Approach • Data: MEDLINE abstracts and titles • Graphical models • Combine in one framework both relation and entity extraction • Both static and dynamic models • Simple discriminative approach: • Neural network • Lexical, syntactic and semantic features

  6. Outline • Related work • Data and semantic relations • Features • Models and results • Conclusions

  7. Several DIFFERENT Relations between the Same Types of Entities • Our problem statement thus differs from other work on relations • Many find one relation which holds between two entities (many based on ACE) • Agichtein and Gravano (2000): lexical patterns for location-of • Zelenko et al. (2002): SVM for person-affiliation and organization-location • Hasegawa et al. (ACL 2004): Person-Organization -> “President” relation; doesn’t identify the actual relation • Craven (1999, 2001): HMM for subcellular-location and disorder-association

  8. Related work: Bioscience • Many hand-built rules • Feldman et al. (2002) • Friedman et al. (2001) • Pustejovsky et al. (2002) • Saric et al. (this conference)

  9. Data and Relations • MEDLINE abstracts and titles • 3662 sentences labeled • Relevant: 1724 • Irrelevant: 1771 • e.g., “Patients were followed up for 6 months” • 2 types of entities, many instances: treatment and disease • 7 relationships between these entities • The labeled data is available at http://biotext.berkeley.edu

  10. Semantic Relationships • 810: Cure • Intravenous immune globulin for recurrent spontaneous abortion • 616: Only Disease • Social ties and susceptibility to the common cold • 166: Only Treatment • Flucticasone propionate is safe in recommended doses • 63: Prevent • Statins for prevention of stroke

  11. Semantic Relationships • 36: Vague • Phenylbutazone and leukemia • 29: Side Effect • Malignant mesodermal mixed tumor of the uterus following irradiation • 4: Does NOT cure • Evidence for double resistance to permethrin and malathion in head lice

  12. Features • Word • Part of speech • Phrase constituent • Orthographic features • ‘is number’, ‘all letters are capitalized’, ‘first letter is capitalized’ … • MeSH (semantic features) • Replace words, or sequences of words, with generalizations via MeSH categories • Peritoneum -> Abdomen
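
A minimal sketch of how the orthographic indicator features listed above might be computed per token; the function name and exact feature set are illustrative assumptions, not taken from the paper:

```python
def orthographic_features(token: str) -> dict:
    """Illustrative orthographic indicators for one token (assumed feature set)."""
    return {
        "is_number": token.replace(".", "", 1).isdigit(),  # 'is number'
        "all_caps": token.isalpha() and token.isupper(),   # 'all letters are capitalized'
        "init_cap": token[:1].isupper(),                   # 'first letter is capitalized'
    }

print(orthographic_features("TJ-135"))
# {'is_number': False, 'all_caps': False, 'init_cap': True}
```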

  13. Features (cont.): MeSH • MeSH Tree Structures:
      1. Anatomy [A]
      2. Organisms [B]
      3. Diseases [C]
      4. Chemicals and Drugs [D]
      5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
      6. Psychiatry and Psychology [F]
      7. Biological Sciences [G]
      8. Physical Sciences [H]
      9. Anthropology, Education, Sociology and Social Phenomena [I]
      10. Technology and Food and Beverages [J]
      11. Humanities [K]
      12. Information Science [L]
      13. Persons [M]
      14. Health Care [N]
      15. Geographic Locations [Z]

  14. Features (cont.): MeSH
      1. Anatomy [A]: Body Regions [A01], Musculoskeletal System [A02], Digestive System [A03], Respiratory System [A04], Urogenital System [A05], Endocrine System [A06], Cardiovascular System [A07], Nervous System [A08], Sense Organs [A09], Tissues [A10], Cells [A11], Fluids and Secretions [A12], Animal Structures [A13], Stomatognathic System [A14] (…)
      Body Regions [A01]: Abdomen [A01.047] (Groin [A01.047.365], Inguinal Canal [A01.047.412], Peritoneum [A01.047.596], Umbilicus [A01.047.849]), Axilla [A01.133], Back [A01.176], Breast [A01.236], Buttocks [A01.258], Extremities [A01.378], Head [A01.456], Neck [A01.598] (…)
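
The generalization step ("Peritoneum -> Abdomen") amounts to truncating a term's MeSH tree number by one or more levels. A minimal sketch, assuming a tiny hand-coded fragment of the hierarchy (the real tree numbers come from NLM's MeSH files):

```python
# Hypothetical mini lookup tables; the real hierarchy is NLM's full MeSH tree.
MESH_CODES = {"peritoneum": "A01.047.596", "groin": "A01.047.365", "abdomen": "A01.047"}
MESH_NAMES = {"A01.047": "Abdomen", "A01": "Body Regions", "A": "Anatomy"}

def generalize(term: str, levels: int = 1) -> str:
    """Replace a term with its ancestor 'levels' steps up the MeSH tree."""
    code = MESH_CODES.get(term.lower())
    if code is None:
        return term                     # no MeSH entry: keep the word itself
    parts = code.split(".")
    ancestor = ".".join(parts[:max(1, len(parts) - levels)])
    return MESH_NAMES.get(ancestor, ancestor)

print(generalize("Peritoneum"))  # Abdomen
```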

  15. Models • 2 static generative models • 3 dynamic generative models • 1 discriminative model (neural networks)

  16. Static Graphical Models • S1: observations dependent on Role but independent of Relation given the roles • S2: observations dependent on both Relation and Role • [diagrams: S1, S2]

  17. Dynamic Graphical Models • D1, D2: as in S1, S2 • D3: only one observation per state is dependent on both the relation and the role • [diagrams: D1, D2, D3]

  18. Graphical Models • Relation node: • Semantic relation (cure, prevent, none, …) expressed in the sentence

  19. Graphical Models • Role nodes: • 3 choices: treatment, disease, or none

  20. Graphical Models • Feature nodes (observed): • word, POS, MeSH…

  21. Graphical Models • For Dynamic Model D1: • Joint probability distribution over relation, roles and features nodes • Parameters estimated with maximum likelihood and absolute discounting smoothing
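
As a concrete reading of "joint probability distribution over relation, roles and features nodes", here is a hedged sketch of D1's log-joint under one plausible factorization (relation on top, a role chain conditioned on it, features emitted per role); the table names and layout are assumptions, not the paper's code:

```python
from math import log

def d1_log_joint(rel, roles, feats, p_rel, p_trans, p_emit):
    """Assumed D1 factorization:
    P(rel) * prod_t P(role_t | role_{t-1}, rel) * prod_t P(feat_t | role_t)."""
    lp = log(p_rel[rel])
    prev = None                                # start-of-sentence marker
    for role, feat in zip(roles, feats):
        lp += log(p_trans[(prev, role, rel)])  # role chain, conditioned on the relation
        lp += log(p_emit[(feat, role)])        # in D1, features depend only on the role
        prev = role
    return lp
```

The probability tables (p_rel, p_trans, p_emit) would be estimated from the labeled data by maximum likelihood and then smoothed, as the slide says.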

  22. Related work: Thompson et al. (2003) • Frame classification and role labeling for FrameNet sentences • Target word must be observed • More relations and roles than in our task • [diagram: comparison with our D1]

  23. Neural Networks • Feed-forward network (MATLAB) • Training with conjugate gradient descent • One hidden layer (hyperbolic tangent function) • Logistic sigmoid function for the output layer representing the relationships • Same features • Discriminative approach
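
The described architecture is easy to sketch. A NumPy stand-in is shown below; the original used MATLAB with conjugate-gradient training, so the dimensions and random weights here are placeholders, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden, n_relations = 50, 20, 8    # placeholder dimensions
W1, b1 = rng.normal(size=(n_hidden, n_features)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_relations, n_hidden)), np.zeros(n_relations)

def forward(x):
    h = np.tanh(W1 @ x + b1)                     # hidden layer: hyperbolic tangent
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))  # output layer: logistic sigmoid

x = rng.normal(size=n_features)                  # one sentence's feature vector
print(int(forward(x).argmax()))                  # index of the highest-scoring relation
```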

  24. Relation extraction • Results in terms of classification accuracy (with and without irrelevant sentences) • 2 cases: • Roles hidden • Roles given • Graphical models • NN: simple classification problem

  25. Relation classification: Results • Neural Net always best

  26. Relation classification: Results • With no smoothing, D1 is the best graphical model

  27. Relation classification: Results • With smoothing and no roles, D2 is the best GM

  28. Relation classification: Results • With smoothing and roles, D1 is the best GM

  29. Relation classification: Results • Dynamic models always outperform static ones

  30. Relation classification: Confusion Matrix • Computed for the model D2, “rel. + irrel.”, “only features”

  31. Role extraction • Results in terms of F-measure • Graphical models • Junction tree algorithm (BNT) • Relation hidden and marginalized over • NN: couldn’t run it (feature vectors too large) • (Graphical models can do role extraction and relationship classification simultaneously)
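
A hedged sketch of "relation hidden and marginalized over": score each candidate role sequence by summing the joint over all relations. The junction-tree algorithm in BNT computes this exactly and efficiently; the brute-force enumeration below only shows the quantity involved (`joint` is any function returning P(rel, roles, feats), e.g. the exponentiated D1 log-joint sketch earlier):

```python
from itertools import product

def best_roles(feats, relations, role_set, joint):
    """Pick the role sequence maximizing the sum over rel of P(rel, roles, feats).
    Brute force over |role_set|**len(feats) sequences; illustration only."""
    candidates = product(role_set, repeat=len(feats))
    return max(candidates,
               key=lambda roles: sum(joint(rel, roles, feats) for rel in relations))
```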

  32. Role Extraction: Results (F-measures) • D1 best when no smoothing

  33. Role Extraction: Results (F-measures) • D2 best with smoothing, but smoothing doesn’t boost scores as much as in relation classification

  34. Features impact: Role Extraction • Most important features: 1) Word, 2) MeSH • F-measures (rel. + irrel.), D1 / D2: • All features: 0.67 / 0.71 • No word: 0.58 / 0.61 (-13.4% / -14.1%) • No MeSH: 0.63 / 0.65 (-5.9% / -8.4%)

  35. Features impact: Relation classification • Most important features: roles • Accuracy (rel. + irrel.), D1 / D2 / NN: • All feat. + roles: 91.6 / 82.0 / 96.9 • All feat. - roles: 68.9 / 74.9 / 79.6 (-24.7% / -8.7% / -17.8%) • All feat. + roles - Word: 91.6 / 79.8 / 96.4 (0% / -2.8% / -0.5%) • All feat. + roles - MeSH: 91.6 / 84.6 / 97.3 (0% / +3.1% / +0.4%)

  36. Features impact: Relation classification • Most realistic case: roles not known • Most important features: 1) MeSH, 2) Word for D1 and NN (but vice versa for D2) • Accuracy (rel. + irrel.), D1 / D2 / NN: • All feat. - roles: 68.9 / 74.9 / 79.6 • All feat. - roles - Word: 66.7 / 66.1 / 76.2 (-3.3% / -11.8% / -4.3%) • All feat. - roles - MeSH: 62.7 / 72.5 / 74.1 (-9.1% / -3.2% / -6.9%)

  37. Conclusions • Classification of subtle semantic relations in bioscience text • Discriminative model (neural network) achieves high classification accuracy • Graphical models for the simultaneous extraction of entities and relationships • Importance of lexical hierarchy • Future work: • A new collection of disease/treatment data • Different entities/relations • Unsupervised learning to discover relation types

  38. Thank you! Barbara Rosario Marti Hearst SIMS, UC Berkeley http://biotext.berkeley.edu

  39. Additional slides

  40. Smoothing: absolute discounting • Lower the probability of seen events by subtracting a constant from their count (vs. the plain maximum-likelihood estimate; see the reconstruction below) • The remaining probability mass is divided evenly among the unseen events
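
The slide's formula image did not survive the transcript; the standard absolute-discounting estimate it describes can be reconstructed as follows:

```latex
% Reconstruction of the standard absolute-discounting estimate (the slide's
% own formula image is missing from the transcript).
\[
P_{\mathrm{ML}}(x) = \frac{c(x)}{N},
\qquad
P_{\mathrm{abs}}(x) =
\begin{cases}
\dfrac{c(x) - \delta}{N} & \text{if } c(x) > 0,\\[1ex]
\dfrac{\delta\, n_{+}}{N\, n_{0}} & \text{if } c(x) = 0,
\end{cases}
\]
% c(x): count of event x; N: total count; 0 < \delta < 1: the discount;
% n_+: number of distinct seen events; n_0: number of unseen events.
```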

  41. F-measures for role extraction as a function of the smoothing factor

  42. Relation accuracies as a function of the smoothing factor

  43. Role Extraction: Results • Static models better than dynamic ones in this setting • Note: no Neural Networks

  44. Features impact: Role Extraction • Most important features: 1) Word, 2) MeSH • F-measures (rel. + irrel.), D1 / D2: • All features: 0.67 / 0.71 • No word: 0.58 / 0.61 (-13.4% / -14.1%, average -13.7%) • No MeSH: 0.63 / 0.65 (-5.9% / -8.4%, average -7.2%)

  45. Features impact: Role extraction • Most important features: 1) Word, 2) MeSH • F-measures (only rel.), D1 / D2: • All features: 0.72 / 0.73 • No word: 0.65 / 0.66 (-9.7% / -9.6%, average -9.6%) • No MeSH: 0.69 / 0.69 (-4.2% / -5.5%, average -4.8%)

  46. Features impact: Role extraction • Most important features: 1) Word, 2) MeSH • F-measures (only rel.), D1 / D2: • All features: 0.72 / 0.73 • No word: 0.65 / 0.66 (-9.7% / -9.6%) • No MeSH: 0.69 / 0.69 (-4.2% / -5.5%)

  47. Features impact: Relation classification • Most important features: roles • Accuracy (rel. + irrel.), D1 / D2 / NN: • All feat. + roles: 91.6 / 82.0 / 96.9 • All feat. - roles: 68.9 / 74.9 / 79.6 (-24.7% / -8.7% / -17.8%, average -17.1%) • All feat. + roles - Word: 91.6 / 79.8 / 96.4 (0% / -2.8% / -0.5%, average -1.1%) • All feat. + roles - MeSH: 91.6 / 84.6 / 97.3 (0% / +3.1% / +0.4%, average +1.1%)

  48. Features impact: Relation classification • Most realistic case: roles not known • Most important features: 1) MeSH, 2) Word for D1 and NN (but vice versa for D2) • Accuracy (rel. + irrel.), D1 / D2 / NN: • All feat. - roles: 68.9 / 74.9 / 79.6 • All feat. - roles - Word: 66.7 / 66.1 / 76.2 (-3.3% / -11.8% / -4.3%, average -6.4%) • All feat. - roles - MeSH: 62.7 / 72.5 / 74.1 (-9.1% / -3.2% / -6.9%, average -6.4%)
