
Hierarchical Text Categorization and its Application to Bioinformatics


Presentation Transcript


  1. Hierarchical Text Categorization and its Application to Bioinformatics Stan Matwin and Svetlana Kiritchenko joint work with Fazel Famili (NRC), and Richard Nock (Université Antilles-Guyane) School of Information Technology and Engineering University of Ottawa

  2. Outline • What is hierarchical text categorization (HTC) • Functional gene annotation requires HTC • Ensemble-based learning and AdaBoost • Multi-class multi-label AdaBoost • Generalized local hierarchical learning method • New global hierarchical learning algorithm • New hierarchical evaluation measure • Application to Bioinformatics

  3. Text categorization • Given: d_j ∈ D, the textual documents; C = {c1, …, c_|C|}, the predefined categories • Task: a decision function D × C → {True, False}, i.e. for each pair ⟨d_j, c_i⟩ decide whether document d_j belongs to category c_i [Diagram: a flat set of categories c1–c7 with no relations between them]

  4. Hierarchical text categorization • Hierarchy of categories: ≤ ⊆ C × C, a reflexive, anti-symmetric, transitive binary relation on C [Diagram: the categories c1–c7 arranged in a tree rooted at c1]
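For concreteness, here is a minimal Python sketch of such a hierarchy as a parent map, with the ≤ relation as ancestor reachability. The seven-node tree (c1 at the root; c4, c5 under c2; c6, c7 under c3) is inferred from the slides' diagrams and the ancestor sets on slide 41; the names are illustrative.

```python
# Minimal sketch: the category hierarchy as a parent map.
# Tree inferred from the slides: c1 is the root; c2, c3 its children;
# c4, c5 under c2; c6, c7 under c3.
PARENT = {"c2": "c1", "c3": "c1",
          "c4": "c2", "c5": "c2",
          "c6": "c3", "c7": "c3"}

def ancestors(c, parent=PARENT):
    """All proper ancestors of category c, nearest first."""
    out = []
    while c in parent:
        c = parent[c]
        out.append(c)
    return out

def le(a, b, parent=PARENT):
    """The relation a <= b: b is a (non-strict) ancestor of a.
    Reflexive, anti-symmetric, and transitive on a tree."""
    return a == b or b in ancestors(a, parent)

print(ancestors("c4"))  # ['c2', 'c1']
print(le("c4", "c1"))   # True
```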

  5. Advantages of HTC • Additional, potentially valuable information • Relationships between categories • Flexibility • High levels: general topics • Low levels: more detail

  6. Outline • What is hierarchical text categorization (HTC) • Functional gene annotation requires HTC • Ensemble-based learning and AdaBoost • Multi-class multi-label AdaBoost • Generalized local hierarchical learning method • New global hierarchical learning algorithm • New hierarchical evaluation measure • Application to Bioinformatics

  7. Text classification and bioinformatics • Clustering and classification of gene expression data • DNA chip time series – performance data • Gene function, process, … – genetic knowledge (GO) • Literature connects the two – domain knowledge • Validation of results obtained from performance data

  8. Example: Gene Ontology

  9. From data to knowledge via literature • Functional annotation of genes from biomedical literature

  10. Other applications • Web directories • Digital libraries • Patent databases • Biological ontologies • Email folders

  11. Outline • What is hierarchical text categorization (HTC) • Functional gene annotation requires HTC • Ensemble-based learning and AdaBoost • Multi-class multi-label AdaBoost • Generalized local hierarchical learning method • New global hierarchical learning algorithm • New hierarchical evaluation measure • Application to Bioinformatics

  12. Boosting • not a learning technique on its own, but a method in which a family of “weakly” learning agents (simple learners) is used for learning • based on the fact that multiple classifiers that disagree with one another can together be more accurate than any of the component classifiers • if there are L classifiers, each with an error rate < 1/2, and the errors are independent, then the probability that the majority vote is wrong is the tail of the binomial distribution beyond L/2 errors (see the sketch below)
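To make the binomial argument concrete, a minimal sketch (L = 11 and p = 0.30 are illustrative; for odd L a tie is impossible):

```python
from math import comb

def majority_error(L, p):
    """P(majority vote is wrong) for L independent classifiers with
    individual error rate p: the binomial tail beyond L/2 errors."""
    return sum(comb(L, k) * p**k * (1 - p)**(L - k)
               for k in range(L // 2 + 1, L + 1))

# 11 classifiers, each wrong 30% of the time:
print(f"{majority_error(11, 0.30):.4f}")  # ~0.0782, far below 0.30
```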

  13. Why do we have committees (ensembles)?

  14. Boosting – the very idea • Train an ensemble of classifiers, sequentially • Each next classifier focuses more on the training instances on which the previous one has made a mistake • The “focusing” is done through the weighting of the training instances • To classify a new instance, make the ensemble vote
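A minimal sketch of that reweighting loop for the plain two-class, single-label case (decision stumps serve as the weak learners; all names are illustrative, and this is standard binary AdaBoost, not the multi-label variant discussed next):

```python
import numpy as np

def adaboost(X, y, T=10):
    """Minimal binary AdaBoost sketch. X: (m, n) numpy array,
    y: numpy array with entries in {-1, +1}."""
    m = len(y)
    w = np.full(m, 1.0 / m)              # instance weights
    ensemble = []                        # (alpha, feature, thresh, sign)
    for _ in range(T):
        best = None
        for j in range(X.shape[1]):      # exhaustive stump search
            for thr in np.unique(X[:, j]):
                for s in (1, -1):
                    pred = s * np.where(X[:, j] > thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, s, pred)
        err, j, thr, s, pred = best
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
        w *= np.exp(-alpha * y * pred)   # up-weight the mistakes
        w /= w.sum()
        ensemble.append((alpha, j, thr, s))
    return ensemble

def predict(ensemble, x):
    """Weighted majority vote of the ensemble on one instance x."""
    vote = sum(a * s * (1 if x[j] > thr else -1)
               for a, j, thr, s in ensemble)
    return 1 if vote >= 0 else -1
```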

  15. Boosting - properties • If each h_t is only slightly better than chance, boosting can drive the training error arbitrarily low!! • No need for new examples, additional knowledge, etc. • The original AdaBoost works on single-labeled data

  16. Outline • What is hierarchical text categorization (HTC) • Functional gene annotation requires HTC • Ensemble-based learning and AdaBoost • Multi-class multi-label AdaBoost • Generalized local hierarchical learning method • New global hierarchical learning algorithm • New hierarchical evaluation measure • Application to Bioinformatics

  17. AdaBoost.MH [Schapire and Singer, 1999] • Reduce each example (d_i, C_i) to k binary examples ((d_i, l), C_i[l]) for every l ∈ C, where C_i[l] = +1 if l ∈ C_i and −1 otherwise • Initialize distribution P_1(i, l) = 1/(mk) • For t = 1, …, T: • Train weak learner using distribution P_t • Get weak hypothesis h_t: D × C → ℝ • Update: P_{t+1}(i, l) = P_t(i, l) · exp(−C_i[l] · h_t(d_i, l)) / Z_t, where Z_t normalizes P_{t+1} to a distribution • The final hypothesis: f(d, l) = Σ_t h_t(d, l)
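A sketch of the reduction and the update above (the `weak_learner` callable and the data layout are assumptions; as in [Schapire and Singer, 1999], the confidence α_t is folded into the real-valued h_t):

```python
import numpy as np

def expand(docs, labelsets, classes):
    """The reduction: (d_i, C_i) -> k binary examples ((d_i, l), C_i[l])."""
    return [((i, l), 1 if l in labelsets[i] else -1)
            for i in range(len(docs)) for l in classes]

def boost_mh(docs, labelsets, classes, weak_learner, T=100):
    """AdaBoost.MH loop over a distribution on (document, label) pairs.
    `weak_learner` (assumed) returns a confidence-rated h: (d, l) -> R."""
    m, k = len(docs), len(classes)
    P = np.full((m, k), 1.0 / (m * k))          # P_1(i, l) = 1/(mk)
    hs = []
    for _ in range(T):
        h = weak_learner(docs, labelsets, classes, P)
        for i in range(m):                      # P_{t+1}(i, l) =
            for j, l in enumerate(classes):     #   P_t exp(-C_i[l] h) / Z_t
                sign = 1 if l in labelsets[i] else -1
                P[i, j] *= np.exp(-sign * h(docs[i], l))
        P /= P.sum()                            # the Z_t normalization
        hs.append(h)
    return lambda d, l: sum(h(d, l) for h in hs)  # f(d, l) = sum_t h_t(d, l)
```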

  18. BoosTexter [Schapire and Singer, 2000] • “Weak” learner: a decision stump that tests a single word w – one real-valued prediction per class when w occurs in the document, another when it doesn’t
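A sketch of such a stump with confidence-rated predictions, c = ½·ln(W₊/W₋) per class and branch, following the BoosTexter recipe (the data layout matches the previous sketch and is an assumption; documents are treated as word sets):

```python
import numpy as np

def stump_for_word(w, docs, labelsets, classes, P, eps=1e-8):
    """Decision stump on word w: for each branch (w absent / present)
    and each class, output 0.5 * ln(W+ / W-) computed from the current
    boosting weights P over the (document, class) pairs in that branch."""
    c = np.zeros((2, len(classes)))            # rows: absent, present
    for j, branch in enumerate((False, True)):
        for l_idx, l in enumerate(classes):
            Wp = Wm = eps                      # smoothing
            for i in range(len(docs)):
                if (w in docs[i]) == branch:
                    if l in labelsets[i]:
                        Wp += P[i, l_idx]
                    else:
                        Wm += P[i, l_idx]
            c[j, l_idx] = 0.5 * np.log(Wp / Wm)
    # The stump's hypothesis: h(d, l) = c[w in d, l]
    return lambda d, l: c[int(w in d), classes.index(l)]
```

Wrapped as `lambda docs, labelsets, classes, P: stump_for_word(w, docs, labelsets, classes, P)`, this plugs directly into the `boost_mh` sketch above.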

  19. Thresholds for AdaBoost • AdaBoost often underestimates its confidences • 3 approaches to selecting better thresholds • single threshold for all classes • individual thresholds for each class • separate thresholds for each subtree rooted in the children of a top node (for tree-hierarchies only)
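A sketch of the simplest of the three variants, a single shared threshold chosen on validation data; the micro-F1 criterion is an illustrative choice, and the per-class variant simply runs the same search within each class:

```python
def micro_f1(pairs, thr):
    """Micro-averaged F1 of thresholding confidence scores at thr.
    pairs: (score, is_relevant) over all (document, class) pairs."""
    tp = sum(1 for s, y in pairs if s >= thr and y)
    fp = sum(1 for s, y in pairs if s >= thr and not y)
    fn = sum(1 for s, y in pairs if s < thr and y)
    return 2 * tp / max(2 * tp + fp + fn, 1)

def best_threshold(pairs):
    """Single shared threshold (approach 1 on the slide), chosen on
    validation pairs to maximize micro-F1."""
    candidates = sorted({s for s, _ in pairs})
    return max(candidates, key=lambda t: micro_f1(pairs, t))
```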

  20. Thresholds for AdaBoost

  21. Outline • What is hierarchical text categorization (HTC) • Functional gene annotation requires HTC • Ensemble-based learning and AdaBoost • Multi-class multi-label AdaBoost • Generalized local hierarchical learning method • New global hierarchical learning algorithm • New hierarchical evaluation measure • Application to Bioinformatics

  22. Hierarchical consistency • If (d_j, c_i) is True, then (d_j, Ancestor(c_i)) must also be True [Diagram: two labeled hierarchies over c1–c7 – a consistent one, where every labeled node's ancestors are labeled, and an inconsistent one]
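In code, the consistency requirement is a one-line check over a parent map (tree assumed, root taken as implicitly assigned):

```python
PARENT = {"c2": "c1", "c3": "c1", "c4": "c2",
          "c5": "c2", "c6": "c3", "c7": "c3"}   # root: c1

def is_consistent(labels, parent=PARENT, root="c1"):
    """True iff every labeled category's parent is also labeled."""
    return all(parent[c] == root or parent[c] in labels
               for c in labels if c in parent)

print(is_consistent({"c2", "c4"}))  # True  (c4's parent c2 is present)
print(is_consistent({"c4"}))        # False (c2 is missing)
```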

  23.–27. Hierarchical local approach [Animated diagram, slides 23–27: a local classifier sits at each internal node of the hierarchy c1–c9; the document is passed top-down from the root, each classifier forwarding it only to the subtrees it selects, ending in a consistent classification]

  28. Generalized hierarchical local approach • stop classification at an intermediate level if none of the child categories seems relevant • a category node can be assigned only after all its parent nodes have been assigned (a sketch follows) [Diagram: hierarchy c1–c9]
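A sketch of this generalized top-down scheme (`score` is an assumed per-node classifier confidence; the tree and the threshold are illustrative):

```python
CHILDREN = {"c1": ["c2", "c3"], "c2": ["c4", "c5"],
            "c3": ["c6", "c7"]}                # example tree, root c1

def classify_top_down(doc, score, root="c1", thr=0.0):
    """Walk down from the root, keeping every child whose local
    classifier fires; stop at a node when none of its children
    seems relevant. Output is consistent by construction, since a
    node is assigned only after its parent."""
    assigned, frontier = set(), [root]
    while frontier:
        node = frontier.pop()
        relevant = [c for c in CHILDREN.get(node, [])
                    if score(doc, c) > thr]
        if relevant:
            assigned.update(relevant)
            frontier.extend(relevant)
        # else: classification stops at `node` (an intermediate level)
    return assigned
```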

  29. Outline • What is hierarchical text categorization (HTC) • Functional gene annotation requires HTC • Ensemble-based learning and AdaBoost • Multi-class multi-label AdaBoost • Generalized local hierarchical learning method • New global hierarchical learning algorithm • New hierarchical evaluation measure • Application to Bioinformatics

  30. New global hierarchical approach • Make the dataset consistent with the class hierarchy • add ancestor category labels • Apply a regular learning algorithm • AdaBoost • Make the prediction results consistent with the class hierarchy • for inconsistent labelings, make a consistent decision based on the confidences of all ancestor classes (see the sketch below)
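A sketch of the first and last steps (the combination rule in the last step, an averaged confidence chain, is an illustrative assumption rather than necessarily the paper's rule):

```python
def expand_labels(labelsets, ancestors, root="c1"):
    """Step 1: make training data hierarchy-consistent by adding
    every ancestor (except the root) of each assigned label."""
    return [set(L) | {a for c in L for a in ancestors(c) if a != root}
            for L in labelsets]

def make_consistent(conf, classes, ancestors, root="c1", thr=0.0):
    """Step 3 sketch: resolve raw per-class confidences `conf` into a
    hierarchy-consistent label set. Assumed rule: keep a category iff
    the mean confidence along its ancestor chain clears `thr` and all
    its ancestors were themselves kept (consistent by construction)."""
    def chain(c):
        return [c] + [a for a in ancestors(c) if a != root]
    keep = set()
    for c in sorted(classes, key=lambda c: len(chain(c))):  # top-down
        ok = sum(conf[a] for a in chain(c)) / len(chain(c)) > thr
        if ok and all(a in keep for a in chain(c)[1:]):
            keep.add(c)
    return keep
```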

  31. New global hierarchical approach • Hierarchical (shared) attributes: • sports: team, game, winner, etc. • hockey: NHL, Senators, goalkeeper, etc. • football: Super Bowl, Patriots, touchdown, etc.

  32. Outline • What is hierarchical text categorization (HTC) • Functional gene annotation requires HTC • Ensemble-based learning and AdaBoost • Multi-class multi-label AdaBoost • Generalized local hierarchical learning method • New global hierarchical learning algorithm • New hierarchical evaluation measure • Application to Bioinformatics

  33. Evaluation in TC [Diagram: hierarchy c1–c7 with the correct category and an incorrect predicted category highlighted]

  34. Weaknesses of standard measures [Diagram: three hypotheses H1, H2, H3 over the hierarchy c1–c7, each predicting a different node] • Standard measures cannot tell them apart: P(H1) = P(H2) = P(H3), R(H1) = R(H2) = R(H3), F(H1) = F(H2) = F(H3) • Ideally, M(H1) > M(H3) and M(H2) > M(H3)

  35. Requirements for a hierarchical measure 1. to give credit to partially correct classification [Diagram: hypotheses H1 and H2 over a hierarchy c1–c11; H1 is partially correct, H2 entirely wrong] M(H1) > M(H2)

  36. Requirements for a hierarchical measure 2. to punish distant errors more heavily: • give a higher evaluation for correctly classifying one level down compared to staying at the parent node [Diagram: H1 correctly descends one level; H2 stays at the parent] M(H1) > M(H2)

  37. Requirements for a hierarchical measure 2. to punish distant errors more heavily: • give a lower evaluation for incorrectly classifying one level down compared to staying at the parent node [Diagram: H1 stays at the parent; H2 incorrectly descends one level] M(H1) > M(H2)

  38. Requirements for a hierarchical measure 3. to punish errors at higher levels of a hierarchy more heavily [Diagram: H1 errs at a lower level; H2 errs near the top of the hierarchy] M(H1) > M(H2)

  39. Advantages of the new measure • Simple, straightforward to calculate • Based solely on a given hierarchy (no parameters to tune) • Satisfies all three requirements • Has high discriminating power • Allows trading off classification precision against classification depth

  40. Our new hierarchical measure • Each label set is expanded to the assigned category plus all its ancestors (excluding the root) [Diagram: hierarchy c1–c7 with the correct category and its ancestors highlighted, versus an incorrect category]

  41. Our new hierarchical measure [Diagram: hypotheses H1, H2, H3 over the hierarchy c1–c7] • correct: {c4} → {c2, c4} in every case • predicted: H1: {c2} → {c2}; H2: {c5} → {c2, c5}; H3: {c7} → {c3, c7}
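Working the example through, assuming the measure compares the ancestor-augmented sets by set-overlap precision and recall:

```python
def h_prf(predicted, true):
    """Hierarchical precision/recall/F1 over ancestor-augmented label
    sets (slide 40): both sets already include all ancestors of each
    assigned category, excluding the root."""
    overlap = len(predicted & true)
    hp = overlap / len(predicted) if predicted else 0.0
    hr = overlap / len(true) if true else 0.0
    hf = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    return hp, hr, hf

true = {"c2", "c4"}                        # {c4} augmented
for name, pred in [("H1", {"c2"}),         # {c2} augmented
                   ("H2", {"c2", "c5"}),   # {c5} augmented
                   ("H3", {"c3", "c7"})]:  # {c7} augmented
    print(name, h_prf(pred, true))
# H1: (1.0, 0.5, 0.667) > H2: (0.5, 0.5, 0.5) > H3: (0.0, 0.0, 0.0)
```

The ranking M(H1) > M(H2) > M(H3) matches the requirements of slides 35–38, where standard P, R, and F were identical for all three.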

  42. Measure consistency • Definition [Huang & Ling, 2005]: f, g – measures on a domain Ψ; R = {(a, b) | a, b ∈ Ψ, f(a) > f(b), g(a) > g(b)}; S = {(a, b) | a, b ∈ Ψ, f(a) > f(b), g(a) < g(b)}; f is statistically consistent with g if |R| > |S| • Experiment: • 100 randomly chosen hierarchies • The new hierarchical F-measure and standard accuracy were consistent on 85% of random classifiers (|R| > 5|S|)
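The definition translates directly into a pairwise count (a sketch; `f` and `g` are measure callables and `systems` a collection of classifier outcomes to compare, all names illustrative):

```python
def consistency(f, g, systems):
    """Count agreeing (R) vs. disagreeing (S) ordered pairs for
    measures f and g over all unordered pairs of systems."""
    R = S = 0
    for i, a in enumerate(systems):
        for b in systems[i + 1:]:
            df, dg = f(a) - f(b), g(a) - g(b)
            if df * dg > 0:
                R += 1          # same ordering
            elif df * dg < 0:
                S += 1          # opposite ordering
    return R, S                 # f is consistent with g if R > S
```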

  43. Measure discriminancy • Definition [Huang & Ling, 2005]: f, g – measures on a domain Ψ; P = {(a, b) | a, b ∈ Ψ, f(a) > f(b), g(a) = g(b)}; Q = {(a, b) | a, b ∈ Ψ, f(a) = f(b), g(a) > g(b)}; f is statistically more discriminating than g if |P| > |Q| • Examples: [Diagram: hypotheses H1, H2, H3 over the hierarchy c1–c7] for a single accuracy value, the hierarchical measure takes 3 different values

  44. Results: Hierarchical vs. Flat Synthetic data (hierarchical attributes)

  45. Results: Hierarchical vs. Flat Synthetic data (no hierarchical attributes)

  46. Results: Hierarchical vs. Flat Real data

  47. Results: Hierarchical vs. Local Synthetic data (hierarchical attributes)

  48. Results: Hierarchical vs. Local Synthetic data (no hierarchical attributes)

  49. Results: Hierarchical vs. Local Real data
