
Automating Keyphrase Extraction with Multi-Objective Genetic Algorithms (MOGA)

Jia-Long Wu, Alice M. Agogino. Berkeley Expert System Laboratory, U.C. Berkeley.


Presentation Transcript


  1. Automating Keyphrase Extraction with Multi-Objective Genetic Algorithms (MOGA) • Jia-Long Wu, Alice M. Agogino • Berkeley Expert System Laboratory, U.C. Berkeley

  2. Outline • Role of Keyphrases • Phrase Extraction Algorithms • Phrase Extraction with Multi-Objective Genetic Algorithm • Experiment and Results • Results Evaluation • Conclusion • Future Research

  3. Role of Keyphrases • Concept representations • Document indexing • Enhance document retrieval / Browsing • Query formulation assistance • Document surrogates

  4. Vision of Unified Language System [Diagram labels: Unified Language System for Engineering Design; Design Research Repository; Corporate Design Repository; Design Education Materials; Unified Subject Headings; Context Mapping Mechanism; Semantic Network]

  5. Keyphrase Extraction Algorithms • Heuristic, Syntactic, Machine Learning • Requires prior training • Heuristic cut-off thresholds in number of phrases • Focuses on single document • Redundancy when aggregated for the whole document collection

  6. Keyphrase Extraction with MOGA • Phrase extraction as an optimization problem • Candidate phrases generation • Optimize phrase selection with MOGA • Model & Genetic Operators [Figure: Phenotype & Genotype, where each candidate phrase maps to one bit of the chromosome ("3d scanning" = 1, "abstraction" = 0, "active control system" = 1); Crossover, where parent bit strings are recombined into offspring]
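The encoding on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the three phrases come from the slide's figure, but the parent bit strings, the helper names `decode` and `one_point_crossover`, and the choice of one-point crossover are assumptions (the slide only says "Crossover").

```python
import random

# Candidate phrases shown in the slide's figure.
candidates = ["3d scanning", "abstraction", "active control system"]

def decode(chromosome, phrases):
    """Genotype -> phenotype: keep the phrases whose bit is 1."""
    return [p for p, bit in zip(phrases, chromosome) if bit == 1]

def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two parents at a random cut point (assumed operator)."""
    cut = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b

rng = random.Random(0)  # fixed seed so the sketch is repeatable
selected = decode([1, 0, 1], candidates)

# Illustrative 5-bit parents; not taken from the original figure.
parent_a = [1, 0, 0, 1, 1]
parent_b = [0, 1, 1, 0, 1]
child_a, child_b = one_point_crossover(parent_a, parent_b, rng)
```

With one bit per candidate phrase, any subset of the candidates is a valid chromosome, so crossover never needs a repair step.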

  7. Keyphrase Extraction with MOGA • Optimize phrase selection with MOGA (cont.) • Model & Genetic Operators (cont.) • Evaluation fitness functions • Minimize clustering measure / dispersion (Bookstein ’98) • Minimize number of phrases • Non-Dominated Sorting Genetic Algorithm (NSGA-II) [Figure: Mutation; bit strings 1 0 0 1 0 and 1 1 0 1 0]
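The two ideas on this slide, bit-flip mutation and Pareto dominance over two minimized objectives, can be sketched as follows. This is an illustration under stated assumptions: the objective tuples are hypothetical (dispersion, phrase count) pairs, the dispersion measure itself is not implemented because the slide does not define it, and only the first non-dominated front is computed. Full NSGA-II additionally ranks the remaining fronts and uses crowding distance.

```python
import random

def bit_flip_mutation(chromosome, rate, rng):
    """Flip each bit independently with probability `rate` (assumed operator)."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]

def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and better on one.

    Both objectives are minimized, as on the slide.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    better = any(x < y for x, y in zip(a, b))
    return no_worse and better

def non_dominated(points):
    """Return the first Pareto front of a list of objective tuples."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (dispersion, number_of_phrases) values for five solutions.
points = [(0.9, 1), (0.5, 3), (0.6, 3), (0.3, 7), (0.4, 9)]
front = non_dominated(points)

rng = random.Random(1)
mutated = bit_flip_mutation([1, 0, 0, 1, 0], rate=0.2, rng=rng)
```

Here `(0.6, 3)` drops out because `(0.5, 3)` is better on dispersion at the same size, and `(0.4, 9)` drops out because `(0.3, 7)` is better on both objectives.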

  8. Experiment and Results • Data set: 34 papers from Design Theory and Methodology Conference ’01 • Candidate phrases: ~5000 noun phrases extracted • Genetic Algorithm Parameters • Population size: 100 • Converges at 5000 generations • 5 hours on Xeon 1.8GHz CPU
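A quick back-of-the-envelope calculation puts these figures in perspective. The inputs are the numbers on the slide; the derived values assume one fitness evaluation per individual per generation and treat "~5000" as exactly 5000, so they are rough estimates only.

```python
candidate_count = 5000   # approximate, from the slide ("~5000 noun phrases")
population_size = 100
generations = 5000
runtime_hours = 5

# Each phrase is either selected or not, so there are 2**n possible subsets.
search_space_digits = len(str(2 ** candidate_count))

# Assumption: one fitness evaluation per individual per generation.
evaluations = population_size * generations
runtime_seconds = runtime_hours * 3600
seconds_per_generation = runtime_seconds / generations
ms_per_evaluation = runtime_seconds * 1000 / evaluations

print(f"search space: a {search_space_digits}-digit number of subsets")
print(f"evaluations: {evaluations}")
print(f"about {seconds_per_generation} s per generation")
print(f"about {ms_per_evaluation} ms per evaluation")
```

Under these assumptions the run evaluates about half a million subsets out of a space far too large to enumerate, which is the usual motivation for a heuristic search such as a genetic algorithm.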

  9. Experiment and Results Pareto plot of Dispersion versus Number of Phrases

  10. Experiment and Results Histogram of the number of optimal solutions in which a keyphrase appears
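A histogram like the one on this slide could be computed from the Pareto-optimal chromosomes roughly as follows. The function names and the three example solutions are hypothetical; only the three phrases are taken from the deck.

```python
from collections import Counter

def phrase_frequencies(pareto_solutions, phrases):
    """Count, for each phrase, how many Pareto solutions select it."""
    counts = Counter()
    for chromosome in pareto_solutions:
        counts.update(p for p, bit in zip(phrases, chromosome) if bit)
    return counts

def histogram(counts):
    """Map 'number of solutions' to 'number of phrases with that count'."""
    return Counter(counts.values())

phrases = ["3d scanning", "abstraction", "active control system"]
# Hypothetical Pareto-optimal chromosomes, one bit per phrase.
solutions = [[1, 0, 1], [1, 0, 0], [1, 1, 0]]

freq = phrase_frequencies(solutions, phrases)
hist = histogram(freq)
```

A phrase that appears in most optimal solutions, like "3d scanning" in this toy example, is a natural candidate for a core keyphrase.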

  11. Evaluation

  12. Evaluation • 6 domain experts participated in the evaluation. • Core phrases vs. Non-core phrases. • Less than 10% are deemed irrelevant. • Significant deviation between evaluators.
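One simple way to quantify deviation between evaluators is mean pairwise agreement, sketched below. The slide does not say how deviation was measured, so this is not the authors' method, and the ratings and label names are made up for illustration.

```python
from itertools import combinations

def pairwise_agreement(ratings):
    """Mean fraction of phrases on which each pair of raters gives the same label."""
    pairs = list(combinations(ratings, 2))
    total = 0.0
    for a, b in pairs:
        total += sum(x == y for x, y in zip(a, b)) / len(a)
    return total / len(pairs)

# Hypothetical labels from three raters for the same four phrases.
ratings = [
    ["core", "core", "non-core", "irrelevant"],
    ["core", "non-core", "non-core", "irrelevant"],
    ["core", "core", "core", "non-core"],
]
agreement = pairwise_agreement(ratings)
```

A value near 1.0 means the raters mostly agree; chance-corrected statistics such as Cohen's or Fleiss' kappa are the more rigorous alternatives.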

  13. Conclusion • Keyphrase extraction can be successfully implemented as a multi-objective global optimization problem. • Reasonably good keyphrases can be extracted without prior training or domain knowledge. • Trade-off information between objectives such as number of phrases vs. average quality of phrases can be gained from Pareto solutions. • Preferences can be made based on the user needs and trade-off information.
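The last point, choosing among Pareto solutions according to user needs, can be illustrated with a phrase budget. The function `pick_by_budget` and the front values are hypothetical, and a budget is only one of many possible preference rules.

```python
def pick_by_budget(front, max_phrases):
    """Return the lowest-dispersion point using at most `max_phrases` phrases.

    `front` holds (dispersion, number_of_phrases) tuples; returns None
    when no point fits the budget.
    """
    feasible = [p for p in front if p[1] <= max_phrases]
    return min(feasible, key=lambda p: p[0]) if feasible else None

# Hypothetical Pareto front of (dispersion, number_of_phrases) pairs.
front = [(0.9, 1), (0.5, 3), (0.3, 7)]
choice = pick_by_budget(front, max_phrases=5)
```

With a budget of five phrases the rule returns `(0.5, 3)`; raising the budget to seven would return `(0.3, 7)` instead.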

  14. Future Research • Test on larger text collection. • Implement extracted keyphrases in IR system as browsing and query expansion tool and compare to full-text search IR system. • Evaluate with more raters and 1-5 scale. • Build domain thesauri with extracted keyphrases and semantic discovery algorithms (e.g. Latent Semantic Analysis).

  15. Metathesaurus in Digital Library

  16. Thank you! Comments? Questions? jialong@me.berkeley.edu aagogino@me.berkeley.edu

  17. Mode Analysis of Scaled Evaluation
