Genetic Learning for Information Retrieval

Genetic Learning for Information Retrieval • Andrew Trotman • Computer Science • 365 * 24 * 60 / 40 = 13,140

Genetic Learning • The Core Algorithm • Crossover, Mutation, Reproduction • Fitness proportionate selection • Genetic Algorithms • Chromosome is an array • Genetic Programming • Chromosome is an abstract syntax tree • [Figure: {A B C D E F} X {1 2 3 4 5 6}]
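The core algorithm named on this slide could be sketched roughly as follows for array chromosomes. This is a minimal illustration, not the implementation used in the talk: the function names, the population size, the mutation rate, the single-point crossover, and the choice to carry the best chromosome forward unchanged as the "reproduction" step are all assumptions.

```python
import random

def select(population, fitnesses, rng):
    """Fitness proportionate (roulette wheel) selection of one parent.

    Assumes every fitness value is non-negative.
    """
    pick = rng.uniform(0, sum(fitnesses))
    running = 0.0
    for chromosome, fitness in zip(population, fitnesses):
        running += fitness
        if running >= pick:
            return chromosome
    return population[-1]  # guard against floating point round-off

def crossover(a, b, rng):
    """Single-point crossover: swap the tails of two array chromosomes."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chromosome, rate, rng):
    """Replace each gene with a fresh random value with probability `rate`."""
    return [rng.random() if rng.random() < rate else g for g in chromosome]

def evolve(fitness, length=6, size=20, generations=25, rate=0.05, seed=1):
    """Run the loop and return the fittest chromosome seen (all defaults assumed)."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(length)] for _ in range(size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        fitnesses = [fitness(c) for c in population]
        next_gen = [best[:]]  # reproduction: copy the best chromosome unchanged
        while len(next_gen) < size:
            child_a, child_b = crossover(select(population, fitnesses, rng),
                                         select(population, fitnesses, rng), rng)
            next_gen += [mutate(child_a, rate, rng), mutate(child_b, rate, rng)]
        population = next_gen[:size]
        best = max(population, key=fitness)
    return best
```

Calling `evolve(sum)` uses the sum of the genes as a toy fitness function; in the setting of the talk the fitness would instead be a retrieval measure computed over training queries.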

Information Retrieval (Text) • Online Systems • Dialog, LexisNexis, etc. • Web Systems • Alta Vista, Excite, Google, etc. • Scientific Literature Systems • CiteSeer, PubMed, BioMedNet, etc. • Question: • How should scientific literature be ranked? • Less time searching / More time researching • Higher exposure for “good” work

How Google Works • PageRank • Document ranking from PageRank • A document’s PageRank is some factor (d) of the rank of incoming citations • A document’s influence is some factor of its rank and its outgoing citations • Characteristics of Scientific Literature • Citations unidirectional (backwards in time) • 12 month publication cycle • Scientific citation “cliques”
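The PageRank idea on this slide can be illustrated with the commonly published iterative formulation. The sketch below is an assumption-laden example rather than Google's implementation: the damping value `d=0.85`, the fixed iteration count, and the even redistribution of rank from documents with no outgoing citations are all choices made here for illustration.

```python
def pagerank(links, d=0.85, iterations=50):
    """links maps each document to the list of documents it cites."""
    docs = set(links) | {t for targets in links.values() for t in targets}
    n = len(docs)
    rank = {doc: 1.0 / n for doc in docs}
    for _ in range(iterations):
        # Every document starts each round with the (1 - d) baseline share.
        new_rank = {doc: (1.0 - d) / n for doc in docs}
        for doc in docs:
            targets = links.get(doc, [])
            if targets:
                # A document passes a d-scaled share of its rank to each citation.
                share = d * rank[doc] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # No outgoing citations: spread the rank evenly (assumed policy).
                for target in docs:
                    new_rank[target] += d * rank[doc] / n
        rank = new_rank
    return rank

# Hypothetical citation graph: papers "a" and "b" both cite paper "c".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": []})
```

In this toy graph "c" ends up with the highest rank because it is the only document with incoming citations.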

How IR works • Indexing • Build the dictionary • Construct the Postings (<d,f> pairs) • Searching • Look up terms in dictionary • Boolean resolution • Rank on density (probability, vector space, etc.) • Performance • Recall and precision • Example records: Record1: Of Otago; Record2: Otago University; Record3: Otago; Record4: Of • Example dictionary and postings: OF <1,1><4,1>; OTAGO <2,1><3,1>; UNIVERSITY <2,1>
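The indexing and Boolean search steps on this slide might look like the following sketch. The function names are hypothetical, and the four records are an assumption, chosen so that the result matches the postings listed on the slide.

```python
from collections import Counter, defaultdict

def build_index(records):
    """Return {term: [(doc_id, frequency), ...]} with postings in doc_id order."""
    index = defaultdict(list)
    for doc_id in sorted(records):
        counts = Counter(records[doc_id].upper().split())
        for term, freq in counts.items():
            index[term].append((doc_id, freq))
    return dict(index)

def boolean_and(index, terms):
    """Boolean resolution: documents that contain every query term."""
    doc_sets = [{d for d, _ in index.get(t.upper(), [])} for t in terms]
    return set.intersection(*doc_sets) if doc_sets else set()

# Assumed record contents (see the note above).
records = {1: "Of", 2: "Otago University", 3: "Otago", 4: "Of"}
index = build_index(records)
# index["OF"] is [(1, 1), (4, 1)], i.e. the <d,f> pairs <1,1><4,1>
matches = boolean_and(index, ["otago", "university"])
```

Ranking would then order the matching documents, which this sketch leaves out.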

Structured-IR • Sci-Lit documents have structure • Title, abstract, conclusions, etc. • <d,f> becomes <d,p,f> • Tag numbering: doc:1 docid:2 place:3 name:4 cntry:5 sport:6 rank:7 • Example documents:
<doc><docid>1</docid><place><name>University of Otago</name></place><cntry>New Zealand</cntry></doc>
<doc><docid>2</docid><cntry>New Zealand</cntry><sport>sailing</sport></doc>
<doc><docid>3</docid><place><name>University of Otago</name><rank>top</rank></place></doc>
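One way to produce <d,p,f> postings from the three example documents is sketched below. It uses the tag numbering shown on the slide; the rule that each term is attributed to the element that directly contains it, and the decision to skip the text of docid, are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Tag numbering as given on the slide.
TAG_IDS = {"doc": 1, "docid": 2, "place": 3, "name": 4,
           "cntry": 5, "sport": 6, "rank": 7}

DOCS = [
    "<doc><docid>1</docid><place><name>University of Otago</name></place>"
    "<cntry>New Zealand</cntry></doc>",
    "<doc><docid>2</docid><cntry>New Zealand</cntry><sport>sailing</sport></doc>",
    "<doc><docid>3</docid><place><name>University of Otago</name>"
    "<rank>top</rank></place></doc>",
]

def build_structured_index(xml_docs):
    """Return {term: [(doc_id, structure_id, frequency), ...]}."""
    index = defaultdict(list)
    for xml in xml_docs:
        root = ET.fromstring(xml)
        doc_id = int(root.findtext("docid"))
        for element in root.iter():
            # Skip the identifier itself and elements with no direct text.
            if element.tag == "docid" or not (element.text or "").strip():
                continue
            counts = Counter(element.text.upper().split())
            for term, freq in counts.items():
                index[term].append((doc_id, TAG_IDS[element.tag], freq))
    return dict(index)

index = build_structured_index(DOCS)
# index["OTAGO"] is [(1, 4, 1), (3, 4, 1)]: once in <name> of documents 1 and 3
```

Keeping the structure id in each posting is what later allows a ranking function to weight an occurrence by where it appears.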

Using Structure in Ranking • Documents have structure • Title, Abstract, Conclusions, etc. • Weight each structure on “importance” • Title higher than Abstract higher than … • How to choose the weights • Specified in the query (XIRQL) • Query feedback • Learn with a Genetic Algorithm • Adapt ranking model to use structure • Each tree node is a locus • Weights are genes
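To make the weighting idea concrete, the sketch below scores documents by a structure-weighted term frequency. This is a deliberately simple stand-in, not the probabilistic or vector space model adapted in the talk; the postings, the structure names, and the weight values are all hypothetical.

```python
from collections import defaultdict

def weighted_scores(postings, weights, query_terms):
    """Rank documents by the sum of weight[structure] * frequency.

    postings maps a term to a list of (doc_id, structure, frequency) triples.
    Structures without an explicit weight default to 1.0.
    """
    scores = defaultdict(float)
    for term in query_terms:
        for doc_id, structure, freq in postings.get(term, []):
            scores[doc_id] += weights.get(structure, 1.0) * freq
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical postings: the term occurs once in a title and twice in an abstract.
postings = {"OTAGO": [(1, "title", 1), (2, "abstract", 2)]}

flat = weighted_scores(postings, {}, ["OTAGO"])
# flat is [(2, 2.0), (1, 1.0)]
weighted = weighted_scores(postings, {"title": 3.0, "abstract": 1.0}, ["OTAGO"])
# weighted is [(1, 3.0), (2, 2.0)]
```

In a genetic algorithm the `weights` mapping would be the chromosome, with one gene per structure, and the fitness would be a retrieval measure obtained with those weights.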

Experiment Results • 50 training queries • 50 evaluation queries • 25 generations • Probabilistic IR: 75.5% queries improved, 6.7% increase in MAP (8.8% max) • Vector Space IR: 61% queries improved, 4.7% increase in MAP (5.4% max)
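The results are reported as changes in MAP (mean average precision). For readers unfamiliar with the measure, a minimal sketch of its usual definition follows; the function names and the toy rankings are illustrative and are not taken from the experiments.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits = 0
    total = 0.0
    for position, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs is a list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Two hypothetical queries.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["b", "a"], {"a"}),                 # AP = 1/2
]
score = mean_average_precision(runs)
```

A measure like this can serve directly as the fitness function when learning weights or ranking functions from training queries.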

Ranking Algorithms • Multitude exist • Probability, vector space, Boolean • Several published nomenclatures • Over 100,000 “published” algorithms • Purpose • Put relevant documents first • Sorting • Performance measures with precision • Sources • Some guy thought it up

Experiment Results • 50 training queries • 50 evaluation queries • 31 runs • Weekend time limit • Compared to Probabilistic: 67% queries improved, 15% increase in MAP

Function Comparison • Vector Space • Probability • Learned: wdq = Σ t∈q (((((((((U / sqrt(sqrt(nt))) / (mq / sqrt((((Lq / (sqrt(sqrt(Ld)) / sqrt((U / nc)))) * min(mq, N)) / sqrt(((((((Tmax / sqrt(U)) / sqrt((((log2(sqrt(nt)) / sqrt(nt)) / sqrt(Umax)) / (M / nc)))) / sqrt((U / nc))) - uq) / mq) / sqrt(nt))))))) / sqrt((log(Tmax) / nc))) / sqrt(nt)) / sqrt(nt)) / sqrt((Lq / sqrt(((sqrt((sqrt(sqrt(Ld)) / sqrt((min(mq, sqrt((((log(Tmax) / nc) / sqrt(Umax)) / (mq / sqrt(((N * min((sqrt(nc) / sqrt(U)), Ld)) / sqrt(N))))))) / sqrt(Ld))))) / sqrt((Tmax / nc))) / sqrt(nt)))))) / sqrt((min(mq, N) / nc))) / sqrt((log(Tmax) / nc))) / sqrt(nt))
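In genetic programming a function like the learned one is held as an abstract syntax tree. The sketch below shows one possible way to represent and evaluate such a tree, using nested tuples. The representation, the variable values, and the use of the natural logarithm for `log` are assumptions; only the operator names and the fragment U / sqrt(sqrt(nt)) come from the slide.

```python
import math

# Operators that appear in the learned function on the slide.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "min": min,
    "sqrt": math.sqrt,
    "log": math.log,    # assumed to be the natural logarithm
    "log2": math.log2,
}

def evaluate(node, variables):
    """Evaluate a tree of (operator, operand, ...) tuples, names, and numbers."""
    if isinstance(node, tuple):
        op, *args = node
        return OPS[op](*(evaluate(arg, variables) for arg in args))
    if isinstance(node, str):
        return variables[node]  # a named collection or query statistic
    return node                 # a numeric constant

# The innermost fragment of the learned function: U / sqrt(sqrt(nt)).
tree = ("/", "U", ("sqrt", ("sqrt", "nt")))
value = evaluate(tree, {"U": 8.0, "nt": 16.0})  # made-up values; gives 4.0
```

Crossover on this representation swaps subtrees between two parents, which is how expressions as deeply nested as the learned function can arise.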

Conclusions • Using document structure improved ranking • Structure weights can be learned with a GA • GP can be used to learn ranking functions
Speculation • Combining GA and GP to learn a structure ranking algorithm will outperform GA and GP alone

Questions?

Random Numbers • Are your results an artifact of your random number generator?
