Automatic Collection “Recruiter”

Presentation Transcript

  1. Automatic Collection “Recruiter” Shuang Song

  2. Project Goal • Given a collection, automatically suggest other items to add to the collection • Design a process to achieve the task • Apply different filtering algorithms • Evaluate the result

  3. The Process [slide diagram: Collection → (1) Query Terms → External Source → (2) Query Results → Filter (with Training Sets) → (3) New Items] • Tokenization and frequency counting • New items extraction • New items filtering and ranking
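Step 1 of the process above (tokenization and frequency counting, producing query terms for the external source) can be sketched as follows; this is an illustrative Python sketch, not the code from the talk, and the sample documents are invented:

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on runs of letters/digits (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def term_frequencies(docs):
    """Aggregate term frequencies over a collection of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts

# Toy collection; its most frequent terms become the query terms
# submitted to the external source (steps 1 -> 2 of the diagram).
collection = ["Collaborative filtering rates items for users.",
              "A filtering algorithm ranks query results."]
freqs = term_frequencies(collection)
query_terms = [term for term, _ in freqs.most_common(5)]
```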

  4. Filtering Algorithms • Latent Semantic Analysis (LSA) • Pre-processing, no stemming • SVD over term by document matrix • Pseudo-document representation of new items • Gzip Compression Algorithms
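The LSA steps listed above (SVD over the term-by-document matrix, then a pseudo-document representation of each new item) can be sketched with NumPy; the matrix values below are made up for illustration:

```python
import numpy as np

# Term-by-document matrix A (rows = terms, columns = documents),
# built from raw frequencies with no stemming, as on the slide.
A = np.array([[2., 0., 1.],
              [1., 1., 0.],
              [0., 2., 1.]])

# Singular value decomposition: A = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep the top-k dimensions of the LSA feature space.
k = 2
U_k, s_k = U[:, :k], s[:k]

# Pseudo-document representation of a new item: project its raw
# term vector q into the k-dim space via q @ U_k @ diag(1/s_k),
# the standard fold-in used with LSA.
q = np.array([1., 0., 1.])
q_hat = q @ U_k @ np.diag(1.0 / s_k)
```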

  5. Relevance Measure - LSA [slide diagram: collection signature vector V and pseudo-document vector V* compared in the LSA feature space]
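The slide does not state how V and V* are compared; a common choice (assumed here) is cosine similarity between the collection signature vector and the pseudo-document vector in the LSA feature space, with the signature taken as the centroid of the collection's document vectors:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Illustrative vectors in a 2-D LSA feature space.
doc_vecs = np.array([[0.9, 0.1],
                     [0.8, 0.3],
                     [0.7, 0.2]])
signature = doc_vecs.mean(axis=0)  # collection signature vector V (centroid)
pseudo = np.array([0.85, 0.15])    # pseudo-document vector V* of a new item
r_lsa = cosine(signature, pseudo)
```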

  6. Relevance Measure - gzip
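The slide gives no formula, so the exact gzip measure used in the talk is not shown; one common compression-based relevance score (assumed here) measures how many bytes gzip saves when the new item is compressed together with the collection, relative to compressing the item alone:

```python
import gzip

def C(s: str) -> int:
    """Compressed size of a string in bytes under gzip."""
    return len(gzip.compress(s.encode("utf-8")))

def r_gzip(collection_text: str, item_text: str) -> float:
    """Compression gain of compressing the item together with the
    collection, normalized by the item's own compressed size.
    Higher values mean more shared structure (repetition) with the
    collection. This is one common variant, not necessarily the
    talk's exact formula."""
    gain = C(collection_text) + C(item_text) - C(collection_text + item_text)
    return gain / C(item_text)
```

Because gzip only exploits literal repetition, an item that reuses the collection's phrasing scores much higher than an unrelated one.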

  7. First Experiment – Math Forum Collection • 19 courseware items in the collection • 10 items in the experiment set • First 5 from the Math Forum • The other 5 from other collections on www.smete.org

  8. First Experiment Result

  9. Second Experiment – Collaborative Filtering Collection • 12 papers in the collection • 11 items in the experiment set • First 10 from Citeseer • Query terms submitted: (information 284) (algorithm 250) (ratings 217) (filtering 159) (system 197) (query 149) (reputation 114) (reviewer 109) (collaborative 106) (recommendations 98) • The last one is the paper we read in class: “An Algorithm for Automated Rating of Reviewers”

  10. Second Experiment Result

  11. Second Experiment – User Study • 6 people from my research lab participated in this study • 3 of them with an IR background • 3 of them without an IR background • They were asked to rate the 11 items in the experiment set according to their degree of relevance to the given collection

  12. Second Experiment Result – Human Rating

  13. Second Experiment Result – Another View

  14. Second Experiment Result – Comparison of w/o SVD and w/o Weightings

  15. Second Experiment – Correlation with Human Rating
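The slide does not say which correlation coefficient was used; Pearson and Spearman (Pearson over ranks) are both common for comparing algorithm scores against human ratings. A sketch with invented scores, not the experiment's data:

```python
import numpy as np

# Hypothetical relevance scores for the 11 experiment items:
# algorithm output vs. mean human rating (illustrative values only).
algo  = np.array([0.8, 0.6, 0.7, 0.2, 0.1, 0.9, 0.4, 0.3, 0.5, 0.15, 0.85])
human = np.array([0.9, 0.5, 0.8, 0.3, 0.25, 0.85, 0.45, 0.2, 0.6, 0.1, 0.95])

# Pearson correlation of the raw scores.
pearson_r = np.corrcoef(algo, human)[0, 1]

def ranks(x):
    """Rank positions of each value (no tie handling; inputs are distinct)."""
    order = np.argsort(x)
    r = np.empty(len(x), dtype=float)
    r[order] = np.arange(len(x))
    return r

# Spearman correlation = Pearson correlation of the ranks.
spearman_r = np.corrcoef(ranks(algo), ranks(human))[0, 1]
```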

  16. Second Experiment – Precision and Recall (cutoff: R_LSA > 0.5 & R_gzip > 0.2)

  17. Second Experiment – Precision and Recall (cutoff: R_LSA > 0.4 & R_gzip > 0.17)
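Precision and recall at a score cutoff, as reported on slides 16–17, can be computed as below; the item names, scores, and relevance judgments are invented for illustration:

```python
def precision_recall(scores, relevant, cutoff):
    """scores: {item: relevance score}; relevant: set of truly relevant
    items. An item counts as retrieved if its score exceeds the cutoff."""
    retrieved = {item for item, s in scores.items() if s > cutoff}
    tp = len(retrieved & relevant)          # true positives
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

scores = {"a": 0.7, "b": 0.55, "c": 0.3, "d": 0.1}
relevant = {"a", "b", "d"}
p, r = precision_recall(scores, relevant, cutoff=0.5)
# Retrieved = {a, b}: precision = 2/2 = 1.0, recall = 2/3
```

Lowering the cutoff (as slide 17 does relative to slide 16) retrieves more items, which can only raise recall but may lower precision.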

  18. Comparison of the Two Filtering Algorithms • Gzip works well when the input documents are just abstracts, while LSA works for both abstracts and full text • LSA captures word association patterns and statistical importance; gzip scans for repetition only • LSA is more computationally demanding, while gzip is simple • Effectiveness

  19. To-Do List and Future Work • Obtain accurate and trustworthy evaluation from experts (collection owners?) • Extract full text and abstracts from Citeseer automatically