Comments from Pre-submission Presentation

Presentation Transcript


  1. Comments from Pre-submission Presentation • Q: Check why kNN scores so much lower (about 10%) than SVM on the Reuters and 20 Newsgroups corpora. • A: Refer to the following four references: [Joachims 98] [Debole 03 STM] [Dumais 98 Inductive] [Yang 99 Reexamination]

  2. [Joachims 98] [Debole 03] [Dumais 98] Results on the Reuters Corpus

  3. [Yang 99 Re-examination] Significance Test • Micro-level analysis (s-test) • SVM > kNN >> {LLSF, NNet} >> NB • Macro-level analysis • {SVM, kNN, LLSF} >> {NB, NNet} • Error-rate based comparison • {SVM, kNN} > LLSF > NNet >> NB

  4. Comments from Pre-submission Presentation • 2. Explain why BEP & F1 are used in Chap 7 • Add references

  5. Breakeven point (1) • BEP was first proposed by Lewis [1992]. He later pointed out that BEP is not a good effectiveness measure, because: • 1. there may be no parameter setting that yields the breakeven; in this case the final BEP value, obtained by interpolation, is artificial; • 2. having P = R is not necessarily desirable, and it is not clear that a system achieving a high BEP can be tuned to score high on other effectiveness measures.

  6. Breakeven point (2) • Yang [1999 Re-examination] also noted that when no parameter setting brings P and R close enough together, the interpolated breakeven may not be a reliable indicator of effectiveness.
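
Since both slides turn on how the interpolated BEP is obtained, here is a minimal sketch of the usual computation, assuming (precision, recall) pairs collected while sweeping a decision threshold; the function name and input format are illustrative, not from the thesis:

```python
def interpolated_bep(pr_points):
    """Interpolated breakeven point from a list of (precision, recall)
    pairs obtained by sweeping a threshold. Picks the point where
    |P - R| is smallest and averages the two values; when no setting
    yields P == R exactly, the result is the artificial, interpolated
    figure that Lewis and Yang caution against."""
    p, r = min(pr_points, key=lambda pr: abs(pr[0] - pr[1]))
    return (p + r) / 2
```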

  7. Comments from Pre-submission Presentation • 3. Adding more qualitative analysis would be better

  8. Analysis and Proposal: Empirical Observation • Comparison of the idf, rf and chi2 values of four features in two categories of the Reuters Corpus
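
For reference, a minimal sketch of how such values can be computed from a term's 2x2 contingency counts in one category. The (a, b, c, d) notation and the tf.rf-style form of rf are assumptions, not taken verbatim from the slide:

```python
import math

def term_weights(a, b, c, d):
    """idf, rf and chi^2 for one term in one category.
    Assumed contingency notation:
      a: positive-category documents containing the term
      b: positive-category documents not containing the term
      c: negative-category documents containing the term
      d: negative-category documents not containing the term
    """
    N = a + b + c + d
    idf = math.log(N / (a + c))         # document frequency df = a + c
    rf = math.log2(2 + a / max(1, c))   # relevance frequency (assumed tf.rf form)
    chi2 = (N * (a * d - b * c) ** 2
            / ((a + b) * (c + d) * (a + c) * (b + d)))
    return idf, rf, chi2
```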

  9. Comments from Pre-submission Presentation • 4. In Chap 7, remove the Joachims results; citing them as a quotation is fine

  10. Comments from Pre-submission Presentation • 5. Tone down “best” claims •  “to our knowledge (experience, understanding)” • Pay attention to this usage when giving presentations

  11. Introduction: Other Text Representation • Word senses (meanings) [Kehagias 2001] • the same word assumes different meanings in different contexts • Term clustering [Lewis 1992] • group words with a high degree of pairwise semantic relatedness • Semantic and syntactic representation [Scott & Matwin 1999] • relationships between words, i.e. phrases, synonyms and hypernyms

  12. Introduction: Other Text Representation • Latent Semantic Indexing [Deerwester 1990] • a feature reconstruction technique • Combination approach [Peng 2003] • combines two types of indexing terms, i.e. words and 3-grams • In general, high-level representations did not show good performance in most cases

  13. Literature Review: Knowledge-based Representation • Theme Topic Mixture Model – a graphical model [Keller 2004] • Using keywords from summarization [Li 2003]

  14. Literature Review: 2. How to weight a term (feature) • [Salton 1988] elaborated three considerations: • 1. term occurrences closely represent the content of a document • 2. other factors with discriminating power help pick out relevant documents from irrelevant ones • 3. the effect of document length should be considered

  15. Literature Review: 2. How to weight a term (feature) • 1. Term Frequency Factor • Binary representation (1 for present, 0 for absent) • Term frequency (tf): the number of times a term occurs in a document • log(tf): a log operation to scale down the effect of unfavourably high term frequencies • Inverse term frequency (ITF)
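
A minimal sketch of these term-frequency factors for a single document; ITF definitions vary across the literature, so the form below is only one assumed variant:

```python
import math

def tf_factors(tf):
    """Term-frequency factor variants for one term in one document;
    tf is the raw occurrence count."""
    binary = 1 if tf > 0 else 0     # present/absent representation
    log_tf = math.log(1 + tf)       # damps unfavourably high counts
    itf = tf / (1 + tf)             # one assumed ITF form; definitions vary
    return binary, tf, log_tf, itf
```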

  16. Literature Review: 2. How to weight a term (feature) • 2. Collection Frequency Factor • idf: the most commonly used factor • Probabilistic idf, a.k.a. the term relevance weight • Feature selection metrics: chi^2, information gain, gain ratio, odds ratio, etc.
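
As a sketch, the two idf-style factors can be computed from the collection size N and a term's document frequency df; the probabilistic form below is the commonly cited log((N - df) / df) variant, assumed here:

```python
import math

def idf(N, df):
    """Standard inverse document frequency: N documents in the
    collection, df of which contain the term."""
    return math.log(N / df)

def prob_idf(N, df):
    """Probabilistic idf, a.k.a. the term relevance weight
    (assumed log((N - df) / df) form)."""
    return math.log((N - df) / df)
```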

  17. Literature Review: 2. How to weight a term (feature) • 3. Normalization Factor • Combine the above two factors by multiplication • To eliminate the length effect, cosine normalization is used to limit term weights to the range (0, 1)
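
A minimal sketch of this combination step, assuming hypothetical dicts doc_tf (term -> raw count) and idf (term -> idf value):

```python
import math

def tfidf_cosine(doc_tf, idf):
    """Multiply each term's tf by its idf, then cosine-normalize the
    document vector so every weight falls within (0, 1)."""
    w = {t: tf * idf[t] for t, tf in doc_tf.items()}
    norm = math.sqrt(sum(v * v for v in w.values()))
    return {t: v / norm for t, v in w.items()} if norm else w
```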
