Comments from Pre-submission Presentation

Presentation Transcript


  1. Comments from Pre-submission Presentation • Q: Check why kNN scores so much lower (about 10%) than SVM on the Reuters and 20 Newsgroups corpora. • A: Refer to the following four references: [Joachims 98] [Debole 03 STM] [Dumais 98 Inductive] [Yang 99 Reexamination]

  2. [Joachims 98] [Debole 03] [Dumais 98] Results on the Reuters Corpus

  3. [Yang 99 Re-examination] Significance Test • Micro-level analysis (s-test) • SVM > kNN >> {LLSF, NNet} >> NB • Macro-level analysis • {SVM, kNN, LLSF} >> {NB, NNet} • Error-rate based comparison • {SVM, kNN} > LLSF > NNet >> NB

  4. Comments from Pre-submission Presentation • 2. Explain why BEP & F1 are used in Chap 7 • Add references

  5. Breakeven point (1) • BEP was first proposed by Lewis [1992]. He later pointed out that BEP is not a good effectiveness measure, because: • 1. there may be no parameter setting that yields the breakeven; in this case the final BEP value, obtained by interpolation, is artificial; • 2. having P = R is not necessarily desirable, and it is not clear that a system achieving a high BEP can be tuned to score high on other effectiveness measures.

  6. Breakeven point (2) • Yang [1999 Re-examination] also noted that when no parameter setting brings P and R close enough together, the interpolated breakeven may not be a reliable indicator of effectiveness.
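
Since both slides turn on how the interpolated BEP is obtained, here is a minimal sketch of the usual computation, assuming (precision, recall) pairs collected while sweeping a decision threshold; the function name and input format are illustrative, not from the thesis:

```python
def interpolated_bep(pr_points):
    """Interpolated breakeven point from a list of (precision, recall)
    pairs obtained by sweeping a threshold. Picks the point where
    |P - R| is smallest and averages the two values; when no setting
    yields P == R exactly, the result is the artificial, interpolated
    figure that Lewis and Yang caution against."""
    p, r = min(pr_points, key=lambda pr: abs(pr[0] - pr[1]))
    return (p + r) / 2
```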

  7. Comments from Pre-submission Presentation • 3. Adding more qualitative analysis would be better

  8. Analysis and Proposal: Empirical Observation • Comparison of the idf, rf and chi2 values of four features in two categories of the Reuters Corpus
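
For reference, a minimal sketch of how such values can be computed from a term's 2x2 contingency counts in one category. The (a, b, c, d) notation and the tf.rf-style form of rf are assumptions, not taken verbatim from the slide:

```python
import math

def term_weights(a, b, c, d):
    """idf, rf and chi^2 for one term in one category.
    Assumed contingency notation:
      a: positive-category documents containing the term
      b: positive-category documents not containing the term
      c: negative-category documents containing the term
      d: negative-category documents not containing the term
    """
    N = a + b + c + d
    idf = math.log(N / (a + c))         # document frequency df = a + c
    rf = math.log2(2 + a / max(1, c))   # relevance frequency (assumed tf.rf form)
    chi2 = (N * (a * d - b * c) ** 2
            / ((a + b) * (c + d) * (a + c) * (b + d)))
    return idf, rf, chi2
```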

  9. Comments from Pre-submission Presentation • 4. In Chap 7, remove the Joachims results; citing them as a quotation is fine

  10. Comments from Pre-submission Presentation • 5. Tone down “best” claims •  “to our knowledge (experience, understanding)” • Pay attention to this usage when giving presentations

  11. Introduction: Other Text Representation • Word senses (meanings) [Kehagias 2001] • the same word assumes different meanings in different contexts • Term clustering [Lewis 1992] • group words with a high degree of pairwise semantic relatedness • Semantic and syntactic representation [Scott & Matwin 1999] • relationships between words, i.e. phrases, synonyms and hypernyms

  12. Introduction: Other Text Representation • Latent Semantic Indexing [Deerwester 1990] • a feature reconstruction technique • Combination approach [Peng 2003] • combines two types of indexing terms, i.e. words and 3-grams • In general, high-level representations did not show good performance in most cases

  13. Literature Review: Knowledge-based Representation • Theme Topic Mixture Model – a graphical model [Keller 2004] • Using keywords from summarization [Li 2003]

  14. Literature Review: 2. How to weight a term (feature) • [Salton 1988] elaborated three considerations: • 1. term occurrences closely represent the content of a document • 2. other factors with discriminating power help pick out relevant documents from irrelevant ones • 3. the effect of document length should be considered

  15. Literature Review: 2. How to weight a term (feature) • 1. Term Frequency Factor • Binary representation (1 for present, 0 for absent) • Term frequency (tf): the number of times a term occurs in a document • log(tf): a log operation to scale down the effect of unfavourably high term frequencies • Inverse term frequency (ITF)
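
A minimal sketch of these term-frequency factors for a single document; ITF definitions vary across the literature, so the form below is only one assumed variant:

```python
import math

def tf_factors(tf):
    """Term-frequency factor variants for one term in one document;
    tf is the raw occurrence count."""
    binary = 1 if tf > 0 else 0     # present/absent representation
    log_tf = math.log(1 + tf)       # damps unfavourably high counts
    itf = tf / (1 + tf)             # one assumed ITF form; definitions vary
    return binary, tf, log_tf, itf
```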

  16. Literature Review: 2. How to weight a term (feature) • 2. Collection Frequency Factor • idf: the most commonly used factor • Probabilistic idf, a.k.a. the term relevance weight • Feature selection metrics: chi^2, information gain, gain ratio, odds ratio, etc.
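
As a sketch, the two idf-style factors can be computed from the collection size N and a term's document frequency df; the probabilistic form below is the commonly cited log((N - df) / df) variant, assumed here:

```python
import math

def idf(N, df):
    """Standard inverse document frequency: N documents in the
    collection, df of which contain the term."""
    return math.log(N / df)

def prob_idf(N, df):
    """Probabilistic idf, a.k.a. the term relevance weight
    (assumed log((N - df) / df) form)."""
    return math.log((N - df) / df)
```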

  17. Literature Review: 2. How to weight a term (feature) • 3. Normalization Factor • Combine the above two factors by multiplication • To eliminate the length effect, cosine normalization is used to limit term weights to the range (0, 1)
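
A minimal sketch of this combination step, assuming hypothetical dicts doc_tf (term -> raw count) and idf (term -> idf value):

```python
import math

def tfidf_cosine(doc_tf, idf):
    """Multiply each term's tf by its idf, then cosine-normalize the
    document vector so every weight falls within (0, 1)."""
    w = {t: tf * idf[t] for t, tf in doc_tf.items()}
    norm = math.sqrt(sum(v * v for v in w.values()))
    return {t: v / norm for t, v in w.items()} if norm else w
```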
