This research investigates the use of domain knowledge to enhance text classification effectiveness, particularly in scenarios with limited training datasets. The authors present a methodology employing Bayesian logistic regression with Gaussian and Laplace priors to incorporate this knowledge. Experimental results demonstrate significant improvements in classification performance, especially when only small training sets are available. The findings suggest that leveraging informative prior distributions can lead to more robust classifiers without sacrificing effectiveness in larger datasets.
Constructing Informative Prior Distributions from Domain Knowledge in Text Classification
Graduate: Chen, Shao-Pei
Authors: Aynur Dayanik, David D. Lewis, David Madigan, Vladimir Menkov, Alexander Genkin
IGIR
Outline • Motivation • Objective • Methodology • Experimental Results • Conclusion
Motivation • In operational text classification settings, small training sets are the rule, due to the expense and inconvenience of labeling, or skepticism that the effort will be adequately repaid.
Objective • Show that using domain knowledge texts can greatly improve classifier effectiveness when few training examples are available, without hurting effectiveness when large training sets are used.
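The idea is to let domain-knowledge text influence the prior placed on each feature's coefficient before any labeled examples are seen. Below is a minimal sketch of one way this could be done: terms that appear prominently in a domain-knowledge document get a wider prior variance, so the classifier can assign them larger weights from few labeled examples. The function name, the TF-IDF weighting, and the specific mapping to prior variances are illustrative assumptions, not the authors' exact scheme.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer


def priors_from_domain_knowledge(domain_text, vocabulary,
                                 base_variance=0.01, boost=1.0):
    """Map a domain-knowledge text onto per-feature prior parameters.

    Features whose terms are prominent in the domain-knowledge text get a
    larger prior variance (one could also shift their prior mean), so the
    classifier is allowed to give them larger weights even with very few
    labeled examples. The exact mapping here is an illustrative assumption.
    """
    vectorizer = TfidfVectorizer(vocabulary=vocabulary)
    dk_weights = vectorizer.fit_transform([domain_text]).toarray().ravel()

    prior_mean = np.zeros(len(vocabulary))                # keep priors centered at zero
    prior_variance = base_variance + boost * dk_weights   # widen priors for domain terms
    return prior_mean, prior_variance
```

A topic description or a reference article about the target category could serve as `domain_text`; features absent from it simply keep the narrow default prior.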
Methodology • Bayesian Logistic Regression • Gaussian Priors • Laplace Priors
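For intuition, MAP estimation under a zero-mean Gaussian prior on each coefficient corresponds to L2-regularized logistic regression, and a Laplace prior corresponds to L1 regularization. The sketch below uses scikit-learn as a stand-in to show that correspondence; it is not the authors' software, and it cannot express the feature-specific prior means and variances that an informative prior would use. The regularization strength `C` is an arbitrary placeholder.

```python
from sklearn.linear_model import LogisticRegression

# Zero-mean Gaussian prior  <->  L2 penalty (ridge-like shrinkage of all weights).
gaussian_prior_model = LogisticRegression(penalty="l2", C=1.0)

# Laplace prior  <->  L1 penalty (drives many weights exactly to zero, giving
# a sparse classifier).
laplace_prior_model = LogisticRegression(penalty="l1", C=1.0, solver="liblinear")

# gaussian_prior_model.fit(X_train, y_train)   # X_train, y_train: labeled training data
# laplace_prior_model.fit(X_train, y_train)
```

With an informative prior, the prior mean and variance would instead vary per feature according to the domain knowledge, as sketched earlier.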
Experimental Results • 500 Random Examples • 5 Positive and 5 Random Examples • 5 Positive and 5 Closest Negative Examples
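The "5 positive and 5 closest negative examples" condition could be assembled as in the sketch below, which picks the negatives most similar to the sampled positives by cosine similarity to their centroid. The selection rule, function name, and variable names are assumptions for illustration; the original experiments may have used a different notion of "closest".

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def select_closest_negatives(X, y, n_pos=5, n_neg=5, seed=None):
    """Pick n_pos random positives and the n_neg negatives closest to them.

    'Closest' is taken here as cosine similarity to the centroid of the
    sampled positive examples (an illustrative assumption).
    """
    rng = np.random.default_rng(seed)
    pos_idx = rng.choice(np.flatnonzero(y == 1), size=n_pos, replace=False)
    neg_idx = np.flatnonzero(y == 0)

    # Rank negatives by similarity to the positive centroid and keep the top n_neg.
    centroid = np.asarray(X[pos_idx].mean(axis=0)).reshape(1, -1)
    sims = cosine_similarity(X[neg_idx], centroid).ravel()
    closest_neg_idx = neg_idx[np.argsort(sims)[::-1][:n_neg]]

    train_idx = np.concatenate([pos_idx, closest_neg_idx])
    return X[train_idx], y[train_idx]
```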
Conclusion We found large improvements in effectiveness, particularly when only small training sets are available, with no loss of effectiveness on large training sets.