This project explores the implementation of various classifiers, including MaxEnt, DecisionTree, and NaiveBayes, using the Mallet package for query classification. We examine the challenges posed by poor classification results and data sparseness, as evidenced by low named-entity coverage in Named Entity Recognition (NER). Pre-trained NLTK models were used for NER, highlighting the need for alternative NER tools. Our findings indicate possible improvements with the BalancedWinnow and MaxEnt classifiers, which achieved test accuracies of 0.804 and 0.78 respectively. Future work will focus on leveraging WordNet and exploring class-specific enhancements.
Q/A System First Stage: Classification
Project by: Abdullah Alotayq, Dong Wang, Ed Pham
Query Processing • Classification package: Mallet • Classifiers: MaxEnt, DecisionTree, C4.5, NaiveBayes, AdaBoost, Winnow, BalancedWinnow, Bagging, etc.
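Mallet's classifiers are usually driven from its command-line interface; a sketch of the typical workflow, assuming Mallet 2.0's `import-file` and `train-classifier` commands (the file names here are illustrative, not from the project):

```shell
# Import labeled queries (one per line: "name label text...") into Mallet's format.
bin/mallet import-file --input queries.txt --output queries.mallet

# Train and evaluate a MaxEnt classifier on a 90/10 train/test split.
bin/mallet train-classifier --input queries.mallet \
    --trainer MaxEnt --training-portion 0.9 \
    --output-classifier maxent.classifier
```

Swapping `--trainer MaxEnt` for `DecisionTree`, `NaiveBayes`, or `BalancedWinnow` selects the other classifiers listed above.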
Features • Semantic • Morphological • Neighboring (Syntactic)
Stemming • NLTK stemmer
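The slide uses the NLTK stemmer; as a minimal illustration of what suffix stripping does, here is a crude sketch (this is not the Porter algorithm NLTK implements, just the idea):

```python
def simple_stem(word):
    """Strip one common English suffix if the remaining stem
    keeps at least 3 characters (crude sketch, not Porter)."""
    for suffix in ("ation", "ing", "ed", "ly", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# simple_stem("walked") -> "walk"; simple_stem("questions") -> "question"
```

A real stemmer applies ordered rewrite rules with measure conditions, which is why the project relies on NLTK rather than code like this.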
N-grams • Bigrams and Trigrams • Poor classification results: 0.48 and 0.478 • Not a good strategy.
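The bigram and trigram features above can be extracted in a few lines of Python (a generic sketch; the project's actual feature-extraction code is not shown in the slides):

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a token sequence."""
    return list(zip(*(tokens[i:] for i in range(n))))

query = ["what", "is", "the", "capital", "of", "france"]
bigrams = ngrams(query, 2)   # ("what", "is"), ("is", "the"), ...
trigrams = ngrams(query, 3)  # ("what", "is", "the"), ...
```

With short queries, most bigrams and trigrams occur only once in training, which helps explain the weak 0.48 / 0.478 accuracies reported above.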
NER (Named Entity Recognition) • NLTK NER • Used a pre-trained model for this task • 6 types of named entities
Frequencies • Training data NE frequency table (figure not preserved)
No Named Entity detected • In training data: 3533 queries (64.8%) • In test data: 353 queries (70.6%) → data sparseness problem
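The percentages above are consistent with the standard TREC question-classification splits of 5452 training and 500 test questions (an assumption; the slide does not state the totals):

```python
# Assumed dataset sizes (they reproduce the percentages on the slide).
train_total, test_total = 5452, 500
train_no_ne, test_no_ne = 3533, 353

print(f"train: {train_no_ne / train_total:.1%} with no NE detected")  # 64.8%
print(f"test:  {test_no_ne / test_total:.1%} with no NE detected")    # 70.6%
```

When roughly two-thirds of queries yield no named entity at all, NE features are absent for most instances, which is exactly the sparseness problem noted above.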
NER Results & Future Work • Test data accuracy = 0.802 • We might try other NER tools, which would provide more NE types and cover a larger share of the training and test data.
Binary and Real Values • Testing for potential improvement. • Best-performing classifiers:
For binary features: • BalancedWinnow: test data accuracy = 0.804 • MaxEnt: mean test accuracy = 0.78
For real-valued features: • BalancedWinnow: test data accuracy = 0.784 • MaxEnt: test data accuracy = 0.758
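The binary-versus-real-value distinction amounts to whether a feature records mere presence or a count. A minimal sketch of the two encodings (illustrative function names, not the project's code):

```python
from collections import Counter

def real_valued_features(tokens):
    """Feature value = raw frequency of each token."""
    return dict(Counter(tokens))

def binary_features(tokens):
    """Feature value = 1 if the token occurs at all."""
    return {tok: 1 for tok in set(tokens)}

tokens = ["what", "city", "is", "the", "city"]
# real_valued_features(tokens)["city"] == 2
# binary_features(tokens)["city"] == 1
```

The slide's numbers suggest presence information alone carried most of the signal here, since the binary encoding scored slightly higher for both classifiers.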
Proposed future improvement • WordNetSenses • Class-Specific Related Words
Issues • Poor performance on some refinements. • Low accuracy scores: 0.42 and 0.54 • Memory-consuming classifiers. • Some classifiers produced error messages.
Successes • Made progress in building the system. • Gained hands-on experience with classifiers and NLP packages. • Learned ways to improve classification results.
Readings that helped • Employing Two Question Answering Systems in TREC-2005, Sanda Harabagiu et al.
Software packages used • Mallet • NLTK • Porter stemmer • Self-written code files • Stanford Parser, Berkeley Parser