This study presents the WLW technique, which estimates word location weights in Chinese Health Questions (CHQs) to improve classification accuracy. Location weights are utilized by the SVM classifier to enhance categorization into in-space categories like cause, diagnosis, and process. The WLW approach demonstrated superior performance in both classifying in-space CHQs and filtering out-space CHQs. Empirical evaluation revealed significant improvements in classification accuracy, highlighting the potential of WLW in aiding healthcare consumers seeking reliable health information online.
Improving Health Question Classification by Word Location Weights Rey-Long Liu Dept. of Medical Informatics Tzu Chi University Taiwan
Outline • Background • Problem definition • The proposed approach: WLW • Empirical evaluation • Conclusion
Classification of Health Questions • Why health questions? • Health questions provide both reliable and readable health information • Why classification of health questions? • Given a health question q, retrieve related questions (and their answers)
Goal & Motivation • Goal • Target: Chinese Health Questions (CHQs) • Contribution: Developing a technique WLW (Word Location Weight) that estimates the location weights of words in a CHQ based on their locations • Motivation • Location weights can be used by classifiers (e.g., SVM) to improve the classification • Classifying in-space CHQs (cause, diagnosis, process) • Filtering out-space CHQs (may be whatever)
Basic Idea • Those words that are more related to the category of a CHQ tend to appear at the beginning and end of the CHQ • Examples: 如何(how to)克服(deal with)緊張(nervous)的情緒(mood)? process 嬰兒(infant)體溫(body temperature)太低(too low)怎麼辦(how to do)? process
Related Work • Recognition of question types (e.g., when, where) • Weakness: Question types are not the intended categories of CHQs • Classification by parsing • Weakness I: Parsing Chinese is still challenging • Weakness II: CHQs are NOT always well-formed • Classification by pattern matching • Weakness: Difficult to construct the string patterns
Main Challenges (1) Defining the two weights of a location p in a CHQ q
Main Challenges (cont.) (2) Encoding the location weights of a word w into two features for the underlying classifier
Interesting Behaviors of WLW • A word w in a question q has two features • Fvalue_front and Fvalue_rear • Applicable to different categories and languages (e.g., English) • When w is far from both the front and the rear • Both features reduce to the term frequency (TF) of w • WLW reduces to the traditional feature-encoding approach (using TF as the features)
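The slides' own weight definitions did not survive in this transcript, so the following is only an illustrative sketch of the behavior described above, not the WLW formula itself. It assumes a hypothetical linear bonus that fades to zero within a `window` of positions from either end of the question; the function name `location_features` and the `window` parameter are invented for this example. It shows one way to give each word a front feature and a rear feature that both fall back to plain term frequency when the word is far from both ends.

```python
from collections import defaultdict


def location_features(words, window=3):
    """Return {word: (fvalue_front, fvalue_rear)} for one tokenized question.

    Hypothetical weighting (an assumption, not the WLW definition):
    every occurrence contributes 1 (its term-frequency count) plus a
    bonus that shrinks linearly with its distance from the front or
    the rear and is 0 once that distance reaches `window`.
    """
    n = len(words)
    front = defaultdict(float)
    rear = defaultdict(float)
    for p, w in enumerate(words):
        dist_front = p          # 0 for the first word
        dist_rear = n - 1 - p   # 0 for the last word
        front[w] += 1.0 + max(0.0, 1.0 - dist_front / window)
        rear[w] += 1.0 + max(0.0, 1.0 - dist_rear / window)
    return {w: (front[w], rear[w]) for w in front}


# English glosses of one of the example CHQs, used as stand-in tokens.
tokens = ["infant", "body_temperature", "too_low", "how_to_do"]
for word, (f_front, f_rear) in location_features(tokens).items():
    print(word, f_front, f_rear)
```

Under this assumed weighting, a word in the middle of a long question gets the pair (TF, TF), which matches the fallback to the traditional TF encoding noted on the slide.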
Experimental Design • CHQs were downloaded from a health information provider • 864 in-space CHQs • cause (category 1): 313 • diagnosis (category 2): 92 • process (category 3): 459 • 100 out-space CHQs • whatever (general description) • Five-fold cross validation
Underlying Classifiers • Underlying classifier • The Support Vector Machine (SVM) classifier
Results: Classification of In-Space CHQs • Evaluation criteria • Micro-averaged F1 (MicroF1) • Macro-averaged F1 (MacroF1)
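For readers unfamiliar with the two criteria, here is a minimal sketch of how MicroF1 and MacroF1 are commonly computed. It assumes each question receives at most one predicted category (with `None` meaning no category accepted it), which may differ from the exact setup used in the evaluation; the function names are chosen for this example only.

```python
def f1(tp, fp, fn):
    """F1 from true-positive, false-positive and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred, categories):
    """Return (MicroF1, MacroF1) for single-label predictions.

    gold, pred: equal-length lists of category labels; a prediction of
    None means the question was not assigned to any category.
    """
    counts = {c: [0, 0, 0] for c in categories}  # [tp, fp, fn] per category
    for g, p in zip(gold, pred):
        if g == p:
            counts[g][0] += 1
        else:
            counts[g][2] += 1          # the gold category missed this question
            if p in counts:
                counts[p][1] += 1      # the predicted category wrongly took it
    # MacroF1: average the per-category F1 scores (each category counts equally).
    macro = sum(f1(*counts[c]) for c in categories) / len(categories)
    # MicroF1: pool the counts over all categories, then compute one F1.
    tp, fp, fn = (sum(counts[c][i] for c in categories) for i in range(3))
    return f1(tp, fp, fn), macro


cats = ["cause", "diagnosis", "process"]
gold = ["cause", "cause", "diagnosis", "process"]
pred = ["cause", "process", "diagnosis", "process"]
micro, macro = micro_macro_f1(gold, pred, cats)
print(micro, macro)
```

Because the categories are unbalanced (diagnosis has far fewer questions than process), MacroF1 is the criterion that is more sensitive to performance on the small category.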
Results: Filtering of Out-Space CHQs • Evaluation criteria • Filtering ratio (FR) = # out-space CHQs successfully rejected by all categories / # out-space CHQs • Average number of misclassifications (AM) = # misclassifications for the out-space CHQs / # out-space CHQs
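The two filtering criteria can be sketched as follows. This assumes that every category makes its own accept/reject decision on each out-space question, and that each acceptance counts as one misclassification; that reading of the definitions, and the name `filtering_metrics`, are assumptions made for this example.

```python
def filtering_metrics(accepted):
    """Return (FR, AM) for a list of out-space questions.

    accepted: one set per out-space question, holding the categories
    that (wrongly) accepted it; an empty set means every category
    rejected the question.
    """
    total = len(accepted)
    if total == 0:
        raise ValueError("need at least one out-space question")
    rejected_by_all = sum(1 for cats in accepted if not cats)
    misclassifications = sum(len(cats) for cats in accepted)
    fr = rejected_by_all / total      # higher is better
    am = misclassifications / total   # lower is better
    return fr, am


# Four made-up out-space questions: two rejected by all categories,
# one accepted by a single category, one accepted by two.
decisions = [set(), {"cause"}, set(), {"cause", "process"}]
fr, am = filtering_metrics(decisions)
print(fr, am)
```

On this reading, FR rewards questions that every category rejects, while AM also penalizes a question more heavily when several categories accept it.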
Conclusion • Healthcare consumers often read health information on the Internet • Health questions are valuable resources for healthcare consumers • They provide both reliable and readable health information • Classification of health questions is the basis for retrieving related questions • cause, diagnosis, process, whatever • WLW can help SVM improve the classification of CHQs