AST2009. A Semantic Text Classification Based on DBpedia. Rujiang Bai, Junhua Liao, Shandong University of Technology Library, Zibo 255049, China, {brj, ljhbrj}@sdut.edu.cn
OUTLINE • 1. BACKGROUND • 2. DBpedia • 3. OUR PROPOSED METHODS • 4. EXPERIMENT • 5. CONCLUSION
1. BACKGROUND • “Bag of Words” (BOW) vs. “Bag of Concepts” (BOC) • Semantic Features Representation
2. DBpedia DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web.
3. OUR PROPOSED METHODS • Definition 1 (Core Ontology). A core ontology is a structure O := (C, <_C) consisting of a set C, whose elements are called concept identifiers, and a partial order <_C on C, called the concept hierarchy or taxonomy. • Definition 2 (Subconcepts and Superconcepts). If c1 <_C c2 for c1, c2 ∈ C, then c1 is a subconcept (specialization) of c2 and c2 is a superconcept (generalization) of c1. If c1 <_C c2 and there exists no c3 ∈ C with c1 <_C c3 <_C c2, then c1 is a direct subconcept of c2 and c2 is a direct superconcept of c1, denoted by c1 ≺ c2.
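Definitions 1 and 2 can be illustrated with a small sketch. It assumes the taxonomy is given as a set of direct (subconcept, superconcept) pairs; the concept names and the helper functions `transitive_closure` and `direct_pairs` are hypothetical and are not taken from DBpedia or from the slides.

```python
from itertools import product

def transitive_closure(direct):
    """Return every (sub, super) pair implied by the direct edges (the order <_C)."""
    closure = set(direct)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

def direct_pairs(order):
    """Keep only pairs c1 <_C c2 with no c3 strictly between them (Definition 2)."""
    concepts = {c for pair in order for c in pair}
    return {
        (c1, c2)
        for (c1, c2) in order
        if not any((c1, c3) in order and (c3, c2) in order for c3 in concepts)
    }

# Hypothetical toy taxonomy, for illustration only.
DIRECT = {
    ("Genetics", "Biology"),
    ("Microbiology", "Biology"),
    ("Biology", "Science"),
}
ORDER = transitive_closure(DIRECT)

# "Genetics" is a subconcept of "Science", but not a direct one.
print(("Genetics", "Science") in ORDER)
print(("Genetics", "Science") in direct_pairs(ORDER))
```

Storing only the direct pairs and deriving the full order on demand keeps the taxonomy compact; a large hierarchy would likely need a graph library rather than this quadratic closure.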
3. OUR PROPOSED METHODS The candidate expression detection algorithm
Input: document d = {w1, w2, …, wn}, Lex = (SC, RefC) and window size k ≥ 1.
    i ← 1
    list Ls
    index-term s
    while i ≤ n do
        for j = min(k, n - i + 1) to 1 do
            s ← {wi … wi+j-1}
            if s ∈ SC then
                save s in Ls
                i ← i + j
                break
            else if j = 1 then
                i ← i + j
            end if
        end for
    end while
    return Ls
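The algorithm above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the lexicon SC is a plain set of lowercase, space-joined expressions, and the function name `detect_candidates` and the sample lexicon are hypothetical.

```python
def detect_candidates(words, lexicon, k):
    """Greedy longest-match scan: return lexicon expressions in document order."""
    if k < 1:
        raise ValueError("window size k must be >= 1")
    found = []
    n = len(words)
    i = 0  # 0-based position, so the window bound is min(k, n - i)
    while i < n:
        # Try the longest window first, then shrink it one word at a time.
        for j in range(min(k, n - i), 0, -1):
            candidate = " ".join(words[i:i + j])
            if candidate in lexicon:
                found.append(candidate)
                i += j  # skip past the matched expression
                break
        else:
            # No window starting at i matched: move on by one word.
            i += 1
    return found

# Hypothetical lexicon and document, for illustration only.
LEXICON = {"gene", "gene expression", "data", "escherichia coli"}
DOC = "the gene expression data of escherichia coli".split()
print(detect_candidates(DOC, LEXICON, k=3))
# ['gene expression', 'data', 'escherichia coli']
```

Because the longest window is tried first, "gene expression" is preferred over the shorter entry "gene", which is the behavior the loop over j from min(k, n - i + 1) down to 1 describes.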
4. EXPERIMENT • Datasets • Our goal is to achieve high performance on closely related categories. To test our approach, we therefore built a crawler to collect a data set from the Yahoo! website. The data set contains the closely related (ambiguous) categories under Science->Biology. The categories under Science->Biology used for training and testing are: Bio-Archaeology, Bio-Informatics, Genetics, Food Science and Microbiology.
4. EXPERIMENT Table 1. Confusion Matrix before Applying Semantic Processing
4. EXPERIMENT Table 2. Confusion Matrix after Applying Semantic Processing
4. EXPERIMENT Fig. 3. Accuracy from Semantic Representation Terms vs. Bag of Words
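For readers who want to relate a confusion matrix to an accuracy figure, the sketch below shows the usual computation. The matrix values are toy numbers invented for illustration and are not the results from Table 1 or Table 2; the convention that rows are true categories and columns are predicted categories is also an assumption, as are the function names.

```python
def accuracy(matrix):
    """Overall accuracy: correctly classified documents divided by all documents."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def per_class_recall(matrix):
    """Fraction of each true category (row) that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]

# Toy 3-category matrix (rows: true category, columns: predicted category).
# These numbers are hypothetical and do not come from the experiment.
TOY = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 3, 7],
]
print(round(accuracy(TOY), 3))
print(per_class_recall(TOY))
```

Off-diagonal cells show which pairs of closely related categories are confused with each other, which is the effect the before and after matrices are meant to compare.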
5. CONCLUSION • In this paper, we have discussed a novel approach that applies DBpedia’s background knowledge to represent documents in order to boost text categorization performance. • Our experiments show that semantic-level processing and normalization help achieve higher accuracy when classifying documents whose words carry cross-category references.
END THANKS!