Jun Liu (liukeen@mail.xjtu) Lu Jiang, Zhaohui Wu Qinghua Zheng, Yanan Qian

Mining Preorder Relation between Knowledge Units from Text Jun Liu (liukeen@mail.xjtu.edu.cn) Lu Jiang, Zhaohui Wu Qinghua Zheng, Yanan Qian November 10, 2014 SAC 2010, Sierre, Switzerland

Outline • Motivation and Challenges • Two features of the Preorder Relation • Process of Mining Preorder Relations • Experimental Evaluation • Conclusions

Motivation Learning is an incremental process. To understand a new knowledge unit often relies on the understanding of certain existing knowledge units. Preorder relations among the knowledge units help the learners avoid the disorientation problem in learning. Manually annotating the potential preorder relations is very time consuming, and requires the annotators be the domain experts.

Problem Definition Given a text document set , and a knowledge unit set extracted from T as the input, the preorder relation mining process will output a set . Each can be further represented as a triplet of (name, type, content). • Name: such as “definition of subnet mask” • Type: such as definition, property or method • Content: the text content of the knowledge unit

Challenges There has been no previous work on mining the relation among the knowledge units. Ontology learning, KAT and RDC can hardly be applied to mine the preorder relations . Challenges: • Knowledge units expressed in natural language are ambiguous or ill-formed • Knowledge units have far more complex structures than the concepts and named entities • Preorder relationshave the characteristic of long distance dependency

Annotating work We generated KUs in the given document set by using our extraction method and manually refined the results. Then we manually annotated the preorder relation among the extracted KUs. The annotating work was conducted as follows: • Developed web-based annotating system • Hired 24 undergraduates from the CS department • Create a set of rules to guide the work • Created the experimental data set that covers the five courses: Computer Network , Advanced Mathematics, Computer Organization and Architecture, Database System and Application and Geometry (KUs: 5000+; Relations: 7000+ )

is inversely proportional to exponential function of d, that is, . Preorder relation can be mined within the same document, or the documents with similar topic.

If knowledge units in are precursors of knowledge units in , then .

STEP Ⅰ: Text Association Mining • Text Association Mining aims at finding the documents of similar topic, and then ranks them in pairs. • The clustering process deals with three cases: • Two documents ti and tj are put into one cluster; • A document ti is put into the cluster S (assume tj in S is closest to ti); • Cluster S and cluster S’ merge into a new cluster (assume ti in S and tj in S’ are closest document pair). For each pair (ti , tj ), set a proper threshold F0 (F0<1 ), If , ; If , ; Once the clustering is finished, the directed graph is also generated.

STEP Ⅱ: Candidate KU-Pairs Generation • For each node in , . • For each ,

STEP Ⅲ: Feature Selection • Three useful features for classification–based recognition algorithm • Term frequency: The greater the is , the more likely that has preorder relation. • Distance: decays exponentially while grows. • Semantic type:

Experimental Data Sets

Impact of β on precision and recall • The classification results is immune to the changing of β within a certain range. we set β to 0.4.

Application • Yotta (1024) : A topic-map-based knowledge management system(under construction)

Conclusions Two features of the preorder relation were discovered : the locality of the preorder relation and the distribution asymmetry of the domain terms. A classification-based method of mining the preorder relations was proposed. Future work: to extend the method to mining the preorder relation residing in online knowledge repository –Wikipedia.

J. M. Ruiz-Sanchez, R. Valencia-Garca, J. T. Fernandez-Breis, R. Martnez-Bejar and P. Compton. An Approach for Incremental Knowledge Acquisition from Text. Expert Systems with Applications, July 2003, 25(1):77-86. C. Timothy and P. Patrick. VerbOcean: Mining the Web for Fine-Grained Semantic Verb Relations. 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP-04, Barcelona, Spain, 2004 :33-40. X.Y. Du, M. Li, S. Wang. A Survey on Ontology Learning Research. Journal of Software, 2006, 17(9):1837-1847. F. Michael and H. Eduard. Offline Strategies for Online Question Answering: Answering Questions Before They Are Asked. The 41st Annual Meeting of the Association for Computational Linguistics (ACL-03), Sapporo, Japan, 2003: 1-7. D. Zhou, J. Su and M. Zhang. Modeling Commonality among Related classes in Relation Extraction. The 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL’2006), Sydney, Australia, 2006: 121-128. M. Witbrock, D. Baxter, J. Curtis, et al. An Interactive Dialogue System for Knowledge Acquisition in CYC. The 18th International Joint Conference on Artificial Intelligence, Acapulco, Mexico, 2003: 138-145. X. Chang, Q.H. Zheng. Knowledge Element Extraction for Knowledge-Based Learning Resources Organization. The 6th International Conference on Web-based Learning. Edinburgh, United Kingdom, 2007: 102-113.

Thank You! Questions?

Jun Liu (liukeen@mail.xjtu) Lu Jiang, Zhaohui Wu Qinghua Zheng, Yanan Qian