This paper presents a novel approach using Multilayer SOM for efficient document retrieval and plagiarism detection by incorporating tree-structured data. The method enhances retrieval accuracy by combining global and local characteristics, showing promising results. MLSOM serves as a practical computational solution, offering simplicity and effectiveness. However, the rate of failed plagiarism detection remains a drawback. Overall, this innovative application demonstrates the potential of MLSOM in text analysis.
Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection Presenter: Cheng-Feng Weng Authors: Tommy W. S. Chow, M. K. M. Rahman 2009/10/12 TNN.18 (2009)
Outline • Motivation • Objective • Method • Experiments • Conclusion • Comments
Motivation • Document Retrieval: • Term-Frequency Problem • Two documents containing similar term frequencies may be contextually different when the spatial distribution of their terms is very different. Example: "Science … Computer … School …" versus "School of Computer Science …" • Plagiarism Detection: • Paraphrasing Problem • Example: "SOM … project …" versus "SOM … be mapped into …"
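The term-frequency problem can be illustrated with a small sketch. The two toy documents below are invented for illustration (they are not from the paper): their global term counts are identical, yet the terms are distributed differently across paragraphs, which only a per-paragraph (local) view can see.

```python
from collections import Counter

# Hypothetical toy documents, each given as a list of paragraphs.
doc_a = ["science news today", "computer prices today", "school news prices"]
doc_b = ["school computer science", "news news today", "prices today prices"]

def global_tf(paragraphs):
    """Term frequencies over the whole document (global view)."""
    return Counter(word for p in paragraphs for word in p.split())

def local_tf(paragraphs):
    """One term-frequency table per paragraph (local view)."""
    return [Counter(p.split()) for p in paragraphs]

same_globally = global_tf(doc_a) == global_tf(doc_b)
same_locally = local_tf(doc_a) == local_tf(doc_b)
print(same_globally, same_locally)
```

A purely global term-frequency model would treat these two documents as identical, which is the gap the tree-structured model is meant to close.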
Objective • Propose a tree-structured document model with MLSOM for document retrieval (DR) and plagiarism detection (PD). [Figure: Document, Tree-Structured Model, MLSOM; Global View, Local View; DR, PD]
Structured Representation of DF • A document is partitioned into pages that are further partitioned into paragraphs. Example page: <HTML> <HEAD> </HEAD> <BODY> I am a web page<br> <p>First line</p> <p>Second line</p> The silent third line </BODY> </HTML> [Figure: tree with a Page node over Paragraph nodes "I am a web page", "First line", "Second line", "The silent third line"]
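The document → page → paragraph partition can be sketched as a small tree builder. This is a minimal sketch under stated assumptions: paragraphs are taken to be separated by blank lines, and a "page" is simply a fixed-size group of paragraphs. The `Node` class, `build_tree`, and `paragraphs_per_page` are hypothetical names; the slides do not say how page boundaries are chosen.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    level: str                      # "document", "page", or "paragraph"
    text: str
    children: List["Node"] = field(default_factory=list)

def build_tree(text: str, paragraphs_per_page: int = 2) -> Node:
    """Partition text into pages, then paragraphs (assumed splitting rules)."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    root = Node("document", text)
    for start in range(0, len(paragraphs), paragraphs_per_page):
        chunk = paragraphs[start:start + paragraphs_per_page]
        page = Node("page", "\n\n".join(chunk))
        page.children = [Node("paragraph", p) for p in chunk]
        root.children.append(page)
    return root

sample = "I am a web page\n\nFirst line\n\nSecond line\n\nThe silent third line"
tree = build_tree(sample)
for page in tree.children:
    print(page.level, [child.text for child in page.children])
```

Each level of the tree can then carry its own feature vector, so the root gives the global view and the leaves give the local view.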
Multilayer SOM • MLSOM was developed for handling tree-structured data.
Multilayer SOM (cont.) • Similarity:
MLSOM Retrieval • Query document → extract tree structure and project with PCA matrix → trained MLSOM → related docs.
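The last step of the retrieval pipeline can be sketched as a best-matching-unit lookup. This is a simplified, single-layer illustration, not the paper's multilayer procedure: the query is assumed to be already projected to a low-dimensional vector, and the `codebook` weights and `node_to_docs` associations are made-up values standing in for a trained SOM.

```python
import math

# Hypothetical trained SOM: node id -> weight vector in the projected space.
codebook = {0: (0.1, 0.2), 1: (0.8, 0.9), 2: (0.5, 0.1)}

# Hypothetical association of training documents with the node they mapped to.
node_to_docs = {0: ["D1", "D2"], 1: ["D3", "D4"], 2: ["D5"]}

def best_matching_unit(vec):
    """Return the id of the node whose weight vector is closest to vec."""
    return min(codebook, key=lambda node: math.dist(codebook[node], vec))

def retrieve(query_vec):
    """Return the documents associated with the query's best-matching node."""
    return node_to_docs.get(best_matching_unit(query_vec), [])

print(retrieve((0.75, 0.85)))
```

Euclidean distance is used here only for simplicity; the deck's own similarity measure is defined on the "Similarity" slide.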
Plagiarism Detection • Plagiarism Detection using Local Association (PDLA) [Figure: Layer 3 SOM nodes, each linked to related docs: D1, D2, …; D3, D4, …; D2, D6, …; …]
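One plausible reading of the PDLA figure is that each paragraph of a suspect document lands on a layer-3 SOM node, and the documents already associated with those nodes become candidate sources. The sketch below follows that reading only; the vote-counting rule, the node names, and `candidate_sources` are assumptions for illustration, not the authors' stated algorithm.

```python
from collections import Counter

# Hypothetical layer-3 node -> associated documents, echoing the slide's figure.
node_to_docs = {"n1": ["D1", "D2"], "n2": ["D3", "D4"], "n3": ["D2", "D6"]}

def candidate_sources(paragraph_nodes):
    """Count, per document, how many query paragraphs hit one of its nodes."""
    votes = Counter()
    for node in paragraph_nodes:
        # set() ensures a document gets at most one vote per paragraph.
        votes.update(set(node_to_docs.get(node, [])))
    return votes.most_common()

# Assumed mapping: three query paragraphs landed on nodes n1, n3, n3.
ranked = candidate_sources(["n1", "n3", "n3"])
print(ranked)
```

Documents that collect many paragraph-level hits would then be compared more closely against the suspect document.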
Experiments • Document Retrieval:
Experiments (cont.) • Plagiarism Detection:
Conclusions • A new approach to DR and PD using tree-structured document representation and MLSOM is proposed. • It is shown that the tree-structured representation enhances retrieval accuracy by combining local characteristics with traditional global characteristics. • Computational Issue: • The MLSOM serves as an efficient computational solution for practical implementation.
Comments • Advantage • Practical; simple but efficient and effective • Drawback • The rate of failed plagiarism detection is still high • Application • …