
Data Mining and Text Analytics

Data Mining and Text Analytics: Quranic Arabic Corpus. By Saima Rahna & Anees Mohammad. Summary. The Quranic Arabic Corpus enables further analysis of the Quran. It provides linguistic resources for each word and verse in the Quran, e.g. morphology and syntax.





Presentation Transcript


  1. Data Mining and Text Analytics: Quranic Arabic Corpus. By Saima Rahna & Anees Mohammad.

  2. Summary. The Quranic Arabic Corpus enables further analysis of the Quran. It provides linguistic resources for each word and verse in the Quran, e.g. morphology and syntax. Automated algorithms were used to analyse the text of the Quran.

  3. Introduction. Islam originated in Arabia about 1,400 years ago. The key sacred texts are in Arabic. Only a minority of Muslims can speak and understand Arabic. A larger percentage of Muslims know English as a second or even first language. Web and book resources therefore use English in parallel with Arabic.

  4. Data Mining. Data mining uses tools and techniques to extract information from data. Different aspects of a single topic in the Quran can reappear in many chapters. Frequent patterns can therefore be used to construct a subject index in which all verses on a single topic are easy to find.

  5. Text Analytics. Also referred to as information extraction. The Quranic corpus benefits those who do not understand Arabic. It can give English readers a better insight into the source text. The translation is provided at a detailed text-analytics level.

  6. Resources & Techniques. Statistical techniques: statistical techniques such as keyword extraction can explore semiotic relationships between sound and meaning in the Quran and recognise recurring patterns with a high level of accuracy. Linguistic resources: Arabic grammar and syntax are annotated for each word in the Quran, and an online comment-based system lets visitors discuss and correct the data.
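As a rough illustration of frequency-based keyword extraction, the sketch below ranks words by how often they occur after dropping stop words. It is not the method used by the Quranic Arabic Corpus: the class name `KeywordCounter`, the method `topKeywords`, the stop-word set, and the English sample text are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: names and sample text are illustrative only.
class KeywordCounter {

    // Returns up to k of the most frequent words that are not stop words.
    static List<String> topKeywords(String text, Set<String> stopWords, int k) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on anything that is neither a letter nor a combining mark,
        // so words carrying diacritics are kept whole.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{M}]+")) {
            if (token.isEmpty() || stopWords.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        List<String> words = new ArrayList<>(counts.keySet());
        // Highest count first; ties are broken alphabetically for a stable order.
        words.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(words.subList(0, Math.min(k, words.size())));
    }

    public static void main(String[] args) {
        Set<String> stopWords = Set.of("and", "for", "the");
        String sample = "mercy and guidance and mercy for the people";
        System.out.println(topKeywords(sample, stopWords, 2));
    }
}
```

Raw frequency is the simplest possible scoring; a weighting such as TF-IDF would be a natural refinement, but the slides do not say which statistic was used.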

  7. Algorithms. The Quranic Arabic Corpus implements its algorithms in Java. They include a search feature (searching for concepts and keywords in the Holy Quran), finding multi-word repetitions, and mining frequent patterns into a graph.
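One simple way to find multi-word repetitions is to count every run of n consecutive words and keep the runs that occur more than once. The sketch below does this under stated assumptions: `RepeatedPhrases`, `find`, and the toy English verses are hypothetical and do not come from the corpus code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: counts n-word phrases across a list of verses.
class RepeatedPhrases {

    // Returns each n-word phrase that occurs at least minCount times, with its count.
    static Map<String, Integer> find(List<String> verses, int n, int minCount) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String verse : verses) {
            String trimmed = verse.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] words = trimmed.split("\\s+");
            // Slide a window of n words over the verse.
            for (int i = 0; i + n <= words.length; i++) {
                String phrase = String.join(" ", Arrays.copyOfRange(words, i, i + n));
                counts.merge(phrase, 1, Integer::sum);
            }
        }
        // Drop phrases that are too rare to count as repetitions.
        counts.values().removeIf(c -> c < minCount);
        return counts;
    }

    public static void main(String[] args) {
        List<String> verses = List.of("in the name of", "praise the name of");
        System.out.println(find(verses, 2, 2));
    }
}
```

Phrases are counted within a verse only, so a repetition that spans a verse boundary is not detected; whether the original algorithm does so is not stated in the slides.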

  8. Algorithm for indexing the Quran. When a word is encountered for the first time, it is added to the index; if it already exists there, the new location is added to its list.

     For each verse V:
         parse V into a word list -> list(W)
         For each word W in list(W):
             If INDEX does not contain W:
                 add W and W.location to INDEX
             Else:
                 fetch W in INDEX
                 add the new location to W
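The indexing procedure on this slide can be sketched in Java roughly as follows. This is a minimal illustration, not the corpus's actual code: the class `VerseIndex`, the method `buildIndex`, the `verse:position` location format, and the toy English verses are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a word-to-locations index over a list of verses.
class VerseIndex {

    // Maps each word to the locations ("verse:position", both 1-based) where it occurs.
    static Map<String, List<String>> buildIndex(List<String> verses) {
        Map<String, List<String>> index = new LinkedHashMap<>();
        for (int v = 0; v < verses.size(); v++) {
            String verse = verses.get(v).trim();
            if (verse.isEmpty()) {
                continue;
            }
            String[] words = verse.split("\\s+");
            for (int w = 0; w < words.length; w++) {
                String location = (v + 1) + ":" + (w + 1);
                // A first occurrence creates the entry; later ones append to its list.
                index.computeIfAbsent(words[w], key -> new ArrayList<>()).add(location);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> verses = List.of("in the name", "the name of the lord");
        System.out.println(buildIndex(verses));
    }
}
```

`computeIfAbsent` folds the slide's two branches (word not yet in the index, word already in the index) into a single call, which is the usual idiom for this kind of index in Java.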

  9. Filtering algorithm. The Quranic 'quote filtering' algorithm takes into account that the Quran is written with Arabic diacritics (symbols). The algorithm passes the input text through three filtering stages. Sub-path mining algorithm. This is used to generate frequent patterns within the Quran corpus. The process starts by scanning the transaction database and calculating the count for each vertex in the graph.
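Because diacritics make otherwise identical words compare as different strings, text matching over the Quran usually needs some form of diacritic normalisation. The sketch below shows one plausible step, removing the marks in the Unicode range U+064B to U+0652 (fathatan through sukun). It is not a reconstruction of the three filtering stages, which the slides do not describe; `DiacriticFilter` and `strip` are hypothetical names.

```java
// Hypothetical sketch: removes common Arabic diacritics so that vocalised
// and unvocalised spellings of a word compare equal.
class DiacriticFilter {

    // Regex character class covering U+064B (fathatan) through U+0652 (sukun).
    private static final String HARAKAT = "[\\u064B-\\u0652]";

    static String strip(String text) {
        return text.replaceAll(HARAKAT, "");
    }

    public static void main(String[] args) {
        // The letters ba, seen, meem, written with kasra, sukun and kasra marks.
        String vocalised = "\u0628\u0650\u0633\u0652\u0645\u0650";
        String plain = strip(vocalised);
        System.out.println(plain.length()); // three base letters remain
    }
}
```

Other marks found in Quranic text, such as the superscript alef (U+0670) and the annotation signs in U+06D6 to U+06ED, fall outside this range and would need to be handled separately.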

  10. Conclusion. This presentation covered the algorithms used, the resources and techniques used to implement the Quranic Arabic Corpus, how data mining is applied, and how text analytics is applied.

  11. Thank you :-)
