MALIM – A New Computational Approach of Malay Morphology

Mohd Yunus Sharum, Muhammad Taufik Abdullah, Md Nasir Sulaiman, Masrah Azrifah Azmi Murad & Zaitul Azma Zainon Hamzah. MALIM – a new computational approach of Malay morphology. Ainun Najwa Bt Aziz P61811, Fatimah Zawani Bt Abdullah P61028


Presentation Transcript


  1. Mohd Yunus Sharum, Muhammad Taufik Abdullah, Md Nasir Sulaiman, Masrah Azrifah Azmi Murad & Zaitul Azma Zainon Hamzah. MALIM – a new computational approach of Malay morphology. Ainun Najwa Bt Aziz P61811, Fatimah Zawani Bt Abdullah P61028, Mohd Rashidie B. Ramli P62451

  2. INTRODUCTION
     • A major problem in Malay morphological processing lies in analysis.
     • Existing models: finite-state, two-level formalism.
     • Hypothesis: higher accuracy in morphological analysis can be achieved by widening the decision-selection domain.
     • MALIM is implemented using the S-A-P-I approach.

  3. MALAY MORPHOLOGY
     • The basic target of S-A-P-I is to analyze affixation, especially multiple affixation.
     • Affixation may involve one or several of these processes: prefixation, suffixation, circumfixation and infixation.
     • Three basic categories of Malay reduplication:
        • Full reduplication
        • Partial reduplication
        • Rhythmic reduplication
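The three reduplication categories can be told apart, very roughly, from surface form alone. The sketch below is illustrative only and is not taken from MALIM (which the slides say was written in Perl); the function name `classify_reduplication` and its heuristics are assumptions, and a real analyzer would also check the stem against a root lexicon.

```python
import re

def classify_reduplication(word: str) -> str:
    """Guess the reduplication type of a Malay word from its surface form.

    Hypothetical heuristic for illustration; it covers only the simplest
    pattern of each category and will misclassify many real words.
    """
    if "-" in word:
        left, right = word.split("-", 1)
        if left == right:
            return "full"        # e.g. buku-buku ("books")
        # Rhythmic: the two halves differ only in their first letter.
        if len(left) == len(right) and left[1:] == right[1:]:
            return "rhythmic"    # e.g. sayur-mayur ("assorted vegetables")
        return "other"
    # Partial: first consonant + "e" copied in front of the stem.
    if re.fullmatch(r"([^aeiou])e\1\w+", word):
        return "partial"         # e.g. lelaki ("man"), from laki
    return "none"

for w in ["buku-buku", "sayur-mayur", "lelaki", "buku"]:
    print(w, "->", classify_reduplication(w))
```

Rhythmic forms with vowel alternation in the middle of the word are not caught by this check, which is one reason such forms are hard for rule-based analysis.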

  4. THE S-A-P-I APPROACH
     • Uses the divide-and-conquer technique to handle Malay morphological analysis.
     • S-A-P-I ('search-all-pick-if…') algorithm.
     • Advantage: the most appropriate result can be searched for, since all possible options have been gathered from the decision-selection domain.
     • Side effect: multiple outputs due to ambiguity.
     • Two techniques improve the analysis results: separating and filtering.
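One way to read "search all, pick if" is: first enumerate every possible affix split of a word, then keep only the splits that pass a condition such as "the stem is a known root". The sketch below follows that reading under stated assumptions: the affix lists and the four-word lexicon are toy data, the function names are hypothetical, and nasal assimilation of the meN- prefix is ignored. It is written in Python for illustration and is not the authors' Perl implementation.

```python
# Toy data, assumed for illustration only (MALIM's lexicon has 5710 roots).
PREFIXES = ["", "ber", "di", "me", "ter"]
SUFFIXES = ["", "an", "i", "kan"]
LEXICON = {"beri", "ikan", "makan", "ajar"}

def search_all(word):
    """Search step: collect every (prefix, stem, suffix) split, valid or not."""
    candidates = []
    for prefix in PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)]
            if stem:
                candidates.append((prefix, stem, suffix))
    return candidates

def pick_if(candidates):
    """Pick step: keep only candidates whose stem is a known root."""
    return [c for c in candidates if c[1] in LEXICON]

def analyze(word):
    return pick_if(search_all(word))

print(analyze("berikan"))  # two surviving analyses: beri + -kan and ber- + ikan
```

The word "berikan" shows the side effect named on the slide: both beri + -kan and ber- + ikan survive the pick step, so further separating and filtering is needed to rank or prune them.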

  5. MALIM – MORPHOLOGICAL ANALYZER FOR LINGUISTIC INDECISION OF MALAY
     • A morphological analyzer that implements the S-A-P-I approach.
     • Developed in Perl.
     • Relevant characteristics of Perl:
        • Supports regular expressions, a notation for describing regular languages.
        • Capable of supporting lexical processing.
     • MALIM contains a basic but comprehensive root lexicon as reference (5710 root words).

  6. MALIM – MORPHOLOGICAL ANALYZER FOR LINGUISTIC INDECISION OF MALAY
     • MALIM contains a set of 80 morphosyntactic rules.
     • Limitations of the implementation:
        • Does not include infixation analysis.
        • Does not include analysis of complex affixation/reduplication.
        • Does not analyze rhythmic and free reduplication.
        • Limited in analyzing affixation/reduplication of compound words and phrases.
     • To overcome these limitations, MALIM uses a strategy resembling the direct mapping approach.
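A direct-mapping strategy can be pictured as a lookup table that is consulted before any rule-based analysis, so that forms the rules cannot handle (such as infixed words) still receive an answer. The sketch below is a guess at the general shape of such a mechanism, not a description of MALIM's internals; `CHEAT_LIST`, its two entries and `analyze_with_cheat_list` are hypothetical names chosen for illustration.

```python
# Hypothetical direct-mapping table: surface form -> (root, description).
CHEAT_LIST = {
    "telunjuk": ("tunjuk", "infix -el-"),
    "gemuruh": ("guruh", "infix -em-"),
}

def analyze_with_cheat_list(word, fallback):
    """Return the stored analysis if the word is listed, else defer to rules.

    `fallback` stands in for the rule-based analyzer and is any callable
    taking a word and returning a list of analyses.
    """
    if word in CHEAT_LIST:
        return [CHEAT_LIST[word]]
    return fallback(word)

# Usage with a trivial stand-in for the rule-based analyzer.
print(analyze_with_cheat_list("telunjuk", lambda w: []))
print(analyze_with_cheat_list("makan", lambda w: [(w, "root")]))
```

The trade-off is coverage: a table like this only helps for the words someone has listed, which is why the experiments measure its effect separately.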

  7. EXPERIMENTAL METHOD
     • Types of experiment:
        • Testing the processing model (S-A-P-I)
        • Splitting the lexicon (into mono-syllabic and multi-syllabic roots)
        • Morphosyntactic rule filtering
        • First-syllable reduplication analysis
        • Clitics/particles extraction
        • The effects of the 'cheat list' (direct mapping)
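Clitic and particle extraction lends itself to a regular-expression sketch, in keeping with the slides' remark about regular expressions. The example below is an assumption-laden illustration in Python rather than MALIM's Perl code: the clitic list (-nya, -ku, -mu, -lah, -kah, -tah), the tiny lexicon and the function name `extract_clitics` are all chosen for the example.

```python
import re

# Assumed set of enclitics and particles, for illustration only.
CLITIC_RE = re.compile(r"^(?P<base>\w+?)(?P<clitic>nya|ku|mu|lah|kah|tah)$")

def extract_clitics(word, lexicon):
    """Strip trailing clitics/particles until a known root is reached.

    Returns (remaining_word, clitics_in_surface_order). The lexicon check
    prevents splitting roots that merely end in a clitic-like string,
    such as "buku" (which ends in "ku").
    """
    clitics = []
    while word not in lexicon:
        match = CLITIC_RE.match(word)
        if not match:
            break
        clitics.insert(0, match.group("clitic"))
        word = match.group("base")
    return word, clitics

toy_lexicon = {"buku", "makan"}
for w in ["bukunya", "makanlah", "bukunyalah", "buku"]:
    print(w, "->", extract_clitics(w, toy_lexicon))
```

This greedy version commits to a single answer; under a search-all approach the stripped and unstripped forms would both be kept as candidates and filtered later.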

  8. EXPERIMENTAL METHOD
     • Experiment settings:
        • Set 1: MALIM (complete)
        • Set 2: MALIM without lexicon splits
        • Set 3: MALIM without morphosyntactic rule filtering
        • Set 4: MALIM without first-syllable reduplication analysis
        • Set 5: MALIM without clitics/particles extraction
        • Set 6: MALIM without the 'cheat list'
        • Set X: MALIM with basic capabilities only (fulfills all the conditions of Set 2 to Set 6); used as the control set
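The experiment settings form an ablation design: each of Sets 2 to 6 switches off one feature, and Set X is read here as switching off all five at once. That reading of Set X, the feature names and the function `make_experiment_sets` are assumptions made for this sketch, which only shows how such configurations could be generated.

```python
# Hypothetical feature names, in the order of Sets 2 to 6 on the slide.
FEATURES = (
    "lexicon_split",
    "rule_filtering",
    "first_syllable_reduplication",
    "clitic_extraction",
    "cheat_list",
)

def make_experiment_sets():
    """Build one on/off feature map per experiment set."""
    sets = {"Set 1": {f: True for f in FEATURES}}       # complete MALIM
    for number, disabled in enumerate(FEATURES, start=2):
        # Sets 2-6: everything on except one feature.
        sets[f"Set {number}"] = {f: f != disabled for f in FEATURES}
    sets["Set X"] = {f: False for f in FEATURES}        # assumed control
    return sets

for name, config in make_experiment_sets().items():
    off = [f for f, enabled in config.items() if not enabled]
    print(name, "disabled:", off or "none")
```

Comparing each set's accuracy against Set 1 and Set X would show how much each feature contributes on its own.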

  9. CONTRIBUTION
     • Introduces a new and more accurate approach to morphological analysis using S-A-P-I.
     • Solves most problems in Malay morphology, except those involving multi-word forms (compound words) and certain reduplicated words.

  10. CONCLUSION
     • MALIM was tested only on controlled sample data, not on data from daily-life usage.
     • The tests therefore may not pose the same challenge as real-world problems.
     • In future, a test run on real-life data, such as a corpus, could be used to verify the performance.
