Software Applications for Processing Romanian Texts: Demonstration and Comparison. Sanda Cherata, Babeş-Bolyai University, Faculty of Letters
Software Applications • The Romanian Morphological Dictionary (DMR) – Software ITC SA – RoLingva www.rolingva.ro • LEXICON – for updating attributes in lexical entries • SIASTRO-AM – phrase analysis of noun, adjective, adverb, verb and prepositional phrases • ETR – term extractor for Romanian specialised texts
DMR • Paradigm of a given lemma (classic form; stem + termination) • Accents • Syllabification • Morphological analysis of a given word
LEXICON • Specifying attributes for lexico-morphological classes • Designed to collect data from multiple users • Friendly interface
SIASTRO-AM • Lexico-morphological analysis • Parsing of noun, adjective, adverb, verb and prepositional phrases • Uses a lexicon based on DMR, enriched with new lexical and syntactic attributes added with the LEXICON application • Outputs an annotated text
SIASTRO-AM: Tags for text elements (each element is enclosed between a start tag and an end tag) • {F sentence F} – "{F" starts a sentence, "F}" ends it • {C word C} – word • {N unknown word N} – unknown word • {D number D} – number • {S punctuation sign S} – punctuation sign • {L - L} – hyphen • {I sequence I} – ignored sequence
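The tag inventory above suggests that an annotated text can be split into typed elements with a single pattern. The following is a minimal sketch in Python, not part of SIASTRO-AM itself. The sample string is invented for illustration (its word tag is a trimmed version of the "date" example), and the assumptions that elements are nested inside the sentence tags and separated by whitespace are mine, not the slides'.

```python
import re

# Tag letters and their meanings, as listed on the slide.
ELEMENT_KINDS = {
    "C": "word",
    "N": "unknown word",
    "D": "number",
    "S": "punctuation sign",
    "L": "hyphen",
    "I": "ignored sequence",
}

# "{X content X}" where the closing letter must equal the opening one.
# Assumption: tag and content are separated by whitespace.
ELEMENT = re.compile(r"\{([CNDSLI])\s+(.*?)\s+\1\}", re.DOTALL)


def split_elements(annotated):
    """Return (kind, content) pairs for every element found in the text."""
    return [
        (ELEMENT_KINDS[m.group(1)], m.group(2))
        for m in ELEMENT.finditer(annotated)
    ]


# Hypothetical annotated sentence, made up for this sketch.
sample = "{F {C date (vrb+p_fp+) da-te+2:+da+: C} {D 42 D} {S . S} F}"
elements = split_elements(sample)
for kind, content in elements:
    print(kind, "->", content)
```

The sentence tags {F ... F} are skipped here because they only wrap the other elements. A fuller reader would track them to group elements by sentence.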
SIASTRO-AM: Tags for words • General form: {C word (part of speech + grammatical category + grammatical category + ..., part of speech + grammatical category + grammatical category + ...) syllabification + accent position:, (...), ... (...) syllabification + accent position: + lemma +: ... C} • A comma inside the parentheses separates parts of speech • A comma outside the parentheses separates homographs • Example: {C date (vrb+p_fp+, sbt+fdpn+fisn+fipn+fvpa+, adj+fdpn+fisn+fipn+fvpa+ ) da-te+2:+da+:+dată+:+dat+: C}
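To make the word-tag layout concrete, here is a minimal Python sketch that takes apart the "date" example from the slide. It is an illustration under stated assumptions, not the SIASTRO-AM implementation. It handles only the single-homograph shape seen in that example, and the field names (`pos`, `categories`, `syllables`, `accent`, `lemmas`) are hypothetical labels chosen for this sketch.

```python
import re

# {C word (analysis, analysis, ...) syllabification+accent:+lemma+:... C}
WORD_TAG = re.compile(
    r"^\{C\s+(?P<word>\S+)\s*\((?P<analyses>[^)]*)\)\s*(?P<tail>.*?)\s*C\}$"
)


def parse_word_tag(tag):
    """Parse one word tag of the shape shown in the slide's example."""
    m = WORD_TAG.match(tag.strip())
    if m is None:
        raise ValueError("not a single-homograph word tag: %r" % tag)

    # Inside the parentheses, commas separate parts of speech; within each
    # one, "+" separates the part of speech from its grammatical categories.
    analyses = []
    for chunk in m.group("analyses").split(","):
        fields = [f for f in chunk.strip().split("+") if f]
        if fields:
            analyses.append({"pos": fields[0], "categories": fields[1:]})

    # After the parentheses: syllabification, accent position, then lemmas,
    # each lemma followed by a "+:" marker.
    tail = [t for t in m.group("tail").split("+") if t]
    return {
        "word": m.group("word"),
        "analyses": analyses,
        "syllables": tail[0].split("-"),
        "accent": int(tail[1].rstrip(":")),
        "lemmas": [t for t in tail[2:] if t != ":"],
    }


example = (
    "{C date (vrb+p_fp+, sbt+fdpn+fisn+fipn+fvpa+, "
    "adj+fdpn+fisn+fipn+fvpa+ ) da-te+2:+da+:+dată+:+dat+: C}"
)
parsed = parse_word_tag(example)
print(parsed["word"], parsed["syllables"], parsed["accent"], parsed["lemmas"])
for analysis in parsed["analyses"]:
    print(analysis["pos"], analysis["categories"])
```

For the example, this yields three analyses (verb, noun, adjective) sharing the syllabification da-te and the lemmas da, dată and dat. The slides do not say what unit the accent position counts, so the sketch only stores the number.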
ETR – Future Developments • Syntactic analysis • Enriching the terminological form with new terminological features