
Building A Specialized Multilingual Dictionary from General Monolingual Dictionaries

Choy-Kim CHUAH. This .ppt presentation was prepared by Ms. Chia Shy Miin.


Presentation Transcript


  1. Building A Specialized Multilingual Dictionary from General Monolingual Dictionaries Choy-Kim CHUAH • This .ppt presentation was prepared by Ms. Chia Shy Miin.

  2. Content • Introduction • General Monolingual Dictionaries • Kamus Dewan (KD) • WordNet (WN) • Specialised Bilingual Dictionary • Conclusion

  3. Introduction • Wealth of information accumulated over the years - the Natural History Museum, London, has about 300,000 index cards and 68 million biological specimens; - computerize, or lose this information; • Four-year project to prepare an inventory of medicinal plants in the Asia-Pacific region • Others: specialized database of names of crocodiles and turtles in Borneo • Lucid, a multimedia expert system for non-taxonomists to identify specimens.

  4. Introduction (cont.) • Many general monolingual dictionaries are on the market. • Definitions of compounds may include chemical formulae, e.g. • benzene …: a colorless volatile flammable toxic liquid aromatic hydrocarbon C6H6 … • Definitions of names of plants and animals may contain taxonomic information, e.g. • cockroach n …: any of an order (Blattaria) of chiefly nocturnal insects including some that are domestic pests.

  5. Introduction (cont.) • Compiling a specialised dictionary requires much time and effort. • Also, the compilers themselves have to be domain specialists. • Question: Can we merge the information in two general monolingual dictionaries to produce a draft of a specialised bilingual dictionary?

  6. Introduction (cont.) • We use two general monolingual dictionaries rich in taxonomic information to produce the first draft of a specialised multilingual dictionary of plants and animals. • A first draft of a list of names of insects in English and Malay can likewise be produced by linking entries from the two general dictionaries via their taxonomic content. • This draft will be the basic database of information about insects for managing urban pests.

  7. General Monolingual Dictionaries • Data are accessed from two digital dictionaries • Kamus Dewan (Dewan Bahasa dan Pustaka, 1994) • WordNet (WN)

  8. General Monolingual Dictionaries Kamus Dewan (KD) • Each entry gives - various derivative forms - idiomatic expressions - the origin of the word • Enriched with taxonomic information on plants and animals • This information can be used to build a first seed database of names of insects

  9. Synonyms and variant spellings • There are quite a few synonyms and variant spellings for many entries, especially those for names of plants and animals. • Consider the entries below: • cecunguk 1. (Sunda) lipas; … • coro Jw lipas • kacoak Jk lipas • kecoak Jk lipas; kecuak. • kecuak Jk lipas; kecoak • kepuyuk Mn lipas, kacoa • lipas sj serangga berwarna perang gelap, berbadan leper dan bujur, mempunyai sesungut yg panjang dan bahagian mulut menggigit, kepuyuk, Periplaneta spp.; ~ kudung sj lipas, Periplaneta orientalis; … • kacoak, kecuak: variant spellings of kecoak • cecunguk, coro, kecoak and kepuyuk are synonyms

  10. Synonyms and variant spellings • However, unless their scientific names are given, we cannot tell whether these words refer to similar insects of a different kind, to hyponyms, or are indeed synonyms.
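The cross-references among the KD entries for lipas can be collected mechanically. The sketch below is only an illustration: the `CROSS_REFS` mapping is a hand-made, simplified view of those entries (headword to the words its definition points at), and the function name is hypothetical. It yields candidate groups only; as noted above, whether the members are true synonyms cannot be decided without scientific names.

```python
from collections import defaultdict

# Hypothetical, hand-simplified view of the KD entries quoted earlier:
# each headword maps to the words that its definition points to.
CROSS_REFS = {
    "cecunguk": ["lipas"],
    "coro": ["lipas"],
    "kacoak": ["lipas"],
    "kecoak": ["lipas", "kecuak"],
    "kecuak": ["lipas", "kecoak"],
    "kepuyuk": ["lipas", "kacoa"],
}


def candidates_by_target(cross_refs):
    """Invert the mapping: for each target word, list the headwords pointing to it."""
    groups = defaultdict(set)
    for headword, targets in cross_refs.items():
        for target in targets:
            groups[target].add(headword)
    # Sort for a stable, readable result.
    return {target: sorted(heads) for target, heads in groups.items()}


groups = candidates_by_target(CROSS_REFS)
print(groups["lipas"])
```

Every headword in the sample points to lipas, so they all land in one candidate group; an entomologist would still have to confirm which members name the same insect.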

  11. Synonyms and variant spellings • Consider the entry below: labah; labah-labah, lelabah sj serangga berkaki lapan yg membuat sarang dgn menyirat benang yg keluar drpd badannya (dan sarang ini dapat memerangkap mangsanya); jenis-jenisnya: ~ beruk, ~ lotong, dll; … • Here, labah-labah ‘a spider’ has been incorrectly defined as “an insect with eight legs …”

  12. Synonyms and variant spellings • In fact, serangga ‘insect’ is itself defined as “a small animal with … and six legs …”: • serangga sj binatang kecil yg badannya bertekung tiga dan kakinya enam serta beruas-ruas (spt nyamuk, lalat, belalang, dll); …. • Scientific names of plants and animals are unique.

  13. Synonyms and variant spellings • Two distinct entries do not share the same scientific name • An entomologist, with the help of laypeople, can determine whether a given insect indeed bears a given scientific name, and so set up a knowledge database • This would help lexicographers, general or specialised, improve the quality of the dictionaries they produce.

  14. Impoverished definitions • Some definitions in KD entries are poor. • Consider the examples below: • kelulut III sj serangga, Laccifer lacca. • sesorok sj serangga, Gymnogryllus elegans. • api II; api-api sj serangga, kunang-kunang. • cengkerik sj serangga, jengkerik, keridik. • sambah; sambah-sambah Br sj serangga, gegancung, mentadak-mentadu. • walang II Jw sj belalang; ~ sangit sj serangga, cenangau, pianggang.

  15. Impoverished definitions • Many entries have just sj serangga, meaning ‘a type of insect’, as their definition. • Were it not for the scientific name given, the definition alone would not be enough to differentiate one entry from another.
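Since the scientific name is often the only distinguishing part of such a definition, it is worth pulling it out explicitly. The following is a minimal sketch, not the procedure used in the project: the regular expression is a rough heuristic of our own (a capitalised word of at least three letters followed by a lowercase epithet or "sp."/"spp."), and the `KD_SAMPLE` dictionary simply copies four of the definitions quoted above.

```python
import re
from typing import Optional

# Heuristic pattern (an assumption, not from the presentation): a capitalised
# genus of 3+ letters followed by "sp.", "spp." or a lowercase epithet.
# The 3-letter minimum avoids matching region labels such as "Jw" or "Jk".
SCIENTIFIC_NAME = re.compile(r"\b[A-Z][a-z]{2,} (?:spp?\.|[a-z]{2,})")

# Definitions copied from the KD examples quoted above.
KD_SAMPLE = {
    "kelulut III": "sj serangga, Laccifer lacca.",
    "sesorok": "sj serangga, Gymnogryllus elegans.",
    "api II; api-api": "sj serangga, kunang-kunang.",
    "cengkerik": "sj serangga, jengkerik, keridik.",
}


def scientific_name(definition: str) -> Optional[str]:
    """Return the first scientific-name-like string in a definition, if any."""
    match = SCIENTIFIC_NAME.search(definition)
    return match.group(0) if match else None


names = {head: scientific_name(text) for head, text in KD_SAMPLE.items()}
# Entries without a scientific name cannot be told apart by definition alone.
unresolved = [head for head, name in names.items() if name is None]
print(names)
print(unresolved)
```

Entries that end up in `unresolved` are the ones whose definitions offer nothing beyond "a type of insect" and other Malay names.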

  16. WordNet (WN) • There are 36 entries containing the word “cockroach”, as obtained by Guo (1998). • ldoce(american_cockroach, 1, n, … large, reddish, brown, 'free-flying', cockroach, originally, from, southern, unite, state, but, now, widely, distribute, … … • ldoce(periplaneta, 1, n, … cosmopolitan, genus, of, large, cockroach, … • ldoce(periplaneta_americana, 1, n, … large, reddish, brown, 'free-flying', cockroach, originally, from, southern, unite, state, but, now, widely, distribute, …

  17. WordNet (WN) (cont) • After processing and merging some of the information that could be merged, we obtained: • american_cockroach 1n … large reddish brown 'free-flying' cockroach originally from southern unite state but now widely distribute … • periplaneta 1n … cosmopolitan genus of large cockroach … • periplaneta_americana 1n … large reddish brown 'free-flying' cockroach originally from southern unite state but now widely distribute …
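The step from the `ldoce(...)` facts to the flattened entries can be pictured with a small sketch. It assumes a simplified fact of the form `ldoce(headword, sense, pos, word, word, ...)`; the sample string below is hypothetical and leaves out the parts the slides elide with "…". It also splits naively on commas, which would break on quoted atoms that contain commas, and it does not attempt the merging step, whose details the slides do not give.

```python
import re

# Matches a whole fact of the assumed form "ldoce(arg1, arg2, ...)".
FACT = re.compile(r"^ldoce\((.*)\)$")


def flatten(fact: str) -> str:
    """Turn 'ldoce(head, sense, pos, w1, w2, ...)' into 'head <sense><pos> w1 w2 ...'."""
    match = FACT.match(fact.strip())
    if match is None:
        raise ValueError(f"not an ldoce fact: {fact!r}")
    # Naive split: assumes no argument itself contains a comma.
    parts = [part.strip() for part in match.group(1).split(",")]
    headword, sense, pos, *words = parts
    return f"{headword} {sense}{pos} " + " ".join(words)


# Hypothetical, simplified sample; the real facts contain further fields.
sample = "ldoce(periplaneta, 1, n, cosmopolitan, genus, of, large, cockroach)"
flat = flatten(sample)
print(flat)
```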

  18. WordNet (WN) (cont) • From the definition given below, we know that lipas is a member of the genus Periplaneta • lipas sj serangga berwarna perang gelap, berbadan leper dan bujur, mempunyai sesungut yg panjang dan bahagian mulut menggigit, kepuyuk, Periplaneta spp.; ~ kudung sj lipas, Periplaneta orientalis; … • We can link the English definition of Periplaneta to that of lipas
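Linking through the genus name can be sketched as follows. This is an illustration under stated assumptions rather than the project's actual procedure: the genus pattern is our own heuristic, the WN glosses are the truncated ones shown on the slides, and matching is done on WN headwords only (a headword equal to the genus, or beginning with the genus followed by an underscore).

```python
import re

# Heuristic (an assumption): a capitalised genus of 3+ letters followed by
# "sp.", "spp." or a lowercase epithet.
GENUS = re.compile(r"\b([A-Z][a-z]{2,}) (?:spp?\.|[a-z]{2,})")

# KD definition as quoted on the slide (truncated there as well).
KD = {
    "lipas": "sj serangga berwarna perang gelap, berbadan leper dan bujur, "
             "mempunyai sesungut yg panjang dan bahagian mulut menggigit, "
             "kepuyuk, Periplaneta spp.; ~ kudung sj lipas, "
             "Periplaneta orientalis; …",
}

# WN headwords with the truncated glosses shown on the slides.
WN = {
    "american_cockroach": "large reddish brown 'free-flying' cockroach",
    "periplaneta": "cosmopolitan genus of large cockroach",
    "periplaneta_americana": "large reddish brown 'free-flying' cockroach",
}


def genera(definition):
    """Lower-cased genus names mentioned in a KD definition."""
    return {m.group(1).lower() for m in GENUS.finditer(definition)}


def link(kd, wn):
    """Map each KD headword to the WN headwords that share one of its genera."""
    linked = {}
    for malay, definition in kd.items():
        found = genera(definition)
        hits = sorted(
            head for head in wn
            if any(head == g or head.startswith(g + "_") for g in found)
        )
        if hits:
            linked[malay] = hits
    return linked


links = link(KD, WN)
print(links)
```

Headword matching alone misses american_cockroach, which is tied to Periplaneta only through its synonym periplaneta_americana; a fuller version would have to follow such synonym relations as well.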

  19. Table 1. Merged entry lipas ‘cockroach’

  20. Table 1. Merged entry lipas ‘cockroach’ (cont)

  21. Table 1. Merged entry lipas ‘cockroach’ (cont)

  22. Table 2. Merged entry lipas ‘cockroach’

  23. Conclusion • We were able to obtain about 50 bilingual entries of entomological names for the first version of our database • However, Kamus Dewan and WordNet cover different geographical regions. • Therefore, we were unable to link as many entries as we could have if the dictionaries contained names of plants and animals from the same region.

  24. Table 3. Some linked entries

  25. Table 3. Some linked entries (cont)

  26. The End Thank You!!
