
Where do we stand? MT development, research, and deployment in Asia




  1. Where do we stand? MT development, research, and deployment in Asia Key-Sun Choi (KAIST) AAMT http://www.asianlp.org/ http://www.afnlp.org/ http://korterm.org/

  2. Contents • China • Japan • India • Malaysia • Thailand • Taiwan • Korea • UNL • Associations related to MT

  3. MT in China – 1980-1990’s • To translate scientific documents • From Russian and the languages of Western countries • Supported by the government • No private companies in the early stage • TRANS-STAR: • 30,000 words/hour on a 386 PC • Basic dictionary with 40,000 entries • 10 specialized technical dictionaries with 350,000 entries in total • Subject fields: computer, economics, telecommunication, ceramics, thermal power industry, printing machine industry, automobile/tractor industry, petroleum prospecting, geology, chemical industry.

  4. MT in China – Present: English-to-Chinese • GAOLI: • developed jointly by Beijing GAOLI Computer Co. Ltd. and the Linguistics Institute of CASS • Basic lexical dictionary: 60,000 entries, with the usage and grammatical function of every word described in detail • Translation accuracy: 80% • Readability of translated text: 80%-90% • 863-IMT/EC: • by the Institute of Computer Technology, Academia Sinica • commercialized with very good economic returns.

  5. MT in China – Present: Chinese-to-English • SINO-TRANS • by CS&S (China National Software & Technology Service Co.) in 1993 • Basic dictionary: 40,000 entries • Two special-subject technical dictionaries: naval ships and boats (9,312 entries), rocket guns (33,773 entries) • Linguistic rules: 1,000 rules

  6. MT in China – Present: English-to-Chinese + terminology • TONGYI system: • by the Tianjin DATONG computer software company • Windows platform • Special-subject dictionaries: a. commonly used scientific terms: 200,000 entries; b. terms from 22 subjects (e.g. machine building, telecommunication, aviation, medicine): 3,000,000 entries • Good market strategy and service • Cooperation with enterprises

  7. MT in China – Present: English-to-Chinese + Internet browsing + richer user interface • YIWANG: • by the SUNSHINE company of Shenzhen • Highest translation speed: 100 sentences per second • Internet browsing • YIBA: • by the YAXINCHENG software technology company • Three translation modes: online, automatic, interface • Open to users: users can revise the dictionary and rules • Rich special-subject dictionaries: 30 subjects (e.g. computer, telecommunication, medicine)

  8. MT in China – Present: English-to-Japanese • E-to-J • by the JEC company in Beijing • Transforms phrase trees (P-trees) into dependency trees (D-trees) • Closely integrated with a word processor
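The slide names the P-tree to D-tree transformation without giving details. One common way to perform such a conversion is head percolation: each phrase selects a head child by rule, and the lexical heads of its other children become dependents of that head. The Python sketch below illustrates only that general idea; the head table `HEAD_RULES`, the `Node` class, and the example sentence are hypothetical and are not JEC's actual rules.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                          # phrase label ("NP") or POS tag ("NN")
    children: list[Node] = field(default_factory=list)  # empty for leaves
    word: str | None = None                             # set only on leaves


# Hypothetical head table: for each phrase label, the child labels that may
# act as its head, in priority order. Real systems use far larger tables.
HEAD_RULES = {
    "S": ["VP"],
    "VP": ["VBD", "VBZ", "VB", "VP"],
    "NP": ["NN", "NNS", "NNP", "NP"],
    "PP": ["IN"],
}


def head_index(node: Node) -> int:
    """Index of the head child per HEAD_RULES (rightmost child if no rule fires)."""
    for candidate in HEAD_RULES.get(node.label, []):
        for i, child in enumerate(node.children):
            if child.label == candidate:
                return i
    return len(node.children) - 1


def to_dependencies(node: Node, arcs: list) -> str:
    """Return the lexical head of `node`, appending (head, dependent) arcs to `arcs`."""
    if node.word is not None:
        return node.word
    child_heads = [to_dependencies(child, arcs) for child in node.children]
    h = head_index(node)
    for i, dependent in enumerate(child_heads):
        if i != h:
            arcs.append((child_heads[h], dependent))
    return child_heads[h]


# Hypothetical phrase tree for "John saw the dog".
sentence = Node("S", [
    Node("NP", [Node("NNP", word="John")]),
    Node("VP", [
        Node("VBD", word="saw"),
        Node("NP", [Node("DT", word="the"), Node("NN", word="dog")]),
    ]),
])

arcs = []
root = to_dependencies(sentence, arcs)
print(root, arcs)  # saw [('dog', 'the'), ('saw', 'dog'), ('saw', 'John')]
```

Here "saw" becomes the root of the dependency tree, "John" and "dog" depend on it, and "the" attaches to "dog".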

  9. MT in China – Present: Example-based MT (experimental systems) • Japanese-Chinese EBMT: • Computer Department of Qinghua University, 1996 • Corpus of aligned Japanese-Chinese sentences • The example unit is the sentence • Similarity is calculated on words • DAYA EBMT: • Harbin Polytechnic University • A machine-aided translation system in which the human factor is very important • The corpus is aligned at the sentence level
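As a rough illustration of sentence-level example retrieval with word-based similarity, the Python sketch below scores each stored source sentence against the input by word-level edit distance and returns the paired translation of the closest example. The similarity formula, the helper names, and the two-sentence Japanese-Chinese corpus are assumptions for illustration; the slide does not say which measure the Qinghua system used, and a full EBMT system would also adapt the retrieved translation rather than return it verbatim. Sentences are assumed to be pre-segmented into space-separated words.

```python
from __future__ import annotations


def word_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over word tokens (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete wa
                            curr[j - 1] + 1,            # insert wb
                            prev[j - 1] + (wa != wb)))  # substitute, or match at no cost
        prev = curr
    return prev[-1]


def similarity(a: list[str], b: list[str]) -> float:
    """Word-based similarity in [0, 1]: 1 minus distance normalized by the longer sentence."""
    if not a and not b:
        return 1.0
    return 1.0 - word_edit_distance(a, b) / max(len(a), len(b))


def retrieve(query: str, corpus: list[tuple[str, str]]) -> tuple[str, float]:
    """Return the target side of the most similar aligned example and its score."""
    words = query.split()
    source, target = max(corpus, key=lambda pair: similarity(words, pair[0].split()))
    return target, similarity(words, source.split())


# Hypothetical aligned corpus: (segmented Japanese, segmented Chinese).
corpus = [
    ("私 は 学生 です", "我 是 学生"),
    ("彼 は 先生 です", "他 是 老师"),
]

target, score = retrieve("私 は 大学 の 学生 です", corpus)
print(target, round(score, 2))  # 我 是 学生 0.67
```

Normalizing by the longer sentence keeps the score between 0 and 1, so a threshold could decide when no stored example is close enough to reuse.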

  10. MT in China – Government Funding: 1990’s • Hi-Tech 863 funding: • 863-IMT/EC system (English-Chinese) • SUNSHINE YIWANG system • 905 Chinese Language Processing Project: • completed in 1998.

  11. MT in China – User’s English Level • English level of TONGYI MT software users: • Higher level: 16.5% • Middle level: 49.5% • Lower level: 34.1% • So MT software must be oriented to ordinary people

  12. MT in China – Potential Users • Enterprise users of TONGYI MT software by size: • Small enterprises: 31.3% • Medium-scale & large-scale enterprises: 68.7% • So MT software must be oriented to • large-scale & medium-scale enterprises, • without ignoring small enterprises, which also have translation demand.

  13. MT in China – Regional Distribution • Regional distribution of MT software users: • translation demand is concentrated in the big cities and developing regions • Beijing: 18.7% • Liaoning: 7.9%, Jiangsu: 7.5% • Zhejiang: 6.5%, Hubei: 6.5%, Shanghai: 6.1% • Sichuan: 4.7%, Guangdong: 4.7% • Henan: 3.3%, Heilongjiang: 3.3% • Hebei: 2.8%, Shanxi: 2.3%, Jilin: 2.3% • Yunnan: 1.9%, Neimeng: 1.5%, Gansu: 1.4% • Guizhou: 0.5%, Anhui: 0.5%

  14. MT in China – Future and Strategies (1): Terminology Data Bank • MT software combined with a terminology data bank • 1990: the Sub-committee for Computer-Aided Terminology of China was set up • This sub-committee is attached to the State Language Commission (SLC) of China • A series of national standards for terminology data banks • Terminology databank creation • Chinese-English: since 1995, by ISTIC (Institute of Scientific and Technical Information of China) • Remarkable databanks…

  15. MT in China – Future and Strategies (2): Language Corpus Processing • Corpus construction: • 25 million Chinese characters (1999) • Automatic segmentation of written Chinese text in the corpus (97.68%, closed test) • Automatic phrase bracketing and syntactic annotation for the Chinese corpus

  16. MT in China – Future and Strategies (3): Speech-to-Speech Translation • Chinese speech into Chinese text • The "SIDA-863A" system can recognize • 398 basic Chinese syllables • Recognition rate reaches 93% • Response time is less than 0.1 second • Input speed reaches 80 Chinese characters per minute

  17. MT in China – Future and Strategies (4): Combined with OCR and the Internet • Internet MT: • SUNSHINE YIWANG, YAXIN YIBA, TONGYI, etc. • Advantages of MT software on the Internet: • Higher translation speed, real-time translation • Low price • Large machine dictionary • Ability to add new words

  18. MT in China: New National Project • 973 project: from 2001 • Supported by the Chinese government • For creative research in natural language processing, including machine translation • Automatic speech-to-speech translation system (English-Chinese) • under development at the Institute of Automation, Academia Sinica

  19. MT in China – Survey Source • Prof. Feng, Zhiwei: • Secretary-general and deputy chairman of the Sub-committee for Computer-Aided Terminology of China, under the State Language Commission (SLC) of China • Invited professor, KAIST (Sep/2001 – Aug/2002) • Dr. Liu, Qun • Institute of Computer Technology, Academia Sinica, Beijing

  20. MT in Japan - 1 • More than 10 companies • For English, Chinese, Korean • Waiting for a new breakthrough • Internet • eLearning • Cooperation with companies in specialized domains • Technology transfer • Collaboration tools are ready for market • Collaborative workbench for translators over the network • User interface: well organized

  21. MT in Japan - 2 • Leading Systems • Cross-lingual patent retrieval • Prime • NTT/ALT • Japanese-to-English • Japanese-to-Malay • Japanese-to-Chinese • Speech Translation • ATR: C-Star

  22. UNL at the United Nations University • Through the Universal Networking Language (UNL) • With Hindi, Japanese, Persian, Indonesian-Malay, Thai, Chinese, Mongolian, Korean in the Asian region • Other regions: major European languages and English • Possible users: • ITU mail translation

  23. MT in Malaysia • No commercial product yet • But research is under way in the academic sector • For application to • Internet • eLearning • eCommerce • Universiti Sains Malaysia • Computer Aided Translation Unit • Prof. Tang Enya Kong and Prof. Yusoff Zaharin

  24. MT in India • 18 constitutional languages with 10 different scripts: • their scripts and grammars are quite similar • they share 40 to 80 percent of their vocabulary • fewer than 5 percent of people can work in English

  25. MT in India: 1990-2001 Government Effort for IT • TDIL (Technology Development of Indian Languages): • 1990-1991 • development of corpora, OCR, text-to-speech, machine translation; standards for keyboards and internal codes for information interchange • 2000-2001 • seven major initiatives: • Knowledge Resources, Knowledge Tools, Translation Support Systems, Human Machine Interface Systems, Localisation, Standardization, and Language Technology Human Resource Development • Thirteen Resource Centres for Indian Language Technology Solutions (RC-ILTS) • were supported, covering all 18 Indian languages.

  26. MT in India: Future – Digital Unite and Knowledge for All • Indian Language Technology Vision 2010 has been prepared • with the vision statement “Digital Unite and Knowledge for All” • Growing popularity of the Internet • content creation, localisation, online gisting and summarisation, e-learning, and cross-lingual information retrieval are being promoted to ensure information access in cyberspace in Indian languages • Source: Dr. Om Vikas • Senior Director and Head, Computer Development Division, Ministry of Information Technology

  27. MT in Thailand: Government 1996 • IT-2000 • To build a national information infrastructure (NII) • To invest in people, concentrating on transferring IT knowledge to their children • To build a Government Information Network (GINET) • Internet users in Thailand (2000): 2.3M out of 66M people • Age breakdown:

      Age        <10  10-14  15-19  20-29  30-39  40-49  50-59  60-69    70+  Total
      Freq        18    124    261  1,238    572    187     32     27      2  2,461
      Percent    0.7    5.0   10.6   50.3   23.2    7.6    1.3    1.1    0.1    100

  • Most Thai Internet users know English and other Internet languages at a basic or low-intermediate level

  28. MT in Thailand: PARSIT • Web-based Thai-English machine translation • since 1998, in cooperation with NEC (Japan) • Very popular among Thai users • Translates English to Thai with 60% accuracy • 20 percent of the mistranslation may be due to differences in expressions, slang, and sentence structures • http://www.suparsit.com/ • 300,000 hits/month • 25,000 users/month

  29. MT in Thailand: Dictionary • a web-based dictionary: Lexitron • Thai-English and English-Thai dictionary

  30. MT in Thailand: Future • To further develop the PARSIT translation system • Thai-to-English • and other target languages • Other language programs, such as OCR research, speech research, and language research • Thai full-text search engine

  31. MT in Thailand: eASEAN • eASEAN Plan: • Multilingual Machine Translation Proposal • Thailand, Cambodia, Laos, Vietnam, Japan, Korea, English • source: • Dr. Virach Sornlertlamvanich [virach@nectec.or.th] • Dr. Prayong THITITHANANON (Rajabhat Institute Ubon Ratchathani, Thailand)

  32. MT in Taiwan • Prof. Su, Keh-Yih • Machine translation • Localization

  33. MT in Korea: Commercial Products • English-to-Korean (Korean-to-English) • Enguide LNI Soft • E-Tran2001 NLP Lab (Seoul National University) • EZ Reader Language and Computer • ClickWorld ClickQ • Transmate IBM Korea • … • Japanese-to/from-Korean • Unisoft • Changmyung • … • Translation Memory • Localization companies develop their own for in-house use: • ITI …

  34. MT in Korea: Test Suite for E-to-K • KAIST (http://korterm.kaist.ac.kr/ksurimal) • Supported by the Ministry of Science and Technology • Exhaustive evaluation • A variety of sentences (5,000 from high school textbooks, 10,000 from Internet e-business sites) • To identify the R&D direction

  35. [Chart: Problematic Parts of System A, legend "serious" / "average". Categories: part of speech (article, noun, pronoun, adverb, adjective, verb, mark, preposition, conjunction, relatives); structural part (partial structure: infinitive, participle, gerund, tense, idioms, number, sentence type, special construction, comparative, subjunctive mood; sentence structure: ellipsis, insertion, speech, inversion, lists, negation, multiple part of speech, relation and scope of modification); phrase semantic part (V+N, V+Prep., N+V, N+N, collocation, N+Prep., Adv.+N, Adv.+Prep., N, V, etc.); ambiguous words (NP, VP, idioms, PP, AP (adjective phrase), sentence); natural expression; different meaning between singular and plural]

  36. MT in Korea • Caption/EK and KE – ETRI • Real-time translation of TV news captions • CNN for English-Korean • KBS for Korean-English • Chinese-Korean MT • Pohang University of Science & Tech. • KAIST • ETRI (Korean-to-Chinese) • Companies: Konan tech. • Japanese-Korean MT (technology transfer) • Pohang University of Science & Tech.

  37. Online language populations (2001 June) • English 45%, Japanese 9.8%, Chinese 8.4% • German 6.2%, Korean 4.7%, Spanish 4.5% • Italian 3.6%, French 3.4%, Portuguese 2.5% • Dutch 2%, Russian 1.9% • GlobalReach. Global Internet Statistics (by Language). • http://www.glreach.com/globstats/index.php3

  38. Organizations in Asia • AAMT • AFNLP (Asia Federation of NLP Associations) • http://asianlp.org/ • http://afnlp.org/ • Eafterm (East Asia Terminology Forum) • http://eafterm.org/ • Language Resource Sharing and Management • Jan/2001 – workshop in Tokyo, invited by Japan • Prof. Tanaka, Hozumi (Chair; GSK) • Nov/2001 – workshop at NLPRS-2001, Tokyo • ISO TC37/SC4 (Language Resource Management) being organized

  39. MT Status in Asia Thank you.
