
Experiments

Japanese-Chinese Phrase Alignment Using Common Chinese Characters Information. Chenhui Chu, Toshiaki Nakazawa and Sadao Kurohashi, Graduate School of Informatics, Kyoto University. Kanji: 開発 (develop). Category 3 Kanji: 発. Category 2 Kanji: 開. 発 → 發 ・・・ Unihan database. 開 → 开, 發 → 发.


Presentation Transcript


  1. Japanese-Chinese Phrase Alignment Using Common Chinese Characters Information
     Chenhui Chu, Toshiaki Nakazawa and Sadao Kurohashi
     Graduate School of Informatics, Kyoto University

     Introduction
     • Common Chinese characters information may be valuable in word/phrase alignment between Japanese and Chinese
     • Chinese characters are used in both Japanese (Kanji) and Chinese (Hanzi)
     • Kanji and Hanzi share common Chinese characters
     • Parallel sentences carry equivalent meanings in each language, so we can assume that common Chinese characters appear in both sentences
     • Three categories of Kanji:
       • Category 1: identical to Simplified Chinese
       • Category 2: identical to Traditional Chinese but different from Simplified Chinese
       • Category 3: visual variations

     Figure (Kanji conversion example): Kanji 開発 (develop), where 発 is a Category 3 Kanji and 開 is a Category 2 Kanji. 発 → 發 ・・・, after which 發 is a Category 2 Kanji. Unihan database: 開 → 开, 發 → 发 ・・・. Simplified Chinese: 开发.

     Alignment Model
     • Bayesian subtree alignment model on dependency trees (Nakazawa et al. 2011)
       [Equations (1)-(5) are not preserved in this transcript]
     • Common Chinese characters information incorporation
       • Base distribution adjustment [Equations (6)-(7) are not preserved in this transcript]
       • Model modification [Equation (8) is not preserved in this transcript]

     Common Chinese Characters Detection
     • To detect common Chinese characters between Japanese and Simplified Chinese, we convert the Japanese text into Chinese
     • Freely available resources are used for Category 2 and 3 Kanji conversion: [list not preserved in this transcript]
     • We also perform Kana-Kanji conversion for common Chinese characters detection

     Experiments
     • Japanese-Chinese corpus we used
     • Coverage of common Chinese characters detection
     • Example of common Chinese characters detection
     • Alignment
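The detection step described above (convert Kanji to Simplified Chinese, then look for characters shared by the two sentences) can be illustrated with a minimal sketch. This is not the authors' implementation. The two mapping tables are tiny hand-written excerpts limited to the 開発 → 开发 example from the poster, the function names `kanji_to_simplified` and `common_characters` are hypothetical, and the sentence pair in the usage note is made up for illustration. A real system would load full tables from resources such as the Unihan database, and Kana-Kanji conversion is not covered here.

```python
from typing import List, Tuple

# Assumption: hand-written excerpts, not real resource files.
# Category 3 Kanji (visual variants) mapped to Traditional Chinese.
VARIANT_TO_TRADITIONAL = {"発": "發"}
# Category 2 Kanji (Traditional Chinese forms) mapped to Simplified Chinese.
TRADITIONAL_TO_SIMPLIFIED = {"開": "开", "發": "发"}


def is_cjk_ideograph(ch: str) -> bool:
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def kanji_to_simplified(text: str) -> str:
    """Convert each Kanji to Simplified Chinese where a mapping is known.

    Category 3 Kanji are first mapped to their Traditional form, then
    Category 2 forms are mapped to Simplified Chinese. Category 1 Kanji
    and all other characters pass through unchanged.
    """
    converted = []
    for ch in text:
        ch = VARIANT_TO_TRADITIONAL.get(ch, ch)
        ch = TRADITIONAL_TO_SIMPLIFIED.get(ch, ch)
        converted.append(ch)
    return "".join(converted)


def common_characters(japanese: str, chinese: str) -> List[Tuple[str, str]]:
    """Return (Kanji, Hanzi) pairs whose converted form occurs in `chinese`."""
    chinese_chars = set(chinese)
    pairs = []
    for ch in japanese:
        if not is_cjk_ideograph(ch):
            continue  # skip Kana, punctuation, Latin letters
        hanzi = kanji_to_simplified(ch)
        if hanzi in chinese_chars:
            pairs.append((ch, hanzi))
    return pairs
```

With these excerpt tables, `kanji_to_simplified("開発")` gives `"开发"`. For the made-up pair `"新しい技術を開発する"` and `"开发新技术"`, `common_characters` finds 新 and 技 (Category 1), 開 → 开 (Category 2) and 発 → 发 (Category 3), but misses 術 because the excerpt table has no entry for it, which shows why coverage depends on the conversion resources.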
