W3C Workshop on SSML, Nov 2-3, 2005, Beijing
Issues of SSML in Japanese
Wataru IMATAKE (ANIMO LIMITED)
Makoto AKABANE (Sony Computer Entertainment Inc.)
Kazuyo TANAKA (Tsukuba University)
JEITA Technical Standardization Group on Speech Input/Output Systems
JEITA Speech Group
1-1 About JEITA
• JEITA (Japan Electronics and Information Technology Industries Association) is an industry organization covering information systems, personal information devices, digital appliances, industrial and social system devices, and electronic parts.
• JEITA was established in November 2000 by merging the Japanese Electronic Industry Development Association (JEIDA) and the Electronic Industries Association of Japan (EIAJ).
1-2 JEITA Speech Group, Activities
• The Expert Committee on Speech Input/Output Systems (JEIDA Speech Group) established "JEIDA-62-2000 Standard of Symbols for Japanese Text-to-Speech Synthesizer" as a JEIDA standard in March 2000.
• A revised version of JEIDA-62-2000 was published in March 2005 as "JEITA-IT-4002".
• JEIDA-62-2000 included control tags for synthesizers, defined in XML.
• However, the control tags were removed in "JEITA-IT-4002".
2-1 How to specify Japanese pronunciation in phoneme element
"JEITA IT-4002: Symbols for Japanese Text-to-Speech Synthesizer"
• Two levels of notation: kana-level notation with Japanese katakana, and phonemic-level notation with IPA or SAMPA.
We suggest identifying these notations with the values "x-JEITA-IT-4002-kana", "x-JEITA-IT-4002-ipa", and "x-JEITA-IT-4002-sampa" for the alphabet attribute.
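As a rough illustration of the suggestion above, the following Python sketch builds an SSML phoneme element whose alphabet attribute carries one of the three proposed values. The helper name `phoneme` and the sample word are assumptions made for this example, and the `ph` value is plain katakana used as a placeholder; it is not claimed to be valid JEITA IT-4002 notation.

```python
import xml.etree.ElementTree as ET

# Alphabet values proposed on the slide for the phoneme element.
JEITA_ALPHABETS = {
    "kana": "x-JEITA-IT-4002-kana",
    "ipa": "x-JEITA-IT-4002-ipa",
    "sampa": "x-JEITA-IT-4002-sampa",
}


def phoneme(text, ph, level="kana"):
    """Build a <phoneme> element (hypothetical helper for illustration)."""
    elem = ET.Element("phoneme", {"alphabet": JEITA_ALPHABETS[level], "ph": ph})
    elem.text = text
    return elem


# The ph string is plain katakana, assumed here only as a stand-in
# for real kana-level notation.
xml = ET.tostring(phoneme("写真", "シャシン"), encoding="unicode")
print(xml)
```

The `x-` prefix follows the SSML convention for vendor-defined alphabet names, so an engine that does not know the JEITA notations can still recognize the attribute as an extension.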
2-2 How to specify Japanese pronunciation in phoneme element
3-1 How to specify speaking rate in Japanese
• The basic unit of Japanese rhythm is the mora.
• A mora is called "拍" (haku) in Japanese. For example, a haiku is composed of 5-7-5 haku.
“こんにちわ” /ko N ni chi wa/ → 5 moras
“しゃしん” /sya si n/ → 3 moras in Japanese; /sya sin/ → 2 syllables in English
• Therefore, it is natural to specify the speaking rate and Japanese phoneme length by the number of moras.
• To specify the speaking rate in the rate attribute of the prosody element, use the unit mora/sec.
• By the same token, to specify the pause time in the time attribute of the break element, use the mora as the unit.
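The mora counts in the examples above can be reproduced with a short Python sketch. It assumes the input is already written in kana and uses a simplified rule: every kana character is one mora, except the small glide and vowel kana (such as ゃ), which merge with the preceding character. The function names and the rate of 6 mora/sec are illustrative assumptions, not values from the JEITA standard.

```python
# Small kana that combine with the preceding kana and add no mora.
# The small っ/ッ, ん/ン and the long-vowel mark ー each count as one mora,
# so they are deliberately not listed here.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def count_moras(kana):
    """Count moras in a kana string (simplified rule, assumed for this sketch)."""
    return sum(1 for ch in kana if ch not in SMALL_KANA)


def duration_seconds(kana, rate_mora_per_sec):
    """Estimate speaking time when the rate is given in mora/sec."""
    return count_moras(kana) / rate_mora_per_sec


print(count_moras("こんにちわ"))  # 5
print(count_moras("しゃしん"))    # 3
print(duration_seconds("しゃしん", 6.0))  # 0.5 (6 mora/sec is an arbitrary example rate)
```

Because each mora takes roughly the same time to speak, a rate in mora/sec maps directly to duration, which is harder to achieve with a syllable-based or word-based unit.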
3-2 How to specify speaking rate in Japanese
4-1 ruby element
• Japanese has many words that are written with the same kanji but differ in meaning (the same notation with a different reading).
• Newspaper and magazine publishers have long used ruby to make kanji words easier for readers to understand.
• In addition, the word processors widely used in Japan provide a function for writing ruby (e.g. Microsoft Word, Justsystem ICHITARO, OpenOffice Writer).
• Therefore, a lot of text content in Japan includes ruby.
• Japanese voice synthesis engines can reduce misreadings by making active use of ruby.
• A ruby is usually written in Japanese katakana or hiragana. Therefore, a ruby does not fit the phoneme element.
4-2 ruby element
• We know of "Ruby Annotation - W3C Recommendation 31 May 2001" (http://www.w3.org/TR/ruby/), but it is over-specified for voice synthesis: layout information is unnecessary for voice synthesis.
• The simplest expression of a ruby is enough for voice synthesis.
• Therefore, we propose that a new ruby element be defined.
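To show why only the base text and its reading matter for synthesis, here is a minimal Python sketch that takes W3C simple ruby markup (rb, rt, and the optional rp fallback parentheses) and keeps just the reading. It does not implement the ruby element proposed here, whose syntax is not reproduced in this text; the function name `reading_text` and the sample sentence are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def reading_text(fragment):
    """Return the text a synthesizer could read, preferring ruby readings.

    Hypothetical helper: handles un-namespaced simple ruby one level
    below the root element and ignores all layout-related markup.
    """
    root = ET.fromstring(fragment)
    parts = [root.text or ""]
    for child in root:
        if child.tag == "ruby":
            rb = child.find("rb")
            rt = child.find("rt")
            if rt is not None and rt.text:
                parts.append(rt.text)        # use the reading
            elif rb is not None:
                parts.append(rb.text or "")  # no reading: fall back to base text
        else:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts)


sample = (
    "<p>十月<ruby><rb>二十日</rb><rp>(</rp><rt>はつか</rt><rp>)</rp></ruby>です</p>"
)
print(reading_text(sample))  # 十月はつかです
```

The rp elements exist only as a visual fallback for renderers without ruby support, which is the kind of layout detail a synthesis-oriented ruby element could leave out.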
4-3 ruby element
5-1 Expansion of a say-as element
• In Japanese, a word with a single meaning and a single notation can have different readings, all of them correct.
• For example, 「二十日」 can be read as 「ニジュウニチ」 (ni-jyu-ni-chi) or 「ハツカ」 (ha-tsu-ka). Both mean the 20th of the month.
• In this case, SSML should let a content creator choose whether a voice synthesis engine reads "10/20" as "ジューガツハツカ" (jyu-gatsu-ha-tsu-ka) or "ジューガツニジューニチ" (jyu-gatsu-ni-jyu-ni-chi).
• Therefore, we propose an attribute for the say-as element that selects the Japanese reading of a date.
• We are still examining this issue.
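The choice described above can be pictured as a lookup keyed by a reading style. The Python sketch below covers only the "10/20" example from this slide; the function name `read_date` and the style labels "traditional" and "numeric" are hypothetical, since the attribute name and its values are still under examination.

```python
# Readings taken from the "10/20" example; only that date is covered.
MONTH_READINGS = {10: "ジューガツ"}
DAY_READINGS = {
    20: {
        "traditional": "ハツカ",      # style label assumed for this sketch
        "numeric": "ニジューニチ",    # style label assumed for this sketch
    },
}


def read_date(date, style):
    """Return the kana reading of a month/day string for the chosen style."""
    month_str, day_str = date.split("/")
    month, day = int(month_str), int(day_str)
    return MONTH_READINGS[month] + DAY_READINGS[day][style]


print(read_date("10/20", "traditional"))  # ジューガツハツカ
print(read_date("10/20", "numeric"))      # ジューガツニジューニチ
```

A real engine would need full month and day tables; the point here is only that the reading style is an explicit input rather than something the engine guesses.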
5-2 Expansion of a say-as element