
SSML Extensions for Chinese Voice Browsing

Helen MENG, Wai-Kit LO, Tien-Ying FUNG, Yuk-Chi LI and Zhiyong WU. Human-Computer Communications Laboratory, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong. 2nd November, 2005.


Presentation Transcript


  1. SSML Extensions for Chinese Voice Browsing • Helen MENG, Wai-Kit LO, Tien-Ying FUNG, Yuk-Chi LI and Zhiyong WU • Human-Computer Communications Laboratory, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong • 2nd November, 2005

  2. Outline • Characteristics of Chinese • Proposed attributes for existing elements • “dialect-accent” • Proposed elements • <phrase> and <word> • <tone> • Proposed attribute values • for “interpret-as” attribute in <say-as> element • Summary

  3. Characteristics of Chinese • Rich in dialects, e.g., Cantonese, Shanghainese, Mandarin • Written alike, spoken differently • similar writing systems; e.g., 中国 (simplified) and 中國 (traditional) • significantly different pronunciations • Mandarin is spoken with different accents • No explicit phrase and word boundaries • e.g., 我們現在在開電話會議 segments as 我們 (we are) 現在 (now) 在開 (having) 電話會議 (a teleconference) • proper segmentation is critical for prosodic control, pronunciation selection for homographs and resolution of semantic ambiguity • Monosyllabic and tonal • syllable + lexical tone → lexical meaning of a Chinese character • tone can change according to meaning, context and mode of speaking

  4. Phonetic Transcription Schemes • Pronunciation of a character = tonal syllable = syllable + tone • Many transcription schemes have been developed for different dialects • syllable written in Roman letters • tone written as a one-digit Arabic numeral • Popular schemes are • pinyin (for Mandarin): 銀行 (bank) is /yin2 hang2/ • jyutping (for Cantonese): 銀行 (bank) is /ngan4 hong4/

  5. Chinese Tone Systems • Figure 1. Mandarin tone system (4 tones + 1 ‘light’ tone); axes: level of F0 against time • (1) 陰平 /yin ping/, high level, e.g., 媽 • (2) 陽平 /yang ping/, low level, e.g., 麻 • (3) 上 /shang/, rising, e.g., 馬 • (4) 去 /qu/, going, e.g., 罵 • Figure 2. Cantonese tone system (9 tones, specified in 6 classes) • (1) 陰平, high level, e.g., 詩 • (2) 陰上, high rising, e.g., 史 • (3) 陰去, high going, e.g., 試 • (4) 陽平, low level, e.g., 時 • (5) 陽上, low rising, e.g., 市 • (6) 陽去, low going, e.g., 事 • 7 (class 1) 陰入, high entering, e.g., 色 • 8 (class 3) 中入, middle entering, e.g., 舌 • 9 (class 6) 陽入, low entering, e.g., 蝕

  6. “dialect-accent” • Beijing Mandarin • Guangdong Mandarin • Hong Kong Cantonese

  7. Proposed “dialect-accent” Attribute • Specifies dialects and accents within a language • used together with xml:lang [XML1.0] • dialect-accent = primary-subtag [“-” optional-subtag] • primary-subtag = 2ALPHA • specifies the dialect • e.g., MD for Mandarin, CT for Cantonese • optional-subtag = 2ALPHA • specifies the accent • e.g., BJ for Beijing, GD for Guangdong, HK for Hong Kong • follows the abbreviations of Chinese provinces, autonomous regions and special administrative regions listed in the EDU.CN Domain Policy (中國教育和科研計算機網 EDU.CN 網絡域名註冊辦法) [1] • Examples • Mandarin with Beijing and Guangdong accents: MD-BJ, MD-GD • Cantonese with Hong Kong and Guangdong accents: CT-HK, CT-GD • [1] Defined by the China Education and Research Network Information Centre (CERNET 網絡信息中心)
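
The grammar on this slide (a two-letter dialect subtag, optionally followed by a hyphen and a two-letter accent subtag) can be checked mechanically. The following is a minimal sketch, not part of the proposal: the names `parse_dialect_accent` and `DialectAccent` are hypothetical, and accepting lower-case input and normalising it to upper case is an assumption.

```python
import re
from typing import NamedTuple, Optional

# Grammar from the slide: primary-subtag ["-" optional-subtag], each 2ALPHA.
_DIALECT_ACCENT = re.compile(r"([A-Za-z]{2})(?:-([A-Za-z]{2}))?")


class DialectAccent(NamedTuple):
    dialect: str            # e.g. "MD" (Mandarin), "CT" (Cantonese)
    accent: Optional[str]   # e.g. "BJ", "GD", "HK"; None when omitted


def parse_dialect_accent(value: str) -> DialectAccent:
    """Split a dialect-accent value into its dialect and accent subtags."""
    match = _DIALECT_ACCENT.fullmatch(value)
    if match is None:
        raise ValueError(f"not a valid dialect-accent value: {value!r}")
    dialect, accent = match.groups()
    # Assumption: subtags are case-insensitive and reported in upper case.
    return DialectAccent(dialect.upper(), accent.upper() if accent else None)


print(parse_dialect_accent("MD-BJ"))  # dialect MD, accent BJ
print(parse_dialect_accent("CT"))     # dialect CT, no accent
```

A value such as `MD-` or `MANDARIN` raises `ValueError`, because neither fits the two-letter subtag pattern.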

  8. “dialect-accent” Attribute (continued)
     xml:lang value | Dialect   | Accent    | “dialect-accent” value
     zh-HK          | Cantonese | Hong Kong | CT-HK
                    | Cantonese | Guangdong | CT-GD
                    | Mandarin  | Hong Kong | MD-HK
                    | Mandarin  | Beijing   | MD-BJ
                    | Mandarin  | Taiwan    | MD-TW
     Example:
     <p>Hello, where are you from?</p>
     <p xml:lang="zh-CN" dialect-accent="MD-BJ">我從北京來的。</p> (Mandarin with Beijing accent: 我 (I am) 從 (from) 北京 (Beijing) 來的)
     <p xml:lang="zh-CN" dialect-accent="MD-GD">我從廣東來的。</p> (Mandarin with Guangdong accent: 我 (I am) 從 (from) 廣東 (Guangdong) 來的)
     <p xml:lang="zh-CN" dialect-accent="CT-HK">我從香港來的。</p> (Cantonese with Hong Kong accent: 我 (I am) 從 (from) 香港 (Hong Kong) 來的)
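
To illustrate how a voice browser might read these attributes, here is a small sketch using Python's standard `xml.etree.ElementTree`. It is only an illustration under stated assumptions: the `<speak>` wrapper, the `zh-CN` language tag and the helper name `voice_requests` are not taken from the slides, and `dialect-accent` is the proposed attribute, not part of SSML 1.0.

```python
import xml.etree.ElementTree as ET

# xml:lang lives in the predefined XML namespace, so ElementTree exposes it
# under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumed sample document; the paragraphs mirror the slide's example.
SSML = """<speak>
  <p>Hello, where are you from?</p>
  <p xml:lang="zh-CN" dialect-accent="MD-BJ">我從北京來的。</p>
  <p xml:lang="zh-CN" dialect-accent="CT-HK">我從香港來的。</p>
</speak>"""


def voice_requests(ssml: str):
    """Return (xml:lang, dialect-accent, text) for every <p> in the document."""
    root = ET.fromstring(ssml)
    return [
        (p.get(XML_LANG), p.get("dialect-accent"), "".join(p.itertext()))
        for p in root.iter("p")
    ]


for lang, dialect_accent, text in voice_requests(SSML):
    print(lang, dialect_accent, text)
```

Paragraphs without the attributes yield `None` for both values, which a synthesizer could treat as "use the default voice".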

  9. <phrase> and <word> elements

  10. Enrich <p>, <s> with <phrase>, <word> • Current SSML 1.0: <p> and <s> • Proposed elements: <phrase> and <word> • Serve as cues for prosodic control (e.g., pause) • Assist correct pronunciation selection for homographs • A Cantonese example: the character 行 has FIVE pronunciations • /haang4/ in 行山 (hiking) • /hang6/ in 品行 (discipline) • /hong2/ in 洋行 (foreign trading company) • /hong4/ in 銀行 (bank) • /hang4/ in 行人 (pedestrian)

  11. Proposed <phrase> Element • Definition: • Defines the extent of a Chinese phrase • No attributes • Occurs within <s> • These elements can be nested within <phrase>: • <audio>, <break>, <emphasis>, <mark>, <phoneme>, <prosody>, <say-as>, <sub>, <voice>, <word> • Example (an ancient poem): 終年倒運少有餘財 • Pessimistic phrasing • <phrase>終年倒運</phrase> <phrase>少有餘財</phrase> (unlucky the whole year / not much money left) • Optimistic phrasing • <phrase>終年倒運少</phrase> <phrase>有餘財</phrase> (only a few unlucky events in the year / have money left)

  12. Proposed <word> Element • Definition: • Defines the extent of a Chinese word • No attributes • Occurs within <s> and <phrase> • These elements can be nested within <word>: • <audio>, <break>, <emphasis>, <mark>, <phoneme>, <prosody>, <say-as>, <sub>, <voice> • Example: 這一晚會如常舉行 • Segmentation 1 • <word>這一</word> <word>晚會</word> <word>如常</word> <word>舉行</word> • 會 is read /wui2/: “This banquet is held as usual” (this / banquet / as usual / hold) • Segmentation 2 • <word>這一晚</word> <word>會</word> <word>如常</word> <word>舉行</word> • 會 is read /wui3/: “Tonight will be held as usual” (tonight / will / as usual / hold)
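
The effect of the two segmentations on the reading of 會 can be sketched with a word-keyed lookup. This is an illustrative sketch only: the `<s>` wrapper, the table `WUI_READINGS` and the helper names are hypothetical, and the two readings are simply those given on the slide.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Readings of 會 as given on the slide, keyed by the word that contains it.
WUI_READINGS = {"晚會": "wui2", "會": "wui3"}

SEG1 = "<s><word>這一</word><word>晚會</word><word>如常</word><word>舉行</word></s>"
SEG2 = "<s><word>這一晚</word><word>會</word><word>如常</word><word>舉行</word></s>"


def words(sentence_xml: str) -> List[str]:
    """Return the text of every <word> element in document order."""
    root = ET.fromstring(sentence_xml)
    return [w.text or "" for w in root.iter("word")]


def reading_of_wui(segmented: List[str]) -> Optional[str]:
    """Pick the reading of 會 from the word in which it occurs."""
    for word in segmented:
        if "會" in word:
            return WUI_READINGS.get(word)
    return None


print(words(SEG1), reading_of_wui(words(SEG1)))  # 會 inside 晚會: wui2
print(words(SEG2), reading_of_wui(words(SEG2)))  # 會 on its own: wui3
```

The same characters yield different readings purely because the `<word>` boundaries differ, which is the motivation the slide gives for the element.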

  13. <tone> element

  14. Proposed <tone> Element • Tone • Important in Chinese pronunciation • Tones can vary according to differences in meaning, context and mode of speaking • 相 • in tone 2 means photo • in tone 3 means facial appearance / minister • Current SSML 1.0: <phoneme> • Requires a full pronunciation transcription • Example: <phoneme alphabet="x-lshk-jyutping" ph="soeng2">相</phoneme> <phoneme alphabet="x-lshk-jyutping" ph="soeng3">相</phoneme> • Proposed <tone> element • with the required “value” attribute: <tone value="2">相</tone> (photo) <tone value="3">相</tone> (facial appearance) • the alphabet attribute is either inherited or specified explicitly

  15. Examples of Using the <tone> Element • Tone changes with meaning • 糖 (candy / sugar) <tone value="2">糖</tone> (tone 2 /tong2/: means candy) <tone value="4">糖</tone> (tone 4 /tong4/: means sugar) • Tone changes with context • 爺 (grandfather) 阿<tone value="4">爺</tone> (tone 4 /je4/: preceded by 阿) 爺<tone value="2">爺</tone> (tone 2 /je2/: preceded by 爺) • Tone changes with mode of speaking • 英文 (English) 英<tone value="4">文</tone> (tone 4 /man4/: formal mode) 英<tone value="2">文</tone> (tone 2 /man2/: colloquial mode)
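
One way to read the proposal is that `<tone>` is shorthand for a `<phoneme>` whose syllable the engine already knows. The sketch below expands one into the other under that assumption; the table `BASE_SYLLABLE` (toneless jyutping syllables taken from the examples above) and the function `tone_to_phoneme` are hypothetical, and a real engine would draw the syllable from its own lexicon.

```python
import xml.etree.ElementTree as ET

# Toneless jyutping syllables for the characters used in the slides
# (hypothetical stand-in for a synthesizer's pronunciation lexicon).
BASE_SYLLABLE = {"相": "soeng", "糖": "tong", "爺": "je", "文": "man"}


def tone_to_phoneme(tone_xml: str, alphabet: str = "x-lshk-jyutping") -> str:
    """Expand a proposed <tone> element into an SSML 1.0 <phoneme> element."""
    tone = ET.fromstring(tone_xml)
    if tone.tag != "tone" or "value" not in tone.attrib:
        raise ValueError("expected a <tone> element with a 'value' attribute")
    char = (tone.text or "").strip()
    # Syllable from the lexicon plus the tone digit from the markup.
    ph = BASE_SYLLABLE[char] + tone.attrib["value"]
    phoneme = ET.Element("phoneme", {"alphabet": alphabet, "ph": ph})
    phoneme.text = char
    return ET.tostring(phoneme, encoding="unicode")


print(tone_to_phoneme('<tone value="2">相</tone>'))
print(tone_to_phoneme('<tone value="4">文</tone>'))
```

The author only has to supply the tone digit, while the `alphabet` default here stands in for the inherited alphabet mentioned on the previous slide.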

  16. Values for “interpret-as” in <say-as>

  17. Proposed Legal Values for “interpret-as” Attribute • VoiceXML2.0 Appendix P • boolean, date, digits, currency, number, phone, time • SSML 1.0 <say-as> attribute values (W3C Working Group Note 2005) • date, time, telephone, characters, cardinal, ordinal • Propose 6 new values: • Chinese-name, • fraction, • measure, • net, • percentage, • ratio

  18. “Chinese-name” Value • Specify as a name to aid pronunciation selection • 單明明: 單 /daan1/ → /sin6/ (surname), 明明 /ming4 ming4/ → /ming4 ming2/ (given name) • Format: S*G* • S: surname character, G: given-name character • Examples • <say-as interpret-as="Chinese-name" format="SG">姚明</say-as> (Yao Ming) • <say-as interpret-as="Chinese-name" format="SGG">單明明</say-as> (Sin Ming Ming) • <say-as interpret-as="Chinese-name" format="SSG">歐陽修</say-as> (Au-yeung Sau)
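
The S*G* format string lines up one letter with each character of the name, so splitting a name is a matter of counting the S letters. A minimal sketch follows; `split_chinese_name` is a hypothetical helper, and requiring the format to be exactly as long as the name is an assumption.

```python
import re
from typing import Tuple


def split_chinese_name(name: str, fmt: str) -> Tuple[str, str]:
    """Split a name into (surname, given name) using an S*G* format string."""
    if re.fullmatch(r"S*G*", fmt) is None:
        raise ValueError(f"format must match S*G*: {fmt!r}")
    # Assumption: one format letter per character of the name.
    if len(fmt) != len(name):
        raise ValueError("format and name must have the same length")
    surname_length = fmt.count("S")
    return name[:surname_length], name[surname_length:]


print(split_chinese_name("姚明", "SG"))      # ('姚', '明')
print(split_chinese_name("單明明", "SGG"))   # ('單', '明明')
print(split_chinese_name("歐陽修", "SSG"))   # ('歐陽', '修')
```

Once the surname is isolated, a synthesizer can look it up in a surname lexicon, which is how 單 would receive the reading /sin6/ in place of /daan1/.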

  19. “fraction” Value • Specify as fraction • e.g. 3/4 • Verbalization of a fraction in Chinese: • with an additional word: 分之 (out of) • A/B (A out of B): B 分之 A [note that the order is reversed!] • e.g. 3/4 is verbalized as 四 (four) 分之 (out of) 三 (three) • “format” and “detail” attributes not required • Example: 我吃了3/4個橙, i.e., 我 (I) 吃了 (ate) 3/4 個橙 (orange) • Markup: 我吃了<say-as interpret-as="fraction">3/4</say-as>個橙 • Verbalized as: 我吃了四分之三個橙 (I ate three-quarters of the orange)
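
The reversed order is easy to get wrong, so a short sketch may help. It assumes numerators and denominators between 0 and 99 and standard numeral forms; `to_chinese` and `verbalize_fraction` are hypothetical helper names, not part of the proposal.

```python
DIGITS = "零一二三四五六七八九"


def to_chinese(n: int) -> str:
    """Chinese numeral for 0..99 (sketch; larger numbers are out of scope)."""
    if not 0 <= n <= 99:
        raise ValueError("only 0..99 is supported in this sketch")
    tens, ones = divmod(n, 10)
    if tens == 0:
        return DIGITS[ones]
    head = "十" if tens == 1 else DIGITS[tens] + "十"
    return head + (DIGITS[ones] if ones else "")


def verbalize_fraction(text: str) -> str:
    """Verbalize 'A/B' as 'B 分之 A' (denominator first)."""
    numerator, denominator = (int(part) for part in text.split("/"))
    return f"{to_chinese(denominator)}分之{to_chinese(numerator)}"


print(verbalize_fraction("3/4"))  # 四分之三
```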

  20. “measure” Value • Specify as measurement • e.g. 10cm, 30ml • measurement = number + unit • number [VoiceXML2.0]; e.g. 10 is read as ten (not one zero) • unit: translated and pronounced in Chinese, e.g. cm is 厘米, g is 克, oz is 安士, yd is 碼 • “format” and “detail” attributes not required • Example: 他的身高是180cm, i.e., 他的 (his) 身高 (height) 是 (is) 180cm • Markup: 他的身高是<say-as interpret-as="measure">180cm</say-as> • Verbalized as: 他的身高是一百八十厘米 (his height is 180cm)
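
A measurement can be handled by splitting the number from the unit and verbalizing each part. The sketch below assumes whole numbers from 0 to 9999 and uses only the four unit translations listed on the slide; `UNITS_ZH`, `to_chinese` and `verbalize_measure` are hypothetical names.

```python
import re

DIGITS = "零一二三四五六七八九"
PLACES = ["", "十", "百", "千"]

# Unit translations taken from the slide; a real table would be much larger.
UNITS_ZH = {"cm": "厘米", "g": "克", "oz": "安士", "yd": "碼"}


def to_chinese(n: int) -> str:
    """Chinese numeral for 0..9999 (sketch)."""
    if not 0 <= n <= 9999:
        raise ValueError("only 0..9999 is supported in this sketch")
    if n == 0:
        return DIGITS[0]
    digits = str(n)
    parts = []
    pending_zero = False
    for index, char in enumerate(digits):
        digit = int(char)
        if digit == 0:
            pending_zero = True  # emit a single 零 before the next digit
            continue
        if pending_zero and parts:
            parts.append(DIGITS[0])
        pending_zero = False
        parts.append(DIGITS[digit] + PLACES[len(digits) - 1 - index])
    text = "".join(parts)
    # 10..19 are read 十, 十一, ... without the leading 一.
    return text[1:] if 10 <= n <= 19 else text


def verbalize_measure(text: str) -> str:
    """Verbalize '<number><unit>' as a Chinese numeral plus the Chinese unit."""
    match = re.fullmatch(r"(\d+)\s*([A-Za-z]+)", text.strip())
    if match is None:
        raise ValueError(f"not a measurement: {text!r}")
    number, unit = match.groups()
    return to_chinese(int(number)) + UNITS_ZH[unit]


print(verbalize_measure("180cm"))  # 一百八十厘米
```

An unknown unit such as `ml` raises `KeyError` here, since the slide does not give its Chinese translation.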

  21. “net” Value • Specify as URI or email address • Possible ways to verbalize a URI: • Read the whole string in English, including punctuation • Omit the scheme (http://, ftp://, etc.) and read the rest in English • Read letters in English and punctuation in Chinese • “format” attribute value: “email” or “uri” • Example: 詳情請瀏覽http://www.w3.org, i.e., 詳情 (for details) 請 (please) 瀏覽 (browse) • Markup: 詳情請瀏覽<say-as interpret-as="net" format="uri"> http://www.w3.org </say-as> • Possible verbalizations: • H T T P colon slash slash W W W dot W three dot O R G • W W W dot W three dot O R G • W W W 點 W 三 點 O R G (點: dot, 三: three) [the protocol part may similarly be kept as another option]
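
The three verbalization options can be sketched as one function with a style switch. The style names `"full"`, `"no-scheme"` and `"mixed"` are invented for this illustration (the proposal only defines the `format` values “email” and “uri”), and `verbalize_uri` is a hypothetical helper; symbols outside the small tables below are passed through unchanged.

```python
import re
import string

EN_DIGITS = ["zero", "one", "two", "three", "four",
             "five", "six", "seven", "eight", "nine"]
ZH_DIGITS = "零一二三四五六七八九"
EN_PUNCT = {":": "colon", "/": "slash", ".": "dot"}
ZH_PUNCT = {".": "點"}


def verbalize_uri(uri: str, style: str = "full") -> str:
    """Spell out a URI in one of three hypothetical styles.

    full      : everything in English, scheme included
    no-scheme : scheme (http://, ftp://, ...) dropped, rest in English
    mixed     : scheme dropped, letters in English, digits and dots in Chinese
    """
    if style not in ("full", "no-scheme", "mixed"):
        raise ValueError(f"unknown style: {style!r}")
    uri = uri.strip()
    if style != "full":
        uri = re.sub(r"^[A-Za-z][A-Za-z0-9+.-]*://", "", uri)
    chinese = style == "mixed"
    tokens = []
    for char in uri:
        if char in string.ascii_letters:
            tokens.append(char.upper())
        elif char in string.digits:
            tokens.append(ZH_DIGITS[int(char)] if chinese else EN_DIGITS[int(char)])
        else:
            table = ZH_PUNCT if chinese else EN_PUNCT
            tokens.append(table.get(char, char))
    return " ".join(tokens)


for style in ("full", "no-scheme", "mixed"):
    print(verbalize_uri(" http://www.w3.org ", style))
```

The three printed lines correspond to the three verbalizations listed on the slide.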

  22. “percentage” Value • Specify as percentage • Verbalization of a percentage in Chinese: • with an additional word: 百分之 (out of a hundred) • A%: 百分之 A • e.g. 70% is verbalized as 百分之 (out of a hundred) 七十 (seventy) • “format” and “detail” attributes not required • Example: 海洋約佔全球總面積的70%, i.e., 海洋 (ocean) 約佔 (covers) 全球 (global) 總面積 (surface) 的 70% • Markup: 海洋約佔全球總面積的<say-as interpret-as="percentage">70%</say-as> • Verbalized as: 海洋約佔全球總面積的百分之七十 (the ocean covers 70% of the global surface)

  23. “ratio” Value • Specify as ratio • e.g. 1:3 • Verbalization of a ratio in Chinese: • with an additional word: 比 (to) • A:B (A to B): A 比 B • e.g. 1:99 is verbalized as 一 (one) 比 (to) 九十九 (ninety-nine) • “format” and “detail” attributes not required • Example: 用1:99 的稀釋漂白水, i.e., 用 (use) 1:99 的 稀釋 (diluted) 漂白水 (bleach water) • Markup: 用<say-as interpret-as="ratio">1:99</say-as>的稀釋漂白水 • Verbalized as: 用一比九十九的稀釋漂白水 (use diluted bleach at a ratio of 1:99)
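
The “percentage” and “ratio” values both reduce to a fixed template around Chinese numerals: 百分之 A for A%, and A 比 B for A:B. The sketch below covers both, assuming whole numbers from 0 to 99; `to_chinese`, `verbalize_percentage` and `verbalize_ratio` are hypothetical names.

```python
DIGITS = "零一二三四五六七八九"


def to_chinese(n: int) -> str:
    """Chinese numeral for 0..99 (sketch; larger numbers are out of scope)."""
    if not 0 <= n <= 99:
        raise ValueError("only 0..99 is supported in this sketch")
    tens, ones = divmod(n, 10)
    if tens == 0:
        return DIGITS[ones]
    head = "十" if tens == 1 else DIGITS[tens] + "十"
    return head + (DIGITS[ones] if ones else "")


def verbalize_percentage(text: str) -> str:
    """Verbalize 'A%' as '百分之 A'."""
    text = text.strip()
    if not text.endswith("%"):
        raise ValueError(f"not a percentage: {text!r}")
    return "百分之" + to_chinese(int(text[:-1]))


def verbalize_ratio(text: str) -> str:
    """Verbalize 'A:B' as 'A 比 B' (order preserved, unlike fractions)."""
    left, right = (int(part) for part in text.split(":"))
    return f"{to_chinese(left)}比{to_chinese(right)}"


print(verbalize_percentage("70%"))  # 百分之七十
print(verbalize_ratio("1:99"))      # 一比九十九
```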

  24. Summary • “dialect-accent” attribute to enrich the xml:lang attribute • <phrase> and <word> for text processing • <tone> for pronunciation • 6 values for “interpret-as” attribute • Chinese-name • fraction • measure • net • percentage • ratio

  25. Thank You
