
Analysis Flow

The Impact of IDN Registration Policy by Unicode Variants Issue -- Case Study on Chinese Characters. Vincent WS Chen, TWNIC, October 2002.

Presentation Transcript


  1. The Impact of IDN Registration Policy by Unicode Variants Issue -- Case Study on Chinese Characters. Vincent WS Chen, TWNIC, October 2002

  2. Analysis Flow
     Abbreviations: VCP: valid code point; twRV: recommended variants by .tw; cnRV: recommended variants by .cn; CV: character variants.
     Inputs: registered IDN.com, IDN.net, and IDN.org names; registered IDN.tw names.
     Valid code points: 4E00-9FA5 (20,902).
     Name conflict analysis with the Chinese Character Mapping Table, reporting:
     • % collision with twRV
     • % collision with cnRV
     • % collision with twRV and cnRV
     • % collision with CV

  3. Chinese Character Mapping Table (CCMT) for Chinese Domain Name
     • The table draft was prepared by the CCMT Task Force, organized by TWNIC in January 2002.
     • The task force has nine members: linguists, computer experts, and DNS experts.
     • The table draft has been submitted to the Bureau of Standards, Ministry of Economic Affairs, for final review.
     • The CNS Standard version is tentatively scheduled for publication in December 2002.

  4. Chinese Character Mapping Table (CCMT) --- Sources of Character Codes
     Based on the UCS, CNS 14649, published in 2002, and referred to as the Mapping Table Source. The range of codes is described below:
     Block Name: CJK Unified Ideographs; Code Range: 4E00-9FA5 (20,902)
     • Character for registration (Valid code point): all Chinese character codes in the Mapping Table Source (20,902)
     • Primary corresponding character (Recommended variants by .tw): T-source Chinese character codes in the Mapping Table Source (18,368)
     • Secondary corresponding character (Recommended variants by .cn): G-source Chinese character codes in the Mapping Table Source (20,902)
     • Relevant character (Character variants): all Chinese character codes in the Mapping Table Source
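The code range and the hexadecimal code point notation used throughout the slides, such as 個(500B), can be checked with a few lines of Python. This is only an illustrative sketch; the names `VCP_FIRST`, `VCP_LAST`, and `code_point_label` are hypothetical and do not come from the presentation.

```python
# Bounds of the CJK Unified Ideographs block cited on the slide.
VCP_FIRST = 0x4E00
VCP_LAST = 0x9FA5

# Number of code points in the inclusive range 4E00-9FA5.
vcp_count = VCP_LAST - VCP_FIRST + 1


def code_point_label(ch: str) -> str:
    """Format one character in the slide's notation, e.g. 個(500B)."""
    return f"{ch}({ord(ch):04X})"


print(vcp_count)               # 20902
print(code_point_label("個"))  # 個(500B)
```

The count of 20,902 matches the figure given for the valid code points.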

  5. Chinese Character Mapping Table (CCMT) ---- Table format and categories

  6. Chinese Character Mapping Table (CCMT) ---- Table format and categories (cont.) ?(個(500B) 箇(7B87)): depending on context, the recommended variant for 个(4E2A) is sometimes 個(500B) and sometimes 箇(7B87).

  7. Chinese Character Mapping Table (CCMT) ---- Table format and categories (cont.) ?(發(767C) 髮(9AEE)): depending on context, the recommended variant for 发(53D1) is sometimes 發(767C) and sometimes 髮(9AEE).

  8. Chinese Character Mapping Table (CCMT) ---- Table format and categories (cont.) ?(發(767C) 髮(9AEE)): depending on context, the recommended variant for 発(767A) is sometimes 發(767C) and sometimes 髮(9AEE).

  9. Chinese Character Mapping Table (CCMT) ---- Table format and categories (cont.) ?(發(767C) 髮(9AEE)): depending on context, the recommended variant for 髪(9AEA) is sometimes 發(767C) and sometimes 髮(9AEE).

  10. Character Relationships
      1. Singular-relation character: a single character with VCP = twRV = cnRV
      2. Pair-relation character: a pair of characters (VCP1 and VCP2)
         2.1 twRV1 = cnRV1 = twRV2 = cnRV2
         2.2 (twRV1 = cnRV1 = cnRV2) ≠ twRV2
         2.3 (twRV1 = twRV2) ≠ (cnRV1 = cnRV2)
      3. Multiple-relation character: (VCP1, VCP2, VCP3, ...)
         3.1 with two or more twRV options (twRV11, twRV12, ...)

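  11. Chinese Character Mapping Table (CCMT) ---- Table characters
      • Singular-relation character (VCP = twRV = cnRV): 13,888 (66.4%)
      • VCP = twRV ≠ cnRV: 2,783 (13.3%)
      • VCP = cnRV ≠ twRV: 2,453 (11.7%)
      • VCP ≠ (twRV = cnRV): 333 (1.6%)
      • VCP ≠ twRV ≠ cnRV (written SCR on the original slide, apparently for "secondary corresponding character"): 387 (1.9%)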
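The percentages on this slide appear to be taken against the 20,902 valid code points. The short sketch below recomputes them under that assumption; the dictionary keys are informal labels chosen here, not identifiers from the presentation.

```python
TOTAL_VCP = 20902  # valid code points, 4E00-9FA5

# Category counts as listed on the slide.
category_counts = {
    "VCP = twRV = cnRV": 13888,
    "VCP = twRV != cnRV": 2783,
    "VCP = cnRV != twRV": 2453,
    "VCP != (twRV = cnRV)": 333,
    "VCP != twRV != cnRV": 387,
}

# Share of each category, rounded to one decimal place as on the slide.
percentages = {
    label: round(100 * count / TOTAL_VCP, 1)
    for label, count in category_counts.items()
}

listed = sum(category_counts.values())
unlisted = TOTAL_VCP - listed

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
print(f"listed: {listed}, not listed: {unlisted}")
```

The recomputed values agree with the slide. The five categories cover 19,844 code points; the slide does not say how the remaining 1,058 are classified.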

  12. Chinese Character Mapping Table (CCMT) for Chinese Domain Name

  13. Case Study -- Sources
      • Han-character IDN: an IDN that contains any CJK Unified Ideographs character
      • IDN.tw: the valid code points are those within the Big5 code range
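The two source filters can be approximated as follows. This is a rough sketch, not the method used in the study: `has_han` and `fits_big5` are hypothetical helper names, and Python's built-in `big5` codec is used as a stand-in for the registry's actual Big5 repertoire, which may differ in detail.

```python
def has_han(label: str) -> bool:
    """True if the label contains at least one character in 4E00-9FA5."""
    return any(0x4E00 <= ord(ch) <= 0x9FA5 for ch in label)


def fits_big5(label: str) -> bool:
    """True if every character can be encoded with Python's big5 codec.

    Assumption: the codec approximates the Big5 range used for IDN.tw.
    """
    try:
        label.encode("big5")
        return True
    except UnicodeEncodeError:
        return False


print(has_han("財神"), fits_big5("財神"))  # traditional form
print(has_han("财神"), fits_big5("财神"))  # simplified form
```

The simplified form 财神 is a Han-character label but falls outside Big5, which is why the IDN.tw code point set is smaller than the CCMT set.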

  14. Case Study Method: Apply Mapping Table to Case I ~ IV
      • Convert to twRV → collision with twRV
        竹叶青 → 竹葉青
        竹葉青 → 竹葉青
      • Convert to cnRV → collision with cnRV
        万事如意 → 万事如意
        萬事如意 → 万事如意
      • Convert to CV → collision with CV
        一个 → 一个、一個、一箇
        一個 → 一个、一個、一箇
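The method on this slide can be sketched as: convert every registered name with a mapping table, then treat names that convert to the same string as colliding. The tables below are toy fragments containing only the characters from the slide's examples, not the real CCMT, and all names in the code (`TO_TWRV`, `TO_CNRV`, `TO_CV_KEY`, `convert`, `find_collisions`) are hypothetical. Representing each character-variant class by one arbitrary member is a simplification made here.

```python
from collections import defaultdict

# Toy mapping fragments taken from the slide's examples (not the real CCMT).
TO_TWRV = {"叶": "葉"}            # recommended variant by .tw
TO_CNRV = {"萬": "万"}            # recommended variant by .cn
CV_CLASSES = [{"个", "個", "箇"}]  # one character-variant class

# Map each member of a variant class to one representative of that class.
TO_CV_KEY = {ch: min(cls) for cls in CV_CLASSES for ch in cls}


def convert(name: str, table: dict) -> str:
    """Replace each character by its table entry, leaving others unchanged."""
    return "".join(table.get(ch, ch) for ch in name)


def find_collisions(names: list, table: dict) -> list:
    """Group names that become identical after conversion."""
    groups = defaultdict(list)
    for name in names:
        groups[convert(name, table)].append(name)
    return [group for group in groups.values() if len(group) > 1]


registered = ["竹叶青", "竹葉青", "万事如意", "萬事如意", "一个", "一個"]

print(find_collisions(registered, TO_TWRV))    # [['竹叶青', '竹葉青']]
print(find_collisions(registered, TO_CNRV))    # [['万事如意', '萬事如意']]
print(find_collisions(registered, TO_CV_KEY))  # [['一个', '一個']]
```

A collision percentage for a zone could then be estimated as the number of names in collision groups divided by the number of registered names, although the slides do not spell out the exact formula used.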

  15. Case Study -- Result (only CJK domain names)

  16. Case Study Example: real case in IDN.com
      为什么, 为什麽, 为甚么, 為什么, 為什麼, 為甚麼
      These six registered names should be treated as one name.

  17. Case Study -- idn.tw Example
      1. The current valid code point set for IDN.tw is Big5 (13,051), which is smaller than that of the CCMT tables (20,902).
      2. The current tentative TC/SC mapping table (old version) differs slightly from the CCMT tables.
      3. Even though the applied table differs slightly, the number of name conflicts is greatly reduced.

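  18. Case Study -- real registered IDN name examples
      財產 / 財産 / 财产; 財產保險 / 财产保险; 財產稅 / 财产税; 財產管理 / 財産管理 / 财产管理; 財神 / 财神; 財神到 / 财神到; 財神爺 / 财神爷
      运财 / 運財; 运货汽车 / 運貨汽車; 运输 / 運輸; 运输学 / 運輸學; 运输服务 / 運輸服務; 运输设备 / 運輸設備; 運転 / 運轉
      龍圖蛇業 / 龙图蛇业
      龍之杰醫院 / 龙之杰医院; 龍之杰集團 / 龙之杰集团
      歯科材料 / 齒科材料 / 齿科材料
      黃金時代 / 黄金时代 / 黄金時代
      黃山中旅 / 黄山中旅; 黃山之旅 / 黄山之旅; 黃山國旅 / 黄山国旅; 黃山旅遊 / 黄山旅遊; 黃帝 / 黄帝
      鹿儿岛 / 鹿兒島; 鹿儿岛大学 / 鹿児島大学; 鹿児島市 / 鹿兒島市; 鹿児島銀行 / 鹿兒島銀行; 鹿岛 / 鹿島 / 鹿嶋; 鹿岛建设 / 鹿島建設
      麻将 / 麻將; 麻将世界 / 麻將世界; 麻将桌 / 麻將桌; 麻将馆 / 麻將館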

  19. Case Study -- Conclusion
      • IDN.com case: without any mechanism to reduce name confusion, about 18% to 23% of registered IDN.com names have a name conflict problem.
      • IDN.net case: about 16% to 21% (considering character variants).
      • IDN.org case: about 15% to 20% (considering character variants).
      • IDN.tw case: a very small percentage of name conflicts when the mapping table mechanism is applied.

  20. Case Study -- Conclusion (cont.)
      • The more IDN names are registered, the higher the percentage of name conflicts (idn.com has a higher percentage of name conflicts than idn.org).
      • For Chinese, applying the recommended variants rule removes most name conflicts, and applying the character variants rule reduces them further.
      • Without any mechanism to reduce name confusion, idn.com (242,512 IDN names), for example, will have about 18% to 23% name confusion. If the number of names increases, the percentage will increase too.
      • If the valid code points are expanded from CJK Unified Ideographs 4E00-9FA5 (20,902) to the whole Unicode code space, the situation will be worse than in this case study.
