CLDR: The Common Locale Data Repository Locales for the World


Presentation Transcript


  1. CLDR: The Common Locale Data Repository • Locales for the World • Lisa Moore, George Rhoten, Mark Davis, Steven Loomis

  2. Agenda • Why CLDR? • CLDR data • Tools and vetting • Today and the future LRC – XI The Localisation Factory

  3. Agenda • Why CLDR? • CLDR data • Tools and vetting • Today and the future

  4. Locales – does anything stay the same? "Theatre Center News: The date of the last version of this document was 2003年3月20日. A copy can be obtained for $50,0 or 1.234,57 грн. We would like to acknowledge contributions by the following authors (in alphabetical order): Alaa Ghoneim, Behdad Esfahbod, Ahmed Talaat, Eric Mader, Asmus Freytag, Avery Bishop, and Doug Felt."

  5. Locales – the many differences • Locales specify user preferences • Linguistic and cultural differences • Languages, scripts, writing systems, ordering, directionality, formatting, numbers, sizes • Even in the same locale, interoperability issues across platforms • Global economics has increased the need for greater globalization support in computer systems • Everyone expects more!

  6. Add the Universal Character Encoding • Unicode: Unique character codes for all languages …

  7. The Need for Common Locale Data • Computing environments often contain a variety of operating systems and software. • Historically, research into locale-sensitive data has been done separately by individuals and/or companies. • Because of political changes, it is easy for locale data to become out of date. • It is difficult to get complete agreement on correctness.

  8. Common Locale Data Project • Began as Common XML Locale Repository (CXLR) developed by OpenI18N in 2003 • CLDR project began in 2004 • Hosted by Unicode Consortium • http://www.unicode.org/cldr/ • Goals: • Common, necessary software locale data for all world languages • Collect and maintain locale data • XML format for effective interchange • Freely available

  9. CLDR in use (partial list) • Libraries and Environments • ICU – International Components for Unicode • JDK – Java Development Kit • Operating Systems • Solaris • AIX • MacOS X • Applications • OpenOffice.org • Acrobat • ModernBill

  10. Agenda • Why CLDR? • CLDR data • Tools and vetting • The future

  11. What is a Locale? • A locale is an identifier referring to linguistic and cultural preferences • en_US, en_GB, ja_JP • These preferences can change over time due to cultural and political reasons • Introduction of new currencies, like the Euro • Standard sorting of Spanish changes • Many of these preferences have varying degrees of standardization • 12 and 24 hour format in the United States • This is a very broad topic
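As a rough illustration of the identifiers on this slide, the sketch below splits a simple language_TERRITORY identifier such as en_US into its parts. The function name `split_locale` is hypothetical, and the sketch assumes only the two-part form shown on the slide; real identifiers can carry further fields that this code does not handle.

```python
def split_locale(identifier):
    """Split a simple 'language_TERRITORY' identifier into (language, territory)."""
    # Assumption for convenience: treat '-' the same as '_' as a separator.
    parts = identifier.replace("-", "_").split("_")
    language = parts[0].lower()
    # The territory is optional; a bare language code yields None.
    territory = parts[1].upper() if len(parts) > 1 else None
    return language, territory


for loc in ("en_US", "en_GB", "ja_JP"):
    print(loc, "->", split_locale(loc))
```

Keeping language and territory separate matters because, as the later slides note, much locale data is stored at the language level and only overridden per territory where it differs.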

  12. Types of Locale Data • Dates/time/calendar formats • Number/currency formats • Measurement system • Collation specification • Sorting • Searching • Matching • Translated names for language, territory, script, timezones, currencies,… • Script and characters used by a language

  13. Locale Data Markup Language • Locale data described using XML • CLDR data uses LDML • Structure of CLDR controlled by Locale Data Markup Language (LDML) specification http://unicode.org/reports/tr35

  14. LDML Data Categories <ldml> <identity> <localeDisplayNames> <layout> <characters> <delimiters> <measurement> <dates> <numbers> <posix> <collations>

  15. Names <localeDisplayNames> • Provides translated display names for languages, territories, scripts, variants and keywords used in CLDR. • Most of this information is at the language level, since it typically does not vary by territory, only language. • An example: ICU Locale Explorer

  16. Names Examples From ga.xml (Irish):
      <localeDisplayNames>
        <languages>
          <language type="aa">Afar</language>
          <language type="ab">Abcáisis</language>…
        <scripts>
          <script type="Arab">Araibis</script>…
        <territories>
          <territory type="AD">Andóra</territory>
          <territory type="AE">Aontas na nÉimíríochtaí Arabacha</territory>…
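To show how display-name data of this shape might be consumed, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The fragment is a closed-up copy of the slide's excerpt (closing tags added so that it parses), and the helper `display_name` is a hypothetical name, not part of any CLDR tool.

```python
import xml.etree.ElementTree as ET

# Well-formed version of the slide's ga.xml excerpt; the elided entries are omitted.
LDML_FRAGMENT = """
<ldml>
  <localeDisplayNames>
    <languages>
      <language type="aa">Afar</language>
      <language type="ab">Abcáisis</language>
    </languages>
    <scripts>
      <script type="Arab">Araibis</script>
    </scripts>
    <territories>
      <territory type="AD">Andóra</territory>
      <territory type="AE">Aontas na nÉimíríochtaí Arabacha</territory>
    </territories>
  </localeDisplayNames>
</ldml>
"""


def display_name(root, group, element, code):
    """Return the translated name for a code, or None if the fragment lacks it."""
    node = root.find(f"localeDisplayNames/{group}/{element}[@type='{code}']")
    return node.text.strip() if node is not None else None


root = ET.fromstring(LDML_FRAGMENT)
print(display_name(root, "territories", "territory", "AE"))
print(display_name(root, "scripts", "script", "Arab"))
```

Returning `None` for a missing code leaves the fallback decision to the caller, for example looking the code up in a parent locale.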

  17. Characters <characters> • Allows for creation of exemplar character sets. An exemplar set specifies the set of characters that must be present in order to properly render the language. • Auxiliary exemplar set defines additional characters that may appear in foreign words or phrases. • Lower case only

  18. Date Formats <dates> • Defines representation of calendars using various calendaring systems (Gregorian, Buddhist, Islamic, Japanese, etc.) • Defines formatting for dates, times, eras and time zones • wide, abbreviated, or narrow • Date and time formats use patterns of letters to define proper formatting • Week information • Relative day/time translations (for example, yesterday, tomorrow, etc.) • An example: ICU Locale Explorer

  19. Characters / Dates Examples From ga.xml (Irish):
      <characters>
        <exemplarCharacters>[a á b-e é f-i í j-o ó p-u ú v-z]</exemplarCharacters>
        <exemplarCharacters type="auxiliary">[ḃ ċ ḋ ḟ ġ ṁ ṗ ṡ ṫ]</exemplarCharacters>
      </characters>…
      <dayContext type="format">
        <dayWidth type="abbreviated">
          <day type="sun">Domh</day>
          <day type="mon">Luan</day>…
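The exemplar set above is a compact notation: single characters plus ranges such as b-e. The sketch below expands the Irish main set into individual characters and checks whether a piece of text stays inside it. It assumes only the simple space-separated form shown on the slide, not any richer set syntax, and the function names are hypothetical.

```python
def expand_exemplars(pattern):
    """Expand a simple exemplar pattern like '[a á b-e]' into a list of characters."""
    body = pattern.strip().strip("[]")
    chars = []
    for token in body.split():
        if len(token) == 3 and token[1] == "-":
            # A range such as 'b-e' covers every code point from 'b' through 'e'.
            chars.extend(chr(cp) for cp in range(ord(token[0]), ord(token[2]) + 1))
        else:
            chars.append(token)
    return chars


def missing_letters(text, exemplars):
    """Return the letters of text (lower-cased) that fall outside the exemplar set."""
    allowed = set(exemplars)
    return sorted({ch for ch in text.lower() if ch.isalpha() and ch not in allowed})


irish_main = expand_exemplars("[a á b-e é f-i í j-o ó p-u ú v-z]")
print(len(irish_main))
print(missing_letters("Dia dhuit", irish_main))
print(missing_letters("mañana", irish_main))
```

The text is lower-cased before the check because exemplar sets are given in lower case only.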

  20. Time Zone Names <timeZoneNames> • Based on Olson time zone database • Localized display names for standard, daylight, and generic representations of time zones. • Short and long display names.

  21. Numbers <numbers> • Specifies proper localized formatting of numeric quantities • Decimal • Scientific • Currency • Percentages • Includes localized decimal, thousands separators, currency symbols, etc.
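To make the role of localized separators concrete, the sketch below formats the same quantity with two different decimal and grouping separators. The separators are passed in as plain parameters standing in for what a locale's number data would supply; the function `format_decimal` is hypothetical and does not read CLDR files or handle other grouping sizes.

```python
def format_decimal(value, decimal_sep=".", group_sep=",", digits=2):
    """Format a number with the given decimal and thousands separators."""
    # Start from Python's built-in grouping, which always uses ',' and '.'.
    text = f"{value:,.{digits}f}"
    # Swap through a placeholder so the two replacements cannot collide.
    return text.replace(",", "\0").replace(".", decimal_sep).replace("\0", group_sep)


print(format_decimal(1234.57))            # separators as in en_US
print(format_decimal(1234.57, ",", "."))  # separators as in the "1.234,57" example
```

With the second call the output matches the "1.234,57" style that appears in the opening sample text of the presentation.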

  22. Time Zones / Currencies From ga.xml (Irish) and root.xml:
      <timeZoneNames>
        <zone type="Europe/Dublin">
          <long>
            <standard>Meán-Am Greenwich</standard>
            <daylight>Am Samhraidh na hÉireann</daylight>
          </long>…
      <numbers>
        <currencies>
          <currency type="EUR">
            <displayName>Euro</displayName>
            <symbol>€</symbol>…

  23. Delimiters <delimiters> • Specifies a primary and a secondary pair of delimiter characters to be used for bracketing quotations in text

  24. Delimiters Example From fr.xml (French):
      <delimiters>
        <quotationStart>«</quotationStart>
        <quotationEnd>»</quotationEnd>
        <alternateQuotationStart>“</alternateQuotationStart>
        <alternateQuotationEnd>”</alternateQuotationEnd>
      </delimiters>
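A short sketch of how the French delimiter data could be applied: the primary pair wraps a quotation, and the alternate pair wraps a quotation nested inside it. The XML is copied from the slide; `load_delimiters` and `quote` are hypothetical helper names, and locale-specific spacing inside the marks is deliberately left out.

```python
import xml.etree.ElementTree as ET

FR_DELIMITERS = """
<delimiters>
  <quotationStart>«</quotationStart>
  <quotationEnd>»</quotationEnd>
  <alternateQuotationStart>“</alternateQuotationStart>
  <alternateQuotationEnd>”</alternateQuotationEnd>
</delimiters>
"""


def load_delimiters(xml_text):
    """Map each child element name of <delimiters> to its character."""
    return {child.tag: child.text for child in ET.fromstring(xml_text)}


def quote(text, delims, nested=False):
    """Wrap text in the primary pair, or in the alternate pair when nested."""
    if nested:
        return delims["alternateQuotationStart"] + text + delims["alternateQuotationEnd"]
    return delims["quotationStart"] + text + delims["quotationEnd"]


fr = load_delimiters(FR_DELIMITERS)
inner = quote("oui", fr, nested=True)
print(quote("Il a dit " + inner, fr))
```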

  25. Collation <collations> • Information in collation directory, not main • XML version of Java/ICU collation syntax • Unicode collation algorithm is the base http://unicode.org/reports/tr10 • Allows tailoring of the UCA on a per locale basis.

  26. Collation Example From collations/root.xml:
      <collations validSubLocales="ga ga_IE id id_ID ms ms_BN ms_MY nl nl_BE nl_NL pt pt_BR pt_PT">
        <collation type="standard">
          <rules> ...
            <s>ā</s> <t>Ā</t>
            <s>á</s> <t>Á</t>
            <s>ǎ</s> <t>Ǎ</t>
            <s>à</s> <t>À</t>…

  27. Agenda • Why CLDR? • CLDR data • Tools and vetting • Today and the future

  28. CLDR Tools • Export • ICU resource bundle generation • POSIX locale generator • OpenOffice.org format export • Survey tool • http://www.unicode.org/cgi-bin/cldr-survey

  29. Vetting Process for Data • Collect from different platforms, experts, submissions: new or revised • References to external sources strongly encouraged • Must be before freeze date for release • Use Survey Tool to Collect Data

  30. Causes of Conflicting Data • Typographical errors • Canda instead of Canada • Regional differences • German spelling is different between countries • Parts of speech • “март 2004” versus “3 марта” when the Russian word for March is used in a date • Context of usage • Normal German sorting versus German phonebook sorting • Standards versus common use • “Republic of Laos” versus “Laos” • Individual preferences • 24 hour time format versus 12 hour time format

  31. Agenda • Why CLDR? • CLDR data • Tools and vetting • Today and the future

  32. Latest Release: CLDR 1.4 • Released: July 17, 2006 • 360 locales: • 121 languages • 142 territories • 25% more data • 17,000 new or modified data items • Over 100 different contributors

  33. Challenges • Complex Formats • Experts knowledgeable both in technology and a specific language • Collation • Exemplar characters • Etc… • Require close interaction of CLDR experts with language experts

  34. Getting Involved • Simplest – anyone! • Use CLDR • Bug report / feature request • More Involved • Vetting, Assessment, Tools, Policies, Decisions, … • Any Unicode member eligible to name representatives including country liaison members

  35. Example Country Process (Finland) • Finnish Ministry of Education made CLDR data a major goal, 2004-06 • Research Institute for the Languages of Finland (“RILF” aka “Kotus”) designated agency • Two official languages (Finnish and Swedish) & four regional / minority languages (three Sámi & Romani as spoken in Finland) to be covered • Over 30 different parties represented: commercial, non-commercial, individuals • Results expected to lead to new/revised national standards

  36. For More Information • Unicode • http://www.unicode.org/ • CLDR • http://www.unicode.org/cldr/ • LDML specification • http://unicode.org/reports/tr35 • lisam@us.ibm.com