Internationalization Bob Alcorn, Blackboard Inc.
And now a word from our lawyers… Any statements in this presentation about future expectations, plans and prospects for Blackboard and other statements containing the words "believes," "anticipates," "plans," "expects," "will," and similar expressions, constitute forward-looking statements within the meaning of The Private Securities Litigation Reform Act of 1995. Actual results may differ materially from those indicated by such forward-looking statements as a result of various important factors, including the factors discussed in the "Risk Factors" section of our most recent 10-K filed with the SEC. In addition, the forward-looking statements included in this press release represent the Company's views as of April 11, 2005. The Company anticipates that subsequent events and developments will cause the Company's views to change. However, while the Company may elect to update these forward-looking statements at some point in the future, the Company specifically disclaims any obligation to do so. These forward-looking statements should not be relied upon as representing the Company's views as of any date subsequent to April 11, 2005. Blackboard, in its sole discretion, may delay or cancel the release of any product or functionality described in this presentation.
Internationalization Overview • Internationalization (i18n) vs. Localization (l10n) • Character sets, Character Encoding • Unicode, ISO • Pitfalls • Blackboard Learning System™ Application Pack 2 Features • Looking Ahead
I18N vs L10N • I18N is the process of building the infrastructure to support multiple locales • String extraction • Data formatting • Character set encoding • L10N is the process of enabling a locale • Providing resource bundles • Dependent on depth of i18n
Concepts • Characters – the smallest components of written language that have semantic value • Glyphs – the shapes that characters can have when they are rendered or displayed • Not a one-to-one correspondence. E.g., different fonts, ligatures • Not what we care about vis-à-vis I18N…
Concepts • Character Sets • Collection of characters used to express a given written language, each assigned a numeric value • Kinda sorta language specific • Character Set Encoding • Binary encoding for numeric values in a character set • Sometimes used interchangeably with character set… • E.g., MIME type “text/html;charset=iso-8859-1”… the “character set” is ISO-8859-1. • For our purposes, we can treat them as synonymous.
Concepts • Universal Character Set (UCS) • Unambiguous numeric value for every character in every language (more or less) • UCS can be unambiguously encoded with… • UTF-16 • UCS-2 (subset of UTF-16) • UTF-8 • Requires multi-byte encoding
Concepts • Unambiguous encoding enables “co-existence” of several different languages in a single data stream • Single-byte encodings require a “marker” to know that any given value above 127 (e.g., 232) is to be interpreted as a different character • Example: you couldn’t directly encode Hebrew and Cyrillic (with ISO-8859-8 and ISO-8859-5, respectively) in the same database field.
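The co-existence point can be illustrated with a minimal sketch. It assumes a JDK that ships the ISO-8859-5 charset; the class name MixedScriptDemo and the sample string are made up for illustration. A string mixing Hebrew and Cyrillic survives a UTF-8 round trip, but the Hebrew half is lost when the same string is forced through a single-byte Cyrillic encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MixedScriptDemo {
    // Hebrew "shalom" followed by Cyrillic "privet", written with Unicode
    // escapes so this source file stays pure ASCII.
    static final String MIXED =
        "\u05E9\u05DC\u05D5\u05DD \u041F\u0440\u0438\u0432\u0435\u0442";

    // Encode to bytes with the given charset, then decode with the same one.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // for any character the charset cannot represent.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        Charset cyrillic = Charset.forName("ISO-8859-5");
        // UTF-8 can carry both scripts in one stream.
        System.out.println(roundTrip(MIXED, StandardCharsets.UTF_8).equals(MIXED));
        // ISO-8859-5 has no Hebrew letters, so the round trip is lossy.
        System.out.println(roundTrip(MIXED, cyrillic).equals(MIXED));
    }
}
```

The same loss would happen in a database column whose storage encoding is a single ISO-8859 part.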
Concepts • The “ISO 8859 Planes” • Set of single-byte encodings (at most 256 UCS characters per encoding) • ISO-8859-X, where X = 1 . . 16 (there is no part 12) • Superset of 7-bit US-ASCII (values 0-127 are identical) • Windows encoding is NOT ISO-8859-1 • CP1252. Similar except for the range 0x80-0x9F, which holds control characters in ISO-8859-1 and printable characters in CP1252
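A small sketch of the CP1252 difference, assuming the JDK provides the windows-1252 charset (the class name Cp1252Demo is hypothetical): the same byte decodes to different characters in the 0x80-0x9F range, while bytes outside that range agree.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Cp1252Demo {
    // Decode a single byte under the given charset.
    static String decode(int byteValue, Charset charset) {
        return new String(new byte[] { (byte) byteValue }, charset);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // 0x80 is the euro sign in CP1252 but a control character in ISO-8859-1.
        System.out.println(decode(0x80, cp1252).equals("\u20AC"));
        System.out.println(decode(0x80, StandardCharsets.ISO_8859_1).equals("\u0080"));
        // 0xE8 (e with grave accent) is identical in both encodings.
        System.out.println(decode(0xE8, cp1252).equals(decode(0xE8, StandardCharsets.ISO_8859_1)));
    }
}
```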
The Pipeline • Browser: May see arbitrary text streams from servers. Posts in same encoding. • Application Server: Internally Unicode. Handles translation from browser to database • Database Server: Dependent on database storage type (char, nchar) and various settings
The Application Server • Internally Unicode (Java uses UTF-16 internally) • Java I/O APIs take encoding into consideration • Specifically java.io.Reader and java.io.Writer • Map bytes from HTTP input stream to characters
The Database • CHAR vs. NCHAR • Single vs. multi-byte • CHAR can still be “internationalized” with different collations
What’s Wrong Here?
Client:
    String value = "problème";
    socketStream.write( value.getBytes() );
Server:
    byte[] byteBuf = new byte[1024];
    int count = socketStream.read( byteBuf );
    String value = new String( byteBuf, 0, count );
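One likely answer is that both sides rely on the platform default encoding, which may differ between client and server. The sketch below names the encoding explicitly on both ends. It uses in-memory streams in place of the slide's socketStream, and the class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Client side: encode with an agreed charset, not the platform default.
    static void send(OutputStream out, String value) throws IOException {
        out.write(value.getBytes(StandardCharsets.UTF_8));
    }

    // Server side: decode with the same agreed charset.
    // A real socket may deliver a message across several reads; a single
    // read is kept here only to mirror the slide.
    static String receive(InputStream in) throws IOException {
        byte[] byteBuf = new byte[1024];
        int count = in.read(byteBuf);
        if (count < 0) {
            return "";
        }
        return new String(byteBuf, 0, count, StandardCharsets.UTF_8);
    }

    // Simulates the wire with a byte array.
    static String roundTrip(String value) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            send(wire, value);
            return receive(new ByteArrayInputStream(wire.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("probl\u00E8me"));
    }
}
```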
What’s Wrong Here?
    File path = storeFileFromRequest();
    FileReader fr = new FileReader( path );
    char[] buf = new char[1024];
    StringBuffer str = new StringBuffer();
    while( fr.read( buf ) != -1 ) {
        str.append( buf );
    }
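Two problems stand out in this snippet: FileReader decodes with the platform default encoding, and append( buf ) adds the whole buffer even when read() filled only part of it. A hedged sketch of one way to address both follows. It reads from any InputStream so that it is self-contained, and the helper name readAll is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReaderDemo {
    // Read the whole stream as text, decoding with an explicit charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder str = new StringBuilder();
        char[] buf = new char[1024];
        try (Reader reader = new InputStreamReader(in, charset)) {
            int count;
            while ((count = reader.read(buf)) != -1) {
                // Append only the characters actually read in this pass.
                str.append(buf, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return str.toString();
    }

    public static void main(String[] args) {
        byte[] stored = "probl\u00E8me".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(stored), StandardCharsets.UTF_8));
    }
}
```

For a file, the same loop applies with a FileInputStream wrapped in the InputStreamReader.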
Encodings and Transformations • Text: P r o b l è m e • ISO-8859-1: x50 x72 x6F x62 x6C xE8 x6D x65, from byte[] bytes = value.getBytes() when the platform default encoding is ISO-8859-1 • UTF-8: x50 x72 x6F x62 x6C xC3 xA8 x6D x65, from byte[] bytes = value.getBytes( "UTF-8" ) • Reading UTF-8 as ISO-8859-1: P r o b l Ã ¨ m e, from new String( bytes )
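The misreading on this slide can be reproduced with a short sketch (the class name MojibakeDemo is illustrative). It encodes as UTF-8, decodes as ISO-8859-1, and then reverses the mistake, which works here because ISO-8859-1 maps every byte value to a character.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode as UTF-8 but decode as ISO-8859-1: each byte of a multi-byte
    // sequence turns into its own character.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undo the mistake by recovering the original bytes and decoding them
    // with the encoding they were really written in.
    static String repair(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "Probl\u00E8me";
        String garbled = misread(text);
        System.out.println(garbled.length());              // one character longer than the input
        System.out.println(repair(garbled).equals(text));
    }
}
```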
Encodings and Transformations • Text: P r o b l è m e • UTF-16 (LE): x50 x00 x72 x00 x6F x00 x62 x00 x6C x00 xE8 x00 x6D x00 x65 x00 • Reading UTF-16 (LE) as ISO-8859-1: P □ r □ o □ b □ l □ è □ m □ e □
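To inspect byte sequences like these directly, a small hex-dump helper is enough. This is a sketch; the class name and the xNN output style are chosen only to match the notation on the slides.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HexDumpDemo {
    // Render the encoded bytes of a string in the slides' xNN notation.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0-255 so negative byte values print as two hex digits.
            sb.append(String.format("x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Probl\u00E8me";
        System.out.println(hex(text, StandardCharsets.ISO_8859_1));
        System.out.println(hex(text, StandardCharsets.UTF_8));
        System.out.println(hex(text, StandardCharsets.UTF_16LE));
    }
}
```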
Blackboard Academic Suite ™ Version 6, Application Pack 2 • Internationalized! • Text extracted into locale-specific resource bundles • Application code uses locale settings for formatting (numbers, dates, names) • ISO-8859-1 only • In theory, any 1-byte encoding could be used, but it is not being tested
Blackboard Academic Suite ™ Version 6, Application Pack 3 • Internationalized! • Support multiple locales simultaneously • Per-course, Per user settings • Blackboard Building Blocks™ view doesn’t change • Locale negotiation is still transparent
Blackboard Academic Suite ™ Release 7 • Final Stage in internationalization • Full multi-byte support from browser to database • Multi-byte file name handling, independent of server file system • Localizable Blackboard Building Blocks manifests • “Language Pack Editor”
Application Pack 2 – Blackboard Building Blocks View • What’s the current locale? • Display this datum using current locale settings… • Parse this datum using the current locale settings…
Making it Work • Automatically handled through tag library <bb:docTemplate></bb:docTemplate> HTTP Header: Content-type: text/html;charset=ISO-8859-1 HTML: <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
Making it Work • If you’re not using the tag, you can still set the appropriate values
    String encoding = BbServiceManager
        .getConfigurationService()
        .getBbProperty( "bbconfig.webserver.charset" );
    response.setContentType( "text/html;charset=" + encoding );
I18N API • LocaleManager • Fail-safe service (no service exceptions; falls back to en_US locale) • Hides locale negotiation • AP2 is per-VI; ML feature set has per course, or per user. LocaleManager localeManager = BbServiceManager.getLocaleManager();
I18N API • BbLocale • Wraps JDK locale object • Auto-negotiated for current context • Important for dynamic, multi-locale capabilities • Wraps utility functions BbLocale locale = localeManager.getLocale(); locale.getLocale();
Application
<%
    Locale locale = BbServiceManager.getLocaleManager()
        .getLocale().getLocaleObject();
    ResourceBundle bundle = ResourceBundle.getBundle( "resources", locale );
    String pageTitle = bundle.getString( "index.page.title" );
    String formatDemoTitle = bundle.getString( "format.demo.title" );
    String formatDemoDesc = bundle.getString( "format.demo.desc" );
    String inputDemoTitle = bundle.getString( "input.demo.title" );
    String inputDemoDesc = bundle.getString( "input.demo.desc" );
%>
<bbUI:docTemplate title="<%=pageTitle%>">
    <bbUI:titleBar><%=pageTitle%></bbUI:titleBar>
    <bbUI:caretList>
        <bbUI:caret title="<%=formatDemoTitle%>" href="<%=Util.getFullUri( request, Constants.URI_LOCALE_DATA )%>">
            <%=formatDemoDesc%>
        </bbUI:caret>
        <bbUI:caret title="<%=inputDemoTitle%>" href="<%=Util.getFullUri( request, Constants.URI_DATA_INPUT )%>">
            <%=inputDemoDesc%>
        </bbUI:caret>
    </bbUI:caretList>
</bbUI:docTemplate>
What’s Wrong Here?
    String dateString = dateValue.toString();
    out.println( dateString );
Date.toString() does not output a locale-appropriate string or give any options for formatting.
toString(): Mon Jul 19 21:02:13 GMT-05:00 2004
In French: 19 juil. 2004 21 h 02 GMT-05:00
In English: Jul 19, 2004 9:02:13 PM GMT-05:00
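With plain JDK classes, a locale-aware alternative to toString() might look like the sketch below. It is not the Blackboard API: the class name and the choice of MEDIUM date and LONG time styles are assumptions, and the exact strings produced vary between JDK versions and locale data sets, so none are shown here.

```java
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateFormatDemo {
    // 2004-07-19 21:02:13 at GMT-05:00, the instant used on the slide.
    static final Date SAMPLE = new Date(1090288933000L);

    // Format a date and time according to the conventions of the given locale.
    static String format(Date date, Locale locale) {
        DateFormat fmt = DateFormat.getDateTimeInstance(
            DateFormat.MEDIUM, DateFormat.LONG, locale);
        // Pin the time zone so the output does not depend on the host machine.
        fmt.setTimeZone(TimeZone.getTimeZone("GMT-05:00"));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        System.out.println(format(SAMPLE, Locale.FRENCH));
        System.out.println(format(SAMPLE, Locale.US));
    }
}
```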
Displaying Data value = locale.formatDate( dateValue, BbLocale.Date.SHORT ); value = locale.formatNumber( floatValue ); value = locale.formatName( user, BbLocale.Name.SHORT );
Displaying Data • Corresponds to Java libraries (see java.text.*), but with formats predefined to Blackboard UI conventions. • DateFormat.format() • DecimalFormat.format() • NumberFormat.getPercentInstance().format()
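For reference, the underlying java.text calls behave roughly as in this sketch. It shows the standard JDK classes only, not the BbLocale wrappers, whose predefined formats are not described in this deck; the class name is illustrative.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class NumberFormatDemo {
    // Format a number with the locale's grouping and decimal separators.
    static String formatNumber(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Format a fraction (0.25) as a percentage (25%).
    static String formatPercent(double fraction, Locale locale) {
        return NumberFormat.getPercentInstance(locale).format(fraction);
    }

    public static void main(String[] args) {
        System.out.println(formatNumber(1234567.891, Locale.US));       // 1,234,567.891
        System.out.println(formatNumber(1234567.891, Locale.GERMANY));  // 1.234.567,891
        System.out.println(formatPercent(0.25, Locale.US));             // 25%
    }
}
```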
Format Enumerations • BbLocale.Name • LONG, MEDIUM, SHORT, GREETING • BbLocale.Date • LONG, MEDIUM, SHORT • BbLocale.Time • LONG, MEDIUM, SHORT
What’s Wrong Here?
    String numberString = "100,000.00";
    float floatVal = Float.parseFloat( numberString );
The parseXxx() methods on the Number subclasses (e.g., Float.parseFloat()) do not perform locale-sensitive transformations, and they reject grouping separators outright. Many European locales, for example, use a comma as the decimal separator and a period as the grouping separator. E.g., 100.000,00
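A sketch of locale-sensitive parsing with the standard java.text.NumberFormat class follows. The class and helper names are made up for illustration, and this is plain JDK code rather than the BbLocale API.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class NumberParseDemo {
    // Parse user input according to the separators of the given locale.
    static double parse(String input, Locale locale) {
        try {
            return NumberFormat.getNumberInstance(locale).parse(input).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number for " + locale + ": " + input, e);
        }
    }

    // Reports whether Float.parseFloat() refuses the input.
    static boolean rejectedByParseFloat(String input) {
        try {
            Float.parseFloat(input);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectedByParseFloat("100,000.00"));     // true
        System.out.println(parse("100,000.00", Locale.US));         // 100000.0
        System.out.println(parse("100.000,00", Locale.GERMANY));    // 100000.0
    }
}
```

NumberFormat.parse() stops at the first character it cannot interpret instead of failing, so production code may also want to check that the whole input was consumed, for example with a ParsePosition.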
Reading Data BbLocale locale = BbServiceManager .getLocaleManager() .getLocale(); float floatValue = locale.parseNumberAsFloat( input ); double doubleValue = locale.parseNumber( input );
Limitations • Pre-R7, B2 Manifest is single locale • Block installs as en_US, and always displays as en_US • E.g., if Locale is es_ES, links are not rendered properly • Pre-R7, Multi-byte locales not supported • Incompatible, single-byte locales not verified (e.g., ISO-8859-5 will not co-exist with ISO-8859-1)
Looking Ahead – Blackboard Academic Suite™ Release 7 • Complete internationalization • Additional extraction and localization • Platform changes to support additional, non-Latin languages • Multi-byte I/O support • Database (NVARCHAR, etc.) • UTF-8/16 encoding browser to application, application to database • UCS-2 on Windows
Looking Ahead – Blackboard Academic Suite ™ Release 7 • Blackboard Building Blocks Resource Bundles • Register bundles to display appropriate end-user text • Multiple Locales in Blackboard Building Blocks • Installation, default, and fall-back rules