
Supporting Complex Scripts (such as Arabic and Hebrew) in your Windows 2000™ Application, F. Avery Bishop, Senior Program Manager, Microsoft Corporation


Presentation Transcript


  1. Supporting Complex Scripts (such as Arabic and Hebrew) in your Windows 2000™ Application. F. Avery Bishop, Senior Program Manager, Microsoft Corporation

  2. Agenda: • Overview of character encoding, Unicode • Guidelines for supporting complex scripts • Right-to-left layout of applications • Multilingual User Interface

  3. Overview of Character Encoding and Unicode

  4. Why do character set differences matter? • Historically, they fragmented code bases for both Windows and applications • Single byte: European editions • Double byte: Far East editions • Bi-directional: Middle East editions • Make it difficult to share data • Make it difficult to develop multilingual applications

  5. Example: Multiple Hebrew Character Encodings • 8-bit Hebrew encodings still in use • Windows codepage 1255 • OEM (DOS) codepage 862 • Visual Hebrew encodings (many exist)
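
To make the point concrete, here is a minimal sketch that maps the 27 Hebrew letters alef through tav, including final forms (U+05D0 to U+05EA in Unicode), to their byte values in the two 8-bit code pages listed above: 0xE0 to 0xFA in Windows codepage 1255 and 0x80 to 0x9A in OEM codepage 862. The function names are hypothetical, and only ASCII and these base letters are handled; vowel points, cantillation marks, and other symbols are not.

```c
#include <stdint.h>

/* Hebrew letters alef (U+05D0) through tav (U+05EA), including final forms. */
#define HEBREW_FIRST 0x05D0u
#define HEBREW_LAST  0x05EAu

/* Returns the Windows-1255 byte for a UTF-16 code unit, or '?' when this
   sketch has no mapping (only ASCII and the base letters are handled). */
unsigned char utf16_to_cp1255(uint16_t ch)
{
    if (ch < 0x80)
        return (unsigned char)ch;                  /* ASCII is shared */
    if (ch >= HEBREW_FIRST && ch <= HEBREW_LAST)
        return (unsigned char)(0xE0 + (ch - HEBREW_FIRST));
    return '?';
}

/* Same idea for OEM (DOS) codepage 862, where the letters start at 0x80. */
unsigned char utf16_to_cp862(uint16_t ch)
{
    if (ch < 0x80)
        return (unsigned char)ch;                  /* ASCII is shared */
    if (ch >= HEBREW_FIRST && ch <= HEBREW_LAST)
        return (unsigned char)(0x80 + (ch - HEBREW_FIRST));
    return '?';
}
```

So the same letter alef is U+05D0 in Unicode, 0xE0 in codepage 1255, and 0x80 in codepage 862; text saved in one code page and read in the other comes out as the wrong characters.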

  6. Example: Multiple Arabic Character Encodings • 8-bit Arabic encodings supported in Internet Explorer 4.0/CS • ASMO-708 • DOS 720 • ISO 8859-6 • Windows codepage 1256 • Other proprietary encodings

  7. Logical vs. Visual Encoding • Logical: • Storage order is the same as typing order • Allows natural text processing: • Search • Rewrapping on resize (e.g., in web pages) • IPC: select, cut and paste • Visual: • Storage order is display order • Natural text processing is difficult or impossible • Cannot always map back to logical order
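
A minimal sketch of what visual storage means, assuming the simplest possible case: an LTR line containing only Hebrew letters (U+05D0 to U+05EA), spaces, and Latin text. The hypothetical function below reverses each run of Hebrew letters, including spaces that sit between two Hebrew letters, which is what the Unicode Bidirectional Algorithm produces for this narrow case. Digits, punctuation, marks, embeddings, and RTL paragraphs are not handled. In general the visual result carries no record of the paragraph direction or of where runs began, which is one reason visual text cannot always be mapped back to logical order.

```c
#include <stddef.h>
#include <stdint.h>

/* True for the Hebrew letters alef (U+05D0) through tav (U+05EA). */
static int is_hebrew_letter(uint16_t ch)
{
    return ch >= 0x05D0 && ch <= 0x05EA;
}

/* Reverses s[from..to] (inclusive) in place. */
static void reverse_range(uint16_t *s, size_t from, size_t to)
{
    while (from < to) {
        uint16_t tmp = s[from];
        s[from++] = s[to];
        s[to--] = tmp;
    }
}

/* Converts an LTR line from logical to visual order in place.
   Simplification: only Hebrew letters and spaces are treated specially. */
void logical_to_visual_ltr(uint16_t *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        if (!is_hebrew_letter(s[i])) {
            i++;
            continue;
        }
        /* Extend the run over Hebrew letters and spaces, but end it at the
           last Hebrew letter so trailing spaces keep their LTR position. */
        size_t last = i;
        for (size_t k = i; k < n && (is_hebrew_letter(s[k]) || s[k] == ' '); k++) {
            if (is_hebrew_letter(s[k]))
                last = k;
        }
        reverse_range(s, i, last);
        i = last + 1;
    }
}
```

For example, the logical sequence a, space, alef, bet, space, gimel, space, b is stored visually as a, space, gimel, space, bet, alef, space, b.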

  8. What is Unicode? • A 16-bit character encoding • A mapping of characters to numbers • Syntax rules for display of complex scripts • Not a font or glyph encoding! • Not a sort algorithm! • Includes all characters in common use in modern scripts (and others) • Basis for the ISO 10646 character encoding standard • Native text encoding for Windows NT

  9. Unicode™ / ISO 10646 • 16-bit international character encoding • Windows 2000 uses Unicode version 2.0 • [Figure: map of the 16-bit code space from 0x0000 to 0xFFFF, with regions labeled (null), ASCII, Latin, Greek, Arabic and Hebrew, Indian, Thai, punctuation, symbols, kana, Hangul, ideographs (Hanzi, Kanji, Hanja), private use, compatibility, and future use; sample code points marked: 0041 (A), 9662, FF96, 4F85]

  10. Relatives of Unicode • ISO/IEC 10646 • 32-bit ISO standard: 64K “planes” of 64K characters each (64K × 64K) • Unicode repertoire is plane 0 • UTF-7 • 7-bit transformation format • Not widely used • UTF-8 • 8-bit transformation format • Used in web pages and some email
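
As a sketch of how UTF-8 transforms 16-bit Unicode values, the hypothetical function below encodes one code point from the 16-bit range into one to three bytes. It deliberately rejects the surrogate range (U+D800 to U+DFFF), whose values are not characters on their own and, in pairs, need a four-byte UTF-8 sequence.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes a 16-bit Unicode code point as UTF-8 into out (3 bytes of room).
   Returns the number of bytes written, or 0 for a lone surrogate value. */
size_t utf8_encode_bmp(uint16_t cp, unsigned char out[3])
{
    if (cp < 0x80) {                                /* 0xxxxxxx: ASCII unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                               /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)               /* surrogates: not handled */
        return 0;
    out[0] = (unsigned char)(0xE0 | (cp >> 12));    /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

For example, Hebrew alef (U+05D0) becomes the two bytes D7 90, Arabic alef (U+0627) becomes D8 A7, and U+4F85 becomes E4 BE 85, while ASCII text is byte-for-byte unchanged.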

  11. Unicode in Win32: the W and A Entry Points • Two kinds of window classes: Unicode and ANSI • The Win32 API has two versions of most functions: • The “W” (wide) version handles Unicode • The “A” (ANSI) version assumes the system default code page (character encoding)

  12. Unicode in Win32 … • Macros resolve to the W or A entry point • Example: macro for RegisterClassEx:
        #ifdef UNICODE
        #define RegisterClassEx RegisterClassExW
        #else
        #define RegisterClassEx RegisterClassExA
        #endif
    • To create a Unicode application: • Compile with -DUNICODE, or • Use W routines explicitly
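
The same macro pattern can be sketched portably. The real Windows headers pair it with TCHAR and TEXT() so that string literals switch along with the function names; MYTCHAR, MYTEXT, and the CountChars entry points below are hypothetical stand-ins so the sketch compiles without windows.h.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical "W" entry point: counts characters in wide text. */
size_t CountCharsW(const wchar_t *text) { return wcslen(text); }

/* Hypothetical "A" entry point: counts bytes in code-page text. */
size_t CountCharsA(const char *text) { return strlen(text); }

/* One source, two builds: defining UNICODE switches the character type,
   the literal prefix, and the entry point together. */
#ifdef UNICODE
typedef wchar_t MYTCHAR;
#define MYTEXT(s) L##s
#define CountChars CountCharsW
#else
typedef char MYTCHAR;
#define MYTEXT(s) s
#define CountChars CountCharsA
#endif
```

A call such as `CountChars(MYTEXT("Hello"))` compiles to `CountCharsA("Hello")` by default and to `CountCharsW(L"Hello")` with -DUNICODE, without touching the call site.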

  13. For Applications that Must Also Run on Windows 98… • Use Unicode everywhere, with a single binary and two code paths: • On Windows NT, use W entry points • On Windows 98, convert Unicode to ANSI and use A entry points • See the GLOBALDV sample for an example • See the April Microsoft Systems Journal for details and other options

  14. Summary: Use Unicode if you can! • Represent all text with one unambiguous encoding • Support multilingual text easily • Avoid special processing for variable byte-length characters • Use standard encoding recognized throughout the industry and the world • Support new scripts that are only supported through Unicode

  15. Guidelines for Supporting Complex Scripts in Applications

  16. 1. Displaying Complex Scripts in Plain Text • In Win32 apps, use the standard edit control • Use standard display functions: • Win32 APIs: ExtTextOutW or DrawTextW • ScriptString API in Uniscribe

  17. Pitfalls in Enabling for Complex Scripts • When displaying typed text: • Do not output characters one by one! • Do save text in a buffer and display the whole string with Uniscribe or Win32 API • To measure line lengths: • Do not sum cached character widths • Do use a GetTextExtent function or Uniscribe
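
A toy model shows why summing cached widths goes wrong. In Arabic, lam followed by alef is drawn as a single mandatory ligature, so the pair is not as wide as the two isolated characters added together. All widths below are made-up numbers, the helper names are hypothetical, and the "shaping" covers only this one ligature; real code should measure the whole string with a GetTextExtent function or Uniscribe, as the slide says.

```c
#include <stddef.h>
#include <stdint.h>

#define ARABIC_LAM     0x0644u
#define ARABIC_ALEF    0x0627u
#define LAM_ALEF_WIDTH 7          /* hypothetical width of the ligature glyph */

/* Hypothetical cached advance widths in pixels; real values come from the font. */
static int char_width(uint16_t ch)
{
    switch (ch) {
    case ARABIC_LAM:  return 6;
    case ARABIC_ALEF: return 3;
    default:          return 7;
    }
}

/* The pitfall: adding up cached per-character widths. */
int naive_width(const uint16_t *s, size_t n)
{
    int w = 0;
    for (size_t i = 0; i < n; i++)
        w += char_width(s[i]);
    return w;
}

/* Measures the string as a whole, as a shaping engine would:
   lam followed by alef becomes one ligature glyph. */
int shaped_width(const uint16_t *s, size_t n)
{
    int w = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i] == ARABIC_LAM && i + 1 < n && s[i + 1] == ARABIC_ALEF) {
            w += LAM_ALEF_WIDTH;
            i++;                  /* the alef is consumed by the ligature */
        } else {
            w += char_width(s[i]);
        }
    }
    return w;
}
```

With these numbers, lam + alef measures 9 pixels when summed but 7 pixels when shaped, so a caret or line break placed from the summed widths would land in the wrong spot.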

  18. 2. Displaying Complex Scripts in Simple Formatted Text • In Win32 applications, use the rich edit control • In web pages for Internet Explorer 5.0, use the Document Object Model

  19. 3. Displaying Complex Scripts in Text with Advanced Formatting and Layout • Use the script APIs (“Uniscribe”) • See the MSJ article of November 1998

  20. Overview of Uniscribe • Background and purpose of Uniscribe • Low-level APIs • High-level APIs • For details, see the November 1998 MSJ article

  21. The Uniscribe DLL: USP10.DLL • Platforms • Windows 2000 • Windows NT 4 • Windows 98 • Windows 95 (excluding Far East) • Single worldwide binary • Installs with Windows 2000, IE5, Office 2000

  22. Hides language details • Syllable structure (Indian, Thai) • Contextual shaping (Arabic, Indic) • Caret placement (all) • Wordbreak (Thai) • National digits (Arabic, Indic, Thai) • Bidirectional layout (Arabic, Hebrew)
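
National digits are a display-time substitution: the stored text keeps the ASCII digits 0 to 9, and native digit shapes are chosen when the text is shaped. The hypothetical function below shows only the code point mapping for three native digit sets (Arabic-Indic from U+0660, Devanagari from U+0966, Thai from U+0E50), writing into a separate display buffer. It does not decide whether to substitute, which in Uniscribe depends on locale and user settings.

```c
#include <stddef.h>
#include <stdint.h>

/* Zero digit of some native digit sets in Unicode. */
#define ARABIC_INDIC_ZERO 0x0660u   /* Arabic-Indic digits */
#define DEVANAGARI_ZERO   0x0966u   /* Devanagari (Hindi) digits */
#define THAI_ZERO         0x0E50u   /* Thai digits */

/* Copies n UTF-16 code units from in to out, replacing ASCII digits
   '0'..'9' with the digit set that starts at native_zero. The input
   (the stored text) is left unchanged. */
void substitute_digits(const uint16_t *in, uint16_t *out, size_t n,
                       uint16_t native_zero)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t ch = in[i];
        out[i] = (ch >= '0' && ch <= '9')
                     ? (uint16_t)(native_zero + (ch - '0'))
                     : ch;
    }
}
```

Keeping the stored digits as ASCII means searching, sorting, and number parsing work the same regardless of which digit shapes the user sees.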

  23. Hides Unicode OS details • APIs are Unicode on all platforms • Hides glyph codes • Hides font differences • Shaping tables • Fixed repertoire fonts

  24. Uniscribe Structure • [Figure: block diagram of Uniscribe. Labels recoverable from the diagram: Uniscribe client; Itemize (Unicode BiDi algorithm); Shape and Place; measurer and renderer; Layout, Justify, Caret, and Mouse (XtoCP and CPtoX); shaping engines for Arabic, Hebrew, Hindi, Tamil, Thai, and Vietnamese; CMAP and width tables; OpenType library; GDI functions GetCharABCWidthsI, GetGlyphOutline, TextOut, and ExtTextOut with ETO_GLYPH_INDEX; display]

  25. Shaping engines • Per script • Understand language rules • Understand font features • OpenType provides full control • Many older fixed layout fonts
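
To give a feel for the language rules a shaping engine encodes, the sketch below picks the contextual form (isolated, initial, medial, or final) of Arabic letters from their joining types: dual-joining letters connect on both sides, while right-joining letters such as alef, dal, reh, and waw connect only to the preceding letter. It covers a handful of letters only and ignores transparent marks, ligatures such as lam-alef, and the font lookup that turns a form into a glyph; all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { FORM_ISOLATED, FORM_INITIAL, FORM_MEDIAL, FORM_FINAL } ArabicForm;
typedef enum { JOIN_NONE, JOIN_RIGHT, JOIN_DUAL } JoinType;

/* Joining type for a few Arabic letters; everything else is non-joining here. */
static JoinType join_type(uint16_t ch)
{
    switch (ch) {
    case 0x0627: /* alef */ case 0x062F: /* dal */  case 0x0630: /* thal */
    case 0x0631: /* reh */  case 0x0632: /* zain */ case 0x0648: /* waw */
        return JOIN_RIGHT;     /* joins only the preceding letter */
    case 0x0628: /* beh */  case 0x062A: /* teh */  case 0x0633: /* seen */
    case 0x0644: /* lam */  case 0x0645: /* meem */ case 0x0646: /* noon */
    case 0x0647: /* heh */  case 0x064A: /* yeh */
        return JOIN_DUAL;      /* joins on both sides */
    default:
        return JOIN_NONE;
    }
}

/* Chooses the contextual form of each character, in logical order. */
void arabic_forms(const uint16_t *s, size_t n, ArabicForm *forms)
{
    for (size_t i = 0; i < n; i++) {
        JoinType cur = join_type(s[i]);
        /* Joins the previous letter if that letter can join forward. */
        int joins_prev = cur != JOIN_NONE && i > 0 &&
                         join_type(s[i - 1]) == JOIN_DUAL;
        /* Joins the next letter only if this letter can join forward. */
        int joins_next = cur == JOIN_DUAL && i + 1 < n &&
                         join_type(s[i + 1]) != JOIN_NONE;
        if (joins_prev && joins_next) forms[i] = FORM_MEDIAL;
        else if (joins_next)          forms[i] = FORM_INITIAL;
        else if (joins_prev)          forms[i] = FORM_FINAL;
        else                          forms[i] = FORM_ISOLATED;
    }
}
```

For example, in باب (beh, alef, beh) the first beh is initial, alef is final, and the last beh is isolated because alef never joins the letter after it; in بنت (beh, noon, teh) the forms are initial, medial, final.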

  26. [Figure: layering of Application, USER, GDI, LPK.DLL, and Uniscribe]

  27. Low-level APIs support: • Formatting text • Style runs • Measurement • Paragraph filling • Rendering • Information needed for font fallback

  28. Summary • Script… • Itemize • Shape, Place • Break, Layout • TextOut • CPtoX, XtoCP

  29. High-level APIs • Purpose • Analysis • Display • Font fallback

  30. Purpose • For Windows 2000: ExtTextOut, DrawText, system edit control • Cross-platform Unicode plain-text display • Easier than low-level APIs

  31. Summary of ScriptString APIs: • ScriptString… • Analyse • … query analysis ... • Out • Free • Provides simple font fallback

  32. Implementing Right-to-left Layout in Applications

  33. Background On RTL Layout (“Mirroring”) For BiDi Localization • Localized Arabic and Hebrew Windows® is laid out from Right to Left • In the past was done “ad hoc” or not at all • Windows 2000 and BiDi Windows 98 include mechanisms to “automatically” mirror shell and applications • Also helpful for multilingual user interface support

  34. Mirroring in System Based on Coordinate Transformation • Origin (0,0) in upper RIGHT corner of window • X scale factor = -1, x values increase from right to left • [Figure: a default (LTR) window with its origin at the upper left and x increasing to the right, next to a mirrored (RTL) window with its origin at the upper right and x increasing to the left]

  35. More Background on Mirroring… • Developers use programming interfaces and Windows style bits • Automatic inheritance of RTL property: • Child window of RTL window defaults to RTL • You can disable inheritance of RTL Property • APIs provided to disable mirroring of bitmaps

  36. Implementing Mirroring in Win32 Applications: Standard Windows • Use SetProcessDefaultLayout: • Affects all windows created thereafter
        SetProcessDefaultLayout(LAYOUT_RTL);
        SetProcessDefaultLayout(0); // Reset to LTR
    • Or call CreateWindowEx: • Use extended style WS_EX_LAYOUTRTL • To inhibit mirroring in child windows, also set WS_EX_NOINHERITLAYOUT

  37. Changing Layout of an Existing Window
        BOOL IsRTLLayout; // TRUE iff window is to be mirrored
        // ... Get new value of IsRTLLayout
        LONG lExStyles = GetWindowLongA(hWnd, GWL_EXSTYLE);
        // Check whether new layout is opposite current layout
        if (!!IsRTLLayout != !!(lExStyles & WS_EX_LAYOUTRTL)) {
            lExStyles ^= WS_EX_LAYOUTRTL; // Toggle layout
            // Set extended styles to new value
            SetWindowLongA(hWnd, GWL_EXSTYLE, lExStyles);
            // Update client area
            InvalidateRect(hWnd, NULL, TRUE);
        }

  38. Controlling Mirroring of a Device Context • SetLayout(HDC hDc, DWORD dwLayout) sets the layout of hDc and returns the previous layout:
        dwLayout = 0;                // will lay out LTR
        dwLayout = LAYOUT_RTL;       // will lay out RTL
        dwLayout = LAYOUT_RTL | LAYOUT_BITMAPORIENTATIONPRESERVED;
                                     // will lay out RTL, but not bitmaps
    • GetLayout(HDC hDc) returns the current layout settings for hDc

  39. Mirroring in Win32 Applications: Dialogs • Set WS_EX_LAYOUTRTL in dialog template • Visual Studio 6 Dialog editor: • Has option for RTL layout • BUG in Visual Studio 6: • Writes WS_EX_LAYOUT_RTL to RC file! • Must correct RC file by hand to compile • Will be fixed in future version

  40. Mirroring in Win32 Applications: Message Boxes • Set MB_RTLLAYOUT option bit

  41. Guidelines for using RTL Layout • Using coordinates • Use GetWindowRect with care • Use client, rather than screen coordinates • Do not mix screen coordinates and client coordinates • Use MapWindowPoints to map rectangles, instead of ClientToScreen and ScreenToClient • Windows 95 does not support mirroring!
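
The coordinate guidelines follow from the x scale factor of -1. The sketch below models it with plain structs (hypothetical stand-ins for POINT and RECT, so it compiles without windows.h) and treats coordinates as continuous, ignoring one-pixel rounding details. It shows why a rectangle mapped corner by corner, as with two ClientToScreen calls, comes out with left greater than right; MapWindowPoints swaps left and right for a two-point RECT when a mirrored window is involved.

```c
/* Minimal stand-ins for the Win32 POINT and RECT structures. */
typedef struct { long x, y; } Pt;
typedef struct { long left, top, right, bottom; } Rc;

/* Maps a point from the client coordinates of a mirrored (RTL) window to
   screen coordinates. origin_x/origin_y give the screen position of the
   client origin, which is the upper RIGHT corner of the client area. */
Pt mirrored_client_to_screen(Pt p, long origin_x, long origin_y)
{
    Pt s = { origin_x - p.x, origin_y + p.y };   /* x scale factor is -1 */
    return s;
}

/* Maps a rectangle corner by corner: the result has left > right. */
Rc map_rect_by_points(Rc r, long origin_x, long origin_y)
{
    Pt tl = { r.left, r.top }, br = { r.right, r.bottom };
    Pt a = mirrored_client_to_screen(tl, origin_x, origin_y);
    Pt b = mirrored_client_to_screen(br, origin_x, origin_y);
    Rc out = { a.x, a.y, b.x, b.y };
    return out;
}

/* Maps the rectangle and swaps left and right so it stays well formed. */
Rc map_rect(Rc r, long origin_x, long origin_y)
{
    Rc out = map_rect_by_points(r, origin_x, origin_y);
    long tmp = out.left;
    out.left = out.right;
    out.right = tmp;
    return out;
}
```

With the client origin at screen (500, 100), the client rectangle {10, 5, 30, 25} maps corner by corner to {490, 105, 470, 125}, an inverted rectangle that many APIs treat as empty, while the swapped version is {470, 105, 490, 125}.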

  42. Implementing Multi-language User Interface in Applications

  43. Guidelines for Multilanguage User Interface • Initialize to current UI language • Windows 2000: GetUserDefaultUILanguage() • Others: Use the language of the O/S • See function InitUiLang in Globaldev sample code
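
A UI language is identified by a LANGID, whose low 10 bits hold the primary language and whose top 6 bits hold the sublanguage; this is the layout behind the Win32 MAKELANGID, PRIMARYLANGID, and SUBLANGID macros. The sketch below reproduces that layout under hypothetical names so it builds without windows.h, and shows one plausible fallback when the exact UI language has no resources: another sublanguage of the same primary language, otherwise a default. The fallback policy is an assumption for illustration, not the logic of InitUiLang in the sample.

```c
#include <stddef.h>

typedef unsigned short LangId;   /* stand-in for the Win32 LANGID type */

/* Sublanguage in the top 6 bits, primary language in the low 10 bits. */
#define MAKE_LANG_ID(primary, sub) ((LangId)(((sub) << 10) | (primary)))
#define PRIMARY_LANG(id)           ((id) & 0x3FF)
#define SUB_LANG(id)               ((id) >> 10)

/* Picks a UI language from the languages that have resources:
   exact match first, then same primary language, then the fallback. */
LangId pick_ui_language(LangId wanted, const LangId *available, size_t n,
                        LangId fallback)
{
    for (size_t i = 0; i < n; i++)
        if (available[i] == wanted)
            return wanted;
    for (size_t i = 0; i < n; i++)
        if (PRIMARY_LANG(available[i]) == PRIMARY_LANG(wanted))
            return available[i];
    return fallback;
}
```

For example, with resources for 0x0409 (English, US), 0x040D (Hebrew), and 0x0401 (Arabic, Saudi Arabia), a user whose UI language is 0x0C01 (Arabic, Egypt) would get the 0x0401 resources, and a German (0x0407) user would get the English default.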

  44. Guidelines for Multilanguage User Interface • Allow user to select UI language • Put language-dependent resources in resource DLLs • Use naming convention, e.g., res<LANGID>.dll • Find all resource DLLs, put up list box of choices • See module UPDTLANG.CPP in Globaldev Sample
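
The res<LANGID>.dll convention can be sketched as a pair of helpers: one builds the file name for a language, the other recognizes such a name and recovers the LANGID so it can be offered in the list box. Writing the LANGID as four uppercase hexadecimal digits (for example res040D.dll) is an assumption, as are the function names; the comparison is case-sensitive for simplicity, although Windows file names are not.

```c
#include <stdio.h>
#include <string.h>

typedef unsigned short LangId;   /* stand-in for the Win32 LANGID type */

/* Builds "res" + four hex digits + ".dll", e.g. res040D.dll (assumed format). */
void res_dll_name(LangId id, char out[12])
{
    snprintf(out, 12, "res%04X.dll", (unsigned)id);
}

/* Recognizes a name in the assumed format and extracts the LANGID.
   Returns 1 on success, 0 if the name does not follow the convention. */
int parse_res_dll_name(const char *name, LangId *id)
{
    unsigned value = 0;
    if (strlen(name) != 11 || strncmp(name, "res", 3) != 0 ||
        strcmp(name + 7, ".dll") != 0)
        return 0;
    for (int i = 3; i < 7; i++) {              /* the four hex digits */
        char c = name[i];
        unsigned digit;
        if (c >= '0' && c <= '9')      digit = (unsigned)(c - '0');
        else if (c >= 'A' && c <= 'F') digit = (unsigned)(c - 'A' + 10);
        else if (c >= 'a' && c <= 'f') digit = (unsigned)(c - 'a' + 10);
        else return 0;
        value = value * 16 + digit;
    }
    *id = (LangId)value;
    return 1;
}
```

A directory scan (FindFirstFile and FindNextFile on Windows) would pass each file name to parse_res_dll_name and add every recognized language to the list of choices.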

  45. Summary • Use Unicode to encode if you can • Use controls to display text and accept user input • Use Uniscribe for advanced formatting • Use new RTL layout API for applications localized to RTL languages • Consider multilingual user interface

  46. Demo

  47. Questions?

  48. Further Information and Resources • http://www.microsoft.com/globaldev (watch for updates!) • MSJ articles, e.g.: • Uniscribe: http://www.microsoft.com/msj/1198/multilang/multilangtop.htm • Multilingual UI: http://www.microsoft.com/msj/0499/multilangUnicode/multilangUnicodetop.htm • Send suggestions to nlshelp@microsoft.com
