
Character Set and Language Negotiation in Z39.50 Version 3

  1. Character Set and Language Negotiation in Z39.50 Version 3

  2. Scope
     • Negotiate language of messages
     • Negotiate character set of InternationalString
     • Z39.50 “message” strings
     • Optionally retrieve records in negotiated character set
     • Character set negotiation only valid for version 3
     Stockholm, 10 August 1999

  3. Negotiation Basics
     • Carried in UserInfo external object in Init
     • Similar to option negotiation
       • Origin proposes list of possibilities
       • Target selects one from list
     • Only a single round of negotiation takes place
     • Applies to complete session
       • Cannot change during session
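The single-round exchange can be sketched as follows. This is a minimal illustration, not part of the Z39.50 specification: the function name `select_one` and the string labels are hypothetical, and the slides do not say how a target ranks the origin's proposals, so honouring the origin's order is an assumption.

```python
from typing import Collection, Optional, Sequence


def select_one(proposed: Sequence[str], supported: Collection[str]) -> Optional[str]:
    """Return the first proposed item the target supports, or None.

    Honouring the origin's ordering is an assumption made for this sketch.
    """
    for item in proposed:
        if item in supported:
            return item
    return None


# Hypothetical labels standing in for the proposed character sets.
origin_proposal = ["utf-8", "utf-16", "iso2022"]
target_supports = {"utf-16", "iso2022"}

# One round only: the result is fixed for the rest of the session.
chosen = select_one(origin_proposal, target_supports)
print(chosen)
```

Because there is only one round, an origin would list every alternative it can accept up front; a `None` result here stands for "nothing selected".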

  4. UserInfoFormat-charSetandLanguageNegotiation-2 {1 2 840 10003 10 2} DEFINITIONS ::=
     BEGIN
     CharSetandLanguageNegotiation ::= CHOICE {
         proposal  [1] IMPLICIT OriginProposal,
         response  [2] IMPLICIT TargetResponse }

  5. Character Sets
     • ISO 2022 is the “code page” approach to character sets
     • ISO 10646 is approximately Unicode
     • Different procedures for negotiating character sets:
       • ISO 2022
       • ISO 10646
     • Can negotiate a “private” character set

  6. OriginProposal ::= SEQUENCE {
         proposedCharSets  [1] IMPLICIT SEQUENCE OF CHOICE {
             iso2022   [1] Iso2022,
             iso10646  [2] IMPLICIT Iso10646,
             private   [3] PrivateCharacterSet } OPTIONAL,
             -- proposedCharSets must be omitted
             -- if origin proposes version 2
         }

  7. ISO 2022
     • Supports 7- and 8-bit environments
     • A “page” is 96 graphic characters (“G set”) and 32 control characters (“C set”)
     • 2 G pages active at any one time (G-Left [hex 20-7F], G-Right [hex A0-FF])
     • 2 C sets active (C0 [00-1F], C1 [80-9F])
     • Can define 4 G pages and swap them into GL, GR as needed
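The four code regions of an 8-bit ISO 2022 environment can be sketched as a simple lookup, with GL as the low graphic area (hex 20-7F) and GR as the high one (hex A0-FF). The function name `classify_byte` is hypothetical and exists only for illustration.

```python
def classify_byte(value: int) -> str:
    """Name the ISO 2022 code region that a single byte value falls in."""
    if not 0x00 <= value <= 0xFF:
        raise ValueError("not a byte value: %r" % (value,))
    if value <= 0x1F:
        return "C0"  # control area, low half
    if value <= 0x7F:
        return "GL"  # graphic area, left (low) half
    if value <= 0x9F:
        return "C1"  # control area, high half
    return "GR"      # graphic area, right (high) half


for sample in (0x0A, 0x41, 0x85, 0xE9):
    print(hex(sample), classify_byte(sample))
```

In a 7-bit environment only the C0 and GL regions exist, so every byte would classify as one of those two.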

  8. ISO 2022 Escapes
     • Assign character sets to pages G0-G3, C0-C1
     • Make G pages active in GL, GR
     • Character sets identified by 1 or 2 characters in the escape sequence
     • Character sets and the escape sequences that identify them are registered:
       http://www.itscj.or.jp/ISO-IR/index.htm
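As a rough sketch of the first bullet, the escape sequence that designates a single-byte graphic set to a G page is ESC, one intermediate byte that selects the page, and the registered final byte of the set. The helper name `designate` is hypothetical, and multi-byte sets and the C0/C1 designations are left out.

```python
ESC = 0x1B

# Intermediate bytes selecting the target page for single-byte graphic sets.
# 94-character sets may go to G0-G3; 96-character sets only to G1-G3.
INTERMEDIATE_94 = {"G0": 0x28, "G1": 0x29, "G2": 0x2A, "G3": 0x2B}
INTERMEDIATE_96 = {"G1": 0x2D, "G2": 0x2E, "G3": 0x2F}


def designate(page: str, final: str, size: int = 94) -> bytes:
    """Build the escape sequence assigning a registered set to a G page."""
    if size == 94:
        table = INTERMEDIATE_94
    elif size == 96:
        table = INTERMEDIATE_96
    else:
        raise ValueError("size must be 94 or 96")
    if page not in table:
        raise ValueError("cannot designate a %d-set to %s" % (size, page))
    return bytes([ESC, table[page], ord(final)])


# Final byte "B" is registered for ASCII (a 94-set); final byte "A" with a
# 96-set intermediate is the right-hand part of Latin-1.
print(designate("G0", "B"))
print(designate("G1", "A", size=96))
```

Making a designated page active in GL or GR is a separate step (shift functions), which this sketch does not cover.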

  9. ISO 2022 Negotiation
     • Negotiate initial assignment of G0-G3
     • Negotiate initial assignment of GL, GR
     • Sequence of origin proposals for all of these
     • Target response chooses one of these proposals
     • In absence of negotiation, must assume IRV in GL with GR undefined
       • No characters above hex 7F

  10. Iso2022 ::= CHOICE {
          originProposal  [1] IMPLICIT SEQUENCE {
              proposedEnvironment   [0] Environment OPTIONAL,
              proposedSets          [1] IMPLICIT SEQUENCE OF INTEGER,
              proposedInitialSets   [2] IMPLICIT SEQUENCE OF InitialSet,
              proposedLeftAndRight  [3] IMPLICIT LeftAndRight },
          }
      Environment ::= CHOICE {
          sevenBit  [1] IMPLICIT NULL,
          eightBit  [2] IMPLICIT NULL }

  11. InitialSet ::= SEQUENCE {
          g0  [0] IMPLICIT INTEGER,
          g1  [1] IMPLICIT INTEGER,
          g2  [2] IMPLICIT INTEGER,
          g3  [3] IMPLICIT INTEGER,
          c0  [4] IMPLICIT INTEGER,
          c1  [5] IMPLICIT INTEGER }
      LeftAndRight ::= SEQUENCE {
          gLeft   [3] IMPLICIT INTEGER {g0 (0), g1 (1), g2 (2), g3 (3)},
          gRight  [4] IMPLICIT INTEGER {g1 (1), g2 (2), g3 (3)} }
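To see how InitialSet and LeftAndRight fit together, the sketch below mirrors the two structures as Python dataclasses and resolves which set ends up in GL and GR. The class and function names are hypothetical; the integer values are copied from the example proposal on slide 20 and are treated as opaque identifiers.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class InitialSet:
    g0: int
    g1: int
    g2: int
    g3: int
    c0: int
    c1: int


@dataclass(frozen=True)
class LeftAndRight:
    g_left: int   # 0-3 selects g0-g3
    g_right: int  # 1-3 selects g1-g3


def active_sets(initial: InitialSet, sides: LeftAndRight) -> Tuple[int, int]:
    """Return the set identifiers initially active in (GL, GR)."""
    pages = (initial.g0, initial.g1, initial.g2, initial.g3)
    if sides.g_left not in (0, 1, 2, 3):
        raise ValueError("gLeft must select g0-g3")
    if sides.g_right not in (1, 2, 3):
        raise ValueError("gRight must select g1-g3")
    return pages[sides.g_left], pages[sides.g_right]


# Values from the example proposal on slide 20.
initial = InitialSet(g0=2, g1=1001, g2=1001, g3=1001, c0=1, c1=67)
sides = LeftAndRight(g_left=0, g_right=1)
print(active_sets(initial, sides))
```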

  12. ISO 10646
      • Defines a single set of 2^32 possible characters (4+ billion!)
      • Divided into “planes” of 2^16 characters
      • Only the first plane currently has characters defined: the “Basic Multilingual Plane” (BMP)
      • BMP is coterminous with Unicode
      • Z39.50 negotiates ISO 10646, not Unicode per se
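The plane structure can be made concrete by splitting a 32-bit code position into the four octets that ISO 10646 calls group, plane, row and cell; each plane then holds 65,536 positions. The helper names below are hypothetical.

```python
from typing import Tuple


def ucs4_octets(code_point: int) -> Tuple[int, int, int, int]:
    """Split a 32-bit code position into (group, plane, row, cell)."""
    if not 0 <= code_point <= 0xFFFFFFFF:
        raise ValueError("code position out of 32-bit range")
    return (
        (code_point >> 24) & 0xFF,
        (code_point >> 16) & 0xFF,
        (code_point >> 8) & 0xFF,
        code_point & 0xFF,
    )


def in_bmp(code_point: int) -> bool:
    """True when the position lies in group 0, plane 0 (the BMP)."""
    group, plane, _row, _cell = ucs4_octets(code_point)
    return group == 0 and plane == 0


for cp in (0x0041, 0x20AC, 0x1D11E):
    print(hex(cp), ucs4_octets(cp), in_bmp(cp))
```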

  13. Unicode Encoding Rules
      • UCS-4: 32-bit character encoding
      • UCS-2: 16-bit character encoding, limited to the BMP
      • UTF-16: like UCS-2, with a “surrogate” mechanism for characters in planes above 0
      • UTF-8: 8-bit character encoding, with variable-length multi-byte sequences for all characters other than the first 128
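A quick way to compare these forms is to encode a few characters and count the bytes. The sketch uses Python's built-in `utf-32-be`, `utf-16-be` and `utf-8` codecs; treating `utf-32-be` as a stand-in for UCS-4 is an assumption of this example, and `encoded_sizes` is a hypothetical helper name.

```python
from typing import Dict


def encoded_sizes(ch: str) -> Dict[str, int]:
    """Byte length of one character in each encoding form."""
    return {
        "ucs-4": len(ch.encode("utf-32-be")),   # always 4 bytes
        "utf-16": len(ch.encode("utf-16-be")),  # 2 bytes, or 4 with surrogates
        "utf-8": len(ch.encode("utf-8")),       # 1 to 4 bytes
    }


# U+0041, U+00E9, U+20AC are in the BMP; U+1D11E is in plane 1.
for ch in ("A", "\u00e9", "\u20ac", "\U0001D11E"):
    print("U+%04X" % ord(ch), encoded_sizes(ch))

# A plane-1 character becomes a surrogate pair in UTF-16.
print("\U0001D11E".encode("utf-16-be").hex())
```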

  14. UTF-8
      • Intended to be a “file system safe” encoding
      • Guarantees that every byte with value below hex 80 is an ASCII character, including hex 00
      • All characters with values above 7F are encoded as 2, 3 or 4 bytes
      • Transformation between UTF-8 and UCS-2 is simple and efficient
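The “file system safe” property can be checked directly: ASCII characters encode to themselves, and no byte of a multi-byte sequence falls below hex 80, so bytes such as NUL or "/" never appear by accident. The function name `utf8_is_ascii_transparent` is hypothetical.

```python
def utf8_is_ascii_transparent(text: str) -> bool:
    """Check the ASCII-safety guarantee for every character in text."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if ord(ch) < 0x80:
            # ASCII characters must encode to the identical single byte.
            if encoded != bytes([ord(ch)]):
                return False
        elif any(byte < 0x80 for byte in encoded):
            # Multi-byte sequences must not contain ASCII-range bytes.
            return False
    return True


sample = "dir/na\u00efve\x00\u20ac\U0001D11E"
print(utf8_is_ascii_transparent(sample))
print("\u20ac".encode("utf-8").hex())
```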

  15. Negotiating ISO 10646
      • Specify the “character repertoire” (i.e. the subset of the full UCS that will be used)
      • Specify the encoding
      • Handled by object identifiers
      • For Unicode:
        • Character repertoire is the full BMP
        • Encoding can be UTF-16 or UTF-8

  16. Iso10646 ::= SEQUENCE {
          collections    [1] IMPLICIT OBJECT IDENTIFIER,
              -- oid of form 1.0.10646.implementationLevel
              -- .repertoireSubset.arc1.arc2. ....
              -- [use 1.0.10646.1.2.1.3 for Unicode]
          encodingLevel  [2] IMPLICIT OBJECT IDENTIFIER
              -- oid of form 1.0.10646.1.0.form
              -- where value of 'form' is 2, 4, 5, or 8
              -- for ucs-2, ucs-4, utf-16, utf-8
          }
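The mapping from encoding form to encodingLevel OID can be sketched as a small table. The arc prefix follows the example on slide 19, which uses {1 0 10646 1 0 8} for UTF-8; the helper names `encoding_level_oid` and `dotted` are hypothetical.

```python
from typing import Tuple

# 'form' values listed in the Iso10646 definition.
ENCODING_FORMS = {"ucs-2": 2, "ucs-4": 4, "utf-16": 5, "utf-8": 8}

# Prefix taken from the slide 19 example {1 0 10646 1 0 8}.
ENCODING_LEVEL_PREFIX = (1, 0, 10646, 1, 0)


def encoding_level_oid(name: str) -> Tuple[int, ...]:
    """Return the encodingLevel OID arcs for a named encoding form."""
    try:
        form = ENCODING_FORMS[name.lower()]
    except KeyError:
        raise ValueError("unknown encoding form: %r" % (name,))
    return ENCODING_LEVEL_PREFIX + (form,)


def dotted(oid: Tuple[int, ...]) -> str:
    """Render OID arcs in dotted notation."""
    return ".".join(str(arc) for arc in oid)


print(dotted(encoding_level_oid("utf-8")))
print(dotted(encoding_level_oid("UTF-16")))
```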

  17. Language Negotiation
      • Instances of InternationalString are either “message” or “name” strings
      • Language negotiation applies to “message” strings
      • Origin proposes one or more language codes
        • Codes from Z39.53
      • Target may choose one of these proposed codes

  18. proposedLanguages          [2] IMPLICIT SEQUENCE OF LanguageCode OPTIONAL,
      recordsInSelectedCharSets  [3] IMPLICIT BOOLEAN OPTIONAL
          -- default 'false'

  19. initRequest { -- SEQUENCE
          referenceId -- "9" --,
          protocolVersion 'e0'H,
          options 'eda2'H,
          preferredMessageSize 15000,
          exceptionalRecordSize 15000,
          implementationName -- "Amicus Professional Workstation" --,
          implementationVersion -- "3.0" --,
          otherInfo { -- SEQUENCE OF
            { -- SEQUENCE
              category { -- SEQUENCE
                categoryTypeId {1 2 840 10003 10 2},
                categoryValue 0 },
              information externallyDefinedInfo { -- SEQUENCE
                direct-reference {1 2 840 10003 10 2},
                encoding single-ASN1-type proposal { -- SEQUENCE
                  proposedCharSets { -- SEQUENCE OF
                    iso10646 { -- SEQUENCE
                      collections {1 0 10646 1 2 1 3},
                      encodingLevel {1 0 10646 1 0 8} },

  20. iso2022 originProposal { -- SEQUENCE
          proposedEnvironment eightBit NULL,
          proposedSets { -- SEQUENCE OF
            2, 1000, 1001, 1002, 1003, 1, 67 },
          proposedInitialSets { -- SEQUENCE OF
            { -- SEQUENCE
              g0 2, g1 1001, g2 1001, g3 1001, c0 1, c1 67 } },
          proposedLeftAndRight { -- SEQUENCE
            gLeft 0, gRight 1 } },

  21. proposedLanguages { -- SEQUENCE OF -- "ENG" },
      recordsInSelectedCharSets TRUE } } } } }
