
Regular Object Types and XTATIC


Presentation Transcript


  1. Regular Object Types and XTATIC
     Based on: a paper by Vladimir Gapeyev and Benjamin C. Pierce, and a presentation of the paper by Benjamin C. Pierce
     Presented by: Lena Lempert

  2. Introduction
     • Regular types have been proposed as a basis for statically typed processing of XML.
     • However, regular types have only been explored in special-purpose languages, i.e., languages whose type systems are designed around regular types (XDuce, CDuce, XQuery).

  3. Our objective
     • To develop XTATIC, a language whose goal is to bring regular types to a broader audience by offering them as a lightweight extension of a popular object-oriented language, C#.

  4. Key ideas of XTATIC
     • The XTATIC data model is a combination of:
       • the tree-structured data model of XDuce, and
       • the classes-and-objects data model of an object-oriented language.
     • XML structures are treated as objects.

  5. FX: a core calculus for XTATIC
     • A formal core of the XTATIC design is being developed.
     • The tool for this investigation is a tiny language called FX.
     • FX features are drawn from:
       • FJ (Featherweight Java)
       • the core of XDuce
     • Points of interest include:
       • a smooth interleaving of the two data models
       • a definition of the subtype relation
       • a natural encoding of XML documents using singleton classes

  6. XTATIC example
     • An XML fragment:
         <Person> <Name> Lena Lempert </Name> <Email> slempa@t2.technion.ac.il </Email> </Person>
         <Person> <Name> Queen Elisabeth </Name> <Phone> +44 55 6666 </Phone> </Person>
     • The corresponding XTATIC value:
         [ <Person>[ <Name>['Lena Lempert'], <Email>['slempa@t2.technion.ac.il'] ],
           <Person>[ <Name>['Queen Elisabeth'], <Phone>['+44 55 6666'] ] ]
     • A type for this expression:
         [ <Person>[ <Name>[pcdata], (<Email>[pcdata] | <Phone>[pcdata]) ]* ]
       where "," is concatenation, "|" is union, and "*" is repetition.
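
As an illustration only, the following Python sketch parses the fragment with the standard xml.etree.ElementTree module and rebuilds the sequence value above. The (label, children) tuple encoding, the wrapper <root> element, and the helper names to_tree and to_sequence are assumptions made for this sketch, not part of XTATIC; character data is kept as a plain string rather than as pcdata.

```python
import xml.etree.ElementTree as ET

# The two <Person> elements are wrapped in a synthetic <root> element so that
# ElementTree can parse the fragment as one document (an assumption of this sketch).
FRAGMENT = """<root>
<Person><Name>Lena Lempert</Name><Email>slempa@t2.technion.ac.il</Email></Person>
<Person><Name>Queen Elisabeth</Name><Phone>+44 55 6666</Phone></Person>
</root>"""


def to_tree(elem):
    """Map an element to a (label, children) tree value.

    A leaf element keeps its stripped text as a one-item child list,
    mirroring <Name>['Lena Lempert'] in the XTATIC value."""
    children = [to_tree(child) for child in elem]
    if not children and elem.text and elem.text.strip():
        children = [elem.text.strip()]
    return (elem.tag, children)


def to_sequence(xml_text):
    """Return the top-level sequence value: a list of tree values."""
    root = ET.fromstring(xml_text)
    return [to_tree(child) for child in root]


people = to_sequence(FRAGMENT)
print(people[1])
# ('Person', [('Name', ['Queen Elisabeth']), ('Phone', ['+44 55 6666'])])
```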

  7. XTATIC example (cont.)
     • Sequence values can be examined using type-based pattern matching.
     • Example: here list is a variable that contains a sequence of the type given on the previous slide.
         match (list) {
           case [<Person>[ <Name>[pcdata], <Email>[pcdata e] ], any rest]:
             spamlist = [spamlist, ',', e];
           case [<Person>[ <Name>[pcdata], <Phone>[pcdata] ] p, any rest]:
             phonebook = [phonebook, p];
           case []: // ...
         }
     • First case: if the Person has an Email, the email is extracted into the pcdata variable e, which is used to extend the text in spamlist.
     • Second case: otherwise the Person must have a Phone; the whole entry is bound to the variable p and added to the phonebook sequence.
     • Third case: the empty sequence.
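
The first-match behaviour of this match can be imitated in Python. This is a sketch under stated assumptions: entries use a hypothetical (label, children) tuple encoding with strings as leaf text, the function name process is made up, and recursing on rest is an assumed continuation, since the slide elides what happens after each case.

```python
def process(entries, spamlist="", phonebook=None):
    """Imitate the three-case XTATIC match on a list of Person entries.

    Each entry is a hypothetical (label, children) pair whose leaf children
    are plain strings; the first matching case wins (no fall-through).
    """
    if phonebook is None:
        phonebook = []
    if not entries:                                  # case []
        return spamlist, phonebook
    person, rest = entries[0], entries[1:]           # first entry, "any rest"
    label, fields = person
    tags = [tag for tag, _ in fields]
    if label == "Person" and tags == ["Name", "Email"]:
        e = fields[1][1][0]                          # the pcdata bound to e
        spamlist = spamlist + "," + e                # [spamlist, ',', e]
    elif label == "Person" and tags == ["Name", "Phone"]:
        phonebook = phonebook + [person]             # [phonebook, p]
    else:
        raise ValueError("entry does not have the declared type")
    return process(rest, spamlist, phonebook)       # assumed: continue on rest


people = [
    ("Person", [("Name", ["Lena Lempert"]), ("Email", ["slempa@t2.technion.ac.il"])]),
    ("Person", [("Name", ["Queen Elisabeth"]), ("Phone", ["+44 55 6666"])]),
]
spam, book = process(people)
print(spam)        # ,slempa@t2.technion.ac.il
print(len(book))   # 1
```

Following the slide literally, the separator is added before each email, so the result starts with a comma.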

  8. Data model
     • The data model of a language consists of:
       • the collection of values that programs in the language manipulate,
       • the types of those values, and
       • fundamental relations such as value typing and subtyping.
     • Our primary goal is the combination of trees and objects (and their types). We therefore concentrate on the data model of FX, which combines the data models of XDuce and FJ.

  9. The XDuce Data Model
     • The XDuce data model is built over a language of labels, which consists of:
       • a set 𝓛 of label values;
       • a set of label types;
       • a denotation function [[·]] giving, for each label type L, the set [[L]] ⊆ 𝓛 of label values that are members of L;
       • the subtyping relation: L1 <: L2 (L1 is a subtype of L2) iff [[L1]] ⊆ [[L2]].
     • A simple choice of label language:
       • For each label value l ∈ 𝓛, consider l to be a label type as well. Then l is the singleton type whose denotation contains just l.
       • A wildcard label type ~ denotes the whole set 𝓛.
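
Because a label language is just sets plus a subset test, it can be sketched in a few lines of Python. The sketch assumes a small finite universe of label values (𝓛 itself is typically infinite), represents each singleton label type by the label itself and the wildcard by the string "~", and decides L1 <: L2 by comparing denotations.

```python
WILDCARD = "~"                                   # the wildcard label type ~

# A small, finite stand-in for the set of label values (an assumption).
LABELS = frozenset({"Person", "Name", "Email", "Phone"})


def denotation(label_type):
    """[[L]]: the set of label values that are members of label type L."""
    if label_type == WILDCARD:
        return LABELS                            # ~ denotes every label value
    return frozenset({label_type})               # a singleton type denotes itself


def is_subtype(l1, l2):
    """L1 <: L2 iff [[L1]] is a subset of [[L2]]."""
    return denotation(l1) <= denotation(l2)


print(is_subtype("Name", WILDCARD))   # True
print(is_subtype(WILDCARD, "Name"))   # False
print(is_subtype("Name", "Email"))    # False
```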

  10. The XDuce Data Model (cont.)
     • A tree value t consists of a label value and a sequence of children tree values:
         t ::= <(l)>[t1, …, tn]    where n ≥ 0
     • XDuce types are regular types: regular expressions over an "alphabet" consisting of tree types <(L)>[X], where X is a type name:
         T ::= <(L)>[X]    tree
               []          empty sequence
               T, T        concatenation
               T | T       union
               T*          repetition
     • Subtyping relation for regular types: T1 <: T2 iff [[T1]] ⊆ [[T2]]
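
Membership of a sequence in a regular type can be decided by a small backtracking matcher. In this Python sketch the tuple constructors shown in the comments are a hypothetical encoding; label types are singleton strings or the wildcard "~", and, as a simplification, a tree type carries its body type directly instead of a type name X. The function match returns every position at which a prefix of the sequence belonging to the type can end.

```python
# Regular types as tuples (hypothetical encoding):
#   ("tree", label_type, body)   <(L)>[body]
#   ("empty",)                   []
#   ("cat", T1, T2)              T1, T2
#   ("alt", T1, T2)              T1 | T2
#   ("star", T)                  T*
# A tree value is (label, [children]); a sequence is a list of tree values.

def label_ok(label, label_type):
    return label_type == "~" or label == label_type


def match(ty, seq, i):
    """Return the set of indices j such that seq[i:j] belongs to ty."""
    kind = ty[0]
    if kind == "empty":
        return {i}
    if kind == "tree":
        _, label_type, body = ty
        if i < len(seq):
            label, children = seq[i]
            if label_ok(label, label_type) and member(children, body):
                return {i + 1}
        return set()
    if kind == "cat":
        return {k for j in match(ty[1], seq, i) for k in match(ty[2], seq, j)}
    if kind == "alt":
        return match(ty[1], seq, i) | match(ty[2], seq, i)
    if kind == "star":
        reached, frontier = {i}, {i}
        while frontier:                          # repeat until no new end positions
            step = {k for j in frontier for k in match(ty[1], seq, j)}
            frontier = step - reached
            reached |= step
        return reached
    raise ValueError(f"unknown type constructor {kind!r}")


def member(seq, ty):
    """A sequence belongs to ty iff some match consumes all of it."""
    return len(seq) in match(ty, seq, 0)


leaf = ("empty",)                                # text content omitted here
entry = ("tree", "Person",
         ("cat", ("tree", "Name", leaf),
                 ("alt", ("tree", "Email", leaf), ("tree", "Phone", leaf))))
people_type = ("star", entry)

ok = [("Person", [("Name", []), ("Email", [])]),
      ("Person", [("Name", []), ("Phone", [])])]
bad = [("Person", [("Email", [])])]
print(member(ok, people_type))    # True
print(member(bad, people_type))   # False
```

This only tests individual values. Deciding the subtyping relation T1 <: T2 itself, i.e. inclusion of denotations, needs an inclusion check between tree automata, as in XDuce.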

  11. The FJ Data Model
     • FJ (Featherweight Java) is a tiny calculus designed to capture the essential typing mechanisms of class-based object-oriented languages such as Java and C#.
     • Included: the core mechanisms of object creation, field access, method invocation, and inheritance.
     • Omitted: interfaces, overloading, static members, concurrency, and even assignment!

  12. But how can we manage without assignment…?
     • Here's the trick: demand that the fields of an object be initialized from its constructor arguments and never touched again. A class definition must have the form:
         class C {
           D1 f1; … Dn fn;
           C(D1 x1, …, Dn xn) { f1 = x1; … fn = xn; }
           … method definitions …
         }
     • Now identify an object with the expression new C(a1, …, an) used to create it, i.e., simply treat new expressions as values.
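
Python's frozen dataclasses give a rough analogue of this discipline: fields are set once from the constructor arguments, any later assignment raises dataclasses.FrozenInstanceError, and equality is structural, so two objects built by the same constructor expression compare equal. The classes Leaf and Pair and the helper can_reassign are hypothetical.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Leaf:
    """A class with no fields: new Leaf()."""


@dataclass(frozen=True)
class Pair:
    """Fields are set only from the constructor arguments."""
    fst: object
    snd: object


def can_reassign(obj, field, value):
    """Try to overwrite a field after construction."""
    try:
        setattr(obj, field, value)
    except FrozenInstanceError:
        return False
    return True


p = Pair(Leaf(), Leaf())               # like new Pair(new Leaf(), new Leaf())
print(p == Pair(Leaf(), Leaf()))       # True: equal constructor expressions, equal values
print(can_reassign(p, "fst", Leaf()))  # False: fields are never touched again
```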

  13. The FJ Data Model (cont. 1)
     • An FJ program consists of a collection of class declarations plus a single expression to be evaluated.
     • FJ types: class names C
     • FJ values: objects o ::= new C(o1, o2, …, on)   (n ≥ 0)
     • The constructor arguments o1, o2, …, on (usually written just ō) must correspond exactly to the fields of class C.
     • Example: class C has (private) fields a and b; its subclass D adds (private) fields e and f; an object of D is created as d = new D(a, b, e, f).

  14. The FJ Data Model (cont. 2)
     • We say that a value new C(ō) is a valid object of the class C if its field values ō conform to the field types declared for C.
     • The denotation [[C]] of a class C is the set of all valid objects of the class C and its subclasses.
     • Subtyping relation: C1 <: C2 (C1 is a subtype of C2) iff [[C1]] ⊆ [[C2]]
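
The validity condition can be sketched over an explicit class table. Assumptions of this Python sketch: the table maps each hypothetical class name to its superclass and the field types it newly declares, inherited fields come first (as with C and D in the example on the previous slide), and an object new C(ō) is modelled as a (class name, field values) pair. The subclass check is_subclass is used as a stand-in for C1 <: C2, which the slide defines semantically as inclusion of denotations.

```python
# Hypothetical class table: class name -> (superclass, newly declared field types).
CLASSES = {
    "Object": (None, []),
    "A": ("Object", []),
    "C": ("Object", ["A", "A"]),   # fields a, b
    "D": ("C", ["A", "A"]),        # D extends C and adds fields e, f
}


def superclasses(c):
    """Yield c, its superclass, and so on up to Object."""
    while c is not None:
        yield c
        c = CLASSES[c][0]


def is_subclass(c1, c2):
    return c2 in superclasses(c1)


def fields(c):
    """All field types of c, inherited fields first."""
    chain = list(superclasses(c))[::-1]          # Object, ..., c
    return [t for k in chain for t in CLASSES[k][1]]


def valid(obj, c):
    """obj = (class, args) is in [[c]]: its class is c or a subclass of c,
    and each argument is a valid object of the corresponding field type."""
    cls, args = obj
    types = fields(cls)
    return (is_subclass(cls, c) and len(args) == len(types)
            and all(valid(arg, t) for arg, t in zip(args, types)))


a = ("A", [])
d = ("D", [a, a, a, a])              # d = new D(a, b, e, f)
print(valid(d, "C"))                 # True: D is a subclass of C
print(valid(("D", [a, a]), "D"))     # False: wrong number of field values
```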

  15. The FX Data Model
     • FX interleaves the XDuce data model and the FJ data model.
     • Observation 1: we can treat sequences of trees as objects.
       • A special class Seq, whose subtypes are all the regular types.
       • All sequences of tree values are treated as members of the class Seq.

  16. The FX Data Model (cont.)
     • Observation 2: we can treat the data model of classes and objects as a "label language":
       • objects serve as labels in XDuce trees;
       • classes serve as label types.

  17. FX values and types
     Full FX language:
         Values   a ::= new C(ā)       object
                        [t̄]            delimited sequence
         Trees    t ::= <(a)>[t̄]       tree value
         Types    A ::= C              class name
                        [X]            regular type name
                        [T]            regular type
     Regular expression sub-language:
         T ::= <(A)>[X]   tree type
               []         empty sequence
               T, T       concatenation
               T | T      union
               T*         repetition

  18. Program context
     • A program context is a tuple
         Ctx = <Typenames, def, Classes, <:, fields, mtypes, mbody>
       where:
       • Typenames: a set of names for regular types
       • def: a function that maps each name in Typenames to its definition
       • Classes: a set of class names, containing the special names Object and Seq
       • <: : a transitive subclass relation, such that C <: Object for all C, and such that Seq has no sub- or supertypes except Object
       • fields: a function giving, for each class, a list F1 f1 … Fn fn of field declarations, such that fields(Seq) and fields(Object) are empty, and if C is a subclass of D, then fields(D) is a prefix of fields(C)
       • mtypes: method types
       • mbody: method bodies
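
The well-formedness conditions on Ctx can be checked mechanically. This Python sketch covers only the Classes, subclass and fields components; the parent map, the (type, name) pairs used for field declarations, and the classes Point and ColorPoint are assumptions for illustration. It returns the list of violated conditions.

```python
def ancestors(c, parent):
    """Strict superclasses of c, nearest first."""
    out = []
    while parent.get(c) is not None:
        c = parent[c]
        out.append(c)
    return out


def check_context(classes, parent, fields):
    """Return the violated well-formedness conditions (empty list if none)."""
    problems = []
    for c in sorted(classes - {"Object"}):
        if "Object" not in ancestors(c, parent):
            problems.append(f"{c} is not a subclass of Object")
    # Seq has no sub- or supertypes except Object.
    if ancestors("Seq", parent) != ["Object"]:
        problems.append("Seq must extend Object directly")
    if any("Seq" in ancestors(c, parent) for c in classes):
        problems.append("Seq must have no subclasses")
    if fields["Seq"] or fields["Object"]:
        problems.append("fields(Seq) and fields(Object) must be empty")
    # If C is a subclass of D, fields(D) is a prefix of fields(C).
    for c in sorted(classes):
        for d in ancestors(c, parent):
            if fields[c][:len(fields[d])] != fields[d]:
                problems.append(f"fields({d}) is not a prefix of fields({c})")
    return problems


classes = {"Object", "Seq", "Point", "ColorPoint"}
parent = {"Object": None, "Seq": "Object", "Point": "Object", "ColorPoint": "Point"}
field_decls = {
    "Object": [], "Seq": [],
    "Point": [("int", "x"), ("int", "y")],
    "ColorPoint": [("int", "x"), ("int", "y"), ("String", "color")],
}
print(check_context(classes, parent, field_decls))   # []

broken = dict(field_decls, ColorPoint=[("String", "color")])
print(check_context(classes, parent, broken))
# ['fields(Point) is not a prefix of fields(ColorPoint)']
```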

  19. FX type membership
     • The syntax of values given so far allows ill-formed object values new C(ā), whose actual field values ā do not conform to the field types declared for class C in the program context.
     • To correct this, we introduce a type membership relation a ∈ A: a value a is valid if there is a type A such that a ∈ A.
     • Type denotation (the set of values of the type): [[A]] = { a | a ∈ A }
     • The denotation of Seq:
       • does not contain objects (new Seq(ā));
       • contains all valid sequence values.
     • Subtyping in FX: A1 is a subtype of A2 (A1 <: A2) iff [[A1]] ⊆ [[A2]]

  20. The FX Language Syntax
     • The full-blown FX expression syntax:
         e ::= x                          value variable
               new C(ē)                   new object creation
               e.f                        field access
               e.m(ē)                     method call
               <(e)>[ē]                   tree
               [ē]                        sequence
               match(e) { case [P]: ē }   pattern match

  21. The FX Language Syntax (cont.)
     • FX patterns:
         Q ::= C           class
               [X]         pattern name
               [P]         regular pattern
               Q x         FX variable binding
     • Regular patterns:
         P ::= <(Q)>[P]    tree
               []          empty sequence
               P, P        concatenation
               P | P       alternative
               T*          type repetition
               P x         regular variable binding

  22. FX syntax: explanations and constraints
     • Types (of fields, or appearing in method signatures) can be regular types as well as classes.
     • Variables can hold any FX values, either objects or sequences.
     • Only tree values can be members of sequence values:
         [ [t], (new C(a)), [s] ]   is not allowed!
     • Sequence expressions, however, may be nested. The reason: we want the following expression to be legal if the method getPapers() returns values of a sequence type:
         [ db.getPapers("POPL"), db.getPapers("ICFP") ]
     • An object is never legal as a member of a sequence.
     • A tree expression <(e)>[d] is never allowed outside of sequence brackets [...].

  23. Pattern matching
     • Sequence values are deconstructed by matching them against patterns using the match construct.
     • Syntactically it resembles the C# switch statement; it behaves like XDuce's match, with no "fall through" between cases:
         match(d) {            // d: a sequence
           case [P1]: e1;      // [P1]: a sequence pattern
           case [P2]: e2;
           …
           case [Pn]: en;
         }

  24. Pattern matching (cont. 1)
     • The syntax of FX patterns [P]: as in XDuce, a pattern is just a type annotated with variable binders.
     • A class pattern has the form C x (C a class name, x the variable to be bound).
     • In pattern matching, we do not examine object fields for conformance with the declared field types; only the class tag of the object is checked (in contrast to the type membership relation).
     • Similarly, pattern matching does not check the validity of sequences.
     • This is safe, as only valid objects and sequences exist at run time.
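
The difference between a class pattern and type membership can be illustrated with ordinary Python classes, all of them hypothetical: matching a class pattern only inspects the run-time class tag with isinstance, whereas membership also checks each field against its declared type. Python, unlike FX, lets us build an ill-formed object on purpose, which is what makes the two checks disagree here.

```python
class Animal:
    """Hypothetical class whose declared field is name: str."""
    def __init__(self, name):
        self.name = name


class Dog(Animal):
    """A subclass: a Dog object also matches the pattern Animal x."""


FIELD_TYPES = {"name": str}   # declared field types of Animal (assumed)


def matches_class_pattern(value, cls):
    """Pattern C x: only the class tag is checked."""
    return isinstance(value, cls)


def is_member(value, cls):
    """Type membership: class tag plus conformance of every declared field."""
    return isinstance(value, cls) and all(
        isinstance(getattr(value, f), t) for f, t in FIELD_TYPES.items())


rex = Dog("Rex")
broken = Dog(42)                               # ill-formed: name is not a str
print(matches_class_pattern(broken, Animal))   # True: only the tag is inspected
print(is_member(broken, Animal))               # False: the field check fails
print(is_member(rex, Animal))                  # True
```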

  25. Pattern matching (cont. 2)
     • We can use a class pattern in the label position of a tree pattern, since classes can be the types of labels in tree types.
     • This allows a label to be extracted from a tree as an object for later use in the program.

  26. Properties
     • We can now state for FX the standard results of static type safety: preservation and progress.
     • Formal definitions:
       • A value environment Σ
       • A typing environment Γ
       • Σ conforms to Γ if:
         • dom(Σ) = dom(Γ), and
         • Σ(x) ∈ Γ(x) for all x
       • ●: an environment with an empty domain
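
Conformance is a direct transcription of the two conditions. In this Python sketch environments are dictionaries, and the membership test a ∈ A is passed in as a function; the isinstance-based test used in the example is only an assumed stand-in for FX type membership.

```python
def conforms(sigma, gamma, member):
    """Sigma conforms to Gamma: same domain, and Sigma(x) is in Gamma(x) for every x."""
    return sigma.keys() == gamma.keys() and all(
        member(sigma[x], gamma[x]) for x in sigma)


def isinstance_member(value, ty):
    """Assumed stand-in for the FX relation value ∈ type."""
    return isinstance(value, ty)


print(conforms({"x": 1, "s": "hi"}, {"x": int, "s": str}, isinstance_member))  # True
print(conforms({"x": "one"}, {"x": int}, isinstance_member))                   # False
print(conforms({}, {}, isinstance_member))   # True: the empty environment conforms to itself
```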

  27. Properties (cont.)
     • Expression e gets type A in the typing environment Γ:  Γ ⊢ e ∈ A
     • Expression e evaluates to value a in the value environment Σ:  Σ ⊢ e ↓ a
     • Evaluation of e gets stuck after a finite number of steps:  Σ ⊢ e ↓

  28. Properties: proposition
     • Proposition (pattern matching preserves validity):
       Let a ∈ A and let Q be a pattern. If
         • a ► Q ⇒ Σ and
         • ► Q ⇒ Γ,
       then
         • A <: tyof(Q) and
         • Σ conforms to Γ,
       where tyof(Q) is the type obtained from Q by erasing the variable-binding annotations.

  29. Properties: preservation and soundness
     • Preservation: for Σ conforming to Γ, if Γ ⊢ e ∈ A and Σ ⊢ e ↓ a, then a ∈ A.
     • Soundness: if ● ⊢ e ∈ A, then not ● ⊢ e ↓ (evaluation of e does not get stuck).

  30. XML in FX
     • How can the "leaf data" (PCDATA) be treated?
     • We extend the C# data model by introducing singleton classes for individual characters.
     • The program context Ctx provides:
       • a class Char (the standard C# character class);
       • for each character c, a class Char_c extending Char;
       • each Char_c contains a single object, new Char_c().

  31. XML in FX: pcdata
     • We can define a regular type pcdata representing XML character data:
         def(pcdata) = (<(Char)>[])*
       i.e., a sequence of trees, each of which:
       • has no children, and
       • has a character object as its label.
     • <(Object)>[pcdata] is then a tree whose body contains only character data.

  32. Why not use C# String?
     • First reason: the pcdata representation opens the way to interesting uses of pattern matching for string regular-expression processing.
     • Since Char_a is a subtype of Char, we can write types that restrict text to a particular form.
     • Example: all character sequences starting with 'a' and ending with 'b':
         <('a')>[], pcdata, <('b')>[]
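
Singleton character classes can be mimicked in Python by generating one subclass of a base class Char per character. The naming scheme Char_<c>, the helper functions, and the choice to represent each childless character tree by its label object alone are assumptions of this sketch. The final check decides membership in <('a')>[], pcdata, <('b')>[] for such encoded sequences.

```python
from functools import lru_cache


class Char:
    """Base class of all character classes (the slide's Char)."""


@lru_cache(maxsize=None)
def char_class(c):
    """The singleton class Char_c for character c, created once per character."""
    return type(f"Char_{c}", (Char,), {"char": c})


@lru_cache(maxsize=None)
def char_object(c):
    """The single object new Char_c() of class Char_c."""
    return char_class(c)()


def encode(text):
    """A pcdata value, keeping only the label object of each childless tree."""
    return [char_object(c) for c in text]


def in_a_pcdata_b(labels):
    """Membership in <('a')>[], pcdata, <('b')>[]."""
    return (len(labels) >= 2
            and isinstance(labels[0], char_class("a"))
            and isinstance(labels[-1], char_class("b"))
            and all(isinstance(x, Char) for x in labels[1:-1]))


print(issubclass(char_class("a"), Char))   # True: Char_a <: Char
print(in_a_pcdata_b(encode("axyzb")))      # True
print(in_a_pcdata_b(encode("ab")))         # True: the middle pcdata may be empty
print(in_a_pcdata_b(encode("ba")))         # False
```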

  33. Why not use C# String? (cont.)
     • Second reason: in XML, two character sequences following each other are indistinguishable from a single longer character sequence.
     • pcdata satisfies this requirement:
         [pcdata, pcdata] = [<(Char)>[]*, <(Char)>[]*] = [<(Char)>[]*] = [pcdata]
     • String does not satisfy this requirement:
         [<(String)>[], <(String)>[]] ≠ [<(String)>[]]
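
The flattening argument can be checked on values with a few lines of Python. As a simplifying assumption, a childless character tree is represented by its character, a String-labelled tree by a ("String", text) pair, and list concatenation plays the role of sequence concatenation.

```python
def as_pcdata(text):
    """pcdata: one childless character tree per character."""
    return list(text)


def as_string_tree(text):
    """A single tree labelled by a whole String object."""
    return [("String", text)]


# Concatenating two pcdata values gives the pcdata value of the joined text...
print(as_pcdata("ab") + as_pcdata("c") == as_pcdata("abc"))                 # True
# ...but two String-labelled trees stay distinct from one larger tree.
print(as_string_tree("ab") + as_string_tree("c") == as_string_tree("abc"))  # False
```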

  34. The encoding of XML documents in XTATIC
     • Encoding of XML tags uses exactly the same intuition we used for characters:
       • a special class Tag;
       • for each tag <g>, a singleton class Tag<g>, which is a subclass of Tag and has a single object, new Tag<g>().

  35. The encoding of XML documents in XTATIC: example
     • XML fragment:
         <basket> <apple/> <banana/> </basket>
     • XTATIC value:
         <(new Tag<basket>())>[ <(new Tag<apple>())>[], <(new Tag<banana>())>[] ]
     • XTATIC type:
         <(Tag<basket>)>[ <(Tag<apple>)>[], <(Tag<banana>)>[] ]
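
The tag encoding can be sketched in Python in the same way as a character encoding: one singleton subclass of Tag per tag name, created on demand and given a single cached object. The helper names and the (label object, children) representation are assumptions; the fragment is parsed with the standard xml.etree.ElementTree module, and character data is ignored because the example contains none.

```python
import xml.etree.ElementTree as ET
from functools import lru_cache


class Tag:
    """Base class of all tag classes (the slide's Tag)."""


@lru_cache(maxsize=None)
def tag_object(name):
    """The unique object new Tag<name>() of a fresh subclass of Tag."""
    cls = type(f"Tag<{name}>", (Tag,), {})
    return cls()


def encode(elem):
    """<(new Tag<g>())>[children] for an element <g>...</g>."""
    return (tag_object(elem.tag), [encode(child) for child in elem])


value = encode(ET.fromstring("<basket><apple/><banana/></basket>"))
label, children = value
print(type(label).__name__)                          # Tag<basket>
print([type(lab).__name__ for lab, _ in children])   # ['Tag<apple>', 'Tag<banana>']
print(tag_object("apple") is children[0][0])         # True: one object per tag
```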

  36. Status
     • FX language definition: more or less finished
     • Prototype typechecker / interpreter for FX: running
     • Pattern match compilation: underway
     • Run-time system: just starting
     • Extension with attributes: underway

  37. Some of the remaining challenges
     • Run-time representation issues
     • Exploring alternative pattern-matching primitives
     • Dealing with update operations on XML structures. Possible approaches:
       • leave the type system alone and use run-time checking to maintain safety;
       • add types for mutable XML structures.
     • Namespaces
     • Additional XML features (e.g., from XML Schema)
     • Integration with polymorphism (generics)
     • Dealing with large XML structures (streaming)

  38. Related work
     • Current work at Microsoft on integrating "native XML types" with C#
     • Work on adding regular expression types and patterns to OCaml
     • CDuce
     • RELAX NG
