1 / 25

Mapping Techniques: Demonstration of automatic data transformation

Mapping Techniques: Demonstration of automatic data transformation. Maria Theodoridou ICS-FORTH. April 2004. The Problem. Convert data of various formats into a common XML format compatible with the CIDOC Conceptual Reference Model Input data: Databases Text files Spreadsheet files

phuoc
Télécharger la présentation

Mapping Techniques: Demonstration of automatic data transformation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Mapping Techniques: Demonstration of automatic data transformation Maria Theodoridou ICS-FORTH April 2004

  2. The Problem Convert data of various formats into a common XML formatcompatible with the CIDOC Conceptual Reference Model • Input data: • Databases • Text files • Spreadsheet files • Output files: • XML files, compatible with the CIDOC Conceptual Reference Model

  3. Data Junction 7.5 An integration/migration tool designed to convert structured data from one format to another. Also designed to clean and restructure the data to fit the new format. Market Leader in its Field Map Designer: transfers data among a wide variety of data file types. To transfer data, you design and run a conversion. Each conversion contains all the information needed to convert data from an existing data file or table to a new Target data file or table. Conversions are done in three Steps: • Define options for the source file • Define options for the target file • Define appropriate mapping from the source to the target file using all the functions, expressions, actions and events the tool offers

  4. Step 1 (Source) • Define Source Type • ASCII, Access, Excel, SQL, Oracle etc. • Define Source File • Each text source filewill be separated into fields and records • Field is a category of things, which have a common characteristic • Things of the same field are listed in separate records below the field • We use field and record delimiters

  5. Step 1 (Source) • We can sort our data descending or ascending depending on one or more fields • By using the “Source Filters” we can keep only the records we want • We can have the field names by using a header in the source file • The parsing is done serially and every record is listed below the other • From the Data Browser we see all the fields, records and the data

  6. Step 2 (Target) • Define Target Type • ASCII, Access, Excel, Lotus, SQL, Oracle etc. • Define Target File • We can use an existing record layout from: • A previous conversion • An XML file, an XML DTD, an XML Schema

  7. General about a conversion • Every conversion uses the Event Handlers, the Actions, the Expressions and the Functions • Every Event, from a predefined set, contains one or more actions • Every action has either a predefined set of parameters or uses an expression • Every expression can use one or more of the predefined or the user defined functions • We can also declare global variables, which can be used by every expression within a conversion.

  8. Events • Predefined set of Events • They occur at a specific moment with a predefined order in the conversion cycle • Events are separated into Source Generic Events, Target Events, Conversion Events, and Record Type Events • Examples of Events: • BeforeConversion,AfterConversion,AfterEveryRecord,OnDataChanged,BeforeMap,AfterMap,BeforePutRecord,OnDiscard

  9. Actions Actions have their own format and parameters: • Execute: executes an expression we specify • Clear: Clears the contents of a target record memory buffer • Map: maps a specific target record type • Put: writes the results of a target record type to a specific position in the target file • ClearMapPut: combines the functionality of Clear, Map, and Put • Abort: causes the conversion to abort

  10. An Example • We Map and then Put the target record types “CRM_Entity”, “is_identified_by” and “has_current_owner” • The global variable “id” helps us map the appropriate fields to the “is_identified_by” target record type

  11. Expressions • They offer conditional mapping and data modification • Each expression is written for an “Execute” Actionor is a Target Field Expression • We use the same syntax in every expression • Every expression is written in the expression builder window

  12. A target field expression: If (IsNull(Fields("nr"))) Then "NULL" Else "OID:LUPA."&Fields("nr") End If An expression example

  13. Functions Set of predefined functions separated in categories: • Conversion: typecast a string or a variable • Date/Time: convert strings into several formats of date and time • Math: return the sign, absolute, logarithm etc. • Parse: parse a string into sub strings by using delimiters

  14. Functions • Text: replace, compare, count, trim strings • File: read, copy, append and rename a file or get the name of a directory or make a new directory • Flow Control: For … Next, Go To, If … Then … Else, Return, Select Case … End Select, While … Wend. • We can also have user-defined functions, save them as a Visual Basic Module File (.bas)and call them within the conversions we want

  15. Step 3 (Map) • Here we define the Events, Actions, Expressions and Functions we use • We define the order and type of the fields of every target record type • A target record type may contain a lot of other record types • When a field is selected to be a record, then it automatically links to a record type, which has the same name with the field

  16. Lupa transformation Lupa – Stone data base of 7000 archaeological records, City of Vienna, Austria Microsoft Access to XML conversion

  17. Lupa transformation Source

  18. Lupa transformation Target XML file

  19. Field name CIDOC CRM correspondence Lupa transformation Map Nr CRM_Entity->P47F_is_identified_by->E42_Object_Identifier Bilder CRM_Entity->P129B_is_subject_of->E38_Image Titel CRM_Entity->P102F_has_title->E35 Title Objekt CRM_Entity ->P2F_has_type-> E55_Type Fundort CRM_Entity -> P12B_was_present_at-> E5_Event Bezirk CRM_Entity->P12B_was_present_at->P7F_took_place_at->E53_Place Verwahrort CRM_Entity->P55F_has_current_location->E53_Place Literatur CRM_Entity->P70B_is_documented_in->E31_Document Inschrift CRM_Entity->P65F_shows_visual_item->P142F_has_clear_text Inschrift_Abfrage CRM_Entity->P65F_shows_visual_item->P141F_has_transcription

  20. CIL transformation CIL –Corpus Inscriptionum Latinarum 25,150 roman inscriptions text to XML conversion

  21. CIL transformation Source CIL 13, 00029* = AE 1994, 1194. Abelloni / deo / (A)elia(?) Mar/[6] / v(otum) s(olvit) CIL 13, 00118* (AE 1989, 0519). D(eo?) M(ercurio?) / pro sal(ute) Imp(eratoris) M(arci) Aur(eli) Antonini / Fel(icis) Aug(usti) Brit(annici) p(ontificis) m(aximi) tr(ibunicia) p(otestate) XV imp(eratoris) / II co(n)s(ulis) II des(ignati) III p(atris) p(atriae) C(aio) Iul(io) / Drutedo et Balorice Taur(i) f(ilio) / ex v(oto)

  22. CIL transformation Source ContentExtractor is a report reader designed to mine important data out of unstructured report formats such as invoices, bank statements, emails, html docs, etc., and turn that information into a database We used Content Extractor to convert unstructured text to Microsoft Excel file format

  23. CIL transformation Target XML file

  24. Field name CIDOC CRM correspondence CIL transformation Map Field 1 CRM_Entity->E34 Inscription->P65F_shows_characters Field 1 CRM_Entity->E34 Inscription->P141F_has_transcription Field 2 CRM_Entity->P70B_is_documented_in->E31_Document Field 2 CRM_Entity->P70B_is_documented_in->E31_Document

  25. Thank You!

More Related