FIGIS’ML hands-on creating XML documents for FIRMS factsheets
AN INTRODUCTION TO XML • What is XML • an XML document • what is a DTD, XSD • HOW TO CREATE XML DOCUMENTS • FIRMS FACTSHEETS • Marine Resource fact sheets: FIGIS AND FIRMS • HOW FIGIS IS ORGANISED • FIGIS MAIN STRUCTURE • CREATING Marine Resource fact sheets • 1) CREATING OBJECTS • 2) REFERENCING OBJECTS • 3) ADVANCED TAGGING: TOPIC • 4) ADVANCED TAGGING: FORMATTING CONTENTS
An introduction to XML • What is XML? : • XML stands for eXtensible Markup Language • XML is not itself a markup language: it can be considered a set of rules for building markup languages • XML is extensible: you can create your own tags. XML does not define any markup elements; every user makes up their own markup language to express their information in the best possible way. It is, however, important to follow the XML syntax strictly. • XML deals with content and structure: it allows information to be contained and managed with markup
Presentation: the main difference between XML and HTML • Separating style from meaning is central to XML. • Presentation is the way a document looks, and it should not be included in an XML document. • The layout of an XML document is assigned through a separate document called a stylesheet. • Like HTML, XML uses tags (words bracketed by '<' and '>') and attributes (of the form name="value"). But while HTML specifies what each tag and attribute means, and often how the text between them will look in a browser, XML uses tags only to delimit pieces of data and leaves the interpretation of the data entirely to the application that reads it. In other words, if you see "<p>" in an XML file, do not assume it is a paragraph: depending on the context, it may be a price, a parameter, a person... • Keeping style out of the document widens your presentation options: many different stylesheets can be applied to a single XML document, and different versions of the same information can be created on the fly. • A document that embeds stylistic information, such as an HTML document, is difficult to repurpose, update or convert into other forms.
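The separation of content and presentation can be illustrated with a minimal Python sketch. It is not how FIGIS renders fact sheets: real stylesheets would typically be XSLT, and here two plain functions stand in for them. The element names (species, name, area) are hypothetical. The same content-only document is rendered twice without ever being modified.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical content-only document: it says nothing about layout.
DOC = """<species>
  <name>Albacore</name>
  <area>Atlantic and Mediterranean Sea</area>
</species>"""


def render_html(root: ET.Element) -> str:
    """First 'stylesheet': an HTML fragment for web display."""
    name = escape(root.findtext("name"))
    area = escape(root.findtext("area"))
    return f"<h1>{name}</h1><p>Area: {area}</p>"


def render_text(root: ET.Element) -> str:
    """Second 'stylesheet': a one-line plain-text version for printing."""
    return f"{root.findtext('name').upper()} ({root.findtext('area')})"


root = ET.fromstring(DOC)
print(render_html(root))
print(render_text(root))
```

To change the layout, only the rendering function needs to change. The XML document stays exactly the same.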
XML and HTML • HTML • pros: widely used, simple to use • cons: mixes data and formatting, not extensible • XML • handles only the content; formatting is done by stylesheets • the tags are defined by the authors • tag definitions and structure are handled by dictionaries (DTD) • XML is platform independent • it allows viewing, reuse and multiple uses of the same document, and loading into databases
Information dealing with presentation is stored elsewhere, and this has many benefits: 1) the same style settings can be used for many documents (e.g. the FIGIS species fact sheets); 2) changing the layout of a set of documents can mean updating only one file; 3) stylesheets can be swapped for different purposes, for example one for web display and one for printing (e.g. SIDP and FIGIS species fact sheets); 4) the document structure and content cannot be messed up by changing its layout: the purity of the information structure does not get in the way of format conversions. Figure: the XML document for Thunnus thynnus, displayed as the SIDP fact sheet and as the SIDP fact sheet on FIGIS.
Other important characteristics of XML • XML is an open standard managed by the World Wide Web Consortium (W3C): it is not tied to the fortunes of any single company, and it is considered a platform-independent technology. • http://www.w3c.org • Rather than creating new tags in every environment, people working on information systems are trying to set standard tags for exchanging data. The goal of writing an unambiguous structure makes writing XML markup more difficult, but the result is still easy for humans and programs alike to read and parse. One of the most recent applications is the distribution of Internet media through RSS feeds. • XML is text-based
An introduction to XML • What is XML? : an example
<?xml version="1.0" encoding="UTF-8"?>
<!-- xsi, dc and dcterms declared with their standard URIs (assumed) so that every prefix used is bound -->
<fi:FIGISDoc xmlns:fi="http://www.fao.org/fi/figis/devcon/"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:dcterms="http://purl.org/dc/terms/"
             xsi:schemaLocation="http://www.fao.org/fi/figis/devcon/ http://figis01/Dtd/Beta/3.5/firms_schema/editor/aqres_editor.xsd"
             xml:lang="en">
  <fi:AqRes>
    <fi:AqResIdent Status="1" Factsheet="true" RefObservation="false">
      <fi:FigisID>10008</fi:FigisID>
      <!-- this is the FIRMS name -->
      <dc:Title>Albacore - Atlantic and Mediterranean Sea</dc:Title>
      <fi:SpeciesList Type="Target">
        <fi:SpeciesRef Taxonomy="Species">
          <fi:ForeignID CodeSystem="Scientific_name" Code="Thunnus alalunga"/>
          <dc:Title Type="FIRMS">Albacore</dc:Title>
        </fi:SpeciesRef>
      </fi:SpeciesList>
      <fi:ReportingYear>2003</fi:ReportingYear>
      <fi:AdditionalRefData>
        <dcterms:Alternative>"Albacore"</dcterms:Alternative>
      </fi:AdditionalRefData>
    </fi:AqResIdent>
    <!-- topic blocks omitted -->
  </fi:AqRes>
</fi:FIGISDoc>
an XML document • The parts of the example above: • XML declaration: <?xml version="1.0" encoding="UTF-8"?> • document type declaration and root element: fi:FIGISDoc • elements: e.g. fi:AqResIdent, fi:FigisID, dc:Title • attributes and their values: e.g. Status="1", Factsheet="true" • entities (see below)
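The parts of a FIGIS document can be read programmatically. The following Python sketch uses the standard library's ElementTree on a trimmed copy of the albacore example. It shows the namespaced root element, an attribute and two child elements. One assumption: the dc prefix is bound to the standard Dublin Core URI, because the original slide does not declare it.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the albacore example. The dc namespace URI is an assumption
# (standard Dublin Core), since the slide does not declare it.
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<fi:FIGISDoc xmlns:fi="http://www.fao.org/fi/figis/devcon/"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
  <fi:AqRes>
    <fi:AqResIdent Status="1" Factsheet="true" RefObservation="false">
      <fi:FigisID>10008</fi:FigisID>
      <dc:Title>Albacore - Atlantic and Mediterranean Sea</dc:Title>
      <fi:ReportingYear>2003</fi:ReportingYear>
    </fi:AqResIdent>
  </fi:AqRes>
</fi:FIGISDoc>"""

# Prefix -> namespace URI map used in path expressions.
NS = {"fi": "http://www.fao.org/fi/figis/devcon/",
      "dc": "http://purl.org/dc/elements/1.1/"}

# Pass bytes so the parser honours the encoding in the XML declaration.
root = ET.fromstring(DOC.encode("utf-8"))
ident = root.find("fi:AqRes/fi:AqResIdent", NS)

print(root.tag)                                       # root element, in {namespace}name form
print(ident.get("Status"))                            # value of an attribute
print(ident.findtext("fi:FigisID", namespaces=NS))    # text of an element
print(ident.findtext("dc:Title", namespaces=NS))
```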
an XML document • Prolog • Elements • define the document’s content, dividing it into its constituent parts • can contain other elements, text or both • can be empty • Attributes • add information about an element • a given attribute can appear only once in an element • attribute values can only contain text • Entities • used in place of characters that are not allowed in text (e.g. “&amp;” for “&”, “&lt;” for “<”) • can be used as “shortcuts” • are similar to variables
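The two uses of entities listed above can be shown in a short Python sketch. It uses the standard library's `xml.sax.saxutils.escape` and ElementTree. The entity name `fao` and the sample strings are hypothetical. The sketch escapes reserved characters, shows that the parser restores them, and declares a "shortcut" entity in an internal DTD subset.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() replaces &, < and > with the predefined entities &amp;, &lt; and &gt;.
editor = escape("Dubious & P. Ano")
condition = escape("length < 50 cm")
print(editor)     # Dubious &amp; P. Ano
print(condition)  # length &lt; 50 cm

# The parser turns the entity references back into the original characters.
parsed = ET.fromstring(f"<EDITOR>{editor}</EDITOR>")
print(parsed.text)  # Dubious & P. Ano

# An entity used as a shortcut (the name "fao" is hypothetical): declared once
# in the internal DTD subset, then expanded wherever &fao; appears.
SHORTCUT_DOC = """<!DOCTYPE note [
  <!ENTITY fao "Food and Agriculture Organization of the United Nations">
]>
<note>Data published by &fao;</note>"""
note = ET.fromstring(SHORTCUT_DOC)
print(note.text)
```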
an XML document : the prolog The top of an XML document contains special information called the document prolog. The prolog says that this is an XML document and declares the version of XML being used. It can hold additional information, such as the document type definition being used, the text encoding and instructions to XML processors. • XML declaration: it tells the processor that an XML parser is needed to interpret the document. • when present, it must be the very first line of an XML document • it can be omitted (not recommended) • the simplest form is: <?xml version="1.0"?> • it can also declare the character encoding: <?xml version="1.0" encoding="UTF-8"?> • Document type declaration: it names the root element type and designates a Document Type Definition (DTD) or an XML Schema Definition (XSD). In a DTD-based document it is a <!DOCTYPE ...> line, in which a SYSTEM identifier gives the location of the DTD. In a schema-based document such as the FIGIS example, the root element carries this information itself, e.g.: <fi:FIGISDoc xmlns:fi="http://www.fao.org/fi/figis/devcon/" ...>, where the xsi:schemaLocation attribute gives the location of the XSD. Here fi:FIGISDoc is the root element: it is the first element to appear in the document, and therefore the one that contains the rest of the document.
An introduction to XML • Creating well-formed XML documents • well-formed means that the document respects the XML syntax rules • the main rules are: • there must be exactly one root element, which contains all the other elements • all elements must be properly nested and closed • element names, attribute names and entities are case-sensitive • attribute values must be quoted
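Any conforming XML parser enforces these rules. The following Python sketch is a minimal well-formedness check built on the standard library's ElementTree, which raises `ET.ParseError` on any syntax violation. The helper name `is_well_formed` is ours, not an XMLSpy or FIGIS function.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed('<ORDER orderref="120136"><ITEM/></ORDER>'))  # one root, nested, quoted
print(is_well_formed('<ORDER><ITEM></ORDER>'))     # ITEM is never closed
print(is_well_formed('<ORDER></order>'))           # names are case-sensitive
print(is_well_formed('<ORDER orderref=120136/>'))  # attribute value not quoted
print(is_well_formed('<A/><B/>'))                  # two root elements
```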
An introduction to XML • Valid XML documents • an XML document is valid when: • it follows the structure that has been defined • all the elements respect the rules set in the XSD or in the DTD • valid elements • valid attributes • optional and mandatory elements/attributes • the validation rules are defined in a “dictionary”: • a Document Type Definition (DTD) • an XML Schema Definition (XSD)
An introduction to XML • Valid XML documents • well-formed does not imply valid • well-formed means syntactically correct • valid is similar to grammatically correct • but a valid document is always well-formed
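The distinction between well-formed and valid can be shown with a sketch. Python's standard library has no DTD or XSD validator, so the rules below are written by hand and are hypothetical: every ITEM must have `type` and `quantity` attributes and TITLE and AUTHOR children. A real setup would put these rules in a DTD or XSD and use a validating parser. Parsing checks well-formedness. The extra checks stand in for validity.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for the order document, hand-written for illustration only;
# in practice they would live in a DTD or XSD checked by a validating parser.
REQUIRED_ITEM_ATTRS = ("type", "quantity")
REQUIRED_ITEM_CHILDREN = ("TITLE", "AUTHOR")


def validation_errors(text: str) -> list[str]:
    """Parse (well-formedness), then apply the hand-written rules (validity)."""
    root = ET.fromstring(text)  # raises ET.ParseError if not well-formed
    errors = []
    if root.tag != "ORDER":
        errors.append(f"root must be ORDER, not {root.tag}")
    for i, item in enumerate(root.findall("ITEM"), start=1):
        for attr in REQUIRED_ITEM_ATTRS:
            if item.get(attr) is None:
                errors.append(f"ITEM {i}: missing attribute {attr}")
        for child in REQUIRED_ITEM_CHILDREN:
            if item.find(child) is None:
                errors.append(f"ITEM {i}: missing element {child}")
    return errors


good = ('<ORDER><ITEM type="CD" quantity="1"><TITLE>Groovy beats</TITLE>'
        '<AUTHOR>Howie C.</AUTHOR></ITEM></ORDER>')
# Well-formed, but not valid under the rules above.
bad = '<ORDER><ITEM type="CD"><TITLE>Groovy beats</TITLE></ITEM></ORDER>'
print(validation_errors(good))
print(validation_errors(bad))
```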
An introduction to XML • Valid XML documents : schema • The purpose of a schema is to define a class of XML documents. Like a DTD, it sets out which names are to be used for the different types of element, where they may occur, and how they all fit together. • schemas are written in XML • they don’t require learning a new language • they can be handled and transformed like normal XML documents • schemas can contain more complex rules than DTDs • conditions • tests
An introduction to XML • creating XML documents • XML is pure text • XML documents can be created manually • XML editing software makes the task easier • checks for well-formedness and validity • provides a graphical user interface • Various solutions exist: • XMLSpy by Altova (Windows) • XMetaL by SoftQuad (Windows) • Morphon XML Editor Suite by Morphon (MacOS) • ElfData XML Editor by ElfData (MacOS)
An introduction to XML • Creating XML documents : • hands-on : creating a document with XMLSpy • example of an order form: • 1 order: reference 120136 • two items in the order: • 1 CD, “Groovy beats” by Howie C., ed. “Average Records”, available with 2-day delivery • 2 copies of the book “Piano for Dummies - vol1” by K. Board, published by “Dubious & P. Ano”, available with 1-week delivery
An introduction to XML • Creating XML documents : • hands-on : creating a document with XMLSpy
<?xml version="1.0" encoding="iso-8859-1"?>
<ORDER orderref="120136">
  <ITEM type="CD" quantity="1">
    <AVAILABLE status="yes" time="2days"/>
    <TITLE>Groovy beats</TITLE>
    <AUTHOR>Howie C.</AUTHOR>
    <EDITOR>Average Records</EDITOR>
  </ITEM>
  <ITEM type="book" quantity="2">
    <AVAILABLE status="yes" time="1week"/>
    <TITLE>Piano for Dummies - vol1</TITLE>
    <AUTHOR>K. Board</AUTHOR>
    <!-- a literal & must be written as the entity &amp; -->
    <EDITOR>Dubious &amp; P. Ano</EDITOR>
  </ITEM>
</ORDER>
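After the order document is written, a program can read it back. The following Python sketch uses the standard library's ElementTree to parse the order form, list its items and total the quantities. It also shows that a bare `&` in text makes the document not well-formed, while the entity `&amp;` parses back to `&`. The output format is our own choice.

```python
import xml.etree.ElementTree as ET

ORDER = """<?xml version="1.0" encoding="iso-8859-1"?>
<ORDER orderref="120136">
  <ITEM type="CD" quantity="1">
    <AVAILABLE status="yes" time="2days"/>
    <TITLE>Groovy beats</TITLE>
    <AUTHOR>Howie C.</AUTHOR>
    <EDITOR>Average Records</EDITOR>
  </ITEM>
  <ITEM type="book" quantity="2">
    <AVAILABLE status="yes" time="1week"/>
    <TITLE>Piano for Dummies - vol1</TITLE>
    <AUTHOR>K. Board</AUTHOR>
    <EDITOR>Dubious &amp; P. Ano</EDITOR>
  </ITEM>
</ORDER>"""

# Encode to bytes so the parser honours the encoding declared in the prolog.
root = ET.fromstring(ORDER.encode("iso-8859-1"))
print("order", root.get("orderref"))
for item in root.iter("ITEM"):
    qty = int(item.get("quantity"))
    delivery = item.find("AVAILABLE").get("time")
    print(f"{qty} x {item.get('type')}: {item.findtext('TITLE')} "
          f"({item.findtext('EDITOR')}, delivery {delivery})")
total = sum(int(i.get("quantity")) for i in root.iter("ITEM"))
print("total items:", total)

# A bare & is rejected by the parser: the document would not be well-formed.
try:
    ET.fromstring(b"<EDITOR>Dubious & P. Ano</EDITOR>")
    bare_ampersand_ok = True
except ET.ParseError:
    bare_ampersand_ok = False
print("bare & accepted:", bare_ampersand_ok)
```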
An introduction to XML • using XML documents • exchanging data between systems • from one DB to another DB • data manipulation by software • submitting data to a system • automatically, by software (e.g. conversion from the FIRMS Excel inventory to XML) • manually (creation of content) • presenting data to the user • web pages • data export in any format (text, PDF, etc.)
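Loading XML into a database can be sketched briefly. The Python example below, using only the standard library, flattens the order items into rows and inserts them into an in-memory SQLite database, which a query can then use. The table layout is a hypothetical choice for illustration. FIGIS uses its own loader, covered in the next part.

```python
import sqlite3
import xml.etree.ElementTree as ET

ORDER = """<ORDER orderref="120136">
  <ITEM type="CD" quantity="1"><TITLE>Groovy beats</TITLE></ITEM>
  <ITEM type="book" quantity="2"><TITLE>Piano for Dummies - vol1</TITLE></ITEM>
</ORDER>"""

# Flatten each ITEM into one database row (hypothetical table layout).
root = ET.fromstring(ORDER)
rows = [(root.get("orderref"), i.get("type"), int(i.get("quantity")), i.findtext("TITLE"))
        for i in root.iter("ITEM")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (orderref TEXT, type TEXT, quantity INTEGER, title TEXT)")
conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?)", rows)

# Once loaded, the data can be queried like any other table.
total = conn.execute("SELECT SUM(quantity) FROM item WHERE orderref = ?",
                     ("120136",)).fetchone()[0]
print(total)
```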
• using XML documents : within FIGIS Diagram: original data (text documents, XML temp files) -> XML documents -> FIGIS loader -> data load into the FIGIS DB; end user -> user’s query -> FIGIS application (or FIGIS-like) -> query result
• using XML documents : FIGIS and the world Diagram: FAO and partners provide text documents and XML documents (structured by DTDs) that feed the FIGIS DB; users access it through a graphical user interface: the website
• using XML documents : FIRMS and the world Diagram: FIRMS partners’ reports (.xls, .doc) are turned into XML documents through the XML converter or the On Line Editor, then loaded as FIRMS fact sheets for the user
Creating XML content for FIGIS, FIRMS • Objectives : • Organisation of data in FIGIS • Creating “FIGISML” documents • Creating “FIGISML” objects • Referencing “FIGISML” objects • Creating documents : hands-on • Detailed structure • Advanced tagging
Creating XML content for FIGIS • Organisation of data in FIGIS • data is organised by “domains” • e.g.: aquatic species, marine fisheries, fishing technologies, aquaculture... • each domain handles “objects” • e.g.: species, gear types, marine resources... • complex objects can be defined using simple objects • e.g.: a fishing technique is defined by a gear, a target species and a vessel
Creating XML content for FIGIS • Organisation of data in FIGIS • generic structure of objects : • the OBJECT SOURCE block • it contains information about the origin of the data • the IDENTITY block • it contains the definition of the object • the TOPIC block • contains a description of the object : what makes it different from another of the same type • contains any other information on the object : factual data
Creating XML content for FIGIS • Organisation of data in FIGIS • the XML documents reflect the structure of the objects they contain • a document can contain one or more objects • e.g. : two species • an object can contain other objects • e.g. : a resource composed of three stocks
Creating XML content for FIGIS • High-level structure of “FIGISML documents” • an XML document created for FIGIS always starts with: • fi:FIGISDoc as the root element • followed by • an OBJECT SOURCE block (fi:ObjectSource) • one or more OBJECT blocks (fi:AqSpecies (FIGIS species fact sheet), fi:GearType (FIGIS gear fact sheet), fi:AqRes (FIRMS marine resource fact sheet)...)
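The high-level skeleton can be generated with a minimal Python sketch. It uses the standard library's ElementTree and the `fi` namespace URI from the albacore example. The sketch builds only empty blocks, and a real document would still need to be validated against the FIGIS XSD. The helper `fi()` is a hypothetical convenience, not part of FIGIS.

```python
import xml.etree.ElementTree as ET

FI = "http://www.fao.org/fi/figis/devcon/"
ET.register_namespace("fi", FI)  # serialise elements with the fi: prefix


def fi(tag: str) -> str:
    """Hypothetical helper: namespaced ('Clark notation') name of a FIGIS element."""
    return f"{{{FI}}}{tag}"


root = ET.Element(fi("FIGISDoc"))            # root element
ET.SubElement(root, fi("ObjectSource"))      # the Object Source block
aqres = ET.SubElement(root, fi("AqRes"))     # one object block (a marine resource)
ET.SubElement(aqres, fi("AqResIdent"))       # its identity block

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```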
Creating XML content for FIGIS Creating documents : FIGISDoc Diagram: the root of the document and the domains currently available in FIGIS
Creating XML content for FIGIS Creating documents : Object Source Elements that can be used within the Object Source (fi:ObjectSource)
Creating XML content for FIGIS Creating documents : Source of Information Elements that can be used within the Source of Information (fi:Sources)
Creating XML content for FIGIS • Creating documents : OBJECT SOURCE • the Object Source is important : • for quality assurance • for ownership of the data • for version management • for observations management
Creating XML content for FIGIS • Creating objects • generic structure of objects :example for AqRes • the OBJECT SOURCE block • Collection Ref, Cover Page, Corporate Cover Page • the IDENTITY block • FigisIdentifier (FIGISId), Title, Alternative Title, SpeciesList, WaterAreaList, Reporting Year, Foreign Id • the TOPIC block - History, Habitat and Biology, Geographical Distribution, Water Area Overview, Resource Structure, Exploitation, Statistics, Assessment, Management, Biological State and Trend - Source of Information, Bibliography, Related Resources
Creating XML content for FIGIS • Creating objects : the Object Source • each fact sheet has its own Object Source • 3 components: Data Collection Owner, Cover Page, Corporate Cover Page
<fi:ObjectSource>
  <fi:Owner>
    <fi:CollectionRef>
      <fi:FigisID>6</fi:FigisID> <!--ICCAT SCRS Reports-->
    </fi:CollectionRef>
  </fi:Owner>
  <fi:CorporateCoverPage>
    <fi:FigisID>6</fi:FigisID> <!--Stock status report -->
  </fi:CorporateCoverPage>
  <fi:CoverPage>
    <dcterms:Created>2005-08-05</dcterms:Created>
  </fi:CoverPage>
</fi:ObjectSource>
Creating XML content for FIGIS • Creating objects : the Object Source Data Collection Owner: a Data Collection is a set of homogeneous data handled over time by a data owner according to agreed and consistent processes and dissemination formats; as such, it may cover data types from different domains. A Data Collection is also the primary level at which user rights are defined, and is therefore always associated with the data owner’s institutional name.
Creating XML content for FIGIS • Creating objects : the Object Source Cover Page: the cover page is composed of a set of public, bibliographic-like information. It is modelled to adapt the traditional paper publishing logic (a cover page wrapping a thick intellectual content) to the internet publishing logic (fact sheets can be considered short electronic pages of a broader virtual book). Most of the information used to build the citation of a domain object observation comes from the cover page attributes. • dcterms:Created is the date of creation of the intellectual content. • dcterms:Modified is the date of modification of the intellectual content. • dc:Language: one element for each language in which the resource is available.
Creating XML content for FIGIS • Creating objects : the Object Source Corporate Cover Page: a cover page is attached to each observation made on a domain object (e.g. marine resource, fishery, etc.). In general, a set of observations issued by the same data owner under the same data collection will share the same cover page; more precisely, at least some of their cover page attributes will be the same. This group of shared attributes is called the “Corporate Cover Page”. A Corporate Cover Page is defined either by a FIGIS reference (fi:FigisID) or, exclusively, by a set of elements defining an unreferenced Corporate Cover Page. For an unreferenced Corporate Cover Page, the Title and the Corporate Author are mandatory. The Publisher element is used only for output purposes. The elements Title, Series and CreatorCorporate may be provided in 3 languages (English, French and Spanish). The Data Collection module is used to indicate which Corporate Cover Pages can be served within a given collection; it displays all the attributes of the referenced Corporate Cover Pages.
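The "referenced or (exclusive) unreferenced" rule can be expressed as a small check. The Python sketch below is hypothetical. Its parameter names only mirror the concepts on this slide (FigisID, Title, Corporate Author) and are not FIGIS element names. It shows the logic a validator would apply.

```python
from typing import Optional


def check_corporate_cover_page(figis_id: Optional[int] = None,
                               title: Optional[str] = None,
                               corporate_author: Optional[str] = None) -> list[str]:
    """Hypothetical check of the rule: FigisID reference XOR unreferenced description."""
    unreferenced = title is not None or corporate_author is not None
    if figis_id is not None and unreferenced:
        return ["use either a FigisID reference or an unreferenced description, not both"]
    if figis_id is not None:
        return []  # referenced Corporate Cover Page: nothing else required
    errors = []
    if title is None:
        errors.append("unreferenced Corporate Cover Page needs a Title")
    if corporate_author is None:
        errors.append("unreferenced Corporate Cover Page needs a Corporate Author")
    return errors


print(check_corporate_cover_page(figis_id=6))
print(check_corporate_cover_page(title="Stock status report"))
```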
Creating XML content for FIGIS • Creating objects : the Object Source Conceptual data model Corporate Cover Pages might be referenced in the FIGIS system and managed by the Reference Tables Management System (RTMS). Each observation made on a domain object (e.g. on a marine resource) has a Corporate Cover Page, but this cover page might not be a FIGIS referenced Corporate Cover Page. The FIGIS system manages a relationship between Referenced Corporate Cover pages and Data Collection. This relation indicates which Corporate Cover Pages may be used according to the Data Collection under which an observation on a domain object is published.
Creating XML content for FIGIS • Creating objects : the Object Source What is the Reference Tables Management System (RTMS)? The RTMS (http://www.fao.org/figis/servlet/RefServlet) is a graphical interface for managing reference data. Reference data is a set of static values used by FIGIS applications to identify unambiguously all the objects involved in each operational context. Every FIGIS application that refers to a reference object queries the RTMS to obtain all the attributes pertaining to that object (countries, species, areas, stocks...). For example, an application requesting a reference object such as a country specifies a unique ID and gets back a list of related attributes (e.g. UN code, ISO 2-alpha code, ISO 3-alpha code...); for a species, it gets the 3-alpha code, taxonomic code, scientific name...
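The RTMS lookup pattern (one unique ID in, many attributes out) can be sketched in Python. This is not the RTMS API: the in-memory table and its numeric IDs are hypothetical. The codes themselves are real: the ISO and UN M49 codes for Italy, and the ASFIS 3-alpha code for albacore. The reverse lookup by code system and code mirrors how a `fi:ForeignID` identifies an object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RefObject:
    """One reference object: a unique ID plus its attributes in several code systems."""
    figis_id: int
    kind: str
    codes: dict[str, str]


# Hypothetical stand-in for the reference tables; the IDs 101 and 202 are invented.
REFERENCE_TABLES = {
    101: RefObject(101, "country", {"UN": "380", "ISO2": "IT", "ISO3": "ITA"}),
    202: RefObject(202, "species", {"3-alpha": "ALB", "Scientific_name": "Thunnus alalunga"}),
}


def lookup(figis_id: int) -> dict[str, str]:
    """Return all attributes of the reference object with this unique ID."""
    return REFERENCE_TABLES[figis_id].codes


def find_by_code(code_system: str, code: str) -> Optional[int]:
    """Reverse lookup: which reference object has `code` in `code_system`?"""
    for obj in REFERENCE_TABLES.values():
        if obj.codes.get(code_system) == code:
            return obj.figis_id
    return None


print(lookup(101)["ISO3"])
print(find_by_code("Scientific_name", "Thunnus alalunga"))
```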
Creating XML content for FIGIS • Creating objects : the IDENTITY • each type of object has its own identity element • they are all named using the object name and the suffix Ident • e.g. : for a marine resource, the element is AqRes and the matching identity element is AqResIdent
Creating XML content for FIGIS • Creating objects : the Topic block example : marine resource topic block
Creating XML content for FIGIS • Referencing objects : • FIGIS can draw links between existing objects based on criteria • retrieving data from objects and embedding it in the XML document • e.g.: when describing a yellowfin tuna stock, get the standard image and names of the species from the species identification sheet and include them in the marine resource fact sheet • creating hyperlinks in web pages that point to existing objects • e.g.: in the same fact sheet page, hyperlink all the gear and species names to their respective description pages
Creating XML content for FIGIS • Referencing objects : • retrieving data from existing objects • it can be done with any object defined in FIGIS • it can be used to define an object using other objects • e.g.: the bigeye tuna resource of the Indian Ocean • is defined by the SPECIES (bigeye tuna) and the AREA (Indian Ocean) • bigeye tuna is already described in a species fact sheet • the Indian Ocean is defined in the reference tables • -> the resource only needs to be defined using REFERENCES to those two objects
Creating XML content for FIGIS • Referencing objects : • how to reference existing objects: • each object type has a REFERENCE tag • the tag is built from the object tag by adding the suffix Ref • e.g.: AqRes -> AqResRef • the following references the species Albacore using the “Scientific_name” code system:
<fi:SpeciesRef Taxonomy="Species">
  <fi:ForeignID CodeSystem="Scientific_name" Code="Thunnus alalunga"/>
</fi:SpeciesRef>
• the output document will contain additional information about that species: picture, scientific and FAO official names, standard codes...
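Reference elements can be generated with a minimal Python sketch using the standard library's ElementTree. It builds a reference element holding one `fi:ForeignID`. The helpers `ref_tag` and `make_ref` are hypothetical. They take a base name rather than the object tag, because the species example uses `SpeciesRef`, not a name derived from `AqSpecies`.

```python
import xml.etree.ElementTree as ET

FI = "http://www.fao.org/fi/figis/devcon/"
ET.register_namespace("fi", FI)


def ref_tag(base: str) -> str:
    """Naming convention from the slide: reference tag = base tag + 'Ref'."""
    return base + "Ref"


def make_ref(base: str, code_system: str, code: str, **attrs: str) -> ET.Element:
    """Hypothetical helper: build an <fi:...Ref> element holding one fi:ForeignID."""
    ref = ET.Element(f"{{{FI}}}{ref_tag(base)}", attrs)
    ET.SubElement(ref, f"{{{FI}}}ForeignID", {"CodeSystem": code_system, "Code": code})
    return ref


species_ref = make_ref("Species", "Scientific_name", "Thunnus alalunga", Taxonomy="Species")
print(ref_tag("AqRes"))
print(ET.tostring(species_ref, encoding="unicode"))
```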
Creating XML content for FIGIS • Referencing objects : • e.g.: WaterAreaRef • this references the area “ICCAT SMU” through the codes (ALB_N, ALB_S, ALB_M) of the ICCAT Statistical Management Units where the resource is located
<fi:AqRes>
  <fi:AqResIdent Status="1" Factsheet="true" RefObservation="false">
    <!-- ... -->
    <fi:WaterAreaList>
      <fi:WaterAreaRef>
        <fi:ForeignID CodeSystem="iccat_smu" Code="ALB_N"/>
      </fi:WaterAreaRef>
      <fi:WaterAreaRef>
        <fi:ForeignID CodeSystem="iccat_smu" Code="ALB_S"/>
      </fi:WaterAreaRef>
      <fi:WaterAreaRef>
        <fi:ForeignID CodeSystem="iccat_smu" Code="ALB_M"/>
      </fi:WaterAreaRef>
    </fi:WaterAreaList>
    <!-- ... -->
  </fi:AqResIdent>
  <!-- ... -->
</fi:AqRes>
Creating XML content for FIGIS • Referencing objects : • behaviour of Ref elements • when the document is loaded and processed, “Ref” tags are interpreted this way : • if a matching object is found in the database, the “Ref” tag is replaced by the matching object’s “Ident” block • if no matching object is found in the database, the tag is left alone
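The resolution rule above can be written as a Python sketch. It is not the FIGIS loader. Three parts are assumptions: a "Ref" is matched by its tag plus the `CodeSystem`/`Code` of its `fi:ForeignID`; the database is a plain dict; and the stored `fi:SpeciesIdent` block and its FigisID are hypothetical. Matching Ref elements are replaced by a copy of the object's Ident block. Unmatched ones are left unchanged.

```python
import copy
import xml.etree.ElementTree as ET

FI = "http://www.fao.org/fi/figis/devcon/"
NS = {"fi": FI}

# Hypothetical "database": Ident blocks keyed by (Ref tag, code system, code).
DATABASE = {
    ("SpeciesRef", "Scientific_name", "Thunnus alalunga"): ET.fromstring(
        f'<fi:SpeciesIdent xmlns:fi="{FI}"><fi:FigisID>1</fi:FigisID></fi:SpeciesIdent>'),
}


def resolve_refs(root: ET.Element) -> int:
    """Replace each matching *Ref element by a copy of the object's Ident block.

    Unmatched Ref elements are left unchanged. Returns the number of replacements.
    """
    replaced = 0
    for parent in list(root.iter()):          # snapshot: we modify the tree below
        for i, child in enumerate(list(parent)):
            local = child.tag.split("}")[-1]  # tag without its namespace
            if not local.endswith("Ref"):
                continue
            fid = child.find("fi:ForeignID", NS)
            if fid is None:
                continue
            ident = DATABASE.get((local, fid.get("CodeSystem"), fid.get("Code")))
            if ident is not None:
                new = copy.deepcopy(ident)
                new.tail = child.tail         # keep surrounding whitespace
                parent[i] = new
                replaced += 1
    return replaced


doc = ET.fromstring(f"""<fi:SpeciesList xmlns:fi="{FI}">
  <fi:SpeciesRef><fi:ForeignID CodeSystem="Scientific_name" Code="Thunnus alalunga"/></fi:SpeciesRef>
  <fi:SpeciesRef><fi:ForeignID CodeSystem="Scientific_name" Code="not-in-db"/></fi:SpeciesRef>
</fi:SpeciesList>""")
count = resolve_refs(doc)
print(count, [c.tag.split("}")[-1] for c in doc])
```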
Creating XML content for FIGIS • Referencing objects : • creating hyperlinks • to resources available on the internet (web page, PDF file, FTP...) • e.g.: link the word ICCAT to “http://www.iccat.es/” • this is done using the HTML “a” tag: <a href="http://www.iccat.es/" target="_blank">ICCAT</a>
Creating XML content for FIGIS • Creating documents : a three-step method • high-level tagging • overall structure of the document and objects • mid-level tagging • specifying the thematic content of the document • referencing objects • low-level tagging • formatting • keywords, links, bibliography, etc.
Creating XML content for FIGIS • Creating objects : hands-on • pre-requisite: understanding of the XSD, DTD structure • starting from a sample document • analysis and structure of the paper document • creation of the XML document • high-level tagging of the document • mid-level tagging of the document • low-level tagging of the document
Creating XML content for FIGIS • Creating objects : hands-on • analysis and structure of the paper document: • identify the high-level elements in the document: • Object Source • Objects • for each object, identify the following: • Ident block • the various “topics” of the object, which will be used for the tagging • match each “topic” with a Schema/DTD element
Creating XML content for FIGIS • Creating objects : hands-on • creation of the XML document • XML editors accept only plain TEXT • avoid pasting formatted text or special characters copied from word processors (e.g. curly quotes, dashes) that the declared encoding cannot represent • creation of the document: • open the XML editor and create a new document • assign the FIGIS DTD/XSD to the new document • insert the ROOT element fi:FIGISDoc • insert the Object Source information • insert the Ident information • start copying the content from the original text document into the elements that you defined in the previous step
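Before pasting text copied from a word processor, it may help to scan it for characters that the document's declared encoding cannot represent. The following Python sketch does this for ISO-8859-1 and then uses Python's standard `xmlcharrefreplace` error handler, which rewrites such characters as numeric character references that any XML parser accepts. The sample sentence is invented.

```python
def unencodable_chars(text: str, encoding: str = "iso-8859-1") -> list[tuple[int, str]]:
    """Return (position, character) for every character `encoding` cannot represent."""
    problems = []
    for pos, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            problems.append((pos, ch))
    return problems


# Invented sample with curly quotes and an en dash, as often pasted from Word.
pasted = "Albacore \u201cnorthern\u201d stock \u2013 status"
print(unencodable_chars(pasted))

# Alternative: keep the characters as numeric character references (&#NNNN;).
safe = pasted.encode("iso-8859-1", "xmlcharrefreplace")
print(safe)
```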