Part 1: Introduction to XProc

Roger Costello, James Garriss • July 18, 2009

Thank You • We’d like to thank the people on the xproc-dev mailing list who patiently answered our many questions about XProc. • We want to give a special thank you to Norm Walsh, who not only wrote the XProc processor Calabash but also helped us learn how to use it.

XProc Version • These slides are based upon the 26 Nov 08 working draft version of XProc. Details may change when XProc becomes a recommendation. http://www.w3.org/TR/xproc/

Viewing this Tutorial • This tutorial is best viewed in slide show mode. • Under the View menu, select Slide Show. • Periodically you will see an icon at the bottom right of the slide indicating that it is time to do a lab exercise. We strongly recommend that you stop and do the lab exercise to get the maximum benefit from this tutorial.

Table of Contents • Prerequisites • Purpose • Advantages • XProc Processors • Hello, world! • Core Concepts • Core Pipeline Elements • Steps

What should you already know before taking this tutorial? PREREQUISITES

Prerequisites • This tutorial assumes that you already have: • A basic knowledge of XML. • The ability to write simple XPath expressions. • A basic familiarity with XSLT.

Related Technologies • To take full advantage of XProc, you will find it helpful to have a working knowledge of XML Schema, RelaxNG, Schematron, XQuery, and XSLT. • But you can still work through this tutorial even if you don’t know these. Don’t worry; we provide the files you need.

What is the point of XProc? PURPOSE

XML Document Processing • Processing XML documents is becoming increasingly frequent. • The same sort of operations tend to be done repeatedly: • Combine documents with XInclude then… • Validate with XML Schema then… • Check business rules with Schematron then… • Transform with XSLT • It would be nice to have a simple, non-proprietary way to declaratively specify how to process XML documents.

XProc Defined • There is! It’s called XProc. • XProc is an XML pipeline language, a language for describing operations to be performed on XML documents. • What does that mean? http://www.w3.org/TR/xproc/#introduction

XProc Explained • It means… • XProc is a new language (vocabulary). • The language is itself written using XML. • It is a language for building pipelines. • A pipeline processes XML documents. • A process is composed of various operations. • These operations are performed sequentially.

Purpose • The purpose of XProc is to provide a way to describe a sequence of operations to be performed on XML documents. • XProc provides a declarative way to express an XML workflow, e.g. combine documents with XInclude, then validate them, then transform them with XSLT.
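The workflow listed earlier (XInclude, then XML Schema, then Schematron, then XSLT) could be sketched as a pipeline along the following lines. This is only an illustration in the style of this tutorial's helloWorld.xpl: the file names BookStore.xsd, BookStore.sch and BookStore.xsl are hypothetical, and a processor that implements a later version of XProc may also expect a version attribute on the root element.

```xml
<!-- A sketch only. BookStore.xsd, BookStore.sch and BookStore.xsl
     are hypothetical file names, not files shipped with the tutorial. -->
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="source">
    <p:document href="BookStore.xml"/>
  </p:input>
  <!-- Steps that accept parameters (e.g. p:xslt) read them from here -->
  <p:input port="parameters" kind="parameter"/>
  <p:output port="result"/>

  <!-- 1. Combine documents with XInclude -->
  <p:xinclude/>

  <!-- 2. Validate with XML Schema -->
  <p:validate-with-xml-schema>
    <p:input port="schema">
      <p:document href="BookStore.xsd"/>
    </p:input>
  </p:validate-with-xml-schema>

  <!-- 3. Check business rules with Schematron -->
  <p:validate-with-schematron>
    <p:input port="schema">
      <p:document href="BookStore.sch"/>
    </p:input>
  </p:validate-with-schematron>

  <!-- 4. Transform with XSLT -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="BookStore.xsl"/>
    </p:input>
  </p:xslt>
</p:declare-step>
```

Each step names what is to be done; the order of the steps in the file is the order of the workflow.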

Why should you use XProc instead of something else? ADVANTAGES

Advantages • XProc is not the only way to do XML processing. • Java, C#, or other languages could be used. • So why use XML pipelines? • Here's why: • Declarative – allows developers to focus on what is to be done, not how it is to be done • Simple – can be learned (more) easily by non-programmers • Reusable/scalable – small, simple pipelines can be combined into large, complex pipelines • Platform-neutral – works on any OS • Language-neutral – an XProc processor can be implemented in any language

What implementation can I use to execute my pipeline? CALABASH

XProc Processor • Calabash is an implementation of XProc, i.e. it's an XProc processor. • It’s written in Java by Norm Walsh, the chair of the XProc Working Group. • The code can be downloaded from: • http://xmlcalabash.com/ • There’s not much documentation, though help is available from the xproc-dev mailing list: • http://lists.w3.org/Archives/Public/xproc-dev/

Calabash • Required software: • Calabash (latest version) • Java 1.5 (or later) • Saxon 9.1.0.1 (or later) • Optional software: • Saxon SA (to validate with XML Schema) • ISO RELAX and Sun’s Multi-Schema Validator (MSV) (to validate with RelaxNG) • XQJ (to use XQuery) • Apache HTTP Client (to interact with web services) • Latest info here: • http://xmlcalabash.com/docs/

Calabash • To run Calabash: • Put the Calabash and Saxon jar files (and the Saxon SA license file) on the classpath. • Execute: • java com.xmlcalabash.drivers.Main pipefilename.xpl • If you want to validate with XML Schema, add the -a option: • java com.xmlcalabash.drivers.Main -a pipefilename.xpl • pipefilename.xpl is the name of the pipeline file. By convention, .xpl is the suffix for pipeline files.

Running Calabash • The examples folder contains a number of examples. • There is a DOS batch file, run-calabash.bat, in each folder. • To run Calabash, simply type this at a DOS prompt: run-calabash <name of the pipe file> • Example: run-calabash helloWorld.xpl

Running Calabash from Oxygen XML • You can run Calabash directly in Oxygen XML by following this procedure: • Open: Tools ► External Tools ► Preferences ► New • Fill in the dialog box with the command line shown here: • For further details see: http://fgeorges.blogspot.com/2008/10/poor-mans-calabash-integeration-into.html
java -classpath "c:/new-xml-course/xproc/lib/calabash/calabash.jar;c:/new-xml-course/xproc/lib/saxon/saxon9.jar;c:/new-xml-course/xproc/lib/saxon/saxon9-s9api.jar;c:/new-xml-course/xproc/lib/saxon/saxon9sa.jar;c:/new-xml-course/xproc/lib/apache-commons-http-client/commons-httpclient-3.1.jar;c:/new-xml-course/xproc/lib/apache-commons-logging/commons-logging-1.1.1.jar;c:/new-xml-course/xproc/lib/apache-commons-codec/commons-codec-1.3.jar;c:/SAXON-license/" -Dcom.xmlcalabash.phonehome=false com.xmlcalabash.drivers.Main -a ${cfne}
Note: adjust your path names appropriately. • cfd = current file directory • cfne = current file name with extension

Careful! Be sure you don't have a carriage return at the bottom. It won't work if you do.

Running Calabash from Oxygen XML • After you've filled in the dialog box and pressed the OK button, enable Oxygen's external toolbar: Perspective ► Show Toolbar ► External Tools • Toggle it off and then on. • Now you should see a new button called Calabash.

Running Calabash from Oxygen XML • Now you simply drag and drop an .xpl file into Oxygen, and click on the Calabash icon to run it.

Dealing with Firewalls • Some XProc steps (e.g. http-request) make calls out to the internet. If your company has a firewall, you need to tell Java where the firewall's proxy is. Add this to the java command that invokes Calabash: -Dhttp.proxyHost=xxx, where xxx is the host name of your firewall's proxy.
java -Dhttp.proxyHost=xxx -classpath "c:/new-xml-course/xproc/lib/calabash/calabash.jar;c:/new-xml-course/xproc/lib/saxon/saxon9.jar;c:/new-xml-course/xproc/lib/saxon/saxon9-s9api.jar;c:/new-xml-course/xproc/lib/saxon/saxon9sa.jar;c:/new-xml-course/xproc/lib/apache-commons-http-client/commons-httpclient-3.1.jar;c:/new-xml-course/xproc/lib/apache-commons-logging/commons-logging-1.1.1.jar;c:/new-xml-course/xproc/lib/apache-commons-codec/commons-codec-1.3.jar;c:/SAXON-license/" -Dcom.xmlcalabash.phonehome=false com.xmlcalabash.drivers.Main -a ${cfne}

What implementation can I use to execute my pipeline? CALUMET

XProc Processor • Calumet is another XProc processor. • Download Calumet here: https://community.emc.com/docs/DOC-4242#Download_EMC_Documentum_XProc_Engine

XProc GUI Tool • The makers of Calumet have also created a drag and drop GUI tool to create XProc pipelines: http://137.69.120.115:8080/designer-20090703-1510/ • See XProc Designer at: https://community.emc.com/docs/DOC-3139

XProc GUI Tool • Leif Warner created an online XProc GUI tool that has the look and feel of Yahoo Pipes: http://feedscape.appspot.com/ • Simply drag and drop onto the canvas. Then "wire" the steps together.

BookStore.xml • Most of the examples in this tutorial use BookStore.xml or a variation of it. • It will be helpful to familiarize yourself with this XML document:

BookStore.xml:
<?xml version="1.0"?>
<BookStore>
  <Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>1998</Date>
    <ISBN>1-56592-235-2</ISBN>
    <Publisher>McMillin Publishing</Publisher>
  </Book>
  <Book>
    <Title>Illusions The Adventures of a Reluctant Messiah</Title>
    <Author>Richard Bach</Author>
    <Date>1977</Date>
    <ISBN>0-440-34319-4</ISBN>
    <Publisher>Dell Publishing Co.</Publisher>
  </Book>
  <Book>
    <Title>The First and Last Freedom</Title>
    <Author>J. Krishnamurti</Author>
    <Date>1954</Date>
    <ISBN>0-06-064831-7</ISBN>
    <Publisher>Harper &amp; Row</Publisher>
  </Book>
</BookStore>

What does a simple pipeline look like? HELLO, WORLD!

Hello, world! • Before looking at the core concepts of XProc, let’s look at a simple pipeline:

Hello, world! Pipeline • A graphical representation of the pipeline: [Figure: the pipeline takes the Book Store file location as input, renames the Date element, and stores the results in a file.]

The XProc root element (helloWorld.xpl):
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="source">
    <p:document href="BookStore.xml"/>
  </p:input>
  <p:output port="result">
    <p:pipe step="myStore" port="result"/>
  </p:output>
  <p:rename match="BookStore/Book/Date" new-name="Year"/>
  <p:store href="test.xml" name="myStore"/>
</p:declare-step>

The XProc namespace: xmlns:p="http://www.w3.org/ns/xproc"

The filename: helloWorld.xpl

The input into the pipeline is the BookStore XML document: <p:input port="source"> <p:document href="BookStore.xml"/> </p:input>

Step 1: Rename all <Date> elements to <Year>: <p:rename match="BookStore/Book/Date" new-name="Year"/>

Step 2: Store the results in test.xml: <p:store href="test.xml" name="myStore"/>

The output of the pipeline: <p:output port="result"> <p:pipe step="myStore" port="result"/> </p:output>

Run it! • Let’s use Calabash to run this pipeline! • Go to the folder: • examples/example00_hello_world • Run helloWorld.xpl • Compare your results to our results (ourResult.xml). • Compare your file that is saved by the pipeline (test.xml) to our file (ourTest.xml) • Congratulations! You’ve executed your first XProc pipeline.
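As a rough guide to what to look for in test.xml, the excerpt below shows how the first Book from BookStore.xml should appear once its Date element has been renamed to Year. It is a hand-written excerpt, not the actual contents of ourTest.xml; details such as the XML declaration and whitespace depend on how the processor serializes the document.

```xml
<!-- Hand-written excerpt for orientation only; not copied from ourTest.xml -->
<BookStore>
  <Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <!-- was <Date>1998</Date> in BookStore.xml -->
    <Year>1998</Year>
    <ISBN>1-56592-235-2</ISBN>
    <Publisher>McMillin Publishing</Publisher>
  </Book>
  <!-- the other two Book elements are changed in the same way -->
</BookStore>
```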

Analyze Hello, world! • Now that you’ve run your first pipeline, let’s go back and break it down step-by-step so we understand all the pieces of a pipeline.

What are the main ideas that form the basis of XProc? CORE CONCEPTS

Pipeline • A pipeline is a set of "steps" with the output of one step flowing into the input of another. A pipeline: • takes 0 or more XML documents as input. • consists of steps; each step operates on the XML document(s). - The output of the first step becomes the input to the second step, and so on. • produces 0 or more XML documents as output. http://www.w3.org/TR/xproc/#pipeline-concepts

Pipeline • [Figure: a pipeline containing three steps.] • The pipeline input is the input to the first step. • The output of step 1 is the input to step 2. • The output of step 2 is the input to step 3. • The output of the last step goes into the pipeline output.
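The flow described above can be written out with every connection made explicit. The sketch below is our own illustration, not one of the tutorial's example files: the step names main, step1, step2 and step3 are hypothetical, and the three operations were picked only to have something to connect. Leaving out the p:input bindings inside the steps should give the same result, because a step reads from the step before it by default.

```xml
<!-- A sketch with explicit connections; the step names are hypothetical -->
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" name="main">
  <p:input port="source">
    <p:document href="BookStore.xml"/>
  </p:input>
  <!-- The output of the last step goes into the pipeline output -->
  <p:output port="result">
    <p:pipe step="step3" port="result"/>
  </p:output>

  <!-- The pipeline input is the input to the first step -->
  <p:delete name="step1" match="ISBN">
    <p:input port="source">
      <p:pipe step="main" port="source"/>
    </p:input>
  </p:delete>

  <!-- The output of step 1 is the input to step 2 -->
  <p:rename name="step2" match="Date" new-name="Year">
    <p:input port="source">
      <p:pipe step="step1" port="result"/>
    </p:input>
  </p:rename>

  <!-- The output of step 2 is the input to step 3 -->
  <p:delete name="step3" match="Publisher">
    <p:input port="source">
      <p:pipe step="step2" port="result"/>
    </p:input>
  </p:delete>
</p:declare-step>
```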

Step • A step is the basic computational unit of a pipeline. • There are 3 types of steps: • Atomic • Compound • Multi-container http://www.w3.org/TR/xproc/#step-concept

Atomic Steps • Atomic steps carry out a single operation. • XProc has a library of atomic steps, including: • p:rename – rename element(s) • p:store – store the input into a file • p:delete – delete element(s) • p:compare – compare 2 XML documents • p:validate-with-xml-schema – validate using XML Schema • p:xslt – apply an XSLT 1.0 or 2.0 stylesheet to a document • p:xquery – query a document using XQuery • p:xinclude – apply XInclude to the document
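To give a feel for how a single atomic step is used, here is a small sketch built around p:compare. It assumes that test.xml (the file written by the hello world pipeline) and ourTest.xml sit in the same folder as the pipeline; the pipeline itself is our illustration and is not one of the provided examples.

```xml
<!-- A sketch only: compares two files and fails if they differ -->
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc">
  <p:compare fail-if-not-equal="true">
    <!-- the document to check -->
    <p:input port="source">
      <p:document href="test.xml"/>
    </p:input>
    <!-- the document to check it against -->
    <p:input port="alternate">
      <p:document href="ourTest.xml"/>
    </p:input>
  </p:compare>
</p:declare-step>
```

With fail-if-not-equal set to true, the step raises an error when the two documents differ, so a run that finishes without an error means the documents were judged equal.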

A Pipeline with Atomic Steps • [Figure: a pipeline whose steps are all atomic steps.]

Compound Steps • A compound step is a step that contains a subpipeline. • A subpipeline is a container of steps. • The functionality of a subpipeline is determined by the steps it contains. • A subpipeline is comprised of one or more of the following: • p:for-each – looping • p:viewport – processing pieces of a document • p:group – a wrapper for steps • Any atomic step • Any multi-container step (see following slides) • A subpipeline can contain a subpipeline, ad infinitum. • A subpipeline is a step in the pipeline that contains it.
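As an illustration of a compound step, the sketch below uses p:for-each to process each Book in BookStore.xml separately. It is our own example under stated assumptions, not one of the tutorial's files: the subpipeline here is a single p:rename, and the pipeline output is declared with sequence="true" because, as we understand it, the loop produces one document per Book.

```xml
<!-- A sketch only: loops over the Book elements of BookStore.xml -->
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="source">
    <p:document href="BookStore.xml"/>
  </p:input>
  <!-- the loop yields several documents, so allow a sequence -->
  <p:output port="result" sequence="true"/>

  <p:for-each>
    <!-- each Book element becomes a document of its own -->
    <p:iteration-source select="/BookStore/Book"/>
    <!-- the subpipeline: inside the loop, Book is the root element -->
    <p:rename match="Book/Date" new-name="Year"/>
  </p:for-each>
</p:declare-step>
```

The steps between the p:for-each tags form the subpipeline; any atomic or compound step could appear there.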
