Understanding XML in SAS: Tools, Standards, and Examples for Data Handling
This presentation covers the essentials of XML (eXtensible Markup Language) in the context of SAS applications. It defines XML and explores its structural components, such as elements, attributes, and schemas. Key examples from industry standards, including a fictional NHL schema, illustrate how XML is utilized for data storage and transport. The session will also provide practical guidance on using various SAS tools to work with XML documents, including exporting SAS datasets to XML and validating XML documents with schemas.
Presentation Transcript
XML in a SAS World Mike Molter d-Wise Technologies
Agenda • What is XML? • Examples of industry XML standards (schemas) • SAS tools for working with XML
What is XML? • eXtensible Markup Language • Used for structure, storage, and transport of data (w3schools.com) • Like any other computer language… • textual gibberish • set of rules (structural, syntax) • vocabulary • elements • attributes • tags • schemas
<nhl>
  <team name="Red Wings">
    <conference>Eastern</conference>
    <division>Atlantic</division>
    <location>Detroit</location>
  </team>
  <team name="Flames">
    <conference>Western</conference>
    <division>Pacific</division>
    <location>Calgary</location>
  </team>
  <team name="Devils">
    <conference>Eastern</conference>
    <division>Metropolitan</division>
    <location>New Jersey</location>
  </team>
</nhl>
• XML document is made of elements (nhl, team, conference)
• Elements are marked with a start tag and an end tag (<division>, </division>)
• Elements may be nested within other elements (location is nested within team)
• Elements may contain attributes (team element contains the name attribute)
• An element's value is the text outside of a nested element between the element's start and end tags (Pacific is a value of the division element)
• Each XML document must contain a root element (nhl)
What is XML? • Like any other computer language… • textual gibberish • set of rules (structural, syntax) • vocabulary • elements • attributes • tags • schemas • Unlike other computer languages… • no keywords • no processor
XML Schema (or standard) • XML Schema (informal) - A specific set of elements and attributes, along with a set of rules that govern their use, for the purpose of transferring data between systems and developing applications for processing such data. • An XML schema can be a combination of new elements along with other XML schemas (extensible) • XML schema file - A well-formed XML file used for enforcing the rules of an XML schema, or validating an XML document.
XML Schema Examples • NHL (Ok, I made this one up) • XSL (eXtensible Stylesheet Language, .xsl) • Transforms XML into something else • XML schema files (.xsd) • Validates an XML document • XML Spreadsheet 2003 (.xml) • Read and displayed by Excel • ODM, Define, SDS • Clinical Trials data, metadata
XML in Pharma • Operational Data Model (ODM) • Collected clinical trial data, metadata, administrative data, reference data, audit information • Define-XML • Metadata for submitted data in ODM structure • Value-level metadata is in the define extension • SDS-XML • Submission data in ODM structure
XML in Pharma (diagram; labels: Collected Data, Data Transformations, Data Submission, Metadata Submission, ODM.XML, SAS, SDS.XML, Define.XML)
ODM (annotated example; callouts: Clinical Data, ItemGroup (dataset-level) Metadata)
ODM (annotated example; callouts: Clinical Data, ItemGroup (dataset-level) Metadata, Item (variable-level) Metadata)
ODM (annotated example; callouts: Item (variable-level) Metadata, Codelist Metadata (allowable values))
Exporting XML Teams.sas7bdat
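The slide shows the TEAMS data set only as an image, so its exact attributes are not recoverable here. A minimal sketch that recreates it for the export examples, assuming four character variables of length 20 (the lengths and the pipe-delimited input are assumptions; the values come from the NHL example earlier in the deck):

```sas
/* Hypothetical recreation of TEAMS; variable lengths are an assumption. */
data teams ;
   infile datalines dlm='|' truncover ;
   length name conference division location $20 ;
   input name conference division location ;
   datalines ;
Red Wings|Eastern|Atlantic|Detroit
Flames|Western|Pacific|Calgary
Devils|Eastern|Metropolitan|New Jersey
;
run ;
```

A pipe delimiter is used so that values with embedded blanks (Red Wings, New Jersey) are read as single values.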
Exporting XML with the LIBNAME statement
libname xmlout xml 'C:\teams_generic.xml' ;
data xmlout.xteams ;
   set teams ;
run ;
Exporting XML with the LIBNAME statement
libname xmlout xml 'C:\teams_oracle.xml' xmltype=oracle ;
data xmlout.xteams ;
   set teams ;
run ;
Exporting XML with a DATA step
filename xmlout4 'C:\teams_datastep.xml' ;
data _null_ ;
   file xmlout4 ;
   set teams end=thatsit ;
   /* +(-1) backs up over the blank that list output writes after each variable, */
   /* so values are not followed by a stray space inside the tags                */
   if _n_ eq 1 then put '<nhl>' ;
   put '<team name="' name +(-1) '">' ;
   put '<conference>' conference +(-1) '</conference>' ;
   put '<division>' division +(-1) '</division>' ;
   put '<location>' location +(-1) '</location>' ;
   put '</team>' ;
   if thatsit then put '</nhl>' ;
run ;
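A DATA step writes values verbatim, so a value containing &, < or a double quote would make the output document not well-formed. One possible safeguard, sketched here under the assumption that the TEAMS data set from the slides exists, is to pass each value through the HTMLENCODE function before writing it. The output path and the x-prefixed variable names are hypothetical.

```sas
filename xmlsafe 'C:\teams_datastep_escaped.xml' ;   /* hypothetical path */

data _null_ ;
   file xmlsafe ;
   set teams end=thatsit ;
   length xname xconf xdiv xloc $200 ;   /* room for expanded entities */

   /* Encode &, <, > and " so the values are safe as element or attribute text */
   xname = htmlencode(strip(name),       'amp lt gt quot') ;
   xconf = htmlencode(strip(conference), 'amp lt gt quot') ;
   xdiv  = htmlencode(strip(division),   'amp lt gt quot') ;
   xloc  = htmlencode(strip(location),   'amp lt gt quot') ;

   if _n_ eq 1 then put '<nhl>' ;
   /* +(-1) removes the blank that list output adds after each variable */
   put '<team name="' xname +(-1) '">' ;
   put '<conference>' xconf +(-1) '</conference>' ;
   put '<division>' xdiv +(-1) '</division>' ;
   put '<location>' xloc +(-1) '</location>' ;
   put '</team>' ;
   if thatsit then put '</nhl>' ;
run ;
```

The XML LIBNAME engine and ODS handle this kind of escaping themselves; the extra step only matters when the tags are written by hand.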
Exporting XML with the LIBNAME statement or ODS using tagsets
libname xmlout xml 'C:\teams_tagset_libname.xml' tagset=<tagset-name> ;
data xmlout.xteams ;
   set teams ;
run ;
ods markup tagset=<tagset-name> file='C:\teams_tagset_ods.xml' ;
proc print noobs data=teams ;
run ;
ods markup close ;
Exporting XML with ODS using SAS's ExcelXP tagset
ods markup tagset=excelxp file='C:\teams_excel.xml' ;
proc print noobs data=teams ;
run ;
ods markup close ;
References
A SAS Programmer's Guide to Generating Define.xml, SAS Global Forum 2009
ods markup tagset=mydefine file='define.xml' ;
proc print noobs data=meta-dataset1 ; run ;
proc print noobs data=meta-dataset2 ; run ;
proc print noobs data=meta-dataset3 ; run ;
etc
ods markup close ;
References Tips and Tricks for Creating Multi-Sheet Microsoft Excel Workbooks, Vince DelGobbo, SAS Global Forum 2009 ODS Markup: The SAS Reports You've Always Dreamed of, Eric Gebhart, SUGI 30
References
ExcelXP on Steroids: Adding Custom Options to the ExcelXP Tagset, SAS Global Forum 2011
ods markup tagset=myexcel file='define.xml' options (tab_color='45') ;
proc print noobs data=dataset1 ; run ;
ods markup close ;
Importing XML
Export
libname xmlout xml 'C:\teams_generic.xml' ;
data xmlout.xteams ;
   set teams ;
run ;
Import
data sasteams ;
   set xmlout.xteams ;
run ;
NHL.XML
libname xmlin xml 'C:\teams_nhl.xml' ;
data sasteam ;
   set xmlin.team ;
run ;
<nhl>
  <team name="Red Wings">
    <conference>Eastern</conference>
    <division>Atlantic</division>
    <location>Detroit</location>
  </team>
  <team name="Flames">
    <conference>Western</conference>
    <division>Pacific</division>
    <location>Calgary</location>
  </team>
  <team name="Devils">
    <conference>Eastern</conference>
    <division>Metropolitan</division>
    <location>New Jersey</location>
  </team>
</nhl>
SASTEAM.SAS7BDAT
Importing XML with an XML map
• An XML map is an XML schema
• Provides instructions to the XML LIBNAME engine for reading XML
  • Name and Label for the data set
  • Which XML elements define observations
  • How to define variables (attributes and values)
• Uses XPath syntax to navigate the XML document and identify its components
filename mymap 'C:\mymap.map' ;
libname xmlin xml 'C:\nhl.xml' xmlmap=mymap ;
data sasteams ;
   set xmlin.sasteams ;   /* member name is the TABLE name defined in the map */
run ;
Importing XML with an XML map
<?xml version="1.0" encoding="UTF-8"?>
<SXLEMAP version="1.2">
  <!-- Name of data set to be created -->
  <TABLE name="SASTeams">
    <!-- Observation boundary -->
    <TABLE-PATH syntax="XPath">/nhl/team</TABLE-PATH>
    <!-- Variable Definition -->
    <COLUMN name="conference">
      <PATH syntax="XPath">/nhl/team/conference</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>20</LENGTH>
    </COLUMN>
    <COLUMN name="name">
      <PATH syntax="XPath">/nhl/team/@name</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>20</LENGTH>
    </COLUMN>
  </TABLE>
</SXLEMAP>
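To see the map and the LIBNAME statement working together, the following sketch writes a shortened NHL document and the map shown above into the WORK directory and then reads them back. The file locations, the fileref and libref names, and the two-team document are assumptions made so the example can run without any external files.

```sas
/* Sketch only: writing to WORK and all fileref/libref names are assumptions. */
%let dir = %sysfunc(pathname(work)) ;

filename nhl   "&dir/nhl.xml" ;
filename mymap "&dir/mymap.map" ;

/* A shortened version of the NHL document */
data _null_ ;
   file nhl ;
   put '<nhl>' ;
   put '<team name="Red Wings"><conference>Eastern</conference></team>' ;
   put '<team name="Flames"><conference>Western</conference></team>' ;
   put '</nhl>' ;
run ;

/* The XML map from the slide */
data _null_ ;
   file mymap ;
   put '<?xml version="1.0" encoding="UTF-8"?>' ;
   put '<SXLEMAP version="1.2">' ;
   put '<TABLE name="SASTeams">' ;
   put '<TABLE-PATH syntax="XPath">/nhl/team</TABLE-PATH>' ;
   put '<COLUMN name="conference">' ;
   put '<PATH syntax="XPath">/nhl/team/conference</PATH>' ;
   put '<TYPE>character</TYPE><DATATYPE>string</DATATYPE><LENGTH>20</LENGTH>' ;
   put '</COLUMN>' ;
   put '<COLUMN name="name">' ;
   put '<PATH syntax="XPath">/nhl/team/@name</PATH>' ;
   put '<TYPE>character</TYPE><DATATYPE>string</DATATYPE><LENGTH>20</LENGTH>' ;
   put '</COLUMN>' ;
   put '</TABLE>' ;
   put '</SXLEMAP>' ;
run ;

/* The member name (SASTeams) is the TABLE name defined in the map */
libname xmlin xml "&dir/nhl.xml" xmlmap=mymap access=readonly ;

proc print data=xmlin.SASTeams noobs ;
run ;

libname xmlin clear ;
```

Division and location would be added as further COLUMN elements following the same pattern, with paths /nhl/team/division and /nhl/team/location.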
Clinical Standards Toolkit (CST) • A Base SAS framework for executing clinical data tasks such as verification of data compliance against standards and importing/exporting ODM and Define.xml. • Contains all necessary files (SAS macros and driver programs, maps, XSL stylesheets) • Learning curve
Clinical Standards Toolkit (CST) …or PROC XSL
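As a rough illustration of the PROC XSL route, the sketch below transforms a shortened NHL document into comma-separated text with a small stylesheet. It is not taken from the presentation: the stylesheet, the file names, and the choice of CSV as the target format are all assumptions, and the files are written to WORK only so the example is self-contained.

```sas
/* Sketch only: stylesheet, file names and target format are assumptions. */
%let dir = %sysfunc(pathname(work)) ;

filename nhlxml "&dir/nhl.xml" ;
filename nhlxsl "&dir/nhl_to_csv.xsl" ;
filename nhlcsv "&dir/nhl.csv" ;

/* A shortened version of the NHL document */
data _null_ ;
   file nhlxml ;
   put '<nhl>' ;
   put '<team name="Red Wings"><location>Detroit</location></team>' ;
   put '<team name="Flames"><location>Calgary</location></team>' ;
   put '</nhl>' ;
run ;

/* Stylesheet: one output line per team, in the form name,location */
data _null_ ;
   file nhlxsl ;
   put '<?xml version="1.0"?>' ;
   put '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">' ;
   put '<xsl:output method="text"/>' ;
   put '<xsl:template match="/nhl">' ;
   put '<xsl:for-each select="team">' ;
   put '<xsl:value-of select="@name"/><xsl:text>,</xsl:text>' ;
   put '<xsl:value-of select="location"/><xsl:text>&#10;</xsl:text>' ;
   put '</xsl:for-each>' ;
   put '</xsl:template>' ;
   put '</xsl:stylesheet>' ;
run ;

/* Apply the stylesheet to the XML document */
proc xsl in=nhlxml xsl=nhlxsl out=nhlcsv ;
run ;
```

The resulting text file can then be read with an ordinary INFILE and INPUT step. For ODM and Define.xml the stylesheets are far larger, which is where the CST-supplied files come in.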
References • Using the SAS Clinical Standards Toolkit 1.5 to Import CDISC ODM Files, Lex Jansen, PharmaSUG 2013 • Using the SAS Clinical Standards Toolkit for Define.xml Creation, Lex Jansen, PharmaSUG 2011 • Accessing the Metadata from the Define.xml Using XSLT Transformation, Lex Jansen, PhUSE 2010
In Summary…
• Options for Exporting XML
  • XML LIBNAME engine (XMLTYPE=, TAGSET= options)
  • ODS (SAS XML destinations or user-defined tagsets)
  • DATA step
  • XSL stylesheets
  • CST (clinical)
• Options for Importing XML
  • XML LIBNAME engine (XMLTYPE=, TAGSET= options)
  • XML maps
  • XSL stylesheets
  • CST (clinical)