This paper explores the processes of converting SGML to XML, focusing on the application of XSLT and CSS within digital library environments. It discusses crucial aspects such as handling empty tags, preserving named entities, utilizing CDATA sections, and integrating mathematical markup using MathML. The use of metadata schemas, specifically the Resource Description Framework (RDF) and Dublin Core (DC), is examined to normalize key fields and improve searchability across diverse publisher DTDs. These transformations enhance the representation and accessibility of digital content.
Using XML, XSLT, and CSS in a Digital Library
Markup Transformations
SGML to XML Conversions
Metadata Schema & Generation
Robert Ferrer
r-ferrer@uiuc.edu
ASIS Annual Meeting 2000
SGML to XML Conversions - Modular
SGML to XML Conversions - Basic
• Empty tags <empty> to <… />
• <?Processing Instruction> to <? … ?>
• CDATA to CDATA sections <![CDATA[ … ]]>
• Named entities remain unchanged - &alpha;
• <!DOCTYPE ...> refers to an XML DTD containing only character entity definitions that map to Unicode code points: <!ENTITY alpha "α">
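The first two conversions on this slide can be sketched in Python as below. This is only an illustration, not the project's converter: the function names and the `EMPTY_ELEMENTS` set are hypothetical, the set would normally be derived from the EMPTY declarations in the SGML DTD, and the regular expressions assume attribute values never contain `<` or `>`.

```python
import re

# Hypothetical: names of elements declared EMPTY in the source SGML DTD.
EMPTY_ELEMENTS = {"empty", "citeref"}

def close_empty_tags(sgml, empty_names=EMPTY_ELEMENTS):
    """Rewrite <name attrs> as <name attrs/> for elements known to be EMPTY."""
    def fix(match):
        name, attrs = match.group(1), match.group(2)
        if name.lower() in empty_names:
            return f"<{name}{attrs}/>"
        return match.group(0)
    # Start tags only: a letter must follow '<', and the tag must not
    # already end in '/>'.
    return re.sub(r"<([A-Za-z][\w.-]*)((?:\s[^<>]*?)?)(?<!/)>", fix, sgml)

def close_processing_instructions(sgml):
    """Rewrite SGML <?target data> as XML <?target data?>."""
    # Instructions that already end in '?>' are left untouched.
    return re.sub(r"<\?([^<>]*?)(?<!\?)>", r"<?\1?>", sgml)
```

A real conversion would run on parser output rather than raw text, since SGML also allows omitted end tags and unquoted attribute values, which this sketch does not handle.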
SGML to XML Conversions - Linking
• Attributes to facilitate internal linking
• <CITEREF REFID="bib5" idli_occurrence="3" />
• External links represented as XLinks
• <FIG NAME="F1" xlink:type="simple" xlink:href="fig1.jpg" xlink:show="new" xlink:actuate="user" />
SGML to XML Conversions - Math
• SGML Math converted to MathML
Presentational MathML
<math xmlns="http://www.w3.org/…">
  <msubsup>
    <mrow><mi>α</mi></mrow>
    <mrow><mi>i</mi></mrow>
    <mrow><mo>-</mo><mn>2</mn></mrow>
  </msubsup>
</math>
ISO 12083 Math
<dformula> <g>a</g> <sup>-2</sup> <inf>i</inf> </dformula>
Identify & translate mathematical character references
Identify & tokenize mathematical content
SGML to XML Conversions - Math
• Recognize & transform mathematical markup
<xsl:template match="dformula">
  …
  <xsl:when test="sup or inf">
    <xsl:for-each select="child::node()">
      <xsl:choose>
        <xsl:when test="name(self::node())='sup' and name(following-sibling::node()[1])='inf'">
          <xsl:element name="msubsup" namespace="http://www.w3.org/…">
            <xsl:element name="mrow" namespace="http://www.w3.org/…">
              <xsl:apply-templates select="preceding-sibling::node()[1]"/>
            </xsl:element>
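The same recognition step can be sketched outside XSLT. The Python below is an illustrative approximation, not the stylesheet from the talk: it assumes a base element followed by adjacent `<sup>` and `<inf>` siblings, uses a hypothetical one-entry `GREEK` table for `<g>` content, ignores text between elements, and leaves out the MathML namespace because the slides elide it.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical lookup for ISO 12083 <g> (Greek) content; one entry shown.
GREEK = {"a": "\u03b1"}

def tokenize(text):
    """Split simple math text into (tag, token) pairs: mn, mi, or mo."""
    tokens = []
    for tok in re.findall(r"\d+(?:\.\d+)?|[A-Za-z]|\S", text):
        if tok[0].isdigit():
            tokens.append(("mn", tok))   # number
        elif tok.isalpha():
            tokens.append(("mi", tok))   # identifier
        else:
            tokens.append(("mo", tok))   # operator
    return tokens

def to_mrow(node):
    """Wrap the tokenized text of one ISO 12083 node in an <mrow>."""
    mrow = ET.Element("mrow")
    text = node.text or ""
    if node.tag == "g":
        ET.SubElement(mrow, "mi").text = GREEK.get(text, text)
    else:
        for tag, tok in tokenize(text):
            ET.SubElement(mrow, tag).text = tok
    return mrow

def convert_dformula(dformula):
    """Turn base + <sup> + <inf> (either order) into <msubsup>."""
    children = list(dformula)
    math = ET.Element("math")
    i = 0
    while i < len(children):
        trio = children[i:i + 3]
        if len(trio) == 3 and {c.tag for c in trio[1:]} == {"sup", "inf"}:
            base, a, b = trio
            sub = a if a.tag == "inf" else b
            sup = a if a.tag == "sup" else b
            msubsup = ET.SubElement(math, "msubsup")
            # MathML child order is base, subscript, superscript.
            msubsup.extend([to_mrow(base), to_mrow(sub), to_mrow(sup)])
            i += 3
        else:
            math.append(to_mrow(children[i]))
            i += 1
    return math
```

Applied to the ISO 12083 example from the previous slide, this produces the same `<msubsup>` structure shown there, minus the namespace declaration.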
SGML to XML Conversions - TeX
• TeX converted to GIF images
• <FORM NOTATION="TEX" HIDE="TRUE">$$ (j_0-a_2')\,{\rm mod}\,P $$</FORM><uie name="uie1" xlink:type="simple" xlink:href="fig1.gif" xlink:show="new" xlink:actuate="user" />
• TeX converted into MathML
• IBM TechExplorer
$$ (j_0-a_2')\,{\rm mod}\,P $$
<math><mo>(</mo><msub> <mrow><mi>j</mi></mrow><mrow><mn>0</mn></mrow></msub><mi>−</mi> <msubsup><mrow><mi>a</mi> </mrow><mrow><mn>2</mn>…..
SGML to XML Conversions - DTD
• XML DTD does not permit inclusions and exclusions
• SGML: <!ELEMENT Article - - (front, body) +(%i.float;)>
• XML: <!ELEMENT Article (front | body | %i.float;)*>
• XML DTD does not permit the '&' connector
• XML DTD permits mixed content only in the form (#PCDATA | a | b)*, so SGML mixed content models such as the following are not allowed
• <!ELEMENT Other ((author, journal) | (#PCDATA))>
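The inclusion rewrite on this slide can be approximated mechanically. The Python sketch below is an assumption-laden illustration, not the tool used in the project: `loosen_inclusion` is a hypothetical name, and it only handles a flat content model followed by a single `+(...)` inclusion, as in the Article example.

```python
import re

# Matches: <!ELEMENT name - - (flat model) +(inclusions)>
DECL = re.compile(
    r"<!ELEMENT\s+(?P<name>\S+)\s+[-oO]\s+[-oO]\s+"  # name, tag-minimization flags
    r"\((?P<model>[^()]*)\)\s*"                       # flat content model
    r"\+\((?P<incl>[^()]*)\)\s*>"                     # inclusion exception
)

def loosen_inclusion(decl):
    """Fold an SGML inclusion into a repeatable XML choice group.

    Declarations that do not match the simple pattern are returned unchanged.
    """
    m = DECL.fullmatch(decl.strip())
    if m is None:
        return decl
    members = [p.strip() for p in re.split(r"[,|&]", m["model"]) if p.strip()]
    members += [p.strip() for p in m["incl"].split("|") if p.strip()]
    return f"<!ELEMENT {m['name']} ({' | '.join(members)})*>"

print(loosen_inclusion("<!ELEMENT Article - - (front, body) +(%i.float;)>"))
```

The resulting model is deliberately looser than the SGML original: order and cardinality of `front` and `body` are no longer enforced. SGML inclusions also apply to all descendants of the element, so a fuller conversion would have to add the included elements to the descendants' content models as well.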
Metadata - Usage
• Metadata Within the DLI Testbed
• Normalize key fields from different publisher DTDs to facilitate searching
• Provide common and easily displayable intermediate search results
• Add value in the form of links to cited or citing articles within the Testbed, external abstracts and indexes, etc.
Metadata - Schema
• Resource Description Framework (RDF) provides a standardized way to represent metadata using XML
• Encapsulates metadata elements
• Provides varying levels of granularity
• RDF container objects describe the relations between repeated metadata elements
Metadata - Schema
• Dublin Core (DC) model is used to encapsulate all searchable metadata
• Provides the semantic framework for describing each object in the collection
Content      Intellectual Property  Instantiation
Title        Creator                Date
Subject      Publisher              Format
Description  Contributor            Identifier
Type         Rights                 Language
Source
Relation
Coverage
Metadata - Schema
• Extensive custom IDLI tags are included
• Offer a further level of granularity
• <DC:Description><idli:Abstract></DC:Description>
• Search clients familiar with the IDLI schema can achieve much greater precision
• Dublin Core Qualifiers (DCQ) substructure to replace many of the project-specific IDLI elements
• <DC:Description><DCQ:Abstract></DC:Description>
Metadata - Schema
<rdf:seq>
  <rdf:li>
    <dc:Creator>
      <idli:author_name>Giust, G. K.</idli:author_name>
      <idli:organization_name>Department of Electrical Engineering, Arizona State University</idli:organization_name>
    </dc:Creator>
  </rdf:li>
  <rdf:li>
    <dc:Creator>
      <idli:author_name>Sigmon, T.W.</idli:author_name>
      <idli:organization_name>Department of Computer Science, Illinois State University</idli:organization_name>
    </dc:Creator>
  </rdf:li>
</rdf:seq>
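A fragment like the one above could be generated from an ordered author list. The Python sketch below is only an illustration: `creators_seq` is a hypothetical helper, the tag names are copied from the slide as literal prefixed names, and namespace declarations are omitted just as the slide omits them, so the output is a fragment rather than a complete RDF document.

```python
import xml.etree.ElementTree as ET

def creators_seq(creators):
    """Build an ordered container with one <dc:Creator> per (name, organization) pair."""
    seq = ET.Element("rdf:seq")
    for name, organization in creators:
        li = ET.SubElement(seq, "rdf:li")
        creator = ET.SubElement(li, "dc:Creator")
        ET.SubElement(creator, "idli:author_name").text = name
        ET.SubElement(creator, "idli:organization_name").text = organization
    return seq

authors = [
    ("Giust, G. K.",
     "Department of Electrical Engineering, Arizona State University"),
    ("Sigmon, T.W.",
     "Department of Computer Science, Illinois State University"),
]
fragment = ET.tostring(creators_seq(authors), encoding="unicode")
```

Using a sequence container rather than repeated top-level elements keeps author order explicit and ties each organization to the right author.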
Metadata - Extracting
• Metadata is extracted from the 'base' XML files
• Utilization of XML Header
• DTD is used to resolve entities
• XML-Stylesheet processing instruction
• Visual Basic application serves as parser
• Document Object Model (DOM)
• XSLT Style Sheets
Metadata - Extracting
• Utilization of XSLT Style Sheets
• XSLT transformative features to generate base metadata file and forward citation fragment
• XSLT scripting features to generate elements not directly expressed in the document
• XSLT instantiation of ActiveX objects to test for links
Metadata - Extracting
• Utilization of DOM
• Insert pseudo elements (e.g. bibliographic data)
• Search reference citations from the generated metadata object to insert forward references into other metadata files
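The forward-reference step amounts to inverting the citation relation. The Python sketch below shows the idea under simplifying assumptions that are not from the talk: articles are identified by hypothetical string IDs, each article's resolved reference list is already available as a list of IDs, and citations to items outside the collection are ignored.

```python
from collections import defaultdict

def forward_references(cites):
    """Invert {article_id: [cited_ids]} into {cited_id: [citing_ids]}.

    Only citations to articles inside the collection are kept, since only
    those have a metadata file that can receive a forward reference.
    """
    cited_by = defaultdict(list)
    for citing, cited_list in cites.items():
        for cited in cited_list:
            if cited in cites and citing not in cited_by[cited]:
                cited_by[cited].append(citing)
    return dict(cited_by)

# Hypothetical collection: A cites B, C, and an external item X; B cites C.
collection = {"A": ["B", "C", "X"], "B": ["C"], "C": []}
links = forward_references(collection)
```

Each entry in the result names the metadata file to update (the key) and the citing articles to insert into it (the value).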