
Author Generated JATS XML Markup



Presentation Transcript


  1. Author Generated JATS XML Markup Andy Gajetzki CIO, ispub.com Olivier Wenker, MD, MBA Founder and CEO, ispub.com

  2. How We Started • Co-Founded Worldwide Cars Online in 1990 • Sent images of cars and car parts via Compuserve emails (modem speed 7kb/sec) • No official Internet • Closed the company in 1994 • Created online content while at Baylor in 1994 • Netscape goes public in 1995 • Officially launched 1st online journal in 1995

  3. How We Continued • Started with The Internet Journal of Anesthesiology • Added more journals over time • All were open access from the beginning (no registration required for readers) • Some of the first articles were submitted in print via mail and I retyped them in Word • Articles were then submitted to me via email (attached as Word documents)

  4. How We Continued • Initially used a Mosaic Browser tool and then a Netscape Browser tool to create HTML for the web pages • Then used 1st version of FrontPage to create a more complex web site • We decided in 1997 to convert Word documents into SGML data sets and then to use XML in 1998

  5. What We Are Today • We currently publish 82 titles (online medical journals) at www.ispub.com • We use our own article submission system (home-grown) at www.quickmedpub.com • We just implemented a new backend for article submissions and article flow • We decided to have authors generate much of the markup

  6. And Now Let's Get Technical Author Generated JATS XML Markup by Andy Gajetzki

  7. What is our JATS editor? • Represents a move to author generated markup for our XML • Based on a customizable and reusable PHP component • Symfony2 – popular PHP framework • Easy to use • Form based, WYSIWYG, and linear workflow

  8. Our old workflow • How we used to do things: • Three separate workflows for each article: • Header generation • Body markup • Conversion from proprietary XML to JATS as the last step

  9. Word Macros

  10. Problems with our current method • Time consuming • Delays in publishing • Error prone • Data entry is performed by programmers • Authors don’t like the delay to publish and the delay to correct errors

  11. Design Rationale • We can’t support the whole spec. • How did we determine what to support? • Statistical analysis of the most common markup in our current article corpus • How can we offload as much markup to the author as possible but still have a clean and intelligible end product?
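
The slides do not show how the corpus analysis was run. As a minimal sketch, assuming the existing articles are available as well-formed XML, element frequencies could be tallied as below; the function name and the two stand-in articles are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(xml_documents):
    """Count how often each element name occurs across a set of XML strings."""
    counts = Counter()
    for xml_text in xml_documents:
        root = ET.fromstring(xml_text)
        for element in root.iter():  # iter() walks the root and all descendants
            counts[element.tag] += 1
    return counts


# Two tiny stand-in articles (invented); a real run would read the corpus from disk.
SAMPLE_CORPUS = [
    "<article><body><sec><title>Intro</title>"
    "<p>Some <italic>text</italic>.</p></sec></body></article>",
    "<article><body><sec><title>Case</title>"
    "<p>A</p><p>B <bold>C</bold></p></sec></body></article>",
]

element_counts = count_elements(SAMPLE_CORPUS)
for tag, occurrences in element_counts.most_common():
    print(tag, occurrences)
```

Elements that appear rarely in such a tally are candidates to leave out of the author-facing editor.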

  12. What is supported • NLM Blue 3.0 • Two separate support levels • Inline-level • Block-level • Our level of JATS support is determined by each level.

  13. Inline Level • Italics, bold, and all other presentation-layer markup supported

  14. Block level • Single level sections only as WYSIWYG editor is based on the HTML DOM • Other tools providing a more XML approach are expensive, and more difficult for the author to use • General structure is <sec> <title> <xyz> • Within <sec>: boxed-text, fig, graphic, preformat, table-wrap, p, list
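
To make the block-level constraints concrete, here is a hedged sketch of a check for them: no nested sections, and only the listed block elements (plus the title) directly inside a section. The function name and the wording of the messages are assumptions, not part of the editor described in the slides.

```python
import xml.etree.ElementTree as ET

# Block-level children listed on the slide, plus the section title.
ALLOWED_SEC_CHILDREN = {
    "title", "boxed-text", "fig", "graphic",
    "preformat", "table-wrap", "p", "list",
}


def check_body_sections(body_xml):
    """Return a list of problems found in the sections of a <body> fragment."""
    problems = []
    body = ET.fromstring(body_xml)
    for sec in body.iter("sec"):
        for child in sec:  # direct children only
            if child.tag == "sec":
                problems.append("nested <sec> is not supported")
            elif child.tag not in ALLOWED_SEC_CHILDREN:
                problems.append(f"unsupported child <{child.tag}> in <sec>")
    return problems


good_body = (
    "<body><sec><title>Methods</title><p>Text</p>"
    "<list><list-item><p>x</p></list-item></list></sec></body>"
)
bad_body = (
    "<body><sec><title>A</title><sec><title>B</title><p>t</p></sec>"
    "<disp-quote><p>q</p></disp-quote></sec></body>"
)
print(check_body_sections(good_body))
print(check_body_sections(bad_body))
```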

  15. Titles • Support of presentational elements with, for the most part, a non-mixed content-type

  16. Contributors • Flexible • Single / collaborative authors • Most JATS <contrib-group> markup supported • Inline-level formatting in block elements

  17. Keywords • Keywords should be based on MeSH entries • Validation constraints can be applied based on that
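
One way such a validation constraint might look is sketched below. The small in-memory term set is a hypothetical stand-in for a real MeSH lookup, and the function name is invented; the slides do not say how the check is implemented.

```python
# Hypothetical stand-in for a MeSH lookup; a real system would consult the
# full vocabulary rather than three sample terms.
MESH_TERMS = {"anesthesia", "hypertension", "pain management"}


def validate_keywords(keywords, vocabulary=MESH_TERMS):
    """Split keywords into those found in the vocabulary and those that are not."""
    accepted, rejected = [], []
    for keyword in keywords:
        # Compare case-insensitively and ignore extra whitespace.
        normalized = " ".join(keyword.lower().split())
        if normalized in vocabulary:
            accepted.append(keyword)
        else:
            rejected.append(keyword)
    return accepted, rejected


accepted, rejected = validate_keywords(
    ["Anesthesia", "  Pain   management", "car parts"]
)
print("accepted:", accepted)
print("rejected:", rejected)
```

Rejected keywords could then be shown back to the author in the form before submission.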

  18. Other article-meta • Article IDs • Author notes • Supplemental content • Funding/grants • Article history • Permissions

  19. Abstract / Body / Appendices • Currently a moving target • MathML is not currently supported • Current subset of JATS covers 99% of our cases, but we will always try to expand coverage

  20. WYSIWYG HTML Editor • Utilize a specific subset of HTML that we can unambiguously map to JATS via data transformations • XSLT • regexp • If no mapping is possible, another method must be devised
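
The idea of an unambiguous HTML-to-JATS mapping can be illustrated with a short sketch. The slides mention XSLT and regular expressions; the example below swaps in Python's standard XML library instead, and the mapping table and function name are assumptions. It expects well-formed XHTML input and fails loudly when no mapping exists.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from a restricted HTML subset to JATS element names.
HTML_TO_JATS_INLINE = {
    "p": "p",
    "i": "italic", "em": "italic",
    "b": "bold", "strong": "bold",
    "u": "underline",
    "sub": "sub", "sup": "sup",
}


def html_fragment_to_jats(html_fragment):
    """Rename the elements of a well-formed XHTML fragment to JATS names."""
    root = ET.fromstring(html_fragment)
    for element in root.iter():
        if element.tag not in HTML_TO_JATS_INLINE:
            # No mapping is possible, so another method must be devised.
            raise ValueError(f"no JATS mapping for <{element.tag}>")
        element.tag = HTML_TO_JATS_INLINE[element.tag]
        element.attrib.clear()  # drop HTML-only attributes such as class
    return ET.tostring(root, encoding="unicode")


jats_paragraph = html_fragment_to_jats(
    '<p class="x">A <em>b</em> and <strong>c</strong>H<sub>2</sub>O</p>'
)
print(jats_paragraph)
```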

  21. Images / Table Capture / Media • Images / Figures are handled via out-of-band file upload on a separate page • Authors are requested to upload the highest-quality format that they can • Tables can either be captured as an image, or inserted via a Word-style table creation tool • Other media types have not been implemented yet

  22. Endnote Handling – Document references • JavaScript annotation tool • Endnote number / reference is highlighted in the text and a resolution is made to a back-matter citation entry
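
The actual tool is a JavaScript annotator in the browser. Purely to illustrate the resolution step, the sketch below assumes endnotes appear in plain text as bracketed numbers such as [3] and that back-matter citations are numbered from 1; the marker format, the `ref` id prefix, and the function name are all assumptions.

```python
import re

# Assumed in-text marker format: a number in square brackets, e.g. [3].
ENDNOTE_MARKER = re.compile(r"\[(\d+)\]")


def resolve_endnotes(text, citation_count):
    """Replace [n] markers with JATS <xref> elements.

    Returns the rewritten text and the list of marker numbers that have no
    matching back-matter citation.
    """
    unresolved = []

    def replace(match):
        number = int(match.group(1))
        if 1 <= number <= citation_count:
            return f'<xref ref-type="bibr" rid="ref{number}">{number}</xref>'
        unresolved.append(number)
        return match.group(0)  # leave the marker untouched for the author

    return ENDNOTE_MARKER.sub(replace, text), unresolved


resolved_text, unresolved_numbers = resolve_endnotes(
    "Seen before [1] and again [4].", citation_count=2
)
print(resolved_text)
print(unresolved_numbers)
```

A non-empty list of unresolved numbers corresponds to the endnote problems an author would need to fix.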

  23. Supported Back Matter • Acknowledgments • Appendices • Biography • Glossaries • Citations • Notes • Content-type attribute of note element supported

  24. Citation Handling – Back matter • One citation per line • Regular expression search for meta-data service identifiers at PMC and Crossref • If a match is found, correct metadata is pulled from the service • Simple JavaScript annotation tool to tokenize citation string • Before submission, author must resolve all endnote problems
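
The first two steps (one citation per line, then a regular-expression search for service identifiers) might be sketched as follows. The slides do not list the identifier formats, so the PMID and DOI patterns here are assumptions, the sample citations are invented, and the metadata lookup itself is left out.

```python
import re

# Assumed identifier formats; the patterns actually used are not given.
PMID_PATTERN = re.compile(r"\bPMID:?\s*(\d{1,8})\b", re.IGNORECASE)
DOI_PATTERN = re.compile(r"\b(10\.\d{4,9}/[^\s\"<>]+)")


def find_identifiers(citation_block):
    """Scan a block of citations (one per line) for PMID and DOI strings."""
    results = []
    for line in citation_block.splitlines():
        line = line.strip()
        if not line:
            continue
        pmid = PMID_PATTERN.search(line)
        doi = DOI_PATTERN.search(line)
        results.append({
            "citation": line,
            "pmid": pmid.group(1) if pmid else None,
            # Trailing sentence punctuation is not part of the DOI.
            "doi": doi.group(1).rstrip(".,;") if doi else None,
        })
    return results


# Invented sample citations, used only to exercise the patterns.
SAMPLE_CITATIONS = """\
Doe J. Example title. J Example. 2010;1:1-2. PMID: 12345678
Roe R. Another example. doi:10.1234/example.5678.
Smith A. No identifier here.
"""

found = find_identifiers(SAMPLE_CITATIONS)
for entry in found:
    print(entry["pmid"], entry["doi"])
```

Citations with an identifier could be sent to the metadata service; the rest would go to the manual tokenization step.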

  25. Citation Tokenization Example

  26. From browser to JATS XML • The block level components operate on the HTML DOM • CSS classes are added to elements to distinguish content types • Through various transformations, we interpret the resultant DOM and produce the JATS XML • HTML → mapping → JATS XML
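
As a rough illustration of using CSS classes to pick the JATS element, the sketch below converts one section's worth of editor markup into a `<sec>`. The class names `sec-title` and `code-sample` are hypothetical (the slides do not list the real ones), and only the text of each block is copied, so inline formatting is ignored here.

```python
import xml.etree.ElementTree as ET

# Hypothetical CSS class names chosen for this sketch.
CLASS_TO_JATS = {"sec-title": "title", "code-sample": "preformat"}
# Fallback by HTML tag when an element carries no recognized class.
TAG_TO_JATS = {"p": "p"}


def dom_to_sec(html_section):
    """Turn a <div> holding one section's blocks into a JATS <sec> string."""
    source = ET.fromstring(html_section)
    sec = ET.Element("sec")
    for block in source:
        css_class = block.get("class", "")
        jats_name = CLASS_TO_JATS.get(css_class) or TAG_TO_JATS.get(block.tag)
        if jats_name is None:
            raise ValueError(f"cannot map <{block.tag} class='{css_class}'>")
        target = ET.SubElement(sec, jats_name)
        target.text = block.text  # text only; inline children are not copied
    return ET.tostring(sec, encoding="unicode")


sec_xml = dom_to_sec(
    '<div><h2 class="sec-title">Methods</h2>'
    "<p>We did things.</p>"
    '<p class="code-sample">x = 1</p></div>'
)
print(sec_xml)
```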

  27. Validation • When things go wrong 1) XSD Validation • Intervention required by staff 2) Style/presentation problems • Intervention required by author/staff 3) Copy editing 4) Peer review

  28. Amazon Mechanical Turk • For predictable failures, Amazon Mechanical Turk, a platform for “human intelligence tasks”, can be used • For a small price, work units are created and human workers get paid to perform the task • 24x7 availability

  29. Summary

  30. Contact For Questions Technical questions: Andy Gajetzki andy@ispub.com General questions: Olivier Wenker, MD, MBA wenker@ispub.com
