
Classification of Business Documents

Classification of Business Documents. DITA BusDocs Subcommittee Meeting, January 14th, 2008. Presentation with Notes from the Meeting. Meeting Summary.

aolivo

Presentation Transcript


  1. Classification of Business Documents
     DITA BusDocs Subcommittee Meeting, January 14th, 2008
     Presentation with Notes from the Meeting

  2. Meeting Summary
     • Classification focus group members include Howard Schwartz, Eric Severson, Amber Swope, and Michael Boses. Howard was not able to attend the meeting due to travel
     • Michael presented the enclosed PowerPoint as a starting point for the discussion
     • Discussion was captured and incorporated into the PowerPoint under the heading “Notes”
     • Next steps:
       • Eric will work on a preliminary mapping of a limited number of document types that illustrate the mapping
       • The focus group will present a summary of what we have discussed to the full subcommittee during the January 21 meeting

  3. Introduction - 1
     • The need for a classification system for business documents arises from:
       • The desire to identify the specific document set that is being addressed by the subcommittee, as well as the rationale behind that selection
       • The ability to further analyze the document set using a refinement of the same characteristics used to classify them

  4. Introduction - 2
     • What types of characteristics are important?
     • Documents can be classified in many ways. The most common is a semantic classification based upon the textual content of the document
     • The subcommittee's approach is different: we want to classify documents based upon their structural characteristics, because it is the structure of business documents that will need to be harmonized with DITA

  5. Potential Structural Characteristics to Consider when Classifying
     • Is it a narrative?
     • Narrative complexity
     • Document length
     • Tree depth
     • Tree balance
     • Table frequency
     • Table complexity
     • Graphic frequency
     • XML vocabularies
     • Transclusions
     • Notes: Eric feels that repetitive structures will be an important characteristic
     • Amber suggests that whether a document references external system data might be important as well
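
Two of the characteristics listed on this slide, tree depth and tree balance, can be computed directly from an XML document. The following is a minimal sketch, not a definition agreed by the subcommittee. The slides do not define "tree balance"; the ratio of shallowest to deepest leaf depth used here is an assumption made for illustration, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def leaf_depths(elem, depth=1):
    """Return the depth of every leaf element under elem (the root is depth 1)."""
    children = list(elem)
    if not children:
        return [depth]
    depths = []
    for child in children:
        depths.extend(leaf_depths(child, depth + 1))
    return depths


def tree_depth(root):
    """Depth of the deepest element in the document."""
    return max(leaf_depths(root))


def tree_balance(root):
    """Assumed measure: shallowest leaf depth / deepest leaf depth (1.0 = even)."""
    depths = leaf_depths(root)
    return min(depths) / max(depths)


# One branch nests three levels below the root, the other only one.
sample = ET.fromstring("<doc><sec><sub><p/></sub></sec><p/></doc>")
print(tree_depth(sample), tree_balance(sample))  # 4 0.5
```

A balance near 1.0 would suggest that all branches nest to a similar depth, while a low value would indicate that a few branches carry most of the hierarchy.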

  6. First-level Classification
     Notes: While the concept is good, none of us is happy with the terminology. In particular, we need to come up with an alternative for “Forms”.

  7. Form-Narrative Scale
     Subject Document
     • Metric:
       • Ratio of total elements to total words
     • Notes: Eric: What is a form? How do we keep from excluding documents with structures that we need to address because we called them a “form”? We need something to describe “form” that isn’t based upon its implementation. “XML blurs the distinction between documents and data”
     • A: Elements are “structural” in nature. We need to define what type of elements we will use to arrive at the ratio
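
The metric on this slide, the ratio of total elements to total words, could be sketched as follows. This is an illustration under stated assumptions: every element is counted, because the notes leave open which element types should count, and a "word" is taken to be any whitespace-separated token in the text content. The function name and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def element_word_ratio(xml_text):
    """Ratio of total elements to total words (assumption: all elements count)."""
    root = ET.fromstring(xml_text)
    elements = sum(1 for _ in root.iter())  # includes the root element
    words = sum(len(text.split()) for text in root.itertext())
    if words == 0:
        return float("inf")  # structure with no text at all
    return elements / words


form_like = "<form><name>Ann</name><date>2008-01-14</date></form>"
narrative_like = (
    "<doc><p>The subcommittee met to discuss classification of documents.</p></doc>"
)

print(element_word_ratio(form_like))       # 1.5  (3 elements, 2 words)
print(element_word_ratio(narrative_like))  # 0.25 (2 elements, 8 words)
```

Under this reading, a high ratio places a document toward the form end of the scale and a low ratio toward the narrative end. Restricting the count to "structural" elements, as the notes suggest, would only change the line that computes `elements`.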

  8. Most Significant Characteristic?
     • Once we have established that it is a narrative document, what is the next most significant characteristic to examine?
     • Notes: General agreement with the presentation that it would be the tree depth of the document

  9. The Need to Quantify Hierarchy
     • The author of the highly nested document is using structure to communicate semantics.
     • Hierarchical Scale
       • Ratio of total transitions in hierarchy to total elements
     • Notes: General agreement. No specific comments
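
The hierarchical scale on this slide could be sketched as below. The slide does not define a "transition in hierarchy"; this sketch assumes it means a change of nesting depth between two consecutive elements in document order. That interpretation, and the function names, are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def depths_in_document_order(elem, depth=1):
    """Yield the nesting depth of each element in document order (root = 1)."""
    yield depth
    for child in elem:
        yield from depths_in_document_order(child, depth + 1)


def hierarchical_scale(xml_text):
    """Assumed metric: depth changes between consecutive elements / total elements."""
    depths = list(depths_in_document_order(ET.fromstring(xml_text)))
    transitions = sum(1 for a, b in zip(depths, depths[1:]) if a != b)
    return transitions / len(depths)


flat = "<doc><p/><p/><p/><p/></doc>"
nested = "<doc><sec><sub><p/></sub></sec><sec><p/></sec></doc>"

print(hierarchical_scale(flat))    # 1 transition over 5 elements
print(hierarchical_scale(nested))  # 5 transitions over 6 elements
```

With this reading, a flat run of sibling paragraphs scores low, and a document that moves up and down the hierarchy frequently scores close to 1.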

  10. Qualifying Narrative Density
      • Narrative Density Scale
        • Average paragraph length for paragraphs > 100 characters
      • Notes: No specific comments
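
The narrative density scale on this slide is simple arithmetic: average the lengths of only those paragraphs longer than 100 characters. A minimal sketch, assuming the paragraphs have already been extracted as plain strings and that length is measured after trimming surrounding whitespace (the function name is hypothetical):

```python
def narrative_density(paragraphs, min_chars=100):
    """Average length, in characters, of paragraphs longer than min_chars."""
    lengths = [len(p.strip()) for p in paragraphs]
    long_lengths = [n for n in lengths if n > min_chars]
    if not long_lengths:
        return 0.0  # no paragraph exceeds the threshold
    return sum(long_lengths) / len(long_lengths)


paragraphs = [
    "Short heading",  # ignored: 100 characters or fewer
    "a" * 120,
    "b" * 200,
]
print(narrative_density(paragraphs))  # 160.0
```

The threshold keeps headings, labels, and list stubs from pulling the average down, so the score reflects only the running prose.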

  11. Recap of Characteristic Importance
      • Is it a narrative?
      • Narrative complexity
      • Document length
      • Tree depth
      • Tree balance
      • Table frequency
      • Table complexity
      • Graphic frequency
      • XML vocabularies
      • Transclusions
      • Notes: Eric: We need to address repetitive structures (i.e., topics) and constrained structures. What do repetitive structures and constrained structures mean to DITA?
      • Michael: The number of paragraphs per section seems important, but what is a section?

  12. Notes: Additional Discussion
      • Discussion of an SOP as it relates to repeating structures
        • One approach to an SOP is for it to be very verbose, with only 4-5 “structures”
        • Another approach is for it to be very terse, with 20 structures that add semantics to the content
      • The goal of XML in general, when applied to narrative documents, is to imply more and more of the semantics through the document structure
      • “Document linearity with repeating structures” as a structural characteristic provides “random access” to the information in the document
      • Repetitive structures appear to be as important a characteristic as tree depth, if not more so. To a degree, repetitive structures indicate whether the document is a reference or something intended to be read end-to-end
      • Repetitive structures cause a document to actually be a collection of mini-documents, each of which could stand alone
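
The notes do not propose a measure for repetitive structures. One possible sketch, offered purely as an assumption, treats a structure as "repeated" when a sibling element has the same tag and the same sequence of child tags. Only elements that contain child elements are considered, so runs of plain paragraphs are not counted. The function name and the sample documents are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def repetition_ratio(xml_text):
    """Assumed metric: share of composite elements that repeat a sibling's shape."""
    root = ET.fromstring(xml_text)
    total = 0
    repeated = 0
    for parent in root.iter():
        # Shape of a child = its tag plus the ordered tags of its own children.
        signatures = [
            (child.tag, tuple(grandchild.tag for grandchild in child))
            for child in parent
            if len(child) > 0  # skip elements that have no child elements
        ]
        counts = Counter(signatures)
        total += len(signatures)
        repeated += sum(1 for sig in signatures if counts[sig] > 1)
    return repeated / total if total else 0.0


sop = (
    "<sop>"
    "<step><title/><action/></step>"
    "<step><title/><action/></step>"
    "<step><title/><action/></step>"
    "<note/>"
    "</sop>"
)
memo = "<memo><intro><p/></intro><body><p/><p/></body></memo>"

print(repetition_ratio(sop))   # every composite element is a repeated step
print(repetition_ratio(memo))  # no composite element repeats
```

A ratio near 1.0 would suggest a reference-style document made of stand-alone units, in line with the observation above that repetitive structures turn a document into a collection of mini-documents.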
