Content Discovery in Regulated and Litigious Industries: The Pro-active Role of XML

Paul Wlodarczyk, VP Content Lifecycle Solutions, XMetaL, a JustSystems company. 9 November 2006.

Presentation Transcript


  1. Content Discovery in Regulated and Litigious Industries: The Pro-active Role of XML • Paul Wlodarczyk, VP Content Lifecycle Solutions, XMetaL, a JustSystems company • 9 November 2006

  2. "We are drowning in information" • June 16, 2005: BofA, Brokerage Affiliates to Pay $1.5M E-mail Fine. Bank of America Corp. brokerage affiliates will pay the SEC $1.5 million to settle charges they failed to preserve business e-mails. Between January 2001 and February 2004, the units did not ensure their software kept e-mail, the SEC said. • INFOGLUT: You are here. You will stay here. • Failed Methods Turn Information Assets Into Information Liabilities Source: Gartner

  3. Managing Information as a Strategic Asset Delivers Value • Information Management Across the Enterprise: Enterprise Trx., Customers, Employees, Partners, Products, Orgs., Financials • Information Management Across All Content: Reports, E-Mail, Web Content, Documents, Databases, Media • Efficiency: Process Simplification (promote reuse and data quality); Compliance (transparency of information); "Infoglut" (manage expanding volumes); Vendor Consolidation (spend less on same technology); M&A (reduce integration burdens) • Differentiation: Enterprise Agility (sense and respond); Continuous Flow (real time, closed-loop analytics); Single View (consistent and holistic view across all channels); Relationship management; Revenue Optimization (support top-line growth on cross-sell/upsell; leverage global purchasing power) Source: Gartner

  4. Questions we will answer today • What is enterprise information management (EIM)? • What are the issues driving convergence of data and documents? • What are the people, process, and technology enablers for EIM? • What are new approaches to make content available to the enterprise for discovery?

  5. Gartner: Defining Enterprise Information Management Enterprise information management (EIM) is an integrative discipline for structuring, describing and governing information assets regardless of organizational and technological boundaries to improve operational efficiency, promote transparency and enable business insight. Source: Gartner

  6. Questions we will answer today • What is enterprise information management (EIM)? • What are the issues driving convergence of data and documents? • What are the process, people, technology and content enablers for EIM? • What are new approaches to make content available to the enterprise for discovery?

  7. Structured vs. Unstructured Information • Business Transactions consist of data (structured information) • Business Decisions are often based on documents (unstructured information)

  8. The Challenge of Structured / Unstructured Convergence • Complexity & Dynamics of Data / Document Convergence: Litigation, Interoperability, Discovery, Process Integration, Regulation, Infoglut • The World of Data (Transactions): Data = Information Structured for Machine Processing • The World of Documents (Decisions): Document = Information Presented for Human Processing

  9. Contrasting Unstructured and Structured Content • Unstructured: Opaque; A snapshot in time; Passive; Indexed & searched; Mixes content & presentation; Pushed through a deterministic workflow; Protected by applications; All or Nothing (a file); Application-specific • Structured: Self-describing; An audit trail; Active; Discovered; Separates content (meaning) from presentation (format); Navigates through a dynamic process; Protects & Tracks itself; Fine-grained (objects); Application-independent
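
The contrast on this slide can be made concrete with a small sketch. The policy text, element names, and attribute names below are invented for illustration and do not come from the presentation; the point is only that self-describing content can be asked for one fine-grained object (the approval) and rendered in more than one presentation, while the unstructured string can only be searched as a whole.

```python
import xml.etree.ElementTree as ET

# Hypothetical unstructured version: meaning and layout are mixed in one opaque string.
UNSTRUCTURED = "TRAVEL POLICY  rev 3  Approved by J. Smith, 2 March 2006. Economy fares only."

# Hypothetical structured version; the vocabulary (policy, approval, rule) is an assumption.
STRUCTURED = """
<policy id="travel" revision="3">
  <approval by="J. Smith" date="2006-03-02"/>
  <rule>Economy fares only.</rule>
</policy>
"""

def approval_of(xml_text):
    """Fine-grained access: return (approver, date) from the approval object."""
    root = ET.fromstring(xml_text)
    approval = root.find("approval")
    return approval.get("by"), approval.get("date")

def render_plain(xml_text):
    """One presentation of the content; the content itself is not changed."""
    root = ET.fromstring(xml_text)
    return f"Policy {root.get('id')} (rev {root.get('revision')}): {root.findtext('rule')}"

def render_html(xml_text):
    """A second presentation generated from the same content."""
    root = ET.fromstring(xml_text)
    return f"<h1>{root.get('id')}</h1><p>{root.findtext('rule')}</p>"

print(approval_of(STRUCTURED))
print(render_plain(STRUCTURED))
print(render_html(STRUCTURED))
```

Getting the same approval data out of `UNSTRUCTURED` would need text-pattern guessing that breaks as soon as an author phrases the sentence differently.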

  10. Strategic Planning Assumption By 2009, organizations will spend on the order of $3 billion in the worldwide market on unstructured data management – at least half of what they spend on structured data management (0.8 probability).

  11. Unstructured content creation in the enterprise • Office Documents (word processing, spreadsheet, email) • Many decision documents (contracts, policies/procedures, proposals, forms) still largely unstructured, little or no semantic markup • Content entry through enterprise applications • Exists as plain text or XHTML, e.g. • ERP • e-Commerce • Call center / CRM / customer support applications • PLM – Product Lifecycle Management • Little or no semantic markup • Desktop publishing • Largely unstructured outside of high tech / technical publications • Starting to move to XML because of L10N, multi-channel • User-generated content through Web • Blogs, forms or wiki markup – little or no semantic markup • Rich media • E.g. e-learning, rich communications e.g. Flash – little or no semantic markup

  12. Content Must Be Described to Be Processed by Machines • [Diagram: applications, standards, formats, and content types placed along an orientation axis running from "less structure, machine inaccessible" (humans process) to "more structure, machine accessible" (machines process)] • Applications: Word Processing, Spreadsheet, E-mail, ECM, KM, BCS, Business Intelligence, Security screening, Data Mining, Transactions, Information Access • Standards: SQL, ODF, XBRL, .doc, .xls, .ppt, ASCII/Unicode, Flash, Sign Language, RSS, DITA, XML, mpeg, SOAP, WSDL, jpeg, Open XML Doc Format, OWL, RDF • Formats and content types: Text, Master data, Repositories, Paper, Audio, Calculations, Illustration, Indexes, Photographs, XML vocabularies, Graphics, Metadata, Blobs, Files, Cells, Database tables • Metadata levels: Minimal Metadata; Hierarchy + Metadata + References Source: Gartner plus JustSystems

  13. Strategic Planning Assumption By 2009, separate and sometimes conflicting approaches to dealing with documents and databases will give way to enterprise information management programs that deal with all data as part of the organization's enterprise architecture strategy (0.7 probability).

  14. Approaches to EIM • Reactive • Indexing and searching content post facto; data-mining (e.g. Autonomy, Clear Forest, Google, etc.) • Requires technology investment only • Proactive • Indexing content as it is created (XML, metadata, taxonomies, records management, etc.) • Requires investments in people, process, technology, and content
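
As a rough illustration of the two approaches, the sketch below builds a reactive index by tokenizing finished text after the fact, and a proactive store that records metadata at the moment a document is saved. The class name `ProactiveStore`, the metadata fields, and the sample memos are all hypothetical; real search and records management products work very differently in detail.

```python
import re
from collections import defaultdict

def reactive_index(documents):
    """Post facto: tokenize finished text and build an inverted index of words."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index

class ProactiveStore:
    """At creation time: metadata is supplied on save and indexed immediately."""

    def __init__(self):
        self.documents = {}
        self.by_field = defaultdict(set)  # (field, value) -> set of document ids

    def save(self, doc_id, text, **metadata):
        self.documents[doc_id] = text
        for field, value in metadata.items():
            self.by_field[(field, value)].add(doc_id)

    def find(self, **criteria):
        """Return ids of documents matching every given metadata value."""
        hits = [self.by_field.get((f, v), set()) for f, v in criteria.items()]
        return set.intersection(*hits) if hits else set()

# Hypothetical sample content.
docs = {
    "memo-1": "Retention policy for broker e-mail was reviewed.",
    "memo-2": "Lunch menu policy update.",
}
words = reactive_index(docs)

store = ProactiveStore()
store.save("memo-1", docs["memo-1"], doc_type="policy", topic="records-retention")
store.save("memo-2", docs["memo-2"], doc_type="notice", topic="facilities")

print(sorted(words["policy"]))          # keyword search matches both memos
print(store.find(doc_type="policy"))    # classification matches only the real policy
```

The keyword "policy" appears in both memos, so the reactive index cannot tell the policy from the notice; the proactive store can, but only because someone classified each document when it was created, which is the people and process investment the slide refers to.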

  15. Questions we will answer today • What is enterprise information management (EIM)? • What are the issues driving convergence of data and documents? • What are the process, people, technology and content enablers for EIM? • What are new approaches to make content available to the enterprise for discovery?

  16. Enablers to Proactive EIM • Process • Best methods for EIM need to be defined and propagated (e.g. Gartner model) • People • Information Architects to do the work • CWA and other ethnographic approaches to assure uptake and compliance • Content • Broader definition and adoption of standard XML vocabularies like DITA • Technology • Maturing of the XML ecosystem

  17. Process: Proactive EIM is a comprehensive program, not just technology • Gartner's Essential Building Blocks for EIM: Vision, Strategy, Governance, Organization, Process, Enabling Infrastructure, Metrics • Vision: How is information perceived and valued in the organization? Is it a by-product, a shareable resource or source of differentiation? • Strategy: How is information currently managed? Is it ad-hoc, departmental, or is there an enterprise focus? • Governance: What decision rights and controls exist for managing information as an asset and who is involved? • Organization: What information-centric roles exist and where are they located? • Process: Are there practices (such as stewardship) and standards around the information lifecycle? • Enabling infrastructure: How well do information management technologies support current and future needs? • Metrics: How much is spent managing information? How much information is redundant? How much poor quality information exists and what impact does it have on the business? Source: Gartner

  18. Strategic Planning Assumption By 2007, information architects will establish the principles, governance processes, models and framework for improving the accuracy and integrity of information assets as part of an organization's commitment to enterprise information management (0.7 probability).

  19. People: Information Architect Roles Contribute to EIM Success • Information Architect (Enterprise Level - EIA): Focus on strategic information requirements; Publish enterprise standards; Draft enterprise information models and meta models; Formalize principles; Establish governance; Develop Information Value Network Model; Who: Enterprise Planners and Modelers; Methods of classification: modeling and frameworks (e.g. Gartner Enterprise Architecture, Zachman, FEAF, IEEE, OMG) • Information Architect (BI or Application Level): Create data models and meta models; Implement stewardship and quality objectives; Focus on integration; Oversee sourcing, profiling and transformation; Implement Common Business Vocabularies; Who: Data Modelers, DBAs; Follow rigorous SDLC; Methods for classification: data models, process models, object models • Information Architect (Web, Records Management or Content Level): Work with multimedia tools; Content-driven, not metadata-driven; Navigation, personalization; XML DTD design, standards and forms creation; Create document and data retention schedules; Who: Records Management Specialists, Information science, library science or cognitive science backgrounds, portal; Methods for classification: taxonomies, ontologies, tagging Source: Gartner

  20. Strategic Planning Assumption The need to deliver business value from information assets will force Enterprise Information Architecture to mature as a discipline in 70% of Global 2000 organizations by 2008 (0.7 probability).

  21. Technology: XML Hype Cycle – XML is here and maturing

  22. Strategic Planning Assumption Fully mature semantic reconciliation tools will not be available until 2011 (0.7 probability). By year-end 2009, 40 percent of a multinational company's data will be defined in some way by XML (0.7 probability). By year-end 2009, 75% of the Global 500's inter-application messaging infrastructure will be formatted in XML (0.7 probability).

  23. Content: Business Drivers for XML Adoption • Support faster product cycles • Reuse content to accelerate time to market • Enable simultaneous product release in multiple markets • Reduce cost and improve efficiency • Automate publishing and translation processes • Meet regulatory and quality requirements • Enable content discovery for litigation support • Validate that content is accurate, consistent and complete to improve customer experience • Support personalized outputs • Serve local language and cultural needs

  24. Content: DITA To The Rescue • A standardized framework for management and extensibility of XML document types • The Next Step in XML Manageability • Interoperability and tool independence • Reuse • Collaborative authoring • Originally developed by IBM • Published as an OASIS Specification in May 2005

  25. DITA - Darwin Information Typing Architecture • Darwin: Allows natural evolution of document types through inheritance and specialization • Information Typing: Provides an information architecture for technical documents with base topic types of Concept, Task, and Reference • Architecture: A model that encapsulates best practices for both design and processes
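
Inheritance through specialization can be sketched with the DITA `class` attribute, which records each element's ancestry (for example, a task `step` is specialized from a topic `li`). In real DITA this attribute is normally defaulted by the DTD or schema rather than typed by authors; it is written out by hand in the hypothetical sample below so the sketch needs no DTD processing. The helper names `is_a` and `select` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical task topic with class attributes spelled out explicitly.
TASK = """
<task id="replace-filter" class="- topic/topic task/task ">
  <title class="- topic/title ">Replace the filter</title>
  <taskbody class="- topic/body task/taskbody ">
    <steps class="- topic/ol task/steps ">
      <step class="- topic/li task/step "><cmd class="- topic/ph task/cmd ">Open the cover.</cmd></step>
      <step class="- topic/li task/step "><cmd class="- topic/ph task/cmd ">Swap the filter.</cmd></step>
    </steps>
  </taskbody>
</task>
"""

def is_a(element, module_and_type):
    """True if the element is, or is specialized from, the given module/type."""
    # Matching with a space on both sides avoids partial-name matches.
    return f" {module_and_type} " in element.get("class", "")

def select(root, module_and_type):
    """All elements in document order that are of, or inherit from, the type."""
    return [e for e in root.iter() if is_a(e, module_and_type)]

root = ET.fromstring(TASK)

# A processor that only knows the base topic type still finds the list items.
print([e.tag for e in select(root, "topic/li")])
# A task-aware processor can ask for the specialized command text.
print([e.text for e in select(root, "task/cmd")])
```

This is why a tool written for generic topics can still handle a specialized document type: it falls back to the ancestor it recognizes.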

  26. Topic Oriented Information Development • Information created and managed as modular chunks (topics) • Topics become the building blocks of your information products • Topic Characteristics* • Discrete units of information covering a specific subject with a specific intent • Small enough to promote reuse across multiple contexts and output media • Large enough to be easily authored and large enough to be readable and coherent • Organizable into a wide variety of structures from linear to networked *Source: CIDM, JoAnn Hackos
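
A minimal sketch of topics as building blocks, assuming a hypothetical in-memory topic store and two hypothetical maps (ordered lists of topic ids standing in for information products). It shows one topic being reused in two outputs and a simple where-used lookup; it is not how any particular CMS or the DITA map format is implemented.

```python
# Hypothetical topics: each is a discrete unit with one subject and one intent.
TOPICS = {
    "install": "Install the unit on a level surface.",
    "safety": "Disconnect power before servicing.",
    "specs": "Input: 120 V. Weight: 4 kg.",
}

# Hypothetical maps: each information product is an ordered list of topic ids.
MAPS = {
    "user-guide": ["safety", "install"],
    "service-bulletin": ["safety", "specs"],
}

def build(map_name, topics=TOPICS, maps=MAPS):
    """Assemble an information product from the topics its map references."""
    return "\n".join(topics[topic_id] for topic_id in maps[map_name])

def where_used(topic_id, maps=MAPS):
    """List every product that reuses a topic, e.g. to assess the impact of an edit."""
    return sorted(name for name, refs in maps.items() if topic_id in refs)

print(build("user-guide"))
print(where_used("safety"))
```

Because the safety topic is stored once, a correction to it reaches both products the next time they are built.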

  27. People, process, technology, and content: The enterprise with self-describing content

  28. Questions we will answer today • What is enterprise information management (EIM)? • What are the issues driving convergence of data and documents? • What are the people, process, and technology enablers for EIM? • What are new approaches to make content available to the enterprise for discovery?

  29. Strategic Planning Assumption Through 2010, organizations implementing both customer data integration and product information management MDM initiatives will link these efforts as part of an overall enterprise information management program (0.7 probability).

  30. Example 1: Structuring Product Information • Structured content analysis for knowledge workers in product teams, call center • XML editing embedded into enterprise applications (e.g. PLM, CRM) • XML/DITA for enterprise product-related publishing • Structured WIKI and blogs for User-generated Content (UGC) • [Diagram components: Customers; Web Self Service; Contact Center Knowledge Base (known issue, new issue); channels (web, phone, email / chat); Support; Product Design; Info Dev; CMS of Topics (FAQs, Procedures, Specs, Best Practices, Learning); Collaborative Authoring; publications; user-generated content; notification. Connections are labeled XML, DITA, and RSS.]

  31. Example 2: Structuring e–Commerce Content • Data/document convergence solutions for knowledge workers in marketing and e-commerce • XML editing embedded into e-commerce and e-merchandizing • DITA / XML for enterprise publishing of marketing communications • Structured editor, WIKI and blogs for UGC on retail sites (ActiveX, AJAX) • [Diagram components: Customers; 3rd Party Sites (Retailers, Communities); eCommerce site; Product Marketing; Mar-Comm; Merchandizing (e.g. atg); eCommerce; CMS of Topics (Product Catalog, Feature / Benefits, Specs, Reviews, Ad Content); Collaborative Authoring; flows of reviews, purchases, blogs, news, forums, and notification. Connections are labeled XML, DITA, and RSS.]

  32. xfy - Display and Analyze Content Exposed through XML • [Diagram components: sample views (Customer News, Customer Service History, Press Releases, Sales History, Delivery Log, Proposals) with placeholder data for 2003 to 2005; XML Engine (Adaptive Vocabulary, DOM tree, Compound XML schema, XML object scripting); Adapters; Web Services (SOAP, WSDL); XQuery; XML Documents (XML Content, Defined Schema, Document vocabulary); SQL Server, Oracle, DB2, etc., with tables ITEM, ORDER, CUST, SHIP; Business Applications]

  33. Vendors' Attempts At EIM Through MDM • IBM Information On Demand • SAP NetWeaver • Oracle Fusion Middleware • Large vendors focus on master data management … one part of an overall EIM program. Source: Gartner

  34. Case Studies

  35. Global Shipping and Logistics company • Key issues: • HR Policies and Procedures (litigation is driver) • Operations procedures – Sharing best practices in operations worldwide (compliance, localization of practices and language are key drivers) • Implementing ECM infrastructure • Implementing XML and topic-oriented authoring, review, and content management • Exploring Knowledge Management • Governance • Technologies • Content models – including DITA for self-describing content

  36. Leading Tobacco Products company • Key issues: • Document discovery (consumer and regulatory litigation is key driver) • Knowledge management – sharing of R&D across units is a secondary factor • Implementing DITA / XML for R&D documents • Implementing topic-oriented content management • Implementing topic-oriented review / approval and workflow

  37. Auto Manufacturer • Key issues: • Regulation / Litigation (TREAD act - Transportation Recall Enhancement, Accountability, and Documentation) – discovery of all documents related to vehicle product safety issues – who knew what, when • Compliance – getting employees to adhere to records management and content classification procedures • Issue: Office documents are not self-describing, need to be classified manually. • Implementing EIM for product related documents, records management • Considering XML as an aid to making content self-describing
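
The "who knew what, when" question becomes a simple query once each document carries its own author, date, and subject. The sketch below assumes a hypothetical `records` vocabulary and invented sample data; it illustrates the idea only and is not drawn from the case study.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical self-describing records; names, dates, and subjects are invented.
RECORDS = """
<records>
  <doc id="r1" author="A. Lee" created="2005-11-03" subject="brake-hose">Field report on hose wear.</doc>
  <doc id="r2" author="B. Cruz" created="2005-09-14" subject="brake-hose">Supplier test results.</doc>
  <doc id="r3" author="A. Lee" created="2005-10-01" subject="paint">Paint batch note.</doc>
</records>
"""

def timeline(xml_text, subject, before):
    """Return (date, author, id) for documents on a subject created before a cutoff, oldest first."""
    root = ET.fromstring(xml_text)
    hits = []
    for doc in root.iter("doc"):
        created = date.fromisoformat(doc.get("created"))
        if doc.get("subject") == subject and created < before:
            hits.append((created, doc.get("author"), doc.get("id")))
    return sorted(hits)

for created, author, doc_id in timeline(RECORDS, "brake-hose", date(2006, 1, 1)):
    print(created.isoformat(), author, doc_id)
```

With office documents that are not self-describing, the `subject` value would have to be assigned by hand for every file, which is the compliance burden the slide describes.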
