Ontologizing the ONTOLOG Content: Tools, Techniques, and Approaches Panel at Almaden Research Center

Ontologizing the ONTOLOG content: Tools, techniques, and approaches panel
E. Michael (Max) Maximilien, Almaden Services Research, http://maximilien.org
Almaden Research Center, San Jose, CA

Agenda
• Today’s web
• ONTOLOG community of practice
• What have we done so far?
• Where do we want to go?
• Introduce panelists
  • Scott Spangler, IBM Almaden Services Research
  • Peter Mika, Free University of Amsterdam
  • Conor Shankey, VisualKnowledge, Inc., Vancouver, Canada
  • John “Boz” Handy-Bosma, IBM Global Services
• Initial questions

Today’s web
• User-contributed content, e.g., blogs, audio and video podcasts
  • Highly unstructured
  • Highly idiosyncratic
• Collaborative platforms for content creation, e.g., wikis
• Idiosyncratic web resource annotations lead to folksonomies
• Wisdom of the crowds
  • “Instant” feedback
  • Issues with collusion and the “smart mob” effect [H. Rheingold, 2002]
• Ratings and reputation, e.g., digg.com, eBay.com
• Exposing content and processes as services
  • RSS feeds for data aggregation
  • SOAP, REST, and other services for data and process programming

ONTOLOG community of practice
• Experts in formal and semi-formal knowledge representation
• Highly educated researchers – academia and industry
• Content contribution – structured but diverse
  • Wiki pages for presentation preparation and summary
  • Presentations (PPT and PDF) by participants and moderators
  • MP3 audio files of sessions – aggregated as podcast feeds
  • Biographies of members and participants
• Can we use our own knowledge and approaches to give structure to the ONTOLOG body of knowledge?

What have we done so far?
• Discussed use cases
• Discussed architectural approaches
• Have an initial taxonomy [Bedford and Smith]
• Started discussing tools and approaches
• Not yet clear exactly what we want to achieve
• However, the common direction seems to be to:
  • Achieve a better categorization of content
  • Invite user participation (à la Web 2.0)
  • Take advantage of ontology expertise
  • Reconcile Web 2.0 “loose” semantic efforts with more formal approaches

Where do we want to go?
• For today’s panel we have a good cross-section of experts in:
  • Unstructured text mining and automated taxonomy and categorization
  • Folksonomies and original research in reconciling ontologies and social networks
  • Research semantic tools
  • Semantic wikis and tools
  • Facet-based logic, search, and associated tools
• The audience can contribute the ontologist’s view
  • Previous talk (Tim Redmond) discussed the Protégé tool and how to extend it
  • Upcoming talk (Pat Cassidy) will discuss more formalized approaches to ontological engineering

Scott Spangler, IBM Almaden Services Research, San Jose, CA
• Senior Technical Staff Member at IBM Almaden Services Research
• 15 years developing applications for:
  • Statistical data analysis
  • Knowledge-based systems
  • Text mining
  • Business intelligence (BI)
• 14 patents in text mining and BI
• BS in mathematics from MIT
• Master’s in computer science from UT Austin

Peter Mika, Free University of Amsterdam, Netherlands
• Ph.D. candidate in computer science at the Free University, Amsterdam, The Netherlands
  • Social networks and folksonomies
  • Semantic Web and ontology
• Best paper award at ISWC 2005 for the paper “Ontologies are us: A unified model of social networks and semantics”
• Winner of the Semantic Web Challenge at ISWC 2004 for the Flink system
• Co-chair of the Semantic Web Challenge 2006
• Author of various semantics-related tools and research projects
  • openacademia.org, Elmo, SWAP, WonderWeb, OnToKnowledge, and more
• See Peter’s Web site: http://www.cs.vu.nl/~pmika/research.html

Conor Shankey, VisualKnowledge, Vancouver, Canada
• CEO of VisualKnowledge
  • Enterprise-class ontology lifecycle management platform
  • Flexible metadata support, e.g., OWL and RDF
  • Ontology federation
  • Support for transactions and multithreading
  • Pluggable micro-inference engine
• Long history of successful ontological applications
  • Various successful projects using VisualKnowledge tools

John “Boz” Handy-Bosma, IBM Global Services, Austin, TX
• IBM Master Inventor
  • More than 60 patents issued or pending
  • 15th plateau level
  • Recipient of several IBM awards for inventions incorporated into products
• Expertise
  • Facet-based logic
  • Application of facet-based search
• Senior IT Architect, IBM Austin
  • Assignee to IBM Almaden Services Research
• Visiting Faculty, University of Texas at Austin

Initial questions
• Is automated taxonomy and categorization of unstructured text “good enough”?
• What are the best text mining techniques for wiki-based content?
• What active or passive role do you see for users of the ONTOLOG forum in helping to better categorize content?
• How important are user ratings? Do we need more active users for ratings to reflect the wisdom of the crowd?
• How can facet-based logic and search help?
• What is the role of ontological engineering tools?
• What is the role and application of Web 2.0 tools, services, and techniques, e.g., podzinger.com?

Thank You (English) • Gracias (Spanish) • Obrigado (Brazilian Portuguese) • Danke (German) • Grazie (Italian) • Merci (French) • and in Hindi, Thai, Russian, Arabic, Tamil, Japanese, Korean, and Traditional and Simplified Chinese

Backup slides

Approach and thesis
• Two key issues with simply ontologizing content:
  • Lack of pragmatism in the goals of ontologies
  • Heterogeneity of usage and use cases
• Summary of approach
  • Simple tagging for human collaboration (folksonomies), plus rating systems for parts of the content
  • Automatically convert audio into annotated text transcripts
  • Mining tools to automate annotation of content and infer taxonomies
  • An ontology for outlining the content
• The secret sauce is in how we combine the semantics, i.e., the algorithms, to solve the use cases

Tagging and ratings – Human collaboration
• Tagging
  • Idiosyncratic
  • Results in bags of tags forming folksonomies
  • Various available services, e.g., http://del.icio.us, http://flickr.com, and so on
  • Needs incentives for humans, e.g., easier search
  • Evolving into some form of “ontology” (see Peter Mika’s paper “Ontologies are us: A unified model of social networks and semantics” at ISWC 2005)
• Ratings
  • Enable feedback
  • Rate the ratings to avoid collusion
  • Similar to http://digg.com, Amazon’s rating system, and eBay.com’s reputation system (various works in the literature)
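
The “rate the ratings” idea can be illustrated with a minimal Python sketch. It assumes each rating of a piece of content carries a score plus helpful/unhelpful votes cast by other users, and it weights every rating by a Laplace-smoothed share of its helpful votes. The `Rating` class, its fields, and this weighting rule are hypothetical illustrations, not the mechanism used by digg.com, Amazon, or eBay.

```python
from dataclasses import dataclass

@dataclass
class Rating:
    """One user's rating of a piece of content (hypothetical data model)."""
    rater: str
    score: float        # e.g., 1 to 5 stars for one part of the content
    helpful: int = 0    # meta-ratings: other users marked this rating helpful
    unhelpful: int = 0  # meta-ratings: other users marked it unhelpful

def rating_weight(r: Rating) -> float:
    """Laplace-smoothed share of helpful votes; a rating nobody has rated gets 0.5."""
    return (r.helpful + 1) / (r.helpful + r.unhelpful + 2)

def weighted_score(ratings: list[Rating]) -> float:
    """Average score, weighting each rating by how the community rated it."""
    if not ratings:
        raise ValueError("no ratings to aggregate")
    total = sum(rating_weight(r) for r in ratings)
    return sum(rating_weight(r) * r.score for r in ratings) / total

# Hypothetical example: two colluding 5-star ratings that were voted down,
# and one well-regarded 2-star rating.
sample = [
    Rating("m1", 5, unhelpful=8),
    Rating("m2", 5, unhelpful=8),
    Rating("h", 2, helpful=6),
]
```

Here the plain mean of `sample` is 4.0, while `weighted_score(sample)` is about 2.56, because the two voted-down ratings each get weight 0.1 and the well-regarded one gets 0.875. The meta-votes can themselves be gamed, which is why this is only a starting point.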

Audio content
• Automated transcript
  • Use Web services to convert audio to a text transcript
  • Some Web services, e.g., http://podzinger.com, also annotate the transcript and do more than closed captioning
  • May involve human collaboration to gradually improve content (especially resolving context errors)
• Issues
  • Some ONTOLOG audio (podcast) MP3s are of low quality
  • Static noise and “voice storms”
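
Once a transcript exists, it can be indexed so that a keyword search jumps to a position in the MP3. The sketch below assumes the transcription service returns time-coded segments as `(start_seconds, text)` pairs. That format, the function names, and the stopword list are hypothetical; this is not podzinger.com’s API. It builds a small inverted index with no stemming.

```python
import re
from collections import defaultdict

# Illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "we", "for"}

def terms(text):
    """Lowercase alphanumeric tokens, minus stopwords (no stemming)."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def build_index(segments):
    """Map each term to the set of segment start times (seconds) whose text contains it."""
    index = defaultdict(set)
    for start, text in segments:
        for term in terms(text):
            index[term].add(start)
    return index

def find(index, query):
    """Sorted start times of segments containing every query term."""
    query_terms = terms(query)
    if not query_terms:
        return []
    # dict.get does not insert missing keys, so unknown terms simply match nothing.
    return sorted(set.intersection(*(index.get(t, set()) for t in query_terms)))

# Hypothetical time-coded transcript of one session.
segments = [
    (0.0, "Welcome to the ONTOLOG panel"),
    (42.5, "Folksonomies and social networks"),
    (95.0, "Social tagging on del.icio.us"),
]
index = build_index(segments)
# find(index, "social networks") -> [42.5]
```

When a human fixes a context error in one segment, rebuilding the index (or re-indexing just that segment) keeps search in step with the improved transcript.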

Mining
• Automatic annotation of content
  • Mature tool set in UIMA
  • Others (?)
• Generate initial taxonomy
• Continual process to update annotations
• Dr. David Ferrucci (IBM Research), lead architect of the UIMA project, to present to the community on May 11, 2006
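
One way to generate an initial taxonomy from tags or automatic annotations is the co-occurrence subsumption heuristic of Sanderson and Croft: tag x is taken as broader than tag y when most resources tagged y are also tagged x, but not the reverse. This is a swapped-in technique for illustration, not UIMA’s API and not a method proposed on the slide. The 0.8 threshold, the `min_count` filter, and the `tagged` input (resource id mapped to a set of tags) are assumptions.

```python
from collections import Counter
from itertools import permutations

def subsumption_edges(tagged, threshold=0.8, min_count=2):
    """Candidate (broader, narrower) tag pairs from tag co-occurrence.

    Tag x subsumes tag y when P(x | y) >= threshold and P(y | x) < 1,
    with probabilities estimated from the resources carrying each tag.
    """
    freq = Counter()  # number of resources carrying each tag
    co = Counter()    # number of resources carrying both tags of an ordered pair
    for tags in tagged.values():
        tags = set(tags)
        freq.update(tags)
        co.update(permutations(tags, 2))
    edges = []
    for (x, y), both in co.items():
        if freq[x] < min_count or freq[y] < min_count:
            continue  # too rare to support a reliable estimate
        if both / freq[y] >= threshold and both / freq[x] < 1:
            edges.append((x, y))
    return sorted(edges)

# Hypothetical tag assignments for five wiki pages.
tagged = {
    "p1": {"ontology", "owl"},
    "p2": {"ontology", "owl", "protege"},
    "p3": {"ontology", "uima"},
    "p4": {"ontology", "folksonomy"},
    "p5": {"folksonomy", "tagging"},
}
# subsumption_edges(tagged) -> [("ontology", "owl")]
```

Both pages tagged `owl` are also tagged `ontology`, which appears on four pages, so `ontology` is proposed as the broader term. Tags used only once are filtered out by default. The output is a set of candidate links for people to review, and rerunning it as annotations change fits the continual update process on the slide.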

Ontology
• Outline
  • Create an initial outline of site content with some ontology
  • Reuse an existing ontology
  • IMO this ontology can be specific to ONTOLOG and therefore not necessarily an “upper” ontology
• What are the primary goals for this outline and ontology?
  • Cataloguing
  • Search (why not just use Google services?)
  • Statistics (why not just use Amazon’s Alexa services?)
  • Others (?)
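
For the cataloguing and search goals, a faceted classification over the outline is one option. The sketch below assumes each content item carries single-valued facets such as `type`, `topic`, and `year`. These facet names and the sample items are hypothetical. It keeps the items that match every selected facet value and counts the remaining values of each facet, which is what a faceted browser typically offers as refinement options.

```python
from collections import Counter

def facet_search(items, selected):
    """Return items matching every selected facet value, plus value counts per facet."""
    hits = [item for item in items
            if all(item["facets"].get(facet) == value for facet, value in selected.items())]
    counts = {}
    for item in hits:
        for facet, value in item["facets"].items():
            counts.setdefault(facet, Counter())[value] += 1
    return hits, counts

# Hypothetical catalogue entries for ONTOLOG content.
items = [
    {"id": "a", "facets": {"type": "audio", "topic": "folksonomy", "year": 2006}},
    {"id": "b", "facets": {"type": "slides", "topic": "folksonomy", "year": 2006}},
    {"id": "c", "facets": {"type": "slides", "topic": "text mining", "year": 2005}},
]
hits, counts = facet_search(items, {"type": "slides"})
# hits -> entries "b" and "c"; counts["topic"] has one "folksonomy" and one "text mining"
```

Selecting a further value, e.g., `{"type": "slides", "year": 2006}`, narrows the hits to entry `"b"`.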
