Template-based Authoring Knowledge Systems Laboratory Stanford
Project Goals • Assist analyst in everyday work • Knowledge Authoring Tools to assist in: • Research for reports • Produce reports • Consume reports • Share reports • Our solution: Semantic Web Templates
Semantic Web Templates • Knowledge Representation, Semantics are key for information exchange • Creation, maintenance of knowledge must be transparent • Automate extraction of knowledge • Enhance knowledge retrieval methods
Semantic Web Templates • Similar to MS Word Templates • Different templates for different tasks • Word templates can have restrictions on text • Very primitive, such as length of text • Simplistic patterns such as “phone number” • No concepts such as “color” or “country” • One template, many documents • HTML templates are very common today • Many web sites use SQL database as back end, template + SQL → HTML
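The "template + SQL → HTML" idea on this slide can be sketched as follows. This is a minimal illustration only: the `contact` table, its columns, and the sample row are hypothetical, and Python's standard `sqlite3` and `string.Template` stand in for whatever database and template engine a real site would use.

```python
import sqlite3
from html import escape
from string import Template

# Hypothetical table and row, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (name TEXT, phone TEXT)")
conn.execute("INSERT INTO contact VALUES (?, ?)", ("Example Person", "555-0100"))

# One template, many documents: every row fills the same HTML skeleton.
PAGE = Template("<html><body><h1>$name</h1><p>Phone: $phone</p></body></html>")

def render_pages(connection):
    """Return one HTML page per row of the hypothetical contact table."""
    rows = connection.execute("SELECT name, phone FROM contact ORDER BY name")
    return [PAGE.substitute(name=escape(name), phone=escape(phone))
            for name, phone in rows]

pages = render_pages(conn)
```

Note that the template here only knows about strings. Nothing tells it that `name` denotes a person, which is the gap the slides propose to fill with semantic tags.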
Semantic Web Templates • An HTML file with additional tags • Tags specify: • Where particular knowledge is stated • What kind of knowledge it is • Where it came from, if applicable • References to an entity or relation • Repetitive regions of text
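The slides do not show the actual tag syntax, so the following is only a sketch of what "an HTML file with additional tags" might look like and how the tagged knowledge could be read back out. The `ktype` and `ksource` attributes, the sample sentence, and the entity names are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical markup: ktype says what kind of knowledge is stated,
# ksource says where it came from (optional).
TEMPLATE = (
    '<p>The capital of <span ktype="Country" ksource="example-source">'
    'Freedonia</span> is <span ktype="City">Fredville</span>.</p>'
)

class KnowledgeTagParser(HTMLParser):
    """Collects the text, type, and source of each tagged span."""

    def __init__(self):
        super().__init__()
        self.facts = []
        self._open = None  # the tagged span currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and "ktype" in attrs:
            self._open = {"type": attrs["ktype"],
                          "source": attrs.get("ksource"),
                          "text": ""}

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._open is not None:
            self.facts.append(self._open)
            self._open = None

def extract_facts(html_text):
    parser = KnowledgeTagParser()
    parser.feed(html_text)
    parser.close()
    return parser.facts

facts = extract_facts(TEMPLATE)
```

This sketch does not handle nested tagged spans or the repetitive regions mentioned on the slide; both would need extra state in the parser.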
Goal: Assist Research • Unstructured Extraction • Sort through buckets of data to find gold • Entity recognition • Relation recognition • Semistructured Extraction • Utilize repetitive patterns within a page • Use similar pages to extract more data • Robust despite changing pages, data
Unstructured Extraction • Natural language processing • News feeds • Indexing, storage, retrieval • Plugin architecture • Web Services • Our system, collaboration with IBM via NIMD • Rover news crawler • Political news articles from Yahoo! • 22,000 articles, ~8500 concepts, ~1000 relations • Used in authoring tools
Unstructured Extraction • Pattern based system • Leverage “hints” for the reader in news articles • British Prime Minister Tony Blair • <type Country><subClassOf Politician> <unknown name> • “Tony Blair” is a Prime Minister who represents the Country “England”. • System runs daily on Yahoo political news • Highlights known terms in green • Highlights new terms in red • Used to create search index, maintain KB • Demo
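The pattern on this slide, a country adjective followed by a politician title followed by an unknown name, can be approximated with a regular expression. This is a rough sketch and not the system described in the talk: the two small lexicons are hypothetical (the "President" entry is added only for illustration), and the adjective-to-country mapping simply follows the slide's own example.

```python
import re

# Hypothetical lexicons; the British -> England mapping follows the slide's example.
COUNTRY_ADJECTIVES = {"British": "England"}
POLITICIAN_TITLES = ["Prime Minister", "President"]

# <type Country> <subClassOf Politician> <unknown name>
PATTERN = re.compile(
    r"\b(?P<adj>" + "|".join(map(re.escape, COUNTRY_ADJECTIVES)) + r")\s+"
    r"(?P<title>" + "|".join(map(re.escape, POLITICIAN_TITLES)) + r")\s+"
    r"(?P<name>[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)"
)

def extract_politicians(text):
    """Return one record per match of the adjective/title/name pattern."""
    return [
        {
            "name": m.group("name"),
            "title": m.group("title"),
            "country": COUNTRY_ADJECTIVES[m.group("adj")],
        }
        for m in PATTERN.finditer(text)
    ]

found = extract_politicians("British Prime Minister Tony Blair met reporters.")
```

The name sub-pattern is deliberately naive: it takes any run of two or more capitalized words, so a capitalized word right after the name would be captured too.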
Semi-structured Extraction • Extract, produce knowledge • Initial model is Domain Authorities • Enhance KB with ground facts • Strong for relations and breadth of data • Leverages work of others • Makes use of SQL databases • Future work is wide-scale web of trust
Semi-structured Extraction • Site Registry • By description and property • CIA World Fact Book has data about items which are of type <Country> • CIA World Fact Book has properties <population>, <hasNeighbor>, <hasMembership>, etc. • Demo
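A site registry indexed "by description and property" could be modeled as below. The `SiteEntry` structure and `find_sites` function are hypothetical names chosen for this sketch; the single entry uses only the types and properties named on the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SiteEntry:
    name: str
    describes: frozenset   # entity types the site has data about
    properties: frozenset  # properties the site can supply

REGISTRY = [
    SiteEntry(
        name="CIA World Fact Book",
        describes=frozenset({"Country"}),
        properties=frozenset({"population", "hasNeighbor", "hasMembership"}),
    ),
]

def find_sites(entity_type, prop, registry=REGISTRY):
    """Names of registered sites that cover this type and supply this property."""
    return [site.name for site in registry
            if entity_type in site.describes and prop in site.properties]
```

A crawler could call `find_sites("Country", "population")` to decide which domain authority to consult before extracting a value.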
Semi-structured Extraction • Publishing • Human editing good for high-level concepts • Automated techniques good for relations, ground level facts, and massive repetition • Rover web crawler • Template construction is currently manual • With critical mass of data, templates could be discovered.
Enhanced Document Retrieval • Enhanced document retrieval • Search based on concept • Find articles about… • Membership: Scottie Pippen → Trailblazers • Membership: Osama bin Laden → al-Qaeda • Subgroups: • Ramadan Shallah → Islamic Jihad → al-Qaeda • Semantic search
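Reading the slide's examples as membership and subgroup relations, a concept-based lookup might follow the subgroup chain so that a search on a parent group also finds members of its subgroups. This is a sketch under that reading; the two dictionaries and function names are hypothetical, and the entries are taken from the slide as given.

```python
# Relations as given on the slide: person -> group, and subgroup -> parent group.
MEMBER_OF = {
    "Scottie Pippen": "Trailblazers",
    "Osama bin Laden": "al-Qaeda",
    "Ramadan Shallah": "Islamic Jihad",
}
SUBGROUP_OF = {"Islamic Jihad": "al-Qaeda"}

def groups_for(person):
    """The person's direct group followed by every enclosing parent group."""
    groups, seen = [], set()
    group = MEMBER_OF.get(person)
    while group is not None and group not in seen:  # guard against cycles
        seen.add(group)
        groups.append(group)
        group = SUBGROUP_OF.get(group)
    return groups

def people_in(group):
    """Everyone who belongs to the group directly or through a subgroup."""
    return sorted(p for p in MEMBER_OF if group in groups_for(p))
```

Articles could then be retrieved for every name returned by `people_in`, rather than only for documents that mention the group itself.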
Enhanced Document Retrieval • Document Augmentation • Sidebar acts as glossary as you read • Pre-fetch data user is likely to want • Adapt to user preferences, activities • Deeper understanding for user, gets answers to questions raised while reading
Search Augmentation • Google assumes users only want documents • Provide answers along with documents • Use query term denotation to more closely target results • “Browns Ferry” is a garden park • “Browns Ferry” is a nuclear power plant • Automates what people do with IR systems • Append hints about the type of term being sought
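Appending a type hint to an ambiguous query term, as people do by hand with IR systems, can be sketched like this. The `DENOTATIONS` table and `augmented_queries` function are hypothetical; the two senses of "Browns Ferry" come from the slide.

```python
# Known denotations of a query term (the two senses listed on the slide).
DENOTATIONS = {"Browns Ferry": ["garden park", "nuclear power plant"]}

def augmented_queries(term, denotations=DENOTATIONS):
    """One query per known sense, each with a type hint appended.

    Terms with no known denotation fall back to a plain quoted query.
    """
    senses = denotations.get(term, [])
    if not senses:
        return [f'"{term}"']
    return [f'"{term}" {sense}' for sense in senses]
```

A search front end could show one result group per returned query and let the user pick the intended sense.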
Search Augmentation • Demo: Basic Search • Demo: Followup Data • Demo: Disambiguation • Demo: Relations
Basic Question Answering • Automated techniques for ground facts • Use reasoners for higher-level facts • Tie in with KSL AQUAINT work • Feedback, direction from user • Structure of knowledge allows simple form of question answering
Basic Question Answering • Multiple views into data • Browse interface • Ugly, but complete view • Activity-based knowledge presentation • Search, document augmentation • Future work: accept user feedback, customization, preferred sources
Basic Question Answering • Query by example • Users create many similar documents • These are targeted to an activity • Use past work to speed present work • User creates templates that present the data they find interesting in a way they find convenient
Goal: Produce Reports • Most reports are made with Office • Word processor, spreadsheet • Enhance with semantic awareness • Provide seamless access to knowledge • Transparent maintenance, creation • Low overhead of operation • Avoid centralized approach • Contrast with relational database
Word Processing • Creation of new data • Semantic scan • Like spell check or grammar check • Automatically identifies referenced entities • Learns new entities, relations between entities • Annotation of text • User manually adjusts system • User adds new data • System gets smarter over time
Word Processing • Create data via entry into templates • Create new templates • For others • For personal use • Extend templates with new entry areas • Enhance analyst’s view • Semantic Search, Document Augmentation • Sidebar boxes are templates too
Word Processing • Demo: Semantic Scan • Demo: Annotation • Demo: Knowledge Creation
Spreadsheets • Spreadsheets are key tools in analysis • Tabular format, UI are both intuitive • Sorting, basic math functions • We add semantics: • New formula type: “Get Data” • New formula type: “Put Data” • Summarization, new views
Spreadsheets • Example scenario • Suppose SARS was found to affect Asian-Americans more than others? • Analyst wants to determine, based on that, which states are most at risk • Knowledge from Census tells us Asian-American population as a percentage
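The scenario can be sketched with a "Get Data" style lookup feeding an ordinary sort. Everything named here is an assumption: `get_data`, `rank_by`, the property name, and the state labels are hypothetical, and the percentages are invented placeholders, not Census figures.

```python
# Placeholder knowledge base. The values are invented for illustration
# and are NOT real Census data.
PROP = "asianAmericanPercent"
KB = {
    ("State A", PROP): 2.0,
    ("State B", PROP): 9.5,
    ("State C", PROP): 4.1,
}

def get_data(entity, prop, kb=KB):
    """Hypothetical 'Get Data' formula: fetch one property of one entity."""
    return kb.get((entity, prop))

def rank_by(entities, prop):
    """Sort entities by a property, highest first, skipping unknown values."""
    rows = [(entity, get_data(entity, prop)) for entity in entities]
    known = [row for row in rows if row[1] is not None]
    return sorted(known, key=lambda row: row[1], reverse=True)

ranking = rank_by(["State A", "State B", "State C", "State D"], PROP)
```

In a spreadsheet, each `get_data` call would correspond to one cell formula, and the ranking would be the usual column sort applied to the retrieved values.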
Goal: Consume Reports • Verify others’ data against yours • Incorporate others’ results into your knowledge base, track sources • Maintain data • Change notification • Document updates with new data • Versioning of documents, data
Goal: Share Reports • Easily exchangeable via e-mail • Truth maintenance techniques • Multiple views into data • Leverage domain expertise • The missile guy has a KB, … • Collaboration, trust levels • Colleagues disagree, sources are unreliable
Conclusion • KD-D effort is focused on authoring, analysis tasks • Leverage automated techniques to complement manual techniques • System gets smarter as it’s used • Tie in with commonly used applications