1 / 64

Knowledge Management Systems: Development and Applications Part II: Techniques and Examples

Knowledge Management Systems: Development and Applications Part II: Techniques and Examples. Hsinchun Chen, Ph.D. McClelland Professor, Director, Artificial Intelligence Lab and Hoffman E-Commerce Lab The University of Arizona Founder, Knowledge Computing Corporation.

Philip
Télécharger la présentation

Knowledge Management Systems: Development and Applications Part II: Techniques and Examples

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Knowledge Management Systems: Development and ApplicationsPart II: Techniques and Examples Hsinchun Chen, Ph.D. McClelland Professor, Director, Artificial Intelligence Lab and Hoffman E-Commerce Lab The University of Arizona Founder, Knowledge Computing Corporation Acknowledgement: NSF DLI1, DLI2, NSDL, DG, ITR, IDM, CSS, NIH/NLM, NCI, NIJ, CIA, NCSA, HP, SAP 美國亞歷桑那大學, 陳炘鈞博士

  2. Knowledge Management Systems: Overview

  3. KMS Root: Intersection of IR and AI Information Retrieval (IR) and Gerald Salton • Inverted Index, Boolean, and Probabilistic, 1970s • Expert Systems, User Modeling and Natural Language Processing, 1980s • Machine Learning for Information Retrieval, 1990s • Internet Search Engines, late 1990s

  4. KMS Root: Intersection of IR and AI Artificial Intelligence (AI) and Herbert Simon • General Problem Solvers, 1970s • Expert Systems, 1980s • Machine Learning and Data Mining, 1990s • Autonomous Agents, late 1990s

  5. Representing Knowledge •IR Approach •Indexing and Subject Headings •Dictionaries, Thesauri, and Classification Schemes •AI Approach •Cognitive Modeling •Semantic Networks, Production Systems, Logic, Frames, and Ontologies

  6. Knowledge Retrieval Vendor Direction(Source: GartnerGroup) Market Target Newbies: IR Leaders: • grapeVINE • Sovereign Hill • CompassWare • Intraspect • KnowledgeX • WiseWire • • Lycos • • Autonomy • • Perspecta • Verity • Fulcrum • Excalibur • Dataware Knowledge Retrieval NewBies IR Leaders Niche Players: • • IDI • Oracle • • Open Text • • Folio • • IBM • • InText • PCDOCS • Documentum Lotus Netscape* Technology Innovation Microsoft Niche Players * Not yet marketed Content Experience

  7. KM Software Vendors Challengers Leaders Lotus * Microsoft * Dataware * Autonomy* * Verity * IBM * Excalibur Ability to Execute Netscape * Documentum* PCDOCS/* Fulcrum IDI* Inference* OpenText* Lycos/InMagic* CompassWare* GrapeVINE* KnowledgeX* * InXight WiseWire* SovereignHill* Semio* *Intraspect Visionaries Niche Players Completeness of Vision

  8. Competitive Analysis: Text Analysis

  9. Competitive Analysis: Collection Creation

  10. Competitive Analysis: Retrieval/Display

  11. Knowledge Management Systems: Techniques

  12. KMS Techniques: Linguistic analysis/NLP: identify key concepts (who/what/where…) Statistical/co-occurrence analysis: create automatic thesaurus, link analysis Statistical and neural networks clustering/categorization: identify similar documents/users/communities and create knowledge maps Visualization and HCI: tree/network, 1/2/3D, zooming/detail-in-context

  13. KMS Techniques: Linguistic Analysis Word and inverted index: stemming, suffixes, morphological analysis, Boolean, proximity, range, fuzzy search Phrasal analysis: noun phrases, verb phrases, entity extraction, mutual information Sentence-level analysis: context-free grammar, transformational grammar Semantic analysis: semantic grammar, case-based reasoning, frame/script

  14. Techniques Illinois DLI-1 project: “Federated Search of Scientific Literature” Research goal: Semantic interoperability across subject domain Technologies: Semantic retrieval and analysis technologies • Text Tokenization • Part-of-speech-tagging • Noun phrase generation Natural Language Processing Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1

  15. Techniques • Text Tokenization • Part-of-speech-tagging • Noun phrase generation Natural Language Processing Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1

  16. KMS Techniques: Statistical/Co-Occurrence Analysis Similarity functions: Jaccard, Cosine Weighting heuristics Bi-gram, tri-gram, N-gram Finite State Automata (FSA) Dictionaries and thesauri

  17. Techniques Illinois DLI project: “Federated Search of Scientific Literature” Research goal: Semantic interoperability across subject domain Technologies: Semantic retrieval and analysis technologies Natural Language Processing Co-occurrence analysis • Heuristic term weighting • Weighted co-occurrence analysis Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1

  18. Techniques • Heuristic term weighting • Weighted co-occurrence analysis Co-occurrence analysis Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1

  19. KMS Techniques: Clustering/Categorization Hierarchical clustering: single-link, multi-link, Ward’s Statistical clustering: multi-dimensional scaling (MDS), factor analysis Neural network clustering: self-organizing map (SOM) Ontologies: directories, classification schemes

  20. Techniques Illinois DLI project: “Federated Search of Scientific Literature” Research goal: Semantic interoperability across subject domain Technologies: Semantic retrieval and analysis technologies Natural Language Processing Co-occurrence analysis Neural Network Analysis • Document clustering • Category labeling • Optimization and parallelization Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1

  21. Techniques • Document clustering • Category labeling • Optimization and parallelization Neural Network Analysis Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1

  22. KMS Techniques: Visualization/HCI Structures: trees/hierarchies, networks Dimensions: 1D, 2D, 2.5D, 3D, N-D (glyphs) Interactions: zooming, spotlight, fisheye views, fractal views

  23. Techniques Illinois DLI project: “Federated Search of Scientific Literature” Research goal: Semantic interoperability across subject domain Technologies: Semantic retrieval and analysis technologies Natural Language Processing Co-occurrence analysis Neural Network Analysis Advanced Visualization • 1D: alphabetic listing of categories • 2D: semantic map listing of categories • 3D: interactive, helicopter fly-through using VRML Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1

  24. Techniques • 1D, 2D, 3D Advanced Visualization Automatic Generation of CL:

  25. Advanced Techniques Automatic Generation of CL: (Continued) • Entity Extraction and Co-reference based on TREC and MUG • Text segmentation and summarization based on Textile and Wavelets • Visualization techniques based on Fisheye, Fractal, and Spotlight

  26. Advanced Techniques Integration of CL: • Lexicon-enhanced indexing (e.g., UMLS Specialist Lexicon) • Ontology-enhanced query expansion (e.g., WordNet, UMLS Metathesaurus) • Ontology-enhanced semantic tagging (e.g., UMLS Semantic Nets) • Spreading-activation based term suggestion (e.g., Hopfield net)

  27. YAHOO: manual, high-precision OOHAY: automatic, high-recall Acknowledgements: NSF, NIH, NLM, NIJ, DARPA YAHOO vs. OOHAY:

  28. Knowledge Computing Approach Y A H O O Y A H O O Y A H O O Y A H O O O O H A Y O O H A Y O O H A Y O Y H A O From YAHOO! To OOHAY? Y A H O O ! Object Oriented Hierarchical Automatic Yellowpage ?

  29. Knowledge Management Systems: Examples

  30. Web Analysis (1M):Web pages, spidering, noun phrasing, categorization

  31. Research Status Arizona DLI-2 project: “From Interspace to OOHAY?” Research goal: automatic and dynamic categorization and visualization of ALL the web pages in US (and the world, later) Technologies: OOHAY techniques Multi-threaded spiders for web page collection High-precision web page noun phrasing and entity identification Multi-layered, parallel, automatic web page topic directory/hierarchy generation Dynamic web search result summarization and visualization Adaptive, 3D web-based visualization OOHAY: Visualizing the Web

  32. Research Status ROCK MUSIC … 50 6 OOHAY: Visualizing the Web

  33. Lessons Learned: Web pages are noisy: need filtering Spidering needs help: domain lexicons, multi-threads SOM is computational feasible for large-scale application SOM performance for web pages = 50% Web knowledge map (directory) is interesting for browsing, not for searching Techniques applicable to Intranet and marketing intelligence

  34. News Classification (1M):Chinese news content, mutual information indexing, PAT tree, categorization

  35. Lessons Learned: News readers are not knowledge workers News articles are professionally written and precise. SOM performance for news articles = 85% Statistical indexing techniques perform well for Chinese documents Corporate users may need multiple sources and dynamic search help Techniques applicable to eCommerce (eCatalogs) and ePortal

  36. Personal Agents (1K):Web spidering, meta searching, noun phrasing, dynamic categorization

  37. For project information and free download: http://ai.bpa.arizona.edu Research Status OOHAY: CI Spider 1. Enter Starting URLs and Key Phrases to be searched 2. Search results from spiders are displayed dynamically

  38. For project information and free download: http://ai.bpa.arizona.edu Research Status OOHAY: CI Spider, Meta Spider, Med Spider 1. Enter Starting URLs and Key Phrases to be searched 2. Search results from spiders are displayed dynamically

  39. For project information and free download: http://ai.bpa.arizona.edu OOHAY: Meta Spider, News Spider, Cancer Spider

  40. For project information and free download: http://ai.bpa.arizona.edu Research Status OOHAY: CI Spider, Meta Spider, Med Spider 3. Noun Phrases are extracted from the web ages and user can selected preferred phrases for further summarization. 4. SOM is generated based on the phrases selected. Steps 3 and 4 can be done in iterations to refine the results.

  41. Lessons Learned: Meta spidering is useful for information consolidation Noun phrasing is useful for topic classification (dynamic folders) SOM usefulness is suspect for small collections Knowledge workers like personalization, client searching, and collaborative information sharing Corporate users need multiple sources and dynamic search help Techniques applicable to marketing and competitive analyses

  42. CRM Data Analysis (5K):Call center Q/A, noun phrasing, dynamic categorization, problem analysis, agent assistance

More Related