Text Mining at IBM & Microsoft

misae

Presentation Transcript


  1. Text Mining at IBM & Microsoft

  2. Text Mining at IBM

  3. Creating a natural language call router (http://www-128.ibm.com/developerworks/library/wi-natural/)
     • A call router simply asks "What can I do for you?"
     • It lets callers state their problem in plain language
     • ... and be routed to the proper destination quickly and reliably

  4. How call routing works
     • Two statistical models need to be created:
       • Statistical Language Model (SLM): performs speech recognition based on what your callers are likely to say
       • Action Classifier (AC) model: takes the spoken request obtained by the speech engine and predicts the correct action to take
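
The slide does not say how the SLM is built. As a rough illustration of what "a statistical model of what callers are likely to say" means, here is a minimal bigram language model with add-one smoothing. The sample utterances and the names `train_bigram_model` and `log_prob` are hypothetical. A production speech-recognition language model is much larger and is used inside the recognizer, not as a standalone scorer like this one.

```python
import math
from collections import Counter

# Hypothetical sample utterances (invented for illustration); in practice these
# would come from the data collection questionnaire.
UTTERANCES = [
    "i want to talk about my warranty",
    "i want to make a complaint",
    "my car broke down",
    "i need roadside assistance",
]


def train_bigram_model(sentences):
    """Count bigrams and their histories, padding each sentence with <s> and </s>."""
    histories = Counter()
    bigrams = Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        histories.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return histories, bigrams, vocab


def log_prob(sentence, model):
    """Return the add-one (Laplace) smoothed log probability of a sentence."""
    histories, bigrams, vocab = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (histories[prev] + len(vocab)))
    return total


MODEL = train_bigram_model(UTTERANCES)
# A word sequence callers actually produce should score higher than a scrambled one.
print(log_prob("i want to make a complaint", MODEL))
print(log_prob("complaint a make to want i", MODEL))
```

Smoothing keeps unseen word pairs from getting zero probability, which matters because callers rarely repeat the training phrases word for word.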

  5. Steps
     • Collect the initial data that will be used to train the two statistical models
     • Define the question that will be used to ask the callers, e.g., "How may I help you?"
     • Categorize the organization in the way that best routes callers' queries
     • Use this list to construct a data collection questionnaire
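
The last step, turning the list of routing categories into a questionnaire, can be sketched as a small template function. The greeting and prompts are adapted from the ACME Motors example in Listing 1; the function `build_questionnaire` and the destination keys are hypothetical and are not part of the IBM toolkit.

```python
# Greeting and scenario prompts adapted from Listing 1 (ACME Motors example).
GREETING = (
    "Thank you for calling ACME Motors. Please say what you'd like "
    "and I will direct your call. How may I help you?"
)
# One prompt per routing destination; the dictionary keys are hypothetical labels.
PROMPTS = {
    "warranty": "Think of a few ways to say that you want to be transferred "
                "to the warranty department:",
    "complaint": "Now imagine you want to talk to the complaint department. "
                 "Type some of the things you might imagine saying:",
    "roadside": "Your car has broken down and you need to speak with a roadside "
                "assistance specialist. What kinds of things might you say?",
}


def build_questionnaire(greeting, prompts, blanks_per_prompt=3):
    """Assemble a plain-text questionnaire with answer blanks after each prompt."""
    lines = [
        "Thank you for helping us build a call routing system.",
        "You call the system and are greeted with the following statement:",
        greeting,
        "",
    ]
    for prompt in prompts.values():
        lines.append(prompt)
        lines.extend("  * " + "_" * 40 for _ in range(blanks_per_prompt))
        lines.append("")
    return "\n".join(lines)


print(build_questionnaire(GREETING, PROMPTS))
```

Keying each prompt by its destination means every answer collected later can be labeled with the destination whose prompt it answered, which is exactly the labeled data an action classifier needs.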

  6. Listing 1. A sample data collection questionnaire
     Thank you for helping us build a call routing system. Please respond with the real things you would say out loud to a telephone system according to the following scenario:
     You call the system and are greeted with the following statement: "Thank you for calling ACME Motors. Please say what you'd like and I will direct your call. How may I help you?"
     Think of a few ways to say that you want to be transferred to the warranty department:
       • ________________________________________
       • ________________________________________
       • ________________________________________
     Now imagine you want to talk to the complaint department. Type some of the things you might imagine saying:
       • ________________________________________
       • ________________________________________
       • ________________________________________
     Your car has broken down and you need to speak with a roadside assistance specialist. What kinds of things might you say?
       • ________________________________________
       • ________________________________________
       • ________________________________________
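
The slides do not say which algorithm the Action Classifier uses. As a stand-in, the sketch below trains a multinomial naive Bayes classifier on questionnaire-style answers, each labeled with the destination whose prompt it answered. The training sentences and the class `NaiveBayesRouter` are invented for illustration; this is not the WebSphere Voice Toolkit's AC model.

```python
import math
from collections import Counter, defaultdict

# Hypothetical questionnaire answers, labeled by the scenario they were collected for.
TRAINING = [
    ("i want to check my warranty coverage", "warranty"),
    ("is this repair covered under warranty", "warranty"),
    ("i want to file a complaint", "complaint"),
    ("i am unhappy with the service and want to complain", "complaint"),
    ("my car broke down on the highway", "roadside"),
    ("i need a tow truck", "roadside"),
]


class NaiveBayesRouter:
    """Multinomial naive Bayes over bag-of-words features, with add-one smoothing."""

    def __init__(self, examples):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.class_counts = Counter()            # label -> number of examples
        self.vocab = set()
        for text, label in examples:
            tokens = text.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total = sum(self.class_counts.values())

    def score(self, text, label):
        """Log prior plus smoothed log likelihood of the words given the label."""
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        s = math.log(self.class_counts[label] / self.total)
        for tok in text.lower().split():
            s += math.log((counts[tok] + 1) / denom)
        return s

    def route(self, text):
        """Return the destination with the highest score for a transcribed request."""
        return max(self.class_counts, key=lambda label: self.score(text, label))


ROUTER = NaiveBayesRouter(TRAINING)
print(ROUTER.route("my car broke down"))
print(ROUTER.route("i want to complain about the service"))
```

In the architecture on the slides, the input to `route` would be the text produced by the speech engine, so recognition errors from the SLM feed directly into the classifier.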

  7. System environment
     • Rational® Web Developer (RWD) 6.0 or Rational Application Developer (RAD) 6.0
     • DB2® 8.0 or greater
     • IBM® WebSphere® Voice Toolkit Preview with the Natural Language Understanding (NLU) Feature installed

  8. Resources
     • Download the IBM WebSphere Voice Toolkit from AlphaWorks to get started.
     • Check out the WebSphere Voice Zone for additional information about IBM WebSphere Voice products.
     • Learn more about using the Eclipse platform at the Eclipse Web site.
     • Get more ideas and expertise on wireless technology in the developerWorks Wireless technology zone.

  9. IBM Business Intelligence (BI)
     Text mining solutions from IBM (http://www-306.ibm.com/solutions/businessintelligence/textmining/index.htm)
     “Eighty percent of the world's electronic information is stored as text, not data”:
       • magazine and newspaper articles
       • text on the Internet
       • customer service records
       • market surveys
       • technical papers, and more
     Businesses need to process both numeric and textual data.

  10. Business intelligence (BI)
     • BI transforms business data into conclusive, fact-based, and actionable information
     • To gain and maintain competitive advantage, BI allows companies to:
       • spot customer trends
       • create customer loyalty
       • enhance supplier relationships
       • reduce financial risk
       • uncover new sales opportunities
     http://www-306.ibm.com/solutions/businessintelligence/index.html

  11. Real-world information
     • Company experience with best business practices and least-productive practices
     • Customer complaints and preferences culled from call center transcripts
     • Insights from surveys, studies, telephone calls, and correspondence
     • Success factors in effective direct marketing and sales call techniques
     • Reaction to published competitor claims and special marketing offers

  12. Text Knowledge Miner (TKM) from IBM (http://www-306.ibm.com/solutions/businessintelligence/textmining/tkminer.htm)
     • Based on IBM's Intelligent Miner for Text toolkit
     • Extracts key facts and relationships from large collections of text-based documents
     • Excels at analyzing structured, complex documents, e.g.:
       • scientific publications
       • patents
       • newswires
       • trade press reports
       • newspaper articles
     • Analyzes competitive intelligence:
       • A custom Web "crawler" searches selected Web sites for the most recent, relevant information, and then updates a custom database with the search results
       • For example, it searches competitors' Web sites for the latest rates, terms, and customer incentive programs
       • This intelligence is then routed to marketing teams and company executives so they can act on it quickly

  13. IBM's Customer Relationship Intelligence (CRI) (http://www-306.ibm.com/solutions/businessintelligence/textmining/cri.htm)
     • Use those processes to better understand customer requirements and build lasting relationships.
     • CRI's text mining algorithms allow companies to condense weeks of manual labor into hours.
     • Extract knowledge and insight from the mountains of customer-related information.

  14. IBM's Customer Relationship Intelligence (CRI) (http://www-306.ibm.com/solutions/businessintelligence/textmining/cri.htm)
     • Use IBM's voice recognition technology to convert customer calls into text
     • The text can then be mined for specific examples of effective sales techniques and positive customer reaction

  15. A text-mining system for knowledge discovery from biomedical documents (http://researchweb.watson.ibm.com/journal/sj/433/uramoto.html)
     • Make use of the vast amount of domain-specific textual information available
       • MEDLINE is a database of over 11 million citations (abstracts) of biomedical articles
     • Uses of text-mining systems in the life sciences:
       • extracting information on biomedical concepts such as genes, proteins, and diseases from text
       • Entity extraction: the recognition of gene, protein, and chemical names in biomedical text
       • Relation extraction: the extraction of relationships among these entities
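
The slide does not describe how MedTAKMI recognizes entity names. One simple approach, shown here as an assumption rather than as MedTAKMI's method, is dictionary lookup with longest match first, so that a multi-word name such as "tumor necrosis factor" wins over any shorter entry. The dictionary contents and `extract_entities` are hypothetical; a real system would draw on curated biomedical vocabularies.

```python
import re

# Hypothetical mini-dictionary mapping lowercase token tuples to entity types.
DICTIONARY = {
    ("p53",): "gene/protein",
    ("tumor", "necrosis", "factor"): "gene/protein",
    ("breast", "cancer"): "disease",
    ("aspirin",): "chemical",
}
MAX_LEN = max(len(key) for key in DICTIONARY)


def extract_entities(text):
    """Return (surface form, type) pairs using greedy longest-match lookup."""
    tokens = re.findall(r"[A-Za-z0-9-]+", text)
    lowered = [t.lower() for t in tokens]
    entities = []
    i = 0
    while i < len(tokens):
        # Try the longest possible span first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in DICTIONARY:
                entities.append((" ".join(tokens[i:i + n]), DICTIONARY[key]))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return entities


print(extract_entities("Mutations in p53 are frequent in breast cancer."))
```

Pure dictionary lookup misses spelling variants and new names, which is one reason entity extraction is treated as a research problem in its own right.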

  16. Features of the MedTAKMI mining system
     • Information extraction process
     • Relation extraction

  17. Relation extraction
     • The TAKMI system for CRM can extract “subject ... verb” or “verb ... object” relationships in a sentence.

  18. Relation extraction
     Table 2. Examples of ternary relationships extracted from MEDLINE
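
The slides do not show how TAKMI finds these relationships. Purely to illustrate what a ternary (subject, verb, object) relationship looks like, the sketch below uses a deliberately naive surface heuristic: for each known verb, pair the nearest known entity before it with the nearest one after it. It is not TAKMI's method, and the entity and verb lists are hypothetical.

```python
import re

# Hypothetical lexicons (lowercase); a real system would use entity extraction output.
VERBS = {"activates", "inhibits", "binds", "induces"}
ENTITIES = {"p53", "mdm2", "tnf", "nf-kb", "aspirin", "cox-2"}


def extract_triples(sentence):
    """Return (subject, verb, object) triples found by a nearest-entity heuristic."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z0-9-]+", sentence)]
    triples = []
    for i, tok in enumerate(tokens):
        if tok not in VERBS:
            continue
        subjects = [t for t in tokens[:i] if t in ENTITIES]
        objects = [t for t in tokens[i + 1:] if t in ENTITIES]
        if subjects and objects:
            # Nearest entity on each side of the verb.
            triples.append((subjects[-1], tok, objects[0]))
    return triples


print(extract_triples("Aspirin inhibits COX-2 and TNF activates NF-kB."))
```

Word-order heuristics like this break on passives, negation, and long-distance dependencies, which is why systems in this area generally rely on deeper linguistic analysis.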

  19. MedTAKMI - Mining functions for large document collections
     • Keyword-based and full-text searching
     • Hierarchical category viewer
     • Chronological viewer
     • Two-dimensional viewer (term association)
     • Trend analysis viewer
     • Other analytical tools
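
Two of the listed functions, term association and chronological trend analysis, can be approximated with simple counting once each document has been reduced to a publication year and a set of extracted terms. The sketch below assumes that representation; the sample documents and the functions `term_association` and `term_trend` are hypothetical and only indicate the kind of counts such viewers display.

```python
from collections import Counter
from itertools import combinations

# Hypothetical documents: (publication year, set of extracted terms).
DOCS = [
    (2001, {"p53", "apoptosis", "breast cancer"}),
    (2002, {"p53", "mdm2"}),
    (2002, {"p53", "apoptosis"}),
    (2003, {"mdm2", "breast cancer"}),
]


def term_association(docs):
    """Count how many documents contain each unordered pair of terms."""
    pairs = Counter()
    for _, terms in docs:
        pairs.update(combinations(sorted(terms), 2))
    return pairs


def term_trend(docs, term):
    """Return {year: number of documents mentioning term}, sorted by year."""
    return dict(sorted(Counter(year for year, terms in docs if term in terms).items()))


print(term_association(DOCS).most_common(3))
print(term_trend(DOCS, "p53"))
```

Raw co-occurrence counts favor frequent terms; a real association viewer would typically normalize them, for example against each term's overall frequency.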

  20. Text Mining Search and Navigation Research (TMSN) at Microsoft

  21. Natural Language Computing (NLC) Group
     • multi-language text analysis
     • machine translation
     • cross-language information retrieval
     • question answering

  22. Natural Language Processing (NLP)
     • Design and build software that will analyze, understand, and generate the languages that humans use naturally

  23. Selected current projects (http://research.microsoft.com/nlp/)
     • Machine Translation: a data-driven approach in which all translation knowledge is learned from existing bilingual text.
     • Textual Entailment Recognition: captures major semantic inference needs across many natural language processing applications.
     • Paraphrase recognition and generation: crucial to creating applications that approximate our understanding of language. We have released a corpus of approximately 5000 sentence pairs that have been annotated by humans to indicate whether or not they can be considered paraphrases. Alignment phrase tables created using the data described in Quirk et al. (2004) and Dolan et al. (2004) are now also available for download.
     • MindNet: a knowledge representation project that uses our broad-coverage parser to build semantic networks from dictionaries, encyclopedias, and free text.
     • Japanese NLP: the project page summarizes areas of research we are working on in processing Japanese.
     Older projects
     • Amalgam: a novel system, developed in the Natural Language Processing group at Microsoft Research, that uses machine learning techniques for sentence realization during natural language generation. Sentence realization is the process of generating (realizing) a fluent sentence from a semantic representation.
     • IntelliShrink: a product that uses linguistic analysis to abbreviate an email message so that it can be displayed on a cell phone. IntelliShrink analyzes messages in English, French, German, or Spanish.

  24. Machine Translation
