Text Mining at IBM & Microsoft
Text Mining at IBM
Creating a natural language call router http://www-128.ibm.com/developerworks/library/wi-natural/ • A natural language call router simply asks "What can I do for you?" • Callers state their problem in plain language • and are routed to the proper destination quickly and reliably
How call routing works • Two statistical models need to be created: • Statistical Language Model (SLM): performs speech recognition, based on what your callers are likely to say • Action Classifier (AC) model: takes the caller's request as recognized by the speech engine and predicts the correct action to take
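To make the SLM's role concrete, here is a toy bigram language model with add-one (Laplace) smoothing, trained on a few hypothetical caller utterances. It is a minimal sketch, not the Voice Toolkit's SLM format; the training sentences and the `<s>`/`</s>` boundary markers are assumptions. A recognizer can use scores like these to prefer word sequences that callers are likely to say.

```python
import math
from collections import Counter


class BigramLM:
    """Toy bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in sentences:
            # Pad each utterance with assumed start/end markers.
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    def prob(self, prev, word):
        # P(word | prev) = (count(prev, word) + 1) / (count(prev) + V)
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + self.vocab_size)

    def log_prob(self, sentence):
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        return sum(math.log(self.prob(p, w)) for p, w in zip(tokens, tokens[1:]))


# Hypothetical training utterances.
lm = BigramLM([
    "i need roadside assistance",
    "my car broke down",
    "i need the warranty department",
])
# A natural word order should score higher than a scrambled one.
print(lm.log_prob("i need roadside assistance") > lm.log_prob("assistance roadside need i"))  # True
```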
Steps • Collect the initial data that will be used to train the two statistical models • Define the question the system will ask callers (e.g., "How may I help you?") • Categorize the organization in the way that best routes callers' queries, producing a list of destinations • Use this list to construct a data collection questionnaire
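The last step, turning the list of destinations into a questionnaire, is mechanical enough to script. The sketch below assumes a hypothetical function name, a fixed greeting, and one prompt per destination; it produces text in the style of Listing 1, not any format the toolkit requires.

```python
def build_questionnaire(company, greeting, scenarios, blanks=3):
    """Build a plain-text data collection questionnaire (hypothetical helper).

    scenarios: one prompt per routing destination.
    """
    lines = [
        "Thank you for helping us build a call routing system.",
        "Please respond with the real things you would say out loud to a telephone system.",
        f'You call the system and are greeted with: "Thank you for calling {company}. {greeting}"',
    ]
    for prompt in scenarios:
        lines.append(prompt)
        # Leave a fixed number of blank answer lines per destination.
        lines.extend(["• " + "_" * 40] * blanks)
    return "\n".join(lines)


questionnaire = build_questionnaire(
    "ACME Motors",
    "Please say what you'd like and I will direct your call. How may I help you?",
    [
        "Think of a few ways to say that you want to be transferred to the warranty department:",
        "Now imagine you want to talk to the complaint department. Type some of the things you might say:",
    ],
)
print(questionnaire)
```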
Listing 1. A sample data collection questionnaire
Thank you for helping us build a call routing system. Please respond with the real things you would say out loud to a telephone system according to the following scenario:
You call the system and are greeted with the following statement: "Thank you for calling ACME Motors. Please say what you'd like and I will direct your call. How may I help you?"
Think of a few ways to say that you want to be transferred to the warranty department:
• __________________________________________________________
• __________________________________________________________
• __________________________________________________________
Now imagine you want to talk to the complaint department. Type some of the things you might imagine saying:
• __________________________________________________________
• __________________________________________________________
• __________________________________________________________
Your car is broken down and you need to speak with a roadside assistance specialist. What kinds of things might you say?
• __________________________________________________________
• __________________________________________________________
• __________________________________________________________
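Answers to a questionnaire like Listing 1 become labeled training data for the Action Classifier: each typed sentence is paired with the destination its scenario describes. As a stand-in for the toolkit's AC model, whose internals are not described here, the sketch below trains a multinomial naive Bayes classifier with add-one smoothing. The function names, example sentences, and destination labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """examples: list of (utterance, destination) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the destination with the highest naive Bayes log score."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n in class_counts.items():
        # log P(label) + sum of log P(word | label), add-one smoothed
        score = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical questionnaire answers, labeled by scenario.
training = [
    ("i have a question about my warranty", "warranty"),
    ("is my repair covered under warranty", "warranty"),
    ("i want to file a complaint", "complaints"),
    ("i am unhappy with the service i received", "complaints"),
    ("my car broke down on the highway", "roadside"),
    ("i need a tow truck", "roadside"),
]
model = train_naive_bayes(training)
print(classify(model, "my car broke down and i need a tow"))  # roadside
```

Naive Bayes is chosen here only because it is short and easy to inspect; any text classifier trained on the same (utterance, destination) pairs would fill the same role.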
System environment • Rational® Web Developer (RWD) 6.0 or Rational Application Developer (RAD) 6.0 • DB2® 8.0 or greater • IBM® WebSphere® Voice Toolkit Preview with the Natural Language Understanding (NLU) Feature installed
Resources • Download the IBM WebSphere Voice Toolkit from AlphaWorks to get started. • Check out the WebSphere Voice Zone for additional information about IBM WebSphere Voice products. • Learn more about using the Eclipse platform at the Eclipse Web site. • Get more ideas and expertise on wireless technology in the developerWorks Wireless technology zone.
IBM Business Intelligence (BI): Text mining solutions from IBM http://www-306.ibm.com/solutions/businessintelligence/textmining/index.htm “Eighty percent of the world's electronic information is stored as text, not data” • magazines and newspapers, • text on the Internet, • customer service records, • market surveys, • technical papers, and more • Businesses need to process both numeric and textual data
Business intelligence (BI) • BI transforms business data into conclusive, fact-based and actionable information • BI allows companies to • spot customer trends, • create customer loyalty, • enhance supplier relationships, • reduce financial risk, • and uncover new sales opportunities to gain and maintain competitive advantage. http://www-306.ibm.com/solutions/businessintelligence/index.html
Real-world information • Company experience with best business practices and least-productive practices; • Customer complaints and preferences culled from call center transcripts; • Insights from surveys, studies, telephone calls and correspondence; • Success factors in effective direct marketing and sales call techniques; • Reaction to published competitor claims and special marketing offers.
Text Knowledge Miner (TKM) from IBM http://www-306.ibm.com/solutions/businessintelligence/textmining/tkminer.htm • Based on IBM's Intelligent Miner for Text toolkit • Extracts key facts and relationships from large collections of text-based documents • Excels at analyzing structured, complex documents, e.g., • scientific publications, • patents, • newswires, • trade press reports, and • newspaper articles • Supports competitive intelligence analysis • A custom Web "crawler" searches selected Web sites for the most recent, relevant information, and then updates a custom database with the search results • For example, it can search competitors' Web sites for the latest rates, terms, and customer incentive programs • This intelligence is then routed to marketing teams and company executives so they can act on it quickly
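The crawl-and-update loop described for competitive intelligence can be sketched in a few lines. This is a hypothetical illustration, not TKM's crawler: the page-fetching function is passed in so the example runs offline, the "rate" regular expression is an assumption, and results go into an in-memory SQLite table.

```python
import re
import sqlite3

# Assumed pattern: a percentage shortly after the word "rate" or "APR".
RATE_PATTERN = re.compile(r"(?:rate|APR)\D{0,20}?(\d+(?:\.\d+)?)\s*%", re.IGNORECASE)


def update_rates(conn, urls, fetch):
    """Fetch each selected site, extract the first advertised rate, and upsert it."""
    conn.execute("CREATE TABLE IF NOT EXISTS rates (url TEXT PRIMARY KEY, rate REAL)")
    for url in urls:
        text = fetch(url)  # caller decides how pages are retrieved (network, cache, ...)
        match = RATE_PATTERN.search(text)
        if match:
            conn.execute(
                "INSERT OR REPLACE INTO rates (url, rate) VALUES (?, ?)",
                (url, float(match.group(1))),
            )
    conn.commit()


# Offline stand-in for competitor pages (hypothetical URLs and content).
pages = {
    "http://competitor-a.example/loans": "New car loans at an APR of 3.9% this month!",
    "http://competitor-b.example/offers": "Our best rate: 4.25 % on approved credit.",
}
conn = sqlite3.connect(":memory:")
update_rates(conn, pages.keys(), lambda url: pages[url])
print(conn.execute("SELECT url, rate FROM rates ORDER BY rate").fetchall())
```

Re-running `update_rates` later replaces each site's row, so the table always holds the most recent value seen per site.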
IBM's Customer Relationship Intelligence (CRI) http://www-306.ibm.com/solutions/businessintelligence/textmining/cri.htm • Use text mining to better understand customer requirements and build lasting relationships • CRI's text mining algorithms allow companies to condense weeks of manual labor into hours • Extract knowledge and insight from the mountains of customer-related information
IBM's Customer Relationship Intelligence (CRI) http://www-306.ibm.com/solutions/businessintelligence/textmining/cri.htm • Use IBM's voice recognition technology to convert customer calls into text • The text can then be mined for specific examples of effective sales techniques and positive customer reactions
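Once calls are transcribed, a first pass at finding "positive customer reaction" can be as simple as phrase matching. The sketch below is a hypothetical, lexicon-based stand-in for CRI's algorithms: it assumes transcripts are lists of (speaker, utterance) turns, uses a small hand-made phrase list, and reports agent utterances that were immediately followed by a positive customer turn.

```python
# Hypothetical lexicon of phrases signalling a positive customer reaction.
POSITIVE_PHRASES = ("that's great", "sounds good", "thank you so much", "perfect")


def is_positive(utterance):
    text = utterance.lower()
    return any(phrase in text for phrase in POSITIVE_PHRASES)


def effective_agent_lines(transcript):
    """Return agent utterances directly followed by a positive customer turn.

    transcript: list of (speaker, utterance) pairs; speaker is "agent" or "customer".
    """
    hits = []
    for (speaker, line), (next_speaker, reply) in zip(transcript, transcript[1:]):
        if speaker == "agent" and next_speaker == "customer" and is_positive(reply):
            hits.append(line)
    return hits


# Hypothetical transcribed call.
call = [
    ("customer", "My loan payment is too high."),
    ("agent", "We can extend the term to lower your monthly payment."),
    ("customer", "That's great, let's do it."),
    ("agent", "Is there anything else?"),
    ("customer", "No, that's all."),
]
print(effective_agent_lines(call))
```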
A text-mining system for knowledge discovery from biomedical documents http://researchweb.watson.ibm.com/journal/sj/433/uramoto.html • Make use of the vast amount of domain-specific textual information available • MEDLINE is a database of over 11 million citations (abstracts) of biomedical articles • It is used by text-mining systems in the life sciences for • extracting information on biomedical concepts such as genes, proteins, and diseases from text • Entity extraction: the recognition of gene, protein, and chemical names in biomedical text • Relation extraction: the extraction of relationships among these entities
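Entity extraction is often bootstrapped from a dictionary of known names. The sketch below shows greedy longest-match dictionary lookup over tokens; the tiny dictionary is hypothetical, and production systems use far larger curated resources and handle spelling variants, which this sketch does not.

```python
import re

# Hypothetical entity dictionary: lower-cased name -> entity type.
ENTITY_DICT = {
    "p53": "protein",
    "tumor necrosis factor": "protein",
    "brca1": "gene",
    "breast cancer": "disease",
    "apoptosis": "process",
}
MAX_WORDS = max(len(name.split()) for name in ENTITY_DICT)


def extract_entities(text):
    """Greedy longest-match dictionary lookup; returns (surface form, type) pairs."""
    tokens = re.findall(r"[A-Za-z0-9-]+", text)
    found, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span starting at token i first, then shrink it.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate.lower() in ENTITY_DICT:
                found.append((candidate, ENTITY_DICT[candidate.lower()]))
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return found


print(extract_entities("Mutations in BRCA1 increase breast cancer risk, while p53 induces apoptosis."))
```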
Features of the MedTAKMI mining system • Information extraction process • Relation extraction
Relation extraction • The TAKMI system for CRM is able to extract “subject ... verb” or “verb ... object” relationships in a sentence.
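A ternary relationship links two entities through a verb. As a rough illustration only, the sketch below scans a sentence already tagged with entity types and reports (subject, verb, object) triples where a verb from a hypothetical list sits between two entities. It uses nothing but word order; a real system would rely on syntactic analysis, since with coordinated clauses the nearest-entity heuristic can pick the wrong subject.

```python
# Hypothetical list of relation verbs (inflected forms listed explicitly).
RELATION_VERBS = {"activates", "inhibits", "binds", "induces", "regulates"}


def extract_triples(tagged_tokens):
    """Extract (subject, verb, object) triples from a tagged token list.

    tagged_tokens: list of (token, tag) pairs where tag is an entity type
    (e.g. "protein") or None for ordinary words.
    """
    triples = []
    for i, (token, _tag) in enumerate(tagged_tokens):
        if token.lower() not in RELATION_VERBS:
            continue
        # Nearest entity to the left of the verb is taken as the subject.
        subject = next((t for t, g in reversed(tagged_tokens[:i]) if g), None)
        # Nearest entity to the right of the verb is taken as the object.
        obj = next((t for t, g in tagged_tokens[i + 1:] if g), None)
        if subject and obj:
            triples.append((subject, token.lower(), obj))
    return triples


# Hypothetical pre-tagged sentence: "TNF rapidly activates NF-kB in macrophages".
sentence = [("TNF", "protein"), ("rapidly", None), ("activates", None),
            ("NF-kB", "protein"), ("in", None), ("macrophages", None)]
print(extract_triples(sentence))
```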
Relation extraction • Table 2. Examples of ternary relationships extracted from MEDLINE
MedTAKMI - Mining functions for large document collections • Keyword-based and full-text searching • Hierarchical category viewer • Chronological viewer • Two-dimensional viewer (term-association) • Trend analysis viewer • Other analytical tools
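Two of these views rest on simple counts: a trend (chronological) view tracks how many documents mention a term each year, and a two-dimensional term-association view scores how strongly two terms co-occur. The sketch below assumes documents are (year, set-of-terms) pairs and uses lift, P(a, b) / (P(a) P(b)), as the association measure; that choice of measure is an assumption, not necessarily MedTAKMI's.

```python
from collections import Counter


def yearly_counts(docs, term):
    """Number of documents per year that contain `term`, sorted by year."""
    return dict(sorted(Counter(year for year, terms in docs if term in terms).items()))


def lift(docs, a, b):
    """P(a, b) / (P(a) * P(b)); 1.0 means the terms occur independently."""
    n = len(docs)
    n_a = sum(a in terms for _, terms in docs)
    n_b = sum(b in terms for _, terms in docs)
    n_ab = sum(a in terms and b in terms for _, terms in docs)
    if n_a == 0 or n_b == 0:
        return 0.0
    return n_ab * n / (n_a * n_b)


# Hypothetical document collection.
docs = [
    (2001, {"p53", "apoptosis"}),
    (2002, {"p53", "apoptosis", "tumor"}),
    (2002, {"brca1", "tumor"}),
    (2003, {"p53", "tumor"}),
]
print(yearly_counts(docs, "p53"))      # {2001: 1, 2002: 1, 2003: 1}
print(lift(docs, "p53", "apoptosis"))  # about 1.33
```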
Text Mining Search and Navigation Research (TMSN) at Microsoft
Natural Language Computing (NLC) Group • multi-language text analysis, • machine translation, • cross language information retrieval, • and question answering
Natural Language Processing (NLP) • design and build software that will analyze, understand, and generate languages that humans use naturally
Selected current projects http://research.microsoft.com/nlp/ • Machine Translation: a data-driven approach in which all translation knowledge is learned from existing bilingual text. • Textual Entailment Recognition: captures major semantic inference needs across many natural language processing applications. • Paraphrase recognition and generation: crucial to creating applications that approximate our understanding of language. We have released a corpus of approximately 5000 sentence pairs that have been annotated by humans to indicate whether or not they can be considered paraphrases. Alignment phrase tables created using the data described in Quirk et al. (2004) and Dolan et al. (2004) are now also available for download. • MindNet: a knowledge representation project that uses our broad-coverage parser to build semantic networks from dictionaries, encyclopedias, and free text. • Japanese NLP: the project page summarizes areas of research we are working on in processing Japanese.
Older projects • Amalgam: a novel system, developed in the Natural Language Processing group at Microsoft Research, that employs machine learning techniques for sentence realization during natural language generation. Sentence realization is the process of generating (realizing) a fluent sentence from a semantic representation. • IntelliShrink: a product that uses linguistic analysis to abbreviate an email message so that it can be displayed on a cell phone. IntelliShrink analyzes messages in English, French, German, or Spanish.
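"Data-driven" machine translation means translation probabilities are estimated from aligned sentence pairs rather than written by hand. As a classic textbook illustration (IBM Model 1 trained with expectation-maximization, not a description of Microsoft's system), the sketch below learns word translation probabilities t(f | e) from a toy, hypothetical French-English corpus. The usual NULL English word is omitted for brevity.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """IBM Model 1 EM: estimate t[(f, e)] = P(foreign word f | English word e).

    pairs: list of (foreign_sentence, english_sentence) strings.
    """
    corpus = [(f.split(), e.split()) for f, e in pairs]
    f_vocab = {w for f_words, _ in corpus for w in f_words}
    # Uniform initialization of all translation probabilities.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for f_words, e_words in corpus:
            for f in f_words:
                # E-step: share f's alignment across the English words in the pair.
                norm = sum(t[(f, e)] for e in e_words)
                for e in e_words:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize expected counts into probabilities per English word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


# Hypothetical parallel corpus (French, English).
pairs = [("la maison", "the house"), ("la fleur", "the flower"), ("maison bleue", "blue house")]
t = train_model1(pairs)
print(round(t[("maison", "house")], 2), round(t[("maison", "the")], 2))
```

Even though no sentence pair states that "maison" means "house", EM shifts probability toward that pairing because it is the only explanation consistent with both sentences containing "maison".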