
Text Analytics Workshop Applications

Text Analytics Workshop Applications. Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services http://www.kapsgroup.com. Agenda. Text Analytics Applications Integration with Search –Faceted Navigation Integration with ECM Metadata Auto-categorization





Presentation Transcript


  1. Text Analytics Workshop: Applications • Tom Reamy, Chief Knowledge Architect • KAPS Group, Knowledge Architecture Professional Services • http://www.kapsgroup.com

  2. Agenda • Text Analytics Applications • Integration with Search – Faceted Navigation • Integration with ECM • Metadata • Auto-categorization • Platform for Information Applications • Enterprise – internal and external • Commercial • Structure for Social

  3. Text Analytics and Search - Elements • Facet – orthogonal dimension of metadata • Entity / Noun Phrase – metadata value of a facet • Entity extraction – feeds facets, signature, ontologies • Taxonomy and categorization rules • Auto-categorization – aboutness, subject facets • People – tagging, evaluating tags, fine tune rules and taxonomy

  4. Essentials of Facets • Facets are not categories • Categories are what a document is about – limited number • Entities are contained within a document – any number • Facets are orthogonal (mutually exclusive) dimensions • An event is not a person is not a document is not a place. • Facets vary in units and in structure • Numerical range (price), Location – big to small • Alphabetical, Hierarchical – taxonomic • Facets are designed to be used in combination • Wine where color = red, price = excessive, location = California, and sentiment = snotty
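The wine query on this slide can be sketched as independent facet filters applied together. This is only an illustration: the `WINES` records, the `select` helper, and the price threshold standing in for "excessive" are all hypothetical and not taken from the presentation.

```python
# Hypothetical catalog; each key is one facet (an orthogonal dimension).
WINES = [
    {"name": "Chateau Hypothetical", "color": "red", "price": 250.0,
     "location": "California", "sentiment": "snotty"},
    {"name": "Table Red", "color": "red", "price": 12.0,
     "location": "California", "sentiment": "friendly"},
    {"name": "Coastal White", "color": "white", "price": 180.0,
     "location": "Oregon", "sentiment": "snotty"},
]


def select(items, **facets):
    """Keep items that satisfy every selected facet.

    A facet constraint is either an exact value or a predicate,
    which covers numerical-range facets such as price.
    """
    def matches(item):
        for facet, wanted in facets.items():
            value = item[facet]
            ok = wanted(value) if callable(wanted) else value == wanted
            if not ok:
                return False
        return True

    return [item for item in items if matches(item)]


# "Excessive" is modeled here as an assumed threshold of 100.
results = select(
    WINES,
    color="red",
    price=lambda p: p > 100,
    location="California",
    sentiment="snotty",
)
names = [wine["name"] for wine in results]
```

Because each facet is checked on its own, users can add or drop a constraint in any order without the others changing meaning.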

  5. Advantages of Faceted Navigation • More intuitive – easy to guess what is behind each door • Simplicity of internal organization • 20 questions – a model we know and use • Dynamic selection of categories • Allow multiple perspectives • Ability to Handle Compound Subjects • Systematic Advantages – fewer elements • 4 facets of 10 nodes each (40 nodes) cover the same combinations as a 10,000 node taxonomy • Flexible – can be combined with other navigation elements
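The "fewer elements" arithmetic can be checked directly: four facets of ten values each need 40 maintained nodes, yet they address 10 × 10 × 10 × 10 = 10,000 combinations. The facet and node names below are placeholders, not part of the presentation.

```python
from itertools import product

# Four placeholder facets with ten placeholder values each.
facets = {
    f"facet_{i}": [f"facet_{i}_value_{j}" for j in range(10)]
    for i in range(1, 5)
}

# Nodes someone actually has to define and maintain.
nodes_to_maintain = sum(len(values) for values in facets.values())

# Compound subjects reachable by picking one value per facet.
combinations = sum(1 for _ in product(*facets.values()))

print(nodes_to_maintain, combinations)
```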

  6. Developing Facets: Tools and Techniques – Software Tools – Entity Extraction • Dictionaries – variety of entities, coverage, specialty • Cost of update – service or in-house • 50+ predefined entity types • 800,000 people, 700,000 locations, 400,000 organizations • Rules • Capitalization, text – Mr., Inc. • Advanced – proximity and frequency of actions, associations • Need people to continually refine the rules • Entities and Categorization • Total number and pattern of entities = a type of aboutness of the document – Bar Code, Fingerprint • SAS – integration of entities (concepts) and categorization
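A minimal sketch of the two extraction approaches named above, dictionaries plus capitalization/text rules such as "Mr." and "Inc.". The tiny dictionaries, the regular expressions, and the sample sentence are assumptions for illustration; commercial tools ship far larger dictionaries and more advanced rules.

```python
import re

# Hypothetical dictionaries (real products ship hundreds of thousands of entries).
DICTIONARIES = {
    "organization": {"KAPS Group"},
    "location": {"California"},
}

# Text rules: an honorific followed by capitalized word(s) suggests a person;
# capitalized words followed by a company suffix suggest an organization.
PERSON_RULE = re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)?)")
ORG_RULE = re.compile(r"\b((?:[A-Z][A-Za-z]+\s+)+(?:Inc|Corp|Ltd)\.)")


def extract_entities(text):
    """Return {entity_type: sorted list of entity strings found in text}."""
    found = {"person": set(), "organization": set(), "location": set()}

    # Dictionary lookup: exact phrase match on word boundaries.
    for entity_type, terms in DICTIONARIES.items():
        for term in terms:
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                found[entity_type].add(term)

    # Rule-based matches.
    found["person"].update(PERSON_RULE.findall(text))
    found["organization"].update(ORG_RULE.findall(text))

    return {entity_type: sorted(values) for entity_type, values in found.items()}


sample = "Mr. Reamy of KAPS Group met Acme Widgets Inc. in California."
entities = extract_entities(sample)
```

Rules like these over-match and under-match in practice, which is why the slide notes that people are needed to keep refining them.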

  7. Three Environments • E-Commerce • Catalogs, small uniform collections of entities • Uniform behavior – buy this • Enterprise • More content, more types of content • Enterprise Tools – Search, ECM • Publishing Process – tagging, metadata standards • Internet • Wildly different amount and type of content, no taggers • General Purpose – Flickr, Yahoo • Vertical Portal – selected content, no taggers

  8. Three Environments: E-Commerce

  9. Three Environments: E-Commerce

  10. Enterprise Environment – When and how to add metadata • Enterprise Content – different world than eCommerce • More Content, more kinds, more unstructured • Not a catalog to start – less metadata and structured content • Complexity – not just content but variety of users and activities • Combination of human and automatic metadata – ECM • Software aided – suggestions, entities, ontologies • Enterprise – Question of Balance / strategy • More facets = more findability (up to a point) • Fewer facets = lower cost to tag documents • Issues • Not enough facets • Wrong set of facets – business not information • Ill-defined facets – too complex internal structure

  11. Facets and Taxonomies: Enterprise Environment – Taxonomy, 7 facets • Taxonomy of Subjects / Disciplines: • Science > Marine Science > Marine microbiology > Marine toxins • Facets: • Organization > Division > Group • Clients > Federal > EPA • Instruments > Environmental Testing > Ocean Analysis > Vehicle • Facilities > Division > Location > Building X • Methods > Social > Population Study • Materials > Compounds > Chemicals • Content Type – Knowledge Asset > Proposals

  12. External Environment – Text Mining, Vertical Portals • Internet Content • Scale – impacts design and technology – speed of indexing • Limited control – ranges from an association of publishers, to selection of content, to none • Major subtypes – different rules – metadata and results • Complex queries and alerts • Terrorism taxonomy + geography + people + organizations • Text Mining • General or specific content and facets and categories • Dedicated tools or component of Portal – internal or external • Vertical Portal • Relatively homogeneous content and users • General range of questions • More specific targets – the document, not a web site

  13. Internet Design • Subject Matter taxonomy – Business Topics • Finance > Currency > Exchange Rates • Facets • Location > Western World > United States • People – Alphabetical and/or Topical - Organization • Organization > Corporation > Car Manufacturing > Ford • Date – Absolute or range (1-1-01 to 1-1-08, last 30 days) • Publisher – Alphabetical and/or Topical – Organization • Content Type – list – newspapers, financial reports, etc.

  14. Integrated Facet Application: Design Issues – General • What is the right combination of elements? • Faceted navigation, metadata, browse, search, categorized search results, file plan • What is the right balance of elements? • Dominant dimension or equal facets • Browse topics and filter by facet • When to combine search, topics, and facets? • Search first and then filter by topics / facet • Browse/facet front end with a search box

  15. Integrated Facet Application: Design Issues – General • Homogeneity of Audience and Content • Model of the Domain – broad • How many facets do you need? • More facets and let users decide • Allow for customization – can’t define a single set • User Analysis – tasks, labeling, communities • Issue – the labels people use to describe their business differ from the labels they use to find information • Match the structure to domain and task • Users can understand different structures

  16. Automatic Facets – Special Issues • Scale requires more automated solutions • More sophisticated rules • Rules to find and populate existing metadata • Variety of types of existing metadata – Publisher, title, date • Multiple implementation Standards – Last Name, First / First Name, Last • Issue of disambiguation: • Same person, different name – Henry Ford, Mr. Ford, Henry X. Ford • Same word, different entity – Ford and Ford • Number of entities and thresholds per results set / document • Usability, audience needs • Relevance Ranking – number of entities, rank of facets
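One way to approach the name-variant problem above is to normalize each mention to a (first, last) pair and treat mentions as compatible when the last names agree and the first names do not conflict. This is a simplified sketch with hypothetical helper names; it does not resolve the "Ford and Ford" case (a person versus a company), which needs surrounding context or entity types.

```python
HONORIFICS = {"mr", "mrs", "ms", "dr"}


def normalize(name):
    """Return a lowercased (first, last) pair; first is '' when unknown.

    Handles both "Last, First" and "First Last" conventions and drops
    honorifics and middle initials.
    """
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    tokens = [token.strip(".").lower() for token in name.split()]
    tokens = [token for token in tokens if token and token not in HONORIFICS]
    if not tokens:
        return ("", "")
    first = tokens[0] if len(tokens) > 1 else ""
    return (first, tokens[-1])


def same_person(a, b):
    """True when two mentions could refer to the same person."""
    first_a, last_a = normalize(a)
    first_b, last_b = normalize(b)
    if last_a != last_b:
        return False
    # A missing first name ("Mr. Ford") is compatible with any first name.
    return not first_a or not first_b or first_a == first_b


variants = ["Henry Ford", "Mr. Ford", "Henry X. Ford", "Ford, Henry"]
keys = [normalize(v) for v in variants]
```

A production system would add thresholds and context checks on top of this, since "Mr. Ford" alone matches every Ford in the collection.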

  17. Putting it all together – Infrastructure Solution • Facets, Taxonomies, Software, People • Combine formal power with ability to support multiple user perspectives • Facet System – interdependent, map of domain • Entity extraction – feeds facets, signatures, ontologies • Taxonomy & Auto-categorization – aboutness, subject • People – tagging, evaluating tags, fine tune rules and taxonomy • The future is the combination of simple facets with rich taxonomies with complex semantics / ontologies

  18. Putting it all together – Infrastructure Solution • Integration with ECM • Central Team • Metadata – Create dictionaries of entities • Develop text analytics catalogs • Publishing Process • Software suggests entities, categorization • Author's task is simple – answer yes or no, not think up keywords • Enterprise Search • Integrate at metadata level – build advanced presentation and refine results • Integrate into relevance

  19. Text Analytics Platform – Multiple Applications • Platform for Information Applications • Content Aggregation • Duplicate Documents – save millions! • Text Mining – BI, CI – sentiment analysis • Social – Hybrid folksonomy / taxonomy / auto-metadata • Social – expertise, categorize tweets and blogs, reputation • Ontology – travel assistant – SIRI • Integrate with Applications • Text into data – predictive analytics • Use your Imagination!

  20. New Applications in Social Media: Behavior Prediction – Telecom Customer Service • Problem – distinguish customers likely to cancel from mere threats • Analyze customer support notes • General issues – creative spelling, second hand reports • Develop categorization rules • First – distinguish cancellation calls – not simple • Second – distinguish cancel what – one line or all • Third – distinguish real threats

  21. New Applications in Social Media: Behavior Prediction – Telecom Customer Service • Basic Rule • (START_20, (AND, • (DIST_7,"[cancel]", "[cancel-what-cust]"), • (NOT,(DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]"))))) • Examples: • customer called to say he will cancell his account if the does not stop receiving a call from the ad agency. • cci and is upset that he has the asl charge and wants it offor her is going to cancel his act • ask about the contract expiration date as she wanted to cxltehacct • Combine sophisticated rules with sentiment statistical training and Predictive Analytics and behavior monitoring
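The basic rule above can be approximated in a few lines to show how the operators combine. Everything here is an assumption made for illustration: START_20 is read as "look only at the first 20 words", DIST_n as "two concepts occur within n words of each other", and the term lists behind each bracketed concept are invented, since the slide does not give them. The sample notes are hypothetical as well.

```python
import re

# Invented term lists for the bracketed concepts in the rule.
CONCEPTS = {
    "cancel": {"cancel", "cancell", "cxl", "disconnect"},
    "cancel-what-cust": {"account", "acct", "act", "service"},
    "one-line": {"line"},
    "restore": {"restore", "reconnect"},
    "if": {"if", "unless"},
}


def positions(tokens, concept):
    """Word positions at which any term of the concept occurs."""
    return [i for i, token in enumerate(tokens) if token in CONCEPTS[concept]]


def near(first, second, max_distance):
    """True when some pair of positions is at most max_distance words apart."""
    return any(abs(i - j) <= max_distance for i in first for j in second)


def likely_cancellation(note):
    """Approximate the slide's rule under the assumptions stated above."""
    # START_20: restrict the match to the first 20 words of the note.
    tokens = re.findall(r"[a-z']+", note.lower())[:20]
    cancel = positions(tokens, "cancel")
    target = positions(tokens, "cancel-what-cust")
    excluded = [
        i
        for concept in ("one-line", "restore", "if")
        for i in positions(tokens, concept)
    ]
    # AND(DIST_7(cancel, target), NOT(DIST_10(cancel, OR(excluded concepts))))
    return near(cancel, target, 7) and not near(cancel, excluded, 10)


notes = [
    "customer wants to cancel his account today",
    "customer will cancel account if billing is not fixed",
    "please cancel one line on the account",
]
flags = [likely_cancellation(note) for note in notes]
```

Exact term matching like this misses run-together spellings such as "cxltehacct", which is one reason the slide pairs rules with statistical training.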

  22. New Applications: Wisdom of Crowds – Crowd Sourcing Technical Support • Example – Android User Forum • Develop a taxonomy of products, features, problem areas • Develop Categorization Rules: • “I use the SDK method and it isn't to bad a all. I'll get some pics up later, I am still trying to get the time to update from fresh 1.0 to 1.1.” • Find product & feature – forum structure • Find problem areas in response, nearby text for solution • Automatic – simply expose lists of “solutions” • Search Based application • Human mediated – experts scan and clean up solutions

  23. New Directions in Social Media: Text Analytics, Text Mining, and Predictive Analytics • Two Systems of the Brain • Fast, System 1, Immediate patterns (TM) • Slow, System 2, Conceptual, reasoning (TA) • Text Analytics – pre-processing for TM • Discover additional structure in unstructured text • Behavior Prediction – adding depth in individual documents • New variables for Predictive Analytics, Social Media Analytics • New dimensions – 90% of information • Text Mining for TA – Semi-automated taxonomy development • Bottom up – terms in documents – frequency, date, clustering • Improve speed and quality – semi-automatic

  24. Questions? • Tom Reamy • tomr@kapsgroup.com • KAPS Group, Knowledge Architecture Professional Services • http://www.kapsgroup.com
