
World class IT in a world-wide market



Presentation Transcript


  1. World class IT in a world-wide market

  2. Practical results with Emile Marten Trautwein Syllogic B.V.

  3. Road map • Introducing myself • Context: text mining tools • Results with Emile

  4. Introducing myself • Computer Science at UvA (1986 - 1991) • Theoretical computer science • Complexity of Categorial Unification Grammar • Dr Janssen • PhD Computer Science at UvA (1991 - 1995) • Theoretical computer science • Complexity of Unification Grammars • Dr v. Emde Boas, Dr Janssen, Dr Torenvliet • Syllogic B.V. (1995 - ...) • Research and development • Text mining

  5. Context • Term clustering • TextAnalyst - Microsystems Co. Ltd. • Intelligent Miner for Text - IBM

  6. TextAnalyst • Microsystems Co. Ltd. • Megaputer Intelligence Inc (distributor) • Version 2.0 • www.megaputer.com

  7. TextAnalyst - Features • Functionality includes • Hierarchical / Structured topics • Knowledge base formation • Semantic search • Abstracting • Languages • English • Russian

  8. TextAnalyst - Knowledge base

  9. TextAnalyst - Summarization

  10. Intelligent Miner for Text • IBM Corp. • Version 2.3 • December 1998 • www-4.ibm.com/software/data/iminer/fortext/

  11. IM4Text - Features • Functionality includes • Clustering • Categorization • Search • Summarization • WebCrawler • Languages • English

  12. IM4Text - Clustering (figure: cluster diagram with clusters labelled 0 and I to XII)

  13. IM4Text - Summarization

  14. Other tools • Verity Knowledge Organizer • Autonomy Knowledge Server • GrapeVine • TextWise's DR-LINK, CHESS and CINDOR • Data Junction's Cambio • DataSet • Synthema, Italy (IBM Technology Watch) • Semio Corp's SemioMap • Cartia's ThemeScape • Canis' cMap • Inxight's LinguistX and VizControls • Muscat's Empower

  15. Emile • Syllogic / University of Amsterdam • Version 3.1

  16. Emile - Features • Functionality includes • Grammar induction • Knowledge base construction • Compound term separation • Languages • Any

  17. Emile - Grammar induction
      Fragment of Phaistos disk:
        1 41 40 7.
        2 12 4 40 33.
        2 12 6 18 *.
        2 12 13 1.
        2 12 13 1 18.
        2 12 27 14 32 18 27.
        2 12 27 35 37 21.
        2 12 31 26.
        2 12 32 23 38.
        2 12 41 19 35.
        2 27 25 10 23 18.
        …
        16 14 18.
        16 23 18 43.
      Fragment of grammar:
        [0] --> [3] .
        [3] --> [16] [47]
        [14] --> 15 [40]
        [14] --> 2 12
        [16] --> 2 [57] 25 10 23
        [16] --> [14] 13 1
        [16] --> 16 14
        [40] --> 7
        [40] --> 29
        [47] --> 18
        [47] --> 24 40
        [57] --> 27
        [57] --> 29

  18. Emile - Incomplete data set
      (Dutch help-desk sentences of the form "I cannot read / write / open / send mail with <mail client>"; lezen = read, schrijven = write, openen = open, verzenden = send)
        Ik kan geen mail lezen met MS-Mail
        Ik kan geen mail schrijven met MS-Mail
        Ik kan geen mail openen met MS-Mail
        Ik kan geen mail verzenden met MS-Mail
        Ik kan geen mail lezen met MS-Outlook
        Ik kan geen mail schrijven met MS-Outlook
        Ik kan geen mail openen met MS-Outlook
        Ik kan geen mail verzenden met MS-Outlook
        Ik kan geen mail lezen met Mail
        Ik kan geen mail schrijven met Mail
        Ik kan geen mail openen met Mail
        Ik kan geen mail verzenden met Mail
        Ik kan geen mail lezen met Outlook
        Ik kan geen mail schrijven met Outlook
        Ik kan geen mail openen met Outlook
        Ik kan geen mail verzenden met Outlook
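The idea of grouping words by the sentence contexts they share can be illustrated with a small table of contexts and expressions. The following is a minimal sketch, not Emile's actual implementation: it assumes an expression is any contiguous word span and its context is the remainder of the sentence to its left and right. The function name `context_table` and the four-sentence `sample` (an arbitrary incomplete subset of the 16 sentences) are hypothetical choices made for illustration.

```python
from collections import defaultdict


def context_table(sentences):
    """Map each context (left, right) to the set of expressions seen in it.

    Assumption: an expression is a contiguous span of words, and its
    context is the pair of word sequences before and after that span.
    """
    table = defaultdict(set)
    for sentence in sentences:
        words = sentence.split()
        for i in range(len(words)):
            for j in range(i + 1, len(words) + 1):
                left = " ".join(words[:i])
                right = " ".join(words[j:])
                expression = " ".join(words[i:j])
                table[(left, right)].add(expression)
    return table


# Hypothetical incomplete sample: 4 of the 16 sentences on the slide.
sample = [
    "Ik kan geen mail lezen met MS-Mail",
    "Ik kan geen mail schrijven met MS-Mail",
    "Ik kan geen mail lezen met Outlook",
    "Ik kan geen mail openen met Outlook",
]

table = context_table(sample)

# Expressions that share a context are candidates for the same type.
verbs_in_context = sorted(table[("Ik kan geen mail", "met MS-Mail")])
clients_in_context = sorted(table[("Ik kan geen mail lezen met", "")])
print(verbs_in_context)    # ['lezen', 'schrijven']
print(clients_in_context)  # ['MS-Mail', 'Outlook']
```

Words that occur in the same context, such as "lezen" and "schrijven" here, are the kind of evidence a grammar induction tool can use to propose a shared type, even when not every combination appears in the data.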

  19. Emile - Variable settings • Default on 12 • context support: 30%, expression support: 30%, total support: 50% • Default on 8 • context support: 40%, expression support: 40%, total support: 60% • context support: 50%, expression support: 50%, total support: 70% • Generate data set • Generate complete language • Generate data set • Generate 15 out of 16 sentences • Generate complete language

  20. Emile - Induced grammar
        [0] --> [2] [18]
        [0] --> [31] [29]
        [0] --> [42] [15]
        [2] --> Ik kan geen mail [12] met
        [12] --> openen
        [12] --> verzenden
        [15] --> met [41]
        [15] --> met [18]
        [18] --> MS-Mail
        [18] --> MS-Outlook
        [27] --> verzenden
        [27] --> lezen
        [29] --> met [30]
        [30] --> MS-Outlook
        [30] --> Mail
        [31] --> Ik kan geen mail [27]
        [31] --> Ik kan [45]
        [39] --> lezen
        [39] --> schrijven
        [41] --> Mail
        [41] --> Outlook
        [42] --> Ik kan [45]
        [45] --> geen mail [39]
        [45] --> geen mail [12]
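To see what this induced grammar generates, its rules can be expanded exhaustively from the start symbol [0]. The sketch below is an illustration written for this transcript, not part of Emile: the helper names `parse_rules` and `expand` are hypothetical, and the expansion assumes the grammar is non-recursive, which holds for the rules on this slide.

```python
from itertools import product

RULES = """
[0] --> [2] [18]
[0] --> [31] [29]
[0] --> [42] [15]
[2] --> Ik kan geen mail [12] met
[12] --> openen
[12] --> verzenden
[15] --> met [41]
[15] --> met [18]
[18] --> MS-Mail
[18] --> MS-Outlook
[27] --> verzenden
[27] --> lezen
[29] --> met [30]
[30] --> MS-Outlook
[30] --> Mail
[31] --> Ik kan geen mail [27]
[31] --> Ik kan [45]
[39] --> lezen
[39] --> schrijven
[41] --> Mail
[41] --> Outlook
[42] --> Ik kan [45]
[45] --> geen mail [39]
[45] --> geen mail [12]
"""


def parse_rules(text):
    """Parse 'LHS --> RHS' lines into {lhs: [list of RHS token lists]}."""
    grammar = {}
    for line in text.strip().splitlines():
        lhs, rhs = line.split("-->")
        grammar.setdefault(lhs.strip(), []).append(rhs.split())
    return grammar


def expand(symbol, grammar):
    """Return all token tuples derivable from symbol.

    Any symbol without a rule is treated as a terminal word.
    Assumes the grammar has no recursive rules.
    """
    if symbol not in grammar:
        return {(symbol,)}
    results = set()
    for rhs in grammar[symbol]:
        parts = [expand(s, grammar) for s in rhs]
        for combo in product(*parts):
            results.add(tuple(tok for part in combo for tok in part))
    return results


grammar = parse_rules(RULES)
language = {" ".join(tokens) for tokens in expand("[0]", grammar)}

verbs = ["lezen", "schrijven", "openen", "verzenden"]
clients = ["MS-Mail", "MS-Outlook", "Mail", "Outlook"]
expected = {f"Ik kan geen mail {v} met {c}" for v in verbs for c in clients}

print(len(language), language == expected)  # 16 True
```

Several rules overlap (for example, [0] --> [42] [15] alone already covers every verb and mail client), so some sentences have more than one derivation; collecting the results in a set leaves the 16 distinct sentences of the complete language.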

  21. Emile - Knowledge base
        Dictionary Type [35]: K033, k033, K105, k33
        Dictionary Type [87]: Vrachtgeb, vrachtgeb, Vrachtgebouw, Vracht
        Dictionary Type [89]: CGOADTP6, Printqueue
        Dictionary Type [114]: is, Userid, Password
        Dictionary Type [138]: status, Error
        Dictionary Type [196]: scarlos, vrachtbrieven
        Dictionary Type [215]: G239, g239
        Dictionary Type [237]: enorm, ontzettend, super
        Dictionary Type [290]: pingen, benaderen

  22. Emile - Knowledge base
        [16] --> School of Medicine , University of Washington , Seattle 98195 , USA
        [16] --> University of Kitasato Hospital , Sagamihara , Kanagawa , Japan
        [16] --> Heinrich-Heine-University , Dusseldorf , Germany
        [16] --> School of Medicine , Chiba University
        [5] --> Department of Urology , [16]
        [94] --> Chinese
        [94] --> Japanese
        [94] --> Polish
        [101] --> 32 : Cancer Res 1996 Oct
        [101] --> 35 : Genomics 1996 Aug
        [101] --> 44 : Cancer Res 1995 Dec
        [101] --> 50 : Cancer Res 1995 Feb
        [101] --> 54 : Eur J Biochem 1994 Sep
        [101] --> 58 : Cancer Res 1994 Mar
        [105] --> identified in 13 cases ( 72
        [105] --> detected in 9 of 87 informative cases ( 10
        [105] --> observed in 5 ( 55
        [11] --> LOH was [105] %

  23. Emile on Biomed (1)

  24. Emile on Biomed (2)

  25. Emile on Biomed (3)

  26. Merits of Emile • Language independent • Clustering within sentences • Incremental learning • No training phase • Raw text input • Access to source code
