
Ternary Directed Acyclic Word Graphs (TDAWG)

Ternary Directed Acyclic Word Graphs (TDAWG). Satoru Miyamoto, Shunsuke Inenaga, Masayuki Takeda and Ayumi Shinohara. Presented by Peera Liewlom (The Last Algorithm Group). CIAA 2003, Eighth International Conference on Implementation and Application of Automata.


Presentation Transcript


  1. Ternary Directed Acyclic Word Graphs (TDAWG) • Satoru Miyamoto, Shunsuke Inenaga, Masayuki Takeda and Ayumi Shinohara • Presented by Peera Liewlom (The Last Algorithm Group)

  2. CIAA 2003 • Eighth International Conference on Implementation and Application of Automata • July 16-18, 2003, Santa Barbara, CA, USA • Topic / Committee / Community

  3. Why did I select this paper? • DAWG dates from 1985: not so long ago • Continuing development • cDAWG, ASDAWG, morphic DAWG, WDAWG, SDAWG, two-tree DAWG, DASG, CSDAWG etc. • TST: 1997–98, TDAWG: 2003 • DAWG: widely applied in bioinformatics, NLP, graph theory, string matching, automata etc. • Speed and space trends in huge data management • Topic for the Algorithm Group • Matches the topics of interest in this seminar group

  4. Content • DFA (used in string matching problems) • DAWG • Ternary Search Tree • Paper: TDAWG, Experiment & Result • Paper: Conclusion • Paper: Discussion

  5. DFA: Deterministic Finite Automata

  6. Formalities • Deterministic Finite Accepter (DFA): M = (Q, Σ, δ, q0, F) • Q : set of states • Σ : input alphabet • δ : transition function • q0 : initial state • F : set of final states

  7. Set of States

  8. Input Alphabet

  9. Initial State

  10. Set of Final States

  11. Transition Function

  12. Transition Function

  13. Another Example accept accept accept

  14. = { all substrings with prefix } accept

  15. = { all strings without substring }
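
The DFA components listed on the slides above (states, input alphabet, transition function, initial state, final states) can be sketched in a few lines of Python. This is only an illustration: the class name `DFA`, the state names, and the example language (strings over {a, b} that start with "ab") are assumptions chosen for the sketch, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    states: set      # Q: set of states
    alphabet: set    # Σ: input alphabet
    delta: dict      # δ: maps (state, symbol) -> next state
    start: str       # q0: initial state
    finals: set      # F: set of final (accepting) states

    def accepts(self, text: str) -> bool:
        """Run the automaton over text and report whether it ends in F."""
        state = self.start
        for symbol in text:
            if symbol not in self.alphabet:
                return False  # symbol outside Σ: reject
            state = self.delta[(state, symbol)]
        return state in self.finals


# Hypothetical example: all strings over {a, b} that start with "ab".
prefix_ab = DFA(
    states={"q0", "q1", "q2", "trap"},
    alphabet={"a", "b"},
    delta={
        ("q0", "a"): "q1", ("q0", "b"): "trap",
        ("q1", "a"): "trap", ("q1", "b"): "q2",
        ("q2", "a"): "q2", ("q2", "b"): "q2",
        ("trap", "a"): "trap", ("trap", "b"): "trap",
    },
    start="q0",
    finals={"q2"},
)

print(prefix_ab.accepts("abba"))  # True
print(prefix_ab.accepts("ba"))    # False
```

Here δ is stored as a dictionary for brevity; how δ is stored per state (table, linked list, or search tree) is exactly the design question the later slides address.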

  16. DAWG: Directed Acyclic Word Graph

  17. DAWG

  18. DAWG

  19. DAWG

  20. cDAWG

  21. TST: Ternary Search Tree

  22. TST History • Jon L. Bentley and Robert Sedgewick • Algorithms for Sorting and Searching Strings, Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), January 1997. • Ternary Search Trees, Dr. Dobb's Journal, April 1998. • Dictionary of Algorithms and Data Structures, National Institute of Standards and Technology, http://www.nist.gov/

  23. DST BST TST
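
A minimal sketch of a ternary search tree in the style described by Bentley and Sedgewick may help here: each node holds one character plus three links (smaller, equal, larger). The names `TSTNode`, `tst_insert` and `tst_contains` are hypothetical, and the sketch assumes non-empty words and does no balancing.

```python
class TSTNode:
    """One character plus three children: lower, equal (next char), higher."""
    __slots__ = ("ch", "lo", "eq", "hi", "is_end")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.is_end = False  # True if a stored word ends at this node


def tst_insert(node, word, i=0):
    """Insert word[i:] below node and return the (possibly new) node."""
    ch = word[i]
    if node is None:
        node = TSTNode(ch)
    if ch < node.ch:
        node.lo = tst_insert(node.lo, word, i)
    elif ch > node.ch:
        node.hi = tst_insert(node.hi, word, i)
    elif i + 1 < len(word):
        node.eq = tst_insert(node.eq, word, i + 1)
    else:
        node.is_end = True
    return node


def tst_contains(node, word):
    """Return True if word was inserted into the tree rooted at node."""
    if not word:
        return False
    i = 0
    while node is not None:
        ch = word[i]
        if ch < node.ch:
            node = node.lo
        elif ch > node.ch:
            node = node.hi
        elif i + 1 == len(word):
            return node.is_end
        else:
            i += 1
            node = node.eq
    return False


root = None
for w in ["cat", "cab", "car", "dog"]:
    root = tst_insert(root, w)

print(tst_contains(root, "cab"))  # True
print(tst_contains(root, "ca"))   # False
```

The `lo` and `hi` links form a binary search tree over the characters available at one position, while `eq` advances to the next character; this is the BST-for-transitions idea that the TDAWG slides build on.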

  24. TDAWG: Ternary Directed Acyclic Word Graph

  25. Introduction • DFA → how to implement the transitions of each state? (time and space efficiency) • TST "implants" a BST for the transitions • Good time • DAWG: smallest DFA for all suffixes • Good space • TDAWG • Proof: TDAWG vs. DAWG

  26. Hypothesis / Theorem (1/2) • Time = construction + search (usable online) • DFA function • Σ = alphabet (Chinese and Japanese: ~1000 characters) • State • Table → O(|p|) search time, where |p| is the length of pattern p • Tables use very large amounts of memory • Linked list → O(|Σ| × |p|) search time • If Σ is large, search time becomes a problem
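
The trade-off on this slide can be illustrated with a small sketch that contrasts the two ways of storing a state's outgoing transitions. Everything here is an assumption made for illustration: the function names, the 1000-character alphabet built from consecutive code points, and the use of a Python list of (label, target) pairs as a stand-in for a linked list.

```python
SIGMA = 1000
# Hypothetical alphabet of 1000 characters (consecutive code points).
alphabet = [chr(0x4E00 + i) for i in range(SIGMA)]
code_of = {c: i for i, c in enumerate(alphabet)}


def table_step(table, state, symbol):
    """Table transitions: one indexed access per pattern character."""
    return table[state][code_of[symbol]]


def list_step(edges, state, symbol):
    """List transitions: scan the edges, return (target, comparisons)."""
    comparisons = 0
    for label, target in edges[state]:
        comparisons += 1
        if label == symbol:
            return target, comparisons
    return None, comparisons


# Two states; state 0 moves to state 1 on every symbol, state 1 has no edges.
table = [[1] * SIGMA, [None] * SIGMA]           # |states| x |Σ| cells
edges = {0: [(c, 1) for c in alphabet], 1: []}  # one cell per real edge

print(table_step(table, 0, alphabet[-1]))  # 1
print(list_step(edges, 0, alphabet[-1]))   # (1, 1000)
print(sum(len(row) for row in table), sum(len(e) for e in edges.values()))  # 2000 1000
```

The table pays for every (state, symbol) pair whether or not an edge exists, which is where the memory cost comes from; the list stores only real edges but may scan up to |Σ| of them per pattern character.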

  27. Hypothesis / Theorem (2/2) • For TDAWG • Uses O(|S|) space • Uses O(log|Σ| × |p|) search time • Uses O(|Σ| × |S|²) construction time (Bentley & Sedgewick) • Uses O(|Σ| × |S|) construction time (this paper … adapted from Blumer's online DAWG construction) • Comparison: TDAWG vs. DAWG (table and linked list) • Space, search time, construction time

  28. TST  TDAWG

  29. Online DAWG Construction
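
As a rough illustration of online DAWG construction, the sketch below uses the commonly taught suffix-automaton formulation of the algorithm, processing the text one character at a time. It is not the code from the paper: transitions are plain Python dictionaries rather than ternary nodes, and the names `State`, `build_dawg` and `accepts_suffix` are hypothetical.

```python
class State:
    def __init__(self, length=0, link=None):
        self.next = {}      # outgoing transitions: symbol -> State
        self.link = link    # suffix link
        self.len = length   # length of the longest string reaching this state


def build_dawg(text):
    """Build the DAWG of text online; return (root, final_states, all_states)."""
    root = State()
    last = root
    states = [root]
    for ch in text:
        cur = State(last.len + 1)
        states.append(cur)
        p = last
        # Add the new edge along the suffix-link chain until one already exists.
        while p is not None and ch not in p.next:
            p.next[ch] = cur
            p = p.link
        if p is None:
            cur.link = root
        else:
            q = p.next[ch]
            if q.len == p.len + 1:
                cur.link = q
            else:
                # Split q: the clone takes over the shorter strings.
                clone = State(p.len + 1, q.link)
                clone.next = dict(q.next)
                states.append(clone)
                while p is not None and p.next.get(ch) is q:
                    p.next[ch] = clone
                    p = p.link
                q.link = clone
                cur.link = clone
        last = cur
    # Final states: the last state and everything on its suffix-link chain.
    finals = set()
    p = last
    while p is not None:
        finals.add(p)
        p = p.link
    return root, finals, states


def accepts_suffix(root, finals, s):
    """Return True if s is a suffix of the text the DAWG was built from."""
    state = root
    for ch in s:
        state = state.next.get(ch)
        if state is None:
            return False
    return state in finals


root, finals, states = build_dawg("abcbc")
print(accepts_suffix(root, finals, "cbc"))  # True
print(accepts_suffix(root, finals, "bcb"))  # False
```

In this sketch the empty string counts as a suffix, because the root lies on the suffix-link chain of the last state.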

  30. Online TDAWG Construction

  31. Experiment Result

  32. Conclusion • New data structure … TDAWG • Construction time (English text, 256 characters) • TDAWG < linklistDAWG < tableDAWG • Space requirement • linklistDAWG < TDAWG (~20%) • tableDAWG not comparable on the same scale • Search time • Short patterns: tableDAWG is best, TDAWG < linklistDAWG • Log curve vs. linear curve (long patterns?)

  33. Discussion & Future Work • In Asian languages (1000s of characters) the search-time gain should be larger than in English (256 characters) because search takes log|Σ| × |p| • Apply to other DAWGs … cDAWG, minimum DAWG etc. • More efficiency with AVL trees (AVL balancing) • Bioinformatics has only 4 characters, but a sliding window of 12 characters = 4¹²
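
The alphabet sizes mentioned in the discussion can be put side by side with a little arithmetic. The sketch below compares the worst-case number of edge comparisons per pattern character for a linear scan (|Σ|) with a rough estimate for a balanced search tree (about log2 |Σ|). The helper name `per_char_cost` is hypothetical, and the estimate assumes the per-state tree is balanced, which is what the AVL suggestion is meant to ensure.

```python
import math


def per_char_cost(sigma):
    """Return (linear-scan worst case, balanced-tree estimate) for alphabet size sigma."""
    return sigma, math.ceil(math.log2(sigma))


for name, sigma in [
    ("DNA, single character", 4),
    ("English text", 256),
    ("about 1000 characters", 1000),
    ("DNA, sliding window of 12", 4 ** 12),
]:
    linear, tree = per_char_cost(sigma)
    print(f"{name}: |Σ| = {sigma}, linear ≈ {linear}, tree ≈ {tree}")
```

With a window of 12 over a 4-letter alphabet, |Σ| grows to 4¹² = 16,777,216, so the gap between the linear and logarithmic columns becomes very large.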
