
Parallel String Matching Algorithm(s) Using Associative Processors. Original work by Mary Esenwein and Dr. Johnnie Baker; presented by Shannon Steinfadt, April 18, 2007. The string matching problem is also known as pattern matching or string searching.

## Parallel String Matching Algorithm(s) Using Associative Processors


**String Matching Problem**

- Also known as pattern matching or string searching
- Useful in many applications, such as text editing, information retrieval, DNA analysis, and homeland security

**What are we doing?**

- Given a pattern and some text, determine whether the pattern occurs in the text
- Example: does the pattern AB occur in the text ABAA? If so, where? (Yes, starting at position 1.)

**What's the notation?**

- P is a pattern string of length m
- T is a text string of length n, usually with n ≥ m

**Why use P[j]? How does it relate to T[i+j-1]?**

Goal of string matching:

- Find all occurrences of the pattern string in the text string
- That is, locate all positions i in T such that T[i+j-1] = P[j] for all j, 1 ≤ j ≤ m (so 1 ≤ i ≤ n-m+1)

**Pattern Variations**

- An exact pattern
- A "don't care" character (`*`) in the pattern
  - Gives flexibility in matching
  - `*` marks the character(s) of the text that are irrelevant to the matching process

**General "Don't Care" Character (`*`) Characteristics**

The `*` can match:

- A single character of text
- Multiple consecutive text characters
- No characters
- A combination of the above three

Example: the pattern `AB*CD` could match ABBCD, ABBBBBCD, or ABCD (where `*` is null).

**String Matching Using ASC**

- Three parallel algorithms using associative computing (ASC) on a 1-D mesh:
  - String matching for an exact match
  - String matching with a fixed-length "don't care", i.e., exactly one character
  - String matching with a variable-length "don't care", which can have any length or be null

**ASC Exact Match Algorithm**

```
patt_counter = 0;   /* every cell starts with counter[$] = 0 */
for (j = patt_length - 1; j >= 0; j--) {
    Responders are text[$] == patt_string[j] and counter[$] == patt_counter;
    Responders add 1 to counter[$] and store result in counter[$] of preceding cell;
    patt_counter++;
}
/* When pattern has been processed */
Responders are counter[$] == patt_length;
Responders set match[$] = 1 in next cell;
```

[Figure: trace of the exact match algorithm. Pattern: BBA; text: ABBBABBBABA; columns Text[$], Match[$], Counter[$]; also shown: patt_counter, patt_length. m = pattern length, n = text length, j = pattern index, i = text index.]

**Final State of Exact Match Algorithm**

[Figure: final state for pattern BBA and text ABBBABBBABA; columns Text[$], Match[$], Counter[$].]

**Algorithm for Unit-Length "Don't Cares" Using ASC**

```
patt_counter = 0;   /* every cell starts with counter[$] = 0 */
for (j = patt_length - 1; j >= 0; j--) {
    if (pattern[j] == '*')
        Responders are counter[$] == patt_counter;
    else    /* pattern[j] is not the "don't care" character */
        Responders are text[$] == pattern[j] and counter[$] == patt_counter;
    if (no Responders are detected) exit;
    Responders add 1 to counter[$] and store result in counter[$] of preceding cell;
    patt_counter++;
}
/* When pattern has been processed */
Responders are counter[$] == patt_length;
Responders set match[$] = 1 in next cell;
```

[Figure: trace of the unit-length "don't care" algorithm. Pattern: `B*A`; text: ABBBABBBABA; columns Text[$], Match[$], Counter[$]; also shown: patt_counter, patt_length. m = pattern length, n = text length, j = pattern index, i = text index.]

**Final State of Unit-Length "Don't Care" Algorithm**

[Figure: final state for pattern `B*A` and text ABBBABBBABA; columns Text[$], Match[$], Counter[$].]

**VLDC Algorithm (added)**

- The variable-length "don't care" (VLDC) algorithm works on each "segment" of the pattern delimited by the `*` character; e.g., `AB*BB*A` has three segments
- Consecutive `**` characters are not necessary and are not allowed
- This VLDC algorithm is unique: it provides the information needed to find all continuation points of all matches following each `*`

**VLDC Algorithm Using ASC**

```
int patt_length = m;
int maxcell = n + 2;
int patt_counter = 0;
int k = 0;                          /* segment index */

/* Special handling for '*' at end of pattern */
if (pattern[m-1] == '*') {
    Responders are cell index > 1;
    Responders set segment$[0] = 1;
    patt_counter = 1;
    k = 1;                          /* Reset initial segment index */
}
while ((patt_length -= patt_counter) > 0 && maxcell > 0) {
    patt_counter = 0;
    for (I = patt_length - 1; I >= 0 && pattern[I] != '*'; I--) {
        Responders are text$ == pattern[I] and counter$ == patt_counter
            and cell index < maxcell;
        Responders add 1 to counter$ and store result in counter$ of preceding cell;
        patt_counter++;
    }
    Responders are counter$ == patt_counter;
    Responders set segment$[k] = patt_counter in next cell;
    Responders are segment$[k] > 0;
    if (any Responders) maxcell = maximum cell index of the Responders;
    else maxcell = 0;
    All cells become Responders and set counter$ = 0;
    patt_counter++;                 /* step over the '*' */
    k++;
}
/* When pattern has been processed */
Responders are segment$[--k] > 0;
Responders set match$ = 1;

/* Special handling for '*' at start of pattern */
if (pattern[0] == '*') {
    Responders are cell index < maxcell and cell index > 1;
    Responders set match$ = 1;
}
```

**After Third Pattern Segment in VLDC Algorithm**

[Figure: state after the third segment (A, processed first). Pattern: `AB*BB*A`; text: ABBBABBBABA; columns T$, M$, C$, S0$, S1$, S2$, Responder$ for cells 1-12; also shown: Patt_counter, Maxcell.]

**After Second Pattern Segment in VLDC Algorithm**

[Figure: state after the second segment (BB). Pattern: `AB*BB*A`; text: ABBBABBBABA; columns T$, M$, Counter$, S0$, S1$, S2$, Responder$ for cells 1-12; also shown: Patt_counter, Maxcell.] Maxcell is used to keep the pattern segments in order, i.e., AB occurs before BB.

**After First Pattern Segment in VLDC Algorithm**

[Figure: state after the first segment (AB). Pattern: `AB*BB*A`; text: ABBBABBBABA; same columns as above.]

**Final State in VLDC Algorithm**

[Figure: final state for pattern `AB*BB*A` and text ABBBABBBABA; same columns as above.]

**Finding All Continuation Points**

- A match starts where M$ = 1
- A match to any pattern segment begins where S$[x] equals the segment length, i.e., wherever S$[x] > 0
- The match continues in S$[x-1] at any cell/PE whose index is ≥ (the cell/PE index of the S$[x] entry) + (the segment length stored in S$[x])

**Using the Final State in VLDC Algorithm**

Pattern: `AB*BB*A`; text: ABBBABBBABA (columns S0$, S1$, S2$, T$, M$, C$ for cells 1-12)

- Start with index 2, where there is a match (M$ = 1)
- Work from S2$ down and to the left: count down 2 values and move into S1$, then count down 2 values and move into S0$
- That produces cells 2, 4, 6: ABBBA
- Any index ≥ 4 in S1$ whose value is > 0 also produces a correct match:
  - 2, 7, 10: ABBBABBBA
  - 2, 8, 10: ABBBABBBA
- Some of the additional matches are:
  - 2, 4, 10: ABBBABBBA
  - 2, 4, 12: ABBBABBBABA
  - 2, 8, 12: ABBBABBBABA
  - 6, 8, 10: ABBBA
  - 6, 8, 12: ABBBABA

**Existing Algorithms**

- Sequential algorithms
  - Naïve algorithm: O(mn)
  - Knuth-Morris-Pratt or Boyer-Moore: O(m+n)
- Parallel algorithms
  - Exact string matching on a PRAM: O(n)
  - On a reconfigurable mesh: O(1) on n(n-m+1) PEs
  - On a SIMD hypercube (alphabet limited to {0,1}): O(lg n) on n/lg n PEs
  - On a neural network: O(1) on nm PEs
  - ASC algorithms: O(m) time on O(n) PEs

**Question to Consider**

- The "don't care" character allows a non-match of arbitrary length; this is discussed on slide 13. Instead, consider a `*` that allows a non-match of two characters, and make the necessary changes to the trace on slides 15-16.
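The exact match and unit-length "don't care" algorithms can be checked with a small sequential simulation. The Python sketch below is illustrative, not the authors' code, and rests on a few assumptions: each loop iteration stands in for one parallel ASC step (all responders are found first, then they all write), cell 0 is an assumed extra cell preceding the text so the first text character has a "preceding cell", and the name `asc_match` is hypothetical. Because the exact match algorithm is the unit-length algorithm run on a pattern without `*` (the early exit only saves work), one function covers both.

```python
def asc_match(text: str, pattern: str, dont_care: str = "*") -> list[int]:
    """Sequentially simulate the ASC exact / unit-length "don't care" matcher.

    Hypothetical layout: cell 0 is an extra leading cell and text[c - 1]
    sits in cell c (1 <= c <= n). Returns the 1-based text positions i
    where the pattern starts (the cells that end with match[$] == 1).
    """
    n, m = len(text), len(pattern)
    cell_text = [""] + list(text)   # text[$]
    counter = [0] * (n + 1)         # counter[$], every cell starts at 0
    match = [0] * (n + 1)           # match[$]
    patt_counter = 0

    for j in range(m - 1, -1, -1):  # scan the pattern right to left
        # One parallel step: first determine every responder ...
        if pattern[j] == dont_care:
            responders = [c for c in range(1, n + 1) if counter[c] == patt_counter]
        else:
            responders = [c for c in range(1, n + 1)
                          if cell_text[c] == pattern[j] and counter[c] == patt_counter]
        if not responders:          # "If no Responders are detected, exit"
            return []
        # ... then each responder stores its counter + 1 in the preceding cell.
        for c in responders:
            counter[c - 1] = patt_counter + 1
        patt_counter += 1

    # A counter equal to patt_length marks a match starting in the next cell.
    for c in range(n):
        if counter[c] == m:
            match[c + 1] = 1
    return [c for c in range(1, n + 1) if match[c] == 1]


print(asc_match("ABBBABBBABA", "BBA"))  # [3, 7]
print(asc_match("ABBBABBBABA", "A*A"))  # [9]
```

Every responder's counter equals `patt_counter`, so writing `patt_counter + 1` instead of re-reading `counter[c]` keeps the simulated writes independent of the order in which responders are visited, as a simultaneous SIMD write would be.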

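The VLDC pseudocode can be simulated the same way to reproduce the `AB*BB*A` trace. This is a hedged Python translation for illustration, not the authors' implementation. It assumes the cell layout implied by the trace (cell 1 is an extra leading cell and the text occupies cells 2 to n+1, which is why `maxcell` starts at n + 2), assumes a non-empty pattern with no consecutive `*`, and uses the hypothetical name `vldc_match`. Segment 0 is the last pattern segment because the pattern is scanned right to left, matching S0$, S1$, S2$ in the trace.

```python
def vldc_match(text: str, pattern: str, star: str = "*"):
    """Sequentially simulate the ASC VLDC algorithm.

    Assumed cell layout (from the slide trace): cell 1 is an extra leading
    cell and text[c - 2] sits in cell c for 2 <= c <= n + 1. Returns
    (match_cells, segments); segments[k][c] plays the role of segment$[k].
    Assumes a non-empty pattern with no consecutive '*' characters.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    n, m = len(text), len(pattern)
    size = n + 2                        # cell indices 0..n+1 (index 0 unused)
    cell_text = ["", ""] + list(text)   # text$
    segments = []                       # segments[k] is the column S$k
    patt_length, maxcell, patt_counter = m, n + 2, 0

    if pattern[m - 1] == star:          # special handling for '*' at end
        segments.append([1 if c > 1 else 0 for c in range(size)])
        patt_counter = 1

    while True:
        patt_length -= patt_counter
        if patt_length <= 0 or maxcell <= 0:
            break
        counter = [0] * size            # all cells set counter$ = 0
        patt_counter = 0
        i = patt_length - 1
        while i >= 0 and pattern[i] != star:   # one segment, right to left
            responders = [c for c in range(2, size)
                          if cell_text[c] == pattern[i]
                          and counter[c] == patt_counter and c < maxcell]
            for c in responders:        # store counter + 1 in the preceding cell
                counter[c - 1] = patt_counter + 1
            patt_counter += 1
            i -= 1
        seg = [0] * size
        for c in range(1, size - 1):    # responders: counter$ == patt_counter
            if counter[c] == patt_counter:
                seg[c + 1] = patt_counter   # set segment$[k] in the next cell
        segments.append(seg)
        starts = [c for c in range(size) if seg[c] > 0]
        maxcell = max(starts) if starts else 0  # latest start of this segment
        patt_counter += 1               # account for the '*' between segments

    match = [c for c in range(size) if segments[-1][c] > 0]
    if pattern[0] == star:              # special handling for '*' at start
        match = sorted(set(match) | {c for c in range(2, size) if c < maxcell})
    return match, segments


match, segments = vldc_match("ABBBABBBABA", "AB*BB*A")
print(match)  # [2, 6]
```

Restricting responders to `c < maxcell` is what keeps the segments in order: an earlier segment is only recorded if it ends before the latest start of the segment that follows it.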
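The continuation-point rule (a match continues in the next segment at any cell whose index is at least the current segment's start cell plus its length) can also be enumerated directly. This Python sketch is an assumption-laden illustration: the helper names `segment_starts`, `all_matches`, and `matched_text` are hypothetical, segment start cells are found by direct comparison rather than read from the S$ columns, segments are numbered left to right (the reverse of S0$, S1$, S2$ in the trace), the text sits in cells 2 to n+1 as in the trace, and the pattern is assumed to have no leading, trailing, or consecutive `*`.

```python
def segment_starts(text: str, segment: str) -> list[int]:
    """Cells (text in cells 2..n+1, as in the trace) where `segment` begins."""
    length = len(segment)
    return [i + 2 for i in range(len(text) - length + 1)
            if text[i:i + length] == segment]


def all_matches(text: str, pattern: str, star: str = "*") -> list[tuple[int, ...]]:
    """Enumerate every chain of segment start cells that forms a match.

    A segment starting at cell c with length L may be continued by the
    next segment at any cell >= c + L (the continuation-point rule).
    """
    segs = pattern.split(star)
    if any(s == "" for s in segs):
        raise ValueError("sketch assumes no leading, trailing, or consecutive '*'")
    starts = [segment_starts(text, s) for s in segs]
    results: list[tuple[int, ...]] = []

    def extend(chain: list[int]) -> None:
        k = len(chain)
        if k == len(segs):
            results.append(tuple(chain))
            return
        earliest = chain[-1] + len(segs[k - 1])   # continuation point rule
        for c in starts[k]:
            if c >= earliest:
                extend(chain + [c])

    for c in starts[0]:
        extend([c])
    return results


def matched_text(text: str, pattern: str, chain: tuple[int, ...], star: str = "*") -> str:
    """Text from the first segment's start cell to the end of the last segment."""
    last_len = len(pattern.split(star)[-1])
    return text[chain[0] - 2:chain[-1] - 2 + last_len]


text = "ABBBABBBABA"
for chain in all_matches(text, "AB*BB*A"):
    print(chain, matched_text(text, "AB*BB*A", chain))  # first: (2, 4, 6) ABBBA
```

For `AB*BB*A` on ABBBABBBABA this yields nine chains: the eight listed on the slide plus (2, 7, 12).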