
Inferring Finite Automata from queries and counter-examples


Presentation Transcript


  1. Inferring Finite Automata from queries and counter-examples Eggert Jón Magnússon

  2. Learning a language • Inferring finite automata is analogous to learning a language. • In fact, there is no way to distinguish between two automata that recognize the same language without examining their state structure. • We focus on finding the minimum equivalent automaton.

  3. Requirements for learning • It has been shown that the only classes of languages that can be learned from positive data only are classes which include no infinite language. • The idea is a proof by contradiction. Assume we have a guessing algorithm that can build an automaton recognizing the finite language L from a sequence of strings w1...wn, all members of L. • Now build an infinite language L’ that contains the strings w1...wn plus at least one rule or string that is not a member of L. Presented with the same strings w1...wn, the algorithm cannot tell L from L’, so an infinite language can always fool any guessing algorithm that works from positive data alone.

  4. Teacher • Angluin introduced the concept of a minimally adequate teacher that can answer two kinds of questions: • “Is the string s a member of L?” – yes/no • “Is the given DFA D equivalent to the target?” – yes, or a string from the symmetric difference of L(D) and L (either a string that is in L and not in L(D), or a string that is in L(D) and not in L). • With such a teacher, an algorithm exists that learns any regular set and runs in polynomial time.
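The two query types map naturally onto a small programming interface. Below is a minimal sketch in Python, assuming the target language is given as a membership function; the length-bounded brute-force equivalence check is only an illustrative stand-in for a real teacher, which answers equivalence exactly, and the class and parameter names are made up for this sketch.

```python
from itertools import product

# Sketch of a minimally adequate teacher over a target language given as a
# membership function target(w) -> bool. The length-bounded equivalence check
# is an illustrative shortcut, not an exact equivalence test.
class Teacher:
    def __init__(self, target, alphabet, max_len=8):
        self.target = target        # membership oracle for L
        self.alphabet = alphabet
        self.max_len = max_len      # search bound for counterexamples

    def member(self, w):
        """Answer the membership query: is w in L?"""
        return self.target(w)

    def equivalent(self, hypothesis):
        """Answer the equivalence query for a hypothesis (also a function
        string -> bool): return None if no disagreement is found up to
        max_len, otherwise a counterexample from the symmetric difference."""
        for n in range(self.max_len + 1):
            for letters in product(self.alphabet, repeat=n):
                w = "".join(letters)
                if self.target(w) != hypothesis(w):
                    return w
        return None

# Example: the language of strings over {a, b} that end in "b".
teacher = Teacher(lambda w: w.endswith("b"), "ab")
```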

  5. Angluin’s Algorithm • Iteratively, the algorithm builds a DFA using membership queries, then presents the teacher with that DFA as a proposed solution. • If the DFA is accepted, the algorithm is finished. Otherwise, the teacher responds with a counter-example: a string that the presented DFA incorrectly accepts or rejects. • The algorithm uses the counter-example to refine the DFA and goes back to the first step.
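The control flow can be sketched as follows; build_hypothesis and refine are hypothetical placeholders for the table operations described on the next slide, and the teacher is assumed to offer member and equivalent as in the sketch above.

```python
# Sketch of the outer query loop of the learning algorithm.
def learn(teacher, build_hypothesis, refine):
    while True:
        guess = build_hypothesis(teacher.member)    # membership queries only
        counterexample = teacher.equivalent(guess)  # equivalence query
        if counterexample is None:
            return guess                            # teacher accepted the DFA
        refine(counterexample, teacher.member)      # use it to grow the table
```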

  6. Angluin’s Algorithm, details • The algorithm maintains two sets, S (candidate states, i.e. access strings) and E (experiments), and one observation table, T, where elements of (S ∪ S·A) form rows and elements of E form columns – the value of each cell is the outcome of a membership test for the concatenation of the row and column strings. • The set S is prefix-closed, the set E is suffix-closed. • Before making a guess, the observation table is required to be closed and consistent. • Closed means that every row in the bottom part of the table (for elements of S·A) also appears as a row in the top part (for elements of S). • - if the observation table isn’t closed, we find a row in the bottom part that does not appear in the top part and move its corresponding element from S·A into S. • Consistent means that if the rows for two elements s1, s2 in S are the same, then for every a in A the rows for s1a and s2a are also the same. • - if the table isn’t consistent, we find a letter a and an experiment e for which the entries for s1ae and s2ae differ, and add the suffix ae to E.
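A minimal sketch of the two conditions, assuming the table is represented implicitly through a row() helper that issues the membership queries; the function names and the plain-string representation are assumptions made for illustration.

```python
# Observation-table helpers, sketched. S and E are sets of strings, A is the
# alphabet (a string of single-character symbols), member() a membership oracle.
def row(s, E, member):
    # The row for string s: membership outcomes for s concatenated with every
    # experiment in E (in a fixed order, so rows can be compared as tuples).
    return tuple(member(s + e) for e in sorted(E))

def is_closed(S, A, E, member):
    # Closed: every row of the bottom part (elements of S.A) already appears
    # among the rows of the top part (elements of S).
    top = {row(s, E, member) for s in S}
    return all(row(s + a, E, member) in top for s in S for a in A)

def is_consistent(S, A, E, member):
    # Consistent: whenever two elements of S have equal rows, each one-letter
    # extension of them also has equal rows.
    for s1 in S:
        for s2 in S:
            if row(s1, E, member) == row(s2, E, member):
                for a in A:
                    if row(s1 + a, E, member) != row(s2 + a, E, member):
                        return False
    return True
```

When a check fails, the repair from the slide applies: move the offending S·A element into S, or add the distinguishing suffix a·e to E.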

  7. Example Run • Let’s use an example DFA from Sipser (Example 1.68, p. 76 in International version). • The alphabet is A= {a,b}

  8. Example, continued • S = E = {ε} • T is initialized with the results of the membership queries for S ∪ S·A against E (the table itself is shown on the slide). • T is not closed – row(a) ≠ row(ε). • Add “a” to S and extend T. • T is now both closed and consistent.

  9. First guess • The teacher rejects the guess and gives the counterexample “ba”, which is not accepted by the first guess. • We add “ba” and all its prefixes (“b”) to S. • S is now {ε, “a”, “b”, “ba”}. • Now the table is no longer consistent – row(b) = row(ba), but row(bab) ≠ row(bb). • We add “b” to E.
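Folding a counterexample into the table amounts to adding the string and all of its prefixes to S, as in this small sketch (the function name is assumed):

```python
# Add a counterexample and all of its prefixes (including the empty string)
# to S; the table is then re-extended and re-checked for closure/consistency.
def add_counterexample(S, counterexample):
    for i in range(len(counterexample) + 1):
        S.add(counterexample[:i])
    return S

S = {"", "a"}
add_counterexample(S, "ba")   # S becomes {"", "a", "b", "ba"}
```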

  10. Second guess • The table is now closed and consistent, so we make a guess. • Note that the unique rows (their “bitmask” values) translate directly to states.
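Reading the guess off the table works just as stated: each distinct row of the top part becomes a state, the row of the empty string is the start state, rows whose ε-entry is accepting become accepting states, and the row of s·a gives the transition out of row(s) on a. A rough sketch, with assumed names, reusing the row() idea from slide 6:

```python
def row(s, E, member):
    return tuple(member(s + e) for e in sorted(E))

def build_dfa(S, A, E, member):
    # Requires the table to be closed and consistent, with "" in S and "" in E.
    states = {row(s, E, member) for s in S}
    start = row("", E, member)
    accepting = {row(s, E, member) for s in S if member(s)}   # the ε-column entry
    delta = {(row(s, E, member), a): row(s + a, E, member) for s in S for a in A}
    return states, start, accepting, delta

def accepts(dfa, w):
    # Run the table-derived DFA on a string w.
    states, start, accepting, delta = dfa
    q = start
    for a in w:
        q = delta[(q, a)]
    return q in accepting
```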

  11. Running time • The equivalence test uses EQDFA. • Since each equivalence test adds at least one state to the guessed state machine, in the worst case we make one guess for each state in the target machine. • In general, before each guess, we add only one string to either S or E. • The running time is O(m²n² + mn³), where m is the length of the longest counterexample produced and n is the number of states in the target machine.

  12. Further work • The requirement of a teacher is considered by many to be unfair, requiring too much knowledge of the automaton. • The estimation/exploration algorithm (EEA) is a genetic algorithm: • It creates many random state machines and many random test strings. • It compares the output of the random state machines with the output of the target machine on those test strings. • It iteratively refines the candidate state machines and the test strings in alternation, either until convergence or until some desirable behaviour is displayed. • Verification is done with a new set of test strings.
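A very rough sketch of that alternation is shown below; the DFA encoding, the parameter values, and the use of plain random re-sampling instead of real genetic operators are all simplifying assumptions, so this illustrates the estimation/exploration idea rather than the actual EEA of Bongard and Lipson.

```python
import random

ALPHABET = "ab"

def random_dfa(n_states):
    # Encode a DFA as (transition dict, accepting set); state 0 is the start state.
    delta = {(q, a): random.randrange(n_states)
             for q in range(n_states) for a in ALPHABET}
    accepting = {q for q in range(n_states) if random.random() < 0.5}
    return delta, accepting

def accepts(dfa, w):
    delta, accepting = dfa
    q = 0
    for a in w:
        q = delta[(q, a)]
    return q in accepting

def random_string(max_len=6):
    return "".join(random.choice(ALPHABET)
                   for _ in range(random.randrange(max_len + 1)))

def fitness(dfa, tests, target):
    # Agreement with the target machine's observed output on the current tests.
    return sum(accepts(dfa, w) == target(w) for w in tests)

def disagreement(w, candidates):
    # Informative test strings are the ones the best candidate machines disagree on.
    votes = sum(accepts(dfa, w) for dfa in candidates)
    return min(votes, len(candidates) - votes)

def eea_sketch(target, n_states=4, rounds=20, pop=30):
    tests = [random_string() for _ in range(10)]
    best = []
    for _ in range(rounds):
        # "Estimation": keep the candidate machines that best explain the tests.
        machines = [random_dfa(n_states) for _ in range(pop)]
        machines.sort(key=lambda d: fitness(d, tests, target), reverse=True)
        best = machines[:5]
        # "Exploration": add new tests on which the best candidates disagree most.
        proposals = [random_string() for _ in range(50)]
        proposals.sort(key=lambda w: disagreement(w, best), reverse=True)
        tests.extend(proposals[:5])
    # Verification with a fresh set of test strings.
    holdout = [random_string() for _ in range(100)]
    return best[0], fitness(best[0], holdout, target) / len(holdout)
```

For instance, eea_sketch(lambda w: w.endswith("b")) would attempt to recover a machine for the same example language used in the teacher sketch above.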

  13. References • Angluin, D., 1987. Learning Regular Sets from Queries and Counter-examples. • Gold, E. Mark, 1967. Language Identification in the Limit. • Bongard, J., Lipson, H., 2005. Active Coevolutionary Learning of Deterministic Finite Automata.
