next up previous contents index CD CD Algorithms
Next: Longest Common Substring Up: Set and String Problems Previous: Cryptography

Finite State Machine Minimization

  

  figure22578

Input description: A deterministic finite automaton M.

Problem description: The smallest deterministic finite automaton M' such that M' behaves identically to M.

Discussion: Problems associated with constructing and minimizing finite state machines arise repeatedly in software and hardware design applications. Finite state machines are best thought of as pattern recognizers, and minimum-size machines correspond to recognizers that require less time and space. Complicated control systems and compilers are often built using finite state machines to encode the current state and associated actions, as well as the set of possible transitions to other states. Minimizing the size of this machine minimizes its cost.         

Finite state machines are best thought of as edge-labeled directed graphs, where each vertex represents one of n states and each edge a transition from one state to the other on receipt of the alphabet symbol that labels the edge. The automaton above analyzes a given sequence of coin tosses, with the dark states signifying that an even number of heads have been observed. Automata can be represented using any graph data structure (see Section gif), or by using an tex2html_wrap_inline30998 transition matrix, where tex2html_wrap_inline31000 is the size of the alphabet.      

Finite state machines are often used to specify search patterns in the guise of regular expressions, which are patterns formed by and-ing, or-ing, and looping over smaller regular expressions. For example, the regular expression tex2html_wrap_inline31002 matches any string on (a,b,c) that begins and ends with an a (including a itself). The best way to test whether a string is described by a given regular expression (especially if many strings will be tested) is to construct the equivalent finite automaton and then simulate the machine on the string. See Section gif for alternative approaches to string matching.   

We consider three different problems on finite automata:

Implementations: FIRE Engine is a finite automaton toolkit, written in C++ by Bruce Watson.    It provides production-quality implementations of finite automata and regular expression algorithms. Several finite automaton minimization algorithms have been implemented, including Hopcroft's tex2html_wrap_inline31026 algorithm. Both deterministic and nondeterministic automata are supported. FIRE Engine has been used for compiler construction, hardware modeling, and computational biology applications. It is strictly a computing engine and does not provide a graphical user interface. FIRE Engine is available by anonymous ftp from ftp.win.tue.nl in the directory /pub/techreports/pi/watson.phd/. A greatly improved commercial version is available from www.RibbitSoft.com.

Grail is a C++ package for symbolic computation with finite automata and regular expressions, from Darrell Raymond and Derrick Wood. Grail enables one to convert between different machine representations and to minimize automata. It can handle machines with 100,000 states and dictionaries of 20,000 words. All code and documentation are accessible from the WWW site http://www.csd.uwo.ca/research/grail, as well as pointers to a variety of other automaton packages. Commercial use of Grail is not allowed without approval, although it is freely available to students and educators.  

An implementation in C of a regular-expression matching algorithm appears in [BR95]. The source code for this program is printed in the text and is available on disk for a modest fee. A bare bones implementation in C of a regular-expression pattern matching algorithm appears in [GBY91]. See Section gif.  

XTango (see Section gif) includes a simulation of a DFA. Many of the other animations (but not this one) are interesting and quite informative to watch.  

FLAP (Formal Languages and Automata Package) is a tool by Susan Rodger for drawing and simulating finite automata, pushdown automata, and Turing machines. Using FLAP, one can draw a graphical representation (transition diagram) of an automaton, edit it, and simulate the automaton on some input. FLAP was developed in C++ for X-Windows. See http://www.cs.duke.edu:80/ tex2html_wrap_inline31028 rodger/tools/tools.html.  

Notes: Aho [Aho90] provides a good survey on algorithms for pattern matching, and a particularly clear exposition for the case where the patterns are regular expressions. The technique for regular expression pattern matching with tex2html_wrap_inline31030 -moves is due to Thompson [Tho68]. Other expositions on finite automaton pattern matching include [AHU74].  

Hopcroft [Hop71] gave an optimal tex2html_wrap_inline31032 algorithm for minimizing the number of states in DFAs. The derivatives method of constructing a finite state machine from a regular expression is due to Brzozowski [Brz64] and has been expanded upon in [BS86]. Expositions on the derivatives method includes Conway []. Testing the equivalence of two nondeterministic finite state machines is PSPACE-complete [SM73].

Related Problems: Satisfiability (see page gif). string matching (see page gif).    


next up previous contents index CD CD Algorithms
Next: Longest Common Substring Up: Set and String Problems Previous: Cryptography

Algorithms
Mon Jun 2 23:33:50 EDT 1997