grammar::fa::op(3tcl)Finite automaton operations and usaggrammar::fa::op(3tcl)
______________________________________________________________________________
NAME
grammar::fa::op - Operations on finite automatons
SYNOPSIS
package require Tcl 8.4
package require snit
package require struct::list
package require struct::set
package require grammar::fa::op ?0.4.1?
::grammar::fa::op::constructor cmd
::grammar::fa::op::reverse fa
::grammar::fa::op::complete fa ?sink?
::grammar::fa::op::remove_eps fa
::grammar::fa::op::trim fa ?what?
::grammar::fa::op::determinize fa ?mapvar?
::grammar::fa::op::minimize fa ?mapvar?
::grammar::fa::op::complement fa
::grammar::fa::op::kleene fa
::grammar::fa::op::optional fa
::grammar::fa::op::union fa fb ?mapvar?
::grammar::fa::op::intersect fa fb ?mapvar?
::grammar::fa::op::difference fa fb ?mapvar?
::grammar::fa::op::concatenate fa fb ?mapvar?
::grammar::fa::op::fromRegex fa regex ?over?
::grammar::fa::op::toRegexp fa
::grammar::fa::op::toRegexp2 fa
::grammar::fa::op::toTclRegexp regexp symdict
::grammar::fa::op::simplifyRegexp regexp
______________________________________________________________________________
DESCRIPTION
This package provides a number of complex operations on finite automa-
tons (Short: FA), as provided by the package grammar::fa. The package
does not provide the ability to create and/or manipulate such FAs, nor
the ability to execute a FA for a stream of symbols. Use the packages
grammar::fa and grammar::fa::interpreter for that. Another package re-
lated to this is grammar::fa::compiler which turns a FA into an execu-
tor class which has the definition of the FA hardwired into it.
For more information about what a finite automaton is see section FI-
NITE AUTOMATONS in package grammar::fa.
API
The package exports the API described here. All commands modify their
first argument. I.e. whatever FA they compute is stored back into it.
Some of the operations will construct an automaton whose states are all
new, but related to the states in the source automaton(s). These opera-
tions take variable names as optional arguments where they will store
mappings which describe the relationship(s). The operations can be
loosely partitioned into structural and language operations. The latter
are defined in terms of the language the automaton(s) accept, whereas
the former are defined in terms of the structural properties of the in-
volved automaton(s). Some operations are both. Structure operations
::grammar::fa::op::constructor cmd
This command has to be called by the user of the package before
any other operations is performed, to establish a command which
can be used to construct a FA container object. If this is not
done several operations will fail as they are unable to con-
struct internal and transient containers to hold state and/or
partial results.
Any container class using this package for complex operations
should set its own class command as the constructor. See package
grammar::fa for an example.
::grammar::fa::op::reverse fa
Reverses the fa. This is done by reversing the direction of all
transitions and swapping the sets of start and final states. The
language of fa changes unpredictably.
::grammar::fa::op::complete fa ?sink?
Completes the fa complete, but nothing is done if the fa is al-
ready complete. This implies that only the first in a series of
multiple consecutive complete operations on fa will perform any-
thing. The remainder will be null operations.
The language of fa is unchanged by this operation.
This is done by adding a single new state, the sink, and transi-
tions from all other states to that sink for all symbols they
have no transitions for. The sink itself is made complete by
adding loop transitions for all symbols.
Note: When a FA has epsilon-transitions transitions over a sym-
bol for a state S can be indirect, i.e. not attached directly to
S, but to a state in the epsilon-closure of S. The symbols for
such indirect transitions count when computing completeness of a
state. In other words, these indirectly reached symbols are not
missing.
The argument sink provides the name for the new state and most
not be present in the fa if specified. If the name is not speci-
fied the command will name the state "sinkn", where n is set so
that there are no collisions with existing states.
Note that the sink state is not useful by definition. In other
words, while the FA becomes complete, it is also not useful in
the strict sense as it has a state from which no final state can
be reached.
::grammar::fa::op::remove_eps fa
Removes all epsilon-transitions from the fa in such a manner the
the language of fa is unchanged. However nothing is done if the
fa is already epsilon-free. This implies that only the first in
a series of multiple consecutive complete operations on fa will
perform anything. The remainder will be null operations.
Note: This operation may cause states to become unreachable or
not useful. These states are not removed by this operation. Use
::grammar::fa::op::trim for that instead.
::grammar::fa::op::trim fa ?what?
Removes unwanted baggage from fa. The legal values for what are
listed below. The command defaults to !reachable|!useful if no
specific argument was given.
!reachable
Removes all states which are not reachable from a start
state.
!useful
Removes all states which are unable to reach a final
state.
!reachable&!useful
!(reachable|useful)
Removes all states which are not reachable from a start
state and are unable to reach a final state.
!reachable|!useful
!(reachable&useful)
Removes all states which are not reachable from a start
state or are unable to reach a final state.
::grammar::fa::op::determinize fa ?mapvar?
Makes the fa deterministic without changing the language ac-
cepted by the fa. However nothing is done if the fa is already
deterministic. This implies that only the first in a series of
multiple consecutive complete operations on fa will perform any-
thing. The remainder will be null operations.
The command will store a dictionary describing the relationship
between the new states of the resulting dfa and the states of
the input nfa in mapvar, if it has been specified. Keys of the
dictionary are the handles for the states of the resulting dfa,
values are sets of states from the input nfa.
Note: An empty dictionary signals that the command was able to
make the fa deterministic without performing a full subset con-
struction, just by removing states and shuffling transitions
around (As part of making the FA epsilon-free).
Note: The algorithm fails to make the FA deterministic in the
technical sense if the FA has no start state(s), because deter-
minism requires the FA to have exactly one start states. In
that situation we make a best effort; and the missing start
state will be the only condition preventing the generated result
from being deterministic. It should also be noted that in this
case the possibilities for trimming states from the FA are also
severely reduced as we cannot declare states unreachable.
::grammar::fa::op::minimize fa ?mapvar?
Creates a FA which accepts the same language as fa, but has a
minimal number of states. Uses Brzozowski's method to accomplish
this.
The command will store a dictionary describing the relationship
between the new states of the resulting minimal fa and the
states of the input fa in mapvar, if it has been specified. Keys
of the dictionary are the handles for the states of the result-
ing minimal fa, values are sets of states from the input fa.
Note: An empty dictionary signals that the command was able to
minimize the fa without having to compute new states. This
should happen if and only if the input FA was already minimal.
Note: If the algorithm has no start or final states to work with
then the result might be technically minimal, but have a very
unexpected structure. It should also be noted that in this case
the possibilities for trimming states from the FA are also se-
verely reduced as we cannot declare states unreachable.
Language operations All operations in this section require that all in-
put FAs have at least one start and at least one final state. Otherwise
the language of the FAs will not be defined, making the operation
senseless (as it operates on the languages of the FAs in a defined man-
ner).
::grammar::fa::op::complement fa
Complements fa. This is possible if and only if fa is complete
and deterministic. The resulting FA accepts the complementary
language of fa. In other words, all inputs not accepted by the
input are accepted by the result, and vice versa.
The result will have all states and transitions of the input,
and different final states.
::grammar::fa::op::kleene fa
Applies Kleene's closure to fa. The resulting FA accepts all
strings S for which we can find a natural number n (0 inclusive)
and strings A1 ... An in the language of fa such that S is the
concatenation of A1 ... An. In other words, the language of the
result is the infinite union over finite length concatenations
over the language of fa.
The result will have all states and transitions of the input,
and new start and final states.
::grammar::fa::op::optional fa
Makes the fa optional. In other words it computes the FA which
accepts the language of fa and the empty the word (epsilon) as
well.
The result will have all states and transitions of the input,
and new start and final states.
::grammar::fa::op::union fa fb ?mapvar?
Combines the FAs fa and fb such that the resulting FA accepts
the union of the languages of the two FAs.
The result will have all states and transitions of the two input
FAs, and new start and final states. All states of fb which ex-
ist in fa as well will be renamed, and the mapvar will contain a
mapping from the old states of fb to the new ones, if present.
It should be noted that the result will be non-deterministic,
even if the inputs are deterministic.
::grammar::fa::op::intersect fa fb ?mapvar?
Combines the FAs fa and fb such that the resulting FA accepts
the intersection of the languages of the two FAs. In other
words, the result will accept a word if and only if the word is
accepted by both fa and fb. The result will be useful, but not
necessarily deterministic or minimal.
The command will store a dictionary describing the relationship
between the new states of the resulting fa and the pairs of
states of the input FAs in mapvar, if it has been specified.
Keys of the dictionary are the handles for the states of the re-
sulting fa, values are pairs of states from the input FAs. Pairs
are represented by lists. The first element in each pair will be
a state in fa, the second element will be drawn from fb.
::grammar::fa::op::difference fa fb ?mapvar?
Combines the FAs fa and fb such that the resulting FA accepts
the difference of the languages of the two FAs. In other words,
the result will accept a word if and only if the word is ac-
cepted by fa, but not by fb. This can also be expressed as the
intersection of fa with the complement of fb. The result will be
useful, but not necessarily deterministic or minimal.
The command will store a dictionary describing the relationship
between the new states of the resulting fa and the pairs of
states of the input FAs in mapvar, if it has been specified.
Keys of the dictionary are the handles for the states of the re-
sulting fa, values are pairs of states from the input FAs. Pairs
are represented by lists. The first element in each pair will be
a state in fa, the second element will be drawn from fb.
::grammar::fa::op::concatenate fa fb ?mapvar?
Combines the FAs fa and fb such that the resulting FA accepts
the cross-product of the languages of the two FAs. I.e. a word W
will be accepted by the result if there are two words A and B
accepted by fa, and fb resp. and W is the concatenation of A and
B.
The result FA will be non-deterministic.
::grammar::fa::op::fromRegex fa regex ?over?
Generates a non-deterministic FA which accepts the same language
as the regular expression regex. If the over is specified it is
treated as the set of symbols the regular expression and the au-
tomaton are defined over. The command will compute the set from
the "S" constructors in regex when over was not specified. This
set is important if and only if the complement operator "!" is
used in regex as the complementary language of an FA is quite
different for different sets of symbols.
The regular expression is represented by a nested list, which
forms a syntax tree. The following structures are legal:
{S x} Atomic regular expression. Everything else is constructed
from these. Accepts the Symbol "x".
{. A1 A2 ...}
Concatenation operator. Accepts the concatenation of the
regular expressions A1, A2, etc.
Note that this operator accepts zero or more arguments.
With zero arguments the represented language is epsilon,
the empty word.
{| A1 A2 ...}
Choice operator, also called "Alternative". Accepts all
input accepted by at least one of the regular expressions
A1, A2, etc. In other words, the union of A1, A2.
Note that this operator accepts zero or more arguments.
With zero arguments the represented language is the empty
language, the language without words.
{& A1 A2 ...}
Intersection operator, logical and. Accepts all input ac-
cepted which is accepted by all of the regular expres-
sions A1, A2, etc. In other words, the intersection of
A1, A2.
{? A} Optionality operator. Accepts the empty word and anything
from the regular expression A.
{* A} Kleene closure. Accepts the empty word and any finite
concatenation of words accepted by the regular expression
A.
{+ A} Positive Kleene closure. Accepts any finite concatenation
of words accepted by the regular expression A, but not
the empty word.
{! A} Complement operator. Accepts any word not accepted by the
regular expression A. Note that the complement depends on
the set of symbol the result should run over. See the
discussion of the argument over before.
::grammar::fa::op::toRegexp fa
This command generates and returns a regular expression which
accepts the same language as the finite automaton fa. The regu-
lar expression is in the format as described above, for ::gram-
mar::fa::op::fromRegex.
::grammar::fa::op::toRegexp2 fa
This command has the same functionality as ::gram-
mar::fa::op::toRegexp, but uses a different algorithm to sim-
plify the generated regular expressions.
::grammar::fa::op::toTclRegexp regexp symdict
This command generates and returns a regular expression in Tcl
syntax for the regular expression regexp, if that is possible.
regexp is in the same format as expected by ::gram-
mar::fa::op::fromRegex.
The command will fail and throw an error if regexp contains com-
plementation and intersection operations.
The argument symdict is a dictionary mapping symbol names to
pairs of syntactic type and Tcl-regexp. If a symbol occurring in
the regexp is not listed in this dictionary then single-charac-
ter symbols are considered to designate themselves whereas mul-
tiple-character symbols are considered to be a character class
name.
::grammar::fa::op::simplifyRegexp regexp
This command simplifies a regular expression by applying the
following algorithm first to the main expression and then recur-
sively to all sub-expressions:
[1] Convert the expression into a finite automaton.
[2] Minimize the automaton.
[3] Convert the automaton back to a regular expression.
[4] Choose the shorter of original expression and expression
from the previous step.
EXAMPLES
BUGS, IDEAS, FEEDBACK
This document, and the package it describes, will undoubtedly contain
bugs and other problems. Please report such in the category grammar_fa
of the Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please
also report any ideas for enhancements you may have for either package
and/or documentation.
When proposing code changes, please provide unified diffs, i.e the out-
put of diff -u.
Note further that attachments are strongly preferred over inlined
patches. Attachments can be made by going to the Edit form of the
ticket immediately after its creation, and then using the left-most
button in the secondary navigation bar.
KEYWORDS
automaton, finite automaton, grammar, parsing, regular expression, reg-
ular grammar, regular languages, state, transducer
CATEGORY
Grammars and finite automata
COPYRIGHT
Copyright (c) 2004-2008 Andreas Kupries <andreas_kupries@users.sourceforge.net>
tcllib 0.4 grammar::fa::op(3tcl)