pt_peg_introduction(3)


pt::pegrammar(3tcl)              Parser Tools              pt::pegrammar(3tcl)

______________________________________________________________________________

NAME
       pt::pegrammar - Introduction to Parsing Expression Grammars

SYNOPSIS
       package require Tcl  8.5

______________________________________________________________________________

DESCRIPTION
       Are  you  lost ?  Do you have trouble understanding this document ?  In
       that case please read the overview  provided  by  the  Introduction  to
       Parser  Tools.  This document is the entrypoint to the whole system the
       current package is a part of.

       Welcome to the introduction  to  Parsing  Expression  Grammars  (short:
       PEG),  the  formalism used by the Parser Tools.  It is assumed that the
       reader has a basic knowledge of parsing theory, i.e. Context-Free Gram-
       mars  (short:  CFG), languages, and associated terms like LL(k), LR(k),
       terminal and nonterminal symbols, etc.  We do not intend  to  recapitu-
       late   such   basic   definitions  or  terms  like  useful,  reachable,
       (left/right) recursive, nullable, first/last/follow sets, etc.   Please
       see  the References at the end instead if you are in need of places and
       books which provide such background information.

       PEGs are formally very similar to CFGs, with terminal  and  nonterminal
       symbols, start symbol, and rules defining the structure of each nonter-
       minal symbol.  The main difference lies in the choice(sic!)  of  choice
       operators. Where CFGs use an unordered choice to represent alternatives
       PEGs use prioritized choice. Which is fancy way of saying that a parser
       has  to  try the first alternative first and can try the other alterna-
       tives if only if it fails for the first, and so on.

       On the CFG side this gives rise to  LL(k)  and  LR(k)  for  making  the
       choice  deterministic  with  a bounded lookahead of k terminal symbols,
       where LL is in essence topdown aka recursive descent  parsing,  and  LR
       bottomup aka shift reduce parsing.

       On  the  PEG  side  we can parse input with recursive descent and back-
       tracking of failed choices, the latter of which  amounts  to  unlimited
       lookahead.  By additionally recording the success or failure of nonter-
       minals at the specific locations they were tried at  and  reusing  this
       information  after  backtracking we can avoid the exponential blowup of
       running time usually associated with backtracking and keep the  parsing
       linear. The memory requirements are of course higher due to this cache,
       as we are trading space for time.

       This is the basic concept behind packrat parsers.

       A limitation pure PEGs share with LL(k)  CFGs  is  that  left-recursive
       grammars cannot be parsed, with the associated recursive descent parser
       entering an infinite recursion.  This limitation is usually overcome by
       extending pure PEGs with explicit operators to specify repetition, zero
       or more, and one or more, or, formally spoken, for the  kleene  closure
       and positive kleene closure.  This is what the Parser Tools are doing.

       Another  extension,  specific  to  Parser  Tools, is a set of operators
       which map more or less directly to various character classes built into
       Tcl, i.e. the classes reachable via string is.

       The  remainder  of  this  document consists of the formal definition of
       PEGs for the mathematically inclined, and an  appendix  listing  refer-
       ences to places with more information on PEGs specifically, and parsing
       in general.

FORMAL DEFINITION
       For the mathematically inclined, a  Parsing  Expression  Grammar  is  a
       4-tuple (VN,VT,R,eS) where

       o      VN is a set of nonterminal symbols,

       o      VT is a set of terminal symbols,

       o      R  is  a finite set of rules, where each rule is a pair (A,e), A
              in VN, and e a parsing expression.

       o      eS is a parsing expression, the start expression.

       Further constraints are

       o      The intersection of VN and VT is empty.

       o      For all A in VT exists exactly one pair (A,e)  in  R.  In  other
              words,  R  is a function from nonterminal symbols to parsing ex-
              pressions.

       Parsing expressions are inductively defined via

       o      The empty string (epsilon) is a parsing expression.

       o      A terminal symbol a is a parsing expression.

       o      A nonterminal symbol A is a parsing expression.

       o      e1e2 is a parsing expression for parsing expressions e1  and  2.
              This is called sequence.

       o      e1/e2  is a parsing expression for parsing expressions e1 and 2.
              This is called ordered choice.

       o      e* is a parsing expression for parsing  expression  e.  This  is
              called zero-or-more repetitions, also known as kleene closure.

       o      e+  is  a  parsing  expression for parsing expression e. This is
              called one-or-more repetitions, also known  as  positive  kleene
              closure.

       o      !e  is  a  parsing expression for parsing expression e1. This is
              called a not lookahead predicate.

       o      &e is a parsing expression for parsing expression  e1.  This  is
              called an and lookahead predicate.

       PEGs  are used to define a grammatical structure for streams of symbols
       over VT. They are a modern phrasing of  older  formalisms  invented  by
       Alexander  Birham.  These  formalisms  were  called TS (TMG recognition
       scheme), and gTS (generalized TS). Later  they  were  renamed  to  TPDL
       (Top-Down Parsing Languages) and gTPDL (generalized TPDL).

       They  can be easily implemented by recursive descent parsers with back-
       tracking. This makes them relatives of LL(k) Context-Free Grammars.

REFERENCES
       [1]    The  Packrat  Parsing  and  Parsing  Expression  Grammars   Page
              [http://www.pdos.lcs.mit.edu/~baford/packrat/],  by  Bryan Ford,
              Massachusetts Institute of Technology. This is  the  main  entry
              page to PEGs, and their realization through Packrat Parsers.

       [2]    http://en.wikipedia.org/wiki/Parsing_expression_grammar
              Wikipedia's entry about Parsing Expression Grammars.

       [3]    Parsing      Techniques      -      A      Practical       Guide
              [http://www.cs.vu.nl/~dick/PTAPG.html],  an online book offering
              a clear, accessible, and thorough discussion of  many  different
              parsing  techniques  with  their interrelations and applicabili-
              ties, including error recovery techniques.

       [4]    Compilers and Compiler  Generators  [http://scifac.ru.ac.za/com-
              pilers/], an online book using CoCo/R, a generator for recursive
              descent parsers.

BUGS, IDEAS, FEEDBACK
       This document, and the package it describes, will  undoubtedly  contain
       bugs  and other problems.  Please report such in the category pt of the
       Tcllib Trackers  [http://core.tcl.tk/tcllib/reportlist].   Please  also
       report  any  ideas  for  enhancements  you  may have for either package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e the out-
       put of diff -u.

       Note  further  that  attachments  are  strongly  preferred over inlined
       patches. Attachments can be made by going  to  the  Edit  form  of  the
       ticket  immediately  after  its  creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS
       EBNF, LL(k), PEG, TDPL, context-free  languages,  expression,  grammar,
       matching,  parser, parsing expression, parsing expression grammar, push
       down automaton, recursive descent, state, top-down  parsing  languages,
       transducer

CATEGORY
       Parsing and Grammars

COPYRIGHT
       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                                 1                   pt::pegrammar(3tcl)
Man(1) output converted with man2html
list of all man pages