docstrip(3tcl) Literate programming tool docstrip(3tcl)
______________________________________________________________________________
NAME
docstrip - Docstrip style source code extraction
SYNOPSIS
package require Tcl 8.4
package require docstrip ?1.2?
docstrip::extract text terminals ?option value ...?
docstrip::sourcefrom filename terminals ?option value ...?
______________________________________________________________________________
DESCRIPTION
Docstrip is a tool created to support a brand of Literate Programming.
It is most common in the (La)TeX community, where it is being used for
pretty much everything from the LaTeX core and up, but there is nothing
about docstrip which prevents using it for other types of software.
In short, the basic principle of literate programming is that program
source should primarily be written and structured to suit the develop-
ers (and advanced users who want to peek "under the hood"), not to suit
the whims of a compiler or corresponding source code consumer. This
means literate sources often need some kind of "translation" to an il-
literate form that dumb software can understand. The docstrip Tcl
package handles this translation.
Even for those who do not whole-hartedly subscribe to the philosophy
behind literate programming, docstrip can bring greater clarity to in
particular:
o programs employing non-obvious mathematics
o projects where separate pieces of code, perhaps in different
languages, need to be closely coordinated.
The first is by providing access to much more powerful typographical
features for source code comments than are possible in plain text. The
second is because all the separate pieces of code can be kept next to
each other in the same source file.
The way it works is that the programmer edits directly only one or sev-
eral "master" source code files, from which docstrip generates the more
traditional "source" files compilers or the like would expect. The mas-
ter sources typically contain a large amount of documentation of the
code, sometimes even in places where the code consumers would not allow
any comments. The etymology of "docstrip" is that this documentation
was stripped away (although "code extraction" might be a better de-
scription, as it has always been a matter of copying selected pieces of
the master source rather than deleting text from it). The docstrip Tcl
package contains a reimplementation of the basic extraction functional-
ity from the docstrip program, and thus makes it possible for a Tcl in-
terpreter to read and interpret the master source files directly.
Readers who are not previously familiar with docstrip but want to know
more about it may consult the following sources.
[1] The tclldoc package and class, http://ctan.org/tex-ar-
chive/macros/latex/contrib/tclldoc/.
[2] The DocStrip utility, http://ctan.org/tex-archive/macros/la-
tex/base/docstrip.dtx.
[3] The doc and shortvrb Packages, http://ctan.org/tex-ar-
chive/macros/latex/base/doc.dtx.
[4] Chapter 14 of The LaTeX Companion (second edition), Addison-Wes-
ley, 2004; ISBN 0-201-36299-6.
FILE FORMAT
The basic unit docstrip operates on are the lines of a master source
file. Extraction consists of selecting some of these lines to be copied
from input text to output text. The basic distinction is that between
code lines (which are copied and do not begin with a percent character)
and comment lines (which begin with a percent character and are not
copied).
docstrip::extract [join {
{% comment}
{% more comment !"#$%&/(}
{some command}
{ % blah $blah "Not a comment."}
{% abc; this is comment}
{# def; this is code}
{ghi}
{% jkl}
} \n] {}
returns the same sequence of lines as
join {
{some command}
{ % blah $blah "Not a comment."}
{# def; this is code}
{ghi} ""
} \n
It does not matter to docstrip what format is used for the documenta-
tion in the comment lines, but in order to do better than plain text
comments, one typically uses some markup language. Most commonly LaTeX
is used, as that is a very established standard and also provides the
best support for mathematical formulae, but the docstrip::util package
also gives some support for doctools-like markup.
Besides the basic code and comment lines, there are also guard lines,
which begin with the two characters '%<', and meta-comment lines, which
begin with the two characters '%%'. Within guard lines there is fur-
thermore the distinction between verbatim guard lines, which begin with
'%<<', and ordinary guard lines, where the '%<' is not followed by an-
other '<'. The last category is by far the most common.
Ordinary guard lines conditions extraction of the code line(s) they
guard by the value of a boolean expression; the guarded block of code
lines will only be included if the expression evaluates to true. The
syntax of an ordinary guard line is one of
'%' '<' STARSLASH EXPRESSION '>'
'%' '<' PLUSMINUS EXPRESSION '>' CODE
where
STARSLASH ::= '*' | '/'
PLUSMINUS ::= | '+' | '-'
EXPRESSION ::= SECONDARY | SECONDARY ',' EXPRESSION
| SECONDARY '|' EXPRESSION
SECONDARY ::= PRIMARY | PRIMARY '&' SECONDARY
PRIMARY ::= TERMINAL | '!' PRIMARY | '(' EXPRESSION ')'
CODE ::= { any character except end-of-line }
Comma and vertical bar both denote 'or'. Ampersand denotes 'and'. Ex-
clamation mark denotes 'not'. A TERMINAL can be any nonempty string of
characters not containing '>', '&', '|', comma, '(', or ')', although
the docstrip manual is a bit restrictive and only guarantees proper op-
eration for strings of letters (although even the LaTeX core sources
make heavy use also of digits in TERMINALs). The second argument of
docstrip::extract is the list of those TERMINALs that should count as
having the value 'true'; all other TERMINALs count as being 'false'
when guard expressions are evaluated.
In the case of a '%<*EXPRESSION>' guard, the lines guarded are all
lines up to the next '%</EXPRESSION>' guard with the same EXPRESSION
(compared as strings). The blocks of code delimited by such '*' and '/'
guard lines must be properly nested.
set text [join {
{begin}
{%<*foo>}
{1}
{%<*bar>}
{2}
{%</bar>}
{%<*!bar>}
{3}
{%</!bar>}
{4}
{%</foo>}
{5}
{%<*bar>}
{6}
{%</bar>}
{end}
} \n]
set res [docstrip::extract $text foo]
append res [docstrip::extract $text {foo bar}]
append res [docstrip::extract $text bar]
sets $res to the result of
join {
{begin}
{1}
{3}
{4}
{5}
{end}
{begin}
{1}
{2}
{4}
{5}
{6}
{end}
{begin}
{5}
{6}
{end} ""
} \n
In guard lines without a '*', '/', '+', or '-' modifier after the '%<',
the guard applies only to the CODE following the '>' on that single
line. A '+' modifier is equivalent to no modifier. A '-' modifier is
like the case with no modifier, but the expression is implicitly
negated, i.e., the CODE of a '%<-' guard line is only included if the
expression evaluates to false.
Metacomment lines are "comment lines which should not be stripped
away", but be extracted like code lines; these are sometimes used for
copyright notices and similar material. The '%%' prefix is however not
kept, but substituted by the current -metaprefix, which is customarily
set to some "comment until end of line" character (or character se-
quence) of the language of the code being extracted.
set text [join {
{begin}
{%<foo> foo}
{%<+foo>plusfoo}
{%<-foo>minusfoo}
{middle}
{%% some metacomment}
{%<*foo>}
{%%another metacomment}
{%</foo>}
{end}
} \n]
set res [docstrip::extract $text foo -metaprefix {# }]
append res [docstrip::extract $text bar -metaprefix {#}]
sets $res to the result of
join {
{begin}
{ foo}
{plusfoo}
{middle}
{# some metacomment}
{# another metacomment}
{end}
{begin}
{minusfoo}
{middle}
{# some metacomment}
{end} ""
} \n
Verbatim guards can be used to force code line interpretation of a
block of lines even if some of them happen to look like any other type
of lines to docstrip. A verbatim guard has the form '%<<END-TAG' and
the verbatim block is terminated by the first line that is exactly
'%END-TAG'.
set text [join {
{begin}
{%<*myblock>}
{some stupid()}
{ #computer<program>}
{%<<QQQ-98765}
{% These three lines are copied verbatim (including percents}
{%% even if -metaprefix is something different than %%).}
{%</myblock>}
{%QQQ-98765}
{ using*strange@programming<language>}
{%</myblock>}
{end}
} \n]
set res [docstrip::extract $text myblock -metaprefix {# }]
append res [docstrip::extract $text {}]
sets $res to the result of
join {
{begin}
{some stupid()}
{ #computer<program>}
{% These three lines are copied verbatim (including percents}
{%% even if -metaprefix is something different than %%).}
{%</myblock>}
{ using*strange@programming<language>}
{end}
{begin}
{end} ""
} \n
The processing of verbatim guards takes place also inside blocks of
lines which due to some outer block guard will not be copied.
The final piece of docstrip syntax is that extraction stops at a line
that is exactly "\endinput"; this is often used to avoid copying random
whitespace at the end of a file. In the unlikely case that one wants
such a code line, one can protect it with a verbatim guard.
COMMANDS
The package defines two commands.
docstrip::extract text terminals ?option value ...?
The extract command docstrips the text and returns the extracted
lines of code, as a string with each line terminated with a new-
line. The terminals is the list of those guard expression termi-
nals which should evaluate to true. The available options are:
-annotate lines
Requests the specified number of lines of annotation to
follow each extracted line in the result. Defaults to 0.
Annotation lines are mostly useful when the extracted
lines are to undergo some further transformation. A first
annotation line is a list of three elements: line type,
prefix removed in extraction, and prefix inserted in ex-
traction. The line type is one of: 'V' (verbatim), 'M'
(metacomment), '+' (+ or no modifier guard line), '-' (-
modifier guard line), '.' (normal line). A second annota-
tion line is the source line number. A third annotation
line is the current stack of block guards. Requesting
more than three lines of annotation is currently not sup-
ported.
-metaprefix string
The string by which the '%%' prefix of a metacomment line
will be replaced. Defaults to '%%'. For Tcl code this
would typically be '#'.
-onerror keyword
Controls what will be done when a format error in the
text being processed is detected. The settings are:
ignore Just ignore the error; continue as if nothing hap-
pened.
puts Write an error message to stderr, then continue
processing.
throw Throw an error. The -errorcode is set to a list
whose first element is DOCSTRIP, second element is
the type of error, and third element is the line
number where the error is detected. This is the
default.
-trimlines boolean
Controls whether spaces at the end of a line should be
trimmed away before the line is processed. Defaults to
true.
It should be remarked that the terminals are often called "op-
tions" in the context of the docstrip program, since these spec-
ify which optional code fragments should be included.
docstrip::sourcefrom filename terminals ?option value ...?
The sourcefrom command is a docstripping emulation of source. It
opens the file filename, reads it, closes it, docstrips the con-
tents as specified by the terminals, and evaluates the result in
the local context of the caller, during which time the info
script value will be the filename. The options are passed on to
fconfigure to configure the file before its contents are read.
The -metaprefix is set to '#', all other extract options have
their default values.
DOCUMENT STRUCTURE
The file format (as described above) determines whether a master source
code file can be processed correctly by docstrip, but the usefulness of
the format is to no little part also dependent on that the code and
comment lines together constitute a well-formed document.
For a document format that does not require any non-Tcl software, see
the ddt2man command in the docstrip::util package. It is suggested that
files employing that document format are given the suffix ".ddt", to
distinguish them from the more traditional LaTeX-based ".dtx" files.
Master source files with ".dtx" extension are usually set up so that
they can be typeset directly by latex without any support from other
files. This is achieved by beginning the file with the lines
% \iffalse
%<*driver>
\documentclass{tclldoc}
\begin{document}
\DocInput{filename.dtx}
\end{document}
%</driver>
% \fi
or some variation thereof. The trick is that the file gets read twice.
With normal LaTeX reading rules, the first two lines are comments and
therefore ignored. The third line is the document preamble, the fourth
line begins the document body, and the sixth line ends the document, so
LaTeX stops there -- non-comments below that point in the file are
never subjected to the normal LaTeX reading rules. Before that, how-
ever, the \DocInput command on the fifth line is processed, and that
does two things: it changes the interpretation of '%' from "comment" to
"ignored", and it inputs the file specified in the argument (which is
normally the name of the file the command is in). It is this second
time that the file is being read that the comments and code in it are
typeset.
The function of the \iffalse ... \fi is to skip lines two to seven on
this second time through; this is similar to the "if 0 { ... }" idiom
for block comments in Tcl code, and it is needed here because (amongst
other things) the \documentclass command may only be executed once. The
function of the <driver> guards is to prevent this short piece of LaTeX
code from being extracted by docstrip. The total effect is that the
file can function both as a LaTeX document and as a docstrip master
source code file.
It is not necessary to use the tclldoc document class, but that does
provide a number of features that are convenient for ".dtx" files con-
taining Tcl code. More information on this matter can be found in the
references above.
SEE ALSO
docstrip_util
KEYWORDS
\.dtx, LaTeX, docstrip, documentation, literate programming, source
CATEGORY
Documentation tools
COPYRIGHT
Copyright (c) 2003-2010 Lars Hellstrom <Lars dot Hellstrom at residenset dot net>
tcllib 1.2 docstrip(3tcl)