string::token::shell(3tcl)   Text and string utilities   string::token::shell(3tcl)
______________________________________________________________________________
NAME
string::token::shell - Parsing of shell command line
SYNOPSIS
package require Tcl 8.5
package require string::token::shell ?1.2?
package require string::token ?1?
package require fileutil
::string token shell ?-indices? ?-partial? ?--? string
______________________________________________________________________________
DESCRIPTION
This package provides a command which parses a line of text using basic
sh-syntax into a list of words.
The complete set of procedures is described below.
::string token shell ?-indices? ?-partial? ?--? string
This command parses the input string, assuming that it follows
basic sh-syntax. The result of the command is a list of the
words in the string. An error is thrown if the input does not
follow the allowed syntax. The behaviour can be modified by
specifying either or both of the options -indices and -partial.
-- When specified, option parsing stops at this point. This
option is needed if the input string may start with a dash.
In other words, it is all but required when string is
user input.
-indices
When specified, the output is not a list of words, but a
list of 4-tuples describing the words. Each tuple contains
the type of the word, its start and end indices in the
input, and the actual text of the word.
Note that the length of the word as given by the indices
can differ from the length of the word found in the last
element of the tuple. The indices describe the word's
extent in the input, including delimiters, intra-word
quoting, etc., whereas in the actual text of the word the
delimiters are stripped, intra-word quoting is decoded, etc.
The possible token types are
PLAIN Plain word, not quoted.
D:QUOTED
Word is delimited by double-quotes.
S:QUOTED
Word is delimited by single-quotes.
D:QUOTED:PART
S:QUOTED:PART
Like the previous types, but the word has no closing
quote, i.e. is incomplete. These token types can
occur only if the option -partial was specified, and
only for the last word of the result. If the option
-partial was not specified, such incomplete words
cause the command to throw an error instead.
-partial
When specified the parser will accept an incomplete
quoted word (i.e. without closing quote) at the end of
the line as valid instead of throwing an error.
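As a brief illustration of the options described above, the following
sketch tokenizes a line in each mode. The sample inputs and variable
names are made up for this example, and the comments state what the
description leads one to expect rather than captured output.

```tcl
package require Tcl 8.5
package require string::token::shell

# Hypothetical sample input; any line of basic sh-syntax would do.
set line {ls -l "my file.txt"}

# Default mode: a list of decoded words. The "--" guards against
# input that itself starts with a dash.
set words [::string token shell -- $line]
# Per the description: three words, the last one being
# my file.txt, without its delimiting quotes.

# -indices: one 4-tuple (type, start, end, text) per word.
foreach tuple [::string token shell -indices -- $line] {
    lassign $tuple type start end text
    puts [format "%-13s %2d..%2d  <%s>" $type $start $end $text]
}

# -partial: an unterminated quoted word at the end of the line is
# accepted and reported with a :PART token type.
set partial  [::string token shell -indices -partial -- {echo "unfinished}]
set lastType [lindex $partial end 0]

# Without -partial the same input is rejected with an error.
set failed [catch {::string token shell -- {echo "unfinished}} message]
```

Combining -indices with -partial in this way may suit interactive
uses such as command-line completion, where the last word is often
still being typed.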
The basic shell syntax accepted here consists of unquoted, single-quoted,
and double-quoted words, separated by whitespace. Leading and trailing
whitespace is allowed and is stripped. Shell variables in their various
forms are not recognized, nor are sub-shells. The recognized forms of
words are specified in detail below.
single-quoted word
A single-quoted word begins with a single-quote character,
i.e. ' (ASCII 39), followed by zero or more Unicode
characters that are not single-quotes, and is closed by a
single-quote.
The word must be followed by either the end of the string
or whitespace. Another word cannot directly follow it.
double-quoted word
A double-quoted word begins with a double-quote character,
i.e. " (ASCII 34), followed by zero or more Unicode
characters that are not double-quotes, and is closed by a
double-quote.
In contrast to single-quoted words, a double-quote can be
embedded into the word by escaping it with a preceding
backslash character \ (ASCII 92). Similarly, a backslash
character must be escaped with itself to be inserted
literally.
unquoted word
Unquoted words are not delimited by quotes and thus cannot
contain whitespace or single-quote characters. Double-quote
and backslash characters can be put into unquoted words by
escaping them as in double-quoted words.
whitespace
Whitespace is any Unicode space character. This is
equivalent to string is space, or the regular expression
\s.
Whitespace may occur before the first word, or after the
last word. Whitespace must occur between adjacent words.
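The word forms specified above can be tried out with a small loop
such as the one below. This is only a sketch: the inputs, the notes,
and the variable names are invented for illustration, and each note
names the rule being exercised rather than quoting verified output.

```tcl
package require Tcl 8.5
package require string::token::shell

# Pairs of (input, rule exercised). All inputs are hypothetical.
set results {}
foreach {input note} {
    {  alpha   beta  }   {leading and trailing whitespace is stripped}
    {'a "b" c'}          {single-quoted: double-quotes are ordinary characters}
    {"say \"hi\""}       {double-quoted: backslash embeds a double-quote}
    {"C:\\temp"}         {double-quoted: backslash embeds a backslash}
    {a\"b}               {unquoted: escaping as in double-quoted words}
} {
    set words [::string token shell -- $input]
    lappend results $words
    puts [format "%-18s -> %s" $input $words]
    puts "    ($note)"
}

# A word directly following a quoted word, with no whitespace in
# between, should be rejected according to the rules above.
if {[catch {::string token shell -- {'a'b}} message]} {
    puts "rejected: $message"
}
```

Keeping the inputs in braces prevents Tcl itself from interpreting
the quotes and backslashes, so the parser receives them unchanged.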
BUGS, IDEAS, FEEDBACK
This document, and the package it describes, will undoubtedly contain
bugs and other problems. Please report such in the category textutil
of the Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please
also report any ideas for enhancements you may have for either package
and/or documentation.
When proposing code changes, please provide unified diffs, i.e. the
output of diff -u.
Note further that attachments are strongly preferred over inlined
patches. Attachments can be made by going to the Edit form of the
ticket immediately after its creation, and then using the left-most
button in the secondary navigation bar.
KEYWORDS
bash, lexing, parsing, shell, string, tokenization
CATEGORY
Text processing
tcllib 1.2 string::token::shell(3tcl)