textutil(3tcl) Text and string utilities, macro processing textutil(3tcl)
______________________________________________________________________________
NAME
textutil - Procedures to manipulate texts and strings.
SYNOPSIS
package require Tcl 8.2
package require textutil ?0.8?
::textutil::adjust string args
::textutil::adjust::readPatterns filename
::textutil::adjust::listPredefined
::textutil::adjust::getPredefined filename
::textutil::indent string prefix ?skip?
::textutil::undent string
::textutil::splitn string ?len?
::textutil::splitx string ?regexp?
::textutil::tabify string ?num?
::textutil::tabify2 string ?num?
::textutil::trim string ?regexp?
::textutil::trimleft string ?regexp?
::textutil::trimright string ?regexp?
::textutil::trimPrefix string prefix
::textutil::trimEmptyHeading string
::textutil::untabify string ?num?
::textutil::untabify2 string ?num?
::textutil::strRepeat text num
::textutil::blank num
::textutil::chop string
::textutil::tail string
::textutil::cap string
::textutil::uncap string
::textutil::longestCommonPrefixList list
::textutil::longestCommonPrefix ?string...?
______________________________________________________________________________
DESCRIPTION
The package textutil provides commands that manipulate strings or texts
(a.k.a. long strings, or strings with embedded newlines or paragraphs).
It is actually a bundle providing the commands of the six packages
textutil::adjust
textutil::repeat
textutil::split
textutil::string
textutil::tabify
textutil::trim
in the namespace textutil.
The bundle is deprecated, and it will be removed in a future release of
Tcllib, after the next release. It is recommended to use the relevant
sub packages instead for whatever functionality is needed by the using
package or application.
The complete set of procedures is described below.
::textutil::adjust string args
Justifies the string according to args. The string is taken as
one big paragraph, ignoring any newlines. The paragraph is then
formatted according to the options used, and the command returns
a new string with enough lines to contain all the printable
chars in the input string. A line is a set of chars between the
beginning of the string and a newline, or between 2 newlines, or
between a newline and the end of the string. If the input string
is short enough, the returned string won't contain any newlines.
Together with ::textutil::indent it is possible to create prop-
erly wrapped paragraphs with arbitrary indentations.
By default, any run of space or tab characters is replaced by a
single space, so each word in a line is separated from the next
one by exactly one space char; this forms a real line. Each real
line is placed in a logical line, which has exactly the given
length (see the -length option below). The real line may be
shorter. Also by default, any trailing spaces are removed before
the string is returned (see the -full option below). The
following options may be used after the string parameter; they
change the way the command places a real line in a logical line.
-full boolean
If set to false, any trailing space chars are deleted
before returning the string. If set to true, any trailing
space chars are left in the string. Defaults to false.
-hyphenate boolean
If set to false, no hyphenation is done. If set to true,
the command tries to hyphenate the last word of a line.
Defaults to false. Note: hyphenation patterns must be
loaded beforehand, using the command
::textutil::adjust::readPatterns.
-justify center|left|plain|right
Set the justification of the returned string to center,
left, plain or right. By default, it is set to left. The
justification means that every line in the returned string
except the last one is built according to the value. If the
justification is set to plain and the number of printable
chars in the last line is less than 90% of the length of
a line (see -length), then this line is justified with
the left value, avoiding the expansion of this line when
it is too short. The meaning of each value is:
center The real line is centered in the logical line. If
needed, space characters are added at the beginning
(half of the needed set) and, if required (see the
option -full), at the end (half of the needed set)
of the line.
left The real line is set on the left of the logical
line. It means that there are no space chars at
the beginning of this line. If required, all
needed space chars are added at the end of the
line (see the option -full).
plain The real line is exactly set in the logical line.
It means that there are no leading or trailing
space chars. All the needed space chars are added
in the real line, between 2 (or more) words.
right The real line is set on the right of the logical
line. It means that there are no space chars at
the end of this line, and there may be some space
chars at the beginning, regardless of the -full
option.
-length integer
Set the length of the logical line in the string to inte-
ger. integer must be a positive integer value. Defaults
to 72.
-strictlength boolean
If set to false, a line can exceed the specified -length
if a single word is longer than -length. If set to true,
words that are longer than -length are split so that no
line exceeds the specified -length. Defaults to false.
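The wrapping described above can be sketched as follows. This is a minimal illustration only; the sample text and the variable names are assumptions, not part of the package.

```tcl
package require textutil

# Illustrative input with irregular spacing, a tab and a newline.
set text "The  quick brown\tfox jumps\nover the lazy dog"

# Wrap into logical lines of 20 characters, left-justified (the default).
set wrapped [::textutil::adjust $text -length 20]
puts $wrapped

# The same text, right-justified within the 20-column logical line.
puts [::textutil::adjust $text -length 20 -justify right]
```

Because no word in the sample is longer than the requested length, no line in the result should exceed 20 characters, and the words keep their original order.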
::textutil::adjust::readPatterns filename
Loads the internal storage for hyphenation patterns with the
contents of the file filename. This has to be done prior to
calling command ::textutil::adjust with "-hyphenate true", or
the hyphenation process will not work correctly.
The package comes with a number of predefined pattern files, and
the command ::textutil::adjust::listPredefined can be used to
find out their names.
::textutil::adjust::listPredefined
This command returns a list containing the names of the hyphena-
tion files coming with this package.
::textutil::adjust::getPredefined filename
Use this command to query the package for the full path name of
the hyphenation file filename coming with the package. Only the
filenames found in the list returned by ::textutil::ad-
just::listPredefined are legal arguments for this command.
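A possible way to combine the three commands above before hyphenating is sketched below. Which pattern files are installed depends on the Tcllib installation, so the sketch simply takes the first name reported; the sample text and variable names are assumptions.

```tcl
package require textutil

# Take the first predefined pattern file; no file name is hard-coded
# because the set of shipped files may differ between installations.
set name [lindex [::textutil::adjust::listPredefined] 0]
set path [::textutil::adjust::getPredefined $name]

# Patterns have to be loaded before -hyphenate true is used.
::textutil::adjust::readPatterns $path

set text "Hyphenation patterns influence where a long word may be broken."
set out [::textutil::adjust $text -length 24 -hyphenate true]
puts $out
```

Where words are actually broken depends on the loaded patterns, so no particular output is shown here.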
::textutil::indent string prefix ?skip?
Each line in the string is indented by adding the string prefix
at its beginning. The modified string is returned as the result
of the command.
If skip is specified the first skip lines are left untouched.
The default for skip is 0, causing the modification of all
lines. Negative values for skip are treated like 0. In other
words, skip > 0 creates a hanging indentation.
Together with ::textutil::adjust it is possible to create prop-
erly wrapped paragraphs with arbitrary indentations.
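A short sketch of the effect of skip, using an illustrative three-line string (the text and variable names are assumptions):

```tcl
package require textutil

set text "first\nsecond\nthird"

# Prefix every line with two spaces.
set all [::textutil::indent $text "  "]
# all is "  first\n  second\n  third"

# skip = 1 leaves the first line untouched: a hanging indentation.
set hanging [::textutil::indent $text "  " 1]
# hanging is "first\n  second\n  third"
```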
::textutil::undent string
The command computes the common whitespace-only prefix of all
lines in string, removes it from each line, and returns the
modified string.
Lines containing only whitespace are always reduced to com-
pletely empty lines. They and empty lines are also ignored when
computing the prefix to remove.
Together with ::textutil::adjust it is possible to create prop-
erly wrapped paragraphs with arbitrary indentations.
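As an illustration of the prefix removal, with a made-up input in which every non-empty line starts with at least four spaces:

```tcl
package require textutil

set text "    alpha\n      beta\n\n    gamma"

# The common whitespace prefix (four spaces) is removed from each line;
# the empty line is ignored when computing the prefix and stays empty.
set flat [::textutil::undent $text]
# flat is "alpha\n  beta\n\ngamma"
```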
::textutil::splitn string ?len?
This command splits the given string into chunks of len
characters and returns a list containing these chunks. The
argument len defaults to 1 if none is specified. A negative
length is not allowed and will cause the command to throw an
error. An empty string is allowed as input; the command then
returns an empty list. If the length of the string is not an
integer multiple of the chunk length, then the last chunk in the
generated list will be shorter than len.
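For example, with illustrative inputs:

```tcl
package require textutil

# Eight characters in chunks of three: the last chunk is shorter.
set chunks [::textutil::splitn "abcdefgh" 3]
# chunks is the list: abc def gh

# With the default length of 1 the string is split into characters.
set chars [::textutil::splitn "abc"]
# chars is the list: a b c

# An empty string yields an empty list.
set none [::textutil::splitn ""]
```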
::textutil::splitx string ?regexp?
Splits the string and returns a list. The string is split
according to the regular expression regexp instead of a simple
list of chars. Note that if the regexp contains capturing
parentheses, the part of each separator matched by the
parentheses is added to the list as an additional element. If
the string is empty the result is the empty list, like for
split. If regexp is empty the string is split at every
character, like split does. The regular expression regexp
defaults to "[\\t \\r\\n]+".
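The difference between a plain separator and one with capturing parentheses can be seen in this small sketch (input strings and variable names are illustrative):

```tcl
package require textutil

# Default separator: runs of whitespace.
set words [::textutil::splitx "one  two\tthree"]
# words is the list: one two three

# Custom separator: a comma followed by optional whitespace.
set fields [::textutil::splitx "a, b,c" {,\s*}]
# fields is the list: a b c

# With capturing parentheses, the captured part of each separator
# is returned as an additional list element.
set withSep [::textutil::splitx "a, b,c" {(,)\s*}]
# withSep is the list: a , b , c
```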
::textutil::tabify string ?num?
Tabify the string by replacing any substring of num space chars
by a tabulation and return the result as a new string. num de-
faults to 8.
::textutil::tabify2 string ?num?
Similar to ::textutil::tabify this command tabifies the string
and returns the result as a new string. A different algorithm is
used however. Instead of replacing any substring of num spaces
this command works more like an editor. num defaults to 8.
Each line of the text in string is treated as if there are tab-
stops every num columns. Only sequences of space characters con-
taining more than one space character and found immediately be-
fore a tabstop are replaced with tabs.
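The following sketch contrasts the two tabify commands on a made-up input in which six spaces end exactly at the tabstop in column 8. The variable names are assumptions.

```tcl
package require textutil

set text "ab      cd"   ;# "ab", six spaces, "cd"

# tabify only replaces runs of exactly num (here 8) spaces,
# so this string is returned unchanged.
set t1 [::textutil::tabify $text]

# tabify2 sees that the spaces end at a tabstop and replaces them.
set t2 [::textutil::tabify2 $text]
# t2 is "ab\tcd"

# A full run of eight spaces is replaced by tabify.
set t3 [::textutil::tabify "        x"]
# t3 is "\tx"
```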
::textutil::trim string ?regexp?
Removes from string any leading and trailing substring matching
the regular expression regexp and returns the result as a new
string. This applies to every line in the string, that is any
substring between 2 newline chars, or between the beginning of
the string and a newline, or between a newline and the end of
the string, or, if the string contains no newline, between the
beginning and the end of the string. The regular expression
regexp defaults to "[ \\t]+".
::textutil::trimleft string ?regexp?
Removes from string any leading substring matching the regular
expression regexp and returns the result as a new string. This
applies to every line in the string, that is any substring
between 2 newline chars, or between the beginning of the string
and a newline, or between a newline and the end of the string,
or, if the string contains no newline, between the beginning and
the end of the string. The regular expression regexp defaults to
"[ \\t]+".
::textutil::trimright string ?regexp?
Removes from string any trailing substring matching the regular
expression regexp and returns the result as a new string. This
applies to every line in the string, that is any substring
between 2 newline chars, or between the beginning of the string
and a newline, or between a newline and the end of the string,
or, if the string contains no newline, between the beginning and
the end of the string. The regular expression regexp defaults to
"[ \\t]+".
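A brief sketch of the three trimming commands applied line by line to a made-up two-line string (variable names are assumptions):

```tcl
package require textutil

set text "  one  \n\ttwo\t"

set both  [::textutil::trim      $text]   ;# "one\ntwo"
set left  [::textutil::trimleft  $text]   ;# "one  \ntwo\t"
set right [::textutil::trimright $text]   ;# "  one\n\ttwo"

# A custom regular expression: trim runs of dashes instead of blanks.
set core [::textutil::trim "--x--" {-+}]  ;# "x"
```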
::textutil::trimPrefix string prefix
Removes the prefix from the beginning of string and returns the
result. The string is left unchanged if it doesn't have prefix
at its beginning.
::textutil::trimEmptyHeading string
Looks for empty lines (including lines consisting of only
whitespace) at the beginning of the string and removes them. The
modified string is returned as the result of the command.
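For illustration, ::textutil::trimPrefix and ::textutil::trimEmptyHeading applied to made-up inputs:

```tcl
package require textutil

# The prefix is removed only if the string actually starts with it.
set a [::textutil::trimPrefix "foobar" "foo"]   ;# "bar"
set b [::textutil::trimPrefix "foobar" "bar"]   ;# "foobar", unchanged

# Leading empty and whitespace-only lines are dropped.
set c [::textutil::trimEmptyHeading "\n  \nfoo\nbar"]   ;# "foo\nbar"
```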
::textutil::untabify string ?num?
Untabify the string by replacing any tabulation char by a sub-
string of num space chars and return the result as a new string.
num defaults to 8.
::textutil::untabify2 string ?num?
Untabify the string by replacing any tabulation char by a sub-
string of at most num space chars and return the result as a new
string. Unlike textutil::untabify each tab is not replaced by a
fixed number of space characters. The command overlays each
line in the string with tabstops every num columns instead and
replaces tabs with just enough space characters to reach the
next tabstop. This is the complement of the actions taken by
::textutil::tabify2. num defaults to 8.
There is one asymmetry though: A tab can be replaced with a sin-
gle space, but not the other way around.
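The difference between the two untabify commands shows up when a tab does not start at a tabstop. A minimal sketch with an illustrative input:

```tcl
package require textutil

set text "ab\tcd"

# untabify replaces the tab by a fixed run of num (here 8) spaces.
set u1 [::textutil::untabify $text]
# u1 is "ab", eight spaces, "cd"

# untabify2 pads only up to the next tabstop (column 8), so the tab
# after the two characters "ab" becomes six spaces.
set u2 [::textutil::untabify2 $text]
# u2 is "ab", six spaces, "cd"
```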
::textutil::strRepeat text num
The implementation depends on the core executing the package. It
uses string repeat if it is present, or a fast Tcl
implementation if it is not. Returns a string containing the
text repeated num times. The repetitions are joined without
characters between them. A value of num <= 0 causes the command
to return an empty string.
::textutil::blank num
A convenience command. Returns a string of num spaces.
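For example (illustrative values):

```tcl
package require textutil

set s [::textutil::strRepeat "ab" 3]   ;# "ababab"
set e [::textutil::strRepeat "ab" 0]   ;# empty string
set p [::textutil::blank 4]            ;# four spaces
```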
::textutil::chop string
A convenience command. Removes the last character of string and
returns the shortened string.
::textutil::tail string
A convenience command. Removes the first character of string and
returns the shortened string.
::textutil::cap string
Capitalizes the first character of string and returns the modi-
fied string.
::textutil::uncap string
The complementary operation to ::textutil::cap. Forces the first
character of string to lower case and returns the modified
string.
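The four convenience commands chop, tail, cap and uncap can be illustrated on a single made-up word:

```tcl
package require textutil

set word "hello"

set chopped [::textutil::chop  $word]     ;# "hell"
set rest    [::textutil::tail  $word]     ;# "ello"
set upper   [::textutil::cap   $word]     ;# "Hello"
set lower   [::textutil::uncap "Hello"]   ;# "hello"
```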
::textutil::longestCommonPrefixList list
::textutil::longestCommonPrefix ?string...?
Computes the longest common prefix for either the strings given
to the command, or the strings specified in the single list, and
returns it as the result of the command.
If no strings were specified the result is the empty string. If
only one string was specified, the string itself is returned, as
it is its own longest common prefix.
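A small sketch of both calling conventions, using illustrative strings:

```tcl
package require textutil

# Strings given as separate arguments.
set p1 [::textutil::longestCommonPrefix foobar foobaz foo]
# p1 is "foo"

# The same computation with the strings passed as a single list.
set p2 [::textutil::longestCommonPrefixList {interpret interp internal}]
# p2 is "inter"

# One string is its own prefix; no strings give the empty string.
set p3 [::textutil::longestCommonPrefix single]
set p4 [::textutil::longestCommonPrefix]
```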
BUGS, IDEAS, FEEDBACK
This document, and the package it describes, will undoubtedly contain
bugs and other problems. Please report such in the category textutil
of the Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please
also report any ideas for enhancements you may have for either package
and/or documentation.
When proposing code changes, please provide unified diffs, i.e. the
output of diff -u.
Note further that attachments are strongly preferred over inlined
patches. Attachments can be made by going to the Edit form of the
ticket immediately after its creation, and then using the left-most
button in the secondary navigation bar.
SEE ALSO
regexp(3tcl), split(3tcl), string(3tcl)
KEYWORDS
TeX, formatting, hyphenation, indenting, paragraph, regular expression,
string, trimming
CATEGORY
Text processing
tcllib 0.8 textutil(3tcl)