ms_transform(3erl) Erlang Module Definition ms_transform(3erl)
NAME
ms_transform - A parse transformation that translates fun syntax into
match
specifications.
DESCRIPTION
This module provides the parse transformation that makes calls to ets
and dbg:fun2ms/1 translate into literal match specifications. It also
provides the back end for the same functions when called from the Er-
lang shell.
The translation from funs to match specifications is accessed through
the two "pseudo functions" ets:fun2ms/1 and dbg:fun2ms/1.
As everyone trying to use ets:select/2 or dbg seems to end up reading
this manual page, this description is an introduction to the concept of
match specifications.
Read the whole manual page if it is the first time you are using the
transformations.
Match specifications are used more or less as filters. They resemble
usual Erlang matching in a list comprehension or in a fun used with
lists:foldl/3, and so on. However, the syntax of pure match specifica-
tions is awkward, as they are made up purely by Erlang terms, and the
language has no syntax to make the match specifications more readable.
As the execution and structure of the match specifications are like
that of a fun, it is more straightforward to write it using the famil-
iar fun syntax and to have that translated into a match specification
automatically. A real fun is clearly more powerful than the match spec-
ifications allow, but bearing the match specifications in mind, and
what they can do, it is still more convenient to write it all as a fun.
This module contains the code that translates the fun syntax into match
specification terms.
EXAMPLE 1
Using ets:select/2 and a match specification, one can filter out rows
of a table and construct a list of tuples containing relevant parts of
the data in these rows. One can use ets:foldl/3 instead, but the
ets:select/2 call is far more efficient. Without the translation pro-
vided by ms_transform, one must struggle with writing match specifica-
tions terms to accommodate this.
Consider a simple table of employees:
-record(emp, {empno, %Employee number as a string, the key
surname, %Surname of the employee
givenname, %Given name of employee
dept, %Department, one of {dev,sales,prod,adm}
empyear}). %Year the employee was employed
We create the table using:
ets:new(emp_tab, [{keypos,#emp.empno},named_table,ordered_set]).
We fill the table with randomly chosen data:
[{emp,"011103","Black","Alfred",sales,2000},
{emp,"041231","Doe","John",prod,2001},
{emp,"052341","Smith","John",dev,1997},
{emp,"076324","Smith","Ella",sales,1995},
{emp,"122334","Weston","Anna",prod,2002},
{emp,"535216","Chalker","Samuel",adm,1998},
{emp,"789789","Harrysson","Joe",adm,1996},
{emp,"963721","Scott","Juliana",dev,2003},
{emp,"989891","Brown","Gabriel",prod,1999}]
Assuming that we want the employee numbers of everyone in the sales de-
partment, there are several ways.
ets:match/2 can be used:
1> ets:match(emp_tab, {'_', '$1', '_', '_', sales, '_'}).
[["011103"],["076324"]]
ets:match/2 uses a simpler type of match specification, but it is still
unreadable, and one has little control over the returned result. It is
always a list of lists.
ets:foldl/3 or ets:foldr/3 can be used to avoid the nested lists:
ets:foldr(fun(#emp{empno = E, dept = sales},Acc) -> [E | Acc];
(_,Acc) -> Acc
end,
[],
emp_tab).
The result is ["011103","076324"]. The fun is straightforward, so the
only problem is that all the data from the table must be transferred
from the table to the calling process for filtering. That is ineffi-
cient compared to the ets:match/2 call where the filtering can be done
"inside" the emulator and only the result is transferred to the
process.
Consider a "pure" ets:select/2 call that does what ets:foldr does:
ets:select(emp_tab, [{#emp{empno = '$1', dept = sales, _='_'},[],['$1']}]).
Although the record syntax is used, it is still hard to read and even
harder to write. The first element of the tuple, #emp{empno = '$1',
dept = sales, _='_'}, tells what to match. Elements not matching this
are not returned, as in the ets:match/2 example. The second element,
the empty list, is a list of guard expressions, which we do not need.
The third element is the list of expressions constructing the return
value (in ETS this is almost always a list containing one single term).
In our case '$1' is bound to the employee number in the head (first el-
ement of the tuple), and hence the employee number is returned. The re-
sult is ["011103","076324"], as in the ets:foldr/3 example, but the re-
sult is retrieved much more efficiently in terms of execution speed and
memory consumption.
Using ets:fun2ms/1, we can combine the ease of use of the ets:foldr/3
and the efficiency of the pure ets:select/2 example:
-include_lib("stdlib/include/ms_transform.hrl").
ets:select(emp_tab, ets:fun2ms(
fun(#emp{empno = E, dept = sales}) ->
E
end)).
This example requires no special knowledge of match specifications to
understand. The head of the fun matches what you want to filter out and
the body returns what you want returned. As long as the fun can be kept
within the limits of the match specifications, there is no need to
transfer all table data to the process for filtering as in the
ets:foldr/3 example. It is easier to read than the ets:foldr/3 example,
as the select call in itself discards anything that does not match,
while the fun of the ets:foldr/3 call needs to handle both the elements
matching and the ones not matching.
In the ets:fun2ms/1 example above, it is needed to include ms_trans-
form.hrl in the source code, as this is what triggers the parse trans-
formation of the ets:fun2ms/1 call to a valid match specification. This
also implies that the transformation is done at compile time (except
when called from the shell) and therefore takes no resources in run-
time. That is, although you use the more intuitive fun syntax, it gets
as efficient in runtime as writing match specifications by hand.
EXAMPLE 2
Assume that we want to get all the employee numbers of employees hired
before year 2000. Using ets:match/2 is not an alternative here, as re-
lational operators cannot be expressed there. Once again, ets:foldr/3
can do it (slowly, but correct):
ets:foldr(fun(#emp{empno = E, empyear = Y},Acc) when Y < 2000 -> [E | Acc];
(_,Acc) -> Acc
end,
[],
emp_tab).
The result is ["052341","076324","535216","789789","989891"], as ex-
pected. The equivalent expression using a handwritten match specifica-
tion would look like this:
ets:select(emp_tab, [{#emp{empno = '$1', empyear = '$2', _='_'},
[{'<', '$2', 2000}],
['$1']}]).
This gives the same result. [{'<', '$2', 2000}] is in the guard part
and therefore discards anything that does not have an empyear (bound to
'$2' in the head) less than 2000, as the guard in the foldr/3 example.
We write it using ets:fun2ms/1:
-include_lib("stdlib/include/ms_transform.hrl").
ets:select(emp_tab, ets:fun2ms(
fun(#emp{empno = E, empyear = Y}) when Y < 2000 ->
E
end)).
EXAMPLE 3
Assume that we want the whole object matching instead of only one ele-
ment. One alternative is to assign a variable to every part of the
record and build it up once again in the body of the fun, but the fol-
lowing is easier:
ets:select(emp_tab, ets:fun2ms(
fun(Obj = #emp{empno = E, empyear = Y})
when Y < 2000 ->
Obj
end)).
As in ordinary Erlang matching, you can bind a variable to the whole
matched object using a "match inside the match", that is, a =. Unfortu-
nately in funs translated to match specifications, it is allowed only
at the "top-level", that is, matching the whole object arriving to be
matched into a separate variable. If you are used to writing match
specifications by hand, we mention that variable A is simply translated
into '$_'. Alternatively, pseudo function object/0 also returns the
whole matched object, see section Warnings and Restrictions.
EXAMPLE 4
This example concerns the body of the fun. Assume that all employee
numbers beginning with zero (0) must be changed to begin with one (1)
instead, and that we want to create the list [{<Old empno>,<New
empno>}]:
ets:select(emp_tab, ets:fun2ms(
fun(#emp{empno = [$0 | Rest] }) ->
{[$0|Rest],[$1|Rest]}
end)).
This query hits the feature of partially bound keys in table type or-
dered_set, so that not the whole table needs to be searched, only the
part containing keys beginning with 0 is looked into.
EXAMPLE 5
The fun can have many clauses. Assume that we want to do the following:
* If an employee started before 1997, return the tuple {inventory,
<employee number>}.
* If an employee started 1997 or later, but before 2001, return
{rookie, <employee number>}.
* For all other employees, return {newbie, <employee number>}, except
for those named Smith as they would be affronted by anything other
than the tag guru and that is also what is returned for their num-
bers: {guru, <employee number>}.
This is accomplished as follows:
ets:select(emp_tab, ets:fun2ms(
fun(#emp{empno = E, surname = "Smith" }) ->
{guru,E};
(#emp{empno = E, empyear = Y}) when Y < 1997 ->
{inventory, E};
(#emp{empno = E, empyear = Y}) when Y > 2001 ->
{newbie, E};
(#emp{empno = E, empyear = Y}) -> % 1997 -- 2001
{rookie, E}
end)).
The result is as follows:
[{rookie,"011103"},
{rookie,"041231"},
{guru,"052341"},
{guru,"076324"},
{newbie,"122334"},
{rookie,"535216"},
{inventory,"789789"},
{newbie,"963721"},
{rookie,"989891"}]
USEFUL BIFS
What more can you do? A simple answer is: see the documentation of
match specifications in ERTS User's Guide. However, the following is a
brief overview of the most useful "built-in functions" that you can use
when the fun is to be translated into a match specification by
ets:fun2ms/1. It is not possible to call other functions than those al-
lowed in match specifications. No "usual" Erlang code can be executed
by the fun that is translated by ets:fun2ms/1. The fun is limited ex-
actly to the power of the match specifications, which is unfortunate,
but the price one must pay for the execution speed of ets:select/2 com-
pared to ets:foldl/foldr.
The head of the fun is a head matching (or mismatching) one parameter,
one object of the table we select from. The object is always a single
variable (can be _) or a tuple, as ETS, Dets, and Mnesia tables include
that. The match specification returned by ets:fun2ms/1 can be used with
dets:select/2 and mnesia:select/2, and with ets:select/2. The use of =
in the head is allowed (and encouraged) at the top-level.
The guard section can contain any guard expression of Erlang. The fol-
lowing is a list of BIFs and expressions:
* Type tests: is_atom, is_float, is_integer, is_list, is_number,
is_pid, is_port, is_reference, is_tuple, is_binary, is_function,
is_record
* Boolean operators: not, and, or, andalso, orelse
* Relational operators: >, >=, <, =<, =:=, ==, =/=, /=
* Arithmetics: +, -, *, div, rem
* Bitwise operators: band, bor, bxor, bnot, bsl, bsr
* The guard BIFs: abs, element, hd, length, node, round, size, tl,
trunc, self
Contrary to the fact with "handwritten" match specifications, the
is_record guard works as in ordinary Erlang code.
Semicolons (;) in guards are allowed, the result is (as expected) one
"match specification clause" for each semicolon-separated part of the
guard. The semantics is identical to the Erlang semantics.
The body of the fun is used to construct the resulting value. When se-
lecting from tables, one usually construct a suiting term here, using
ordinary Erlang term construction, like tuple parentheses, list brack-
ets, and variables matched out in the head, possibly with the occa-
sional constant. Whatever expressions are allowed in guards are also
allowed here, but no special functions exist except object and bindings
(see further down), which returns the whole matched object and all
known variable bindings, respectively.
The dbg variants of match specifications have an imperative approach to
the match specification body, the ETS dialect has not. The fun body for
ets:fun2ms/1 returns the result without side effects. As matching (=)
in the body of the match specifications is not allowed (for performance
reasons) the only thing left, more or less, is term construction.
EXAMPLE WITH DBG
This section describes the slightly different match specifications
translated by dbg:fun2ms/1.
The same reasons for using the parse transformation apply to dbg, maybe
even more, as filtering using Erlang code is not a good idea when trac-
ing (except afterwards, if you trace to file). The concept is similar
to that of ets:fun2ms/1 except that you usually use it directly from
the shell (which can also be done with ets:fun2ms/1).
The following is an example module to trace on:
-module(toy).
-export([start/1, store/2, retrieve/1]).
start(Args) ->
toy_table = ets:new(toy_table, Args).
store(Key, Value) ->
ets:insert(toy_table, {Key,Value}).
retrieve(Key) ->
[{Key, Value}] = ets:lookup(toy_table, Key),
Value.
During model testing, the first test results in {badmatch,16} in
{toy,start,1}, why?
We suspect the ets:new/2 call, as we match hard on the return value,
but want only the particular new/2 call with toy_table as first parame-
ter. So we start a default tracer on the node:
1> dbg:tracer().
{ok,<0.88.0>}
We turn on call tracing for all processes, we want to make a pretty re-
strictive trace pattern, so there is no need to call trace only a few
processes (usually it is not):
2> dbg:p(all,call).
{ok,[{matched,nonode@nohost,25}]}
We specify the filter, we want to view calls that resemble
ets:new(toy_table, <something>):
3> dbg:tp(ets,new,dbg:fun2ms(fun([toy_table,_]) -> true end)).
{ok,[{matched,nonode@nohost,1},{saved,1}]}
As can be seen, the fun used with dbg:fun2ms/1 takes a single list as
parameter instead of a single tuple. The list matches a list of the pa-
rameters to the traced function. A single variable can also be used.
The body of the fun expresses, in a more imperative way, actions to be
taken if the fun head (and the guards) matches. true is returned here,
only because the body of a fun cannot be empty. The return value is
discarded.
The following trace output is received during test:
(<0.86.0>) call ets:new(toy_table, [ordered_set])
Assume that we have not found the problem yet, and want to see what
ets:new/2 returns. We use a slightly different trace pattern:
4> dbg:tp(ets,new,dbg:fun2ms(fun([toy_table,_]) -> return_trace() end)).
The following trace output is received during test:
(<0.86.0>) call ets:new(toy_table,[ordered_set])
(<0.86.0>) returned from ets:new/2 -> 24
The call to return_trace results in a trace message when the function
returns. It applies only to the specific function call triggering the
match specification (and matching the head/guards of the match specifi-
cation). This is by far the most common call in the body of a dbg match
specification.
The test now fails with {badmatch,24} because the atom toy_table does
not match the number returned for an unnamed table. So, the problem is
found, the table is to be named, and the arguments supplied by the test
program do not include named_table. We rewrite the start function:
start(Args) ->
toy_table = ets:new(toy_table, [named_table|Args]).
With the same tracing turned on, the following trace output is re-
ceived:
(<0.86.0>) call ets:new(toy_table,[named_table,ordered_set])
(<0.86.0>) returned from ets:new/2 -> toy_table
Assume that the module now passes all testing and goes into the system.
After a while, it is found that table toy_table grows while the system
is running and that there are many elements with atoms as keys. We ex-
pected only integer keys and so does the rest of the system, but
clearly not the entire system. We turn on call tracing and try to see
calls to the module with an atom as the key:
1> dbg:tracer().
{ok,<0.88.0>}
2> dbg:p(all,call).
{ok,[{matched,nonode@nohost,25}]}
3> dbg:tpl(toy,store,dbg:fun2ms(fun([A,_]) when is_atom(A) -> true end)).
{ok,[{matched,nonode@nohost,1},{saved,1}]}
We use dbg:tpl/3 to ensure to catch local calls (assume that the module
has grown since the smaller version and we are unsure if this inserting
of atoms is not done locally). When in doubt, always use local call
tracing.
Assume that nothing happens when tracing in this way. The function is
never called with these parameters. We conclude that someone else (some
other module) is doing it and realize that we must trace on ets:in-
sert/2 and want to see the calling function. The calling function can
be retrieved using the match specification function caller. To get it
into the trace message, the match specification function message must
be used. The filter call looks like this (looking for calls to ets:in-
sert/2):
4> dbg:tpl(ets,insert,dbg:fun2ms(fun([toy_table,{A,_}]) when is_atom(A) ->
message(caller())
end)).
{ok,[{matched,nonode@nohost,1},{saved,2}]}
The caller is now displayed in the "additional message" part of the
trace output, and the following is displayed after a while:
(<0.86.0>) call ets:insert(toy_table,{garbage,can}) ({evil_mod,evil_fun,2})
You have realized that function evil_fun of the evil_mod module, with
arity 2, is causing all this trouble.
This example illustrates the most used calls in match specifications
for dbg. The other, more esoteric, calls are listed and explained in
Match specifications in Erlang in ERTS User's Guide, as they are beyond
the scope of this description.
WARNINGS AND RESTRICTIONS
The following warnings and restrictions apply to the funs used in with
ets:fun2ms/1 and dbg:fun2ms/1.
Warning:
To use the pseudo functions triggering the translation, ensure to in-
clude the header file ms_transform.hrl in the source code. Failure to
do so possibly results in runtime errors rather than compile time, as
the expression can be valid as a plain Erlang program without transla-
tion.
Warning:
The fun must be literally constructed inside the parameter list to the
pseudo functions. The fun cannot be bound to a variable first and then
passed to ets:fun2ms/1 or dbg:fun2ms/1. For example, ets:fun2ms(fun(A)
-> A end) works, but not F = fun(A) -> A end, ets:fun2ms(F). The latter
results in a compile-time error if the header is included, otherwise a
runtime error.
Many restrictions apply to the fun that is translated into a match
specification. To put it simple: you cannot use anything in the fun
that you cannot use in a match specification. This means that, among
others, the following restrictions apply to the fun itself:
* Functions written in Erlang cannot be called, neither can local
functions, global functions, or real funs.
* Everything that is written as a function call is translated into a
match specification call to a built-in function, so that the call
is_list(X) is translated to {'is_list', '$1'} ('$1' is only an ex-
ample, the numbering can vary). If one tries to call a function
that is not a match specification built-in, it causes an error.
* Variables occurring in the head of the fun are replaced by match
specification variables in the order of occurrence, so that frag-
ment fun({A,B,C}) is replaced by {'$1', '$2', '$3'}, and so on. Ev-
ery occurrence of such a variable in the match specification is re-
placed by a match specification variable in the same way, so that
the fun fun({A,B}) when is_atom(A) -> B end is translated into
[{{'$1','$2'},[{is_atom,'$1'}],['$2']}].
* Variables that are not included in the head are imported from the
environment and made into match specification const expressions.
Example from the shell:
1> X = 25.
25
2> ets:fun2ms(fun({A,B}) when A > X -> B end).
[{{'$1','$2'},[{'>','$1',{const,25}}],['$2']}]
* Matching with = cannot be used in the body. It can only be used on
the top-level in the head of the fun. Example from the shell again:
1> ets:fun2ms(fun({A,[B|C]} = D) when A > B -> D end).
[{{'$1',['$2'|'$3']},[{'>','$1','$2'}],['$_']}]
2> ets:fun2ms(fun({A,[B|C]=D}) when A > B -> D end).
Error: fun with head matching ('=' in head) cannot be translated into
match_spec
{error,transform_error}
3> ets:fun2ms(fun({A,[B|C]}) when A > B -> D = [B|C], D end).
Error: fun with body matching ('=' in body) is illegal as match_spec
{error,transform_error}
All variables are bound in the head of a match specification, so
the translator cannot allow multiple bindings. The special case
when matching is done on the top-level makes the variable bind to
'$_' in the resulting match specification. It is to allow a more
natural access to the whole matched object. Pseudo function ob-
ject() can be used instead, see below.
The following expressions are translated equally:
ets:fun2ms(fun({a,_} = A) -> A end).
ets:fun2ms(fun({a,_}) -> object() end).
* The special match specification variables '$_' and '$*' can be ac-
cessed through the pseudo functions object() (for '$_') and bind-
ings() (for '$*'). As an example, one can translate the following
ets:match_object/2 call to a ets:select/2 call:
ets:match_object(Table, {'$1',test,'$2'}).
This is the same as:
ets:select(Table, ets:fun2ms(fun({A,test,B}) -> object() end)).
In this simple case, the former expression is probably preferable
in terms of readability.
The ets:select/2 call conceptually looks like this in the resulting
code:
ets:select(Table, [{{'$1',test,'$2'},[],['$_']}]).
Matching on the top-level of the fun head can be a more natural way
to access '$_', see above.
* Term constructions/literals are translated as much as is needed to
get them into valid match specification. This way tuples are made
into match specification tuple constructions (a one element tuple
containing the tuple) and constant expressions are used when im-
porting variables from the environment. Records are also translated
into plain tuple constructions, calls to element, and so on. The
guard test is_record/2 is translated into match specification code
using the three parameter version that is built into match specifi-
cation, so that is_record(A,t) is translated into
{is_record,'$1',t,5} if the record size of record type t is 5.
* Language constructions such as case, if, and catch that are not
present in match specifications are not allowed.
* If header file ms_transform.hrl is not included, the fun is not
translated, which can result in a runtime error (depending on
whether the fun is valid in a pure Erlang context).
Ensure that the header is included when using ets and dbg:fun2ms/1
in compiled code.
* If pseudo function triggering the translation is ets:fun2ms/1, the
head of the fun must contain a single variable or a single tuple.
If the pseudo function is dbg:fun2ms/1, the head of the fun must
contain a single variable or a single list.
The translation from funs to match specifications is done at compile
time, so runtime performance is not affected by using these pseudo
functions.
For more information about match specifications, see the Match specifi-
cations in Erlang in ERTS User's Guide.
EXPORTS
format_error(Error) -> Chars
Types:
Error = {error, module(), term()}
Chars = io_lib:chars()
Takes an error code returned by one of the other functions in
the module and creates a textual description of the error.
parse_transform(Forms, Options) -> Forms2 | Errors | Warnings
Types:
Forms = Forms2 = [erl_parse:abstract_form() |
erl_parse:form_info()]
Options = term()
Option list, required but not used.
Errors = {error, ErrInfo :: [tuple()], WarnInfo :: []}
Warnings = {warning, Forms2, WarnInfo :: [tuple()]}
Implements the transformation at compile time. This function is
called by the compiler to do the source code transformation if
and when header file ms_transform.hrl is included in the source
code.
For information about how to use this parse transformation, see
ets and dbg:fun2ms/1.
For a description of match specifications, see section Match
Specification in Erlang in ERTS User's Guide.
transform_from_shell(Dialect, Clauses, BoundEnvironment) -> term()
Types:
Dialect = ets | dbg
Clauses = [erl_parse:abstract_clause()]
BoundEnvironment = erl_eval:binding_struct()
List of variable bindings in the shell environment.
Implements the transformation when the fun2ms/1 functions are
called from the shell. In this case, the abstract form is for
one single fun (parsed by the Erlang shell). All imported vari-
ables are to be in the key-value list passed as BoundEnviron-
ment. The result is a term, normalized, that is, not in abstract
format.
Ericsson AB stdlib 3.13 ms_transform(3erl)