binary(3erl) Erlang Module Definition binary(3erl)
NAME
binary - Library for handling binary data.
DESCRIPTION
This module contains functions for manipulating byte-oriented binaries.
Although the majority of functions could be provided using bit-syntax,
the functions in this library are highly optimized and are expected to
either execute faster or consume less memory, or both, than a
counterpart written in pure Erlang.
The module is provided according to Erlang Enhancement Proposal (EEP)
31.
Note:
The library handles byte-oriented data. For bitstrings that are not
binaries (that is, that do not contain a whole number of octets), a
badarg exception is raised by every function in this module.
DATA TYPES
cp()
Opaque data type representing a compiled search pattern. Guaranteed
to be a tuple() to allow programs to distinguish it from
non-precompiled search patterns.
part() = {Start :: integer() >= 0, Length :: integer()}
A representation of a part (or range) in a binary. Start is a
zero-based offset into a binary() and Length is the length of
that part. As input to functions in this module, a reverse part
specification is allowed, constructed with a negative Length, so
that the part of the binary begins at Start + Length and is
-Length long. This is useful for referencing the last N bytes of
a binary as {byte_size(Binary), -N}. The functions in this module
always return part()s with positive Length.
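To make the reverse part rule concrete, the following sketch maps a
reverse part onto the equivalent positive form and uses one to take
the last N bytes of a binary. The module and function names
(part_demo, normalize/1, last_n/2) are hypothetical illustrations,
not part of stdlib.

```erlang
%% Sketch only: part_demo, normalize/1 and last_n/2 are hypothetical
%% names, not part of stdlib.
-module(part_demo).
-export([normalize/1, last_n/2]).

%% A reverse part {Start, -N} covers the N bytes that end at Start,
%% so it denotes the same range as {Start - N, N}.
normalize({Start, Len}) when Len < 0 ->
    {Start + Len, -Len};
normalize({Start, Len}) ->
    {Start, Len}.

%% Last N bytes of Bin, expressed as the reverse part {byte_size(Bin), -N}.
last_n(Bin, N) when N >= 0, N =< byte_size(Bin) ->
    binary:part(Bin, {byte_size(Bin), -N}).
```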
EXPORTS
at(Subject, Pos) -> byte()
Types:
Subject = binary()
Pos = integer() >= 0
Returns the byte at position Pos (zero-based) in binary Subject
as an integer. If Pos >= byte_size(Subject), a badarg exception
is raised.
bin_to_list(Subject) -> [byte()]
Types:
Subject = binary()
Same as bin_to_list(Subject, {0,byte_size(Subject)}).
bin_to_list(Subject, PosLen) -> [byte()]
Types:
Subject = binary()
PosLen = part()
Converts Subject to a list of byte()s, each representing the
value of one byte. PosLen denotes which part of the binary() to
convert.
Example:
1> binary:bin_to_list(<<"erlang">>, {1,3}).
"rla"
%% or [114,108,97] in list notation.
If PosLen in any way references outside the binary, a badarg
exception is raised.
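As a small illustration of selecting a range before converting, the
sketch below sums the byte values in a part of a binary, which also
works with a reverse part. The module bytes_demo and the helper
byte_sum/2 are hypothetical names used only for this example.

```erlang
%% Sketch only: bytes_demo and byte_sum/2 are hypothetical names.
-module(bytes_demo).
-export([byte_sum/2]).

%% Sum the byte values in the part PosLen of Bin.
%% binary:bin_to_list/2 raises badarg if PosLen is outside Bin.
byte_sum(Bin, PosLen) ->
    lists:sum(binary:bin_to_list(Bin, PosLen)).
```

For example, {4,-2} on <<1,2,3,4>> selects the last two bytes, 3 and 4.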
bin_to_list(Subject, Pos, Len) -> [byte()]
Types:
Subject = binary()
Pos = integer() >= 0
Len = integer()
Same as bin_to_list(Subject, {Pos, Len}).
compile_pattern(Pattern) -> cp()
Types:
Pattern = binary() | [binary()]
Builds an internal structure representing a compilation of a
search pattern, later to be used in functions match/3,
matches/3, split/3, or replace/4. The cp() returned is guaranteed
to be a tuple() to allow programs to distinguish it from
non-precompiled search patterns.
When a list of binaries is specified, it denotes a set of alternative
binaries to search for. For example, if [<<"functional">>,
<<"programming">>] is specified as Pattern, this means either
<<"functional">> or <<"programming">>. The pattern is a
set of alternatives; when only a single binary is specified, the
set has only one element. The order of alternatives in a pattern
is not significant.
The list of binaries used for search alternatives must be flat
and proper.
If Pattern is not a binary or a flat proper list of binaries
with length > 0, a badarg exception is raised.
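The benefit of compiling is that one cp() can be reused for many
searches. The sketch below compiles a two-alternative separator
pattern once and applies it to every line in a list. The module
cp_demo and the helper split_fields/1 are hypothetical names.

```erlang
%% Sketch only: cp_demo and split_fields/1 are hypothetical names.
-module(cp_demo).
-export([split_fields/1]).

%% Split each line on either "," or ";". The pattern is compiled once
%% and the resulting cp() is reused for every call to binary:split/3.
split_fields(Lines) ->
    Sep = binary:compile_pattern([<<",">>, <<";">>]),
    [binary:split(Line, Sep, [global]) || Line <- Lines].
```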
copy(Subject) -> binary()
Types:
Subject = binary()
Same as copy(Subject, 1).
copy(Subject, N) -> binary()
Types:
Subject = binary()
N = integer() >= 0
Creates a binary with the content of Subject duplicated N times.
This function always creates a new binary, even if N = 1. By using
copy/1 on a binary referencing a larger binary, one can free
up the larger binary for garbage collection.
Note:
By deliberately copying a single binary to avoid referencing a
larger binary, one can, instead of freeing up the larger binary
for later garbage collection, create much more binary data than
needed. Sharing binary data is usually good. Only in special
cases, when small parts reference large binaries and the large
binaries are no longer used in any process, deliberate copying
can be a good idea.
If N < 0, a badarg exception is raised.
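A common use of copy/2 with N > 1 is building repeated filler bytes.
The sketch below pads a binary with spaces to a fixed width; it
assumes the input is not already wider than Width. The module
copy_demo and the helper pad_right/2 are hypothetical names.

```erlang
%% Sketch only: copy_demo and pad_right/2 are hypothetical names.
-module(copy_demo).
-export([pad_right/2]).

%% Pad Bin with spaces up to Width bytes. binary:copy/2 with N = 0
%% returns <<>>, so no special case is needed when Bin already has
%% Width bytes. A Bin longer than Width gives a function_clause error.
pad_right(Bin, Width) when byte_size(Bin) =< Width ->
    Padding = binary:copy(<<" ">>, Width - byte_size(Bin)),
    <<Bin/binary, Padding/binary>>.
```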
decode_unsigned(Subject) -> Unsigned
Types:
Subject = binary()
Unsigned = integer() >= 0
Same as decode_unsigned(Subject, big).
decode_unsigned(Subject, Endianness) -> Unsigned
Types:
Subject = binary()
Endianness = big | little
Unsigned = integer() >= 0
Converts the binary digit representation, in big endian or little
endian, of a non-negative integer in Subject to an Erlang
integer().
Example:
1> binary:decode_unsigned(<<169,138,199>>,big).
11111111
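For comparison with the bit syntax mentioned in the description, the
sketch below performs the same conversion with an unsigned integer
segment whose size is the whole binary. The module decode_demo and
its functions are hypothetical names used only for illustration.

```erlang
%% Sketch only: decode_demo, decode_big/1 and decode_little/1 are
%% hypothetical names; they mirror binary:decode_unsigned/2.
-module(decode_demo).
-export([decode_big/1, decode_little/1]).

%% Match the whole binary as one unsigned big-endian integer.
decode_big(Bin) ->
    Bits = byte_size(Bin) * 8,
    <<N:Bits/unsigned-big>> = Bin,
    N.

%% Same, but least significant byte first.
decode_little(Bin) ->
    Bits = byte_size(Bin) * 8,
    <<N:Bits/unsigned-little>> = Bin,
    N.
```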
encode_unsigned(Unsigned) -> binary()
Types:
Unsigned = integer() >= 0
Same as encode_unsigned(Unsigned, big).
encode_unsigned(Unsigned, Endianness) -> binary()
Types:
Unsigned = integer() >= 0
Endianness = big | little
Converts a non-negative integer to the smallest possible
representation in a binary digit representation, either big endian
or little endian.
Example:
1> binary:encode_unsigned(11111111, big).
<<169,138,199>>
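Because the encoding is minimal, protocols that need a fixed-width
field have to pad it. The sketch below shows a round trip through
encode_unsigned/2 and decode_unsigned/2, and a zero-padding helper
that fails with badmatch if the integer does not fit. The module
encode_demo and its functions are hypothetical names.

```erlang
%% Sketch only: encode_demo, roundtrip/2 and fixed_width/2 are
%% hypothetical names.
-module(encode_demo).
-export([roundtrip/2, fixed_width/2]).

%% Encoding and then decoding with the same endianness returns N.
roundtrip(N, Endianness) ->
    binary:decode_unsigned(binary:encode_unsigned(N, Endianness), Endianness).

%% Left-pad the minimal big-endian encoding with zero bytes so that it
%% occupies exactly Width bytes.
fixed_width(N, Width) ->
    Min = binary:encode_unsigned(N, big),
    Pad = Width - byte_size(Min),
    true = Pad >= 0,   % badmatch if N needs more than Width bytes
    <<0:(Pad * 8), Min/binary>>.
```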
first(Subject) -> byte()
Types:
Subject = binary()
Returns the first byte of binary Subject as an integer. If the
size of Subject is zero, a badarg exception is raised.
last(Subject) -> byte()
Types:
Subject = binary()
Returns the last byte of binary Subject as an integer. If the
size of Subject is zero, a badarg exception is raised.
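Since first/1 and last/1 raise badarg on an empty binary, callers
usually handle <<>> separately. The sketch below strips one trailing
newline and checks for surrounding double quotes. The module
first_last_demo and its functions are hypothetical names.

```erlang
%% Sketch only: first_last_demo, chomp/1 and is_quoted/1 are
%% hypothetical names.
-module(first_last_demo).
-export([chomp/1, is_quoted/1]).

%% Drop one trailing newline, if present. The empty binary is handled
%% first because binary:last/1 raises badarg on <<>>.
chomp(<<>>) -> <<>>;
chomp(Bin) ->
    case binary:last(Bin) of
        $\n -> binary:part(Bin, 0, byte_size(Bin) - 1);
        _ -> Bin
    end.

%% True if Bin has at least two bytes and starts and ends with a
%% double quote.
is_quoted(Bin) when byte_size(Bin) >= 2 ->
    binary:first(Bin) =:= $" andalso binary:last(Bin) =:= $";
is_quoted(_) ->
    false.
```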
list_to_bin(ByteList) -> binary()
Types:
ByteList = iolist()
Works exactly as erlang:list_to_binary/1, added for
completeness.
longest_common_prefix(Binaries) -> integer() >= 0
Types:
Binaries = [binary()]
Returns the length of the longest common prefix of the binaries
in list Binaries.
Example:
1> binary:longest_common_prefix([<<"erlang">>, <<"ergonomy">>]).
2
2> binary:longest_common_prefix([<<"erlang">>, <<"perl">>]).
0
If Binaries is not a flat list of binaries, a badarg exception
is raised.
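When the prefix itself rather than its length is needed, it can be
cut from any of the inputs with part/3. The sketch below assumes a
list of at least two binaries; the module prefix_demo and the helper
common_prefix/1 are hypothetical names.

```erlang
%% Sketch only: prefix_demo and common_prefix/1 are hypothetical names.
-module(prefix_demo).
-export([common_prefix/1]).

%% Return the common prefix of Bins (assumed: at least two binaries)
%% by taking that many bytes from the start of the first binary.
common_prefix([First | _] = Bins) ->
    binary:part(First, 0, binary:longest_common_prefix(Bins)).
```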
longest_common_suffix(Binaries) -> integer() >= 0
Types:
Binaries = [binary()]
Returns the length of the longest common suffix of the binaries
in list Binaries.
Example:
1> binary:longest_common_suffix([<<"erlang">>, <<"fang">>]).
3
2> binary:longest_common_suffix([<<"erlang">>, <<"perl">>]).
0
If Binaries is not a flat list of binaries, a badarg exception
is raised.
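The suffix counterpart is a natural fit for the reverse part()
described under Data Types. The sketch below assumes a list of at
least two binaries; the module suffix_demo and the helper
common_suffix/1 are hypothetical names.

```erlang
%% Sketch only: suffix_demo and common_suffix/1 are hypothetical names.
-module(suffix_demo).
-export([common_suffix/1]).

%% Return the common suffix of Bins (assumed: at least two binaries),
%% using the reverse part {byte_size(First), -N} for the last N bytes.
common_suffix([First | _] = Bins) ->
    N = binary:longest_common_suffix(Bins),
    binary:part(First, {byte_size(First), -N}).
```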
match(Subject, Pattern) -> Found | nomatch
Types:
Subject = binary()
Pattern = binary() | [binary()] | cp()
Found = part()
Same as match(Subject, Pattern, []).
match(Subject, Pattern, Options) -> Found | nomatch
Types:
Subject = binary()
Pattern = binary() | [binary()] | cp()
Found = part()
Options = [Option]
Option = {scope, part()}
part() = {Start :: integer() >= 0, Length :: integer()}
Searches for the first occurrence of Pattern in Subject and
returns the position and length.
The function returns {Pos, Length} for the binary in Pattern,
starting at the lowest position in Subject.
Example:
1> binary:match(<<"abcde">>, [<<"bcde">>, <<"cd">>],[]).
{1,4}
Even though <<"cd">> ends before <<"bcde">>, <<"bcde">> begins
first and is therefore the first match. If two overlapping
matches begin at the same position, the longest is returned.
Summary of the options:
{scope, {Start, Length}}:
Only the specified part is searched. Return values still
have offsets from the beginning of Subject. A negative
Length is allowed as described in section Data Types in this
manual.
If none of the strings in Pattern is found, the atom nomatch is
returned.
For a description of Pattern, see function compile_pattern/1.
If {scope, {Start,Length}} is specified in the options such that
Start > size of Subject, Start + Length < 0 or Start + Length >
size of Subject, a badarg exception is raised.
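Option scope makes it easy to resume a search from a given offset,
and because returned offsets are still relative to the start of
Subject, no adjustment is needed afterwards. In the sketch below, the
module match_demo and the helper find_from/3 are hypothetical names.

```erlang
%% Sketch only: match_demo and find_from/3 are hypothetical names.
-module(match_demo).
-export([find_from/3]).

%% Find the first occurrence of Pattern at or after byte offset From.
%% The scope runs from From to the end of Subject; the guard keeps
%% From within the bounds that {scope, ...} accepts.
find_from(Subject, Pattern, From) when From =< byte_size(Subject) ->
    binary:match(Subject, Pattern,
                 [{scope, {From, byte_size(Subject) - From}}]).
```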
matches(Subject, Pattern) -> Found
Types:
Subject = binary()
Pattern = binary() | [binary()] | cp()
Found = [part()]
Same as matches(Subject, Pattern, []).
matches(Subject, Pattern, Options) -> Found
Types:
Subject = binary()
Pattern = binary() | [binary()] | cp()
Found = [part()]
Options = [Option]
Option = {scope, part()}
part() = {Start :: integer() >= 0, Length :: integer()}
As match/2, but Subject is searched until exhausted and a list
of all non-overlapping parts matching Pattern is returned (in
order).
The first and longest match is preferred to a shorter one, as
illustrated by the following example:
1> binary:matches(<<"abcde">>,
[<<"bcde">>,<<"bc">>,<<"de">>],[]).
[{1,4}]
The result shows that <<"bcde">> is selected instead of the
shorter match <<"bc">> (which would have given rise to one more
match, <<"de">>). This corresponds to the behavior of POSIX regular
expressions (and programs like awk), but is not consistent
with alternative matches in re (and Perl), where instead lexical
ordering in the search pattern selects which string matches.
If none of the strings in a pattern is found, an empty list is
returned.
For a description of Pattern, see compile_pattern/1. For a
description of available options, see match/3.
If {scope, {Start,Length}} is specified in the options such that
Start > size of Subject, Start + Length < 0 or Start + Length
> size of Subject, a badarg exception is raised.
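Because the parts returned are non-overlapping, counting them gives
the number of non-overlapping occurrences. In the sketch below, the
module matches_demo and the helper count/2 are hypothetical names.

```erlang
%% Sketch only: matches_demo and count/2 are hypothetical names.
-module(matches_demo).
-export([count/2]).

%% Number of non-overlapping occurrences of Pattern in Subject.
count(Subject, Pattern) ->
    length(binary:matches(Subject, Pattern)).
```

Note that <<"aaaa">> contains <<"aa">> twice under this rule, not
three times, because matches/3 never returns overlapping parts.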
part(Subject, PosLen) -> binary()
Types:
Subject = binary()
PosLen = part()
Extracts the part of binary Subject described by PosLen.
A negative length can be used to extract bytes at the end of a
binary:
1> Bin = <<1,2,3,4,5,6,7,8,9,10>>.
2> binary:part(Bin, {byte_size(Bin), -5}).
<<6,7,8,9,10>>
Note:
part/2 and part/3 are also available in the erlang module under
the names binary_part/2 and binary_part/3. Those BIFs are allowed
in guard tests.
If PosLen in any way references outside the binary, a badarg
exception is raised.
part(Subject, Pos, Len) -> binary()
Types:
Subject = binary()
Pos = integer() >= 0
Len = integer()
Same as part(Subject, {Pos, Len}).
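Since erlang:binary_part/3 is allowed in guards, a prefix test can be
written directly in a function head. In the sketch below, the module
part_guard_demo and the helper has_prefix/2 are hypothetical names.

```erlang
%% Sketch only: part_guard_demo and has_prefix/2 are hypothetical names.
-module(part_guard_demo).
-export([has_prefix/2]).

%% binary_part/3 (the guard BIF form of binary:part/3) is used in the
%% guard. The byte_size/1 test comes first, so binary_part/3 is only
%% evaluated when Bin is long enough to contain Prefix.
has_prefix(Bin, Prefix)
  when byte_size(Bin) >= byte_size(Prefix),
       binary_part(Bin, 0, byte_size(Prefix)) =:= Prefix ->
    true;
has_prefix(_, _) ->
    false.
```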
referenced_byte_size(Binary) -> integer() >= 0
Types:
Binary = binary()
If a binary references a larger binary (often described as being
a subbinary), it can be useful to get the size of the referenced
binary. This function can be used in a program to trigger the
use of copy/1. By copying a binary, one can dereference the
original, possibly large, binary that a smaller binary is a
reference to.
Example:
store(Binary, GBSet) ->
NewBin =
case binary:referenced_byte_size(Binary) of
Large when Large > 2 * byte_size(Binary) ->
binary:copy(Binary);
_ ->
Binary
end,
gb_sets:insert(NewBin,GBSet).
In this example, we chose to copy the binary content before
inserting it in gb_sets:set() if it references a binary more than
twice the data size we want to keep. Of course, other programs may
need different rules for when to copy.
Binary sharing occurs whenever binaries are taken apart. This is
the fundamental reason why binaries are fast: decomposition can
always be done with O(1) complexity. In rare circumstances this
data sharing is however undesirable, which is why this function
together with copy/1 can be useful when optimizing for memory use.
Example of binary sharing:
1> A = binary:copy(<<1>>, 100).
<<1,1,1,1,1 ...
2> byte_size(A).
100
3> binary:referenced_byte_size(A).
100
4> <<B:10/binary, C:90/binary>> = A.
<<1,1,1,1,1 ...
5> {byte_size(B), binary:referenced_byte_size(B)}.
{10,10}
6> {byte_size(C), binary:referenced_byte_size(C)}.
{90,100}
In the above example, the small binary B was copied while the
larger binary C references binary A.
Note:
Binary data is shared among processes. If another process still
references the larger binary, copying the part this process uses
only consumes more memory and does not free up the larger binary
for garbage collection. Use intrusive functions like these with
extreme care and only if a real problem is detected.
replace(Subject, Pattern, Replacement) -> Result
Types:
Subject = binary()
Pattern = binary() | [binary()] | cp()
Replacement = Result = binary()
Same as replace(Subject, Pattern, Replacement,[]).
replace(Subject, Pattern, Replacement, Options) -> Result
Types:
Subject = binary()
Pattern = binary() | [binary()] | cp()
Replacement = binary()
Options = [Option]
Option = global | {scope, part()} | {insert_replaced, InsPos}
InsPos = OnePos | [OnePos]
OnePos = integer() >= 0
An integer() =< byte_size(Replacement)
Result = binary()
Constructs a new binary by replacing the parts in Subject matching
Pattern with the content of Replacement.
If the matching subpart of Subject giving rise to the replacement
is to be inserted in the result, option {insert_replaced, InsPos}
inserts the matching part into Replacement at the specified
position (or positions) before inserting Replacement into
Subject.
Example:
1> binary:replace(<<"abcde">>,<<"b">>,<<"[]">>, [{insert_replaced,1}]).
<<"a[b]cde">>
2> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>,[global,{insert_replaced,1}]).
<<"a[b]c[d]e">>
3> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>,[global,{insert_replaced,[1,1]}]).
<<"a[bb]c[dd]e">>
4> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[-]">>,[global,{insert_replaced,[1,2]}]).
<<"a[b-b]c[d-d]e">>
If any position specified in InsPos > size of the replacement
binary, a badarg exception is raised.
Options global and {scope, part()} work as for split/3. The
return type is always a binary().
For a description of Pattern, see compile_pattern/1.
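One practical use of {insert_replaced, InsPos} with a position equal
to byte_size(Replacement) is escaping: the replacement supplies the
escape byte and the matched byte is appended after it. In the sketch
below, the module replace_demo and the helper escape_quotes/1 are
hypothetical names.

```erlang
%% Sketch only: replace_demo and escape_quotes/1 are hypothetical names.
-module(replace_demo).
-export([escape_quotes/1]).

%% Prefix every double quote and backslash with a backslash. The
%% replacement is the single byte "\\"; {insert_replaced, 1} inserts
%% the matched byte at position 1, that is, after the backslash.
escape_quotes(Bin) ->
    binary:replace(Bin, [<<"\"">>, <<"\\">>], <<"\\">>,
                   [global, {insert_replaced, 1}]).
```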
split(Subject, Pattern) -> Parts
Types:
Subject = binary()
Pattern = binary() | [binary()] | cp()
Parts = [binary()]
Same as split(Subject, Pattern, []).
split(Subject, Pattern, Options) -> Parts
Types:
Subject = binary()
Pattern = binary() | [binary()] | cp()
Options = [Option]
Option = {scope, part()} | trim | global | trim_all
Parts = [binary()]
Splits Subject into a list of binaries based on Pattern. If
option global is not specified, only the first occurrence of
Pattern in Subject gives rise to a split.
The parts of Pattern found in Subject are not included in the
result.
Example:
1> binary:split(<<1,255,4,0,0,0,2,3>>, [<<0,0,0>>,<<2>>],[]).
[<<1,255,4>>, <<2,3>>]
2> binary:split(<<0,1,0,0,4,255,255,9>>, [<<0,0>>, <<255,255>>],[global]).
[<<0,1>>,<<4>>,<<9>>]
Summary of options:
{scope, part()}:
Works as in match/3 and matches/3. Notice that this only
defines the scope of the search for matching strings; it does
not cut the binary before splitting. The bytes before and
after the scope are kept in the result. See the example
below.
trim:
Removes trailing empty parts of the result (as does trim in
re:split/3).
trim_all:
Removes all empty parts of the result.
global:
Repeats the split until Subject is exhausted. Conceptually
option global makes split work on the positions returned by
matches/3, while it normally works on the position returned
by match/3.
Example of the difference between a scope and taking the binary
apart before splitting:
1> binary:split(<<"banana">>, [<<"a">>],[{scope,{2,3}}]).
[<<"ban">>,<<"na">>]
2> binary:split(binary:part(<<"banana">>,{2,3}), [<<"a">>],[]).
[<<"n">>,<<"n">>]
The return type is always a list of binaries that all reference
Subject. This means that the data in Subject is not copied to new
binaries, and that Subject cannot be garbage collected until the
results of the split are no longer referenced.
For a description of Pattern, see compile_pattern/1.
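Combining alternatives with global and trim_all gives a compact line
splitter. The sketch below accepts both "\r\n" and "\n" as line
endings and drops empty lines. The module split_demo and the helper
lines/1 are hypothetical names.

```erlang
%% Sketch only: split_demo and lines/1 are hypothetical names.
-module(split_demo).
-export([lines/1]).

%% Split Text into non-empty lines. A "\r\n" begins one byte before
%% its "\n", so it is the first match at that point and no stray "\r"
%% is left in the result. trim_all removes the empty parts produced by
%% blank lines and by a final line ending.
lines(Text) ->
    binary:split(Text, [<<"\r\n">>, <<"\n">>], [global, trim_all]).
```

As stated above, the returned lines still reference Text.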
Ericsson AB stdlib 3.13 binary(3erl)