uri_string(3erl) Erlang Module Definition uri_string(3erl)
NAME
uri_string - URI processing functions.
DESCRIPTION
This module contains functions for parsing and handling URIs (RFC 3986)
and form-urlencoded query strings (HTML 5.2).
Parsing and serializing non-UTF-8 form-urlencoded query strings is
also supported (HTML 5.0).
A URI is an identifier consisting of a sequence of characters matching
the syntax rule named URI in RFC 3986.
The generic URI syntax consists of a hierarchical sequence of compo-
nents referred to as the scheme, authority, path, query, and fragment:
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
hier-part = "//" authority path-abempty
/ path-absolute
/ path-rootless
/ path-empty
scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
authority = [ userinfo "@" ] host [ ":" port ]
userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
The interpretation of a URI depends only on the characters used and not
on how those characters are represented in a network protocol.
The functions implemented by this module cover the following use cases:
* Parsing URIs into their components and returning a map
parse/1
* Recomposing a map of URI components into a URI string
recompose/1
* Changing inbound binary and percent-encoding of URIs
transcode/2
* Transforming URIs into a normalized form
normalize/1
normalize/2
* Composing form-urlencoded query strings from a list of key-value
pairs
compose_query/1
compose_query/2
* Dissecting form-urlencoded query strings into a list of key-value
pairs
dissect_query/1
There are four different encodings present during the handling of URIs:
* Inbound binary encoding in binaries
* Inbound percent-encoding in lists and binaries
* Outbound binary encoding in binaries
* Outbound percent-encoding in lists and binaries
Functions with a uri_string() argument accept lists, binaries and mixed
lists (lists with binary elements) as input type. All of the functions
but transcode/2 expect input as lists of unicode codepoints, UTF-8
encoded binaries and UTF-8 percent-encoded URI parts ("%C3%B6"
corresponds to the unicode character "ö").
Unless otherwise specified the return value type and encoding are the
same as the input type and encoding. That is, binary input returns bi-
nary output, list input returns a list output but mixed input returns
list output.
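The type rule above can be checked with a minimal sketch. The module
name uri_types_demo and the example URI are assumptions made for
illustration; only uri_string:parse/1 comes from this module.

```erlang
-module(uri_types_demo).
-export([demo/0]).

%% Each match below fails with a badmatch error if the stated
%% input/output type rule does not hold.
demo() ->
    %% List input yields list components.
    #{host := "example.com"} = uri_string:parse("http://example.com/a"),
    %% Binary input yields binary components.
    #{host := <<"example.com">>} =
        uri_string:parse(<<"http://example.com/a">>),
    %% Mixed input (a list with binary elements) yields list components.
    #{host := "example.com"} =
        uri_string:parse(["http://", <<"example.com">>, "/a"]),
    ok.
```

Calling uri_types_demo:demo() returns ok when all three matches succeed.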
In case of lists there is only percent-encoding. In binaries, however,
both binary encoding and percent-encoding shall be considered.
transcode/2 provides the means to convert between the supported
encodings. It takes a uri_string() and a list of options specifying
inbound and outbound encodings.
RFC 3986 does not mandate any specific character encoding; it is
usually defined by the protocol or surrounding text. This library makes
the same assumption: binary encoding and percent-encoding are handled
as one configuration unit and cannot be set to different values.
DATA TYPES
error() = {error, atom(), term()}
Error tuple indicating the type of error. Possible values of the
second component:
* invalid_character
* invalid_encoding
* invalid_input
* invalid_map
* invalid_percent_encoding
* invalid_scheme
* invalid_uri
* invalid_utf8
* missing_value
The third component is a term providing additional information
about the cause of the error.
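Because errors are returned as tuples rather than raised, callers
usually match on the result. The following is a minimal sketch; the
module name uri_error_demo, the helper host_of/1 and the no_host atom
are hypothetical and not part of uri_string.

```erlang
-module(uri_error_demo).
-export([host_of/1]).

%% Returns {ok, Host} when URI parses and contains a host component.
%% Parse failures are passed on as {error, {Reason, Info}}, where
%% Reason and Info are the second and third components of error().
host_of(URI) ->
    case uri_string:parse(URI) of
        {error, Reason, Info} ->
            {error, {Reason, Info}};
        #{host := Host} ->
            {ok, Host};
        #{} ->
            %% Parsed successfully, but there is no authority part.
            {error, no_host}
    end.
```

For example, host_of("http://example.com/") gives {ok,"example.com"},
while a relative reference such as "/just/a/path" gives {error,no_host}.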
uri_map() =
#{fragment => unicode:chardata(),
host => unicode:chardata(),
path => unicode:chardata(),
port => integer() >= 0 | undefined,
query => unicode:chardata(),
scheme => unicode:chardata(),
userinfo => unicode:chardata()} |
#{}
Map holding the main components of a URI.
uri_string() = iodata()
List of unicode codepoints, a UTF-8 encoded binary, or a mix of
the two, representing an RFC 3986 compliant URI (percent-encoded
form). A URI is a sequence of characters from a very limited
set: the letters of the basic Latin alphabet, digits, and a few
special characters.
EXPORTS
compose_query(QueryList) -> QueryString
Types:
QueryList = [{unicode:chardata(), unicode:chardata() | true}]
QueryString = uri_string() | error()
Composes a form-urlencoded QueryString based on a QueryList, a
list of non-percent-encoded key-value pairs. Form-urlencoding is
defined in section 4.10.21.6 of the HTML 5.2 specification and
in section 4.10.22.6 of the HTML 5.0 specification for non-UTF-8
encodings.
See also the opposite operation dissect_query/1.
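Since the two operations are inverses for plain key-value pairs, a
round trip can serve as a quick sanity check. This is a sketch only;
the module name uri_roundtrip_demo and the helper round_trip/1 are
hypothetical.

```erlang
-module(uri_roundtrip_demo).
-export([round_trip/1]).

%% Composes Pairs into a query string and dissects it again.
%% Returns both so the caller can compare the result with the input.
round_trip(Pairs) ->
    Query = uri_string:compose_query(Pairs),
    {Query, uri_string:dissect_query(Query)}.
```

With the input [{"foo bar","1"},{"city","stockholm"}] the second
element of the returned tuple is expected to equal the input list.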
Example:
1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}]).
"foo+bar=1&city=%C3%B6rebro"
2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
2> {<<"city">>,<<"örebro"/utf8>>}]).
<<"foo+bar=1&city=%C3%B6rebro">>
compose_query(QueryList, Options) -> QueryString
Types:
QueryList = [{unicode:chardata(), unicode:chardata() | true}]
Options = [{encoding, atom()}]
QueryString = uri_string() | error()
Same as compose_query/1 but with an additional Options parameter
that controls the encoding ("charset") used by the encoding
algorithm. There are two supported encodings: utf8 (or unicode)
and latin1.
Each character in the entry's name and value that cannot be ex-
pressed using the selected character encoding, is replaced by a
string consisting of a U+0026 AMPERSAND character (&), a "#"
(U+0023) character, one or more ASCII digits representing the
Unicode code point of the character in base ten, and finally a
";" (U+003B) character.
Bytes other than 0x2A, 0x2D, 0x2E, 0x30 to 0x39, 0x41 to 0x5A,
0x5F, and 0x61 to 0x7A are percent-encoded (a U+0025 PERCENT SIGN
character (%) followed by uppercase ASCII hex digits representing
the hexadecimal value of the byte).
See also the opposite operation dissect_query/1.
Example:
1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],
1> [{encoding, latin1}]).
"foo+bar=1&city=%F6rebro"
2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
2> {<<"city">>,<<"東京"/utf8>>}], [{encoding, latin1}]).
<<"foo+bar=1&city=%26%2326481%3B%26%2320140%3B">>
dissect_query(QueryString) -> QueryList
Types:
QueryString = uri_string()
QueryList =
[{unicode:chardata(), unicode:chardata() | true}] | er-
ror()
Dissects a form-urlencoded QueryString and returns a QueryList, a
list of non-percent-encoded key-value pairs. Form-urlencoding is
defined in section 4.10.21.6 of the HTML 5.2 specification and
in section 4.10.22.6 of the HTML 5.0 specification for non-UTF-8
encodings.
See also the opposite operation compose_query/1.
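A common use of the resulting list is looking up a single parameter
of a full URI. The following is a minimal sketch combining parse/1
and dissect_query/1; the module name uri_query_demo and the helper
query_param/2 are hypothetical.

```erlang
-module(uri_query_demo).
-export([query_param/2]).

%% Looks up Key in the query component of URI.
%% Key must have the same type (list or binary) as URI, because
%% the dissected pairs follow the input type.
%% Returns undefined if the URI has no query, cannot be parsed,
%% or does not contain Key.
query_param(URI, Key) ->
    case uri_string:parse(URI) of
        #{query := Query} ->
            case uri_string:dissect_query(Query) of
                Pairs when is_list(Pairs) ->
                    proplists:get_value(Key, Pairs);
                {error, _, _} ->
                    undefined
            end;
        _NoQueryOrError ->
            undefined
    end.
```

For instance, query_param("http://example.com/search?foo+bar=1&city=stockholm", "city")
is expected to return "stockholm".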
Example:
1> uri_string:dissect_query("foo+bar=1&city=%C3%B6rebro").
[{"foo bar","1"},{"city","örebro"}]
2> uri_string:dissect_query(<<"foo+bar=1&city=%26%2326481%3B%26%2320140%3B">>).
[{<<"foo bar">>,<<"1">>},
{<<"city">>,<<230,157,177,228,186,172>>}]
normalize(URI) -> NormalizedURI
Types:
URI = uri_string() | uri_map()
NormalizedURI = uri_string() | error()
Transforms a URI into a normalized form using Syntax-Based
Normalization as defined by RFC 3986.
This function implements case normalization, percent-encoding
normalization, path segment normalization and scheme based nor-
malization for HTTP(S) with basic support for FTP, SSH, SFTP and
TFTP.
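One use of normalization is deciding whether two URIs written
differently identify the same resource. This is a sketch under the
assumption that both arguments have the same type (both lists or both
binaries); the module name uri_equiv_demo and the helper equivalent/2
are hypothetical.

```erlang
-module(uri_equiv_demo).
-export([equivalent/2]).

%% Returns true if A and B have the same normalized form.
%% If either argument fails to normalize, the result is false.
equivalent(A, B) ->
    case {uri_string:normalize(A), uri_string:normalize(B)} of
        {{error, _, _}, _} -> false;
        {_, {error, _, _}} -> false;
        {NormA, NormB} -> NormA =:= NormB
    end.
```

With this helper, "HTTP://Example.COM:80/a/./b/../c" and
"http://example.com/a/c" are expected to compare as equivalent, since
case, default port and dot segments are all normalized.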
Example:
1> uri_string:normalize("/a/b/c/./../../g").
"/a/g"
2> uri_string:normalize(<<"mid/content=5/../6">>).
<<"mid/6">>
3> uri_string:normalize("http://localhost:80").
"http://localhost/"
4> uri_string:normalize(#{scheme => "http",port => 80,path => "/a/b/c/./../../g",
4> host => "localhost-örebro"}).
"http://localhost-%C3%B6rebro/a/g"
normalize(URI, Options) -> NormalizedURI
Types:
URI = uri_string() | uri_map()
Options = [return_map]
NormalizedURI = uri_string() | uri_map() | error()
Same as normalize/1 but with an additional Options parameter
that controls whether the normalized URI shall be returned as a
uri_map(). There is one supported option: return_map.
Example:
1> uri_string:normalize("/a/b/c/./../../g", [return_map]).
#{path => "/a/g"}
2> uri_string:normalize(<<"mid/content=5/../6">>, [return_map]).
#{path => <<"mid/6">>}
3> uri_string:normalize("http://localhost:80", [return_map]).
#{scheme => "http",path => "/",host => "localhost"}
4> uri_string:normalize(#{scheme => "http",port => 80,path => "/a/b/c/./../../g",
4> host => "localhost-örebro"}, [return_map]).
#{scheme => "http",path => "/a/g",host => "localhost-örebro"}
parse(URIString) -> URIMap
Types:
URIString = uri_string()
URIMap = uri_map() | error()
Parses an RFC 3986 compliant uri_string() into a uri_map() that
holds the parsed components of the URI. If parsing fails, an
error tuple is returned.
See also the opposite operation recompose/1.
Example:
1> uri_string:parse("foo://user@example.com:8042/over/there?name=ferret#nose").
#{fragment => "nose",host => "example.com",
path => "/over/there",port => 8042,query => "name=ferret",
scheme => "foo",userinfo => "user"}
2> uri_string:parse(<<"foo://user@example.com:8042/over/there?name=ferret">>).
#{host => <<"example.com">>,path => <<"/over/there">>,
port => 8042,query => <<"name=ferret">>,scheme => <<"foo">>,
userinfo => <<"user">>}
recompose(URIMap) -> URIString
Types:
URIMap = uri_map()
URIString = uri_string() | error()
Creates an RFC 3986 compliant URIString (percent-encoded), based
on the components of URIMap. If the URIMap is invalid, an error
tuple is returned.
See also the opposite operation parse/1.
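Because parse/1 and recompose/1 are inverses, a URI can be edited by
parsing it, updating one map key and recomposing the result. The
following is a minimal sketch; the module name uri_edit_demo and the
helper set_query/2 are hypothetical.

```erlang
-module(uri_edit_demo).
-export([set_query/2]).

%% Replaces the query component of URI with one composed from Pairs.
%% Pairs should use the same type (list or binary) as URI.
%% A parse error is returned unchanged.
set_query(URI, Pairs) ->
    case uri_string:parse(URI) of
        {error, _, _} = Error ->
            Error;
        Map ->
            Query = uri_string:compose_query(Pairs),
            uri_string:recompose(Map#{query => Query})
    end.
```

For example, set_query("http://example.com/search?old=1#top", [{"q","erlang otp"}])
is expected to yield "http://example.com/search?q=erlang+otp#top".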
Example:
1> URIMap = #{fragment => "nose", host => "example.com", path => "/over/there",
1> port => 8042, query => "name=ferret", scheme => "foo", userinfo => "user"}.
#{fragment => "nose",host => "example.com",
path => "/over/there",port => 8042,query => "name=ferret",
scheme => "foo",userinfo => "user"}
2> uri_string:recompose(URIMap).
"foo://user@example.com:8042/over/there?name=ferret#nose"
resolve(RefURI, BaseURI) -> TargetURI
Types:
RefURI = BaseURI = uri_string() | uri_map()
TargetURI = uri_string() | error()
Converts a RefURI reference that might be relative to a given
base URI into the parsed components of the reference's target,
which can then be recomposed to form the target URI.
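A typical application is turning the links found in a document into
absolute URIs relative to the document's own address. This is a
sketch only; the module name uri_resolve_demo and the helper
resolve_all/2 are hypothetical.

```erlang
-module(uri_resolve_demo).
-export([resolve_all/2]).

%% Resolves every reference in Refs against the same Base URI.
%% Each element of the result is either a target URI or an
%% error tuple for a reference that could not be resolved.
resolve_all(Refs, Base) ->
    [uri_string:resolve(Ref, Base) || Ref <- Refs].
```

Calling resolve_all(["/abs/ol/ute", "../relative"], "http://localhost/a/b/c?q")
is expected to return
["http://localhost/abs/ol/ute","http://localhost/a/relative"].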
Example:
1> uri_string:resolve("/abs/ol/ute", "http://localhost/a/b/c?q").
"http://localhost/abs/ol/ute"
2> uri_string:resolve("../relative", "http://localhost/a/b/c?q").
"http://localhost/a/relative"
3> uri_string:resolve("http://localhost/full", "http://localhost/a/b/c?q").
"http://localhost/full"
4> uri_string:resolve(#{path => "path", query => "xyz"}, "http://localhost/a/b/c?q").
"http://localhost/a/b/path?xyz"
resolve(RefURI, BaseURI, Options) -> TargetURI
Types:
RefURI = BaseURI = uri_string() | uri_map()
Options = [return_map]
TargetURI = uri_string() | uri_map() | error()
Same as resolve/2 but with an additional Options parameter that
controls whether the target URI shall be returned as a
uri_map(). There is one supported option: return_map.
Example:
1> uri_string:resolve("/abs/ol/ute", "http://localhost/a/b/c?q", [return_map]).
#{host => "localhost",path => "/abs/ol/ute",scheme => "http"}
2> uri_string:resolve(#{path => "/abs/ol/ute"}, #{scheme => "http",
2> host => "localhost", path => "/a/b/c?q"}, [return_map]).
#{host => "localhost",path => "/abs/ol/ute",scheme => "http"}
transcode(URIString, Options) -> Result
Types:
URIString = uri_string()
Options =
[{in_encoding, unicode:encoding()} |
{out_encoding, unicode:encoding()}]
Result = uri_string() | error()
Transcodes an RFC 3986 compliant URIString, where Options is a
list of tagged tuples, specifying the inbound (in_encoding) and
outbound (out_encoding) encodings. in_encoding and out_encoding
specify both binary encoding and percent-encoding for the
input and output data. Mixed encoding, where binary encoding is
not the same as percent-encoding, is not supported. If an argu-
ment is invalid, an error tuple is returned.
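Since the other functions of this module expect UTF-8 input,
transcode/2 is often applied first to legacy data. The following is a
minimal sketch of such a wrapper; the module name uri_transcode_demo
and the helper latin1_to_utf8/1 are hypothetical.

```erlang
-module(uri_transcode_demo).
-export([latin1_to_utf8/1]).

%% Converts a Latin-1 percent-encoded URI into its UTF-8
%% percent-encoded form. Returns an error tuple on invalid input.
latin1_to_utf8(URI) ->
    uri_string:transcode(URI, [{in_encoding, latin1},
                               {out_encoding, utf8}]).
```

For the list "foo%F6bar" this is expected to return "foo%C3%B6bar",
which the remaining functions can then process.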
Example:
1> uri_string:transcode(<<"foo%00%00%00%F6bar"/utf32>>,
1> [{in_encoding, utf32},{out_encoding, utf8}]).
<<"foo%C3%B6bar"/utf8>>
2> uri_string:transcode("foo%F6bar", [{in_encoding, latin1},
2> {out_encoding, utf8}]).
"foo%C3%B6bar"
Ericsson AB stdlib 3.13 uri_string(3erl)