Manual of Lua-DataFilter: pctenc
percent_encode and percent_decode - DataFilter algorithms for percent encoding and decoding
Overview
These two algorithms are part of the Lua-DataFilter package. See the overview documentation in lua-datafilter(3) for information about how to use them.
Default behaviour of percent_encode
percent_encode
will escape any bytes in the input which it
considers to be 'unsafe', as a percent sign followed by a two-digit
hexadecimal number indicating the value of the original byte. Bytes which
are safe are output unchanged.
The bytes which are escaped are those which don't occur in a defined set of
'safe' bytes. The default set of safe bytes are those which are US-ASCII
characters unreserved for use in URIs by RFC 3986. These are all uppercase
and lowercase letters, digits, and the characters hyphen (-
),
fullstop (.
), underscore (_
), and tilde (~
).
Uppercase characters are always used for the hex numbers.
Options for percent_encode
The default set of safe characters can be overridden by providing the
option safe_bytes
, which should be a string. Each byte which occurs
in this string is one which the percent_encode
algorithm need not
encode, so it will encode everything which isn't in this set.
An exception will be thrown if you try to declare the percent %
character
to be safe, because that would produce an encoded value which could not
safely be decoded again (at least not without corruption).
An exception will also be thrown if you include the same byte twice in
the safe_bytes
string.
Default behaviour of percent_decode
This will reverse the encoding of characters performed by
percent_encode
. It does not matter which set of safe bytes were used
for the encoding, they will all be either unescaped (when they are encoded
with a %
character), or passed through to the output as-is.
percent_decode
currently doesn't accept any options.
Whenever a percent sign appears in the input it must be followed by two hexadecimal digits. If one of the two following characters is not a hexadecimal digit, or if there aren't enough characters left at the end of the input, then an exception will be thrown.
Both uppercase and lowercase letters may be used in hexadecimal digits.