This module provides functions for encoding binary data to printable
ASCII characters and decoding such encodings back to binary data.
This includes the :ref:`encodings specified in <base64-rfc-4648>`
-:rfc:`4648` (Base64, Base32 and Base16)
-and the non-standard :ref:`Base85 encodings <base64-base-85>`.
+:rfc:`4648` (Base64, Base32 and Base16), the :ref:`Base85 encoding
+<base64-base-85>` specified in `PDF 2.0
+<https://pdfa.org/resource/iso-32000-2/>`_, and non-standard variants
+of Base85 used elsewhere.
There are two interfaces provided by this module. The modern interface
supports encoding :term:`bytes-like objects <bytes-like object>` to ASCII
Base85 Encodings
-----------------
-Base85 encoding is not formally specified but rather a de facto standard,
-thus different systems perform the encoding differently.
+Base85 encoding is a family of algorithms which represent four bytes
+using five ASCII characters. Originally implemented in the Unix
+``btoa(1)`` utility, a version of it was later adopted by Adobe in the
+PostScript language and is standardized in PDF 2.0 (ISO 32000-2).
+This version, in both its ``btoa`` and PDF variants, is implemented by
+:func:`a85encode`.
-The :func:`a85encode` and :func:`b85encode` functions in this module are two implementations of
-the de facto standard. You should call the function with the Base85
-implementation used by the software you intend to work with.
+A separate version, using a different output character set, was
+defined as an April Fools' joke in :rfc:`1924` but is now used by Git
+and other software. This version is implemented by :func:`b85encode`.
-The two functions present in this module differ in how they handle the following:
+Finally, a third version, using yet another output character set
+designed for safe inclusion in programming language strings, is
+defined by ZeroMQ and implemented here by :func:`z85encode`.
-* Whether to include enclosing ``<~`` and ``~>`` markers
-* Whether to include newline characters
-* The set of ASCII characters used for encoding
-* Handling of null bytes
+The functions present in this module differ in how they handle the following:
+
+* Whether to include and expect enclosing ``<~`` and ``~>`` markers.
+* Whether to wrap the output into multiple lines.
+* The set of ASCII characters used for encoding.
+* Compact encodings of sequences of spaces and null bytes.
+* The encoding of zero-padding bytes applied to the input.
Refer to the documentation of the individual functions for more information.
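
As a rough illustration of the differences listed above, the following sketch encodes the same input with several option combinations. The sample data and variable names are illustrative only, and it deliberately uses only parameters that predate this change (``foldspaces``, ``adobe``), so it does not exercise ``z85encode`` or the newer keyword arguments.

```python
import base64

# Sample input: four null bytes, four spaces, then a 5-byte tail.
data = b"\x00\x00\x00\x00" + b"    " + b"hello"

plain = base64.a85encode(data)                    # no markers, no 'y' shorthand
folded = base64.a85encode(data, foldspaces=True)  # btoa-style 'y' for four spaces
framed = base64.a85encode(data, adobe=True)       # framed with <~ and ~>
git_style = base64.b85encode(data)                # RFC 1924 character set

for label, encoded in [("plain", plain), ("folded", folded),
                       ("framed", framed), ("git_style", git_style)]:
    print(f"{label:10} {encoded!r}")

# Each variant round-trips through the matching decoder and flags.
assert base64.a85decode(plain) == data
assert base64.a85decode(folded, foldspaces=True) == data
assert base64.a85decode(framed, adobe=True) == data
assert base64.b85decode(git_style) == data
```

Note that ``a85encode`` collapses the four null bytes to ``z``, while ``b85encode`` has no compact form and writes them out as five characters.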
*foldspaces* is an optional flag that uses the special short sequence 'y'
instead of 4 consecutive spaces (ASCII 0x20) as supported by 'btoa'. This
- feature is not supported by the "standard" Ascii85 encoding.
+ feature is not supported by the standard encoding used in PDF.
If *wrapcol* is non-zero, insert a newline (``b'\n'``) character
after at most every *wrapcol* characters.
If *wrapcol* is zero (default), do not insert any newlines.
- If *pad* is true, the input is padded with ``b'\0'`` so its length is a
- multiple of 4 bytes before encoding.
- Note that the ``btoa`` implementation always pads.
+ *pad* controls whether zero-padding applied to the end of the input
+ is fully retained in the output encoding, as done by ``btoa``,
+ producing an exact multiple of 5 bytes of output. This is not part
+ of the standard encoding used in PDF, as it does not preserve the
+ length of the data.
- *adobe* controls whether the encoded byte sequence is framed with ``<~``
- and ``~>``, which is used by the Adobe implementation.
+ *adobe* controls whether the encoded byte sequence is framed with
+ ``<~`` and ``~>``, as in a PostScript base-85 string literal. Note
+ that while ASCII85Decode streams in PDF documents *must* be
+ terminated with ``~>``, they *must not* use a leading ``<~``.
.. versionadded:: 3.4
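
The effect of *pad* on data length can be seen with a short input whose length is not a multiple of 4. This is a minimal sketch with an arbitrary sample value, not a statement about how any particular consumer handles padding.

```python
import base64

data = b"abcde"  # 5 bytes: one full 4-byte group plus a 1-byte tail

compact = base64.a85encode(data)           # tail encoded as a partial group
padded = base64.a85encode(data, pad=True)  # tail zero-padded to a full group

print(len(compact), len(padded))    # 7 10
print(base64.a85decode(compact))    # b'abcde'
print(base64.a85decode(padded))     # b'abcde\x00\x00\x00'
```

With ``pad=True`` the decoder cannot tell the padding from real data, so the original length has to be carried separately if it matters.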
*foldspaces* is a flag that specifies whether the 'y' short sequence
should be accepted as shorthand for 4 consecutive spaces (ASCII 0x20).
- This feature is not supported by the "standard" Ascii85 encoding.
+ This feature is not supported by the standard Ascii85 encoding used in
+ PDF and PostScript.
- *adobe* controls whether the input sequence is in Adobe Ascii85 format
- (i.e. is framed with <~ and ~>).
+ *adobe* controls whether the ``<~`` and ``~>`` markers are
+ present. While the leading ``<~`` is not required, the input must
+ end with ``~>``, or a :exc:`ValueError` is raised.
*ignorechars* should be a :term:`bytes-like object` containing characters
to ignore from the input.
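
The marker handling described for *adobe* can be sketched as follows. The ``pdf_style`` and ``postscript_style`` names are hypothetical labels for the two framings discussed above, built by hand here purely for illustration.

```python
import base64

payload = b"PDF stream data"
body = base64.a85encode(payload)  # bare encoding, no markers

# PDF-style framing has only the trailing ~>; PostScript-style has both.
pdf_style = body + b"~>"
postscript_style = b"<~" + body + b"~>"

assert base64.a85decode(pdf_style, adobe=True) == payload
assert base64.a85decode(postscript_style, adobe=True) == payload

try:
    base64.a85decode(body, adobe=True)  # trailing ~> is missing
except ValueError as exc:
    print("rejected:", exc)
```

Catching :exc:`ValueError` also covers :exc:`binascii.Error`, which is a subclass of it.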
Encode the :term:`bytes-like object` *b* using base85 (as used in e.g.
git-style binary diffs) and return the encoded :class:`bytes`.
- If *pad* is true, the input is padded with ``b'\0'`` so its length is a
- multiple of 4 bytes before encoding.
+ The input is padded with ``b'\0'`` so its length is a multiple of 4
+ bytes before encoding. If *pad* is true, all the resulting
+ characters are retained in the output, which will always be a
+ multiple of 5 bytes, and thus the length of the data may not be
+ preserved on decoding.
If *wrapcol* is non-zero, insert a newline (``b'\n'``) character
after at most every *wrapcol* characters.
.. function:: b85decode(b, *, ignorechars=b'', canonical=False)
Decode the base85-encoded :term:`bytes-like object` or ASCII string *b* and
- return the decoded :class:`bytes`. Padding is implicitly removed, if
- necessary.
+ return the decoded :class:`bytes`.
*ignorechars* should be a :term:`bytes-like object` containing characters
to ignore from the input.
.. function:: z85encode(s, pad=False, *, wrapcol=0)
Encode the :term:`bytes-like object` *s* using Z85 (as used in ZeroMQ)
- and return the encoded :class:`bytes`. See `Z85 specification
- <https://rfc.zeromq.org/spec/32/>`_ for more information.
+ and return the encoded :class:`bytes`.
- If *pad* is true, the input is padded with ``b'\0'`` so its length is a
- multiple of 4 bytes before encoding.
+ The input is padded with ``b'\0'`` so its length is a multiple of 4
+ bytes before encoding. If *pad* is true, all the resulting
+ characters are retained in the output, which will always be a
+ multiple of 5 bytes, as required by the ZeroMQ standard.
If *wrapcol* is non-zero, insert a newline (``b'\n'``) character
after at most every *wrapcol* characters.
.. function:: z85decode(s, *, ignorechars=b'', canonical=False)
Decode the Z85-encoded :term:`bytes-like object` or ASCII string *s* and
- return the decoded :class:`bytes`. See `Z85 specification
- <https://rfc.zeromq.org/spec/32/>`_ for more information.
+ return the decoded :class:`bytes`.
*ignorechars* should be a :term:`bytes-like object` containing characters
to ignore from the input.
Section 5.2, "Base64 Content-Transfer-Encoding," provides the definition of the
base64 encoding.
+ `ISO 32000-2 Portable document format - Part 2: PDF 2.0 <https://pdfa.org/resource/iso-32000-2/>`_
+ Section 7.4.3, "ASCII85Decode Filter," provides the definition
+ of the Ascii85 encoding used in PDF and PostScript, including
+ the output character set and the details of data length preservation
+ using zero-padding and partial output groups.
+
+ `ZeroMQ RFC 32/Z85 <https://rfc.zeromq.org/spec/32/>`_
+ The "Formal Specification" section provides the character set used in Z85.
should be accepted as shorthand for 4 consecutive spaces (ASCII 0x20).
This feature is not supported by the "standard" Ascii85 encoding.
- *adobe* controls whether the input sequence is in Adobe Ascii85 format
- (i.e. is framed with <~ and ~>).
+ *adobe* controls whether the encoded byte sequence is framed with
+ ``<~`` and ``~>``, as in a PostScript base-85 string literal. If
+ *adobe* is true, a leading ``<~`` is optionally accepted, while a
+ trailing ``~>`` is *required*, and :exc:`binascii.Error` is raised
+ if it is not found.
*ignorechars* should be a :term:`bytes-like object` containing characters
to ignore from the input.
after at most every *wrapcol* characters.
If *wrapcol* is zero (default), do not insert any newlines.
- If *pad* is true, the input is padded with ``b'\0'`` so its length is a
- multiple of 4 bytes before encoding.
- Note that the ``btoa`` implementation always pads.
+ If *pad* is true, the zero-padding applied to the end of the input
+ is fully retained in the output encoding, as done by ``btoa``,
+ producing an exact multiple of 5 bytes of output. This is not part
+ of the standard encoding used in PDF, as it does not preserve the
+ length of the data.
- *adobe* controls whether the encoded byte sequence is framed with ``<~``
- and ``~>``, which is used by the Adobe implementation.
+ *adobe* controls whether the encoded byte sequence is framed with
+ ``<~`` and ``~>``, as in a PostScript base-85 string literal. Note
+ that while ASCII85Decode streams in PDF documents *must* be
+ terminated with ``~>``, they *must not* use a leading ``<~``.
.. versionadded:: 3.15
after at most every *wrapcol* characters.
If *wrapcol* is zero (default), do not insert any newlines.
- If *pad* is true, the input is padded with ``b'\0'`` so its length is a
- multiple of 4 bytes before encoding.
+ If *pad* is true, the zero-padding applied to the end of the input
+ is retained in the output, which will always be a multiple of 5
+ bytes, and thus the length of the data may not be preserved on
+ decoding.
.. versionadded:: 3.15
foldspaces is an optional flag that uses the special short sequence 'y'
instead of 4 consecutive spaces (ASCII 0x20) as supported by 'btoa'. This
- feature is not supported by the "standard" Adobe encoding.
+ feature is not supported by the standard encoding used in PDF.
If wrapcol is non-zero, insert a newline (b'\\n') character after at most
every wrapcol characters.
- pad controls whether the input is padded to a multiple of 4 before
- encoding. Note that the btoa implementation always pads.
+ pad controls whether zero-padding applied to the end of the input
+ is fully retained in the output encoding, as done by btoa,
+ producing an exact multiple of 5 bytes of output.
+
+ adobe controls whether the encoded byte sequence is framed with <~
+ and ~>, as in a PostScript base-85 string literal. Note that
+ while ASCII85Decode streams in PDF documents must be terminated
+ with ~>, they must not use a leading <~.
- adobe controls whether the encoded byte sequence is framed with <~ and ~>,
- which is used by the Adobe implementation.
"""
return binascii.b2a_ascii85(b, foldspaces=foldspaces,
adobe=adobe, wrapcol=wrapcol, pad=pad)
canonical=False):
"""Decode the Ascii85 encoded bytes-like object or ASCII string b.
- foldspaces is a flag that specifies whether the 'y' short sequence should be
- accepted as shorthand for 4 consecutive spaces (ASCII 0x20). This feature is
- not supported by the "standard" Adobe encoding.
+ foldspaces is a flag that specifies whether the 'y' short sequence
+ should be accepted as shorthand for 4 consecutive spaces (ASCII
+ 0x20). This feature is not supported by the standard Ascii85
+ encoding used in PDF and PostScript.
- adobe controls whether the input sequence is in Adobe Ascii85 format (i.e.
- is framed with <~ and ~>).
+ adobe controls whether the <~ and ~> markers are present. While
+ the leading <~ is not required, the input must end with ~>, or a
+ ValueError is raised.
ignorechars should be a byte string containing characters to ignore from the
input. This should only contain whitespace characters, and by default
If wrapcol is non-zero, insert a newline (b'\\n') character after at most
every wrapcol characters.
- If pad is true, the input is padded with b'\\0' so its length is a multiple of
- 4 bytes before encoding.
+ The input is padded with b'\\0' so its length is a multiple of 4
+ bytes before encoding. If pad is true, all the resulting
+ characters are retained in the output, which will always be a
+ multiple of 5 bytes.
"""
return binascii.b2a_base85(b, wrapcol=wrapcol, pad=pad)
If wrapcol is non-zero, insert a newline (b'\\n') character after at most
every wrapcol characters.
- If pad is true, the input is padded with b'\\0' so its length is a multiple of
- 4 bytes before encoding.
+ The input is padded with b'\\0' so its length is a multiple of 4
+ bytes before encoding. If pad is true, all the resulting
+ characters are retained in the output, which will always be a
+ multiple of 5 bytes, as required by the ZeroMQ standard.
"""
return binascii.b2a_base85(s, wrapcol=wrapcol, pad=pad,
alphabet=binascii.Z85_ALPHABET)
foldspaces: bool = False
Allow 'y' as a short form encoding four spaces.
adobe: bool = False
- Expect data to be wrapped in '<~' and '~>' as in Adobe Ascii85.
+ Expect data to be terminated with '~>' as in Adobe Ascii85, and
+ optionally accept leading '<~'.
ignorechars: Py_buffer = b''
A byte string containing characters to ignore from the input.
canonical: bool = False
static PyObject *
binascii_a2b_ascii85_impl(PyObject *module, Py_buffer *data, int foldspaces,
int adobe, Py_buffer *ignorechars, int canonical)
-/*[clinic end generated code: output=09b35f1eac531357 input=dd050604ed30199e]*/
+/*[clinic end generated code: output=09b35f1eac531357 input=08eab2e53c62f1a8]*/
{
const unsigned char *ascii_data = data->buf;
Py_ssize_t ascii_len = data->len;
wrapcol: size_t = 0
Split result into lines of provided width.
pad: bool = False
- Pad input to a multiple of 4 before encoding.
+ Retain zero-padding bytes at end of output.
adobe: bool = False
Wrap result in '<~' and '~>' as in Adobe Ascii85.
static PyObject *
binascii_b2a_ascii85_impl(PyObject *module, Py_buffer *data, int foldspaces,
size_t wrapcol, int pad, int adobe)
-/*[clinic end generated code: output=5ce8fdee843073f4 input=791da754508c7d17]*/
+/*[clinic end generated code: output=5ce8fdee843073f4 input=a77e31d63517bf19]*/
{
const unsigned char *bin_data = data->buf;
Py_ssize_t bin_len = data->len;
/
*
pad: bool = False
- Pad input to a multiple of 4 before encoding.
+ Retain zero-padding bytes at end of output.
wrapcol: size_t = 0
alphabet: Py_buffer(c_default="{NULL, NULL}") = BASE85_ALPHABET
static PyObject *
binascii_b2a_base85_impl(PyObject *module, Py_buffer *data, int pad,
size_t wrapcol, Py_buffer *alphabet)
-/*[clinic end generated code: output=98b962ed52c776a4 input=1b20b0bd6572691b]*/
+/*[clinic end generated code: output=98b962ed52c776a4 input=54886d05128d41a8]*/
{
const unsigned char *bin_data = data->buf;
Py_ssize_t bin_len = data->len;
" foldspaces\n"
" Allow \'y\' as a short form encoding four spaces.\n"
" adobe\n"
-" Expect data to be wrapped in \'<~\' and \'~>\' as in Adobe Ascii85.\n"
+" Expect data to be terminated with \'~>\' as in Adobe Ascii85, and\n"
+" optionally accept leading \'<~\'.\n"
" ignorechars\n"
" A byte string containing characters to ignore from the input.\n"
" canonical\n"
" wrapcol\n"
" Split result into lines of provided width.\n"
" pad\n"
-" Pad input to a multiple of 4 before encoding.\n"
+" Retain zero-padding bytes at end of output.\n"
" adobe\n"
" Wrap result in \'<~\' and \'~>\' as in Adobe Ascii85.");
"Base85-code line of data.\n"
"\n"
" pad\n"
-" Pad input to a multiple of 4 before encoding.");
+" Retain zero-padding bytes at end of output.");
#define BINASCII_B2A_BASE85_METHODDEF \
{"b2a_base85", _PyCFunction_CAST(binascii_b2a_base85), METH_FASTCALL|METH_KEYWORDS, binascii_b2a_base85__doc__},
return return_value;
}
-/*[clinic end generated code: output=b41544f39b0ef681 input=a9049054013a1b77]*/
+/*[clinic end generated code: output=42dd48f323cbb118 input=a9049054013a1b77]*/