From: Daniele Varrazzo Date: Wed, 16 Dec 2020 21:05:02 +0000 (+0100) Subject: Added docs about string/binary adaptation X-Git-Tag: 3.0.dev0~267 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=464b4d3460fda3626f425f0ae5b7185f4a41237c;p=thirdparty%2Fpsycopg.git Added docs about string/binary adaptation --- diff --git a/docs/adapt-types.rst b/docs/adapt-types.rst index 880dffa04..c002fd184 100644 --- a/docs/adapt-types.rst +++ b/docs/adapt-types.rst @@ -1,3 +1,5 @@ +.. currentmodule:: psycopg3 + .. index:: single: Adaptation pair: Objects; Adaptation @@ -37,8 +39,10 @@ TODO: complete table +--------------------+-------------------------+--------------------------+ | | `!str` | | :sql:`varchar` | :ref:`adapt-string` | | | | | :sql:`text` | | - +--------------------+-------------------------+ | - | | `!bytes` | :sql:`bytea` | | + +--------------------+-------------------------+--------------------------+ + | | `bytes` | :sql:`bytea` | :ref:`adapt-binary` | + | | `bytearray` | | | + | | `memoryview` | | | +--------------------+-------------------------+--------------------------+ | `!date` | :sql:`date` | :ref:`adapt-date` | +--------------------+-------------------------+ | @@ -145,7 +149,106 @@ promoted to the larger Python counterpart. `__ +.. index:: + pair: Strings; Adaptation + single: Unicode; Adaptation + pair: Encoding; SQL_ASCII + .. _adapt-string: + +Strings adaptation +------------------ + +Python `str` is converted to PostgreSQL string syntax, and PostgreSQL types +such as :sql:`text` and :sql:`varchar` are converted back to Python `!str`: + +.. code:: python + + conn = psycopg3.connect() + conn.execute( + "insert into strtest (id, data) values (%s, %s)", + (1, "Crème Brûlée at 4.99€")) + conn.execute("select data from strtest where id = 1").fetchone()[0] + 'Crème Brûlée at 4.99€' + +PostgreSQL databases `have an encoding`__, and `the session has an encoding`__ +too, exposed in the `Connection.client_encoding` attribute. 
If your database +and connection are in UTF-8 encoding you will likely have no problem, +otherwise you will have to make sure that your application only deals with the +non-ASCII chars that the database can handle; failing to do so may result in +encoding/decoding errors: + +.. __: https://www.postgresql.org/docs/current/sql-createdatabase.html +.. __: https://www.postgresql.org/docs/current/multibyte.html + +.. code:: python + + # The encoding is set at connection time according to the db configuration + conn.client_encoding + 'utf-8' + + # The Latin-9 encoding can manage some European accented letters + # and the Euro symbol + conn.client_encoding = 'latin9' + conn.execute("select data from strtest where id = 1").fetchone()[0] + 'Crème Brûlée at 4.99€' + + # The Latin-1 encoding doesn't have a representation for the Euro symbol + conn.client_encoding = 'latin1' + conn.execute("select data from strtest where id = 1").fetchone()[0] + # Traceback (most recent call last) + # ... + # UntranslatableCharacter: character with byte sequence 0xe2 0x82 0xac + # in encoding "UTF8" has no equivalent in encoding "LATIN1" + +In rare cases you may have strings with unexpected encodings in the database. +Using the ``SQL_ASCII`` client encoding (or setting +`~Connection.client_encoding` ``= "ascii"``) will disable decoding of the data +coming from the database, which will be returned as `bytes`: + +.. code:: python + + conn.client_encoding = "ascii" + conn.execute("select data from strtest where id = 1").fetchone()[0] + b'Cr\xc3\xa8me Br\xc3\xbbl\xc3\xa9e at 4.99\xe2\x82\xac' + +Alternatively you can cast the unknown encoding data to :sql:`bytea` to +retrieve it as bytes, leaving other strings unaltered: see :ref:`adapt-binary` + +Note that PostgreSQL text cannot contain the ``0x00`` byte. If you need to +store Python strings that may contain binary zeros you should use a +:sql:`bytea` field. + + +.. 
index:: + single: bytea; Adaptation + single: bytes; Adaptation + single: bytearray; Adaptation + single: memoryview; Adaptation + single: Binary string + +.. _adapt-binary: + +Binary adaptation +----------------- + +Python types representing binary objects (`bytes`, `bytearray`, `memoryview`) +are converted by default to :sql:`bytea` fields. By default data received is +returned as `!bytes`. + +.. admonition:: todo + + Make sure bytearray/memoryview work and are composable with + arrays/composite + +If you are storing large binary data in bytea fields (such as binary documents +or images) you should probably use the binary format to pass and return +values, otherwise binary data will undergo `ASCII escaping`__, taking some CPU +time and more bandwidth. See :ref:`binary-data` for details. + +.. __: https://www.postgresql.org/docs/current/datatype-binary.html + + .. _adapt-date: .. _adapt-list: .. _adapt-composite: diff --git a/docs/connection.rst b/docs/connection.rst index ba95df4f8..5a7d4befb 100644 --- a/docs/connection.rst +++ b/docs/connection.rst @@ -100,6 +100,31 @@ The `!Connection` class ones: you should call ``await`` `~AsyncConnection.set_client_encoding`\ :samp:`({value})` instead. + The value returned is always normalized to the Python codec + `~codecs.CodecInfo.name`:: + + conn.client_encoding = 'latin9' + conn.client_encoding + 'iso8859-15' + + and it reflects the current connection property, even if it is set + outside Python:: + + conn.execute("SET client_encoding TO LATIN1") + conn.client_encoding + 'iso8859-1' + + A few PostgreSQL encodings are not available in Python and cannot be + selected (currently ``EUC_TW``, ``MULE_INTERNAL``). The PostgreSQL + ``SQL_ASCII`` encoding has the special meaning of "no encoding": see + :ref:`adapt-string` for details. + + .. seealso:: + + The `PostgreSQL supported encodings`__. + + .. __: https://www.postgresql.org/docs/current/multibyte.html + .. attribute:: info TODO
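Reviewer note: the codec names and byte strings quoted in the added docs can be checked without a database. The sketch below uses only the Python standard library and does not call psycopg3 at all. The helper ``normalize`` is a hypothetical name introduced here for illustration, and the assumption that the driver's normalization matches ``codecs.lookup(...).name`` is drawn only from the wording of the patch, not from its code.

```python
import codecs


def normalize(name: str) -> str:
    """Return the canonical Python codec name for an encoding alias.

    Hypothetical helper: it only illustrates the standard-library lookup
    that the docs refer to as ``codecs.CodecInfo.name``.
    """
    return codecs.lookup(name).name


print(normalize("latin9"))  # iso8859-15
print(normalize("latin1"))  # iso8859-1
print(normalize("utf8"))    # utf-8

text = "Crème Brûlée at 4.99€"

# UTF-8 can represent every character; these are the bytes the docs show
# when decoding is disabled.
utf8_bytes = text.encode("utf-8")

# Latin-9 has a single-byte code (0xA4) for the Euro symbol.
latin9_bytes = text.encode("latin9")

# Latin-1 has no Euro symbol, so encoding fails on the Python side, much as
# the server-side conversion fails in the documented example.
latin1_error = None
try:
    text.encode("latin1")
except UnicodeEncodeError as exc:
    latin1_error = exc

print(utf8_bytes)
print(latin9_bytes)
print(type(latin1_error).__name__)
```

The error in the documented session is raised by PostgreSQL (``UntranslatableCharacter``), whereas this sketch triggers the analogous Python-side ``UnicodeEncodeError``; the two share the same cause, but they are different exceptions.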