From: Mike Bayer Date: Thu, 6 Feb 2020 15:26:50 +0000 (-0500) Subject: Document SQLite "mixed binary" behavior X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=cc6510bc4677fe54aada005763b5fc42f0c6db7b;p=thirdparty%2Fsqlalchemy%2Fsqlalchemy.git Document SQLite "mixed binary" behavior The Pysqlite driver can store a string value with or without an indicator that the value is to be retrieved as bytes or as a unicode string object. To suit the use case where a SQLite database has mixed values on a row by row basis, provide a recipe for a MixedBinary datatype. Change-Id: I9a166bd6fc673d8d46a53ab9697cb3d412e5fcee References: #5073 (cherry picked from commit ee1d914888113ceb9928ece6e0a715c813bdfcfa) --- diff --git a/lib/sqlalchemy/dialects/sqlite/pysqlite.py b/lib/sqlalchemy/dialects/sqlite/pysqlite.py index c873d8884a..babcf8423c 100644 --- a/lib/sqlalchemy/dialects/sqlite/pysqlite.py +++ b/lib/sqlalchemy/dialects/sqlite/pysqlite.py @@ -192,6 +192,44 @@ require unicode, however, so that non-``unicode`` values passed inadvertently will emit a warning. Pysqlite will emit an error if a non-``unicode`` string is passed containing non-ASCII characters. +Dealing with Mixed String / Binary Columns in Python 3 +------------------------------------------------------ + +The SQLite database is weakly typed, and as such it is possible when using +binary values, which in Python 3 are represented as ``b'some string'``, that a +particular SQLite database can have data values within different rows where +some of them will be returned as a ``b''`` value by the Pysqlite driver, and +others will be returned as Python strings, e.g. ``''`` values. This situation +is not known to occur if the SQLAlchemy :class:`.LargeBinary` datatype is used +consistently, however if a particular SQLite database has data that was +inserted using the Pysqlite driver directly, or when using the SQLAlchemy +:class:`.String` type which was later changed to :class:`.LargeBinary`, the +table will not be consistently readable because SQLAlchemy's +:class:`.LargeBinary` datatype does not handle strings so it has no way of +"encoding" a value that is in string format. + +To deal with a SQLite table that has mixed string / binary data in the +same column, use a custom type that will check each row individually:: + + # note this is Python 3 only + + from sqlalchemy import String + from sqlalchemy import TypeDecorator + + class MixedBinary(TypeDecorator): + impl = String + + def process_result_value(self, value, dialect): + if isinstance(value, str): + value = bytes(value, 'utf-8') + elif value is not None: + value = bytes(value) + + return value + +Then use the above ``MixedBinary`` datatype in the place where +:class:`.LargeBinary` would normally be used. + .. _pysqlite_serializable: Serializable isolation / Savepoints / Transactional DDL