From: Mike Bayer
Date: Sun, 24 Jun 2018 17:06:38 +0000 (-0400)
Subject: Use utf8mb4 (or utf8mb3) for all things MySQL
X-Git-Tag: rel_1_3_0b1~161
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=c99345ee9994c3ea2a5e6536cc3365f18d017cc1;p=thirdparty%2Fsqlalchemy%2Fsqlalchemy.git

Use utf8mb4 (or utf8mb3) for all things MySQL

Fixed bug in MySQLdb dialect and variants such as PyMySQL where an
additional "unicode returns" check upon connection makes explicit use of
the "utf8" character set, which in MySQL 8.0 emits a warning that utf8mb4
should be used. This is now replaced with a utf8mb4 equivalent.
Documentation is also updated for the MySQL dialect to specify utf8mb4 in
all examples. Additional changes have been made to the test suite to use
utf8mb3 charsets and databases (there seem to be collation issues in some
edge cases with utf8mb4), and to support configuration default changes made
in MySQL 8.0 such as explicit_defaults_for_timestamp as well as new errors
raised for invalid MyISAM indexes.

Change-Id: Ib596ea7de4f69f976872a33bffa4c902d17dea25
Fixes: #4283
Fixes: #4192
---

diff --git a/doc/build/changelog/unreleased_12/4283.rst b/doc/build/changelog/unreleased_12/4283.rst
new file mode 100644
index 0000000000..a110015091
--- /dev/null
+++ b/doc/build/changelog/unreleased_12/4283.rst
@@ -0,0 +1,16 @@
+.. change::
+    :tags: bug, mysql
+    :tickets: 4283
+
+    Fixed bug in MySQLdb dialect and variants such as PyMySQL where an
+    additional "unicode returns" check upon connection makes explicit use of
+    the "utf8" character set, which in MySQL 8.0 emits a warning that utf8mb4
+    should be used. This is now replaced with a utf8mb4 equivalent.
+    Documentation is also updated for the MySQL dialect to specify utf8mb4 in
+    all examples. Additional changes have been made to the test suite to use
+    utf8mb3 charsets and databases (there seem to be collation issues in some
+    edge cases with utf8mb4), and to support configuration default changes made
+    in MySQL 8.0 such as explicit_defaults_for_timestamp as well as new errors
+    raised for invalid MyISAM indexes.
+
+
diff --git a/lib/sqlalchemy/dialects/mysql/base.py b/lib/sqlalchemy/dialects/mysql/base.py
index 62753e1a5c..3aede4a058 100644
--- a/lib/sqlalchemy/dialects/mysql/base.py
+++ b/lib/sqlalchemy/dialects/mysql/base.py
@@ -54,13 +54,13 @@ including ``ENGINE``, ``CHARSET``, ``MAX_ROWS``, ``ROW_FORMAT``,
 ``INSERT_METHOD``, and many more.
 To accommodate the rendering of these arguments, specify the form
 ``mysql_argument_name="value"``. For example, to specify a table with
-``ENGINE`` of ``InnoDB``, ``CHARSET`` of ``utf8``, and ``KEY_BLOCK_SIZE``
+``ENGINE`` of ``InnoDB``, ``CHARSET`` of ``utf8mb4``, and ``KEY_BLOCK_SIZE``
 of ``1024``::
 
   Table('mytable', metadata,
         Column('data', String(32)),
         mysql_engine='InnoDB',
-        mysql_charset='utf8',
+        mysql_charset='utf8mb4',
         mysql_key_block_size="1024"
        )
 
@@ -213,7 +213,7 @@ a connection. This is typically delivered using the ``charset`` parameter
 in the URL, such as::
 
     e = create_engine(
-        "mysql+pymysql://scott:tiger@localhost/test?charset=utf8")
+        "mysql+pymysql://scott:tiger@localhost/test?charset=utf8mb4")
 
 This charset is the **client character set** for the connection. Some
 MySQL DBAPIs will default this to a value such as ``latin1``, and some
@@ -223,8 +223,10 @@ for specific behavior.
 
 The encoding used for Unicode has traditionally been ``'utf8'``. However,
 for MySQL versions 5.5.3 on forward, a new MySQL-specific encoding
-``'utf8mb4'`` has been introduced. The rationale for this new encoding
-is due to the fact that MySQL's utf-8 encoding only supports
+``'utf8mb4'`` has been introduced, and as of MySQL 8.0 a warning is emitted
+by the server if plain ``utf8`` is specified within any server-side
+directives, replaced with ``utf8mb3``. The rationale for this new encoding
+is due to the fact that MySQL's legacy utf-8 encoding only supports
 codepoints up to three bytes instead of four. Therefore,
 when communicating with a MySQL database
 that includes codepoints more than three bytes in size,
@@ -234,12 +236,11 @@ as the client DBAPI, as in::
     e = create_engine(
         "mysql+pymysql://scott:tiger@localhost/test?charset=utf8mb4")
 
-At the moment, up-to-date versions of MySQLdb and PyMySQL support the
-``utf8mb4`` charset. Other DBAPIs such as MySQL-Connector and OurSQL
-may **not** support it as of yet.
+All modern DBAPIs should support the ``utf8mb4`` charset.
 
-In order to use ``utf8mb4`` encoding, changes to
-the MySQL schema and/or server configuration may be required.
+In order to use ``utf8mb4`` encoding for a schema that was created with legacy
+``utf8``, changes to the MySQL schema and/or server configuration may be
+required.
 
 .. seealso::
 
@@ -252,38 +253,16 @@ Unicode Encoding / Decoding
 
 All modern MySQL DBAPIs all offer the service of handling the encoding and
 decoding of unicode data between the Python application space and the database.
-As this was not always the case, SQLAlchemy also includes a comprehensive system
-of performing the encode/decode task as well. As only one of these systems
-should be in use at at time, SQLAlchemy has long included functionality
-to automatically detect upon first connection whether or not the DBAPI is
-automatically handling unicode.
-
-Whether or not the MySQL DBAPI will handle encoding can usually be configured
-using a DBAPI flag ``use_unicode``, which is known to be supported at least
-by MySQLdb, PyMySQL, and MySQL-Connector. Setting this value to ``0``
-in the "connect args" or query string will have the effect of disabling the
-DBAPI's handling of unicode, such that it instead will return data of the
-``str`` type or ``bytes`` type, with data in the configured charset::
-
-    # connect while disabling the DBAPI's unicode encoding/decoding
+As this was not always the case, SQLAlchemy also includes a comprehensive
+system of performing the encode/decode task as well, which for MySQL dialects
+can be enabled by passing the flag ``use_unicode=0`` onto the query string, as
+in::
+
     e = create_engine(
-        "mysql+mysqldb://scott:tiger@localhost/test?charset=utf8&use_unicode=0")
-
-Current recommendations for modern DBAPIs are as follows:
-
-* It is generally always safe to leave the ``use_unicode`` flag set at
-  its default; that is, don't use it at all.
-* Under Python 3, the ``use_unicode=0`` flag should **never be used**.
-  SQLAlchemy under Python 3 generally assumes the DBAPI receives and returns
-  string values as Python 3 strings, which are inherently unicode objects.
-* Under Python 2 with MySQLdb, the ``use_unicode=0`` flag will **offer
-  superior performance**, as MySQLdb's unicode converters under Python 2 only
-  have been observed to have unusually slow performance compared to SQLAlchemy's
-  fast C-based encoders/decoders.
-
-In short: don't specify ``use_unicode`` *at all*, with the possible
-exception of ``use_unicode=0`` on MySQLdb with Python 2 **only** for a
-potential performance gain.
+        "mysql+mysqldb://scott:tiger@localhost/test?charset=utf8mb4&use_unicode=0")
+
+Current recommendations are to **not** use this flag. All modern MySQL DBAPIs
+handle unicode natively as is required on Python 3 in any case.
 
 Ansi Quoting Style
 ------------------
diff --git a/lib/sqlalchemy/dialects/mysql/mysqldb.py b/lib/sqlalchemy/dialects/mysql/mysqldb.py
index 3da64a4913..535c8ec52d 100644
--- a/lib/sqlalchemy/dialects/mysql/mysqldb.py
+++ b/lib/sqlalchemy/dialects/mysql/mysqldb.py
@@ -109,21 +109,21 @@ class MySQLDialect_mysqldb(MySQLDialect):
     def _check_unicode_returns(self, connection):
         # work around issue fixed in
         # https://github.com/farcepest/MySQLdb1/commit/cd44524fef63bd3fcb71947392326e9742d520e8
-        # specific issue w/ the utf8_bin collation and unicode returns
+        # specific issue w/ the utf8mb4_bin collation and unicode returns
 
-        has_utf8_bin = self.server_version_info > (5, ) and \
+        has_utf8mb4_bin = self.server_version_info > (5, ) and \
             connection.scalar(
-                "show collation where %s = 'utf8' and %s = 'utf8_bin'"
+                "show collation where %s = 'utf8mb4' and %s = 'utf8mb4_bin'"
                 % (
                     self.identifier_preparer.quote("Charset"),
                     self.identifier_preparer.quote("Collation")
                 ))
-        if has_utf8_bin:
+        if has_utf8mb4_bin:
             additional_tests = [
                 sql.collate(sql.cast(
                     sql.literal_column(
                         "'test collated returns'"),
-                    TEXT(charset='utf8')), "utf8_bin")
+                    TEXT(charset='utf8mb4')), "utf8mb4_bin")
             ]
         else:
             additional_tests = []
diff --git a/lib/sqlalchemy/testing/provision.py b/lib/sqlalchemy/testing/provision.py
index 687f84b182..8abfa3301c 100644
--- a/lib/sqlalchemy/testing/provision.py
+++ b/lib/sqlalchemy/testing/provision.py
@@ -194,9 +194,15 @@ def _mysql_create_db(cfg, eng, ident):
             _mysql_drop_db(cfg, conn, ident)
         except Exception:
             pass
-        conn.execute("CREATE DATABASE %s" % ident)
-        conn.execute("CREATE DATABASE %s_test_schema" % ident)
-        conn.execute("CREATE DATABASE %s_test_schema_2" % ident)
+
+        # using utf8mb4 we are getting collation errors on UNIONS:
+        # test/orm/inheritance/test_polymorphic_rel.py"
+        # 1271, u"Illegal mix of collations for operation 'UNION'"
+        conn.execute("CREATE DATABASE %s CHARACTER SET utf8mb3" % ident)
+        conn.execute(
+            "CREATE DATABASE %s_test_schema CHARACTER SET utf8mb3" % ident)
+        conn.execute(
+            "CREATE DATABASE %s_test_schema_2 CHARACTER SET utf8mb3" % ident)
 
 
 @_configure_follower.for_db("mysql")
diff --git a/lib/sqlalchemy/testing/schema.py b/lib/sqlalchemy/testing/schema.py
index 3ca91b9017..401c8cbb78 100644
--- a/lib/sqlalchemy/testing/schema.py
+++ b/lib/sqlalchemy/testing/schema.py
@@ -22,7 +22,8 @@ def Table(*args, **kw):
     kw.update(table_options)
 
     if exclusions.against(config._current, 'mysql'):
-        if 'mysql_engine' not in kw and 'mysql_type' not in kw:
+        if 'mysql_engine' not in kw and 'mysql_type' not in kw and \
+                "autoload_with" not in kw:
             if 'test_needs_fk' in test_opts or 'test_needs_acid' in test_opts:
                 kw['mysql_engine'] = 'InnoDB'
             else:
diff --git a/lib/sqlalchemy/testing/suite/test_reflection.py b/lib/sqlalchemy/testing/suite/test_reflection.py
index 70a2f3f4f8..00a5aac018 100644
--- a/lib/sqlalchemy/testing/suite/test_reflection.py
+++ b/lib/sqlalchemy/testing/suite/test_reflection.py
@@ -153,14 +153,18 @@ class ComponentReflectionTest(fixtures.TablesTest):
             cls.define_index(metadata, users)
 
         if not schema:
+            # test_needs_fk is at the moment to force MySQL InnoDB
             noncol_idx_test_nopk = Table(
                 'noncol_idx_test_nopk', metadata,
                 Column('q', sa.String(5)),
+                test_needs_fk=True,
             )
+
             noncol_idx_test_pk = Table(
                 'noncol_idx_test_pk', metadata,
                 Column('id', sa.Integer, primary_key=True),
                 Column('q', sa.String(5)),
+                test_needs_fk=True,
             )
             Index('noncol_idx_nopk', noncol_idx_test_nopk.c.q.desc())
             Index('noncol_idx_pk', noncol_idx_test_pk.c.q.desc())
diff --git a/test/dialect/mysql/test_dialect.py b/test/dialect/mysql/test_dialect.py
index 328348a9c3..d72418ba3b 100644
--- a/test/dialect/mysql/test_dialect.py
+++ b/test/dialect/mysql/test_dialect.py
@@ -131,18 +131,19 @@ class DialectTest(fixtures.TestBase):
         assert not c.execute('SELECT @@autocommit;').scalar()
 
     def test_isolation_level(self):
-        values = {
-            # sqlalchemy -> mysql
-            'READ UNCOMMITTED': 'READ-UNCOMMITTED',
-            'READ COMMITTED': 'READ-COMMITTED',
-            'REPEATABLE READ': 'REPEATABLE-READ',
-            'SERIALIZABLE': 'SERIALIZABLE'
-        }
-        for sa_value, mysql_value in values.items():
+        values = [
+            'READ UNCOMMITTED',
+            'READ COMMITTED',
+            'REPEATABLE READ',
+            'SERIALIZABLE'
+        ]
+        for value in values:
             c = testing.db.connect().execution_options(
-                isolation_level=sa_value
+                isolation_level=value
             )
-            assert c.execute('SELECT @@tx_isolation;').scalar() == mysql_value
+            eq_(
+                testing.db.dialect.get_isolation_level(c.connection),
+                value)
 
 
 class ParseVersionTest(fixtures.TestBase):
diff --git a/test/dialect/mysql/test_reflection.py b/test/dialect/mysql/test_reflection.py
index 86937cd0d5..76b6954aed 100644
--- a/test/dialect/mysql/test_reflection.py
+++ b/test/dialect/mysql/test_reflection.py
@@ -483,6 +483,11 @@ class ReflectionTest(fixtures.TestBase, AssertsCompiledSQL):
         # this is ideally one table, but older MySQL versions choke
         # on the multiple TIMESTAMP columns
 
+        row = testing.db.execute(
+            "show variables like '%%explicit_defaults_for_timestamp%%'"
+        ).first()
+        explicit_defaults_for_timestamp = row[1].lower() in ('on', '1', 'true')
+
         reflected = []
         for idx, cols in enumerate([
             [
@@ -528,16 +533,20 @@ class ReflectionTest(fixtures.TestBase, AssertsCompiledSQL):
                 {'name': 'p', 'nullable': True,
                  'default': current_timestamp},
                 {'name': 'r', 'nullable': False,
-                 'default':
+                 'default': None if explicit_defaults_for_timestamp else
                  "%(current_timestamp)s ON UPDATE %(current_timestamp)s" %
                  {"current_timestamp": current_timestamp}},
                 {'name': 's', 'nullable': False,
                  'default': current_timestamp},
-                {'name': 't', 'nullable': False,
-                 'default':
+                {'name': 't',
+                 'nullable': True if explicit_defaults_for_timestamp else
+                 False,
+                 'default': None if explicit_defaults_for_timestamp else
                  "%(current_timestamp)s ON UPDATE %(current_timestamp)s" %
                  {"current_timestamp": current_timestamp}},
-                {'name': 'u', 'nullable': False,
+                {'name': 'u',
+                 'nullable': True if explicit_defaults_for_timestamp else
+                 False,
                  'default': current_timestamp},
             ]
         )
diff --git a/test/requirements.py b/test/requirements.py
index c1e30daf6d..3cc80318cb 100644
--- a/test/requirements.py
+++ b/test/requirements.py
@@ -1033,7 +1033,9 @@ class DefaultRequirements(SuiteRequirements):
             # will raise without quoting
             "postgresql": "POSIX",
 
-            "mysql": "latin1_general_ci",
+            # note MySQL databases need to be created w/ utf8mb3 charset
+            # for the test suite
+            "mysql": "utf8mb3_bin",
             "sqlite": "NOCASE",
 
             # will raise *with* quoting
diff --git a/test/sql/test_defaults.py b/test/sql/test_defaults.py
index c53670a05f..0d4eecf6aa 100644
--- a/test/sql/test_defaults.py
+++ b/test/sql/test_defaults.py
@@ -447,7 +447,7 @@ class DefaultTest(fixtures.TestBase):
         t.insert().execute({}, {}, {})
 
         ctexec = currenttime.scalar()
-        result = t.select().execute()
+        result = t.select().order_by(t.c.col1).execute()
         today = datetime.date.today()
         eq_(result.fetchall(),
             [(51, 'imthedefault', f, ts, ts, ctexec, True, False,
@@ -463,7 +463,7 @@ class DefaultTest(fixtures.TestBase):
         t.insert().values([{}, {}, {}]).execute()
 
         ctexec = currenttime.scalar()
-        result = t.select().execute()
+        result = t.select().order_by(t.c.col1).execute()
         today = datetime.date.today()
         eq_(result.fetchall(),
             [(51, 'imthedefault', f, ts, ts, ctexec, True, False,
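
Note on the three-byte limit mentioned in the base.py documentation hunk: the legacy MySQL "utf8" (utf8mb3) charset stores at most three bytes per codepoint, while utf8mb4 allows four. The following is a minimal sketch, not part of this commit, that checks in plain Python which strings would fit within that limit. The helper names utf8_byte_length and fits_in_utf8mb3 are hypothetical and exist only for illustration; no MySQL connection is involved.

```python
def utf8_byte_length(char: str) -> int:
    """Return how many bytes a single character occupies in UTF-8."""
    return len(char.encode("utf-8"))


def fits_in_utf8mb3(text: str) -> bool:
    """Return True if every character needs at most three UTF-8 bytes.

    Hypothetical helper: it only models the per-codepoint byte limit of
    the legacy charset and does not consult a MySQL server.
    """
    return all(utf8_byte_length(ch) <= 3 for ch in text)


if __name__ == "__main__":
    # U+00E9 (2 bytes), U+20AC (3 bytes), U+1F600 (4 bytes)
    for ch in ("\u00e9", "\u20ac", "\U0001F600"):
        print(hex(ord(ch)), utf8_byte_length(ch), fits_in_utf8mb3(ch))
```

Text for which fits_in_utf8mb3 returns False, such as most emoji, is the kind of data the documentation says requires charset=utf8mb4 on the connection.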