turn off use_insertmanyvalues for SQL Server

author Mike Bayer <mike_mp@zzzcomputing.com>

Wed, 5 Apr 2023 16:59:13 +0000 (12:59 -0400)

committer Mike Bayer <mike_mp@zzzcomputing.com>

Wed, 5 Apr 2023 22:24:59 +0000 (18:24 -0400)
author Mike Bayer <mike_mp@zzzcomputing.com>
Wed, 5 Apr 2023 16:59:13 +0000 (12:59 -0400)
committer Mike Bayer <mike_mp@zzzcomputing.com>
Wed, 5 Apr 2023 22:24:59 +0000 (18:24 -0400)
diff --git a/doc/build/changelog/unreleased_20/9603.rst b/doc/build/changelog/unreleased_20/9603.rst

new file mode 100644 (file)

index 0000000..0af82ae
--- /dev/null
+++ b/doc/build/changelog/unreleased_20/9603.rst
@@ -0,0 +1,14 @@
+.. change::
+    :tags: bug, mssql
+    :tickets: 9603
+
+    Due to a critical bug identified in SQL Server, the SQLAlchemy
+    "insertmanyvalues" feature which allows fast INSERT of many rows while also
+    supporting RETURNING unfortunately needs to be disabled for SQL Server. SQL
+    Server is apparently unable to guarantee that the order of rows inserted
+    matches the order in which they are sent back by OUTPUT inserted when
+    table-valued rows are used with INSERT in conjunction with OUTPUT inserted.
+    We are trying to see if Microsoft is able to confirm this undocumented
+    behavior however there is no known workaround, other than it's not safe to
+    use table-valued expressions with OUTPUT inserted for now.
+
diff --git a/doc/build/changelog/whatsnew_20.rst b/doc/build/changelog/whatsnew_20.rst

index 63fadc8a55f7186bdc254dfca49b95eb92f654a6..6b023bb483be00abafb49b21f89d040fe7beda76 100644 (file)
--- a/doc/build/changelog/whatsnew_20.rst
+++ b/doc/build/changelog/whatsnew_20.rst
@@ -859,7 +859,8 @@ Optimized ORM bulk insert now implemented for all backends other than MySQL
  The dramatic performance improvement introduced in the 1.4 series and described
  at :ref:`change_5263` has now been generalized to all included backends that
  support RETURNING, which is all backends other than MySQL: SQLite, MariaDB,
-PostgreSQL (all drivers), Oracle, and SQL Server. While the original feature
+PostgreSQL (all drivers), and Oracle; SQL Server has support but unfortunately
+had to be turned off due to an issue with SQL Server [#]_. While the original feature
  was most critical for the psycopg2 driver which otherwise had major performance
  issues when using ``cursor.executemany()``, the change is also critical for
  other PostgreSQL drivers such as asyncpg, as when using RETURNING,
@@ -894,16 +895,16 @@ With most databases now offering RETURNING (with the conspicuous exception of
  MySQL, given that MariaDB supports it), the new change generalizes the psycopg2
  "fast execution helper" approach to all dialects that support RETURNING, which
  now includes SQlite and MariaDB, and for which no other approach for
-"executemany plus RETURNING" is possible, which includes SQLite, MariaDB, all
-PG drivers, and SQL Server. The cx_Oracle and oracledb drivers used for Oracle
+"executemany plus RETURNING" is possible, which includes SQLite, MariaDB, and all
+PG drivers. The cx_Oracle and oracledb drivers used for Oracle
  support RETURNING with executemany natively, and this has also been implemented
-to provide equivalent performance improvements. With SQLite and MariaDB now
+to provide equivalent performance improvements.  With SQLite and MariaDB now
  offering RETURNING support, ORM use of ``cursor.lastrowid`` is nearly a thing
  of the past, with only MySQL still relying upon it.
  
  For INSERT statements that don't use RETURNING, traditional executemany()
-behavior is used for most backends, with the current exceptions of psycopg2
-and mssql+pyodbc, which both have very slow executemany() performance overall
+behavior is used for most backends, with the current exception of psycopg2,
+which has very slow executemany() performance overall
  and are still improved by the "insertmanyvalues" approach.
  
  Benchmarks
@@ -974,10 +975,18 @@ sqlite+pysqlite2 (memory)      6.204843                3.554856
  postgresql+asyncpg (network)   88.292285               4.561492
  postgresql+psycopg (network)   N/A (psycopg3)          4.861368
  oracle+cx_Oracle (network)     92.603953               4.809520
-mssql+pyodbc (network)         158.396667              4.825139
  mariadb+mysqldb (network)      71.705197               4.075377
  ============================   ====================    ====================
  
+.. mssql+pyodbc (network)      .. 158.396667           .. 4.825139
+
+
+.. note::
+
+   .. [#] The feature is disabled for SQL Server as of SQLAlchemy 2.0.9 due
+      to incompatibilities in how table-valued expressions are handled by
+      SQL Server.  See https://github.com/sqlalchemy/sqlalchemy/issues/9603
+
  Two additional drivers have no change in performance; the psycopg2 drivers,
  for which fast executemany was already implemented in SQLAlchemy 1.4,
  and MySQL, which continues to not offer RETURNING support:
diff --git a/doc/build/core/connections.rst b/doc/build/core/connections.rst

index 80156b4183d6439395e244a5c82c82b8399a2136..726b5a501d56b1b538f540139feaca46b8579f00 100644 (file)
--- a/doc/build/core/connections.rst
+++ b/doc/build/core/connections.rst
@@ -1854,7 +1854,9 @@ as follows:
  
  * SQLite - supported for SQLite versions 3.35 and above
  * PostgreSQL - all supported Postgresql versions (9 and above)
-* SQL Server - all supported SQL Server versions
+* SQL Server - **disabled by default as of SQLAlchemy 2.0.9** - the SQL syntax
+  used has been shown to not be safe for RETURNING
+  (see https://github.com/sqlalchemy/sqlalchemy/issues/9603)
  * MariaDB - supported for MariaDB versions 10.5 and above
  * MySQL - no support, no RETURNING feature is present
  * Oracle - supports RETURNING with executemany using native cx_Oracle / OracleDB
@@ -1887,11 +1889,7 @@ The feature can also be disabled from being used implicitly for a particular
        )
  
  The reason one might want to disable RETURNING for a specific table is to
-work around backend-specific limitations.  For example, there is a known
-limitation of SQL Server that the ``OUTPUT inserted.<colname>`` feature
-may not work correctly for a table that has INSERT triggers established;
-such a table may need to include ``implicit_returning=False`` (see
-:ref:`mssql_triggers`).
+work around backend-specific limitations.
  
  .. _engine_insertmanyvalues_page_size:
  
@@ -1929,9 +1927,8 @@ varies by dialect and server version; the largest size is 32700 (chosen as a
  healthy distance away from PostgreSQL's limit of 32767 and SQLite's modern
  limit of 32766, while leaving room for additional parameters in the statement
  as well as for DBAPI quirkiness). Older versions of SQLite (prior to 3.32.0)
-will set this value to 999; SQL Server sets it to 2099.  MariaDB has no
-established limit however 32700 remains as a limiting factor for SQL message
-size.
+will set this value to 999. MariaDB has no established limit however 32700
+remains as a limiting factor for SQL message size.
  
  The value of the "batch size" can be affected :class:`_engine.Engine`
  wide via the :paramref:`_sa.create_engine.insertmanyvalues_page_size` parameter.
diff --git a/lib/sqlalchemy/dialects/mssql/base.py b/lib/sqlalchemy/dialects/mssql/base.py

index 31745d600edd0040600bf7791ecfce12bdd93d01..808fdf16fe30fa352bd78d5676ff464fff70b851 100644 (file)
--- a/lib/sqlalchemy/dialects/mssql/base.py
+++ b/lib/sqlalchemy/dialects/mssql/base.py
@@ -251,18 +251,11 @@ The process for fetching this value has several variants:
  
      INSERT INTO t (x) OUTPUT inserted.id VALUES (?)
  
-  As of SQLAlchemy 2.0, the :ref:`engine_insertmanyvalues` feature is also
-  used by default to optimize many-row INSERT statements; for SQL Server
-  the feature takes place for both RETURNING and-non RETURNING
-  INSERT statements.
-
-* The value of :paramref:`_sa.create_engine.insertmanyvalues_page_size`
-  defaults to 1000, however the ultimate page size for a particular INSERT
-  statement may be limited further, based on an observed limit of
-  2100 bound parameters for a single statement in SQL Server.
-  The page size may also be modified on a per-engine
-  or per-statement basis; see the section
-  :ref:`engine_insertmanyvalues_page_size` for details.
+  .. note::  SQLAlchemy 2.0 introduced the :ref:`engine_insertmanyvalues`
+     feature for SQL Server, which is used by default to optimize many-row
+     INSERT statements; however as of SQLAlchemy 2.0.9 this feature had
+     to be turned off for SQL Server as the database does not support
+     deterministic RETURNING of INSERT rows for a multi-row INSERT statement.
  
  * When RETURNING is not available or has been disabled via
    ``implicit_returning=False``, either the ``scope_identity()`` function or
@@ -3017,7 +3010,8 @@ class MSDialect(default.DefaultDialect):
      # may be changed at server inspection time for older SQL server versions
      supports_multivalues_insert = True
  
-    use_insertmanyvalues = True
+    # disabled due to #9603
+    use_insertmanyvalues = False
  
      # note pyodbc will set this to False if fast_executemany is set,
      # as of SQLAlchemy 2.0.9
@@ -3085,6 +3079,14 @@ class MSDialect(default.DefaultDialect):
  
          super().__init__(**opts)
  
+        if self.use_insertmanyvalues:
+            raise exc.ArgumentError(
+                "The use_insertmanyvalues feature on SQL Server is currently "
+                "not safe to use, as returned result rows may be returned in "
+                "random order.  Ensure use_insertmanyvalues is left at its "
+                "default of False (this setting changed in SQLAlchemy 2.0.9)"
+            )
+
          self._json_serializer = json_serializer
          self._json_deserializer = json_deserializer
  
diff --git a/lib/sqlalchemy/dialects/mssql/pyodbc.py b/lib/sqlalchemy/dialects/mssql/pyodbc.py

index f9f2479ad39234c2eec1b979c7a6c600ad5a529e..08c6bc48f1824f131a280228ca33991bf0168071 100644 (file)
--- a/lib/sqlalchemy/dialects/mssql/pyodbc.py
+++ b/lib/sqlalchemy/dialects/mssql/pyodbc.py
@@ -290,25 +290,19 @@ Pyodbc have been resolved as of SQLAlchemy 2.0.5. See the notes at
  Fast Executemany Mode
  ---------------------
  
-.. note:: SQLAlchemy 2.0 now includes an equivalent "fast executemany"
-   handler for INSERT statements that is more robust than the PyODBC feature
-   (but is not quite as performant particularly for very large datasets);
-   the feature is called :ref:`insertmanyvalues <engine_insertmanyvalues>`
-   and is enabled for all INSERT statements by default.
-   SQLAlchemy's feature integrates with the PyODBC ``setinputsizes()`` method
-   which allows for more accurate specification of datatypes, and additionally
-   uses a dynamically sized, batched approach that scales to any number of
-   columns and/or rows.
-
-   The SQL Server ``fast_executemany`` parameter may be used at the same time
-   as ``insertmanyvalues`` is enabled; however, the parameter will not be used
-   for INSERT statements that include RETURNING.
+  .. note::  SQLAlchemy 2.0 introduced the :ref:`engine_insertmanyvalues`
+     feature for SQL Server, which is used by default to optimize many-row
+     INSERT statements; however as of SQLAlchemy 2.0.9 this feature had
+     to be turned off for SQL Server as the database does not support
+     deterministic RETURNING of INSERT rows for a multi-row INSERT statement.
  
  .. versionchanged:: 2.0.9 - ``fast_executemany`` executions will be used
     for INSERT statements that don't include RETURNING, when
     ``fast_executemany`` is set. Previously, ``use_insertmanyvalues`` would
     cause ``fast_executemany`` to not be used in most cases.
  
+   ``use_insertmanyvalues`` is disabled for SQL Server overall as of 2.0.9.
+
  The PyODBC driver includes support for a "fast executemany" mode of execution
  which greatly reduces round trips for a DBAPI ``executemany()`` call when using
  Microsoft ODBC drivers, for **limited size batches that fit in memory**.  The
diff --git a/test/dialect/mssql/test_engine.py b/test/dialect/mssql/test_engine.py

index 363c19584b784a60fa31be3d057cdaacc577a95b..095df2eaf1b5e947d4ccc2a28c9ff8f2a3cf101d 100644 (file)
--- a/test/dialect/mssql/test_engine.py
+++ b/test/dialect/mssql/test_engine.py
@@ -3,6 +3,7 @@ import re
  from unittest.mock import Mock
  
  from sqlalchemy import Column
+from sqlalchemy import create_engine
  from sqlalchemy import event
  from sqlalchemy import exc
  from sqlalchemy import inspect
@@ -423,7 +424,7 @@ class FastExecutemanyTest(fixtures.TestBase):
      @testing.variation("add_event", [True, False])
      @testing.variation("setinputsizes", [True, False])
      @testing.variation("fastexecutemany", [True, False])
-    @testing.variation("insertmanyvalues", [True, False])
+    @testing.variation("insertmanyvalues", [False])  # disabled due to #9603
      @testing.variation("broken_types", [True, False])
      def test_insert_typing(
          self,
@@ -628,6 +629,14 @@ class MiscTest(fixtures.TestBase):
      __only_on__ = "mssql"
      __backend__ = True
  
+    def test_no_insertmanyvalues(self):
+        with expect_raises_message(
+            exc.ArgumentError,
+            "The use_insertmanyvalues feature on SQL Server is "
+            "currently not safe to use",
+        ):
+            create_engine("mssql+pyodbc://", use_insertmanyvalues=True)
+
      @testing.variation("enable_comments", [True, False])
      def test_comments_enabled_disabled(
          self, testing_engine, metadata, enable_comments
diff --git a/test/dialect/mssql/test_query.py b/test/dialect/mssql/test_query.py

index 35575bc13ae5752e81783964a1b46a5c9606a7b5..a11d847cfba5a98dbf469f6eadbf43c45b4c505c 100644 (file)
--- a/test/dialect/mssql/test_query.py
+++ b/test/dialect/mssql/test_query.py
@@ -19,6 +19,7 @@ from sqlalchemy import Table
  from sqlalchemy import testing
  from sqlalchemy.dialects.mssql import base as mssql
  from sqlalchemy.dialects.mssql import pyodbc as mssql_pyodbc
+from sqlalchemy.orm import Session
  from sqlalchemy.testing import AssertsCompiledSQL
  from sqlalchemy.testing import config
  from sqlalchemy.testing import engines
@@ -159,6 +160,74 @@ class QueryTest(testing.AssertsExecutionResults, fixtures.TestBase):
      __only_on__ = "mssql"
      __backend__ = True
  
+    def test_broken_table_values_test(self, decl_base, connection):
+        """test #9603.
+
+        this uses the ORM to ensure the ORM is not using any kind of
+        insertmany that causes the problem.
+
+        """
+
+        class Datum(decl_base):
+
+            __tablename__ = "datum"
+
+            id = Column(Integer, autoincrement=False, primary_key=True)
+            data = Column(String(10))
+
+        class Result(decl_base):
+
+            __tablename__ = "result"
+
+            id = Column(Integer, primary_key=True)
+
+            thing = Column(Integer)
+            lft_datum_id = Column(Integer, ForeignKey(Datum.id))
+
+        decl_base.metadata.create_all(connection)
+        with Session(connection) as sess:
+
+            size = 15
+            datum_ids = list(range(1, size + 1))
+
+            sess.add_all([Datum(id=id_, data=f"d{id_}") for id_ in datum_ids])
+            sess.flush()
+
+            result_data = [
+                Result(thing=num, lft_datum_id=datum_ids[num % size])
+                for num in range(size * size)
+            ]
+            sess.add_all(result_data)
+            sess.flush()
+
+            # this is what we expected we put in
+            the_data_in_order_should_be = [
+                (num + 1, num, datum_ids[num % size])
+                for num in range(size * size)
+            ]
+
+            # and yes, that's what went in
+            eq_(
+                sess.execute(
+                    select(
+                        Result.id, Result.thing, Result.lft_datum_id
+                    ).order_by(Result.id)
+                ).all(),
+                the_data_in_order_should_be,
+            )
+
+            # however, if insertmanyvalues is turned on, OUTPUT inserted
+            # did not give us the rows in the order we sent, so ids were
+            # mis-applied.  even if we sort the original records by the
+            # ids that were given
+            eq_(
+                [
+                    (r.id, r.thing, r.lft_datum_id)
+                    for r in sorted(result_data, key=lambda r: r.id)
+                ],
+                the_data_in_order_should_be,
+            )
+
      def test_fetchid_trigger(self, metadata, connection):
          # TODO: investigate test hang on mssql when connection fixture is used
          """
diff --git a/test/dialect/mssql/test_types.py b/test/dialect/mssql/test_types.py

index cb7ed3102e12bba30ba836af300fcf625970382e..d94984e0e614f402752040052e6e1151d82d26fe 100644 (file)
--- a/test/dialect/mssql/test_types.py
+++ b/test/dialect/mssql/test_types.py
@@ -1335,7 +1335,8 @@ class StringRoundTripTest(fixtures.TestBase):
          id_="ia",
      )
      @testing.combinations(
-        ("insertmany", True),
+        # disabled due to #9603
+        # ("insertmany", True),
          ("insertsingle", False),
          argnames="insertmany",
          id_="ia",
diff --git a/test/orm/test_unitofworkv2.py b/test/orm/test_unitofworkv2.py

index ae47dfa4fd4ffc3b5dfe7a626585b737841c5079..1b909f2b988a749b848ec383fb94324f2037a590 100644 (file)
--- a/test/orm/test_unitofworkv2.py
+++ b/test/orm/test_unitofworkv2.py
@@ -3217,14 +3217,34 @@ class EagerDefaultsSettingTest(
              Conditional(
                  expect_returning,
                  [
-                    CompiledSQL(
-                        "INSERT INTO test (id, bar) VALUES (:id, :bar) "
-                        "RETURNING test.foo",
+                    Conditional(
+                        connection.dialect.insert_executemany_returning,
                          [
-                            {"id": 1, "bar": 6},
-                            {"id": 2, "bar": 6},
+                            CompiledSQL(
+                                "INSERT INTO test (id, bar) "
+                                "VALUES (:id, :bar) "
+                                "RETURNING test.foo",
+                                [
+                                    {"id": 1, "bar": 6},
+                                    {"id": 2, "bar": 6},
+                                ],
+                            )
                          ],
-                    )
+                        [
+                            CompiledSQL(
+                                "INSERT INTO test (id, bar) "
+                                "VALUES (:id, :bar) "
+                                "RETURNING test.foo",
+                                {"id": 1, "bar": 6},
+                            ),
+                            CompiledSQL(
+                                "INSERT INTO test (id, bar) "
+                                "VALUES (:id, :bar) "
+                                "RETURNING test.foo",
+                                {"id": 2, "bar": 6},
+                            ),
+                        ],
+                    ),
                  ],
                  [
                      CompiledSQL(
author	Mike Bayer <mike_mp@zzzcomputing.com>
	Wed, 5 Apr 2023 16:59:13 +0000 (12:59 -0400)
committer	Mike Bayer <mike_mp@zzzcomputing.com>
	Wed, 5 Apr 2023 22:24:59 +0000 (18:24 -0400)
doc/build/changelog/unreleased_20/9603.rst	[new file with mode: 0644]	patch \| blob
doc/build/changelog/whatsnew_20.rst		patch \| blob \| blame \| history
doc/build/core/connections.rst		patch \| blob \| blame \| history
lib/sqlalchemy/dialects/mssql/base.py		patch \| blob \| blame \| history
lib/sqlalchemy/dialects/mssql/pyodbc.py		patch \| blob \| blame \| history
test/dialect/mssql/test_engine.py		patch \| blob \| blame \| history
test/dialect/mssql/test_query.py		patch \| blob \| blame \| history
test/dialect/mssql/test_types.py		patch \| blob \| blame \| history
test/orm/test_unitofworkv2.py		patch \| blob \| blame \| history