From: Mike Bayer <mike_mp@zzzcomputing.com>
Date: Wed, 5 Apr 2023 16:59:13 +0000 (-0400)
Subject: turn off use_insertmanyvalues for SQL Server
X-Git-Tag: rel_2_0_9~2
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=7fb7365622ee751b264bbbd4e7394ef8e81d45c5;p=thirdparty%2Fsqlalchemy%2Fsqlalchemy.git

turn off use_insertmanyvalues for SQL Server

we will keep trying to find workarounds, however this
patch is the "turn it off" patch

Due to a critical bug identified in SQL Server, the SQLAlchemy
"insertmanyvalues" feature which allows fast INSERT of many rows while also
supporting RETURNING unfortunately needs to be disabled for SQL Server. SQL
Server is apparently unable to guarantee that the order of rows inserted
matches the order in which they are sent back by OUTPUT inserted when
table-valued rows are used with INSERT in conjunction with OUTPUT inserted.
We are trying to see if Microsoft is able to confirm this undocumented
behavior however there is no known workaround, other than it's not safe to
use table-valued expressions with OUTPUT inserted for now.

Fixes: #9603
Change-Id: I4b932fb8774390bbdf4e870a1f6cfe9a78c4b105
---

diff --git a/doc/build/changelog/unreleased_20/9603.rst b/doc/build/changelog/unreleased_20/9603.rst
new file mode 100644
index 0000000000..0af82ae1f1
--- /dev/null
+++ b/doc/build/changelog/unreleased_20/9603.rst
@@ -0,0 +1,14 @@
+.. change::
+    :tags: bug, mssql
+    :tickets: 9603
+
+    Due to a critical bug identified in SQL Server, the SQLAlchemy
+    "insertmanyvalues" feature which allows fast INSERT of many rows while also
+    supporting RETURNING unfortunately needs to be disabled for SQL Server. SQL
+    Server is apparently unable to guarantee that the order of rows inserted
+    matches the order in which they are sent back by OUTPUT inserted when
+    table-valued rows are used with INSERT in conjunction with OUTPUT inserted.
+    We are trying to see if Microsoft is able to confirm this undocumented
+    behavior however there is no known workaround, other than it's not safe to
+    use table-valued expressions with OUTPUT inserted for now.
+
diff --git a/doc/build/changelog/whatsnew_20.rst b/doc/build/changelog/whatsnew_20.rst
index 63fadc8a55..6b023bb483 100644
--- a/doc/build/changelog/whatsnew_20.rst
+++ b/doc/build/changelog/whatsnew_20.rst
@@ -859,7 +859,8 @@ Optimized ORM bulk insert now implemented for all backends other than MySQL
 The dramatic performance improvement introduced in the 1.4 series and described
 at :ref:`change_5263` has now been generalized to all included backends that
 support RETURNING, which is all backends other than MySQL: SQLite, MariaDB,
-PostgreSQL (all drivers), Oracle, and SQL Server. While the original feature
+PostgreSQL (all drivers), and Oracle; SQL Server has support but unfortunately
+had to be turned off due to an issue with SQL Server [#]_. While the original feature
 was most critical for the psycopg2 driver which otherwise had major performance
 issues when using ``cursor.executemany()``, the change is also critical for
 other PostgreSQL drivers such as asyncpg, as when using RETURNING,
@@ -894,16 +895,16 @@ With most databases now offering RETURNING (with the conspicuous exception of
 MySQL, given that MariaDB supports it), the new change generalizes the psycopg2
 "fast execution helper" approach to all dialects that support RETURNING, which
 now includes SQlite and MariaDB, and for which no other approach for
-"executemany plus RETURNING" is possible, which includes SQLite, MariaDB, all
-PG drivers, and SQL Server. The cx_Oracle and oracledb drivers used for Oracle
+"executemany plus RETURNING" is possible, which includes SQLite, MariaDB, and all
+PG drivers. The cx_Oracle and oracledb drivers used for Oracle
 support RETURNING with executemany natively, and this has also been implemented
-to provide equivalent performance improvements. With SQLite and MariaDB now
+to provide equivalent performance improvements.  With SQLite and MariaDB now
 offering RETURNING support, ORM use of ``cursor.lastrowid`` is nearly a thing
 of the past, with only MySQL still relying upon it.
 
 For INSERT statements that don't use RETURNING, traditional executemany()
-behavior is used for most backends, with the current exceptions of psycopg2
-and mssql+pyodbc, which both have very slow executemany() performance overall
+behavior is used for most backends, with the current exception of psycopg2,
+which has very slow executemany() performance overall
 and are still improved by the "insertmanyvalues" approach.
 
 Benchmarks
@@ -974,10 +975,18 @@ sqlite+pysqlite2 (memory)      6.204843                3.554856
 postgresql+asyncpg (network)   88.292285               4.561492
 postgresql+psycopg (network)   N/A (psycopg3)          4.861368
 oracle+cx_Oracle (network)     92.603953               4.809520
-mssql+pyodbc (network)         158.396667              4.825139
 mariadb+mysqldb (network)      71.705197               4.075377
 ============================   ====================    ====================
 
+.. mssql+pyodbc (network)      .. 158.396667           .. 4.825139
+
+
+.. note::
+
+   .. [#] The feature is disabled for SQL Server as of SQLAlchemy 2.0.9 due
+      to incompatibilities in how table-valued expressions are handled by
+      SQL Server.  See https://github.com/sqlalchemy/sqlalchemy/issues/9603
+
 Two additional drivers have no change in performance; the psycopg2 drivers,
 for which fast executemany was already implemented in SQLAlchemy 1.4,
 and MySQL, which continues to not offer RETURNING support:
diff --git a/doc/build/core/connections.rst b/doc/build/core/connections.rst
index 80156b4183..726b5a501d 100644
--- a/doc/build/core/connections.rst
+++ b/doc/build/core/connections.rst
@@ -1854,7 +1854,9 @@ as follows:
 
 * SQLite - supported for SQLite versions 3.35 and above
 * PostgreSQL - all supported Postgresql versions (9 and above)
-* SQL Server - all supported SQL Server versions
+* SQL Server - **disabled by default as of SQLAlchemy 2.0.9** - the SQL syntax
+  used has been shown to not be safe for RETURNING
+  (see https://github.com/sqlalchemy/sqlalchemy/issues/9603)
 * MariaDB - supported for MariaDB versions 10.5 and above
 * MySQL - no support, no RETURNING feature is present
 * Oracle - supports RETURNING with executemany using native cx_Oracle / OracleDB
@@ -1887,11 +1889,7 @@ The feature can also be disabled from being used implicitly for a particular
       )
 
 The reason one might want to disable RETURNING for a specific table is to
-work around backend-specific limitations.  For example, there is a known
-limitation of SQL Server that the ``OUTPUT inserted.<colname>`` feature
-may not work correctly for a table that has INSERT triggers established;
-such a table may need to include ``implicit_returning=False`` (see
-:ref:`mssql_triggers`).
+work around backend-specific limitations.
 
 .. _engine_insertmanyvalues_page_size:
 
@@ -1929,9 +1927,8 @@ varies by dialect and server version; the largest size is 32700 (chosen as a
 healthy distance away from PostgreSQL's limit of 32767 and SQLite's modern
 limit of 32766, while leaving room for additional parameters in the statement
 as well as for DBAPI quirkiness). Older versions of SQLite (prior to 3.32.0)
-will set this value to 999; SQL Server sets it to 2099.  MariaDB has no
-established limit however 32700 remains as a limiting factor for SQL message
-size.
+will set this value to 999. MariaDB has no established limit however 32700
+remains as a limiting factor for SQL message size.
 
 The value of the "batch size" can be affected :class:`_engine.Engine`
 wide via the :paramref:`_sa.create_engine.insertmanyvalues_page_size` parameter.
diff --git a/lib/sqlalchemy/dialects/mssql/base.py b/lib/sqlalchemy/dialects/mssql/base.py
index 31745d600e..808fdf16fe 100644
--- a/lib/sqlalchemy/dialects/mssql/base.py
+++ b/lib/sqlalchemy/dialects/mssql/base.py
@@ -251,18 +251,11 @@ The process for fetching this value has several variants:
 
     INSERT INTO t (x) OUTPUT inserted.id VALUES (?)
 
-  As of SQLAlchemy 2.0, the :ref:`engine_insertmanyvalues` feature is also
-  used by default to optimize many-row INSERT statements; for SQL Server
-  the feature takes place for both RETURNING and-non RETURNING
-  INSERT statements.
-
-* The value of :paramref:`_sa.create_engine.insertmanyvalues_page_size`
-  defaults to 1000, however the ultimate page size for a particular INSERT
-  statement may be limited further, based on an observed limit of
-  2100 bound parameters for a single statement in SQL Server.
-  The page size may also be modified on a per-engine
-  or per-statement basis; see the section
-  :ref:`engine_insertmanyvalues_page_size` for details.
+  .. note::  SQLAlchemy 2.0 introduced the :ref:`engine_insertmanyvalues`
+     feature for SQL Server, which is used by default to optimize many-row
+     INSERT statements; however as of SQLAlchemy 2.0.9 this feature had
+     to be turned off for SQL Server as the database does not support
+     deterministic RETURNING of INSERT rows for a multi-row INSERT statement.
 
 * When RETURNING is not available or has been disabled via
   ``implicit_returning=False``, either the ``scope_identity()`` function or
@@ -3017,7 +3010,8 @@ class MSDialect(default.DefaultDialect):
     # may be changed at server inspection time for older SQL server versions
     supports_multivalues_insert = True
 
-    use_insertmanyvalues = True
+    # disabled due to #9603
+    use_insertmanyvalues = False
 
     # note pyodbc will set this to False if fast_executemany is set,
     # as of SQLAlchemy 2.0.9
@@ -3085,6 +3079,14 @@ class MSDialect(default.DefaultDialect):
 
         super().__init__(**opts)
 
+        if self.use_insertmanyvalues:
+            raise exc.ArgumentError(
+                "The use_insertmanyvalues feature on SQL Server is currently "
+                "not safe to use, as returned result rows may be returned in "
+                "random order.  Ensure use_insertmanyvalues is left at its "
+                "default of False (this setting changed in SQLAlchemy 2.0.9)"
+            )
+
         self._json_serializer = json_serializer
         self._json_deserializer = json_deserializer
 
diff --git a/lib/sqlalchemy/dialects/mssql/pyodbc.py b/lib/sqlalchemy/dialects/mssql/pyodbc.py
index f9f2479ad3..08c6bc48f1 100644
--- a/lib/sqlalchemy/dialects/mssql/pyodbc.py
+++ b/lib/sqlalchemy/dialects/mssql/pyodbc.py
@@ -290,25 +290,19 @@ Pyodbc have been resolved as of SQLAlchemy 2.0.5. See the notes at
 Fast Executemany Mode
 ---------------------
 
-.. note:: SQLAlchemy 2.0 now includes an equivalent "fast executemany"
-   handler for INSERT statements that is more robust than the PyODBC feature
-   (but is not quite as performant particularly for very large datasets);
-   the feature is called :ref:`insertmanyvalues <engine_insertmanyvalues>`
-   and is enabled for all INSERT statements by default.
-   SQLAlchemy's feature integrates with the PyODBC ``setinputsizes()`` method
-   which allows for more accurate specification of datatypes, and additionally
-   uses a dynamically sized, batched approach that scales to any number of
-   columns and/or rows.
-
-   The SQL Server ``fast_executemany`` parameter may be used at the same time
-   as ``insertmanyvalues`` is enabled; however, the parameter will not be used
-   for INSERT statements that include RETURNING.
+  .. note::  SQLAlchemy 2.0 introduced the :ref:`engine_insertmanyvalues`
+     feature for SQL Server, which is used by default to optimize many-row
+     INSERT statements; however as of SQLAlchemy 2.0.9 this feature had
+     to be turned off for SQL Server as the database does not support
+     deterministic RETURNING of INSERT rows for a multi-row INSERT statement.
 
 .. versionchanged:: 2.0.9 - ``fast_executemany`` executions will be used
    for INSERT statements that don't include RETURNING, when
    ``fast_executemany`` is set. Previously, ``use_insertmanyvalues`` would
    cause ``fast_executemany`` to not be used in most cases.
 
+   ``use_insertmanyvalues`` is disabled for SQL Server overall as of 2.0.9.
+
 The PyODBC driver includes support for a "fast executemany" mode of execution
 which greatly reduces round trips for a DBAPI ``executemany()`` call when using
 Microsoft ODBC drivers, for **limited size batches that fit in memory**.  The
diff --git a/test/dialect/mssql/test_engine.py b/test/dialect/mssql/test_engine.py
index 363c19584b..095df2eaf1 100644
--- a/test/dialect/mssql/test_engine.py
+++ b/test/dialect/mssql/test_engine.py
@@ -3,6 +3,7 @@ import re
 from unittest.mock import Mock
 
 from sqlalchemy import Column
+from sqlalchemy import create_engine
 from sqlalchemy import event
 from sqlalchemy import exc
 from sqlalchemy import inspect
@@ -423,7 +424,7 @@ class FastExecutemanyTest(fixtures.TestBase):
     @testing.variation("add_event", [True, False])
     @testing.variation("setinputsizes", [True, False])
     @testing.variation("fastexecutemany", [True, False])
-    @testing.variation("insertmanyvalues", [True, False])
+    @testing.variation("insertmanyvalues", [False])  # disabled due to #9603
     @testing.variation("broken_types", [True, False])
     def test_insert_typing(
         self,
@@ -628,6 +629,14 @@ class MiscTest(fixtures.TestBase):
     __only_on__ = "mssql"
     __backend__ = True
 
+    def test_no_insertmanyvalues(self):
+        with expect_raises_message(
+            exc.ArgumentError,
+            "The use_insertmanyvalues feature on SQL Server is "
+            "currently not safe to use",
+        ):
+            create_engine("mssql+pyodbc://", use_insertmanyvalues=True)
+
     @testing.variation("enable_comments", [True, False])
     def test_comments_enabled_disabled(
         self, testing_engine, metadata, enable_comments
diff --git a/test/dialect/mssql/test_query.py b/test/dialect/mssql/test_query.py
index 35575bc13a..a11d847cfb 100644
--- a/test/dialect/mssql/test_query.py
+++ b/test/dialect/mssql/test_query.py
@@ -19,6 +19,7 @@ from sqlalchemy import Table
 from sqlalchemy import testing
 from sqlalchemy.dialects.mssql import base as mssql
 from sqlalchemy.dialects.mssql import pyodbc as mssql_pyodbc
+from sqlalchemy.orm import Session
 from sqlalchemy.testing import AssertsCompiledSQL
 from sqlalchemy.testing import config
 from sqlalchemy.testing import engines
@@ -159,6 +160,74 @@ class QueryTest(testing.AssertsExecutionResults, fixtures.TestBase):
     __only_on__ = "mssql"
     __backend__ = True
 
+    def test_broken_table_values_test(self, decl_base, connection):
+        """test #9603.
+
+        this uses the ORM to ensure the ORM is not using any kind of
+        insertmany that causes the problem.
+
+        """
+
+        class Datum(decl_base):
+
+            __tablename__ = "datum"
+
+            id = Column(Integer, autoincrement=False, primary_key=True)
+            data = Column(String(10))
+
+        class Result(decl_base):
+
+            __tablename__ = "result"
+
+            id = Column(Integer, primary_key=True)
+
+            thing = Column(Integer)
+            lft_datum_id = Column(Integer, ForeignKey(Datum.id))
+
+        decl_base.metadata.create_all(connection)
+        with Session(connection) as sess:
+
+            size = 15
+            datum_ids = list(range(1, size + 1))
+
+            sess.add_all([Datum(id=id_, data=f"d{id_}") for id_ in datum_ids])
+            sess.flush()
+
+            result_data = [
+                Result(thing=num, lft_datum_id=datum_ids[num % size])
+                for num in range(size * size)
+            ]
+            sess.add_all(result_data)
+            sess.flush()
+
+            # this is what we expected we put in
+            the_data_in_order_should_be = [
+                (num + 1, num, datum_ids[num % size])
+                for num in range(size * size)
+            ]
+
+            # and yes, that's what went in
+            eq_(
+                sess.execute(
+                    select(
+                        Result.id, Result.thing, Result.lft_datum_id
+                    ).order_by(Result.id)
+                ).all(),
+                the_data_in_order_should_be,
+            )
+
+            # however, if insertmanyvalues is turned on, OUTPUT inserted
+            # did not give us the rows in the order we sent, so ids were
+            # mis-applied.  even if we sort the original records by the
+            # ids that were given
+            eq_(
+                [
+                    (r.id, r.thing, r.lft_datum_id)
+                    for r in sorted(result_data, key=lambda r: r.id)
+                ],
+                the_data_in_order_should_be,
+            )
+
     def test_fetchid_trigger(self, metadata, connection):
         # TODO: investigate test hang on mssql when connection fixture is used
         """
diff --git a/test/dialect/mssql/test_types.py b/test/dialect/mssql/test_types.py
index cb7ed3102e..d94984e0e6 100644
--- a/test/dialect/mssql/test_types.py
+++ b/test/dialect/mssql/test_types.py
@@ -1335,7 +1335,8 @@ class StringRoundTripTest(fixtures.TestBase):
         id_="ia",
     )
     @testing.combinations(
-        ("insertmany", True),
+        # disabled due to #9603
+        # ("insertmany", True),
         ("insertsingle", False),
         argnames="insertmany",
         id_="ia",
diff --git a/test/orm/test_unitofworkv2.py b/test/orm/test_unitofworkv2.py
index ae47dfa4fd..1b909f2b98 100644
--- a/test/orm/test_unitofworkv2.py
+++ b/test/orm/test_unitofworkv2.py
@@ -3217,14 +3217,34 @@ class EagerDefaultsSettingTest(
             Conditional(
                 expect_returning,
                 [
-                    CompiledSQL(
-                        "INSERT INTO test (id, bar) VALUES (:id, :bar) "
-                        "RETURNING test.foo",
+                    Conditional(
+                        connection.dialect.insert_executemany_returning,
                         [
-                            {"id": 1, "bar": 6},
-                            {"id": 2, "bar": 6},
+                            CompiledSQL(
+                                "INSERT INTO test (id, bar) "
+                                "VALUES (:id, :bar) "
+                                "RETURNING test.foo",
+                                [
+                                    {"id": 1, "bar": 6},
+                                    {"id": 2, "bar": 6},
+                                ],
+                            )
                         ],
-                    )
+                        [
+                            CompiledSQL(
+                                "INSERT INTO test (id, bar) "
+                                "VALUES (:id, :bar) "
+                                "RETURNING test.foo",
+                                {"id": 1, "bar": 6},
+                            ),
+                            CompiledSQL(
+                                "INSERT INTO test (id, bar) "
+                                "VALUES (:id, :bar) "
+                                "RETURNING test.foo",
+                                {"id": 2, "bar": 6},
+                            ),
+                        ],
+                    ),
                 ],
                 [
                     CompiledSQL(