Behavioral Changes - General
============================
+.. _change_5159:
+
+ORM Query is internally unified with select, update, delete; 2.0 style execution available
+------------------------------------------------------------------------------------------
+
+The biggest conceptual change to SQLAlchemy for version 2.0 and essentially
+in 1.4 as well is that the great separation between the :class:`_sql.Select`
+construct in Core and the :class:`_orm.Query` object in the ORM has been removed,
+as well as between the :meth:`_orm.Query.update` and :meth:`_orm.Query.delete`
+methods in how they relate to :class:`_dml.Update` and :class:`_dml.Delete`.
+
+With regards to :class:`_sql.Select` and :class:`_orm.Query`, these two objects
+have for many versions had similar, largely overlapping APIs and even some
+ability to change between one and the other, while remaining very different in
+their usage patterns and behaviors. The historical background for this was
+that the :class:`_orm.Query` object was introduced to overcome shortcomings in
+the :class:`_sql.Select` object which used to be at the core of how ORM objects
+were queried, except that they had to be queried in terms of
+:class:`_schema.Table` metadata only. However :class:`_orm.Query` had only a
+simplistic interface for loading objects, and only over the course of many
+major releases did it eventually gain most of the flexibility of the
+:class:`_sql.Select` object, which then led to the ongoing awkwardness that
+these two objects became highly similar yet still largely incompatible with
+each other.
+
+In version 1.4, all Core and ORM SELECT statements are rendered from a
+:class:`_sql.Select` object directly; when the :class:`_orm.Query` object
+is used, at statement invocation time it copies its state to a :class:`_sql.Select`
+which is then invoked internally using :term:`2.0 style` execution. Going forward,
+the :class:`_orm.Query` object will become legacy only, and applications will
+be encouraged to move to :term:`2.0 style` execution which allows Core constructs
+to be used freely against ORM entities::
+
+ with Session(engine, future=True) as sess:
+
+ stmt = select(User).where(
+ User.name == 'sandy'
+ ).join(User.addresses).where(Address.email_address.like("%gmail%"))
+
+ result = sess.execute(stmt)
+
+ for user in result.scalars():
+ print(user)
+
+Things to note about the above example:
+
+* The :class:`_orm.Session` and :class:`_orm.sessionmaker` objects now feature
+ full context manager (i.e. the ``with:`` statement) capability;
+ see the revised documentation at :ref:`session_getting` for an example.
+
+* Within the 1.4 series, all :term:`2.0 style` ORM invocation uses a
+ :class:`_orm.Session` that includes the :paramref:`_orm.Session.future`
+ flag set to ``True``; this flag indicates the :class:`_orm.Session` should
+ have 2.0-style behaviors, which include that ORM queries can be invoked
+ from :meth:`_orm.Session.execute` as well as some changes in transactional
+ features. In version 2.0 this flag will always be ``True``.
+
+* The :func:`_sql.select` construct no longer needs brackets around the
+ columns clause; see :ref:`change_5284` for background on this improvement.
+
+* The :func:`_sql.select` / :class:`_sql.Select` object has a :meth:`_sql.Select.join`
+ method that acts like that of the :class:`_orm.Query` and even accommodates
+ an ORM relationship attribute (without breaking the separation between
+ Core and ORM!) - see :ref:`change_select_join` for background on this.
+
+* Statements that work with ORM entities and are expected to return ORM
+ results are invoked using :meth:`_orm.Session.execute`. See
+ :ref:`session_querying_20` for a primer.
+
+* a :class:`_engine.Result` object is returned, rather than a plain list, which
+ itself is a much more sophisticated version of the previous ``ResultProxy``
+ object; this object is now used both for Core and ORM results. See
+ :ref:`change_result_14_core`,
+ :ref:`change_4710_core`, and :ref:`change_4710_orm` for information on this.
+
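As a self-contained sketch of the pattern above (the ``User`` / ``Address``
mappings and the in-memory SQLite database here are illustrative only, not part
of the original example), runnable on SQLAlchemy 1.4 or later:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class User(Base):
    __tablename__ = "user_account"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    addresses = relationship("Address")


class Address(Base):
    __tablename__ = "address"
    id = Column(Integer, primary_key=True)
    user_id = Column(ForeignKey("user_account.id"))
    email_address = Column(String(100))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as sess:
    sess.add(User(name="sandy", addresses=[Address(email_address="sandy@gmail.com")]))
    sess.commit()

    stmt = (
        select(User)
        .where(User.name == "sandy")
        .join(User.addresses)
        .where(Address.email_address.like("%gmail%"))
    )

    # Session.execute() returns a Result; .scalars() yields ORM objects
    result = sess.execute(stmt)
    names = [user.name for user in result.scalars()]
```

Note that the ``future=True`` flag is omitted here so that the sketch also runs
unchanged on SQLAlchemy 2.0, where 2.0-style execution is the default.
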
+Throughout SQLAlchemy's documentation, there will be many references to
+:term:`1.x style` and :term:`2.0 style` execution. This is to distinguish
+between the two querying styles and to attempt to forwards-document the new
+calling style going forward. In SQLAlchemy 2.0, while the :class:`_orm.Query`
+object may remain as a legacy construct, it will no longer be featured in
+most documentation.
+
+Similar adjustments have been made to "bulk updates and deletes" such that
+Core :func:`_sql.update` and :func:`_sql.delete` can be used for bulk
+operations. A bulk update like the following::
+
+ session.query(User).filter(User.name == 'sandy').update({"password": "foobar"}, synchronize_session="fetch")
+
+can now be achieved in :term:`2.0 style` (and indeed the above runs internally
+in this way) as follows::
+
+ with Session(engine, future=True) as sess:
+ stmt = update(User).where(
+ User.name == 'sandy'
+ ).values(password="foobar").execution_options(
+ synchronize_session="fetch"
+ )
+
+ sess.execute(stmt)
+
+Note the use of the :meth:`_sql.Executable.execution_options` method to pass
+ORM-related options. The use of "execution options" is now much more prevalent
+within both Core and ORM, and many ORM-related methods from :class:`_orm.Query`
+are now implemented as execution options (see :meth:`_orm.Query.execution_options`
+for some examples).
+
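A small sketch showing that such options travel with the statement itself (the
table here is illustrative); the ORM reads ``synchronize_session`` at execution
time, rather than the option changing how the SQL is rendered:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, update

metadata = MetaData()
user = Table(
    "user",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
    Column("password", String(50)),
)

# execution options are carried on the statement and consumed at
# execution time; they do not alter the compiled SQL string
stmt = (
    update(user)
    .where(user.c.name == "sandy")
    .values(password="foobar")
    .execution_options(synchronize_session="fetch")
)

opts = stmt.get_execution_options()
```
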
+.. seealso::
+
+ :ref:`migration_20_toplevel`
+
+:ticket:`5159`
+
.. _change_4639:
Transparent SQL Compilation Caching added to All DQL, DML Statements in Core, ORM
majority of Python computation involved producing SQL strings and related
statement metadata from a user-constructed statement to be cached in memory,
such that subsequent invocations of an identical statement construct will use
-35-60% fewer resources.
+35-60% fewer CPU resources.
This caching goes beyond the construction of the SQL string to also include the
construction of result fetching structures that link the SQL construct to the
This first test indicates that regular ORM queries when using caching can run
over many iterations in the range of **30% faster**.
-"Baked Query" style construction now available for all Core and ORM Queries
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-The "Baked Query" extension has been in SQLAlchemy for several years and
-provides a caching system that is based on defining segments of SQL statements
-within Python functions, so that the functions both serve as cache keys
-(since they uniquely and persistently identify a specific line in the
-source code) as well as that they allow the construction of a statement
-to be deferred so that it only need to be invoked once, rather than every
-time the query is rendered. The functionality of "Baked Query" is now a native
-part of the new caching system, which is available by simply using Python
-functions, typically lambda expressions, either inside of a statement,
-or on the outside using the ``lambda_stmt()`` function that works just
-like a Baked Query.
+A second variant of the feature is the optional use of Python lambdas to defer
+the construction of the query itself. This is a more sophisticated variant of
+the approach used by the "Baked Query" extension, which was introduced in
+version 1.0.0. The "lambda" feature may be used in a style very similar to
+that of baked queries, except that it is available in an ad-hoc way for any SQL
+construct. It additionally includes the ability to scan each invocation of the
+lambda for bound literal values that change on every invocation, as well as
+changes to other constructs, such as querying from a different entity or column
+each time, while still not having to run the actual code each time.
-Making use of the newer 2.0 style of using ``select()`` and adding the use
-of **optional** lambdas to defer the computation::
+Using this API looks as follows::
    session = Session(bind=engine)

    for id_ in random.sample(ids, n):
        stmt = lambda_stmt(lambda: future_select(Customer))
        stmt += lambda s: s.where(Customer.id == id_)
        session.execute(stmt).scalar_one()

This test indicates that using the newer "select()" style of ORM querying,
in conjunction with a full "baked" style invocation that caches the entire
-construction, can run over many iterations in the range of **60% faster**.
-This performance is roughly the same as what the Baked Query extension
-provides as well. The new approach effectively supersedes the Baked Query
-extension.
+construction, can run over many iterations in the range of **60% faster** and
+grants performance about the same as that of the baked query system, which is
+now superseded by the native caching system.
-For comparison, a Baked Query looks like the following::
-
- bakery = baked.bakery()
- s = Session(bind=engine)
- for id_ in random.sample(ids, n):
- q = bakery(lambda s: s.query(Customer))
- q += lambda q: q.filter(Customer.id == bindparam("id"))
- q(s).params(id=id_).one()
-
-The new API allows the same very fast "baked query" approach of building up a
-statement with lambdas, but does not require any other syntactical changes from
-regular statements. It also no longer requires that "bindparam()" is used for
-literal values that may change; the "closure" of the Python function is scanned
-on every call to extract Python literal values that should be turned into
-parameters.
+The new system makes use of the existing
+:paramref:`_engine.Connection.execution_options.compiled_cache` execution
+option and also adds a cache to the :class:`_engine.Engine` directly, which is
+configured using the :paramref:`_engine.create_engine.query_cache_size` parameter.
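
A sketch of sizing the engine-level cache at engine creation time (the value
shown is illustrative; the cache is enabled by default in 1.4):

```python
from sqlalchemy import create_engine, text

# query_cache_size sets how many statement compilations the engine-level
# cache retains; caching itself requires no other configuration
engine = create_engine("sqlite://", query_cache_size=1200)

with engine.connect() as conn:
    # a second execution of an identical statement construct can reuse
    # the cached compilation rather than re-rendering the SQL string
    value = conn.execute(text("select 1")).scalar()
```
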
-Methodology Overview
-^^^^^^^^^^^^^^^^^^^^
-
-SQLAlchemy has also for many years included a "compiled_cache" option that is
-used internally by the ORM flush process as well as the Baked Query extension,
-which caches a SQL expression object based on the identity of the object
-itself. That is, if you create a particular select() object and make use of
-the compiled cache feature, if you pass the same select() object each time, the
-SQL compilation would be cached. This feature was of limited use since
-SQLAlchemy's programming paradigm is based on the continuous construction of
-new SQL expression objects each time one is required.
-
-The new caching feature uses the same "compiled_cache", however instead of
-using the statement object itself as the cache key, a separate tuple-oriented
-cache key is generated which represents the complete structure of the
-statement. Two SQL constructs that are composed in exactly the same way will
-produce the same cache key, independent of the bound parameter values that are
-bundled with the statement; these are collected separately from each statement
-and are used when the cached SQL is executed. The ORM ``Query`` integrates by
-producing a ``select()`` object from itself that is interpreted as an
-ORM-enabled SELECT within the SQL compilation process that occurs beyond the
-cache boundary.
-
-A general listing of architectural changes needed to support this feature:
-
-* The system by which arguments passed to SQL constructs are type-checked and
- coerced into their desired form was rewritten from an ad-hoc and disorganized
- system into the ``sqlalchemy.sql.roles`` and
- ``sqlalchemy.sql.coercions`` modules which provide a type-based approach
- to the task of composing SQL expression objects, error handling, coercion
- of objects such as turning SELECT statements into subqueries, as well as
- integrating with a new "plugin" system that allows SQL constructs to include
- ORM functionality.
-
-* The system by which clause expressions constructs are iterated and compared
- from an object structure point of view was also
- rewritten from one which was ad-hoc and inconsistent into a complete system
- within the new ``sqlalchemy.sql.traversals`` module. A test suite was added
- which ensures that all SQL construction objects include fully consistent
- comparison and iteration behavior. This work began with :ticket:`4336`.
-
-* The new iteration system naturally gave rise to the cache-key creation
- system, which also uses a performance-optimized version of the
- ``sqlalchemy.sql.traversals`` module to generate a deterministic cache key
- for any SQL expression based on its structure. Two instances of a SQL
- expression that represent the same SQL structure, such as ``select(table('x',
- column('q'))).where(column('z') > 5)``, are guaranteed to produce the same
- cache key, independent of the bound parameters which for this statement would
- be one parameter with the value "5". Two instances of a SQL expression
- where any elements are different will produce different cache keys. When
- the cache key is generated, the parameters are also collected which will be
- used to formulate the final parameter list. This work was completed over
- many merges and was overall related to :ticket:`4639`.
-
-* The mechanism by which statements such as ``select()`` generate expensive
- collections and datamembers that are only used for SQL compilation, such
- as the list of columns and their labels, were organized into a new
- decoupled system called ``CompileState``.
-
-* All elements of queries that needed to be made compatible with the concept of
- deterministic SQL compilation were updated, including an expansion of the
- "postcompile" concept used to render individual parameters inside of "IN"
- expressions first included in 1.3 as well as alterations to how dialects like
- the SQL Server dialect render LIMIT / OFFSET expressions that are not
- compatible with bound parameters.
-
-* The ORM ``Query`` object was fully refactored such that all of the intense
- computation which would previously occur whenever methods of ``Query`` were
- called, such as the construction of the ``Query`` itself, when methods
- ``filter()`` or ``join()`` would be called, etc., was completely reorganized
- to take place within the ``CompileState`` architecture, meaning the ORM
- process that generates a Core ``select()`` to render now takes place
- **within** the SQL compilation process, beyond the caching boundary. More
- detail on this change is at
- :ref:`change_deferred_construction`.
-
-* The ``Query`` object was unified with the ``select()`` object, such that
- these two objects now have cross-compatible internal state. The ``Query``
- can turn itself into a ``select()`` that generates ORM queries by copying its
- ``__dict__`` into a new ``Select`` object.
-
-* The 2.0-style :class:`.Result` object as well as the "future" version of
- :class:`_engine.Engine` were developed and integrated into Core and later
- the ORM also integrated on top of :class:`.Result`.
-
-* The Core and ORM execution models were completely reworked to integrate the
- new cache key system, and in particular the ORM ``Query`` was reworked such
- that its execution model now produces a ``Select`` which is passed to
- ``Session.execute()``, which then invokes the 2.0-style execution model that
- allows the ``Select`` to be processed as an ORM query beyond the caching
- boundary.
-
-* Other systems such as ``Query`` bulk updates and deletes, the horizontal
- sharding extension, the Baked Query extension, and the dogpile caching
- example were updated to integrate with the new execution model and a new
- event hook :meth:`.SessionEvents.do_orm_execute` has been added.
-
-* The caching has been enabled via the :paramref:`.create_engine.query_cache_size`
- parameter, new logging features were added, and the "lambda" argument
- construction module was added.
+A significant portion of the API and behavioral changes throughout 1.4 were
+driven by the need to support this new feature.
.. seealso::
.. _change_deferred_construction:
-Many Core and ORM statement objects now perform much of their validation in the compile phase
----------------------------------------------------------------------------------------------
+Many Core and ORM statement objects now perform much of their construction and validation in the compile phase
+--------------------------------------------------------------------------------------------------------------
A major initiative in the 1.4 series is to approach the model of both Core SQL
statements as well as the ORM Query to allow for an efficient, cacheable model
cached, based on a cache key generated by the created statement object, which
itself is newly created for each use. Towards this goal, much of the Python
computation which occurs within the construction of statements, particularly
-the ORM :class:`_query.Query`, is being moved to occur later, when the
-statement is actually compiled, and additionally that it will only occur if the
-compiled form of the statement is not already cached. This means that some of
-the error messages which can arise based on arguments passed to the object will
-no longer be raised immediately, and instead will occur only when the statement
-is invoked and its compiled form is not yet cached.
+that of the ORM :class:`_query.Query` as well as the :func:`_sql.select`
+construct when used to invoke ORM queries, is being moved to occur within
+the compilation phase of the statement which only occurs after the statement
+has been invoked, and only if the statement's compiled form was not yet
+cached.
+
+From an end-user perspective, this means that some of the error messages which
+can arise based on arguments passed to the object will no longer be raised
+immediately, and instead will occur only when the statement is invoked for
+the first time. These conditions are always structural and not data driven,
+so there is no risk of such a condition being missed due to a cached statement.
Error conditions which fall under this category include:
:ticket:`4689`
-API Changes - Core
-==================
+API and Behavioral Changes - Core
+==================================
.. _change_4617:
got <...Select object ...>. To create a FROM clause from a <class
'sqlalchemy.sql.selectable.Select'> object, use the .subquery() method.
-The correct calling form is instead::
+The correct calling form is instead (noting also that :ref:`brackets are no
+longer required for select() <change_5284>`)::
- sq1 = select([user.c.id, user.c.name]).subquery()
- stmt2 = select([addresses, sq1]).select_from(addresses.join(sq1))
+ sq1 = select(user.c.id, user.c.name).subquery()
+ stmt2 = select(addresses, sq1).select_from(addresses.join(sq1))
Noting above that the :meth:`_expression.SelectBase.subquery` method is essentially
equivalent to using the :meth:`_expression.SelectBase.alias` method.
-The above calling form is typically required in any case as the call to
-:meth:`_expression.SelectBase.subquery` or :meth:`_expression.SelectBase.alias` is needed to
-ensure the subquery has a name. The MySQL and PostgreSQL databases do not
-accept unnamed subqueries in the FROM clause and they are of limited use
-on other platforms; this is described further below.
-
-Along with the above change, the general capability of :func:`_expression.select` and
-related constructs to create unnamed subqueries, which means a FROM subquery
-that renders without any name i.e. "AS somename", has been removed, and the
-ability of the :func:`_expression.select` construct to implicitly create subqueries
-without explicit calling code to do so is mostly deprecated. In the above
-example, as has always been the case, using the :meth:`_expression.SelectBase.alias`
-method as well as the new :meth:`_expression.SelectBase.subquery` method without passing a
-name will generate a so-called "anonymous" name, which is the familiar
-``anon_1`` name we see in SQLAlchemy queries::
- SELECT
- addresses.id, addresses.email, addresses.user_id,
- anon_1.id, anon_1.name
- FROM
- addresses JOIN
- (SELECT users.id AS id, users.name AS name FROM users) AS anon_1
- ON addresses.user_id = anon_1.id
-
-Unnamed subqueries in the FROM clause (which note are different from
-so-called "scalar subqueries" which take the place of a column expression
-in the columns clause or WHERE clause) are of extremely limited use in SQL,
-and their production in SQLAlchemy has mostly presented itself as an
-undesirable behavior that needs to be worked around. For example,
-both the MySQL and PostgreSQL outright reject the usage of unnamed subqueries::
-
- # MySQL / MariaDB:
-
- MariaDB [(none)]> select * from (select 1);
- ERROR 1248 (42000): Every derived table must have its own alias
-
-
- # PostgreSQL:
-
- test=> select * from (select 1);
- ERROR: subquery in FROM must have an alias
- LINE 1: select * from (select 1);
- ^
- HINT: For example, FROM (SELECT ...) [AS] foo.
-
-A database like SQLite accepts them, however it is still often the case that
-the names produced from such a subquery are too ambiguous to be useful::
-
- sqlite> CREATE TABLE a(id integer);
- sqlite> CREATE TABLE b(id integer);
- sqlite> SELECT * FROM a JOIN (SELECT * FROM b) ON a.id=id;
- Error: ambiguous column name: id
- sqlite> SELECT * FROM a JOIN (SELECT * FROM b) ON a.id=b.id;
- Error: no such column: b.id
-
- # use a name
- sqlite> SELECT * FROM a JOIN (SELECT * FROM b) AS anon_1 ON a.id=anon_1.id;
-
-Due to the above limitations, there are very few places in SQLAlchemy where
-such a query form was valid; the one exception was within the Oracle dialect
-where they were used to create OFFSET / LIMIT subqueries as Oracle does not
-support these keywords directly; this implementation has been replaced by
-one which uses anonymous subqueries. Throughout the ORM, exception cases
-that detect where a SELECT statement would be SELECTed from either encourage
-the user to, or implicitly create, an anonymously named subquery; it is hoped
-by moving to an all-explicit subquery much of the complexity incurred by
-these areas can be removed.
-
-As :class:`_expression.SelectBase` objects are no longer :class:`_expression.FromClause` objects,
-attributes like the ``.c`` attribute as well as methods like ``.select()``,
-``.join()``, and ``.outerjoin()`` upon :class:`_expression.SelectBase` are now
-deprecated, as these methods all imply implicit production of a subquery.
-Instead, as is already what the vast majority of applications have to do
-in any case, invoking :meth:`_expression.SelectBase.alias` or :meth:`_expression.SelectBase.subquery`
-will provide for a :class:`.Subquery` object that provides all these attributes,
-as it is part of the :class:`_expression.FromClause` hierarchy. In the interim, these
-methods are still available, however they now produce an anonymously named
-subquery rather than an unnamed one, and this subquery is distinct from the
-:class:`_expression.SelectBase` construct itself.
+The rationale for this change is based on the following:
+
+* In order to support the unification of :class:`_sql.Select` with
+ :class:`_orm.Query`, the :class:`_sql.Select` object needs to have
+ :meth:`_sql.Select.join` and :meth:`_sql.Select.outerjoin` methods that
+ actually add JOIN criteria to the existing FROM clause, which is what users
+ have always expected these methods to do in any case. The previous behavior, having to
+ align with what a :class:`.FromClause` would do, was that it would generate
+ an unnamed subquery and then JOIN to it, which was a completely useless
+ feature that only confused those users unfortunate enough to try this. This
+ change is discussed at :ref:`change_select_join`.
+
+* The behavior of including a SELECT in the FROM clause of another SELECT
+ without first creating an alias or subquery would be that it creates an
+ unnamed subquery. While standard SQL does support this syntax, in practice
+ it is rejected by most databases. For example, both MySQL and PostgreSQL
+ outright reject the usage of unnamed subqueries::
+
+ # MySQL / MariaDB:
+
+ MariaDB [(none)]> select * from (select 1);
+ ERROR 1248 (42000): Every derived table must have its own alias
+
+
+ # PostgreSQL:
+
+ test=> select * from (select 1);
+ ERROR: subquery in FROM must have an alias
+ LINE 1: select * from (select 1);
+ ^
+ HINT: For example, FROM (SELECT ...) [AS] foo.
+
+ A database like SQLite accepts them, however it is still often the case that
+ the names produced from such a subquery are too ambiguous to be useful::
+
+ sqlite> CREATE TABLE a(id integer);
+ sqlite> CREATE TABLE b(id integer);
+ sqlite> SELECT * FROM a JOIN (SELECT * FROM b) ON a.id=id;
+ Error: ambiguous column name: id
+ sqlite> SELECT * FROM a JOIN (SELECT * FROM b) ON a.id=b.id;
+ Error: no such column: b.id
+
+ # use a name
+ sqlite> SELECT * FROM a JOIN (SELECT * FROM b) AS anon_1 ON a.id=anon_1.id;
+
+ ..
+
+As :class:`_expression.SelectBase` objects are no longer
+:class:`_expression.FromClause` objects, attributes like the ``.c`` attribute
+as well as methods like ``.select()`` are now deprecated, as they imply implicit
+production of a subquery. The ``.join()`` and ``.outerjoin()`` methods are now
+:ref:`repurposed to append JOIN criteria to the existing query <change_select_join>` in a similar
+way as that of :meth:`_orm.Query.join`, which is what users have always
+expected these methods to do in any case.
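
A brief sketch of the explicit pattern, using an illustrative ``user`` table;
the ``.c`` collection now lives on the :class:`.Subquery` returned by
``.subquery()``, which is a :class:`_expression.FromClause`:

```python
from sqlalchemy import column, select, table

user = table("user", column("id"), column("name"))

# .subquery() returns a Subquery, a FromClause that provides the .c
# collection which is deprecated on the Select construct itself
subq = select(user.c.id, user.c.name).subquery()

# the unnamed subquery receives an anonymous name such as "anon_1"
stmt = select(subq.c.name).where(subq.c.id == 5)
```
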
In place of the ``.c`` attribute, a new attribute :attr:`_expression.SelectBase.selected_columns`
is added. This attribute resolves to a column collection that is what most
stmt = select([users])
stmt = stmt.where(stmt.selected_columns.name == 'foo')
-There is of course the notion that perhaps ``.c`` on :class:`_expression.SelectBase` could
-simply act the way :attr:`_expression.SelectBase.selected_columns` does above, however in
-light of the fact that ``.c`` is strongly associated with the :class:`_expression.FromClause`
-hierarchy, meaning that it is a set of columns that can be directly in the
-FROM clause of another SELECT, it's better that a column collection that
-serves an entirely different purpose have a new name.
-
-In the bigger picture, the reason this change is being made now is towards the
-goal of unifying the ORM :class:`_query.Query` object into the :class:`_expression.SelectBase`
-hierarchy in SQLAlchemy 2.0, so that the ORM will have a "``select()``"
-construct that extends directly from the existing :func:`_expression.select` object,
-having the same methods and behaviors except that it will have additional ORM
-functionality. All statement objects in Core will also be fully cacheable
-using a new system that resembles "baked queries" except that it will work
-transparently for all statements across Core and ORM. In order to achieve
-this, the Core class hierarchy needs to be refined to behave in such a way that
-is more easily compatible with the ORM, and the ORM class hierarchy needs to be
-refined so that it is more compatible with Core.
-
:ticket:`4617`
+.. _change_select_join:
+
+select().join() and outerjoin() add JOIN criteria to the current query, rather than creating a subquery
+-------------------------------------------------------------------------------------------------------
+
+Towards the goal of unifying :class:`_orm.Query` and :class:`_sql.Select`,
+particularly for :term:`2.0 style` use of :class:`_sql.Select`, it was critical
+that there be a working :meth:`_sql.Select.join` method that behaves like the
+:meth:`_orm.Query.join` method, adding additional entries to the FROM clause of
+the existing SELECT and then returning the new :class:`_sql.Select` object for
+further modification, instead of wrapping the object inside of an unnamed
+subquery and returning a JOIN from that subquery, a behavior that has always
+been virtually useless and completely misleading to users.
+
+To allow this to be the case, :ref:`change_4617` was first implemented which
+splits off :class:`_sql.Select` from having to be a :class:`_sql.FromClause`;
+this removed the requirement that :meth:`_sql.Select.join` would need to
+return a :class:`_sql.Join` object rather than a new version of that
+:class:`_sql.Select` object that includes a new JOIN in its FROM clause.
+
+From that point on, as the :meth:`_sql.Select.join` and :meth:`_sql.Select.outerjoin`
+did have an existing behavior, the original plan was that these
+methods would be deprecated, and the new "useful" version of
+the methods would be available on an alternate, "future" :class:`_sql.Select`
+object available as a separate import.
+
+However, after some time working with this particular codebase, it was decided
+that having two different kinds of :class:`_sql.Select` objects floating
+around, each with 95% the same behavior except for some subtle difference
+in how some of the methods behave was going to be more misleading and inconvenient
+than simply making a hard change in how these two methods behave, given
+that the existing behavior of :meth:`_sql.Select.join` and :meth:`_sql.Select.outerjoin`
+is essentially never used and only causes confusion.
+
+So it was decided, given how very useless the current behavior is, and how
+extremely useful and important the new behavior would be, to make a
+**hard behavioral change** in this one area, rather than waiting another year
+and having a more awkward API in the interim. SQLAlchemy developers do not
+take it lightly to make a completely breaking change like this, however this is
+a very special case and it is extremely unlikely that the previous
+implementation of these methods was being used; as noted in
+:ref:`change_4617`, major databases such as MySQL and PostgreSQL don't allow
+for unnamed subqueries in any case and from a syntactical point of view it's
+nearly impossible for a JOIN from an unnamed subquery to be useful since it's
+very difficult to refer to the columns within it unambiguously.
+
+With the new implementation, :meth:`_sql.Select.join` and
+:meth:`_sql.Select.outerjoin` now behave very similarly to that of
+:meth:`_orm.Query.join`, adding JOIN criteria to the existing statement by
+matching to the left entity::
+
+ stmt = select(user_table).join(addresses_table, user_table.c.id == addresses_table.c.user_id)
+
+producing::
+
+ SELECT user.id, user.name FROM user JOIN address ON user.id=address.user_id
+
+As is the case for :class:`_sql.Join`, the ON clause is automatically determined
+if feasible::
+
+ stmt = select(user_table).join(addresses_table)
+
+When ORM entities are used in the statement, this is essentially how ORM
+queries are built up using :term:`2.0 style` invocation. ORM entities will
+assign a "plugin" to the statement internally such that ORM-related compilation
+rules will take place when the statement is compiled into a SQL string. More
+directly, the :meth:`_sql.Select.join` method can accommodate ORM
+relationships, without breaking the hard separation between Core and ORM
+internals::
+
+ stmt = select(User).join(User.addresses)
+
+Another new method :meth:`_sql.Select.join_from` is also added, which
+allows easier specification of the left and right side of a join at once::
+
+ stmt = select(Address.email_address, User.name).join_from(User, Address)
+
+producing::
+
+ SELECT address.email_address, user.name FROM user JOIN address ON user.id = address.user_id
+
+
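The joins above can be verified without a database by string-compiling the
statements; a self-contained sketch using illustrative Core tables:

```python
from sqlalchemy import Column, ForeignKey, Integer, MetaData, String, Table, select

metadata = MetaData()
user = Table(
    "user",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)
address = Table(
    "address",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("user_id", ForeignKey("user.id")),
    Column("email_address", String(100)),
)

# explicit ON clause
stmt1 = select(user).join(address, user.c.id == address.c.user_id)

# ON clause inferred from the ForeignKey on address.user_id
stmt2 = select(user).join(address)

# join_from() names both sides of the join at once
stmt3 = select(address.c.email_address, user.c.name).join_from(user, address)
```
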
.. _change_5284:
-select() now accepts positional expressions
--------------------------------------------
+select(), case() now accept positional expressions
+---------------------------------------------------
-The :func:`.select` construct will now accept "columns clause"
-arguments positionally::
+As it may be seen elsewhere in this document, the :func:`_sql.select` construct will
+now accept "columns clause" arguments positionally, rather than requiring they
+be passed as a list::
# new way, supports 2.0
stmt = select(table.c.col1, table.c.col2, ...)
stmt = select([table.c.col1, table.c.col2, ...])
The above legacy calling style also accepts the old keyword arguments that have
-since been removed from most narrative documentation::
+since been removed from most narrative documentation. The existence of these
+keyword arguments is why the columns clause was passed as a list in the first place::
# very much the old way, but still works in 1.4
stmt = select([table.c.col1, table.c.col2, ...], whereclause=table.c.col1 == 5)
"future" API which includes an updated :meth:`.Select.join` method as well
as methods like :meth:`.Select.filter_by` and :meth:`.Select.join_from`.
+In a related change, the :func:`_sql.case` construct has also been modified
+to accept its list of WHEN clauses positionally, with a similar deprecation
+track for the old calling style::
+
+ stmt = select(users_table).where(
+ case(
+ (users_table.c.name == 'wendy', 'W'),
+ (users_table.c.name == 'jack', 'J'),
+ else_='E'
+ )
+ )
+
+The convention for SQLAlchemy constructs that accept either ``*args`` or a
+list of values, such as :meth:`_sql.ColumnOperators.in_` which accepts the
+latter, is that **positional arguments are used for structural specification,
+lists are used for data specification**.
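As a plain-Python illustration of this convention (the function names below are hypothetical stand-ins, not the SQLAlchemy API), structure arrives as individual positional arguments while data arrives as one list:

```python
# Hypothetical signatures illustrating the convention: structure is
# passed positionally via *args, data is passed as a single list.

def select(*columns):
    """Structural: each positional argument is a column expression."""
    return f"SELECT {', '.join(columns)}"

def in_(column, values):
    """Data: the candidate values arrive as one list argument."""
    placeholders = ", ".join(repr(v) for v in values)
    return f"{column} IN ({placeholders})"

print(select("c1", "c2"))     # structural specification
print(in_("c1", [1, 2, 3]))   # data specification
```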
+
+
.. seealso::
:ref:`error_c9ae`
- :ref:`migration_20_toplevel`
-
:ticket:`5284`
:ticket:`4645`
+.. _change_4737:
+
+
+Built-in FROM linting will warn for any potential cartesian products in a SELECT statement
+------------------------------------------------------------------------------------------
+
+As the Core expression language as well as the ORM are built on an "implicit
+FROMs" model where a particular FROM clause is automatically added if any part
+of the query refers to it, a common issue is the case where a SELECT statement,
+either a top level statement or an embedded subquery, contains FROM elements
+that are not joined to the rest of the FROM elements in the query, causing
+what's referred to as a "cartesian product" in the result set, i.e. every
+possible combination of rows from each FROM element not otherwise joined. In
+relational databases, this is nearly always an undesirable outcome as it
+produces an enormous result set full of duplicated, uncorrelated data.
+
+SQLAlchemy, for all of its great features, is particularly prone to this sort
+of issue happening as a SELECT statement will have elements added to its FROM
+clause automatically from any table seen in the other clauses. A typical
+scenario looks like the following, where two tables are JOINed together,
+however an additional entry in the WHERE clause that perhaps inadvertently does
+not line up with these two tables will create an additional FROM entry::
+
+ address_alias = aliased(Address)
+
+ q = session.query(User).\
+ join(address_alias, User.addresses).\
+ filter(Address.email_address == 'foo')
+
+The above query selects from a JOIN of ``User`` and ``address_alias``, the
+latter of which is an alias of the ``Address`` entity. However, the
+``Address`` entity is used within the WHERE clause directly, so the above would
+result in the SQL::
+
+ SELECT
+ users.id AS users_id, users.name AS users_name,
+ users.fullname AS users_fullname,
+ users.nickname AS users_nickname
+ FROM addresses, users JOIN addresses AS addresses_1 ON users.id = addresses_1.user_id
+ WHERE addresses.email_address = :email_address_1
+
+In the above SQL, we can see what SQLAlchemy developers term "the dreaded
+comma", as we see "FROM addresses, users JOIN addresses" in the FROM clause,
+which is the classic sign of a cartesian product: the query makes use of JOIN
+to join FROM clauses together, however because one of them is not joined, it
+is linked with a comma. The above query will return a full set of
+rows that join the "user" and "addresses" table together on the "id / user_id"
+column, and will then apply all those rows into a cartesian product against
+every row in the "addresses" table directly. That is, if there are ten user
+rows and 100 rows in addresses, the above query will return its expected
+result rows, likely 100 as every address row would be selected, then multiply
+each of those against all 100 rows in the extra "addresses" entry, so that the
+total result size would be 10000 rows.
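Under the assumptions in the paragraph above (10 user rows, 100 address rows, every address matching a user), the arithmetic can be checked directly:

```python
users = 10
addresses = 100

# rows produced by the intended JOIN on id / user_id:
# every address matches exactly one user
joined_rows = addresses  # 100

# the un-joined extra "addresses" FROM entry multiplies each of
# those rows by every address row again
cartesian_rows = joined_rows * addresses

print(cartesian_rows)  # 10000
```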
+
+The "table1, table2 JOIN table3" pattern is one that also occurs quite
+frequently within the SQLAlchemy ORM, due to subtle mis-application of ORM
+features, particularly those related to joined eager loading or joined table
+inheritance, as well as SQLAlchemy ORM bugs within those same systems.
+Similar issues apply to SELECT statements that use "implicit joins", where
+the JOIN keyword is not used and instead each FROM element is linked with
+another one via the WHERE clause.
+
+For some years there has been a recipe on the Wiki that applies a graph
+algorithm to a :func:`_expression.select` construct at query execution time and inspects
+the structure of the query for these un-linked FROM clauses, parsing through
+the WHERE clause and all JOIN clauses to determine how FROM elements are linked
+together and ensuring that all the FROM elements are connected in a single
+graph. This recipe has now been adapted to be part of the :class:`.SQLCompiler`
+itself where it now optionally emits a warning for a statement if this
+condition is detected. The warning is enabled using the
+:paramref:`_sa.create_engine.enable_from_linting` flag and is enabled by default.
+The computational overhead of the linter is very low, and additionally it only
+occurs during statement compilation which means for a cached SQL statement it
+only occurs once.
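The core of such a lint check can be sketched as a simple connected-components pass (an illustrative reimplementation, not the actual :class:`.SQLCompiler` code): treat each FROM element as a graph node, add an edge for every ON or WHERE predicate that links two of them, and warn if more than one component remains.

```python
# Illustrative sketch of FROM-clause linting via connected components;
# not the actual SQLAlchemy implementation.

def find_unlinked_froms(froms, links):
    """froms: list of FROM element names; links: (a, b) pairs joined
    by ON/WHERE predicates.  Returns the connected components."""
    parent = {f: f for f in froms}

    def root(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[root(a)] = root(b)

    components = {}
    for f in froms:
        components.setdefault(root(f), set()).add(f)
    return list(components.values())

# "users JOIN addresses_1" is linked; the bare "addresses" entry is not:
groups = find_unlinked_froms(
    ["users", "addresses_1", "addresses"],
    [("users", "addresses_1")],
)
if len(groups) > 1:
    print("cartesian product between", groups)
```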
+
+Using this feature, our ORM query above will emit a warning::
+
+ >>> q.all()
+ SAWarning: SELECT statement has a cartesian product between FROM
+ element(s) "addresses_1", "users" and FROM element "addresses".
+ Apply join condition(s) between each element to resolve.
+
+The linter feature accommodates not just tables linked together through
+JOIN clauses but also through the WHERE clause. Above, we can add a WHERE
+clause to link the new ``Address`` entity with the previous ``address_alias``
+entity, and that will remove the warning::
+
+ q = session.query(User).\
+ join(address_alias, User.addresses).\
+ filter(Address.email_address == 'foo').\
+ filter(Address.id == address_alias.id) # resolve cartesian products,
+ # will no longer warn
+
+The cartesian product warning considers **any** kind of link between two
+FROM clauses to be a resolution, even if the end result set is still
+wasteful, as the linter is intended only to detect the common case of a
+FROM clause that is completely unexpected. If the FROM clause is referred
+to explicitly elsewhere and linked to the other FROMs, no warning is emitted::
+
+ q = session.query(User).\
+ join(address_alias, User.addresses).\
+ filter(Address.email_address == 'foo').\
+ filter(Address.id > address_alias.id) # will generate a lot of rows,
+ # but no warning
+
+Full cartesian products are also allowed if they are explicitly stated; if we
+wanted for example the cartesian product of ``User`` and ``Address``, we can
+JOIN on :func:`.true` so that every row will match with every other; the
+following query will return all rows and produce no warnings::
+
+ from sqlalchemy import true
+
+ # intentional cartesian product
+ q = session.query(User).join(Address, true())
+
+The warning is only generated by default when the statement is compiled by the
+:class:`_engine.Connection` for execution; calling the :meth:`_expression.ClauseElement.compile`
+method will not emit a warning unless the linting flag is supplied::
+
+ >>> from sqlalchemy.sql import FROM_LINTING
+ >>> print(q.statement.compile(linting=FROM_LINTING))
+ SAWarning: SELECT statement has a cartesian product between FROM element(s) "addresses" and FROM element "users". Apply join condition(s) between each element to resolve.
+ SELECT users.id, users.name, users.fullname, users.nickname
+ FROM addresses, users JOIN addresses AS addresses_1 ON users.id = addresses_1.user_id
+ WHERE addresses.email_address = :email_address_1
+
+:ticket:`4737`
+
+
.. _change_result_14_core:
New Result object
:ticket:`4710`
-New Features - ORM
-==================
+.. _change_4753:
-.. _change_4826:
+SELECT objects and derived FROM clauses allow for duplicate columns and column labels
+-------------------------------------------------------------------------------------
-Raiseload for Columns
----------------------
+The :func:`_expression.select` construct now allows for duplicate column
+labels as well as duplicate column objects themselves, so that result tuples
+are organized and ordered identically to how the columns were selected. The
+ORM :class:`_query.Query` already works this way, so this change allows for
+greater cross-compatibility between the two, which is a key goal of the 2.0
+transition::
-The "raiseload" feature, which raises :class:`.InvalidRequestError` when an
-unloaded attribute is accessed, is now available for column-oriented attributes
-using the :paramref:`.orm.defer.raiseload` parameter of :func:`.defer`. This
-works in the same manner as that of the :func:`.raiseload` option used by
-relationship loading::
+ >>> from sqlalchemy import column, select
+ >>> c1, c2, c3, c4 = column('c1'), column('c2'), column('c3'), column('c4')
+ >>> stmt = select([c1, c2, c3.label('c2'), c2, c4])
+ >>> print(stmt)
+ SELECT c1, c2, c3 AS c2, c2, c4
- book = session.query(Book).options(defer(Book.summary, raiseload=True)).first()
+To support this change, the :class:`_expression.ColumnCollection` used by
+:class:`_expression.SelectBase` as well as for derived FROM clauses such as subqueries
+also support duplicate columns; this includes the new
+:attr:`_expression.SelectBase.selected_columns` attribute, the deprecated ``SelectBase.c``
+attribute, as well as the :attr:`_expression.FromClause.c` attribute seen on constructs
+such as :class:`.Subquery` and :class:`_expression.Alias`::
- # would raise an exception
- book.summary
+ >>> list(stmt.selected_columns)
+ [
+ <sqlalchemy.sql.elements.ColumnClause at 0x7fa540bcca20; c1>,
+ <sqlalchemy.sql.elements.ColumnClause at 0x7fa540bcc9e8; c2>,
+ <sqlalchemy.sql.elements.Label object at 0x7fa540b3e2e8>,
+ <sqlalchemy.sql.elements.ColumnClause at 0x7fa540bcc9e8; c2>,
+ <sqlalchemy.sql.elements.ColumnClause at 0x7fa540897048; c4>
+ ]
-To configure column-level raiseload on a mapping, the
-:paramref:`.deferred.raiseload` parameter of :func:`.deferred` may be used. The
-:func:`.undefer` option may then be used at query time to eagerly load
-the attribute::
+ >>> print(stmt.subquery().select())
+ SELECT anon_1.c1, anon_1.c2, anon_1.c2, anon_1.c2, anon_1.c4
+ FROM (SELECT c1, c2, c3 AS c2, c2, c4) AS anon_1
- class Book(Base):
- __tablename__ = 'book'
+:class:`_expression.ColumnCollection` also allows access by integer index, to
+support the case where the string "key" is ambiguous::
- book_id = Column(Integer, primary_key=True)
- title = Column(String(200), nullable=False)
- summary = deferred(Column(String(2000)), raiseload=True)
- excerpt = deferred(Column(Text), raiseload=True)
+ >>> stmt.selected_columns[2]
+ <sqlalchemy.sql.elements.Label object at 0x7fa540b3e2e8>
- book_w_excerpt = session.query(Book).options(undefer(Book.excerpt)).first()
+To suit the use of :class:`_expression.ColumnCollection` in objects such as
+:class:`_schema.Table` and :class:`.PrimaryKeyConstraint`, the old "deduplicating"
+behavior which is more critical for these objects is preserved in a new class
+:class:`.DedupeColumnCollection`.
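The difference between the two collection behaviors can be sketched in plain Python (an illustrative model only, not the real classes):

```python
# Illustrative sketch: a collection that keeps duplicate keys in
# positional order vs. one that replaces an existing entry with the
# same key (the behavior Table and PrimaryKeyConstraint rely on).

class ColumnCollectionSketch:
    def __init__(self):
        self.items = []          # (key, column) pairs, duplicates kept

    def add(self, key, column):
        self.items.append((key, column))

class DedupeColumnCollectionSketch(ColumnCollectionSketch):
    def add(self, key, column):
        # replace an existing column with the same key, in place
        for i, (k, _) in enumerate(self.items):
            if k == key:
                self.items[i] = (key, column)
                return
        self.items.append((key, column))

dupes = ColumnCollectionSketch()
deduped = DedupeColumnCollectionSketch()
for key, col in [("c1", "c1"), ("c2", "c3 AS c2"), ("c2", "c2")]:
    dupes.add(key, col)
    deduped.add(key, col)

print(len(dupes.items))    # 3: duplicates preserved, order kept
print(len(deduped.items))  # 2: second "c2" replaced the first
```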
-It was originally considered that the existing :func:`.raiseload` option that
-works for :func:`_orm.relationship` attributes be expanded to also support column-oriented
-attributes. However, this would break the "wildcard" behavior of :func:`.raiseload`,
-which is documented as allowing one to prevent all relationships from loading::
+As part of this change, the familiar warning ``"Column %r on table %r being
+replaced by %r, which has the same key. Consider use_labels for select()
+statements."`` is **removed**; the :meth:`_expression.Select.apply_labels`
+method is still available and is still used by the ORM for all SELECT
+operations, however it does not imply deduplication of column objects,
+although it does imply deduplication of implicitly generated labels::
- session.query(Order).options(
- joinedload(Order.items), raiseload('*'))
-
-Above, if we had expanded :func:`.raiseload` to accommodate for columns as
-well, the wildcard would also prevent columns from loading and thus be a
-backwards incompatible change; additionally, it's not clear if
-:func:`.raiseload` covered both column expressions and relationships, how one
-would achieve the effect above of only blocking relationship loads, without
-new API being added. So to keep things simple, the option for columns
-remains on :func:`.defer`:
+ >>> from sqlalchemy import table
+ >>> user = table('user', column('id'), column('name'))
+ >>> stmt = select([user.c.id, user.c.name, user.c.id]).apply_labels()
+ >>> print(stmt)
+ SELECT "user".id AS user_id, "user".name AS user_name, "user".id AS id_1
+ FROM "user"
- :func:`.raiseload` - query option to raise for relationship loads
+Finally, the change makes it easier to create UNION and other
+:class:`_selectable.CompoundSelect` objects, by ensuring that the number and position
+of columns in a SELECT statement mirrors what was given, in a use case such
+as::
- :paramref:`.orm.defer.raiseload` - query option to raise for column expression loads
+ >>> s1 = select([user, user.c.id])
+ >>> s2 = select([c1, c2, c3])
+ >>> from sqlalchemy import union
+ >>> u = union(s1, s2)
+ >>> print(u)
+ SELECT "user".id, "user".name, "user".id
+ FROM "user" UNION SELECT c1, c2, c3
-As part of this change, the behavior of "deferred" in conjunction with
-attribute expiration has changed. Previously, when an object would be marked
-as expired, and then unexpired via the access of one of the expired attributes,
-attributes which were mapped as "deferred" at the mapper level would also load.
-This has been changed such that an attribute that is deferred in the mapping
-will never "unexpire", it only loads when accessed as part of the deferral
-loader.
-An attribute that is not mapped as "deferred", however was deferred at query
-time via the :func:`.defer` option, will be reset when the object or attribute
-is expired; that is, the deferred option is removed. This is the same behavior
-as was present previously.
+:ticket:`4753`
-.. seealso::
- :ref:`deferred_raiseload`
+.. _change_4449:
-:ticket:`4826`
+Improved column labeling for simple column expressions using CAST or similar
+----------------------------------------------------------------------------
-.. _change_5263:
+A user pointed out that the PostgreSQL database has a convenient behavior when
+using functions like CAST against a named column, in that the result column
+name is the same as the inner expression::
-ORM Batch inserts with psycopg2 now batch statements with RETURNING in most cases
----------------------------------------------------------------------------------
+ test=> SELECT CAST(data AS VARCHAR) FROM foo;
-The change in :ref:`change_5401` adds support for "executemany" + "RETURNING"
-at the same time in Core, which is now enabled for the psycopg2 dialect
-by default using the psycopg2 ``execute_values()`` extension. The ORM flush
-process now makes use of this feature such that the retrieval of newly generated
-primary key values and server defaults can be achieved while not losing the
-performance benefits of being able to batch INSERT statements together. Additionally,
-psycopg2's ``execute_values()`` extension itself provides a five-fold performance
-improvement over psycopg2's default "executemany" implementation, by rewriting
-an INSERT statement to include many "VALUES" expressions all in one statement
-rather than invoking the same statement repeatedly, as psycopg2 lacks the ability
-to PREPARE the statement ahead of time as would normally be expected for this
-approach to be performant.
+ data
+ ------
+ 5
+ (1 row)
-SQLAlchemy includes a :ref:`performance suite <examples_performance>` within
-its examples, where we can compare the times generated for the "batch_inserts"
-runner against 1.3 and 1.4, revealing a 3x-5x speedup for most flavors
-of batch insert::
+This allows one to apply CAST to table columns while not losing the column
+name (above using the name ``"data"``) in the result row. Compare to
+databases such as MySQL/MariaDB, as well as most others, where the column
+name is taken from the full SQL expression and is not very portable::
- # 1.3
- $ python -m examples.performance bulk_inserts --dburl postgresql://scott:tiger@localhost/test
- test_flush_no_pk : (100000 iterations); total time 14.051527 sec
- test_bulk_save_return_pks : (100000 iterations); total time 15.002470 sec
- test_flush_pk_given : (100000 iterations); total time 7.863680 sec
- test_bulk_save : (100000 iterations); total time 6.780378 sec
- test_bulk_insert_mappings : (100000 iterations); total time 5.363070 sec
- test_core_insert : (100000 iterations); total time 5.362647 sec
+ MariaDB [test]> SELECT CAST(data AS CHAR) FROM foo;
+ +--------------------+
+ | CAST(data AS CHAR) |
+ +--------------------+
+ | 5 |
+ +--------------------+
+ 1 row in set (0.003 sec)
- # 1.4 with enhancement
- $ python -m examples.performance bulk_inserts --dburl postgresql://scott:tiger@localhost/test
- test_flush_no_pk : (100000 iterations); total time 3.820807 sec
- test_bulk_save_return_pks : (100000 iterations); total time 3.176378 sec
- test_flush_pk_given : (100000 iterations); total time 4.037789 sec
- test_bulk_save : (100000 iterations); total time 2.604446 sec
- test_bulk_insert_mappings : (100000 iterations); total time 1.204897 sec
- test_core_insert : (100000 iterations); total time 0.958976 sec
-Note that the ``execute_values()`` extension modifies the INSERT statement in the psycopg2
-layer, **after** it's been logged by SQLAlchemy. So with SQL logging, one will see the
-parameter sets batched together, but the joining of multiple "values" will not be visible
-on the application side::
+In SQLAlchemy Core expressions, we never deal with a raw generated name like
+the above, as SQLAlchemy applies auto-labeling to expressions like these,
+which until now have always been so-called "anonymous" expressions::
- 2020-06-27 19:08:18,166 INFO sqlalchemy.engine.Engine INSERT INTO a (data) VALUES (%(data)s) RETURNING a.id
- 2020-06-27 19:08:18,166 INFO sqlalchemy.engine.Engine [generated in 0.00698s] ({'data': 'data 1'}, {'data': 'data 2'}, {'data': 'data 3'}, {'data': 'data 4'}, {'data': 'data 5'}, {'data': 'data 6'}, {'data': 'data 7'}, {'data': 'data 8'} ... displaying 10 of 4999 total bound parameter sets ... {'data': 'data 4998'}, {'data': 'data 4999'})
- 2020-06-27 19:08:18,254 INFO sqlalchemy.engine.Engine COMMIT
+ >>> print(select([cast(foo.c.data, String)]))
+ SELECT CAST(foo.data AS VARCHAR) AS anon_1 # old behavior
+ FROM foo
-The ultimate INSERT statement can be seen by enabling statement logging on the PostgreSQL side::
+These anonymous expressions were necessary as SQLAlchemy's
+:class:`_engine.ResultProxy` made heavy use of result column names in order
+to match datatypes, such as the :class:`.String` datatype which used to have
+result-row-processing behavior, up to the correct column; most importantly,
+the names had to be both easy to determine in a database-agnostic manner as
+well as unique in all cases. In SQLAlchemy 1.0, as part of :ticket:`918`, this
+reliance on named columns in result rows (specifically the
+``cursor.description`` element of the PEP-249 cursor) was scaled back to not be
+necessary for most Core SELECT constructs; in release 1.4, the system overall
+is becoming more comfortable with SELECT statements that have duplicate column
+or label names such as in :ref:`change_4753`. So we now emulate PostgreSQL's
+reasonable behavior for simple modifications to a single column, most
+prominently with CAST::
- 2020-06-27 19:08:18.169 EDT [26960] LOG: statement: INSERT INTO a (data)
- VALUES ('data 1'),('data 2'),('data 3'),('data 4'),('data 5'),('data 6'),('data
- 7'),('data 8'),('data 9'),('data 10'),('data 11'),('data 12'),
- ... ('data 999'),('data 1000') RETURNING a.id
+ >>> print(select([cast(foo.c.data, String)]))
+ SELECT CAST(foo.data AS VARCHAR) AS data
+ FROM foo
- 2020-06-27 19:08:18.175 EDT
- [26960] LOG: statement: INSERT INTO a (data) VALUES ('data 1001'),('data
- 1002'),('data 1003'),('data 1004'),('data 1005 '),('data 1006'),('data
- 1007'),('data 1008'),('data 1009'),('data 1010'),('data 1011'), ...
+For CAST against expressions that don't have a name, the previous logic is used
+to generate the usual "anonymous" labels::
-The feature batches rows into groups of 1000 by default which can be affected
-using the ``executemany_values_page_size`` argument documented at
-:ref:`psycopg2_executemany_mode`.
+ >>> print(select([cast('hi there,' + foo.c.data, String)]))
+ SELECT CAST(:data_1 + foo.data AS VARCHAR) AS anon_1
+ FROM foo
-:ticket:`5263`
+A :func:`.cast` against a :class:`.Label`, despite having to omit the label
+expression as these don't render inside of a CAST, will nonetheless make use of
+the given name::
+ >>> print(select([cast(('hi there,' + foo.c.data).label('hello_data'), String)]))
+ SELECT CAST(:data_1 + foo.data AS VARCHAR) AS hello_data
+ FROM foo
-.. _change_orm_update_returning_14:
+And of course as was always the case, :class:`.Label` can be applied to the
+expression on the outside to apply an "AS <name>" label directly::
-ORM Bulk Update and Delete use RETURNING for "fetch" strategy when available
-----------------------------------------------------------------------------
+ >>> print(select([cast(('hi there,' + foo.c.data), String).label('hello_data')]))
+ SELECT CAST(:data_1 + foo.data AS VARCHAR) AS hello_data
+ FROM foo
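The naming rules demonstrated above can be summarized as a small decision function (a hypothetical sketch of the heuristic, not SQLAlchemy's internal code):

```python
# Hypothetical sketch of the 1.4 labeling heuristic for CAST-like
# single-element wrappers: an explicit .label() wins, otherwise the
# inner column name is reused, otherwise an anonymous label is made.

def result_label(inner_name, explicit_label=None, anon_counter=[0]):
    if explicit_label:              # .label('hello_data') wins outright
        return explicit_label
    if inner_name:                  # CAST(foo.data AS ...) -> "data"
        return inner_name
    anon_counter[0] += 1            # nameless expression -> anon_N
    return f"anon_{anon_counter[0]}"

print(result_label("data"))                             # data
print(result_label(None))                               # anon_1
print(result_label(None, explicit_label="hello_data"))  # hello_data
```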
-An ORM bulk update or delete that uses the "fetch" strategy::
- sess.query(User).filter(User.age > 29).update(
- {"age": User.age - 10}, synchronize_session="fetch"
- )
+:ticket:`4449`
-Will now use RETURNING if the backend database supports it; this currently
-includes PostgreSQL and SQL Server (the Oracle dialect does not support RETURNING
-of multiple rows)::
+.. _change_4808:
- UPDATE users SET age_int=(users.age_int - %(age_int_1)s) WHERE users.age_int > %(age_int_2)s RETURNING users.id
- [generated in 0.00060s] {'age_int_1': 10, 'age_int_2': 29}
- Col ('id',)
- Row (2,)
- Row (4,)
+New "post compile" bound parameters used for LIMIT/OFFSET in Oracle, SQL Server
+-------------------------------------------------------------------------------
-For backends that do not support RETURNING of multiple rows, the previous approach
-of emitting SELECT for the primary keys beforehand is still used::
+A major goal of the 1.4 series is to establish that all Core SQL constructs
+are completely cacheable, meaning that a particular :class:`.Compiled`
+structure will produce an identical SQL string regardless of any SQL parameters
+used with it, which notably includes those used to specify the LIMIT and
+OFFSET values, typically used for pagination and "top N" style results.
- SELECT users.id FROM users WHERE users.age_int > %(age_int_1)s
- [generated in 0.00043s] {'age_int_1': 29}
- Col ('id',)
- Row (2,)
- Row (4,)
- UPDATE users SET age_int=(users.age_int - %(age_int_1)s) WHERE users.age_int > %(age_int_2)s
- [generated in 0.00102s] {'age_int_1': 10, 'age_int_2': 29}
+While SQLAlchemy has used bound parameters for LIMIT/OFFSET schemes for many
+years, a few outliers remained where such parameters were not allowed, including
+a SQL Server "TOP N" statement, such as::
-One of the intricate challenges of this change is to support cases such as the
-horizontal sharding extension, where a single bulk update or delete may be
-multiplexed among backends some of which support RETURNING and some don't. The
-new 1.4 execution archiecture supports this case so that the "fetch" strategy
-can be left intact with a graceful degrade to using a SELECT, rather than having
-to add a new "returning" strategy that would not be backend-agnostic.
+ SELECT TOP 5 mytable.id, mytable.data FROM mytable
-As part of this change, the "fetch" strategy is also made much more efficient
-in that it will no longer expire the objects located which match the rows,
-for Python expressions used in the SET clause which can be evaluated in
-Python; these are instead assigned
-directly onto the object in the same way as the "evaluate" strategy. Only
-for SQL expressions that can't be evaluated does it fall back to expiring
-the attributes. The "evaluate" strategy has also been enhanced to fall back
-to "expire" for a value that cannot be evaluated.
+as well as with Oracle, where the FIRST_ROWS() hint (which SQLAlchemy will
+use if the ``optimize_limits=True`` parameter is passed to
+:func:`_sa.create_engine` with an Oracle URL) does not allow them;
+additionally, using bound parameters with ROWNUM comparisons has been
+reported as producing slower query plans::
+ SELECT anon_1.id, anon_1.data FROM (
+ SELECT /*+ FIRST_ROWS(5) */
+ anon_2.id AS id,
+ anon_2.data AS data,
+ ROWNUM AS ora_rn FROM (
+ SELECT mytable.id, mytable.data FROM mytable
+ ) anon_2
+ WHERE ROWNUM <= :param_1
+ ) anon_1 WHERE ora_rn > :param_2
-Behavioral Changes - ORM
-========================
+In order to allow for all statements to be unconditionally cacheable at the
+compilation level, a new form of bound parameter called a "post compile"
+parameter has been added, which makes use of the same mechanism as that
+of "expanding IN parameters". This is a :func:`.bindparam` that behaves
+identically to any other bound parameter, except that its value will be
+rendered literally into the SQL string before it is sent to the DBAPI
+``cursor.execute()`` method. The new parameter is used internally by the
+SQL Server and Oracle dialects, so that the drivers receive the literal
+rendered value but the rest of SQLAlchemy can still consider this as a
+bound parameter. The above two statements when stringified using
+``str(statement.compile(dialect=<dialect>))`` now look like::
-.. _change_4710_orm:
+ SELECT TOP [POSTCOMPILE_param_1] mytable.id, mytable.data FROM mytable
-The "KeyedTuple" object returned by Query is replaced by Row
--------------------------------------------------------------
+and::
-As discussed at :ref:`change_4710_core`, the Core :class:`.RowProxy` object
-is now replaced by a class called :class:`.Row`. The base :class:`.Row`
-object now behaves more fully like a named tuple, and as such it is now
-used as the basis for tuple-like results returned by the :class:`_query.Query`
-object, rather than the previous "KeyedTuple" class.
+ SELECT anon_1.id, anon_1.data FROM (
+ SELECT /*+ FIRST_ROWS([POSTCOMPILE__ora_frow_1]) */
+ anon_2.id AS id,
+ anon_2.data AS data,
+ ROWNUM AS ora_rn FROM (
+ SELECT mytable.id, mytable.data FROM mytable
+ ) anon_2
+ WHERE ROWNUM <= [POSTCOMPILE_param_1]
+ ) anon_1 WHERE ora_rn > [POSTCOMPILE_param_2]
-The rationale is so that by SQLAlchemy 2.0, both Core and ORM SELECT statements
-will return result rows using the same :class:`.Row` object which behaves like
-a named tuple. Dictionary-like functionality is available from :class:`.Row`
-via the :attr:`.Row._mapping` attribute. In the interim, Core result sets
-will make use of a :class:`.Row` subclass :class:`.LegacyRow` which maintains
-the previous dict/tuple hybrid behavior for backwards compatibility while the
-:class:`.Row` class will be used directly for ORM tuple results returned
-by the :class:`_query.Query` object.
-
-Effort has been made to get most of the featureset of :class:`.Row` to be
-available within the ORM, meaning that access by string name as well
-as entity / column should work::
-
- row = s.query(User, Address).join(User.addresses).first()
+The ``[POSTCOMPILE_<param>]`` format is also what is seen when an
+"expanding IN" is used.
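The mechanism can be sketched in plain Python: at execution time, each ``[POSTCOMPILE_<name>]`` token is swapped for a literal-rendered value, while ordinary parameters remain bound. This is an illustrative sketch only, with a simplified literal processor that supports just integers, mirroring the very limited scope of the real one.

```python
import re

def render_post_compile(sql, params):
    """Replace [POSTCOMPILE_<name>] tokens with literal values; other
    parameters would still be sent to the DBAPI as bound parameters.
    Illustrative sketch only - the real rendering goes through the
    datatype's literal_processor."""
    def literal(value):
        if isinstance(value, int):
            return str(value)
        raise TypeError("unsupported literal type: %r" % (value,))

    return re.sub(
        r"\[POSTCOMPILE_(\w+)\]",
        lambda m: literal(params[m.group(1)]),
        sql,
    )

sql = "SELECT TOP [POSTCOMPILE_param_1] mytable.id FROM mytable"
print(render_post_compile(sql, {"param_1": 5}))
# SELECT TOP 5 mytable.id FROM mytable
```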
- row._mapping[User] # same as row[0]
- row._mapping[Address] # same as row[1]
- row._mapping["User"] # same as row[0]
- row._mapping["Address"] # same as row[1]
+When viewing the SQL logging output, the final form of the statement will
+be seen::
- u1 = aliased(User)
- row = s.query(u1).only_return_tuples(True).first()
- row._mapping[u1] # same as row[0]
+ SELECT anon_1.id, anon_1.data FROM (
+ SELECT /*+ FIRST_ROWS(5) */
+ anon_2.id AS id,
+ anon_2.data AS data,
+ ROWNUM AS ora_rn FROM (
+ SELECT mytable.id AS id, mytable.data AS data FROM mytable
+ ) anon_2
+ WHERE ROWNUM <= 8
+ ) anon_1 WHERE ora_rn > 3
- row = (
- s.query(User.id, Address.email_address)
- .join(User.addresses)
- .first()
- )
+The "post compile parameter" feature is exposed as public API through the
+:paramref:`.bindparam.literal_execute` parameter, however it is not currently
+intended for general use. The literal values are rendered using the
+:meth:`.TypeEngine.literal_processor` of the underlying datatype, which in
+SQLAlchemy has **extremely limited** scope, supporting only integers and simple
+string values.
- row._mapping[User.id] # same as row[0]
- row._mapping["id"] # same as row[0]
- row._mapping[users.c.id] # same as row[0]
+:ticket:`4808`
-.. seealso::
+.. _change_4712:
- :ref:`change_4710_core`
+Connection-level transactions can now be inactive based on subtransaction
+-------------------------------------------------------------------------
-:ticket:`4710`.
+The :class:`_engine.Connection` now includes the behavior where a
+:class:`.Transaction` can be made inactive due to a rollback on an inner
+transaction; the :class:`.Transaction` will not clear, however, until it is
+itself rolled back.
-.. _change_5074:
+This is essentially a new error condition which will disallow statement
+executions to proceed on a :class:`_engine.Connection` if an inner "sub" transaction
+has been rolled back. The behavior works very similarly to that of the
+ORM :class:`.Session`, where if an outer transaction has been begun, it needs
+to be rolled back to clear the invalid transaction; this behavior is described
+in :ref:`faq_session_rollback`.
-Session does not immediately create a new SessionTransaction object
-----------------------------------------------------------------------------
+While the :class:`_engine.Connection` has had a less strict behavioral
+pattern than the :class:`.Session`, this change was made because it helps to
+identify when a subtransaction has rolled back the DBAPI transaction, yet
+external code, unaware of this, attempts to continue proceeding and in fact
+runs operations on a new transaction. The "test harness" pattern described
+at :ref:`session_external_transaction` is the common place for this to occur.
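The rule can be modeled roughly as follows (an illustrative pure-Python state machine, not the :class:`_engine.Connection` implementation): rolling back an inner subtransaction deactivates the whole stack until the outermost transaction is itself rolled back.

```python
# Illustrative model: rolling back an inner "subtransaction"
# deactivates the outer transaction until it is rolled back too.

class ConnectionSketch:
    def __init__(self):
        self._tx_depth = 0
        self._invalidated = False

    def begin(self):
        self._tx_depth += 1

    def rollback(self):
        if self._tx_depth > 1:
            self._invalidated = True   # inner rollback poisons the stack
        else:
            self._invalidated = False  # outermost rollback clears it
        self._tx_depth -= 1

    def execute(self, sql):
        if self._invalidated:
            raise RuntimeError(
                "This connection is on an inactive transaction; "
                "roll back the enclosing transaction first")
        return "executed: " + sql

conn = ConnectionSketch()
conn.begin()           # outer transaction
conn.begin()           # subtransaction
conn.rollback()        # inner rollback: stack now inactive
try:
    conn.execute("SELECT 1")
except RuntimeError as err:
    print(err)
conn.rollback()        # outer rollback clears the condition
conn.begin()
print(conn.execute("SELECT 1"))
```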
-The :class:`.Session` object's default behavior of ``autocommit=False``
-historically has meant that there is always a :class:`.SessionTransaction`
-object in play, associated with the :class:`.Session` via the
-:attr:`.Session.transaction` attribute. When the given
-:class:`.SessionTransaction` was complete, due to a commit, rollback, or close,
-it was immediately replaced with a new one. The :class:`.SessionTransaction`
-by itself does not imply the usage of any connection-oriented resources, so
-this long-standing behavior has a particular elegance to it in that the state
-of :attr:`.Session.transaction` is always predictable as non-None.
+The "subtransaction" feature of Core and ORM is itself deprecated and will
+no longer be present in version 2.0. As a result, this new error condition
+is itself temporary as it will no longer apply once subtransactions are removed.
-However, as part of the initiative in :ticket:`5056` to greatly reduce
-reference cycles, this assumption means that calling upon
-:meth:`.Session.close` results in a :class:`.Session` object that still has
-reference cycles and is more expensive to clean up, not to mention that there
-is a small overhead in constructing the :class:`.SessionTransaction`
-object, which meant that there would be unnecessary overhead created
-for a :class:`.Session` that for example invoked :meth:`.Session.commit`
-and then :meth:`.Session.close`.
+In order to work with the 2.0 style behavior that does not include
+subtransactions, use the :paramref:`_sa.create_engine.future` parameter
+on :func:`_sa.create_engine`.
-As such, it was decided that :meth:`.Session.close` should leave the internal
-state of ``self.transaction``, now referred to internally as
-``self._transaction``, as None, and that a new :class:`.SessionTransaction`
-should only be created when needed. For consistency and code coverage, this
-behavior was also expanded to include all the points at which "autobegin" is
-expected, not just when :meth:`.Session.close` were called.
+The error message is described in the errors page at :ref:`error_8s2a`.
-In particular, this causes a behavioral change for applications which
-subscribe to the :meth:`.SessionEvents.after_transaction_create` event hook;
-previously, this event would be emitted when the :class:`.Session` were first
-constructed, as well as for most actions that closed the previous transaction
-and would emit :meth:`.SessionEvents.after_transaction_end`. The new behavior
-is that :meth:`.SessionEvents.after_transaction_create` is emitted on demand,
-when the :class:`.Session` has not yet created a new
-:class:`.SessionTransaction` object and mapped objects are associated with the
-:class:`.Session` through methods like :meth:`.Session.add` and
-:meth:`.Session.delete`, when the :attr:`.Session.transaction` attribute is
-called upon, when the :meth:`.Session.flush` method has tasks to complete, etc.
-Besides the change in when the :meth:`.SessionEvents.after_transaction_create`
-event is emitted, the change should have no other user-visible impact on the
-:class:`.Session` object's behavior; the :class:`.Session` will continue to have
-the behavior that it remains usable for new operations after :meth:`.Session.close`
-is called, and the sequencing of how the :class:`.Session` interacts with the
-:class:`_engine.Engine` and the database itself should also remain unaffected, since
-these operations were already operating in an on-demand fashion.
-:ticket:`5074`
+New Features - ORM
+==================
-.. _change_5237_14:
+.. _change_4826:
-Viewonly relationships don't synchronize backrefs
--------------------------------------------------
+Raiseload for Columns
+---------------------
-In :ticket:`5149` in 1.3.14, SQLAlchemy began emitting a warning when the
-:paramref:`_orm.relationship.backref` or :paramref:`_orm.relationship.back_populates`
-keywords would be used at the same time as the :paramref:`_orm.relationship.viewonly`
-flag on the target relationship. This was because a "viewonly" relationship does
-not actually persist changes made to it, which could cause some misleading
-behaviors to occur. However, in :ticket:`5237`, we sought to refine this
-behavior as there are legitimate use cases to have backrefs set up on
-viewonly relationships, including that back populates attributes are used
-in some cases by the relationship lazy loaders to determine that an additional
-eager load in the other direction is not necessary, as well as that back
-populates can be used for mapper introspection and that :func:`_orm.backref`
-can be a convenient way to set up bi-directional relationships.
+The "raiseload" feature, which raises :class:`.InvalidRequestError` when an
+unloaded attribute is accessed, is now available for column-oriented attributes
+using the :paramref:`.orm.defer.raiseload` parameter of :func:`.defer`. This
+works in the same manner as that of the :func:`.raiseload` option used by
+relationship loading::
-The solution then was to make the "mutation" that occurs from a backref
-an optional thing, using the :paramref:`_orm.relationship.sync_backref`
-flag. In 1.4 the value of :paramref:`_orm.relationship.sync_backref` defaults
-to False for a relationship target that also sets :paramref:`_orm.relationship.viewonly`.
-This indicates that any changes made to a relationship with
-viewonly will not impact the state of the other side or of the :class:`_orm.Session`
-in any way::
+ book = session.query(Book).options(defer(Book.summary, raiseload=True)).first()
+ # would raise an exception
+ book.summary
- class User(Base):
- # ...
+To configure column-level raiseload on a mapping, the
+:paramref:`.deferred.raiseload` parameter of :func:`.deferred` may be used. The
+:func:`.undefer` option may then be used at query time to eagerly load
+the attribute::
- addresses = relationship(Address, backref=backref("user", viewonly=True))
+ class Book(Base):
+ __tablename__ = 'book'
- class Address(Base):
- # ...
+ book_id = Column(Integer, primary_key=True)
+ title = Column(String(200), nullable=False)
+ summary = deferred(Column(String(2000)), raiseload=True)
+ excerpt = deferred(Column(Text), raiseload=True)
+ book_w_excerpt = session.query(Book).options(undefer(Book.excerpt)).first()
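The mapping above can be exercised end to end; a minimal runnable sketch, assuming an in-memory SQLite database and using names that mirror the example:

```python
# Mapper-level deferral with raiseload=True: accessing the unloaded
# column raises InvalidRequestError; undefer() loads it eagerly instead.
from sqlalchemy import Column, Integer, String, create_engine, exc
from sqlalchemy.orm import Session, declarative_base, deferred, undefer

Base = declarative_base()

class Book(Base):
    __tablename__ = "book"

    book_id = Column(Integer, primary_key=True)
    title = Column(String(200), nullable=False)
    summary = deferred(Column(String(2000)), raiseload=True)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = Session(engine)
session.add(Book(title="t", summary="s"))
session.commit()

book = session.query(Book).first()
try:
    book.summary  # deferred with raiseload=True; access raises
    raised = False
except exc.InvalidRequestError:
    raised = True

# undefer() at query time loads the column up front
book_w_summary = session.query(Book).options(undefer(Book.summary)).first()
```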
- u1 = session.query(User).filter_by(name="x").first()
+It was originally considered that the existing :func:`.raiseload` option that
+works for :func:`_orm.relationship` attributes be expanded to also support column-oriented
+attributes. However, this would break the "wildcard" behavior of :func:`.raiseload`,
+which is documented as allowing one to prevent all relationships from loading::
- a1 = Address()
- a1.user = u1
+ session.query(Order).options(
+ joinedload(Order.items), raiseload('*'))
-Above, the ``a1`` object will **not** be added to the ``u1.addresses``
-collection, nor will the ``a1`` object be added to the session. Previously,
-both of these things would be true. The warning that
-:paramref:`.relationship.sync_backref` should be set to ``False`` when
-:paramref:`.relationship.viewonly` is ``False`` is no longer emitted as this is
-now the default behavior.
+Above, if we had expanded :func:`.raiseload` to accommodate columns as
+well, the wildcard would also prevent columns from loading and thus be a
+backwards-incompatible change; additionally, it's not clear, if
+:func:`.raiseload` covered both column expressions and relationships, how one
+would achieve the effect above of only blocking relationship loads without
+new API being added. So to keep things simple, the option for columns
+remains on :func:`.defer`:
-:ticket:`5237`
+ :func:`.raiseload` - query option to raise for relationship loads
-.. _change_1763:
+ :paramref:`.orm.defer.raiseload` - query option to raise for column expression loads
-Eager loaders emit during unexpire operations
----------------------------------------------
-A long sought behavior was that when an expired object is accessed, configured
-eager loaders will run in order to eagerly load relationships on the expired
-object when the object is refreshed or otherwise unexpired. This behavior has
-now been added, so that joinedloaders will add inline JOINs as usual, and
-selectin/subquery loaders will run an "immediateload" operation for a given
-relationship, when an expired object is unexpired or an object is refreshed::
+As part of this change, the behavior of "deferred" in conjunction with
+attribute expiration has changed. Previously, when an object would be marked
+as expired, and then unexpired via the access of one of the expired attributes,
+attributes which were mapped as "deferred" at the mapper level would also load.
+This has been changed such that an attribute that is deferred in the mapping
+will never "unexpire", it only loads when accessed as part of the deferral
+loader.
- >>> a1 = session.query(A).options(joinedload(A.bs)).first()
- >>> a1.data = 'new data'
- >>> session.commit()
+An attribute that is not mapped as "deferred", however was deferred at query
+time via the :func:`.defer` option, will be reset when the object or attribute
+is expired; that is, the deferred option is removed. This is the same behavior
+as was present previously.
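The contrast between the two deferral styles across an expiration can be sketched as follows, assuming an in-memory SQLite database and a hypothetical ``Article`` mapping:

```python
# Mapper-level deferral survives expiration; query-time defer() is reset.
from sqlalchemy import Column, Integer, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base, defer, deferred

Base = declarative_base()

class Article(Base):
    __tablename__ = "article"

    id = Column(Integer, primary_key=True)
    title = Column(String(50))
    body = deferred(Column(Text))  # deferred at the mapper level

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = Session(engine)
session.add(Article(id=1, title="t", body="b"))
session.commit()

article = session.query(Article).options(defer(Article.title)).first()
session.expire(article)

# unexpire the object by touching a regular attribute
article.id

# the query-time defer() was reset by the expiration: "title" loaded
title_loaded = "title" in article.__dict__

# the mapper-level deferral stays in effect: "body" is still unloaded
body_loaded = "body" in article.__dict__
```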
-Above, the ``A`` object was loaded with a ``joinedload()`` option associated
-with it in order to eagerly load the ``bs`` collection. After the
-``session.commit()``, the state of the object is expired. Upon accessing
-the ``.data`` column attribute, the object is refreshed and this will now
-include the joinedload operation as well::
- >>> a1.data
- SELECT a.id AS a_id, a.data AS a_data, b_1.id AS b_1_id, b_1.a_id AS b_1_a_id
- FROM a LEFT OUTER JOIN b AS b_1 ON a.id = b_1.a_id
- WHERE a.id = ?
+.. seealso::
-The behavior applies both to loader strategies applied to the
-:func:`_orm.relationship` directly, as well as with options used with
-:meth:`_query.Query.options`, provided that the object was originally loaded by that
-query.
+ :ref:`deferred_raiseload`
-For the "secondary" eager loaders "selectinload" and "subqueryload", the SQL
-strategy for these loaders is not necessary in order to eagerly load attributes
-on a single object; so they will instead invoke the "immediateload" strategy in
-a refresh scenario, which resembles the query emitted by "lazyload", emitted as
-an additional query::
+:ticket:`4826`
- >>> a1 = session.query(A).options(selectinload(A.bs)).first()
- >>> a1.data = 'new data'
- >>> session.commit()
- >>> a1.data
- SELECT a.id AS a_id, a.data AS a_data
- FROM a
- WHERE a.id = ?
- (1,)
- SELECT b.id AS b_id, b.a_id AS b_a_id
- FROM b
- WHERE ? = b.a_id
- (1,)
+.. _change_5263:
-Note that a loader option does not apply to an object that was introduced
-into the :class:`.Session` in a different way. That is, if the ``a1`` object
-were just persisted in this :class:`.Session`, or was loaded with a different
-query before the eager option had been applied, then the object doesn't have
-an eager load option associated with it. This is not a new concept, however
-users who are looking for the eagerload on refresh behavior may find this
-to be more noticeable.
+ORM Batch inserts with psycopg2 now batch statements with RETURNING in most cases
+---------------------------------------------------------------------------------
-:ticket:`1763`
+The change in :ref:`change_5401` adds support for "executemany" + "RETURNING"
+at the same time in Core, which is now enabled for the psycopg2 dialect
+by default using the psycopg2 ``execute_values()`` extension. The ORM flush
+process now makes use of this feature such that the retrieval of newly generated
+primary key values and server defaults can be achieved while not losing the
+performance benefits of being able to batch INSERT statements together. Additionally,
+psycopg2's ``execute_values()`` extension itself provides a five-fold performance
+improvement over psycopg2's default "executemany" implementation, by rewriting
+an INSERT statement to include many "VALUES" expressions all in one statement
+rather than invoking the same statement repeatedly, as psycopg2 lacks the ability
+to PREPARE the statement ahead of time as would normally be expected for this
+approach to be performant.
-.. _change_4519:
+SQLAlchemy includes a :ref:`performance suite <examples_performance>` within
+its examples, where we can compare the times generated for the "batch_inserts"
+runner against 1.3 and 1.4, revealing a 3x-5x speedup for most flavors
+of batch insert::
-Accessing an uninitialized collection attribute on a transient object no longer mutates __dict__
--------------------------------------------------------------------------------------------------
+ # 1.3
+ $ python -m examples.performance bulk_inserts --dburl postgresql://scott:tiger@localhost/test
+ test_flush_no_pk : (100000 iterations); total time 14.051527 sec
+ test_bulk_save_return_pks : (100000 iterations); total time 15.002470 sec
+ test_flush_pk_given : (100000 iterations); total time 7.863680 sec
+ test_bulk_save : (100000 iterations); total time 6.780378 sec
+ test_bulk_insert_mappings : (100000 iterations); total time 5.363070 sec
+ test_core_insert : (100000 iterations); total time 5.362647 sec
-It has always been SQLAlchemy's behavior that accessing mapped attributes on a
-newly created object returns an implicitly generated value, rather than raising
-``AttributeError``, such as ``None`` for scalar attributes or ``[]`` for a
-list-holding relationship::
+ # 1.4 with enhancement
+ $ python -m examples.performance bulk_inserts --dburl postgresql://scott:tiger@localhost/test
+ test_flush_no_pk : (100000 iterations); total time 3.820807 sec
+ test_bulk_save_return_pks : (100000 iterations); total time 3.176378 sec
+ test_flush_pk_given : (100000 iterations); total time 4.037789 sec
+ test_bulk_save : (100000 iterations); total time 2.604446 sec
+ test_bulk_insert_mappings : (100000 iterations); total time 1.204897 sec
+ test_core_insert : (100000 iterations); total time 0.958976 sec
- >>> u1 = User()
- >>> u1.name
- None
- >>> u1.addresses
- []
+Note that the ``execute_values()`` extension modifies the INSERT statement in the psycopg2
+layer, **after** it's been logged by SQLAlchemy. So with SQL logging, one will see the
+parameter sets batched together, but the joining of multiple "values" will not be visible
+on the application side::
-The rationale for the above behavior was originally to make ORM objects easier
-to work with. Since an ORM object represents an empty row when first created
-without any state, it is intuitive that its un-accessed attributes would
-resolve to ``None`` (or SQL NULL) for scalars and to empty collections for
-relationships. In particular, it makes possible an extremely common pattern
-of being able to mutate the new collection without manually creating and
-assigning an empty collection first::
+ 2020-06-27 19:08:18,166 INFO sqlalchemy.engine.Engine INSERT INTO a (data) VALUES (%(data)s) RETURNING a.id
+ 2020-06-27 19:08:18,166 INFO sqlalchemy.engine.Engine [generated in 0.00698s] ({'data': 'data 1'}, {'data': 'data 2'}, {'data': 'data 3'}, {'data': 'data 4'}, {'data': 'data 5'}, {'data': 'data 6'}, {'data': 'data 7'}, {'data': 'data 8'} ... displaying 10 of 4999 total bound parameter sets ... {'data': 'data 4998'}, {'data': 'data 4999'})
+ 2020-06-27 19:08:18,254 INFO sqlalchemy.engine.Engine COMMIT
- >>> u1 = User()
- >>> u1.addresses.append(Address()) # no need to assign u1.addresses = []
+The ultimate INSERT statement can be seen by enabling statement logging on the PostgreSQL side::
-Up until version 1.0 of SQLAlchemy, the behavior of this initialization system
-for both scalar attributes as well as collections would be that the ``None`` or
-empty collection would be *populated* into the object's state, e.g.
-``__dict__``. This meant that the following two operations were equivalent::
+ 2020-06-27 19:08:18.169 EDT [26960] LOG: statement: INSERT INTO a (data)
+ VALUES ('data 1'),('data 2'),('data 3'),('data 4'),('data 5'),('data 6'),('data
+ 7'),('data 8'),('data 9'),('data 10'),('data 11'),('data 12'),
+ ... ('data 999'),('data 1000') RETURNING a.id
- >>> u1 = User()
- >>> u1.name = None # explicit assignment
+ 2020-06-27 19:08:18.175 EDT
+ [26960] LOG: statement: INSERT INTO a (data) VALUES ('data 1001'),('data
+ 1002'),('data 1003'),('data 1004'),('data 1005 '),('data 1006'),('data
+ 1007'),('data 1008'),('data 1009'),('data 1010'),('data 1011'), ...
- >>> u2 = User()
- >>> u2.name # implicit assignment just by accessing it
- None
+The feature batches rows into groups of 1000 by default which can be affected
+using the ``executemany_values_page_size`` argument documented at
+:ref:`psycopg2_executemany_mode`.
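The page size can be adjusted when the engine is created; a configuration sketch only (the DSN is a placeholder and this fragment requires PostgreSQL with psycopg2 to actually run):

```python
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql+psycopg2://scott:tiger@localhost/test",
    # rows per INSERT .. VALUES grouping used by execute_values();
    # the default is 1000
    executemany_values_page_size=5000,
)
```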
-Where above, both ``u1`` and ``u2`` would have the value ``None`` populated
-in the value of the ``name`` attribute. Since this is a SQL NULL, the ORM
-would skip including these values within an INSERT so that SQL-level defaults
-take place, if any, else the value defaults to NULL on the database side.
+:ticket:`5263`
-In version 1.0 as part of :ref:`migration_3061`, this behavior was refined so
-that the ``None`` value was no longer populated into ``__dict__``, only
-returned. Besides removing the mutating side effect of a getter operation,
-this change also made it possible to set columns that did have server defaults
-to the value NULL by actually assigning ``None``, which was now distinguished
-from just reading it.
-The change however did not accommodate for collections, where returning an
-empty collection that is not assigned meant that this mutable collection would
-be different each time and also would not be able to correctly accommodate for
-mutating operations (e.g. append, add, etc.) called upon it. While the
-behavior continued to generally not get in anyone's way, an edge case was
-eventually identified in :ticket:`4519` where this empty collection could be
-harmful, which is when the object is merged into a session::
+.. _change_orm_update_returning_14:
- >>> u1 = User(id=1) # create an empty User to merge with id=1 in the database
- >>> merged1 = session.merge(u1) # value of merged1.addresses is unchanged from that of the DB
+ORM Bulk Update and Delete use RETURNING for "fetch" strategy when available
+----------------------------------------------------------------------------
- >>> u2 = User(id=2) # create an empty User to merge with id=2 in the database
- >>> u2.addresses
- []
- >>> merged2 = session.merge(u2) # value of merged2.addresses has been emptied in the DB
+An ORM bulk update or delete that uses the "fetch" strategy::
-Above, the ``.addresses`` collection on ``merged1`` will contain all the
-``Address()`` objects that were already in the database. ``merged2`` will
-not; because it has an empty list implicitly assigned, the ``.addresses``
-collection will be erased. This is an example of where this mutating side
-effect can actually mutate the database itself.
+ sess.query(User).filter(User.age > 29).update(
+ {"age": User.age - 10}, synchronize_session="fetch"
+ )
-While it was considered that perhaps the attribute system should begin using
-strict "plain Python" behavior, raising ``AttributeError`` in all cases for
-non-existent attributes on non-persistent objects and requiring that all
-collections be explicitly assigned, such a change would likely be too extreme
-for the vast number of applications that have relied upon this behavior for
-many years, leading to a complex rollout / backwards compatibility problem as
-well as the likelihood that workarounds to restore the old behavior would
-become prevalent, thus rendering the whole change ineffective in any case.
+Will now use RETURNING if the backend database supports it; this currently
+includes PostgreSQL and SQL Server (the Oracle dialect does not support RETURNING
+of multiple rows)::
-The change then is to keep the default producing behavior, but to finally make
-the non-mutating behavior of scalars a reality for collections as well, via the
-addition of additional mechanics in the collection system. When accessing the
-empty attribute, the new collection is created and associated with the state,
-however is not added to ``__dict__`` until it is actually mutated::
+ UPDATE users SET age_int=(users.age_int - %(age_int_1)s) WHERE users.age_int > %(age_int_2)s RETURNING users.id
+ [generated in 0.00060s] {'age_int_1': 10, 'age_int_2': 29}
+ Col ('id',)
+ Row (2,)
+ Row (4,)
- >>> u1 = User()
- >>> l1 = u1.addresses # new list is created, associated with the state
- >>> assert u1.addresses is l1 # you get the same list each time you access it
- >>> assert "addresses" not in u1.__dict__ # but it won't go into __dict__ until it's mutated
- >>> from sqlalchemy import inspect
- >>> inspect(u1).attrs.addresses.history
- History(added=None, unchanged=None, deleted=None)
+For backends that do not support RETURNING of multiple rows, the previous approach
+of emitting SELECT for the primary keys beforehand is still used::
-When the list is changed, then it becomes part of the tracked changes to
-be persisted to the database::
+ SELECT users.id FROM users WHERE users.age_int > %(age_int_1)s
+ [generated in 0.00043s] {'age_int_1': 29}
+ Col ('id',)
+ Row (2,)
+ Row (4,)
+ UPDATE users SET age_int=(users.age_int - %(age_int_1)s) WHERE users.age_int > %(age_int_2)s
+ [generated in 0.00102s] {'age_int_1': 10, 'age_int_2': 29}
- >>> l1.append(Address())
- >>> assert "addresses" in u1.__dict__
- >>> inspect(u1).attrs.addresses.history
- History(added=[<__main__.Address object at 0x7f49b725eda0>], unchanged=[], deleted=[])
+One of the intricate challenges of this change is to support cases such as the
+horizontal sharding extension, where a single bulk update or delete may be
+multiplexed among backends, some of which support RETURNING and some of which
+do not. The new 1.4 execution architecture supports this case so that the
+"fetch" strategy can be left intact with a graceful degrade to using a SELECT,
+rather than having to add a new "returning" strategy that would not be
+backend-agnostic.
-This change is expected to have *nearly* no impact on existing applications
-in any way, except that it has been observed that some applications may be
-relying upon the implicit assignment of this collection, such as to assert that
-the object contains certain values based on its ``__dict__``::
+As part of this change, the "fetch" strategy is also made much more efficient
+in that it will no longer expire the objects located which match the rows,
+for Python expressions used in the SET clause which can be evaluated in
+Python; these are instead assigned
+directly onto the object in the same way as the "evaluate" strategy. Only
+for SQL expressions that can't be evaluated does it fall back to expiring
+the attributes. The "evaluate" strategy has also been enhanced to fall back
+to "expire" for a value that cannot be evaluated.
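The non-expiring synchronization can be seen in a minimal runnable sketch, assuming an in-memory SQLite database (on backends without multi-row RETURNING support, the strategy degrades to the SELECT approach shown above):

```python
# "fetch" strategy: the matched in-session object reflects the new
# value, as the SET expression is evaluable in Python.
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True)
    age = Column(Integer)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = Session(engine)
session.add_all([User(id=1, age=25), User(id=2, age=40)])
session.commit()

u2 = session.get(User, 2)

session.query(User).filter(User.age > 29).update(
    {"age": User.age - 10}, synchronize_session="fetch"
)

new_age = u2.age  # 40 - 10, synchronized onto the object
```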
- >>> u1 = User()
- >>> u1.addresses
- []
- # this will now fail, would pass before
- >>> assert {k: v for k, v in u1.__dict__.items() if not k.startswith("_")} == {"addresses": []}
-or to ensure that the collection won't require a lazy load to proceed, the
-(admittedly awkward) code below will now also fail::
+Behavioral Changes - ORM
+========================
- >>> u1 = User()
- >>> u1.addresses
- []
- >>> s.add(u1)
- >>> s.flush()
- >>> s.close()
- >>> u1.addresses # <-- will fail, .addresses is not loaded and object is detached
+.. _change_4710_orm:
-Applications that rely upon the implicit mutating behavior of collections will
-need to be changed so that they assign the desired collection explicitly::
+The "KeyedTuple" object returned by Query is replaced by Row
+-------------------------------------------------------------
- >>> u1.addresses = []
+As discussed at :ref:`change_4710_core`, the Core :class:`.RowProxy` object
+is now replaced by a class called :class:`.Row`. The base :class:`.Row`
+object now behaves more fully like a named tuple, and as such it is now
+used as the basis for tuple-like results returned by the :class:`_query.Query`
+object, rather than the previous "KeyedTuple" class.
-:ticket:`4519`
+The rationale is so that by SQLAlchemy 2.0, both Core and ORM SELECT statements
+will return result rows using the same :class:`.Row` object which behaves like
+a named tuple. Dictionary-like functionality is available from :class:`.Row`
+via the :attr:`.Row._mapping` attribute. In the interim, Core result sets
+will make use of a :class:`.Row` subclass :class:`.LegacyRow` which maintains
+the previous dict/tuple hybrid behavior for backwards compatibility while the
+:class:`.Row` class will be used directly for ORM tuple results returned
+by the :class:`_query.Query` object.
-.. _change_4662:
+Effort has been made to get most of the featureset of :class:`.Row` to be
+available within the ORM, meaning that access by string name as well
+as entity / column should work::
-The "New instance conflicts with existing identity" error is now a warning
----------------------------------------------------------------------------
+ row = s.query(User, Address).join(User.addresses).first()
-SQLAlchemy has always had logic to detect when an object in the :class:`.Session`
-to be inserted has the same primary key as an object that is already present::
+ row._mapping[User] # same as row[0]
+ row._mapping[Address] # same as row[1]
+ row._mapping["User"] # same as row[0]
+ row._mapping["Address"] # same as row[1]
- class Product(Base):
- __tablename__ = 'product'
+ u1 = aliased(User)
+ row = s.query(u1).only_return_tuples(True).first()
+ row._mapping[u1] # same as row[0]
- id = Column(Integer, primary_key=True)
- session = Session(engine)
+ row = (
+ s.query(User.id, Address.email_address)
+ .join(User.addresses)
+ .first()
+ )
- # add Product with primary key 1
- session.add(Product(id=1))
- session.flush()
+ row._mapping[User.id] # same as row[0]
+ row._mapping["id"] # same as row[0]
+ row._mapping[users.c.id] # same as row[0]
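A self-contained runnable sketch of the same access patterns, assuming an in-memory SQLite database and a minimal ``User`` mapping:

```python
# Row._mapping access on an ORM tuple result: by position, by string
# key, and by the mapped attribute itself.
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True)
    name = Column(String(50))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = Session(engine)
session.add(User(id=1, name="spongebob"))
session.commit()

row = session.query(User.id, User.name).first()

by_position = row[0]
by_string = row._mapping["id"]
by_entity = row._mapping[User.id]
```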
- # add another Product with same primary key
- session.add(Product(id=1))
- s.commit() # <-- will raise FlushError
+.. seealso::
-The change is that the :class:`.FlushError` is altered to be only a warning::
+ :ref:`change_4710_core`
- sqlalchemy/orm/persistence.py:408: SAWarning: New instance <Product at 0x7f1ff65e0ba8> with identity key (<class '__main__.Product'>, (1,), None) conflicts with persistent instance <Product at 0x7f1ff60a4550>
+:ticket:`4710`
+.. _change_5074:
-Subsequent to that, the condition will attempt to insert the row into the
-database which will emit :class:`.IntegrityError`, which is the same error that
-would be raised if the primary key identity was not already present in the
-:class:`.Session`::
+Session does not immediately create a new SessionTransaction object
+----------------------------------------------------------------------------
- sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: product.id
+The :class:`.Session` object's default behavior of ``autocommit=False``
+historically has meant that there is always a :class:`.SessionTransaction`
+object in play, associated with the :class:`.Session` via the
+:attr:`.Session.transaction` attribute. When the given
+:class:`.SessionTransaction` was complete, due to a commit, rollback, or close,
+it was immediately replaced with a new one. The :class:`.SessionTransaction`
+by itself does not imply the usage of any connection-oriented resources, so
+this long-standing behavior has a particular elegance to it in that the state
+of :attr:`.Session.transaction` is always predictable as non-None.
-The rationale is to allow code that is using :class:`.IntegrityError` to catch
-duplicates to function regardless of the existing state of the
-:class:`.Session`, as is often done using savepoints::
+However, as part of the initiative in :ticket:`5056` to greatly reduce
+reference cycles, this assumption means that calling upon
+:meth:`.Session.close` results in a :class:`.Session` object that still has
+reference cycles and is more expensive to clean up, not to mention that there
+is a small overhead in constructing the :class:`.SessionTransaction`
+object, which meant that there would be unnecessary overhead created
+for a :class:`.Session` that for example invoked :meth:`.Session.commit`
+and then :meth:`.Session.close`.
+As such, it was decided that :meth:`.Session.close` should leave the internal
+state of ``self.transaction``, now referred to internally as
+``self._transaction``, as None, and that a new :class:`.SessionTransaction`
+should only be created when needed. For consistency and code coverage, this
+behavior was also expanded to include all the points at which "autobegin" is
+expected, not just when :meth:`.Session.close` were called.
- # add another Product with same primary key
- try:
- with session.begin_nested():
- session.add(Product(id=1))
- except exc.IntegrityError:
- print("row already exists")
+In particular, this causes a behavioral change for applications which
+subscribe to the :meth:`.SessionEvents.after_transaction_create` event hook;
+previously, this event would be emitted when the :class:`.Session` were first
+constructed, as well as for most actions that closed the previous transaction
+and would emit :meth:`.SessionEvents.after_transaction_end`. The new behavior
+is that :meth:`.SessionEvents.after_transaction_create` is emitted on demand,
+when the :class:`.Session` has not yet created a new
+:class:`.SessionTransaction` object and mapped objects are associated with the
+:class:`.Session` through methods like :meth:`.Session.add` and
+:meth:`.Session.delete`, when the :attr:`.Session.transaction` attribute is
+called upon, when the :meth:`.Session.flush` method has tasks to complete, etc.
-The above logic was not fully feasible earlier, as in the case that the
-``Product`` object with the existing identity were already in the
-:class:`.Session`, the code would also have to catch :class:`.FlushError`,
-which additionally is not filtered for the specific condition of integrity
-issues. With the change, the above block behaves consistently with the
-exception of the warning also being emitted.
+Besides the change in when the :meth:`.SessionEvents.after_transaction_create`
+event is emitted, the change should have no other user-visible impact on the
+:class:`.Session` object's behavior; the :class:`.Session` will continue to have
+the behavior that it remains usable for new operations after :meth:`.Session.close`
+is called, and the sequencing of how the :class:`.Session` interacts with the
+:class:`_engine.Engine` and the database itself should also remain unaffected, since
+these operations were already operating in an on-demand fashion.
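The on-demand behavior described above can be observed directly with :meth:`.Session.in_transaction`; a minimal runnable sketch, assuming an in-memory SQLite database:

```python
# SessionTransaction is created on demand ("autobegin"), not at
# construction time, and close() leaves no transaction in place.
from sqlalchemy import create_engine
from sqlalchemy.orm import Session

engine = create_engine("sqlite://")
session = Session(engine)

# no SessionTransaction has been created yet
before = session.in_transaction()

# the first operation that requires a transaction triggers autobegin
session.connection()
during = session.in_transaction()

# after close(), the Session again has no transaction until needed
session.close()
after = session.in_transaction()
```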
-Since the logic in question deals with the primary key, all databases emit an
-integrity error in the case of primary key conflicts on INSERT. The case
-where an error would not be raised, that would have earlier, is the extremely
-unusual scenario of a mapping that defines a primary key on the mapped
-selectable that is more restrictive than what is actually configured in the
-database schema, such as when mapping to joins of tables or when defining
-additional columns as part of a composite primary key that is not actually
-constrained in the database schema. However, these situations also work more
-consistently in that the INSERT would theoretically proceed whether or not the
-existing identity were still in the database. The warning can also be
-configured to raise an exception using the Python warnings filter.
+:ticket:`5074`
+.. _change_5237_14:
-:ticket:`4662`
+Viewonly relationships don't synchronize backrefs
+-------------------------------------------------
+
+In :ticket:`5149` in 1.3.14, SQLAlchemy began emitting a warning when the
+:paramref:`_orm.relationship.backref` or :paramref:`_orm.relationship.back_populates`
+keywords would be used at the same time as the :paramref:`_orm.relationship.viewonly`
+flag on the target relationship. This was because a "viewonly" relationship does
+not actually persist changes made to it, which could cause some misleading
+behaviors to occur. However, in :ticket:`5237`, we sought to refine this
+behavior as there are legitimate use cases to have backrefs set up on
+viewonly relationships, including that back populates attributes are used
+in some cases by the relationship lazy loaders to determine that an additional
+eager load in the other direction is not necessary, as well as that back
+populates can be used for mapper introspection and that :func:`_orm.backref`
+can be a convenient way to set up bi-directional relationships.
+
+The solution then was to make the "mutation" that occurs from a backref
+an optional thing, using the :paramref:`_orm.relationship.sync_backref`
+flag. In 1.4 the value of :paramref:`_orm.relationship.sync_backref` defaults
+to False for a relationship target that also sets :paramref:`_orm.relationship.viewonly`.
+This indicates that any changes made to a relationship with
+viewonly will not impact the state of the other side or of the :class:`_orm.Session`
+in any way::
+
+
+ class User(Base):
+ # ...
+
+ addresses = relationship(Address, backref=backref("user", viewonly=True))
+
+ class Address(Base):
+ # ...
+
+
+ u1 = session.query(User).filter_by(name="x").first()
+
+ a1 = Address()
+ a1.user = u1
+
+Above, the ``a1`` object will **not** be added to the ``u1.addresses``
+collection, nor will the ``a1`` object be added to the session. Previously,
+both of these things would be true. The warning that
+:paramref:`.relationship.sync_backref` should be set to ``False`` when
+:paramref:`.relationship.viewonly` is ``True`` is no longer emitted as this is
+now the default behavior.
+
+:ticket:`5237`
.. _change_5150:
:ticket:`5150`
-.. _change_4994:
+.. _change_1763:
-Persistence-related cascade operations disallowed with viewonly=True
----------------------------------------------------------------------
+Eager loaders emit during unexpire operations
+---------------------------------------------
-When a :func:`_orm.relationship` is set as ``viewonly=True`` using the
-:paramref:`_orm.relationship.viewonly` flag, it indicates this relationship should
-only be used to load data from the database, and should not be mutated
-or involved in a persistence operation. In order to ensure this contract
-works successfully, the relationship can no longer specify
-:paramref:`_orm.relationship.cascade` settings that make no sense in terms of
-"viewonly".
+A long-sought behavior was that when an expired object is accessed, configured
+eager loaders will run in order to eagerly load relationships on the expired
+object when the object is refreshed or otherwise unexpired. This behavior has
+now been added, so that joined eager loaders will add their inline JOINs as
+usual, and selectin/subquery loaders will run an "immediateload" operation for
+a given relationship, when an expired object is unexpired or an object is
+refreshed::
-The primary targets here are the "delete, delete-orphan" cascades, which
-through 1.3 continued to impact persistence even if viewonly were True, which
-is a bug; even if viewonly were True, an object would still cascade these
-two operations onto the related object if the parent were deleted or the
-object were detached. Rather than modify the cascade operations to check
-for viewonly, the configuration of both of these together is simply
-disallowed::
+
+    >>> a1 = session.query(A).options(joinedload(A.bs)).first()
+    >>> a1.data = 'new data'
+    >>> session.commit()
+
- class User(Base):
- # ...
+Above, the ``A`` object was loaded with a ``joinedload()`` option associated
+with it in order to eagerly load the ``bs`` collection. After the
+``session.commit()``, the state of the object is expired. Upon accessing
+the ``.data`` column attribute, the object is refreshed and this will now
+include the joinedload operation as well::
- # this is now an error
- addresses = relationship(
- "Address", viewonly=True, cascade="all, delete-orphan")
+
+    >>> a1.data
+    SELECT a.id AS a_id, a.data AS a_data, b_1.id AS b_1_id, b_1.a_id AS b_1_a_id
+    FROM a LEFT OUTER JOIN b AS b_1 ON a.id = b_1.a_id
+    WHERE a.id = ?
+
-The above will raise::
+The behavior applies both to loader strategies applied to the
+:func:`_orm.relationship` directly, as well as with options used with
+:meth:`_query.Query.options`, provided that the object was originally loaded by that
+query.
- sqlalchemy.exc.ArgumentError: Cascade settings
- "delete, delete-orphan, merge, save-update" apply to persistence
- operations and should not be combined with a viewonly=True relationship.
+For the "secondary" eager loaders "selectinload" and "subqueryload", the SQL
+strategy for these loaders is not necessary in order to eagerly load attributes
+on a single object; so in a refresh scenario they instead invoke the
+"immediateload" strategy, which resembles the query emitted by "lazyload",
+emitted as an additional query::
-Applications that have this issue should be emitting a warning as of
-SQLAlchemy 1.3.12, and for the above error the solution is to remove
-the cascade settings for a viewonly relationship.
+
+    >>> a1 = session.query(A).options(selectinload(A.bs)).first()
+    >>> a1.data = 'new data'
+    >>> session.commit()
+    >>> a1.data
+    SELECT a.id AS a_id, a.data AS a_data
+    FROM a
+    WHERE a.id = ?
+    (1,)
+    SELECT b.id AS b_id, b.a_id AS b_a_id
+    FROM b
+    WHERE ? = b.a_id
+    (1,)
+
+Note that a loader option does not apply to an object that was introduced
+into the :class:`.Session` in a different way. That is, if the ``a1`` object
+were just persisted in this :class:`.Session`, or were loaded with a different
+query before the eager option had been applied, then the object doesn't have
+an eager load option associated with it. This is not a new concept; however,
+users who are looking for the eager-load-on-refresh behavior may find it more
+noticeable.
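
As a runnable sketch of the refresh behavior above, assuming a minimal toy ``A``/``B`` mapping and an in-memory SQLite engine (these fixtures are illustrative, not the document's own):

```python
# Minimal sketch, assuming a toy A/B mapping: the joinedload() option
# travels with the loaded object and runs again when the object is
# refreshed after expiration.
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, inspect
from sqlalchemy.orm import Session, declarative_base, joinedload, relationship

Base = declarative_base()


class A(Base):
    __tablename__ = "a"
    id = Column(Integer, primary_key=True)
    data = Column(String)
    bs = relationship("B")


class B(Base):
    __tablename__ = "b"
    id = Column(Integer, primary_key=True)
    a_id = Column(Integer, ForeignKey("a.id"))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)
session.add(A(data="x", bs=[B()]))
session.commit()

a1 = session.query(A).options(joinedload(A.bs)).first()
session.commit()  # expires a1

a1.data  # refresh; under 1.4+ the joined eager load of .bs runs as well
print("bs" in inspect(a1).dict)  # True under 1.4+: collection refreshed inline
```

The final check illustrates that the collection is present in the refreshed state without a separate lazy load.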
-:ticket:`4993`
-:ticket:`4994`
+:ticket:`1763`
-.. _change_5122:
+.. _change_4519:
-Stricter behavior when querying inheritance mappings using custom queries
--------------------------------------------------------------------------
+Accessing an uninitialized collection attribute on a transient object no longer mutates __dict__
+-------------------------------------------------------------------------------------------------
-This change applies to the scenario where a joined- or single- table
-inheritance subclass entity is being queried, given a completed SELECT subquery
-to select from. If the given subquery returns rows that do not correspond to
-the requested polymorphic identity or identities, an error is raised.
-Previously, this condition would pass silently under joined table inheritance,
-returning an invalid subclass, and under single table inheritance, the
-:class:`_query.Query` would be adding additional criteria against the subquery to
-limit the results which could inappropriately interfere with the intent of the
-query.
+It has always been SQLAlchemy's behavior that accessing mapped attributes on a
+newly created object returns an implicitly generated value, rather than raising
+``AttributeError``, such as ``None`` for scalar attributes or ``[]`` for a
+list-holding relationship::
-Given the example mapping of ``Employee``, ``Engineer(Employee)``, ``Manager(Employee)``,
-in the 1.3 series if we were to emit the following query against a joined
-inheritance mapping::
+
+    >>> u1 = User()
+    >>> u1.name
+    None
+    >>> u1.addresses
+    []
+
- s = Session(e)
+The rationale for the above behavior was originally to make ORM objects easier
+to work with. Since an ORM object represents an empty row when first created
+without any state, it is intuitive that its un-accessed attributes would
+resolve to ``None`` (or SQL NULL) for scalars and to empty collections for
+relationships. In particular, it makes possible an extremely common pattern
+of being able to mutate the new collection without manually creating and
+assigning an empty collection first::
- s.add_all([Engineer(), Manager()])
+
+    >>> u1 = User()
+    >>> u1.addresses.append(Address())  # no need to assign u1.addresses = []
+
- s.commit()
+Up until version 1.0 of SQLAlchemy, the behavior of this initialization system
+for both scalar attributes as well as collections would be that the ``None`` or
+empty collection would be *populated* into the object's state, e.g.
+``__dict__``. This meant that the following two operations were equivalent::
- print(
- s.query(Manager).select_entity_from(s.query(Employee).subquery()).all()
- )
+
+    >>> u1 = User()
+    >>> u1.name = None  # explicit assignment
+
+    >>> u2 = User()
+    >>> u2.name  # implicit assignment just by accessing it
+    None
+
-The subquery selects both the ``Engineer`` and the ``Manager`` rows, and
+Above, both ``u1`` and ``u2`` would have the value ``None`` populated for the
+``name`` attribute. Since this is a SQL NULL, the ORM
+would skip including these values within an INSERT so that SQL-level defaults
+take effect, if any; otherwise the value defaults to NULL on the database side.
+
+In version 1.0 as part of :ref:`migration_3061`, this behavior was refined so
+that the ``None`` value was no longer populated into ``__dict__``, only
+returned. Besides removing the mutating side effect of a getter operation,
+this change also made it possible to set columns that did have server defaults
+to the value NULL by actually assigning ``None``, which was now distinguished
+from just reading it.
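
A brief sketch of that distinction, using an assumed minimal ``User`` mapping (the names here are illustrative only):

```python
# Sketch with an assumed minimal mapping: reading a never-set scalar
# returns None without populating __dict__, while explicitly assigning
# None does populate it, so NULL is included in the INSERT.
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class User(Base):
    __tablename__ = "user_account"
    id = Column(Integer, primary_key=True)
    name = Column(String)


u1 = User()
print(u1.name)                # None; only returned, not stored
print("name" in u1.__dict__)  # False: reading has no mutating side effect
u1.name = None                # explicit assignment
print("name" in u1.__dict__)  # True: the NULL is now part of the state
```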
+
+The change however did not accommodate collections; returning an empty
+collection that is never assigned meant that this mutable collection would
+be different each time, and could not correctly accommodate
+mutating operations (e.g. append, add, etc.) called upon it. While the
+behavior generally continued not to get in anyone's way, an edge case was
+eventually identified in :ticket:`4519` where this empty collection could be
+harmful, which is when the object is merged into a session::
+
+    >>> u1 = User(id=1)  # create an empty User to merge with id=1 in the database
+    >>> merged1 = session.merge(u1)  # value of merged1.addresses is unchanged from that of the DB
+
+    >>> u2 = User(id=2)  # create an empty User to merge with id=2 in the database
+    >>> u2.addresses
+    []
+    >>> merged2 = session.merge(u2)  # value of merged2.addresses has been emptied in the DB
+
+Above, the ``.addresses`` collection on ``merged1`` will contain all the
+``Address()`` objects that were already in the database. ``merged2`` will
+not; because it has an empty list implicitly assigned, the ``.addresses``
+collection will be erased. This is an example of where this mutating side
+effect can actually mutate the database itself.
+
+While it was considered that perhaps the attribute system should begin using
+strict "plain Python" behavior, raising ``AttributeError`` in all cases for
+non-existent attributes on non-persistent objects and requiring that all
+collections be explicitly assigned, such a change would likely be too extreme
+for the vast number of applications that have relied upon this behavior for
+many years, leading to a complex rollout / backwards compatibility problem as
+well as the likelihood that workarounds to restore the old behavior would
+become prevalent, thus rendering the whole change ineffective in any case.
+
+The change then is to keep the default-producing behavior, but to finally make
+the non-mutating behavior of scalars a reality for collections as well, via
+additional mechanics in the collection system. When the
+empty attribute is accessed, the new collection is created and associated with
+the state, but is not added to ``__dict__`` until it is actually mutated::
+
+    >>> u1 = User()
+    >>> l1 = u1.addresses  # new list is created, associated with the state
+    >>> assert u1.addresses is l1  # you get the same list each time you access it
+    >>> assert "addresses" not in u1.__dict__  # but it won't go into __dict__ until it's mutated
+    >>> from sqlalchemy import inspect
+    >>> inspect(u1).attrs.addresses.history
+    History(added=None, unchanged=None, deleted=None)
+
+When the list is changed, then it becomes part of the tracked changes to
+be persisted to the database::
+
+    >>> l1.append(Address())
+    >>> assert "addresses" in u1.__dict__
+    >>> inspect(u1).attrs.addresses.history
+    History(added=[<__main__.Address object at 0x7f49b725eda0>], unchanged=[], deleted=[])
+
+This change is expected to have *nearly* no impact on existing applications,
+except that it has been observed that some applications may be
+relying upon the implicit assignment of this collection, such as to assert that
+the object contains certain values based on its ``__dict__``::
+
+    >>> u1 = User()
+    >>> u1.addresses
+    []
+    # this will now fail, would pass before
+    >>> assert {k: v for k, v in u1.__dict__.items() if not k.startswith("_")} == {"addresses": []}
+
+or to ensure that the collection won't require a lazy load to proceed, the
+(admittedly awkward) code below will now also fail::
+
+    >>> u1 = User()
+    >>> u1.addresses
+    []
+    >>> session.add(u1)
+    >>> session.flush()
+    >>> session.close()
+    >>> u1.addresses  # <-- will fail, .addresses is not loaded and object is detached
+
+Applications that rely upon the implicit mutating behavior of collections will
+need to be changed so that they assign the desired collection explicitly::
+
+    >>> u1.addresses = []
+
+:ticket:`4519`
+
+.. _change_4662:
+
+The "New instance conflicts with existing identity" error is now a warning
+---------------------------------------------------------------------------
+
+SQLAlchemy has always had logic to detect when an object in the :class:`.Session`
+that is to be inserted has the same primary key as an object that is already present::
+
+    class Product(Base):
+        __tablename__ = 'product'
+
+        id = Column(Integer, primary_key=True)
+
+    session = Session(engine)
+
+    # add Product with primary key 1
+    session.add(Product(id=1))
+    session.flush()
+
+    # add another Product with same primary key
+    session.add(Product(id=1))
+    session.commit()  # <-- will raise FlushError
+
+The change is that the :class:`.FlushError` is altered to be only a warning::
+
+    sqlalchemy/orm/persistence.py:408: SAWarning: New instance <Product at 0x7f1ff65e0ba8> with identity key (<class '__main__.Product'>, (1,), None) conflicts with persistent instance <Product at 0x7f1ff60a4550>
+
+
+Subsequent to the warning, the flush will proceed to attempt to insert the row
+into the database, emitting :class:`.IntegrityError`; this is the same error
+that would be raised if the primary key identity were not already present in
+the :class:`.Session`::
+
+    sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: product.id
+
+The rationale is to allow code that is using :class:`.IntegrityError` to catch
+duplicates to function regardless of the existing state of the
+:class:`.Session`, as is often done using savepoints::
+
+
+    # add another Product with same primary key
+    try:
+        with session.begin_nested():
+            session.add(Product(id=1))
+    except exc.IntegrityError:
+        print("row already exists")
+
+The above logic was not fully feasible earlier, as in the case that the
+``Product`` object with the existing identity were already in the
+:class:`.Session`, the code would also have to catch :class:`.FlushError`,
+which additionally is not filtered for the specific condition of integrity
+issues. With the change, the above block behaves consistently with the
+exception of the warning also being emitted.
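
A runnable sketch of the savepoint pattern above, assuming a minimal ``Product`` mapping on an in-memory SQLite engine; the new warning is filtered out here so that only the :class:`.IntegrityError` path is exercised:

```python
# Runnable sketch of the savepoint pattern; fixture names are assumed
# for illustration.
import warnings

from sqlalchemy import Column, Integer, create_engine, exc
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Product(Base):
    __tablename__ = "product"
    id = Column(Integer, primary_key=True)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)
session.add(Product(id=1))
session.commit()

with warnings.catch_warnings():
    # the duplicate identity now emits SAWarning rather than FlushError
    warnings.simplefilter("ignore", exc.SAWarning)
    try:
        with session.begin_nested():  # SAVEPOINT
            session.add(Product(id=1))
        outcome = "inserted"
    except exc.IntegrityError:
        # flush at the savepoint commit hits the PK constraint
        outcome = "row already exists"

print(outcome)
```

The ``try`` block behaves the same whether the conflicting ``Product`` is already present in the :class:`.Session` or only in the database.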
+
+Since the logic in question deals with the primary key, all databases emit an
+integrity error in the case of primary key conflicts on INSERT. The case
+where an error would no longer be raised, where one previously would have been,
+is the extremely unusual scenario of a mapping that defines a primary key on the mapped
+selectable that is more restrictive than what is actually configured in the
+database schema, such as when mapping to joins of tables or when defining
+additional columns as part of a composite primary key that is not actually
+constrained in the database schema. However, these situations also work more
+consistently in that the INSERT would theoretically proceed whether or not the
+existing identity were still in the database. The warning can also be
+configured to raise an exception using the Python warnings filter.
+
+
+:ticket:`4662`
+
+.. _change_4994:
+
+Persistence-related cascade operations disallowed with viewonly=True
+---------------------------------------------------------------------
+
+When a :func:`_orm.relationship` is set as ``viewonly=True`` using the
+:paramref:`_orm.relationship.viewonly` flag, it indicates this relationship should
+only be used to load data from the database, and should not be mutated
+or involved in a persistence operation. In order to ensure this contract
+works successfully, the relationship can no longer specify
+:paramref:`_orm.relationship.cascade` settings that make no sense in terms of
+"viewonly".
+
+The primary targets here are the "delete, delete-orphan" cascades, which
+through 1.3 continued to impact persistence even when viewonly was True; this
+was a bug, as an object would still cascade these two operations onto the
+related object when the parent was deleted or the object was detached.
+Rather than modify the cascade operations to check
+for viewonly, the configuration of both of these together is simply
+disallowed::
+
+    class User(Base):
+        # ...
+
+        # this is now an error
+        addresses = relationship(
+            "Address", viewonly=True, cascade="all, delete-orphan")
+
+The above will raise::
+
+    sqlalchemy.exc.ArgumentError: Cascade settings
+    "delete, delete-orphan, merge, save-update" apply to persistence
+    operations and should not be combined with a viewonly=True relationship.
+
+Applications that have this issue will have seen a warning emitted as of
+SQLAlchemy 1.3.12; for the above error, the solution is to remove the cascade
+settings from the viewonly relationship.
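
A sketch of the corrected configuration, with assumed table and column names; the viewonly relationship simply omits cascade settings:

```python
# Sketch of a valid viewonly configuration; mapping details are assumed
# for illustration.
from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.orm import configure_mappers, declarative_base, relationship

Base = declarative_base()


class User(Base):
    __tablename__ = "user_account"
    id = Column(Integer, primary_key=True)
    # loading only; no cascade settings combined with viewonly=True
    addresses = relationship("Address", viewonly=True)


class Address(Base):
    __tablename__ = "address"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("user_account.id"))


configure_mappers()  # would raise ArgumentError for an invalid combination
print(User.addresses.property.viewonly)  # True
```

Calling ``configure_mappers()`` up front surfaces the :class:`.ArgumentError` at configuration time if a persistence cascade were combined with ``viewonly=True``.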
+
+
+:ticket:`4993`
+:ticket:`4994`
+
+.. _change_5122:
+
+Stricter behavior when querying inheritance mappings using custom queries
+-------------------------------------------------------------------------
+
+This change applies to the scenario where a joined- or single- table
+inheritance subclass entity is being queried, given a completed SELECT subquery
+to select from. If the given subquery returns rows that do not correspond to
+the requested polymorphic identity or identities, an error is raised.
+Previously, this condition would pass silently under joined table inheritance,
+returning an invalid subclass; under single table inheritance, the
+:class:`_query.Query` would add additional criteria against the subquery to
+limit the results, which could inappropriately interfere with the intent of the
+query.
+
+Given the example mapping of ``Employee``, ``Engineer(Employee)``, ``Manager(Employee)``,
+in the 1.3 series if we were to emit the following query against a joined
+inheritance mapping::
+
+    s = Session(e)
+
+    s.add_all([Engineer(), Manager()])
+
+    s.commit()
+
+    print(
+        s.query(Manager).select_entity_from(s.query(Employee).subquery()).all()
+    )
+
+
+The subquery selects both the ``Engineer`` and the ``Manager`` rows, and
even though the outer query is against ``Manager``, we get a non ``Manager``
object back::
:ticket:`5122`
-
-New Features - Core
-====================
-
-.. _change_4737:
-
-
-Built-in FROM linting will warn for any potential cartesian products in a SELECT statement
-------------------------------------------------------------------------------------------
-
-As the Core expression language as well as the ORM are built on an "implicit
-FROMs" model where a particular FROM clause is automatically added if any part
-of the query refers to it, a common issue is the case where a SELECT statement,
-either a top level statement or an embedded subquery, contains FROM elements
-that are not joined to the rest of the FROM elements in the query, causing
-what's referred to as a "cartesian product" in the result set, i.e. every
-possible combination of rows from each FROM element not otherwise joined. In
-relational databases, this is nearly always an undesirable outcome as it
-produces an enormous result set full of duplicated, uncorrelated data.
-
-SQLAlchemy, for all of its great features, is particularly prone to this sort
-of issue happening as a SELECT statement will have elements added to its FROM
-clause automatically from any table seen in the other clauses. A typical
-scenario looks like the following, where two tables are JOINed together,
-however an additional entry in the WHERE clause that perhaps inadvertently does
-not line up with these two tables will create an additional FROM entry::
-
- address_alias = aliased(Address)
-
- q = session.query(User).\
- join(address_alias, User.addresses).\
- filter(Address.email_address == 'foo')
-
-The above query selects from a JOIN of ``User`` and ``address_alias``, the
-latter of which is an alias of the ``Address`` entity. However, the
-``Address`` entity is used within the WHERE clause directly, so the above would
-result in the SQL::
-
- SELECT
- users.id AS users_id, users.name AS users_name,
- users.fullname AS users_fullname,
- users.nickname AS users_nickname
- FROM addresses, users JOIN addresses AS addresses_1 ON users.id = addresses_1.user_id
- WHERE addresses.email_address = :email_address_1
-
-In the above SQL, we can see what SQLAlchemy developers term "the dreaded
-comma", as we see "FROM addresses, users JOIN addresses" in the FROM clause
-which is the classic sign of a cartesian product; where a query is making use
-of JOIN in order to join FROM clauses together, however because one of them is
-not joined, it uses a comma. The above query will return a full set of
-rows that join the "user" and "addresses" table together on the "id / user_id"
-column, and will then apply all those rows into a cartesian product against
-every row in the "addresses" table directly. That is, if there are ten user
-rows and 100 rows in addresses, the above query will return its expected result
-rows, likely to be 100 as all address rows would be selected, multiplied by 100
-again, so that the total result size would be 10000 rows.
-
-The "table1, table2 JOIN table3" pattern is one that also occurs quite
-frequently within the SQLAlchemy ORM due to either subtle mis-application of
-ORM features particularly those related to joined eager loading or joined table
-inheritance, as well as a result of SQLAlchemy ORM bugs within those same
-systems. Similar issues apply to SELECT statements that use "implicit joins",
-where the JOIN keyword is not used and instead each FROM element is linked with
-another one via the WHERE clause.
-
-For some years there has been a recipe on the Wiki that applies a graph
-algorithm to a :func:`_expression.select` construct at query execution time and inspects
-the structure of the query for these un-linked FROM clauses, parsing through
-the WHERE clause and all JOIN clauses to determine how FROM elements are linked
-together and ensuring that all the FROM elements are connected in a single
-graph. This recipe has now been adapted to be part of the :class:`.SQLCompiler`
-itself where it now optionally emits a warning for a statement if this
-condition is detected. The warning is enabled using the
-:paramref:`_sa.create_engine.enable_from_linting` flag and is enabled by default.
-The computational overhead of the linter is very low, and additionally it only
-occurs during statement compilation which means for a cached SQL statement it
-only occurs once.
-
-Using this feature, our ORM query above will emit a warning::
-
- >>> q.all()
- SAWarning: SELECT statement has a cartesian product between FROM
- element(s) "addresses_1", "users" and FROM element "addresses".
- Apply join condition(s) between each element to resolve.
-
-The linter feature accommodates not just for tables linked together through the
-JOIN clauses but also through the WHERE clause Above, we can add a WHERE
-clause to link the new ``Address`` entity with the previous ``address_alias``
-entity and that will remove the warning::
-
- q = session.query(User).\
- join(address_alias, User.addresses).\
- filter(Address.email_address == 'foo').\
- filter(Address.id == address_alias.id) # resolve cartesian products,
- # will no longer warn
-
-The cartesian product warning considers **any** kind of link between two
-FROM clauses to be a resolution, even if the end result set is still
-wasteful, as the linter is intended only to detect the common case of a
-FROM clause that is completely unexpected. If the FROM clause is referred
-to explicitly elsewhere and linked to the other FROMs, no warning is emitted::
-
- q = session.query(User).\
- join(address_alias, User.addresses).\
- filter(Address.email_address == 'foo').\
- filter(Address.id > address_alias.id) # will generate a lot of rows,
- # but no warning
-
-Full cartesian products are also allowed if they are explicitly stated; if we
-wanted for example the cartesian product of ``User`` and ``Address``, we can
-JOIN on :func:`.true` so that every row will match with every other; the
-following query will return all rows and produce no warnings::
-
- from sqlalchemy import true
-
- # intentional cartesian product
- q = session.query(User).join(Address, true()) # intentional cartesian product
-
-The warning is only generated by default when the statement is compiled by the
-:class:`_engine.Connection` for execution; calling the :meth:`_expression.ClauseElement.compile`
-method will not emit a warning unless the linting flag is supplied::
-
- >>> from sqlalchemy.sql import FROM_LINTING
- >>> print(q.statement.compile(linting=FROM_LINTING))
- SAWarning: SELECT statement has a cartesian product between FROM element(s) "addresses" and FROM element "users". Apply join condition(s) between each element to resolve.
- SELECT users.id, users.name, users.fullname, users.nickname
- FROM addresses, users JOIN addresses AS addresses_1 ON users.id = addresses_1.user_id
- WHERE addresses.email_address = :email_address_1
-
-:ticket:`4737`
-
-
-
-Behavior Changes - Core
-========================
-
-.. _change_4753:
-
-SELECT objects and derived FROM clauses allow for duplicate columns and column labels
--------------------------------------------------------------------------------------
-
-This change allows that the :func:`_expression.select` construct now allows for duplicate
-column labels as well as duplicate column objects themselves, so that result
-tuples are organized and ordered in the identical way in that the columns were
-selected. The ORM :class:`_query.Query` already works this way, so this change
-allows for greater cross-compatibility between the two, which is a key goal of
-the 2.0 transition::
-
- >>> from sqlalchemy import column, select
- >>> c1, c2, c3, c4 = column('c1'), column('c2'), column('c3'), column('c4')
- >>> stmt = select([c1, c2, c3.label('c2'), c2, c4])
- >>> print(stmt)
- SELECT c1, c2, c3 AS c2, c2, c4
-
-To support this change, the :class:`_expression.ColumnCollection` used by
-:class:`_expression.SelectBase` as well as for derived FROM clauses such as subqueries
-also support duplicate columns; this includes the new
-:attr:`_expression.SelectBase.selected_columns` attribute, the deprecated ``SelectBase.c``
-attribute, as well as the :attr:`_expression.FromClause.c` attribute seen on constructs
-such as :class:`.Subquery` and :class:`_expression.Alias`::
-
- >>> list(stmt.selected_columns)
- [
- <sqlalchemy.sql.elements.ColumnClause at 0x7fa540bcca20; c1>,
- <sqlalchemy.sql.elements.ColumnClause at 0x7fa540bcc9e8; c2>,
- <sqlalchemy.sql.elements.Label object at 0x7fa540b3e2e8>,
- <sqlalchemy.sql.elements.ColumnClause at 0x7fa540bcc9e8; c2>,
- <sqlalchemy.sql.elements.ColumnClause at 0x7fa540897048; c4>
- ]
-
- >>> print(stmt.subquery().select())
- SELECT anon_1.c1, anon_1.c2, anon_1.c2, anon_1.c2, anon_1.c4
- FROM (SELECT c1, c2, c3 AS c2, c2, c4) AS anon_1
-
-:class:`_expression.ColumnCollection` also allows access by integer index to support
-when the string "key" is ambiguous::
-
- >>> stmt.selected_columns[2]
- <sqlalchemy.sql.elements.Label object at 0x7fa540b3e2e8>
-
-To suit the use of :class:`_expression.ColumnCollection` in objects such as
-:class:`_schema.Table` and :class:`.PrimaryKeyConstraint`, the old "deduplicating"
-behavior which is more critical for these objects is preserved in a new class
-:class:`.DedupeColumnCollection`.
-
-The change includes that the familiar warning ``"Column %r on table %r being
-replaced by %r, which has the same key. Consider use_labels for select()
-statements."`` is **removed**; the :meth:`_expression.Select.apply_labels` is still
-available and is still used by the ORM for all SELECT operations, however it
-does not imply deduplication of column objects, although it does imply
-deduplication of implicitly generated labels::
-
- >>> from sqlalchemy import table
- >>> user = table('user', column('id'), column('name'))
- >>> stmt = select([user.c.id, user.c.name, user.c.id]).apply_labels()
- >>> print(stmt)
- SELECT "user".id AS user_id, "user".name AS user_name, "user".id AS id_1
- FROM "user"
-
-Finally, the change makes it easier to create UNION and other
-:class:`_selectable.CompoundSelect` objects, by ensuring that the number and position
-of columns in a SELECT statement mirrors what was given, in a use case such
-as::
-
- >>> s1 = select([user, user.c.id])
- >>> s2 = select([c1, c2, c3])
- >>> from sqlalchemy import union
- >>> u = union(s1, s2)
- >>> print(u)
- SELECT "user".id, "user".name, "user".id
- FROM "user" UNION SELECT c1, c2, c3
-
-
-
-:ticket:`4753`
-
-
-
-.. _change_4449:
-
-Improved column labeling for simple column expressions using CAST or similar
-----------------------------------------------------------------------------
-
-A user pointed out that the PostgreSQL database has a convenient behavior when
-using functions like CAST against a named column, in that the result column name
-is named the same as the inner expression::
-
- test=> SELECT CAST(data AS VARCHAR) FROM foo;
-
- data
- ------
- 5
- (1 row)
-
-This allows one to apply CAST to table columns while not losing the column
-name (above using the name ``"data"``) in the result row. Compare to
-databases such as MySQL/MariaDB, as well as most others, where the column
-name is taken from the full SQL expression and is not very portable::
-
- MariaDB [test]> SELECT CAST(data AS CHAR) FROM foo;
- +--------------------+
- | CAST(data AS CHAR) |
- +--------------------+
- | 5 |
- +--------------------+
- 1 row in set (0.003 sec)
-
-
-In SQLAlchemy Core expressions, we never deal with a raw generated name like
-the above, as SQLAlchemy applies auto-labeling to expressions like these, which
-are up until now always a so-called "anonymous" expression::
-
- >>> print(select([cast(foo.c.data, String)]))
- SELECT CAST(foo.data AS VARCHAR) AS anon_1 # old behavior
- FROM foo
-
-These anonymous expressions were necessary as SQLAlchemy's
-:class:`_engine.ResultProxy` made heavy use of result column names in order to match
-up datatypes, such as the :class:`.String` datatype which used to have
-result-row-processing behavior, to the correct column, so most importantly the
-names had to be both easy to determine in a database-agnostic manner as well as
-unique in all cases. In SQLAlchemy 1.0 as part of :ticket:`918`, this
-reliance on named columns in result rows (specifically the
-``cursor.description`` element of the PEP-249 cursor) was scaled back to not be
-necessary for most Core SELECT constructs; in release 1.4, the system overall
-is becoming more comfortable with SELECT statements that have duplicate column
-or label names such as in :ref:`change_4753`. So we now emulate PostgreSQL's
-reasonable behavior for simple modifications to a single column, most
-prominently with CAST::
-
- >>> print(select([cast(foo.c.data, String)]))
- SELECT CAST(foo.data AS VARCHAR) AS data
- FROM foo
-
-For CAST against expressions that don't have a name, the previous logic is used
-to generate the usual "anonymous" labels::
-
- >>> print(select([cast('hi there,' + foo.c.data, String)]))
- SELECT CAST(:data_1 + foo.data AS VARCHAR) AS anon_1
- FROM foo
-
-A :func:`.cast` against a :class:`.Label`, despite having to omit the label
-expression as these don't render inside of a CAST, will nonetheless make use of
-the given name::
-
- >>> print(select([cast(('hi there,' + foo.c.data).label('hello_data'), String)]))
- SELECT CAST(:data_1 + foo.data AS VARCHAR) AS hello_data
- FROM foo
-
-And of course as was always the case, :class:`.Label` can be applied to the
-expression on the outside to apply an "AS <name>" label directly::
-
- >>> print(select([cast(('hi there,' + foo.c.data), String).label('hello_data')]))
- SELECT CAST(:data_1 + foo.data AS VARCHAR) AS hello_data
- FROM foo
-
-
-:ticket:`4449`
-
-.. _change_4808:
-
-New "post compile" bound parameters used for LIMIT/OFFSET in Oracle, SQL Server
--------------------------------------------------------------------------------
-
-A major goal of the 1.4 series is to establish that all Core SQL constructs
-are completely cacheable, meaning that a particular :class:`.Compiled`
-structure will produce an identical SQL string regardless of any SQL parameters
-used with it, which notably includes those used to specify the LIMIT and
-OFFSET values, typically used for pagination and "top N" style results.
-
-While SQLAlchemy has used bound parameters for LIMIT/OFFSET schemes for many
-years, a few outliers remained where such parameters were not allowed, including
-a SQL Server "TOP N" statement, such as::
-
- SELECT TOP 5 mytable.id, mytable.data FROM mytable
-
-as well as with Oracle, where the FIRST_ROWS() hint (which SQLAlchemy will
-use if the ``optimize_limits=True`` parameter is passed to
-:func:`_sa.create_engine` with an Oracle URL) does not allow them;
-additionally, using bound parameters with ROWNUM comparisons has been
-reported to produce slower query plans::
-
- SELECT anon_1.id, anon_1.data FROM (
- SELECT /*+ FIRST_ROWS(5) */
- anon_2.id AS id,
- anon_2.data AS data,
- ROWNUM AS ora_rn FROM (
- SELECT mytable.id, mytable.data FROM mytable
- ) anon_2
- WHERE ROWNUM <= :param_1
- ) anon_1 WHERE ora_rn > :param_2
-
-In order to allow for all statements to be unconditionally cacheable at the
-compilation level, a new form of bound parameter called a "post compile"
-parameter has been added, which makes use of the same mechanism as that
-of "expanding IN parameters". This is a :func:`.bindparam` that behaves
-identically to any other bound parameter, except that the parameter's value
-will be rendered literally into the SQL string before it is sent to the DBAPI
-``cursor.execute()`` method. The new parameter is used internally by the
-SQL Server and Oracle dialects, so that the drivers receive the literal
-rendered value but the rest of SQLAlchemy can still consider this as a
-bound parameter. The above two statements when stringified using
-``str(statement.compile(dialect=<dialect>))`` now look like::
-
- SELECT TOP [POSTCOMPILE_param_1] mytable.id, mytable.data FROM mytable
-
-and::
-
- SELECT anon_1.id, anon_1.data FROM (
- SELECT /*+ FIRST_ROWS([POSTCOMPILE__ora_frow_1]) */
- anon_2.id AS id,
- anon_2.data AS data,
- ROWNUM AS ora_rn FROM (
- SELECT mytable.id, mytable.data FROM mytable
- ) anon_2
- WHERE ROWNUM <= [POSTCOMPILE_param_1]
- ) anon_1 WHERE ora_rn > [POSTCOMPILE_param_2]
-
-The ``[POSTCOMPILE_<param>]`` format is also what is seen when an
-"expanding IN" is used.
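The post-compile markers can be observed directly by stringifying statements;
a minimal sketch, assuming SQLAlchemy 1.4 or later (the exact marker spelling,
e.g. ``[POSTCOMPILE_param_1]`` versus ``__[POSTCOMPILE_param_1]``, varies
slightly between releases):

```python
from sqlalchemy import Column, Integer, MetaData, Table, select
from sqlalchemy.dialects import mssql

metadata = MetaData()
mytable = Table(
    "mytable", metadata, Column("id", Integer), Column("data", Integer)
)

# LIMIT renders on SQL Server as TOP with a post-compile parameter
top_stmt = select(mytable.c.id, mytable.c.data).limit(5)
print(top_stmt.compile(dialect=mssql.dialect()))

# "expanding IN" uses the same post-compile mechanism on any dialect
in_stmt = select(mytable.c.id).where(mytable.c.id.in_([1, 2, 3]))
print(in_stmt)
```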
-
-When viewing the SQL logging output, the final form of the statement will
-be seen::
-
- SELECT anon_1.id, anon_1.data FROM (
- SELECT /*+ FIRST_ROWS(5) */
- anon_2.id AS id,
- anon_2.data AS data,
- ROWNUM AS ora_rn FROM (
- SELECT mytable.id AS id, mytable.data AS data FROM mytable
- ) anon_2
- WHERE ROWNUM <= 8
- ) anon_1 WHERE ora_rn > 3
-
-
-The "post compile parameter" feature is exposed as public API through the
-:paramref:`.bindparam.literal_execute` parameter; however, it is currently not
-intended for general use. The literal values are rendered using the
-:meth:`.TypeEngine.literal_processor` of the underlying datatype, which in
-SQLAlchemy has **extremely limited** scope, supporting only integers and simple
-string values.
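For completeness, the public hook can be sketched as follows; a minimal
illustration only, given the caveats above, assuming SQLAlchemy 1.4 or later,
and using the documented ``render_postcompile`` compile option to show the
literal rendering:

```python
from sqlalchemy import bindparam, column, select, table

t = table("t", column("id"))

# a bound parameter marked literal_execute=True renders as a
# post-compile marker in the compiled string form
stmt = select(t.c.id).where(
    t.c.id == bindparam("val", value=5, literal_execute=True)
)
print(stmt)

# render_postcompile substitutes the literal value into the string,
# as will happen at execution time before cursor.execute()
print(stmt.compile(compile_kwargs={"render_postcompile": True}))
```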
-
-:ticket:`4808`
-
-.. _change_4712:
-
-Connection-level transactions can now be inactive based on subtransaction
--------------------------------------------------------------------------
-
-A :class:`_engine.Connection` now includes the behavior where a :class:`.Transaction`
-can be made inactive due to a rollback on an inner transaction; however, the
-:class:`.Transaction` will not clear until it is itself rolled back.
-
-This is essentially a new error condition which disallows statement
-executions from proceeding on a :class:`_engine.Connection` if an inner "sub" transaction
-has been rolled back. The behavior works very similarly to that of the
-ORM :class:`.Session`, where if an outer transaction has been begun, it needs
-to be rolled back to clear the invalid transaction; this behavior is described
-in :ref:`faq_session_rollback`.
-
-While the :class:`_engine.Connection` has had a less strict behavioral pattern than
-the :class:`.Session`, this change was made because it helps to identify when
-a subtransaction has rolled back the DBAPI transaction while the external
-code, unaware of this, attempts to continue proceeding, which in fact
-runs operations on a new transaction. The "test harness" pattern described
-at :ref:`session_external_transaction` is a common place for this to occur.
-
-The new behavior is described in the errors page at :ref:`error_8s2a`.
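The corresponding :class:`.Session` behavior referenced above can be sketched
with an in-memory SQLite database; a minimal illustration, assuming SQLAlchemy
1.4 or later, where a failed flush leaves the transaction inactive until
:meth:`.Session.rollback` is called:

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.exc import PendingRollbackError
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class A(Base):
    __tablename__ = "a"
    id = Column(Integer, primary_key=True)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = Session(engine)
session.add_all([A(id=1), A(id=1)])  # duplicate primary keys
try:
    session.flush()  # the second INSERT fails
except Exception:
    pass

# the transaction is now inactive; statements are disallowed until rollback
inactive = False
try:
    session.query(A).all()
except PendingRollbackError:
    inactive = True

session.rollback()  # clears the invalid transaction
print(inactive, session.query(A).count())  # the session is usable again
```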
-
-
Dialect Changes
===============