foo = relationship(Dest, foreign_keys=[foo_id, bar_id])
-.. _faq_subqueryload_sort:
-
-Why must I always ``ORDER BY`` a unique column when using ``subqueryload``?
-----------------------------------------------------------------------------
-
-The SQL standard prescribes that RDBMSs are free to return rows in any order it
-deems appropriate, if no ``ORDER BY`` clause is specified. This even extends to
-the case where the ``ORDER BY`` clause is not unique across all rows, i.e. rows
-with the same value in the ``ORDER BY`` column(s) will not necessarily be
-returned in a deterministic order.
-
-SQLAlchemy implements :func:`.orm.subqueryload` by issuing a separate query
-(where the table specified in the relationship is joined to the original query)
-and then attempting to match up the results in Python. This works fine
-normally:
+.. _faq_subqueryload_limit_sort:
+
+Why is ``ORDER BY`` required with ``LIMIT`` (especially with ``subqueryload()``)?
+---------------------------------------------------------------------------------
+
+A relational database can return rows in any
+arbitrary order, when an explicit ordering is not set.
+While this ordering very often corresponds to the natural
+order of rows within a table, this is not the case for all databases and
+all queries. The consequence of this is that any query that limits rows
+using ``LIMIT`` or ``OFFSET`` should **always** specify an ``ORDER BY``.
+Otherwise, it is not deterministic which rows will actually be returned.
+
+When we use a SQLAlchemy method like :meth:`.Query.first`, we are in fact
+applying a ``LIMIT`` of one to the query, so without an explicit ordering
+it is not deterministic what row we actually get back.
+While we may not notice this for simple queries on databases that usually
+returns rows in their natural
+order, it becomes much more of an issue if we also use :func:`.orm.subqueryload`
+to load related collections, and we may not be loading the collections
+as intended.
+
+SQLAlchemy implements :func:`.orm.subqueryload` by issuing a separate query,
+the results of which are matched up to the results from the first query.
+We see two queries emitted like this:
.. sourcecode:: python+sql
>>> session.query(User).options(subqueryload(User.addresses)).all()
- {opensql}# the "main" query
+ {opensql}-- the "main" query
SELECT users.id AS users_id
FROM users
{stop}
- {opensql}# the "load" query issued by subqueryload
- SELECT addresses.id AS addresses_id, addresses.user_id AS addresses_user_id, anon_1.users_id AS anon_1_users_id
- FROM (SELECT users.id AS users_id
- FROM users) AS anon_1 JOIN addresses ON anon_1.users_id = addresses.user_id ORDER BY anon_1.users_id
-
-Notice how the main query is a subquery in the load query. When an
-``OFFSET``/``LIMIT`` is involved, however, things get a bit tricky:
+ {opensql}-- the "load" query issued by subqueryload
+ SELECT addresses.id AS addresses_id,
+ addresses.user_id AS addresses_user_id,
+ anon_1.users_id AS anon_1_users_id
+ FROM (SELECT users.id AS users_id FROM users) AS anon_1
+ JOIN addresses ON anon_1.users_id = addresses.user_id
+ ORDER BY anon_1.users_id
+
+The second query embeds the first query as a source of rows.
+When the inner query uses ``OFFSET`` and/or ``LIMIT`` without ordering,
+the two queries may not see the same results:
.. sourcecode:: python+sql
>>> user = session.query(User).options(subqueryload(User.addresses)).first()
- {opensql}# the "main" query
+ {opensql}-- the "main" query
SELECT users.id AS users_id
FROM users
LIMIT 1
{stop}
- {opensql}# the "load" query issued by subqueryload
- SELECT addresses.id AS addresses_id, addresses.user_id AS addresses_user_id, anon_1.users_id AS anon_1_users_id
- FROM (SELECT users.id AS users_id
- FROM users
- LIMIT 1) AS anon_1 JOIN addresses ON anon_1.users_id = addresses.user_id ORDER BY anon_1.users_id
-
-The main query is still a subquery in the load query, but *it may return a
-different set of results in the second query from the first* because it does
-not have a deterministic sort order! Depending on database internals, there is
-a chance we may get the following resultset for the two queries::
-
+ {opensql}-- the "load" query issued by subqueryload
+ SELECT addresses.id AS addresses_id,
+ addresses.user_id AS addresses_user_id,
+ anon_1.users_id AS anon_1_users_id
+ FROM (SELECT users.id AS users_id FROM users LIMIT 1) AS anon_1
+ JOIN addresses ON anon_1.users_id = addresses.user_id
+ ORDER BY anon_1.users_id
+
+Depending on database specifics, there is
+a chance we may get the a result like the following for the two queries::
+
+ -- query #1
+--------+
|users_id|
+--------+
| 1|
+--------+
+ -- query #2
+------------+-----------------+---------------+
|addresses_id|addresses_user_id|anon_1_users_id|
+------------+-----------------+---------------+
| 4| 2| 2|
+------------+-----------------+---------------+
-From SQLAlchemy's point of view, it didn't get any addresses back for user 1,
-so ``user.addresses`` is empty. Oops.
+Above, we receive two ``addresses`` rows for ``user.id`` of 2, and none for
+1. We've wasted two rows and failed to actually load the collection. This
+is an insidious error because without looking at the SQL and the results, the
+ORM will not show that there's any issue; if we access the ``addresses``
+for the ``User`` we have, it will emit a lazy load for the collection and we
+won't see that anything actually went wrong.
The solution to this problem is to always specify a deterministic sort order,
so that the main query always returns the same set of rows. This generally
-means that you should :meth:`.Query.order_by` on a unique column on the table,
-usually the primary key::
+means that you should :meth:`.Query.order_by` on a unique column on the table.
+The primary key is a good choice for this::
session.query(User).options(subqueryload(User.addresses)).order_by(User.id).first()
-You can get away with not doing a sort if the ``OFFSET``/``LIMIT`` does not
-throw away any rows at all, but it's much simpler to remember to always ``ORDER
-BY`` the primary key::
-
- session.query(User).options(subqueryload(User.addresses)).filter(User.id == 1).first()
-
Note that :func:`.joinedload` does not suffer from the same problem because
only one query is ever issued, so the load query cannot be different from the
main query.
+.. seealso::
+
+ :ref:`subqueryload_ordering`
+
Performance
===========
# set children to load eagerly with a second statement
session.query(Parent).options(subqueryload('children')).all()
-.. _subquery_loading_tips:
+.. _subqueryload_ordering:
-Subquery Loading Tips
-^^^^^^^^^^^^^^^^^^^^^
+The Importance of Ordering
+--------------------------
+
+A query which makes use of :func:`.subqueryload` in conjunction with a
+limiting modifier such as :meth:`.Query.first`, :meth:`.Query.limit`,
+or :meth:`.Query.offset` should **always** include :meth:`.Query.order_by`
+against unique column(s) such as the primary key, so that the additional queries
+emitted by :func:`.subqueryload` include
+the same ordering as used by the parent query. Without it, there is a chance
+that the inner query could return the wrong rows::
-If you have ``LIMIT`` or ``OFFSET`` in your query, you **must** ``ORDER BY`` a
-unique column, generally the primary key of your table, in order to ensure
-correct results (see :ref:`faq_subqueryload_sort`)::
+ # incorrect, no ORDER BY
+ session.query(User).options(subqueryload(User.addresses)).first()
- # incorrect
+ # incorrect if User.name is not unique
session.query(User).options(subqueryload(User.addresses)).order_by(User.name).first()
# correct
session.query(User).options(subqueryload(User.addresses)).order_by(User.name, User.id).first()
+.. seealso::
+
+ :ref:`faq_subqueryload_limit_sort` - detailed example
+
Loading Along Paths
-------------------