- edits to the subqueryload ordering merge

author Mike Bayer <mike_mp@zzzcomputing.com>

Wed, 5 Nov 2014 20:11:13 +0000 (15:11 -0500)

committer Mike Bayer <mike_mp@zzzcomputing.com>

Wed, 5 Nov 2014 20:11:59 +0000 (15:11 -0500)
author Mike Bayer <mike_mp@zzzcomputing.com>
Wed, 5 Nov 2014 20:11:13 +0000 (15:11 -0500)
committer Mike Bayer <mike_mp@zzzcomputing.com>
Wed, 5 Nov 2014 20:11:59 +0000 (15:11 -0500)
diff --git a/doc/build/faq.rst b/doc/build/faq.rst

index fa10ba44b176f762357f7d8bf8b31120a28f16ee..12d8e0acc6605408c952c6ce29491a63fb03b548 100644 (file)
--- a/doc/build/faq.rst
+++ b/doc/build/faq.rst
@@ -603,62 +603,78 @@ The same idea applies to all the other arguments, such as ``foreign_keys``::
  
          foo = relationship(Dest, foreign_keys=[foo_id, bar_id])
  
-.. _faq_subqueryload_sort:
-
-Why must I always ``ORDER BY`` a unique column when using ``subqueryload``?
-----------------------------------------------------------------------------
-
-The SQL standard prescribes that RDBMSs are free to return rows in any order it
-deems appropriate, if no ``ORDER BY`` clause is specified. This even extends to
-the case where the ``ORDER BY`` clause is not unique across all rows, i.e. rows
-with the same value in the ``ORDER BY`` column(s) will not necessarily be
-returned in a deterministic order.
-
-SQLAlchemy implements :func:`.orm.subqueryload` by issuing a separate query
-(where the table specified in the relationship is joined to the original query)
-and then attempting to match up the results in Python. This works fine
-normally:
+.. _faq_subqueryload_limit_sort:
+
+Why is ``ORDER BY`` required with ``LIMIT`` (especially with ``subqueryload()``)?
+---------------------------------------------------------------------------------
+
+A relational database can return rows in any
+arbitrary order, when an explicit ordering is not set.
+While this ordering very often corresponds to the natural
+order of rows within a table, this is not the case for all databases and
+all queries.   The consequence of this is that any query that limits rows
+using ``LIMIT`` or ``OFFSET`` should **always** specify an ``ORDER BY``.
+Otherwise, it is not deterministic which rows will actually be returned.
+
+When we use a SQLAlchemy method like :meth:`.Query.first`, we are in fact
+applying a ``LIMIT`` of one to the query, so without an explicit ordering
+it is not deterministic what row we actually get back.
+While we may not notice this for simple queries on databases that usually
+returns rows in their natural
+order, it becomes much more of an issue if we also use :func:`.orm.subqueryload`
+to load related collections, and we may not be loading the collections
+as intended.
+
+SQLAlchemy implements :func:`.orm.subqueryload` by issuing a separate query,
+the results of which are matched up to the results from the first query.
+We see two queries emitted like this:
  
  .. sourcecode:: python+sql
  
      >>> session.query(User).options(subqueryload(User.addresses)).all()
-    {opensql}# the "main" query
+    {opensql}-- the "main" query
      SELECT users.id AS users_id
      FROM users
      {stop}
-    {opensql}# the "load" query issued by subqueryload
-    SELECT addresses.id AS addresses_id, addresses.user_id AS addresses_user_id, anon_1.users_id AS anon_1_users_id
-    FROM (SELECT users.id AS users_id
-    FROM users) AS anon_1 JOIN addresses ON anon_1.users_id = addresses.user_id ORDER BY anon_1.users_id
-
-Notice how the main query is a subquery in the load query. When an
-``OFFSET``/``LIMIT`` is involved, however, things get a bit tricky:
+    {opensql}-- the "load" query issued by subqueryload
+    SELECT addresses.id AS addresses_id,
+           addresses.user_id AS addresses_user_id,
+           anon_1.users_id AS anon_1_users_id
+    FROM (SELECT users.id AS users_id FROM users) AS anon_1
+    JOIN addresses ON anon_1.users_id = addresses.user_id
+    ORDER BY anon_1.users_id
+
+The second query embeds the first query as a source of rows.
+When the inner query uses ``OFFSET`` and/or ``LIMIT`` without ordering,
+the two queries may not see the same results:
  
  .. sourcecode:: python+sql
  
      >>> user = session.query(User).options(subqueryload(User.addresses)).first()
-    {opensql}# the "main" query
+    {opensql}-- the "main" query
      SELECT users.id AS users_id
      FROM users
       LIMIT 1
      {stop}
-    {opensql}# the "load" query issued by subqueryload
-    SELECT addresses.id AS addresses_id, addresses.user_id AS addresses_user_id, anon_1.users_id AS anon_1_users_id
-    FROM (SELECT users.id AS users_id
-    FROM users
-     LIMIT 1) AS anon_1 JOIN addresses ON anon_1.users_id = addresses.user_id ORDER BY anon_1.users_id
-
-The main query is still a subquery in the load query, but *it may return a
-different set of results in the second query from the first* because it does
-not have a deterministic sort order! Depending on database internals, there is
-a chance we may get the following resultset for the two queries::
-
+    {opensql}-- the "load" query issued by subqueryload
+    SELECT addresses.id AS addresses_id,
+           addresses.user_id AS addresses_user_id,
+           anon_1.users_id AS anon_1_users_id
+    FROM (SELECT users.id AS users_id FROM users LIMIT 1) AS anon_1
+    JOIN addresses ON anon_1.users_id = addresses.user_id
+    ORDER BY anon_1.users_id
+
+Depending on database specifics, there is
+a chance we may get the a result like the following for the two queries::
+
+    -- query #1
      +--------+
      |users_id|
      +--------+
      |       1|
      +--------+
  
+    -- query #2
      +------------+-----------------+---------------+
      |addresses_id|addresses_user_id|anon_1_users_id|
      +------------+-----------------+---------------+
@@ -667,26 +683,28 @@ a chance we may get the following resultset for the two queries::
      |           4|                2|              2|
      +------------+-----------------+---------------+
  
-From SQLAlchemy's point of view, it didn't get any addresses back for user 1,
-so ``user.addresses`` is empty. Oops.
+Above, we receive two ``addresses`` rows for ``user.id`` of 2, and none for
+1.  We've wasted two rows and failed to actually load the collection.  This
+is an insidious error because without looking at the SQL and the results, the
+ORM will not show that there's any issue; if we access the ``addresses``
+for the ``User`` we have, it will emit a lazy load for the collection and we
+won't see that anything actually went wrong.
  
  The solution to this problem is to always specify a deterministic sort order,
  so that the main query always returns the same set of rows. This generally
-means that you should :meth:`.Query.order_by` on a unique column on the table,
-usually the primary key::
+means that you should :meth:`.Query.order_by` on a unique column on the table.
+The primary key is a good choice for this::
  
      session.query(User).options(subqueryload(User.addresses)).order_by(User.id).first()
  
-You can get away with not doing a sort if the ``OFFSET``/``LIMIT`` does not
-throw away any rows at all, but it's much simpler to remember to always ``ORDER
-BY`` the primary key::
-
-    session.query(User).options(subqueryload(User.addresses)).filter(User.id == 1).first()
-
  Note that :func:`.joinedload` does not suffer from the same problem because
  only one query is ever issued, so the load query cannot be different from the
  main query.
  
+.. seealso::
+
+    :ref:`subqueryload_ordering`
+
  Performance
  ===========
  
diff --git a/doc/build/orm/loading.rst b/doc/build/orm/loading.rst

index 27846b9b2885f390c32b3d0715fe3f33a2dbbb72..b2d8124e2c230d049c202992ee5edd23be70d76b 100644 (file)
--- a/doc/build/orm/loading.rst
+++ b/doc/build/orm/loading.rst
@@ -120,21 +120,32 @@ query options:
      # set children to load eagerly with a second statement
      session.query(Parent).options(subqueryload('children')).all()
  
-.. _subquery_loading_tips:
+.. _subqueryload_ordering:
  
-Subquery Loading Tips
-^^^^^^^^^^^^^^^^^^^^^
+The Importance of Ordering
+--------------------------
+
+A query which makes use of :func:`.subqueryload` in conjunction with a
+limiting modifier such as :meth:`.Query.first`, :meth:`.Query.limit`,
+or :meth:`.Query.offset` should **always** include :meth:`.Query.order_by`
+against unique column(s) such as the primary key, so that the additional queries
+emitted by :func:`.subqueryload` include
+the same ordering as used by the parent query.  Without it, there is a chance
+that the inner query could return the wrong rows::
  
-If you have ``LIMIT`` or ``OFFSET`` in your query, you **must** ``ORDER BY`` a
-unique column, generally the primary key of your table, in order to ensure
-correct results (see :ref:`faq_subqueryload_sort`)::
+    # incorrect, no ORDER BY
+    session.query(User).options(subqueryload(User.addresses)).first()
  
-    # incorrect
+    # incorrect if User.name is not unique
      session.query(User).options(subqueryload(User.addresses)).order_by(User.name).first()
  
      # correct
      session.query(User).options(subqueryload(User.addresses)).order_by(User.name, User.id).first()
  
+.. seealso::
+
+    :ref:`faq_subqueryload_limit_sort` - detailed example
+
  Loading Along Paths
  -------------------
  
diff --git a/doc/build/orm/tutorial.rst b/doc/build/orm/tutorial.rst

index d7020b37d39ac9ab1d1c3572427e74954db75422..46c252daf851d282f96028ff8cda6d53be99a032 100644 (file)
--- a/doc/build/orm/tutorial.rst
+++ b/doc/build/orm/tutorial.rst
@@ -1626,11 +1626,12 @@ very easy to use:
      >>> jack.addresses
      [<Address(email_address='jack@google.com')>, <Address(email_address='j25@yahoo.com')>]
  
-.. warning::
+.. note::
  
-   If you use :func:`.subqueryload`, you should generally
-   :meth:`.Query.order_by` on a unique column in order to ensure correct
-   results. See :ref:`subquery_loading_tips`.
+   :func:`.subqueryload` when used in conjunction with limiting such as
+   :meth:`.Query.first`, :meth:`.Query.limit` or :meth:`.Query.offset`
+   should also include :meth:`.Query.order_by` on a unique column in order to
+   ensure correct results.  See :ref:`subqueryload_ordering`.
  
  Joined Load
  -------------
author	Mike Bayer <mike_mp@zzzcomputing.com>
	Wed, 5 Nov 2014 20:11:13 +0000 (15:11 -0500)
committer	Mike Bayer <mike_mp@zzzcomputing.com>
	Wed, 5 Nov 2014 20:11:59 +0000 (15:11 -0500)
doc/build/faq.rst		patch \| blob \| blame \| history
doc/build/orm/loading.rst		patch \| blob \| blame \| history
doc/build/orm/tutorial.rst		patch \| blob \| blame \| history