Added documentation about interaction between `subqueryload` and LIMIT/OFFSET.

author Jack Zhou <univerio@gmail.com>

Tue, 29 Jul 2014 18:49:52 +0000 (11:49 -0700)

committer Jack Zhou <univerio@gmail.com>

Thu, 7 Aug 2014 19:08:43 +0000 (12:08 -0700)
diff --git a/doc/build/faq.rst b/doc/build/faq.rst

index 0c8314cb5de6d2edbde8176b6398cf39cd32bdc1..d642d1de887647138a1852f22d2e6a2f849eeb4b 100644 (file)
--- a/doc/build/faq.rst
+++ b/doc/build/faq.rst
@@ -622,6 +622,90 @@ The same idea applies to all the other arguments, such as ``foreign_keys``::
  
          foo = relationship(Dest, foreign_keys=[foo_id, bar_id])
  
+.. _faq_subqueryload_sort:
+
+Why must I always ``ORDER BY`` a unique column when using ``subqueryload``?
+----------------------------------------------------------------------------
+
+The SQL standard allows an RDBMS to return rows in any order it deems
+appropriate if no ``ORDER BY`` clause is specified. This also applies when the
+``ORDER BY`` clause is not unique across all rows: rows with the same value in
+the ``ORDER BY`` column(s) will not necessarily be returned in a deterministic
+order.
+
+SQLAlchemy implements :func:`.orm.subqueryload` by issuing a separate query
+(where the table specified in the relationship is joined to the original query)
+and then attempting to match up the results in Python. This works fine
+normally:
+
+.. sourcecode:: python+sql
+
+    >>> session.query(User).options(subqueryload(User.addresses)).all()
+    {opensql}# the "main" query
+    SELECT users.id AS users_id
+    FROM users
+    {stop}
+    {opensql}# the "load" query issued by subqueryload
+    SELECT addresses.id AS addresses_id, addresses.user_id AS addresses_user_id, anon_1.users_id AS anon_1_users_id
+    FROM (SELECT users.id AS users_id
+    FROM users) AS anon_1 JOIN addresses ON anon_1.users_id = addresses.user_id ORDER BY anon_1.users_id
+
+Notice how the main query is a subquery in the load query. When an
+``OFFSET``/``LIMIT`` is involved, however, things get a bit tricky:
+
+.. sourcecode:: python+sql
+
+    >>> user = session.query(User).options(subqueryload(User.addresses)).first()
+    {opensql}# the "main" query
+    SELECT users.id AS users_id
+    FROM users
+     LIMIT 1
+    {stop}
+    {opensql}# the "load" query issued by subqueryload
+    SELECT addresses.id AS addresses_id, addresses.user_id AS addresses_user_id, anon_1.users_id AS anon_1_users_id
+    FROM (SELECT users.id AS users_id
+    FROM users
+     LIMIT 1) AS anon_1 JOIN addresses ON anon_1.users_id = addresses.user_id ORDER BY anon_1.users_id
+
+The main query is still a subquery in the load query, but *it may return a
+different set of results in the second query from the first* because it does
+not have a deterministic sort order! Depending on database internals, there is
+a chance we may get the following resultset for the two queries::
+
+    +--------+
+    |users_id|
+    +--------+
+    |       1|
+    +--------+
+
+    +------------+-----------------+---------------+
+    |addresses_id|addresses_user_id|anon_1_users_id|
+    +------------+-----------------+---------------+
+    |           3|                2|              2|
+    +------------+-----------------+---------------+
+    |           4|                2|              2|
+    +------------+-----------------+---------------+
+
+From SQLAlchemy's point of view, it didn't get any addresses back for user 1,
+so ``user.addresses`` is empty. Oops.
+
+The solution to this problem is to always specify a deterministic sort order,
+so that the main query always returns the same set of rows. This generally
+means that you should :meth:`.Query.order_by` on a unique column on the table,
+usually the primary key::
+
+    session.query(User).options(subqueryload(User.addresses)).order_by(User.id).first()
+
+It is simplest to always ``ORDER BY`` the primary key. You can get away without
+a sort only if the ``OFFSET``/``LIMIT`` does not throw away any rows at all, as
+in this query, which matches at most one row::
+
+    session.query(User).options(subqueryload(User.addresses)).filter(User.id == 1).first()
+
+Note that :func:`.joinedload` does not suffer from the same problem because
+only one query is ever issued, so the load query cannot be different from the
+main query.
+
  Performance
  ===========
  
diff --git a/doc/build/orm/loading.rst b/doc/build/orm/loading.rst

index 6c2fac004c4c61573dcebb1bee03065e416a24d5..27846b9b2885f390c32b3d0715fe3f33a2dbbb72 100644 (file)
--- a/doc/build/orm/loading.rst
+++ b/doc/build/orm/loading.rst
@@ -120,6 +120,21 @@ query options:
      # set children to load eagerly with a second statement
      session.query(Parent).options(subqueryload('children')).all()
  
+.. _subquery_loading_tips:
+
+Subquery Loading Tips
+^^^^^^^^^^^^^^^^^^^^^
+
+If you have ``LIMIT`` or ``OFFSET`` in your query, you **must** ``ORDER BY`` a
+unique column, generally the primary key of your table, in order to ensure
+correct results (see :ref:`faq_subqueryload_sort`)::
+
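+    # incorrect if User.name is not unique: tied rows come back in arbitrary order
+    session.query(User).options(subqueryload(User.addresses)).order_by(User.name).first()
+
+    # correct: the unique User.id breaks ties deterministically
+    session.query(User).options(subqueryload(User.addresses)).order_by(User.name, User.id).first()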
+
  Loading Along Paths
  -------------------
  
diff --git a/doc/build/orm/tutorial.rst b/doc/build/orm/tutorial.rst

index f90dc48d2e30d15a4428bd919ea6991552f078dc..e75eda1ee4f7e42d9b96805e2780cdae05ca5ea6 100644 (file)
--- a/doc/build/orm/tutorial.rst
+++ b/doc/build/orm/tutorial.rst
@@ -1703,6 +1703,12 @@ very easy to use:
      >>> jack.addresses
      [<Address(email_address='jack@google.com')>, <Address(email_address='j25@yahoo.com')>]
  
+.. warning::
+
+   If you use :func:`.subqueryload` together with ``LIMIT`` or ``OFFSET``, you
+   should :meth:`.Query.order_by` a unique column in order to ensure correct
+   results. See :ref:`subquery_loading_tips`.
+
  Joined Load
  -------------
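
The two-query pattern documented in this patch can be sketched without SQLAlchemy. The following is a minimal illustration using only the standard-library ``sqlite3`` module. The table contents, the hand-written SQL, and the ``collect_addresses`` helper are assumptions made for this sketch; they imitate the matching step described in the FAQ entry and are not SQLAlchemy's actual implementation.

```python
import sqlite3

# Hypothetical schema and data, loosely modeled on the FAQ example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE addresses (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id)
    );
    INSERT INTO users (id, name) VALUES (1, 'jack'), (2, 'jack');
    INSERT INTO addresses (id, user_id) VALUES (1, 1), (2, 1), (3, 2), (4, 2);
""")

# The "main" query, sorted on the primary key so LIMIT picks a stable row.
MAIN = "SELECT users.id AS users_id FROM users ORDER BY users.id LIMIT 1"

# The "load" query: the main query reappears as a subquery joined to addresses.
LOAD = f"""
    SELECT addresses.id, anon_1.users_id
    FROM ({MAIN}) AS anon_1
    JOIN addresses ON anon_1.users_id = addresses.user_id
    ORDER BY anon_1.users_id, addresses.id
"""


def collect_addresses(user_id, load_rows):
    """Group load-query rows by user id and return those for user_id."""
    by_user = {}
    for address_id, loaded_user_id in load_rows:
        by_user.setdefault(loaded_user_id, []).append(address_id)
    # If the load query selected a different user, nothing matches.
    return by_user.get(user_id, [])


(first_user_id,) = conn.execute(MAIN).fetchone()
result = (first_user_id, collect_addresses(first_user_id, conn.execute(LOAD)))
print(result)  # (1, [1, 2])

# The failure mode from the FAQ: the main query returned user 1, but the
# load query's subquery happened to select user 2.
print(collect_addresses(1, [(3, 2), (4, 2)]))  # []
```

Because both statements sort on ``users.id`` before applying ``LIMIT 1``, they select the same user and the addresses match up. The last call replays the mismatched result set shown in the FAQ entry, where the collection for user 1 comes back empty.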