From: Mike Bayer <mike_mp@zzzcomputing.com>
Date: Tue, 2 Aug 2022 18:51:49 +0000 (-0400)
Subject: reword yield_per a bit more
X-Git-Tag: rel_2_0_0b1~139
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=3ef9fa6d4ff8ade8915000b41c262caf4a88e064;p=thirdparty%2Fsqlalchemy%2Fsqlalchemy.git

reword yield_per a bit more

I'm still not satisfied with this section as it is still
too wordy and dense, but at least let's put a better description
of what yield_per actually is and why one might use it at the top.

Change-Id: I10f4d862d9c499044f5718fca0d27ac106289717
---
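Note: the reworded section below describes batching rows into
sub-collections instead of building one large buffer. A minimal sketch of
that idea follows, using only the standard library's sqlite3 module and
DBAPI fetchmany(). It is an analogy, not SQLAlchemy's implementation; the
partitions() helper and the user_account table are hypothetical names
chosen for illustration.

```python
import sqlite3


def partitions(cursor, size):
    """Yield lists of at most `size` rows until the cursor is exhausted."""
    while True:
        batch = cursor.fetchmany(size)
        if not batch:
            return
        yield batch


# Hypothetical table, populated with 25 rows for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_account (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO user_account (name) VALUES (?)",
    [(f"user{i}",) for i in range(25)],
)

cursor = conn.execute("SELECT id, name FROM user_account ORDER BY id")

# Only one batch of rows is held in a Python list at any moment.
batch_sizes = [len(batch) for batch in partitions(cursor, 10)]
print(batch_sizes)  # [10, 10, 5]
conn.close()
```

Calling cursor.fetchall() here instead would build all 25 rows in one list,
which is the up-front buffering that the section contrasts with yield_per.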

diff --git a/doc/build/orm/queryguide.rst b/doc/build/orm/queryguide.rst
index 8de7ed2a3c..c38b662c95 100644
--- a/doc/build/orm/queryguide.rst
+++ b/doc/build/orm/queryguide.rst
@@ -1013,12 +1013,43 @@ The ``autoflush`` execution option is equvialent to the
 
 .. _orm_queryguide_yield_per:
 
-Yield Per
-^^^^^^^^^
+Fetching Large Result Sets with Yield Per
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The ``yield_per`` execution option is an integer value which will cause the
-:class:`_engine.Result` to yield only a fixed count of rows at a time.
-When used as an execution option, ``yield_per`` is equivalent to making use
+:class:`_engine.Result` to buffer only a limited number of rows and/or ORM
+objects at a time, before making data available to the client.
+
+Normally, the ORM will construct ORM objects for **all** rows up front,
+assembling them into a single buffer, before passing this buffer to
+the :class:`_engine.Result` object as a source of rows to be returned.
+The rationale for this approach is to allow correct behavior
+for features such as joined eager loading, uniquifying of results, and the
+general case of result handling logic that relies upon the identity map
+maintaining a consistent state for every object in a result set as it is
+fetched.
+
+The purpose of the ``yield_per`` option is to change this behavior so that the
+ORM result set is optimized for iteration through very large result sets (> 10K
+rows), where the user has determined that the above patterns don't apply. When
+``yield_per`` is used, the ORM will instead batch ORM results into
+sub-collections and yield rows from each sub-collection individually as the
+:class:`_engine.Result` object is iterated, so that the Python interpreter
+doesn't need to allocate very large areas of memory, which is both time consuming
+and leads to excessive memory use. The option affects both the way the database
+cursor is used and the way the ORM constructs rows and objects to be
+passed to the :class:`_engine.Result`.
+
+.. tip::
+
+    From the above, it follows that the :class:`_engine.Result` must be
+    consumed incrementally, that is, using iteration such as
+    ``for row in result`` or using partial row methods such as
+    :meth:`_engine.Result.fetchmany` or :meth:`_engine.Result.partitions`.
+    Calling :meth:`_engine.Result.all` will defeat the purpose of using
+    ``yield_per``.
+
+Using ``yield_per`` is equivalent to making use
 of both the :paramref:`_engine.Connection.execution_options.stream_results`
 execution option, which selects for server side cursors to be used
 by the backend if supported, and the :meth:`_engine.Result.yield_per` method
@@ -1081,12 +1112,6 @@ partitions. The size of each partition defaults to the integer value passed to
     (User(id=1, name='spongebob', fullname='Spongebob Squarepants'),)
     ...
 
-The purpose of "yield per" is when fetching very large result sets
-(> 10K rows), to batch results in sub-collections and yield them
-out partially, so that the Python interpreter doesn't need to declare
-very large areas of memory which is both time consuming and leads
-to excessive memory use.
-
 When ``yield_per`` is used, the
 :paramref:`_engine.Connection.execution_options.stream_results` option is also
 set for the Core execution, so that a streaming / server side cursor will be