Perf/conditional unique selectinload
Optimized :func:`_orm.selectinload` to skip the ``.unique()`` call on inner
result sets when no nested :func:`_orm.joinedload` on a collection is
present. The uniquing pass is only required when a joined eager load
inflates rows due to a one-to-many or many-to-many JOIN; in the common case
of a leaf selectin load, rows are already unique by construction and the
per-row hashing overhead can be avoided. As a side effect, ``yield_per``
set in a ``do_orm_execute`` event for a :func:`_orm.selectinload`
relationship load no longer raises ``InvalidRequestError`` when no nested
collection joinedload is in effect, since ``.unique()`` is no longer called
in that path. Pull request courtesy Oliver Parker.
`_SelectInLoader._load_via_parent` and `_load_via_child` currently call
`.unique()` unconditionally on the inner `Result`. The uniqueness pass is only
required when a nested `joinedload` on a collection is in effect — in that case
the `JOIN` inflates rows (one row per `(child, grandchild)` instead of one per
child) and the outer `groupby` would produce duplicated entries without dedup.
`loading.instances()` already signals exactly this condition: it sets
`Result._unique_filter_state` to a `require_unique` guard when the inner
compile state has `multi_row_eager_loaders=True`. When that flag is unset (no
nested collection `joinedload`), `_unique_filter_state` stays `None`, meaning
the inner query produces unique rows by construction:
- `omit_join` 1:N — one row per child
- `omit_join` M2O — one row per parent
- `omit_join` M2M — one row per (parent, entity)
- `load_with_join` (non-omit) — one row per (parent, child)
This PR makes the `.unique()` call conditional, via a new `_has_unique_filter`
property on `Result` that exposes this state without reaching into the private
`_unique_filter_state` attribute directly:
```python
if result._has_unique_filter:
result = result.unique()
```
Per-row hashing in `_iterator_getter` is avoided in the common leaf-load case.
On an in-memory SQLite bench (2000 parents × 5 children + M:N tags, Python
3.14, n=1000 iterations):
| Workload | before avg | after avg | delta | before sd | after sd |
|---|---|---|---|---|---|
| selectin children (1:N) | 56.0 ms | 41.4 ms | **−26%** | 5.2 ms | 1.9 ms |
| selectin tags M2M | 25.6 ms | 20.0 ms | **−22%** | 4.2 ms | 3.1 ms |
| joined children (1:N) | 42.9 ms | 37.6 ms | ~0% | — | — |
| plain parent load | 4.2 ms | 3.6 ms | ~0% | — | — |
**Behaviour change:** `yield_per` set in a `do_orm_execute` event for a
relationship load no longer raises
`InvalidRequestError("Can't use yield_per in conjunction with unique")` for
`selectinload` without a nested collection `joinedload` — because `.unique()`
is no longer called in that path. `immediateload` is unaffected (it still calls
`.unique()` unconditionally).
Fixes: #13339
Closes: #13341
Pull-request: https://github.com/sqlalchemy/sqlalchemy/pull/13341
Pull-request-sha:
980af0383b174271bb602a890f595c26f78ddcee
Change-Id: I12e993f104d354ae894f81f9b09fbcf9a95b0027