From: Mike Bayer Date: Sat, 25 May 2019 22:04:58 +0000 (-0400) Subject: Hold implicitly created collections in a pending area X-Git-Tag: rel_1_4_0b1~859^2 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=e573752a986dec84216d948a1497b7d789d039ea;p=thirdparty%2Fsqlalchemy%2Fsqlalchemy.git Hold implicitly created collections in a pending area Accessing a collection-oriented attribute on a newly created object no longer mutates ``__dict__``, but still returns an empty collection as has always been the case. This allows collection-oriented attributes to work consistently in comparison to scalar attributes which return ``None``, but also don't mutate ``__dict__``. In order to accommodate for the collection being mutated, the same empty collection is returned each time once initially created, and when it is mutated (e.g. an item appended, added, etc.) it is then moved into ``__dict__``. This removes the last of mutating side-effects on read-only attribute access within the ORM. Fixes: #4519 Change-Id: I06a058d24e6eb24b5c6b6092d3f8b31cf9c244ae --- diff --git a/doc/build/changelog/migration_14.rst b/doc/build/changelog/migration_14.rst index 3f8bd3d0c4..66fe1c0948 100644 --- a/doc/build/changelog/migration_14.rst +++ b/doc/build/changelog/migration_14.rst @@ -11,6 +11,139 @@ What's New in SQLAlchemy 1.4? Behavioral Changes - ORM ======================== +.. _change_4519: + +Accessing an uninitialized collection attribute on a transient object no longer mutates __dict__ +------------------------------------------------------------------------------------------------- + +It has always been SQLAlchemy's behavior that accessing mapped attributes on a +newly created object returns an implicitly generated value, rather than raising +``AttributeError``, such as ``None`` for scalar attributes or ``[]`` for a +list-holding relationship:: + + >>> u1 = User() + >>> u1.name + None + >>> u1.addresses + [] + +The rationale for the above behavior was originally to make ORM objects easier +to work with. Since an ORM object represents an empty row when first created +without any state, it is intuitive that its un-accessed attributes would +resolve to ``None`` (or SQL NULL) for scalars and to empty collections for +relationships. In particular, it makes possible an extremely common pattern +of being able to mutate the new collection without manually creating and +assigning an empty collection first:: + + >>> u1 = User() + >>> u1.addresses.append(Address()) # no need to assign u1.addresses = [] + +Up until version 1.0 of SQLAlchemy, the behavior of this initialization system +for both scalar attributes as well as collections would be that the ``None`` or +empty collection would be *populated* into the object's state, e.g. +``__dict__``. This meant that the following two operations were equivalent:: + + >>> u1 = User() + >>> u1.name = None # explicit assignment + + >>> u2 = User() + >>> u2.name # implicit assignment just by accessing it + None + +Where above, both ``u1`` and ``u2`` would have the value ``None`` populated +in the value of the ``name`` attribute. Since this is a SQL NULL, the ORM +would skip including these values within an INSERT so that SQL-level defaults +take place, if any, else the value defaults to NULL on the database side. + +In version 1.0 as part of :ref:`migration_3061`, this behavior was refined so +that the ``None`` value was no longer populated into ``__dict__``, only +returned. Besides removing the mutating side effect of a getter operation, +this change also made it possible to set columns that did have server defaults +to the value NULL by actually assigning ``None``, which was now distinguished +from just reading it. + +The change however did not accommodate for collections, where returning an +empty collection that is not assigned meant that this mutable collection would +be different each time and also would not be able to correctly accommodate for +mutating operations (e.g. append, add, etc.) called upon it. While the +behavior continued to generally not get in anyone's way, an edge case was +eventually identified in :ticket:`4519` where this empty collection could be +harmful, which is when the object is merged into a session:: + + >>> u1 = User(id=1) # create an empty User to merge with id=1 in the database + >>> merged1 = session.merge(u1) # value of merged1.addresses is unchanged from that of the DB + + >>> u2 = User(id=2) # create an empty User to merge with id=2 in the database + >>> u2.addresses + [] + >>> merged2 = session.merge(u2) # value of merged2.addresses has been emptied in the DB + +Above, the ``.addresses`` collection on ``merged1`` will contain all the +``Address()`` objects that were already in the database. ``merged2`` will +not; because it has an empty list implicitly assigned, the ``.addresses`` +collection will be erased. This is an example of where this mutating side +effect can actually mutate the database itself. + +While it was considered that perhaps the attribute system should begin using +strict "plain Python" behavior, raising ``AttributeError`` in all cases for +non-existent attributes on non-persistent objects and requiring that all +collections be explicitly assigned, such a change would likely be too extreme +for the vast number of applications that have relied upon this behavior for +many years, leading to a complex rollout / backwards compatibility problem as +well as the likelihood that workarounds to restore the old behavior would +become prevalent, thus rendering the whole change ineffective in any case. + +The change then is to keep the default producing behavior, but to finally make +the non-mutating behavior of scalars a reality for collections as well, via the +addition of additional mechanics in the collection system. When accessing the +empty attribute, the new collection is created and associated with the state, +however is not added to ``__dict__`` until it is actually mutated:: + + >>> u1 = User() + >>> l1 = u1.addresses # new list is created, associated with the state + >>> assert u1.addresses is l1 # you get the same list each time you access it + >>> assert "addresses" not in u1.__dict__ # but it won't go into __dict__ until it's mutated + >>> from sqlalchemy import inspect + >>> inspect(u1).attrs.addresses.history + History(added=None, unchanged=None, deleted=None) + +When the list is changed, then it becomes part of the tracked changes to +be persisted to the database:: + + >>> l1.append(Address()) + >>> assert "addresses" in u1.__dict__ + >>> inspect(u1).attrs.addresses.history + History(added=[<__main__.Address object at 0x7f49b725eda0>], unchanged=[], deleted=[]) + +This change is expected to have *nearly* no impact on existing applications +in any way, except that it has been observed that some applications may be +relying upon the implicit assignment of this collection, such as to assert that +the object contains certain values based on its ``__dict__``:: + + >>> u1 = User() + >>> u1.addresses + [] + # this will now fail, would pass before + >>> assert {k: v for k, v in u1.__dict__.items() if not k.startswith("_")} == {"addresses": []} + +or to ensure that the collection won't require a lazy load to proceed, the +(admittedly awkward) code below will now also fail:: + + >>> u1 = User() + >>> u1.addresses + [] + >>> s.add(u1) + >>> s.flush() + >>> s.close() + >>> u1.addresses # <-- will fail, .addresses is not loaded and object is detached + +Applications that rely upon the implicit mutating behavior of collections will +need to be changed so that they assign the desired collection explicitly:: + + >>> u1.addresses = [] + +:ticket:`4519` + .. _change_4662: The "New instance conflicts with existing identity" error is now a warning diff --git a/doc/build/changelog/unreleased_14/4519.rst b/doc/build/changelog/unreleased_14/4519.rst new file mode 100644 index 0000000000..c1fdb8a7f7 --- /dev/null +++ b/doc/build/changelog/unreleased_14/4519.rst @@ -0,0 +1,17 @@ +.. change:: + :tags: bug, orm + :tickets: 4519 + + Accessing a collection-oriented attribute on a newly created object no + longer mutates ``__dict__``, but still returns an empty collection as has + always been the case. This allows collection-oriented attributes to work + consistently in comparison to scalar attributes which return ``None``, but + also don't mutate ``__dict__``. In order to accommodate for the collection + being mutated, the same empty collection is returned each time once + initially created, and when it is mutated (e.g. an item appended, added, + etc.) it is then moved into ``__dict__``. This removes the last of + mutating side-effects on read-only attribute access within the ORM. + + .. seealso:: + + :ref:`change_4519` \ No newline at end of file diff --git a/lib/sqlalchemy/orm/attributes.py b/lib/sqlalchemy/orm/attributes.py index 321ab7d6fb..31c351bb05 100644 --- a/lib/sqlalchemy/orm/attributes.py +++ b/lib/sqlalchemy/orm/attributes.py @@ -653,8 +653,8 @@ class AttributeImpl(object): """ raise NotImplementedError() - def initialize(self, state, dict_): - """Initialize the given state's attribute with an empty value.""" + def _default_value(self, state, dict_): + """Produce an empty value for an uninitialized scalar attribute.""" value = None for fn in self.dispatch.init_scalar: @@ -710,8 +710,7 @@ class AttributeImpl(object): if not passive & INIT_OK: return NO_VALUE else: - # Return a new, empty value - return self.initialize(state, dict_) + return self._default_value(state, dict_) def append(self, state, dict_, value, initiator, passive=PASSIVE_OFF): self.set(state, dict_, value, initiator, passive=passive) @@ -1184,11 +1183,14 @@ class CollectionAttributeImpl(AttributeImpl): # del is a no-op if collection not present. del dict_[self.key] - def initialize(self, state, dict_): - """Initialize this attribute with an empty collection.""" + def _default_value(self, state, dict_): + """Produce an empty collection for an un-initialized attribute""" - _, user_data = self._initialize_collection(state) - dict_[self.key] = user_data + if self.key in state._empty_collections: + return state._empty_collections[self.key] + + adapter, user_data = self._initialize_collection(state) + adapter._set_empty(user_data) return user_data def _initialize_collection(self, state): @@ -1287,7 +1289,7 @@ class CollectionAttributeImpl(AttributeImpl): old = self.get(state, dict_, passive=PASSIVE_ONLY_PERSISTENT) if old is PASSIVE_NO_RESULT: - old = self.initialize(state, dict_) + old = self._default_value(state, dict_) elif old is orig_iterable: # ignore re-assignment of the current collection, as happens # implicitly with in-place operators (foo.collection |= other) @@ -1699,7 +1701,6 @@ class History(History): @classmethod def from_collection(cls, attribute, state, current): original = state.committed_state.get(attribute.key, _NO_HISTORY) - if current is NO_VALUE: return cls((), (), ()) @@ -1892,8 +1893,10 @@ def init_state_collection(state, dict_, key): """Initialize a collection attribute and return the collection adapter.""" attr = state.manager[key].impl - user_data = attr.initialize(state, dict_) - return attr.get_collection(state, dict_, user_data) + user_data = attr._default_value(state, dict_) + adapter = attr.get_collection(state, dict_, user_data) + adapter._reset_empty() + return adapter def set_committed_value(instance, key, value): diff --git a/lib/sqlalchemy/orm/collections.py b/lib/sqlalchemy/orm/collections.py index 6bd009fb89..1f50c7b091 100644 --- a/lib/sqlalchemy/orm/collections.py +++ b/lib/sqlalchemy/orm/collections.py @@ -614,6 +614,7 @@ class CollectionAdapter(object): "owner_state", "_converter", "invalidated", + "empty", ) def __init__(self, attr, owner_state, data): @@ -624,6 +625,7 @@ class CollectionAdapter(object): data._sa_adapter = self self._converter = data._sa_converter self.invalidated = False + self.empty = False def _warn_invalidated(self): util.warn("This collection has been invalidated.") @@ -651,12 +653,39 @@ class CollectionAdapter(object): self._data()._sa_appender(item, _sa_initiator=initiator) + def _set_empty(self, user_data): + assert ( + not self.empty + ), "This collection adapter is alreay in the 'empty' state" + self.empty = True + self.owner_state._empty_collections[self._key] = user_data + + def _reset_empty(self): + assert ( + self.empty + ), "This collection adapter is not in the 'empty' state" + self.empty = False + self.owner_state.dict[ + self._key + ] = self.owner_state._empty_collections.pop(self._key) + + def _refuse_empty(self): + raise sa_exc.InvalidRequestError( + "This is a special 'empty' collection which cannot accommodate " + "internal mutation operations" + ) + def append_without_event(self, item): """Add or restore an entity to the collection, firing no events.""" + + if self.empty: + self._refuse_empty() self._data()._sa_appender(item, _sa_initiator=False) def append_multiple_without_event(self, items): """Add or restore an entity to the collection, firing no events.""" + if self.empty: + self._refuse_empty() appender = self._data()._sa_appender for item in items: appender(item, _sa_initiator=False) @@ -670,11 +699,15 @@ class CollectionAdapter(object): def remove_without_event(self, item): """Remove an entity from the collection, firing no events.""" + if self.empty: + self._refuse_empty() self._data()._sa_remover(item, _sa_initiator=False) def clear_with_event(self, initiator=None): """Empty the collection, firing a mutation event for each entity.""" + if self.empty: + self._refuse_empty() remover = self._data()._sa_remover for item in list(self): remover(item, _sa_initiator=initiator) @@ -682,6 +715,8 @@ class CollectionAdapter(object): def clear_without_event(self): """Empty the collection, firing no events.""" + if self.empty: + self._refuse_empty() remover = self._data()._sa_remover for item in list(self): remover(item, _sa_initiator=False) @@ -712,6 +747,10 @@ class CollectionAdapter(object): if initiator is not False: if self.invalidated: self._warn_invalidated() + + if self.empty: + self._reset_empty() + return self.attr.fire_append_event( self.owner_state, self.owner_state.dict, item, initiator ) @@ -729,6 +768,10 @@ class CollectionAdapter(object): if initiator is not False: if self.invalidated: self._warn_invalidated() + + if self.empty: + self._reset_empty() + self.attr.fire_remove_event( self.owner_state, self.owner_state.dict, item, initiator ) @@ -753,6 +796,7 @@ class CollectionAdapter(object): "owner_cls": self.owner_state.class_, "data": self.data, "invalidated": self.invalidated, + "empty": self.empty, } def __setstate__(self, d): @@ -763,6 +807,7 @@ class CollectionAdapter(object): d["data"]._sa_adapter = self self.invalidated = d["invalidated"] self.attr = getattr(d["owner_cls"], self._key).impl + self.empty = d.get("empty", False) def bulk_replace(values, existing_adapter, new_adapter, initiator=None): diff --git a/lib/sqlalchemy/orm/events.py b/lib/sqlalchemy/orm/events.py index d73a20e93a..5bf6ff4182 100644 --- a/lib/sqlalchemy/orm/events.py +++ b/lib/sqlalchemy/orm/events.py @@ -2267,6 +2267,9 @@ class AttributeEvents(event.Events): .. seealso:: + :meth:`.AttributeEvents.init_collection` - collection version + of this event + :class:`.AttributeEvents` - background on listener options such as propagation to subclasses. @@ -2312,6 +2315,9 @@ class AttributeEvents(event.Events): :class:`.AttributeEvents` - background on listener options such as propagation to subclasses. + :meth:`.AttributeEvents.init_scalar` - "scalar" version of this + event. + """ def dispose_collection(self, target, collection, collection_adapter): diff --git a/lib/sqlalchemy/orm/relationships.py b/lib/sqlalchemy/orm/relationships.py index 8b03955e22..0d7ce8bbf6 100644 --- a/lib/sqlalchemy/orm/relationships.py +++ b/lib/sqlalchemy/orm/relationships.py @@ -1653,12 +1653,13 @@ class RelationshipProperty(StrategizedProperty): return if self.uselist: - instances = source_state.get_impl(self.key).get( - source_state, source_dict - ) - if hasattr(instances, "_sa_adapter"): - # convert collections to adapters to get a true iterator - instances = instances._sa_adapter + impl = source_state.get_impl(self.key) + instances_iterable = impl.get_collection(source_state, source_dict) + + # if this is a CollectionAttributeImpl, then empty should + # be False, otherwise "self.key in source_dict" should not be + # True + assert not instances_iterable.empty if impl.collection else True if load: # for a full merge, pre-load the destination collection, @@ -1669,7 +1670,7 @@ class RelationshipProperty(StrategizedProperty): dest_state.get_impl(self.key).get(dest_state, dest_dict) dest_list = [] - for current in instances: + for current in instances_iterable: current_state = attributes.instance_state(current) current_dict = attributes.instance_dict(current) _recursive[(current_state, self)] = True diff --git a/lib/sqlalchemy/orm/state.py b/lib/sqlalchemy/orm/state.py index c6252b6b8c..f6c06acc82 100644 --- a/lib/sqlalchemy/orm/state.py +++ b/lib/sqlalchemy/orm/state.py @@ -310,6 +310,10 @@ class InstanceState(interfaces.InspectionAttrInfo): def _pending_mutations(self): return {} + @util.memoized_property + def _empty_collections(self): + return {} + @util.memoized_property def mapper(self): """Return the :class:`.Mapper` used for this mapped object.""" diff --git a/lib/sqlalchemy/orm/strategies.py b/lib/sqlalchemy/orm/strategies.py index d05adecdbb..b0dffe5ddf 100644 --- a/lib/sqlalchemy/orm/strategies.py +++ b/lib/sqlalchemy/orm/strategies.py @@ -488,7 +488,7 @@ class NoLoader(AbstractRelationshipLoader): ): def invoke_no_load(state, dict_, row): if self.uselist: - state.manager.get_impl(self.key).initialize(state, dict_) + attributes.init_state_collection(state, dict_, self.key) else: dict_[self.key] = None diff --git a/test/orm/test_attributes.py b/test/orm/test_attributes.py index 8d244c4fa4..717ae1de0d 100644 --- a/test/orm/test_attributes.py +++ b/test/orm/test_attributes.py @@ -3448,16 +3448,32 @@ class ListenerTest(fixtures.ORMTest): class Foo(object): pass + class Bar(object): + pass + instrumentation.register_class(Foo) + instrumentation.register_class(Bar) attributes.register_attribute(Foo, "bar", useobject=True, uselist=True) event.listen(Foo.bar, "set", canary) f1 = Foo() eq_(f1.bar, []) + + assert "bar" not in f1.__dict__ + + adapter = Foo.bar.impl.get_collection( + attributes.instance_state(f1), attributes.instance_dict(f1) + ) + assert adapter.empty + # reversal of approach in #3061 eq_(canary.mock_calls, []) + f1.bar.append(Bar()) + assert "bar" in f1.__dict__ + assert not adapter.empty + class EventPropagateTest(fixtures.TestBase): # tests that were expanded as of #4695 diff --git a/test/orm/test_collection.py b/test/orm/test_collection.py index eb0df3107f..11703e9832 100644 --- a/test/orm/test_collection.py +++ b/test/orm/test_collection.py @@ -18,6 +18,8 @@ from sqlalchemy.testing import assert_raises from sqlalchemy.testing import assert_raises_message from sqlalchemy.testing import eq_ from sqlalchemy.testing import fixtures +from sqlalchemy.testing import is_false +from sqlalchemy.testing import is_true from sqlalchemy.testing import ne_ from sqlalchemy.testing.schema import Column from sqlalchemy.testing.schema import Table @@ -137,6 +139,44 @@ class CollectionsTest(fixtures.ORMTest): adapter.remove_with_event(e1) assert_eq() + self._test_empty_init(typecallable, creator=creator) + + def _test_empty_init(self, typecallable, creator=None): + if creator is None: + creator = self.entity_maker + + class Foo(object): + pass + + instrumentation.register_class(Foo) + d = attributes.register_attribute( + Foo, + "attr", + uselist=True, + typecallable=typecallable, + useobject=True, + ) + + obj = Foo() + e1 = creator() + e2 = creator() + implicit_collection = obj.attr + is_true("attr" not in obj.__dict__) + adapter = collections.collection_adapter(implicit_collection) + is_true(adapter.empty) + assert_raises_message( + sa_exc.InvalidRequestError, + "This is a special 'empty'", + adapter.append_without_event, + e1, + ) + + adapter.append_with_event(e1) + is_false(adapter.empty) + is_true("attr" in obj.__dict__) + adapter.append_without_event(e2) + eq_(set(adapter), {e1, e2}) + def _test_list(self, typecallable, creator=None): if creator is None: creator = self.entity_maker diff --git a/test/orm/test_inspect.py b/test/orm/test_inspect.py index 368199b96f..48d69b6edd 100644 --- a/test/orm/test_inspect.py +++ b/test/orm/test_inspect.py @@ -352,8 +352,8 @@ class TestORMInspection(_fixtures.FixtureTest): eq_(insp.attrs.addresses.loaded_value, NO_VALUE) # regular accessor sets it eq_(insp.attrs.addresses.value, []) - # now the None is there - eq_(insp.attrs.addresses.loaded_value, []) + # stays as NO_VALUE, this is #4519 + eq_(insp.attrs.addresses.loaded_value, NO_VALUE) def test_instance_state_collection_attr_hist(self): User = self.classes.User @@ -363,7 +363,8 @@ class TestORMInspection(_fixtures.FixtureTest): eq_(hist.unchanged, None) u1.addresses hist = insp.attrs.addresses.history - eq_(hist.unchanged, []) + # stays, this is #4519 + eq_(hist.unchanged, None) def test_instance_state_scalar_attr_hist(self): User = self.classes.User @@ -385,7 +386,8 @@ class TestORMInspection(_fixtures.FixtureTest): eq_(hist.unchanged, ()) u1.addresses hist = insp.attrs.addresses.load_history() - eq_(hist.unchanged, []) + # stays, this is #4519 + eq_(hist.unchanged, ()) def test_instance_state_scalar_attr_hist_load(self): User = self.classes.User diff --git a/test/orm/test_merge.py b/test/orm/test_merge.py index 48a519dd7c..27d6c4bfd6 100644 --- a/test/orm/test_merge.py +++ b/test/orm/test_merge.py @@ -173,6 +173,39 @@ class MergeTest(_fixtures.FixtureTest): ), ) + def test_transient_non_mutated_collection(self): + User, Address, addresses, users = ( + self.classes.User, + self.classes.Address, + self.tables.addresses, + self.tables.users, + ) + + mapper( + User, + users, + properties={"addresses": relationship(Address, backref="user")}, + ) + mapper(Address, addresses) + + s = Session() + u = User( + id=7, + name="fred", + addresses=[Address(id=1, email_address="jack@bean.com")], + ) + s.add(u) + s.commit() + s.close() + + u = User(id=7, name="fred") + # access address collection to get implicit blank collection + eq_(u.addresses, []) + u2 = s.merge(u) + + # collection wasn't emptied + eq_(u2.addresses, [Address()]) + def test_transient_to_pending_collection_pk_none(self): User, Address, addresses, users = ( self.classes.User,