From: Mike Bayer Date: Tue, 2 Sep 2014 00:31:00 +0000 (-0400) Subject: - reorganize X-Git-Tag: rel_1_0_0b1~205^2~4 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=b621d232519bd84321853087b5ab21b3d8ef1dd9;p=thirdparty%2Fsqlalchemy%2Fsqlalchemy.git - reorganize --- diff --git a/doc/build/changelog/migration_10.rst b/doc/build/changelog/migration_10.rst index 8f01e99e6e..8acaa04458 100644 --- a/doc/build/changelog/migration_10.rst +++ b/doc/build/changelog/migration_10.rst @@ -8,7 +8,7 @@ What's New in SQLAlchemy 1.0? undergoing maintenance releases as of May, 2014, and SQLAlchemy version 1.0, as of yet unreleased. - Document last updated: August 26, 2014 + Document last updated: September 1, 2014 Introduction ============ @@ -22,236 +22,372 @@ Please carefully review potentially backwards-incompatible changes. -.. _behavioral_changes_orm_10: +New Features +============ -Behavioral Changes - ORM -======================== +.. _feature_3034: -.. _migration_3061: +Select/Query LIMIT / OFFSET may be specified as an arbitrary SQL expression +---------------------------------------------------------------------------- -Changes to attribute events and other operations regarding attributes that have no pre-existing value ------------------------------------------------------------------------------------------------------- +The :meth:`.Select.limit` and :meth:`.Select.offset` methods now accept +any SQL expression, in addition to integer values, as arguments. The ORM +:class:`.Query` object also passes through any expression to the underlying +:class:`.Select` object. Typically +this is used to allow a bound parameter to be passed, which can be substituted +with a value later:: -In this change, the default return value of ``None`` when accessing an object -is now returned dynamically on each access, rather than implicitly setting the -attribute's state with a special "set" operation when it is first accessed. 
-The visible result of this change is that ``obj.__dict__`` is not implicitly -modified on get, and there are also some minor behavioral changes -for :func:`.attributes.get_history` and related functions. + sel = select([table]).limit(bindparam('mylimit')).offset(bindparam('myoffset')) -Given an object with no state:: +Dialects which don't support non-integer LIMIT or OFFSET expressions may continue +to not support this behavior; third party dialects may also need modification +in order to take advantage of the new behavior. A dialect which currently +uses the ``._limit`` or ``._offset`` attributes will continue to function +for those cases where the limit/offset was specified as a simple integer value. +However, when a SQL expression is specified, these two attributes will +instead raise a :class:`.CompileError` on access. A third-party dialect which +wishes to support the new feature should now call upon the ``._limit_clause`` +and ``._offset_clause`` attributes to receive the full SQL expression, rather +than the integer value. - >>> obj = Foo() -It has always been SQLAlchemy's behavior such that if we access a scalar -or many-to-one attribute that was never set, it is returned as ``None``:: +Behavioral Improvements +======================= - >>> obj.someattr - None +.. _feature_updatemany: -This value of ``None`` is in fact now part of the state of ``obj``, and is -not unlike as though we had set the attribute explicitly, e.g. -``obj.someattr = None``. However, the "set on get" here would behave -differently as far as history and events. 
It would not emit any attribute -event, and additionally if we view history, we see this:: +UPDATE statements are now batched with executemany() in a flush +---------------------------------------------------------------- - >>> inspect(obj).attrs.someattr.history - History(added=(), unchanged=[None], deleted=()) # 0.9 and below +UPDATE statements can now be batched within an ORM flush +into more performant executemany() call, similarly to how INSERT +statements can be batched; this will be invoked within flush +based on the following criteria: -That is, it's as though the attribute were always ``None`` and were -never changed. This is explicitly different from if we had set the -attribute first instead:: +* two or more UPDATE statements in sequence involve the identical set of + columns to be modified. - >>> obj = Foo() - >>> obj.someattr = None - >>> inspect(obj).attrs.someattr.history - History(added=[None], unchanged=(), deleted=()) # all versions +* The statement has no embedded SQL expressions in the SET clause. -The above means that the behavior of our "set" operation can be corrupted -by the fact that the value was accessed via "get" earlier. In 1.0, this -inconsistency has been resolved, by no longer actually setting anything -when the default "getter" is used. +* The mapping does not use a :paramref:`~.orm.mapper.version_id_col`, or + the backend dialect supports a "sane" rowcount for an executemany() + operation; most DBAPIs support this correctly now. - >>> obj = Foo() - >>> obj.someattr - None - >>> inspect(obj).attrs.someattr.history - History(added=(), unchanged=(), deleted=()) # 1.0 - >>> obj.someattr = None - >>> inspect(obj).attrs.someattr.history - History(added=[None], unchanged=(), deleted=()) +ORM full object fetches 25% faster +---------------------------------- -The reason the above behavior hasn't had much impact is because the -INSERT statement in relational databases considers a missing value to be -the same as NULL in most cases. 
Whether SQLAlchemy received a history -event for a particular attribute set to None or not would usually not matter; -as the difference between sending None/NULL or not wouldn't have an impact. -However, as :ticket:`3060` illustrates, there are some seldom edge cases -where we do in fact want to positively have ``None`` set. Also, allowing -the attribute event here means it's now possible to create "default value" -functions for ORM mapped attributes. +The mechanics of the ``loading.py`` module as well as the identity map +have undergone several passes of inlining, refactoring, and pruning, so +that a raw load of rows now populates ORM-based objects around 25% faster. +Assuming a 1M row table, a script like the following illustrates the type +of load that's improved the most:: -As part of this change, the generation of the implicit "None" is now disabled -for other situations where this used to occur; this includes when an -attribute set operation on a many-to-one is received; previously, the "old" value -would be "None" if it had been not set otherwise; it now will send the -value :data:`.orm.attributes.NEVER_SET`, which is a value that may be sent -to an attribute listener now. This symbol may also be received when -calling on mapper utility functions such as :meth:`.Mapper.primary_key_from_instance`; -if the primary key attributes have no setting at all, whereas the value -would be ``None`` before, it will now be the :data:`.orm.attributes.NEVER_SET` -symbol, and no change to the object's state occurs. + import time + from sqlalchemy import Integer, Column, create_engine, Table + from sqlalchemy.orm import Session + from sqlalchemy.ext.declarative import declarative_base -:ticket:`3061` + Base = declarative_base() -.. 
_migration_2992: + class Foo(Base): + __table__ = Table( + 'foo', Base.metadata, + Column('id', Integer, primary_key=True), + Column('a', Integer(), nullable=False), + Column('b', Integer(), nullable=False), + Column('c', Integer(), nullable=False), + ) -Warnings emitted when coercing full SQL fragments into text() -------------------------------------------------------------- + engine = create_engine( + 'mysql+mysqldb://scott:tiger@localhost/test', echo=True) -Since SQLAlchemy's inception, there has always been an emphasis on not getting -in the way of the usage of plain text. The Core and ORM expression systems -were intended to allow any number of points at which the user can just -use plain text SQL expressions, not just in the sense that you can send a -full SQL string to :meth:`.Connection.execute`, but that you can send strings -with SQL expressions into many functions, such as :meth:`.Select.where`, -:meth:`.Query.filter`, and :meth:`.Select.order_by`. + sess = Session(engine) -Note that by "SQL expressions" we mean a **full fragment of a SQL string**, -such as:: + now = time.time() - # the argument sent to where() is a full SQL expression - stmt = select([sometable]).where("somecolumn = 'value'") + # avoid using all() so that we don't have the overhead of building + # a large list of full objects in memory + for obj in sess.query(Foo).yield_per(100).limit(1000000): + pass -and we are **not talking about string arguments**, that is, the normal -behavior of passing string values that become parameterized:: + print("Total time: %d" % (time.time() - now)) - # This is a normal Core expression with a string argument - - # we aren't talking about this!! - stmt = select([sometable]).where(sometable.c.somecolumn == 'value') +Local MacBookPro results bench from 19 seconds for 0.9 down to 14 seconds for +1.0. 
The :meth:`.Query.yield_per` call is always a good idea when batching +huge numbers of rows, as it prevents the Python interpreter from having +to allocate a huge amount of memory for all objects and their instrumentation +at once. Without the :meth:`.Query.yield_per`, the above script on the +MacBookPro is 31 seconds on 0.9 and 26 seconds on 1.0, the extra time spent +setting up very large memory buffers. -The Core tutorial has long featured an example of the use of this technique, -using a :func:`.select` construct where virtually all components of it -are specified as straight strings. However, despite this long-standing -behavior and example, users are apparently surprised that this behavior -exists, and when asking around the community, I was unable to find any user -that was in fact *not* surprised that you can send a full string into a method -like :meth:`.Query.filter`. -So the change here is to encourage the user to qualify textual strings when -composing SQL that is partially or fully composed from textual fragments. -When composing a select as below:: - stmt = select(["a", "b"]).where("a = b").select_from("sometable") +.. _feature_3176: -The statement is built up normally, with all the same coercions as before. 
-However, one will see the following warnings emitted:: +New KeyedTuple implementation dramatically faster +------------------------------------------------- - SAWarning: Textual column expression 'a' should be explicitly declared - with text('a'), or use column('a') for more specificity - (this warning may be suppressed after 10 occurrences) +We took a look into the :class:`.KeyedTuple` implementation in the hopes +of improving queries like this:: - SAWarning: Textual column expression 'b' should be explicitly declared - with text('b'), or use column('b') for more specificity - (this warning may be suppressed after 10 occurrences) + rows = sess.query(Foo.a, Foo.b, Foo.c).all() - SAWarning: Textual SQL expression 'a = b' should be explicitly declared - as text('a = b') (this warning may be suppressed after 10 occurrences) +The :class:`.KeyedTuple` class is used rather than Python's +``collections.namedtuple()``, because the latter has a very complex +type-creation routine that benchmarks much slower than :class:`.KeyedTuple`. +However, when fetching hundreds of thousands of rows, +``collections.namedtuple()`` quickly overtakes :class:`.KeyedTuple` which +becomes dramatically slower as instance invocation goes up. What to do? +A new type that hedges between the approaches of both. Benching +all three types for "size" (number of rows returned) and "num" +(number of distinct queries), the new "lightweight keyed tuple" either +outperforms both, or lags very slightly behind the faster object, based on +which scenario. 
In the "sweet spot", where we are both creating a good number +of new types as well as fetching a good number of rows, the lightweight +object totally smokes both namedtuple and KeyedTuple:: - SAWarning: Textual SQL FROM expression 'sometable' should be explicitly - declared as text('sometable'), or use table('sometable') for more - specificity (this warning may be suppressed after 10 occurrences) + ----------------- + size=10 num=10000 # few rows, lots of queries + namedtuple: 3.60302400589 # namedtuple falls over + keyedtuple: 0.255059957504 # KeyedTuple very fast + lw keyed tuple: 0.582715034485 # lw keyed trails right on KeyedTuple + ----------------- + size=100 num=1000 # <--- sweet spot + namedtuple: 0.365247011185 + keyedtuple: 0.24896979332 + lw keyed tuple: 0.0889317989349 # lw keyed blows both away! + ----------------- + size=10000 num=100 + namedtuple: 0.572599887848 + keyedtuple: 2.54251694679 + lw keyed tuple: 0.613876104355 + ----------------- + size=1000000 num=10 # few queries, lots of rows + namedtuple: 5.79669594765 # namedtuple very fast + keyedtuple: 28.856498003 # KeyedTuple falls over + lw keyed tuple: 6.74346804619 # lw keyed trails right on namedtuple -These warnings attempt to show exactly where the issue is by displaying -the parameters as well as where the string was received. -The warnings make use of the :ref:`feature_3178` so that parameterized warnings -can be emitted safely without running out of memory, and as always, if -one wishes the warnings to be exceptions, the -`Python Warnings Filter `_ -should be used:: +:ticket:`3176` + +.. _feature_3178: + +New systems to safely emit parameterized warnings +------------------------------------------------- + +For a long time, there has been a restriction that warning messages could not +refer to data elements, such that a particular function might emit an +infinite number of unique warnings. 
The key place this occurs is in the +``Unicode type received non-unicode bind param value`` warning. Placing +the data value in this message would mean that the Python ``__warningregistry__`` +for that module, or in some cases the Python-global ``warnings.onceregistry``, +would grow unbounded, as in most warning scenarios, one of these two collections +is populated with every distinct warning message. + +The change here is that by using a special ``string`` type that purposely +changes how the string is hashed, we can control that a large number of +parameterized messages are hashed only on a small set of possible hash +values, such that a warning such as ``Unicode type received non-unicode +bind param value`` can be tailored to be emitted only a specific number +of times; beyond that, the Python warnings registry will begin recording +them as duplicates. + +To illustrate, the following test script will show only ten warnings being +emitted for ten of the parameter sets, out of a total of 1000:: + + from sqlalchemy import create_engine, Unicode, select, cast + import random import warnings - warnings.simplefilter("error") # all warnings raise an exception -Given the above warnings, our statement works just fine, but -to get rid of the warnings we would rewrite our statement as follows:: + e = create_engine("sqlite://") - from sqlalchemy import select, text - stmt = select([ - text("a"), - text("b") - ]).where(text("a = b")).select_from(text("sometable")) + # Use the "once" filter (which is also the default for Python + # warnings). Exactly ten of these warnings will + # be emitted; beyond that, the Python warnings registry will accumulate + # new values as dupes of one of the ten existing. 
+ warnings.filterwarnings("once") -and as the warnings suggest, we can give our statement more specificity -about the text if we use :func:`.column` and :func:`.table`:: + for i in range(1000): + e.execute(select([cast( + ('foo_%d' % random.randint(0, 1000000)).encode('ascii'), Unicode)])) - from sqlalchemy import select, text, column, table +The format of the warning here is:: - stmt = select([column("a"), column("b")]).\\ - where(text("a = b")).select_from(table("sometable")) + /path/lib/sqlalchemy/sql/sqltypes.py:186: SAWarning: Unicode type received + non-unicode bind param value 'foo_4852'. (this warning may be + suppressed after 10 occurrences) -Where note also that :func:`.table` and :func:`.column` can now -be imported from "sqlalchemy" without the "sql" part. -The behavior here applies to :func:`.select` as well as to key methods -on :class:`.Query`, including :meth:`.Query.filter`, -:meth:`.Query.from_statement` and :meth:`.Query.having`. +:ticket:`3178` -ORDER BY and GROUP BY are special cases -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +.. _feature_2963: -There is one case where usage of a string has special meaning, and as part -of this change we have enhanced its functionality. When we have a -:func:`.select` or :class:`.Query` that refers to some column name or named -label, we might want to GROUP BY and/or ORDER BY known columns or labels:: +.info dictionary improvements +----------------------------- - stmt = select([ - user.c.name, - func.count(user.c.id).label("id_count") - ]).group_by("name").order_by("id_count") +The :attr:`.InspectionAttr.info` collection is now available on every kind +of object that one would retrieve from the :attr:`.Mapper.all_orm_descriptors` +collection. This includes :class:`.hybrid_property` and :func:`.association_proxy`. +However, as these objects are class-bound descriptors, they must be accessed +**separately** from the class to which they are attached in order to get +at the attribute. 
Below this is illustared using the +:attr:`.Mapper.all_orm_descriptors` namespace:: -In the above statement we expect to see "ORDER BY id_count", as opposed to a -re-statement of the function. The string argument given is actively -matched to an entry in the columns clause during compilation, so the above -statement would produce as we expect, without warnings:: + class SomeObject(Base): + # ... + + @hybrid_property + def some_prop(self): + return self.value + 5 + + + inspect(SomeObject).all_orm_descriptors.some_prop.info['foo'] = 'bar' + +It is also available as a constructor argument for all :class:`.SchemaItem` +objects (e.g. :class:`.ForeignKey`, :class:`.UniqueConstraint` etc.) as well +as remaining ORM constructs such as :func:`.orm.synonym`. + +:ticket:`2971` + +:ticket:`2963` + +.. _migration_3177: + +Change to single-table-inheritance criteria when using from_self(), count() +--------------------------------------------------------------------------- + +Given a single-table inheritance mapping, such as:: + + class Widget(Base): + __table__ = 'widget_table' + + class FooWidget(Widget): + pass + +Using :meth:`.Query.from_self` or :meth:`.Query.count` against a subclass +would produce a subquery, but then add the "WHERE" criteria for subtypes +to the outside:: + + sess.query(FooWidget).from_self().all() + +rendering:: + + SELECT + anon_1.widgets_id AS anon_1_widgets_id, + anon_1.widgets_type AS anon_1_widgets_type + FROM (SELECT widgets.id AS widgets_id, widgets.type AS widgets_type, + FROM widgets) AS anon_1 + WHERE anon_1.widgets_type IN (?) + +The issue with this is that if the inner query does not specify all +columns, then we can't add the WHERE clause on the outside (it actually tries, +and produces a bad query). This decision +apparently goes way back to 0.6.5 with the note "may need to make more +adjustments to this". Well, those adjustments have arrived! 
So now the +above query will render:: + + SELECT + anon_1.widgets_id AS anon_1_widgets_id, + anon_1.widgets_type AS anon_1_widgets_type + FROM (SELECT widgets.id AS widgets_id, widgets.type AS widgets_type, + FROM widgets + WHERE widgets.type IN (?)) AS anon_1 + +So that queries that don't include "type" will still work!:: + + sess.query(FooWidget.id).count() + +Renders:: + + SELECT count(*) AS count_1 + FROM (SELECT widgets.id AS widgets_id + FROM widgets + WHERE widgets.type IN (?)) AS anon_1 + + +:ticket:`3177` + +.. _behavioral_changes_orm_10: + +Behavioral Changes - ORM +======================== + +.. _migration_3061: + +Changes to attribute events and other operations regarding attributes that have no pre-existing value +------------------------------------------------------------------------------------------------------ + +In this change, the default return value of ``None`` when accessing an object +is now returned dynamically on each access, rather than implicitly setting the +attribute's state with a special "set" operation when it is first accessed. +The visible result of this change is that ``obj.__dict__`` is not implicitly +modified on get, and there are also some minor behavioral changes +for :func:`.attributes.get_history` and related functions. + +Given an object with no state:: + + >>> obj = Foo() - SELECT users.name, count(users.id) AS id_count - FROM users GROUP BY users.name ORDER BY id_count +It has always been SQLAlchemy's behavior such that if we access a scalar +or many-to-one attribute that was never set, it is returned as ``None``:: -However, if we refer to a name that cannot be located, then we get -the warning again, as below:: + >>> obj.someattr + None - stmt = select([ - user.c.name, - func.count(user.c.id).label("id_count") - ]).order_by("some_label") +This value of ``None`` is in fact now part of the state of ``obj``, and is +not unlike as though we had set the attribute explicitly, e.g. +``obj.someattr = None``. 
However, the "set on get" here would behave +differently as far as history and events. It would not emit any attribute +event, and additionally if we view history, we see this:: -The output does what we say, but again it warns us:: + >>> inspect(obj).attrs.someattr.history + History(added=(), unchanged=[None], deleted=()) # 0.9 and below - SAWarning: Can't resolve label reference 'some_label'; converting to - text() (this warning may be suppressed after 10 occurrences) +That is, it's as though the attribute were always ``None`` and were +never changed. This is explicitly different from if we had set the +attribute first instead:: - SELECT users.name, count(users.id) AS id_count - FROM users ORDER BY some_label + >>> obj = Foo() + >>> obj.someattr = None + >>> inspect(obj).attrs.someattr.history + History(added=[None], unchanged=(), deleted=()) # all versions -The above behavior applies to all those places where we might want to refer -to a so-called "label reference"; ORDER BY and GROUP BY, but also within an -OVER clause as well as a DISTINCT ON clause that refers to columns (e.g. the -Postgresql syntax). +The above means that the behavior of our "set" operation can be corrupted +by the fact that the value was accessed via "get" earlier. In 1.0, this +inconsistency has been resolved, by no longer actually setting anything +when the default "getter" is used. -We can still specify any arbitrary expression for ORDER BY or others using -:func:`.text`:: + >>> obj = Foo() + >>> obj.someattr + None + >>> inspect(obj).attrs.someattr.history + History(added=(), unchanged=(), deleted=()) # 1.0 + >>> obj.someattr = None + >>> inspect(obj).attrs.someattr.history + History(added=[None], unchanged=(), deleted=()) - stmt = select([users]).order_by(text("some special expression")) +The reason the above behavior hasn't had much impact is because the +INSERT statement in relational databases considers a missing value to be +the same as NULL in most cases. 
Whether SQLAlchemy received a history +event for a particular attribute set to None or not would usually not matter; +as the difference between sending None/NULL or not wouldn't have an impact. +However, as :ticket:`3060` illustrates, there are some seldom edge cases +where we do in fact want to positively have ``None`` set. Also, allowing +the attribute event here means it's now possible to create "default value" +functions for ORM mapped attributes. -The upshot of the whole change is that SQLAlchemy now would like us -to tell it when a string is sent that this string is explicitly -a :func:`.text` construct, or a column, table, etc., and if we use it as a -label name in an order by, group by, or other expression, SQLAlchemy expects -that the string resolves to something known, else it should again -be qualified with :func:`.text` or similar. +As part of this change, the generation of the implicit "None" is now disabled +for other situations where this used to occur; this includes when an +attribute set operation on a many-to-one is received; previously, the "old" value +would be "None" if it had been not set otherwise; it now will send the +value :data:`.orm.attributes.NEVER_SET`, which is a value that may be sent +to an attribute listener now. This symbol may also be received when +calling on mapper utility functions such as :meth:`.Mapper.primary_key_from_instance`; +if the primary key attributes have no setting at all, whereas the value +would be ``None`` before, it will now be the :data:`.orm.attributes.NEVER_SET` +symbol, and no change to the object's state occurs. -:ticket:`2992` +:ticket:`3061` .. _migration_yield_per_eager_loading: @@ -406,344 +542,210 @@ from the unit of work. Behavioral Changes - Core ========================= -.. 
_change_3163: - -Event listeners can not be added or removed from within that event's runner ---------------------------------------------------------------------------- - -Removal of an event listener from inside that same event itself would -modify the elements of a list during iteration, which would cause -still-attached event listeners to silently fail to fire. To prevent -this while still maintaining performance, the lists have been replaced -with ``collections.deque()``, which does not allow any additions or -removals during iteration, and instead raises ``RuntimeError``. - -:ticket:`3163` - -.. _change_3169: - -The INSERT...FROM SELECT construct now implies ``inline=True`` --------------------------------------------------------------- - -Using :meth:`.Insert.from_select` now implies ``inline=True`` -on :func:`.insert`. This helps to fix a bug where an -INSERT...FROM SELECT construct would inadvertently be compiled -as "implicit returning" on supporting backends, which would -cause breakage in the case of an INSERT that inserts zero rows -(as implicit returning expects a row), as well as arbitrary -return data in the case of an INSERT that inserts multiple -rows (e.g. only the first row of many). -A similar change is also applied to an INSERT..VALUES -with multiple parameter sets; implicit RETURNING will no longer emit -for this statement either. As both of these constructs deal -with varible numbers of rows, the -:attr:`.ResultProxy.inserted_primary_key` accessor does not -apply. Previously, there was a documentation note that one -may prefer ``inline=True`` with INSERT..FROM SELECT as some databases -don't support returning and therefore can't do "implicit" returning, -but there's no reason an INSERT...FROM SELECT needs implicit returning -in any case. Regular explicit :meth:`.Insert.returning` should -be used to return variable numbers of result rows if inserted -data is needed. - -:ticket:`3169` - -.. 
_change_3027: - -``autoload_with`` now implies ``autoload=True`` ------------------------------------------------ - -A :class:`.Table` can be set up for reflection by passing -:paramref:`.Table.autoload_with` alone:: - - my_table = Table('my_table', metadata, autoload_with=some_engine) - -:ticket:`3027` - - -New Features -============ - -.. _feature_3034: - -Select/Query LIMIT / OFFSET may be specified as an arbitrary SQL expression ----------------------------------------------------------------------------- - -The :meth:`.Select.limit` and :meth:`.Select.offset` methods now accept -any SQL expression, in addition to integer values, as arguments. The ORM -:class:`.Query` object also passes through any expression to the underlying -:class:`.Select` object. Typically -this is used to allow a bound parameter to be passed, which can be substituted -with a value later:: - - sel = select([table]).limit(bindparam('mylimit')).offset(bindparam('myoffset')) - -Dialects which don't support non-integer LIMIT or OFFSET expressions may continue -to not support this behavior; third party dialects may also need modification -in order to take advantage of the new behavior. A dialect which currently -uses the ``._limit`` or ``._offset`` attributes will continue to function -for those cases where the limit/offset was specified as a simple integer value. -However, when a SQL expression is specified, these two attributes will -instead raise a :class:`.CompileError` on access. A third-party dialect which -wishes to support the new feature should now call upon the ``._limit_clause`` -and ``._offset_clause`` attributes to receive the full SQL expression, rather -than the integer value. - -Behavioral Improvements -======================= - -.. 
_feature_updatemany: - -UPDATE statements are now batched with executemany() in a flush ----------------------------------------------------------------- - -UPDATE statements can now be batched within an ORM flush -into more performant executemany() call, similarly to how INSERT -statements can be batched; this will be invoked within flush -based on the following criteria: - -* two or more UPDATE statements in sequence involve the identical set of - columns to be modified. - -* The statement has no embedded SQL expressions in the SET clause. - -* The mapping does not use a :paramref:`~.orm.mapper.version_id_col`, or - the backend dialect supports a "sane" rowcount for an executemany() - operation; most DBAPIs support this correctly now. - -ORM full object fetches 25% faster ----------------------------------- - -The mechanics of the ``loading.py`` module as well as the identity map -have undergone several passes of inlining, refactoring, and pruning, so -that a raw load of rows now populates ORM-based objects around 25% faster. -Assuming a 1M row table, a script like the following illustrates the type -of load that's improved the most:: - - import time - from sqlalchemy import Integer, Column, create_engine, Table - from sqlalchemy.orm import Session - from sqlalchemy.ext.declarative import declarative_base - - Base = declarative_base() - - class Foo(Base): - __table__ = Table( - 'foo', Base.metadata, - Column('id', Integer, primary_key=True), - Column('a', Integer(), nullable=False), - Column('b', Integer(), nullable=False), - Column('c', Integer(), nullable=False), - ) - - engine = create_engine( - 'mysql+mysqldb://scott:tiger@localhost/test', echo=True) - - sess = Session(engine) - - now = time.time() - - # avoid using all() so that we don't have the overhead of building - # a large list of full objects in memory - for obj in sess.query(Foo).yield_per(100).limit(1000000): - pass +.. 
_migration_2992: - print("Total time: %d" % (time.time() - now)) +Warnings emitted when coercing full SQL fragments into text() +------------------------------------------------------------- -Local MacBookPro results bench from 19 seconds for 0.9 down to 14 seconds for -1.0. The :meth:`.Query.yield_per` call is always a good idea when batching -huge numbers of rows, as it prevents the Python interpreter from having -to allocate a huge amount of memory for all objects and their instrumentation -at once. Without the :meth:`.Query.yield_per`, the above script on the -MacBookPro is 31 seconds on 0.9 and 26 seconds on 1.0, the extra time spent -setting up very large memory buffers. +Since SQLAlchemy's inception, there has always been an emphasis on not getting +in the way of the usage of plain text. The Core and ORM expression systems +were intended to allow any number of points at which the user can just +use plain text SQL expressions, not just in the sense that you can send a +full SQL string to :meth:`.Connection.execute`, but that you can send strings +with SQL expressions into many functions, such as :meth:`.Select.where`, +:meth:`.Query.filter`, and :meth:`.Select.order_by`. +Note that by "SQL expressions" we mean a **full fragment of a SQL string**, +such as:: + # the argument sent to where() is a full SQL expression + stmt = select([sometable]).where("somecolumn = 'value'") -.. _feature_3176: +and we are **not talking about string arguments**, that is, the normal +behavior of passing string values that become parameterized:: -New KeyedTuple implementation dramatically faster -------------------------------------------------- + # This is a normal Core expression with a string argument - + # we aren't talking about this!! 
+ stmt = select([sometable]).where(sometable.c.somecolumn == 'value') -We took a look into the :class:`.KeyedTuple` implementation in the hopes -of improving queries like this:: +The Core tutorial has long featured an example of the use of this technique, +using a :func:`.select` construct where virtually all components of it +are specified as straight strings. However, despite this long-standing +behavior and example, users are apparently surprised that this behavior +exists, and when asking around the community, I was unable to find any user +that was in fact *not* surprised that you can send a full string into a method +like :meth:`.Query.filter`. - rows = sess.query(Foo.a, Foo.b, Foo.c).all() +So the change here is to encourage the user to qualify textual strings when +composing SQL that is partially or fully composed from textual fragments. +When composing a select as below:: -The :class:`.KeyedTuple` class is used rather than Python's -``collections.namedtuple()``, because the latter has a very complex -type-creation routine that benchmarks much slower than :class:`.KeyedTuple`. -However, when fetching hundreds of thousands of rows, -``collections.namedtuple()`` quickly overtakes :class:`.KeyedTuple` which -becomes dramatically slower as instance invocation goes up. What to do? -A new type that hedges between the approaches of both. Benching -all three types for "size" (number of rows returned) and "num" -(number of distinct queries), the new "lightweight keyed tuple" either -outperforms both, or lags very slightly behind the faster object, based on -which scenario. 
In the "sweet spot", where we are both creating a good number -of new types as well as fetching a good number of rows, the lightweight -object totally smokes both namedtuple and KeyedTuple:: + stmt = select(["a", "b"]).where("a = b").select_from("sometable") - ----------------- - size=10 num=10000 # few rows, lots of queries - namedtuple: 3.60302400589 # namedtuple falls over - keyedtuple: 0.255059957504 # KeyedTuple very fast - lw keyed tuple: 0.582715034485 # lw keyed trails right on KeyedTuple - ----------------- - size=100 num=1000 # <--- sweet spot - namedtuple: 0.365247011185 - keyedtuple: 0.24896979332 - lw keyed tuple: 0.0889317989349 # lw keyed blows both away! - ----------------- - size=10000 num=100 - namedtuple: 0.572599887848 - keyedtuple: 2.54251694679 - lw keyed tuple: 0.613876104355 - ----------------- - size=1000000 num=10 # few queries, lots of rows - namedtuple: 5.79669594765 # namedtuple very fast - keyedtuple: 28.856498003 # KeyedTuple falls over - lw keyed tuple: 6.74346804619 # lw keyed trails right on namedtuple +The statement is built up normally, with all the same coercions as before. +However, one will see the following warnings emitted:: + SAWarning: Textual column expression 'a' should be explicitly declared + with text('a'), or use column('a') for more specificity + (this warning may be suppressed after 10 occurrences) -:ticket:`3176` + SAWarning: Textual column expression 'b' should be explicitly declared + with text('b'), or use column('b') for more specificity + (this warning may be suppressed after 10 occurrences) -.. 
_feature_3178: + SAWarning: Textual SQL expression 'a = b' should be explicitly declared + as text('a = b') (this warning may be suppressed after 10 occurrences) -New systems to safely emit parameterized warnings -------------------------------------------------- + SAWarning: Textual SQL FROM expression 'sometable' should be explicitly + declared as text('sometable'), or use table('sometable') for more + specificity (this warning may be suppressed after 10 occurrences) -For a long time, there has been a restriction that warning messages could not -refer to data elements, such that a particular function might emit an -infinite number of unique warnings. The key place this occurs is in the -``Unicode type received non-unicode bind param value`` warning. Placing -the data value in this message would mean that the Python ``__warningregistry__`` -for that module, or in some cases the Python-global ``warnings.onceregistry``, -would grow unbounded, as in most warning scenarios, one of these two collections -is populated with every distinct warning message. +These warnings attempt to show exactly where the issue is by displaying +the parameters as well as where the string was received. +The warnings make use of the :ref:`feature_3178` so that parameterized warnings +can be emitted safely without running out of memory, and as always, if +one wishes the warnings to be exceptions, the +`Python Warnings Filter `_ +should be used:: -The change here is that by using a special ``string`` type that purposely -changes how the string is hashed, we can control that a large number of -parameterized messages are hashed only on a small set of possible hash -values, such that a warning such as ``Unicode type received non-unicode -bind param value`` can be tailored to be emitted only a specific number -of times; beyond that, the Python warnings registry will begin recording -them as duplicates. 
+ import warnings + warnings.simplefilter("error") # all warnings raise an exception -To illustrate, the following test script will show only ten warnings being -emitted for ten of the parameter sets, out of a total of 1000:: +Given the above warnings, our statement works just fine, but +to get rid of the warnings we would rewrite our statement as follows:: - from sqlalchemy import create_engine, Unicode, select, cast - import random - import warnings + from sqlalchemy import select, text + stmt = select([ + text("a"), + text("b") + ]).where(text("a = b")).select_from(text("sometable")) - e = create_engine("sqlite://") +and as the warnings suggest, we can give our statement more specificity +about the text if we use :func:`.column` and :func:`.table`:: - # Use the "once" filter (which is also the default for Python - # warnings). Exactly ten of these warnings will - # be emitted; beyond that, the Python warnings registry will accumulate - # new values as dupes of one of the ten existing. - warnings.filterwarnings("once") + from sqlalchemy import select, text, column, table - for i in range(1000): - e.execute(select([cast( - ('foo_%d' % random.randint(0, 1000000)).encode('ascii'), Unicode)])) + stmt = select([column("a"), column("b")]).\ + where(text("a = b")).select_from(table("sometable")) -The format of the warning here is:: +Where note also that :func:`.table` and :func:`.column` can now +be imported from "sqlalchemy" without the "sql" part. - /path/lib/sqlalchemy/sql/sqltypes.py:186: SAWarning: Unicode type received - non-unicode bind param value 'foo_4852'. (this warning may be - suppressed after 10 occurrences) +The behavior here applies to :func:`.select` as well as to key methods +on :class:`.Query`, including :meth:`.Query.filter`, +:meth:`.Query.from_statement` and :meth:`.Query.having`. 
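The blanket ``warnings.simplefilter("error")`` shown above turns every warning into an exception. A narrower filter is sometimes preferable. The following is a minimal sketch using only the standard library; ``StandInSAWarning`` and ``emit_textual_warning`` are hypothetical stand-ins invented for this illustration, and in a real application one would presumably pass ``sqlalchemy.exc.SAWarning`` as the category instead:

```python
import warnings


class StandInSAWarning(RuntimeWarning):
    """Hypothetical stand-in for sqlalchemy.exc.SAWarning."""


def emit_textual_warning(fragment):
    # Mimics the shape of the message quoted above; this is not
    # SQLAlchemy's own code, only an illustration.
    warnings.warn(
        "Textual SQL expression %r should be explicitly declared as text(%r)"
        % (fragment, fragment),
        StandInSAWarning,
        stacklevel=2,
    )


def raises_as_error(fragment):
    """Return True if the warning for ``fragment`` surfaces as an exception."""
    with warnings.catch_warnings():
        # Escalate only this category; other warnings keep their
        # existing handling, and the filter is undone on exit.
        warnings.filterwarnings("error", category=StandInSAWarning)
        try:
            emit_textual_warning(fragment)
        except StandInSAWarning:
            return True
    return False
```

Scoping the filter with ``warnings.catch_warnings()`` keeps the escalation local, which can be convenient in a test suite that wants to fail on un-qualified textual SQL without affecting unrelated warnings.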
+ORDER BY and GROUP BY are special cases +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -:ticket:`3178` +There is one case where usage of a string has special meaning, and as part +of this change we have enhanced its functionality. When we have a +:func:`.select` or :class:`.Query` that refers to some column name or named +label, we might want to GROUP BY and/or ORDER BY known columns or labels:: -.. _feature_2963: + stmt = select([ + user.c.name, + func.count(user.c.id).label("id_count") + ]).group_by("name").order_by("id_count") -.info dictionary improvements ------------------------------ +In the above statement we expect to see "ORDER BY id_count", as opposed to a +re-statement of the function. The string argument given is actively +matched to an entry in the columns clause during compilation, so the above +statement would produce as we expect, without warnings (though note that +the ``"name"`` expression has been resolved to ``users.name``!):: -The :attr:`.InspectionAttr.info` collection is now available on every kind -of object that one would retrieve from the :attr:`.Mapper.all_orm_descriptors` -collection. This includes :class:`.hybrid_property` and :func:`.association_proxy`. -However, as these objects are class-bound descriptors, they must be accessed -**separately** from the class to which they are attached in order to get -at the attribute. Below this is illustrated using the -:attr:`.Mapper.all_orm_descriptors` namespace:: + SELECT users.name, count(users.id) AS id_count + FROM users GROUP BY users.name ORDER BY id_count - class SomeObject(Base): - # ... 
+However, if we refer to a name that cannot be located, then we get +the warning again, as below:: - @hybrid_property - def some_prop(self): - return self.value + 5 + stmt = select([ + user.c.name, + func.count(user.c.id).label("id_count") + ]).order_by("some_label") +The output does what we say, but again it warns us:: - inspect(SomeObject).all_orm_descriptors.some_prop.info['foo'] = 'bar' + SAWarning: Can't resolve label reference 'some_label'; converting to + text() (this warning may be suppressed after 10 occurrences) -It is also available as a constructor argument for all :class:`.SchemaItem` -objects (e.g. :class:`.ForeignKey`, :class:`.UniqueConstraint` etc.) as well -as remaining ORM constructs such as :func:`.orm.synonym`. + SELECT users.name, count(users.id) AS id_count + FROM users ORDER BY some_label -:ticket:`2971` +The above behavior applies to all those places where we might want to refer +to a so-called "label reference"; ORDER BY and GROUP BY, but also within an +OVER clause as well as a DISTINCT ON clause that refers to columns (e.g. the +Postgresql syntax). -:ticket:`2963` +We can still specify any arbitrary expression for ORDER BY or others using +:func:`.text`:: -.. _migration_3177: + stmt = select([users]).order_by(text("some special expression")) -Change to single-table-inheritance criteria when using from_self(), count() ---------------------------------------------------------------------------- +The upshot of the whole change is that SQLAlchemy now would like us +to tell it when a string is sent that this string is explicitly +a :func:`.text` construct, or a column, table, etc., and if we use it as a +label name in an order by, group by, or other expression, SQLAlchemy expects +that the string resolves to something known, else it should again +be qualified with :func:`.text` or similar. -Given a single-table inheritance mapping, such as:: +:ticket:`2992` - class Widget(Base): - __table__ = 'widget_table' +.. 
_change_3163: - class FooWidget(Widget): - pass +Event listeners can not be added or removed from within that event's runner +--------------------------------------------------------------------------- -Using :meth:`.Query.from_self` or :meth:`.Query.count` against a subclass -would produce a subquery, but then add the "WHERE" criteria for subtypes -to the outside:: +Removal of an event listener from inside that same event itself would +modify the elements of a list during iteration, which would cause +still-attached event listeners to silently fail to fire. To prevent +this while still maintaining performance, the lists have been replaced +with ``collections.deque()``, which does not allow any additions or +removals during iteration, and instead raises ``RuntimeError``. - sess.query(FooWidget).from_self().all() +:ticket:`3163` -rendering:: +.. _change_3169: - SELECT - anon_1.widgets_id AS anon_1_widgets_id, - anon_1.widgets_type AS anon_1_widgets_type - FROM (SELECT widgets.id AS widgets_id, widgets.type AS widgets_type, - FROM widgets) AS anon_1 - WHERE anon_1.widgets_type IN (?) +The INSERT...FROM SELECT construct now implies ``inline=True`` +-------------------------------------------------------------- -The issue with this is that if the inner query does not specify all -columns, then we can't add the WHERE clause on the outside (it actually tries, -and produces a bad query). This decision -apparently goes way back to 0.6.5 with the note "may need to make more -adjustments to this". Well, those adjustments have arrived! So now the -above query will render:: +Using :meth:`.Insert.from_select` now implies ``inline=True`` +on :func:`.insert`. 
This helps to fix a bug where an +INSERT...FROM SELECT construct would inadvertently be compiled +as "implicit returning" on supporting backends, which would +cause breakage in the case of an INSERT that inserts zero rows +(as implicit returning expects a row), as well as arbitrary +return data in the case of an INSERT that inserts multiple +rows (e.g. only the first row of many). +A similar change is also applied to an INSERT..VALUES +with multiple parameter sets; implicit RETURNING will no longer emit +for this statement either. As both of these constructs deal +with variable numbers of rows, the +:attr:`.ResultProxy.inserted_primary_key` accessor does not +apply. Previously, there was a documentation note that one +may prefer ``inline=True`` with INSERT..FROM SELECT as some databases +don't support returning and therefore can't do "implicit" returning, +but there's no reason an INSERT...FROM SELECT needs implicit returning +in any case. Regular explicit :meth:`.Insert.returning` should +be used to return variable numbers of result rows if inserted +data is needed. - SELECT - anon_1.widgets_id AS anon_1_widgets_id, - anon_1.widgets_type AS anon_1_widgets_type - FROM (SELECT widgets.id AS widgets_id, widgets.type AS widgets_type, - FROM widgets - WHERE widgets.type IN (?)) AS anon_1 +:ticket:`3169` -So that queries that don't include "type" will still work!:: +.. _change_3027: - sess.query(FooWidget.id).count() +``autoload_with`` now implies ``autoload=True`` +----------------------------------------------- -Renders:: +A :class:`.Table` can be set up for reflection by passing +:paramref:`.Table.autoload_with` alone:: - SELECT count(*) AS count_1 - FROM (SELECT widgets.id AS widgets_id - FROM widgets - WHERE widgets.type IN (?)) AS anon_1 + my_table = Table('my_table', metadata, autoload_with=some_engine) +:ticket:`3027` -:ticket:`3177` Dialect Changes