From: Mike Bayer
Date: Thu, 12 Mar 2015 15:23:44 +0000 (-0400)
Subject: - add a rationale section
X-Git-Tag: rel_1_0_0b1~9
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=6b76352f46644b4fc978f752bfd9a5d91f316c75;p=thirdparty%2Fsqlalchemy%2Fsqlalchemy.git

- add a rationale section
---

diff --git a/doc/build/orm/extensions/baked.rst b/doc/build/orm/extensions/baked.rst
index 2fd930c3d2..ad6fbf090f 100644
--- a/doc/build/orm/extensions/baked.rst
+++ b/doc/build/orm/extensions/baked.rst
@@ -102,6 +102,7 @@ Following are some observations about the above code:
 SQL string, we use :func:`.bindparam` to construct named parameters,
 where we apply their actual values later using :meth:`.Result.params`.
 
+
 Performance
 -----------
 
@@ -159,6 +160,146 @@ as being impacted by this particular form of overhead.
 measurement techniques are used when attempting to improve the performance
 of an application.
 
+Rationale
+---------
+
+The "lambda" approach above is a superset of what would be a more
+traditional "parameterized" approach. Suppose we wished to build
+a simple system where we build a :class:`~.query.Query` just once, then
+store it in a dictionary for re-use. This is possible right now by
+just building up the query, and removing its :class:`.Session` by calling
+``my_cached_query = query.with_session(None)``::
+
+    my_simple_cache = {}
+
+    def lookup(session, id_argument):
+        if "my_key" not in my_simple_cache:
+            query = session.query(Model).filter(Model.id == bindparam('id'))
+            my_simple_cache["my_key"] = query.with_session(None)
+        else:
+            query = my_simple_cache["my_key"].with_session(session)
+
+        return query.params(id=id_argument).all()
+
+The above approach gets us a very minimal performance benefit.
+By re-using a :class:`~.query.Query`, we save on the Python work within
+the ``session.query(Model)`` constructor as well as the call to
+``filter(Model.id == bindparam('id'))``, which skips both the building
+up of the Core expression and the sending of it to :meth:`.Query.filter`.
+However, the approach still regenerates the full :class:`.Select`
+object every time :meth:`.Query.all` is called, and additionally this
+brand new :class:`.Select` is sent off to the string compilation step every
+time, which for a simple case like the above is probably about 70% of the
+overhead.
+
+We can use the "bakery" approach to re-frame the above in a way that
+looks less unusual than the "building up lambdas" approach, and more like
+a simple improvement upon the simple "reuse a query" approach::
+
+    bakery = baked.bakery()
+
+    def lookup(session, id_argument):
+        def create_model_query(session):
+            return session.query(Model).filter(Model.id == bindparam('id'))
+
+        parameterized_query = bakery.bake(create_model_query)
+        return parameterized_query(session).params(id=id_argument).all()
+
+Above, we use the "baked" system in a manner that is
+very similar to the simplistic "cache a query" system. However, it
+uses two fewer lines of code, does not need to manufacture a cache key of
+"my_key", and caches **100%** of the Python invocation work from the
+constructor of the query, to the filter call, to the production
+of the :class:`.Select` object, to the string compilation step.
+
+From the above, if we ask ourselves, "what if lookup needs to make conditional decisions
+as to the structure of the query?", this is where hopefully it becomes apparent
+why "baked" is the way it is. Instead of a parameterized query building
+off from exactly one function (which is how we thought baked might work
+originally), we can build it from *any number* of functions.
+Consider our naive example if we needed to have an additional clause in our
+query on a conditional basis::
+
+    my_simple_cache = {}
+
+    def lookup(session, id_argument, include_frobnizzle=False):
+        if include_frobnizzle:
+            cache_key = "my_key_with_frobnizzle"
+        else:
+            cache_key = "my_key_without_frobnizzle"
+
+        if cache_key not in my_simple_cache:
+            query = session.query(Model).filter(Model.id == bindparam('id'))
+            if include_frobnizzle:
+                query = query.filter(Model.frobnizzle == True)
+
+            my_simple_cache[cache_key] = query.with_session(None)
+        else:
+            query = my_simple_cache[cache_key].with_session(session)
+
+        return query.params(id=id_argument).all()
+
+Our "simple" parameterized system must now be tasked with generating
+cache keys which take into account whether or not the "include_frobnizzle"
+flag was passed, as the presence of this flag means that the generated
+SQL would be entirely different. It should be apparent that as the
+complexity of query building goes up, the task of caching these queries
+becomes burdensome very quickly. We can convert the above example
+into a direct use of "bakery" as follows::
+
+
+    bakery = baked.bakery()
+
+    def lookup(session, id_argument, include_frobnizzle=False):
+        def create_model_query(session):
+            return session.query(Model).filter(Model.id == bindparam('id'))
+
+        parameterized_query = bakery.bake(create_model_query)
+
+        if include_frobnizzle:
+            def include_frobnizzle_in_query(query):
+                return query.filter(Model.frobnizzle == True)
+
+            parameterized_query = parameterized_query.with_criteria(
+                include_frobnizzle_in_query)
+
+        return parameterized_query(session).params(id=id_argument).all()
+
+Above, we again cache not just the query object but all the work it needs
+to do in order to generate SQL.
+We also no longer need to deal with making sure we generate a cache key that
+accurately takes into account all of the structural modifications we've made;
+this is now handled automatically and without the chance of mistakes.
+
+This code sample is a few lines shorter than the naive example, removes
+the need to deal with cache keys, and is vastly more performant. But it is
+still a little verbose! Hence we take methods like :meth:`.BakedQuery.add_criteria`
+and :meth:`.BakedQuery.with_criteria` and shorten them into operators, and
+encourage (though certainly not require!) using simple lambdas, only as a
+means to reduce verbosity::
+
+    bakery = baked.bakery()
+
+    def lookup(session, id_argument, include_frobnizzle=False):
+        parameterized_query = bakery.bake(
+            lambda s: s.query(Model).filter(Model.id == bindparam('id'))
+        )
+
+        if include_frobnizzle:
+            parameterized_query += lambda q: q.filter(Model.frobnizzle == True)
+
+        return parameterized_query(session).params(id=id_argument).all()
+
+Above, we have an approach to our naive caching example
+that is vastly more performant, simpler to implement, and much more similar
+in code flow to what a non-cached querying function would look like,
+hence making code easier to port.
+
+The above description is essentially a summary of the design process used
+to arrive at the current "baked" approach. Starting from the
+"normal" approaches, the additional issues of cache key construction and
+management, removal of all redundant Python execution, and queries built up
+with conditionals needed to be addressed, leading to the final approach.
 
 Lazy Loading Integration
 ------------------------