beefed up documentation for count(), [ticket:1465]

author Mike Bayer <mike_mp@zzzcomputing.com>

Sat, 25 Jul 2009 18:54:20 +0000 (18:54 +0000)

committer Mike Bayer <mike_mp@zzzcomputing.com>

Sat, 25 Jul 2009 18:54:20 +0000 (18:54 +0000)
diff --git a/doc/build/ormtutorial.rst b/doc/build/ormtutorial.rst

index acdbba149eb6bbd24da10f4158156142710dcc36..c10d457f14949ae176ac4a9d1516ff6afbb6f374 100644 (file)
--- a/doc/build/ormtutorial.rst
+++ b/doc/build/ormtutorial.rst
@@ -567,6 +567,54 @@ To use an entirely string-based statement, using ``from_statement()``; just ensu
      ['ed']
      {stop}[<User('ed','Ed Jones', 'f8s7ccs')>]
  
+Counting
+--------
+
+``Query`` includes a convenience method for counting called ``count()``:
+
+.. sourcecode:: python+sql
+
+    {sql}>>> session.query(User).filter(User.name.like('%ed')).count()
+    SELECT count(1) AS count_1 
+    FROM users 
+    WHERE users.name LIKE ?
+    ['%ed']
+    {stop}2
+    
+The ``count()`` method is used to determine how many rows the SQL statement would return, and is mainly intended to return a simple count of a single type of entity, in this case ``User``.   For more complicated sets of columns or entities where the "thing to be counted" needs to be indicated more specifically, ``count()`` is probably not what you want.  Below, a query for individual columns does return the expected result:
+
+.. sourcecode:: python+sql
+
+    {sql}>>> session.query(User.id, User.name).filter(User.name.like('%ed')).count()
+    SELECT count(1) AS count_1 
+    FROM (SELECT users.id AS users_id, users.name AS users_name 
+    FROM users 
+    WHERE users.name LIKE ?) AS anon_1
+    ['%ed']
+    {stop}2
+
+...but the generated SQL shows that SQLAlchemy, seeing individual column expressions, wrapped the whole statement in a subquery to guarantee that it returns the "number of rows".   This defensive behavior is not needed here, and in other cases it is not what we want at all, such as when we want a count for each name:
+
+.. sourcecode:: python+sql
+
+    {sql}>>> session.query(User.name).group_by(User.name).count()
+    SELECT count(1) AS count_1 
+    FROM (SELECT users.name AS users_name 
+    FROM users GROUP BY users.name) AS anon_1
+    []
+    {stop}4
+
+Here we don't want the number ``4``; we want one row back per name, each with its own count.   So for detailed queries where you need to count something specific, use the ``func.count()`` function as a column expression:
+
+.. sourcecode:: python+sql
+    
+    >>> from sqlalchemy import func
+    {sql}>>> session.query(func.count(User.name), User.name).group_by(User.name).all()
+    SELECT count(users.name) AS count_1, users.name AS users_name 
+    FROM users GROUP BY users.name
+    []
+    {stop}[(1, u'ed'), (1, u'fred'), (1, u'mary'), (1, u'wendy')]
+
  Building a Relation
  ====================
  
@@ -824,7 +872,7 @@ The ``Query`` is suitable for generating statements which can be used as subquer
          (SELECT user_id, count(*) AS address_count FROM addresses GROUP BY user_id) AS adr_count
          ON users.id=adr_count.user_id
  
-Using the ``Query``, we build a statement like this from the inside out.  The ``statement`` accessor returns a SQL expression representing the statement generated by a particular ``Query`` - this is an instance of a ``select()`` construct, which are described in :ref:`sql`::
+Using the ``Query``, we build a statement like this from the inside out.  The ``statement`` accessor returns a SQL expression representing the statement generated by a particular ``Query`` - this is an instance of a ``select()`` construct, which are described in :ref:`sqlexpression_toplevel`::
  
      >>> from sqlalchemy.sql import func
      >>> stmt = session.query(Address.user_id, func.count('*').label('address_count')).group_by(Address.user_id).subquery()
diff --git a/lib/sqlalchemy/orm/query.py b/lib/sqlalchemy/orm/query.py

index c0eb3b02adf898c54f86a7d2d56671e597a2d27a..043ee15683851ffd81441f71f1764d45f46f91e7 100644 (file)
--- a/lib/sqlalchemy/orm/query.py
+++ b/lib/sqlalchemy/orm/query.py
@@ -1450,18 +1450,24 @@ class Query(object):
                  kwargs.get('distinct', False))
  
      def count(self):
-        """Apply this query's criterion to a SELECT COUNT statement.
-
-        If column expressions or LIMIT/OFFSET/DISTINCT are present,
-        the query "SELECT count(1) FROM (SELECT ...)" is issued,
-        so that the result matches the total number of rows
-        this query would return.  For mapped entities,
-        the primary key columns of each is written to the
-        columns clause of the nested SELECT statement.
-
-        For a Query which is only against mapped entities,
-        a simpler "SELECT count(1) FROM table1, table2, ...
-        WHERE criterion" is issued.
+        """Return a count of rows this Query would return.
+        
+        For simple entity queries, count() issues
+        a SELECT COUNT, and will specifically count the primary
+        key column of the first entity only.  If the query uses 
+        LIMIT, OFFSET, or DISTINCT, count() will wrap the statement 
+        generated by this Query in a subquery, from which a SELECT COUNT
+        is issued, so that the contract of "how many rows
+        would be returned?" is honored.
+        
+        For queries that request specific columns or expressions, 
+        count() again makes no assumptions about those expressions
+        and will wrap everything in a subquery.  Therefore,
+        ``Query.count()`` is usually not what you want in this case.   
+        To count specific columns, often in conjunction with 
+        GROUP BY, use ``func.count()`` as an individual column expression
+        instead of ``Query.count()``.  See the ORM tutorial
+        for an example.
  
          """
          should_nest = [self._should_nest_selectable]
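
The SQL-level behavior documented by this change can be reproduced without SQLAlchemy. The sketch below is an illustration only and is not part of the commit. It assumes Python's standard `sqlite3` module and a hypothetical in-memory `users` table holding the tutorial's four names. The GROUP BY statements are copied from the tutorial output, with an `ORDER BY` added for deterministic ordering. The LIMIT statements are written by hand to mirror the docstring's description and are not captured SQLAlchemy output.

```python
import sqlite3

# Hypothetical in-memory table mirroring the tutorial's four users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [("ed",), ("wendy",), ("mary",), ("fred",)],
)

# Shape of the statement shown for Query.count() on a GROUP BY query:
# the grouped SELECT is wrapped in a subquery, so we get the number of groups.
wrapped_group_count = conn.execute(
    "SELECT count(1) AS count_1 "
    "FROM (SELECT users.name AS users_name "
    "FROM users GROUP BY users.name) AS anon_1"
).fetchone()[0]

# Shape of the statement shown for func.count(User.name) used as a column
# expression: one row per name (ORDER BY added here for stable output).
per_name_counts = conn.execute(
    "SELECT count(users.name) AS count_1, users.name AS users_name "
    "FROM users GROUP BY users.name ORDER BY users.name"
).fetchall()

# Why LIMIT needs the subquery: without it, LIMIT applies to the single
# aggregate row rather than to the rows being counted.
unwrapped_limit = conn.execute(
    "SELECT count(1) FROM users LIMIT 2"
).fetchone()[0]
wrapped_limit = conn.execute(
    "SELECT count(1) FROM (SELECT users.id FROM users LIMIT 2) AS anon_1"
).fetchone()[0]

print(wrapped_group_count)  # 4
print(per_name_counts)      # [(1, 'ed'), (1, 'fred'), (1, 'mary'), (1, 'wendy')]
print(unwrapped_limit)      # 4
print(wrapped_limit)        # 2
```

The wrapped form answers "how many rows would this statement return?", which is the contract the new docstring describes. The column-expression form is the one to reach for when the counts themselves are the rows you want back.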