- added new flag to String and create_engine(), assert_unicode=(True|False|None).

author Mike Bayer <mike_mp@zzzcomputing.com>

Sun, 25 Nov 2007 23:14:03 +0000 (23:14 +0000)

committer Mike Bayer <mike_mp@zzzcomputing.com>

Sun, 25 Nov 2007 23:14:03 +0000 (23:14 +0000)
diff --git a/CHANGES b/CHANGES

index 007691d3ad37c626884e4c11d579f687b1d3d68a..adb61906c2a52ce6d4b3055bc28cd8da55e41c85 100644 (file)
--- a/CHANGES
+++ b/CHANGES
@@ -3,7 +3,14 @@ CHANGES
  =======
  0.4.2
  -----
-
+- sql
+    - added new flag to String and create_engine(), assert_unicode=(True|False|None).
+    When convert_unicode=True, this flag also defaults to `True`, and results in all 
+    unicode conversion operations raising an exception when a non-unicode bytestring
+    is passed as a bind parameter.  It is strongly advised that all unicode-aware
+    applications make proper use of Python unicode objects (i.e. u'hello' and 
+    not 'hello').
+    
  - orm
  
     - fixed endless loop issue when using lazy="dynamic" on both 
diff --git a/doc/build/content/dbengine.txt b/doc/build/content/dbengine.txt

index a8b6e5bce8cda74081f51f202b90529492d30d65..9fc0ae5910c96ffd1a4e5b4ded54a11f4be76cd9 100644 (file)
--- a/doc/build/content/dbengine.txt
+++ b/doc/build/content/dbengine.txt
@@ -126,6 +126,7 @@ Keyword options can also be specified to `create_engine()`, following the string
  
  A list of all standard options, as well as several that are used by particular database dialects, is as follows:
  
+* **assert_unicode=None** - defaults to `True` when `convert_unicode==True`.  This will assert that all incoming string bind parameters are instances of `unicode`.  Only takes effect when `convert_unicode==True`.  Set to `False` to disable unicode assertions when `convert_unicode==True`.  This flag is also available on the `String` type and its descendants. New in 0.4.2.
  * **connect_args** - a dictionary of options which will be passed directly to the DBAPI's `connect()` method as additional keyword arguments.
  * **convert_unicode=False** - if set to True, all String/character based types will convert Unicode values to raw byte values going into the database, and all raw byte values to Python Unicode coming out in result sets.  This is an engine-wide method to provide unicode conversion across the board.  For unicode conversion on a column-by-column level, use the `Unicode` column type instead, described in [types](rel:types).
  * **creator** - a callable which returns a DBAPI connection.  This creation function will be passed to the underlying connection pool and will be used to create all new database connections.  Usage of this function causes connection parameters specified in the URL argument to be bypassed.
diff --git a/doc/build/content/types.txt b/doc/build/content/types.txt

index 4abe508df4a073d7c960be61ff6ca3a029a67ddf..8f9cb1f7c53c57ed3085487939fdb43cc4a5063d 100644 (file)
--- a/doc/build/content/types.txt
+++ b/doc/build/content/types.txt
@@ -5,10 +5,53 @@ The package `sqlalchemy.types` defines the datatype identifiers which may be use
  
  ### Built-in Types {@name=standard}
  
-SQLAlchemy comes with a set of standard generic datatypes, which are defined as classes.   
+SQLAlchemy comes with a set of standard generic datatypes, which are defined as classes.  Types are usually used when defining tables, and can be left as a class or instantiated, for example:
  
-The standard set of generic types are:
+    {python}
+    mytable = Table('mytable', metadata,
+        Column('myid', Integer, primary_key=True),
+        Column('data', String(30)),
+        Column('info', Unicode(100)),
+        Column('value', Numeric(7,4))
+        )
+
+Following is a rundown of the standard types.
+
+#### String
+
+This type is the base type for all string and character types, such as `Unicode`, `Text`, `CLOB`, etc.  By default it generates a VARCHAR in DDL.  It includes an argument `length`, which indicates the length in characters of the type, as well as `convert_unicode` and `assert_unicode`, which are booleans.  `length` will be used as the length argument when generating DDL.  If `length` is omitted, the `String` type resolves into the `Text` type.
+
+`convert_unicode=True` indicates that incoming strings, if they are Python `unicode` strings, will be encoded into a raw bytestring using the `encoding` attribute of the dialect (defaults to `utf-8`).  Similarly, raw bytestrings coming back from the database will be decoded into `unicode` objects on the way back.
+
+`assert_unicode` defaults to `True` when `convert_unicode=True`.  It indicates that incoming bind parameters are checked to be `unicode` objects; if a non-unicode bytestring is passed, an `InvalidRequestError` is raised.  (This flag is new as of version 0.4.2.)
+
+Both `convert_unicode` and `assert_unicode` may be set at the engine level as flags to `create_engine()`.
+
+#### Unicode
+
+The `Unicode` type is shorthand for `String` with `convert_unicode=True` and `assert_unicode=True`.  When writing a unicode-aware application, it is strongly recommended that this type is used, and that only unicode strings are used in the application.  By "unicode string" we mean a string with a `u` prefix, i.e. `u'hello'`.  Otherwise, particularly when using the ORM, data will be converted to unicode when it returns from the database, but data generated locally will not be in unicode format, which can create confusion.
+
+#### Numeric
+
+TODO
+
+#### Float
  
+TODO
+
+#### Datetime/Date/Time
+
+TODO
+
+#### Binary
+
+TODO
+
+#### Boolean
+
+TODO
+
+#### Summary of Types
      {python title="package sqlalchemy.types"}
      class String(TypeEngine):
          def __init__(self, length=None)
@@ -66,17 +109,6 @@ More specific subclasses of these types are available, which various database en
  When using a specific database engine, these types are adapted even further via a set of database-specific subclasses defined by the database engine.
  There may eventually be more type objects that are defined for specific databases.  An example of this would be Postgres' Array type.
  
-Type objects are specified to table meta data using either the class itself, or an instance of the class.  Creating an instance of the class allows you to specify parameters for the type, such as string length, numerical precision, etc.:
-
-    {python}
-    mytable = Table('mytable', engine, 
-        # define type using a class
-        Column('my_id', Integer, primary_key=True), 
-        
-        # define type using an object instance
-        Column('value', Number(7,4)) 
-    )
-
  ### Dialect Specific Types {@name=dialect}
  
  Each dialect has its own set of types, many of which are available only within that dialect.  For example, MySQL has a `BigInteger` type and Postgres has an `Inet` type.  To use these, import them from the module explicitly:
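The round trip that the `String` documentation above describes for `convert_unicode=True` can be illustrated without a database. This is only a Python 3 sketch under stated assumptions: `str` plays the role of Python 2 `unicode`, `bytes` plays the raw bytestring, and `to_database` / `from_database` are hypothetical helper names, not SQLAlchemy functions.

```python
ENCODING = "utf-8"  # assumption: mirrors the dialect's default `encoding`


def to_database(value, encoding=ENCODING):
    """Encode text into a raw bytestring on the way in; pass other values through."""
    return value.encode(encoding) if isinstance(value, str) else value


def from_database(value, encoding=ENCODING):
    """Decode a raw bytestring into text on the way out; pass other values through."""
    return value.decode(encoding) if isinstance(value, bytes) else value


# Text goes in as bytes and comes back out as text.
stored = to_database("dr\u00f4le")
restored = from_database(stored)
```

Here `stored` holds the UTF-8 bytes and `restored` is the original text again, which is the behaviour the `Unicode` type is meant to guarantee per column.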
diff --git a/lib/sqlalchemy/engine/default.py b/lib/sqlalchemy/engine/default.py

index 19ab22c9e9d1857d9c91dd494010b045868a79be..fab5d05b051de887d8a0f22ff9e928b48b54af08 100644 (file)
--- a/lib/sqlalchemy/engine/default.py
+++ b/lib/sqlalchemy/engine/default.py
@@ -35,8 +35,9 @@ class DefaultDialect(base.Dialect):
      supports_pk_autoincrement = True
      dbapi_type_map = {}
      
-    def __init__(self, convert_unicode=False, encoding='utf-8', default_paramstyle='named', paramstyle=None, dbapi=None, **kwargs):
+    def __init__(self, convert_unicode=False, assert_unicode=None, encoding='utf-8', default_paramstyle='named', paramstyle=None, dbapi=None, **kwargs):
          self.convert_unicode = convert_unicode
+        self.assert_unicode = assert_unicode
          self.encoding = encoding
          self.positional = False
          self._ischema = None
diff --git a/lib/sqlalchemy/types.py b/lib/sqlalchemy/types.py

index 0f93b8e349b983ebfc427847a8bd1f844c649e9b..ac9a36195bf37df34122d1c404543e30e822dc8a 100644 (file)
--- a/lib/sqlalchemy/types.py
+++ b/lib/sqlalchemy/types.py
@@ -300,18 +300,27 @@ class Concatenable(object):
              return op
  
  class String(Concatenable, TypeEngine):
-    def __init__(self, length=None, convert_unicode=False):
+    def __init__(self, length=None, convert_unicode=False, assert_unicode=None):
          self.length = length
          self.convert_unicode = convert_unicode
+        self.assert_unicode = assert_unicode
  
      def adapt(self, impltype):
          return impltype(length=self.length, convert_unicode=self.convert_unicode)
  
      def bind_processor(self, dialect):
          if self.convert_unicode or dialect.convert_unicode:
+            if self.assert_unicode is not None:
+                assert_unicode = self.assert_unicode
+            elif dialect.assert_unicode is not None:
+                assert_unicode = dialect.assert_unicode
+            else:
+                assert_unicode = True
              def process(value):
                  if isinstance(value, unicode):
                      return value.encode(dialect.encoding)
+                elif assert_unicode:
+                    raise exceptions.InvalidRequestError("Received non-unicode bind param value %r" % value)
                  else:
                      return value
              return process
@@ -344,7 +353,7 @@ class String(Concatenable, TypeEngine):
  
  class Unicode(String):
      def __init__(self, length=None, **kwargs):
-        kwargs['convert_unicode'] = True
+        kwargs['convert_unicode'] = kwargs['assert_unicode'] = True
          super(Unicode, self).__init__(length=length, **kwargs)
  
  class Integer(TypeEngine):
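The precedence added to `bind_processor` above (type-level flag first, then the dialect-level flag, then a default of `True`) can be sketched in isolation. This is a minimal Python 3 analogue, not the library's code: `str` stands in for Python 2 `unicode`, `bytes` for the raw bytestring, `ValueError` for `InvalidRequestError`, and `resolve_assert_unicode` and `make_bind_processor` are hypothetical names chosen for illustration.

```python
def resolve_assert_unicode(type_flag, dialect_flag):
    """Return the effective flag: the type setting wins, then the dialect, else True."""
    if type_flag is not None:
        return type_flag
    if dialect_flag is not None:
        return dialect_flag
    return True


def make_bind_processor(type_flag=None, dialect_flag=None, encoding="utf-8"):
    """Build a processor that encodes text and optionally rejects non-text values."""
    assert_unicode = resolve_assert_unicode(type_flag, dialect_flag)

    def process(value):
        if isinstance(value, str):
            # Text is encoded with the configured encoding.
            return value.encode(encoding)
        elif assert_unicode:
            # Stand-in for InvalidRequestError in the real code.
            raise ValueError("Received non-unicode bind param value %r" % (value,))
        else:
            # Assertion disabled: the value passes through unchanged.
            return value

    return process
```

With both flags left at `None`, a bytestring is rejected; passing `dialect_flag=False` lets it through unchanged, which corresponds to setting `assert_unicode=False` on the engine.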
diff --git a/test/sql/testtypes.py b/test/sql/testtypes.py

index 630ecb9d53a711952208db21a849b4f9bb9bec99..7a9add947135c87847f5324aa6d08ab740c182da 100644 (file)
--- a/test/sql/testtypes.py
+++ b/test/sql/testtypes.py
@@ -2,7 +2,7 @@ import testbase
  import pickleable
  import datetime, os
  from sqlalchemy import *
-from sqlalchemy import types
+from sqlalchemy import types, exceptions
  from sqlalchemy.sql import operators
  import sqlalchemy.engine.url as url
  from sqlalchemy.databases import mssql, oracle, mysql, postgres, firebird
@@ -178,9 +178,9 @@ class UserDefinedTest(PersistTest):
      def testprocessing(self):
  
          global users
-        users.insert().execute(user_id = 2, goofy = 'jack', goofy2='jack', goofy3='jack', goofy4='jack', goofy5='jack', goofy6='jack')
-        users.insert().execute(user_id = 3, goofy = 'lala', goofy2='lala', goofy3='lala', goofy4='lala', goofy5='lala', goofy6='lala')
-        users.insert().execute(user_id = 4, goofy = 'fred', goofy2='fred', goofy3='fred', goofy4='fred', goofy5='fred', goofy6='fred')
+        users.insert().execute(user_id = 2, goofy = 'jack', goofy2='jack', goofy3='jack', goofy4=u'jack', goofy5=u'jack', goofy6='jack')
+        users.insert().execute(user_id = 3, goofy = 'lala', goofy2='lala', goofy3='lala', goofy4=u'lala', goofy5=u'lala', goofy6='lala')
+        users.insert().execute(user_id = 4, goofy = 'fred', goofy2='fred', goofy3='fred', goofy4=u'fred', goofy5=u'fred', goofy6='fred')
  
          l = users.select().execute().fetchall()
          assert l == [
@@ -286,7 +286,13 @@ class UnicodeTest(AssertMixin):
              print "it's %s!" % testbase.db.name
          else:
              self.assert_(not isinstance(x['plain_varchar'], unicode) and x['plain_varchar'] == rawdata)
-
+    
+    def testassert(self):
+        try:
+            unicode_table.insert().execute(unicode_varchar='im not unicode')
+        except exceptions.InvalidRequestError, e:
+            assert str(e) == "Received non-unicode bind param value 'im not unicode'"
+        
      @testing.unsupported('oracle')
      def testblanks(self):
          unicode_table.insert().execute(unicode_varchar=u'')
@@ -295,8 +301,10 @@ class UnicodeTest(AssertMixin):
      def testengineparam(self):
          """tests engine-wide unicode conversion"""
          prev_unicode = testbase.db.engine.dialect.convert_unicode
+        prev_assert = testbase.db.engine.dialect.assert_unicode
          try:
              testbase.db.engine.dialect.convert_unicode = True
+            testbase.db.engine.dialect.assert_unicode = False
              rawdata = 'Alors vous imaginez ma surprise, au lever du jour, quand une dr\xc3\xb4le de petit voix m\xe2\x80\x99a r\xc3\xa9veill\xc3\xa9. Elle disait: \xc2\xab S\xe2\x80\x99il vous pla\xc3\xaet\xe2\x80\xa6 dessine-moi un mouton! \xc2\xbb\n'
              unicodedata = rawdata.decode('utf-8')
              unicode_table.insert().execute(unicode_varchar=unicodedata,
@@ -312,6 +320,7 @@ class UnicodeTest(AssertMixin):
              self.assert_(isinstance(x['plain_varchar'], unicode) and x['plain_varchar'] == unicodedata)
          finally:
              testbase.db.engine.dialect.convert_unicode = prev_unicode
+            testbase.db.engine.dialect.assert_unicode = prev_assert
  
      @testing.unsupported('oracle')
      def testlength(self):
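One caveat with the try/except shape used in `testassert`: if no exception is raised at all, the test passes silently. A stricter variant is sketched below in Python 3 with the standard `unittest` module. It is not part of this commit; `insert_value` is a hypothetical stand-in for `unicode_table.insert().execute(...)`, and `ValueError` stands in for `exceptions.InvalidRequestError`.

```python
import unittest


def insert_value(value):
    """Hypothetical stand-in for the insert; rejects anything that is not text."""
    if not isinstance(value, str):
        raise ValueError("Received non-unicode bind param value %r" % (value,))
    return value


class AssertUnicodeTest(unittest.TestCase):
    def test_rejects_bytes(self):
        # assertRaises fails the test when no exception is raised at all.
        with self.assertRaises(ValueError) as ctx:
            insert_value(b"im not unicode")
        self.assertEqual(
            str(ctx.exception),
            "Received non-unicode bind param value b'im not unicode'",
        )
```

In the Python 2 code of this era the same effect could be had by adding an unconditional failure right after the `execute()` call inside the `try` block.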