From: jonathan vanasco
Date: Mon, 27 Sep 2021 16:41:24 +0000 (-0400)
Subject: Add new sections regarding schemas and reflection
X-Git-Tag: rel_2_0_0b1~648^2
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=0fa0beacb465c61e792c97d530a0e8fdd7139256;p=thirdparty%2Fsqlalchemy%2Fsqlalchemy.git

Add new sections regarding schemas and reflection

* add a new section to reflection.rst `Schemas and Reflection`.
* this contains some text from the ticket
* migrate some text from `Specifying the Schema Name` to new section
* migrate some text from PostgreSQL dialect to new section
* target text is made more generic
* cross-reference the postgres and new sections to one another,
  to avoid duplication of docs
* update some docs 'meta' to 'metadata_obj'

Fixes: #4387
Co-authored-by: Mike Bayer
Change-Id: I2b08672753fb2575d30ada07ead77587468fdade
---

diff --git a/doc/build/changelog/migration_12.rst b/doc/build/changelog/migration_12.rst
index f0b88c4936..bc1d0739e9 100644
--- a/doc/build/changelog/migration_12.rst
+++ b/doc/build/changelog/migration_12.rst
@@ -1048,7 +1048,7 @@ localized to the current VALUES clause being processed::
     def mydefault(context):
         return context.get_current_parameters()['counter'] + 12
 
-    mytable = Table('mytable', meta,
+    mytable = Table('mytable', metadata_obj,
         Column('counter', Integer),
         Column('counter_plus_twelve', Integer,
                default=mydefault, onupdate=mydefault)
diff --git a/doc/build/core/metadata.rst b/doc/build/core/metadata.rst
index 86a8f6de34..c7316d1b65 100644
--- a/doc/build/core/metadata.rst
+++ b/doc/build/core/metadata.rst
@@ -284,11 +284,11 @@ remote servers (Oracle DBLINK with synonyms).
 
 What all of the above approaches have (mostly) in common is that there's a way
 of referring to this alternate set of tables using a string name.  SQLAlchemy
-refers to this name as the **schema name**. Within SQLAlchemy, this is nothing more than
-a string name which is associated with a :class:`_schema.Table` object, and
-is then rendered into SQL statements in a manner appropriate to the target
-database such that the table is referred towards in its remote "schema", whatever
-mechanism that is on the target database.
+refers to this name as the **schema name**. Within SQLAlchemy, this is nothing
+more than a string name which is associated with a :class:`_schema.Table`
+object, and is then rendered into SQL statements in a manner appropriate to the
+target database such that the table is referred towards in its remote "schema",
+whatever mechanism that is on the target database.
 
 The "schema" name may be associated directly with a :class:`_schema.Table`
 using the :paramref:`_schema.Table.schema` argument; when using the ORM
@@ -298,11 +298,27 @@ the parameter is passed using the ``__table_args__`` parameter dictionary.
 
 The "schema" name may also be associated with the :class:`_schema.MetaData`
 object where it will take effect automatically for all :class:`_schema.Table`
 objects associated with that :class:`_schema.MetaData` that don't otherwise
-specify their own name. Finally, SQLAlchemy also supports a "dynamic" schema name
+specify their own name. Finally, SQLAlchemy also supports a "dynamic" schema name
 system that is often used for multi-tenant applications such that a single set
 of :class:`_schema.Table` metadata may refer to a dynamically configured set of
 schema names on a per-connection or per-statement basis.
 
+.. topic:: What's "schema"?
+
+    SQLAlchemy's support for database "schema" was designed with first party
+    support for PostgreSQL-style schemas.  In this style, there is first a
+    "database" that typically has a single "owner".  Within this database there
+    can be any number of "schemas" which then contain the actual table objects.
+
+    A table within a specific schema is referred towards explicitly using the
+    syntax "<schemaname>.<tablename>".  Contrast this to an architecture such
+    as that of MySQL, where there are only "databases"; however, SQL statements
+    can refer to multiple databases at once, using the same syntax except it
+    is "<database>.<tablename>".  On Oracle, this syntax refers to yet another
+    concept, the "owner" of a table.  Regardless of which kind of database is
+    in use, SQLAlchemy uses the phrase "schema" to refer to the qualifying
+    identifier within the general syntax of "<qualifier>.<tablename>".
+
 .. seealso::
 
     :ref:`orm_declarative_table_schema_name` - schema name specification when
     using the ORM
@@ -368,6 +384,8 @@ at once, such as::
     :ref:`multipart_schema_names` - describes use of dotted schema names with
     the SQL Server dialect.
 
+    :ref:`schema_table_reflection`
+
 
 .. _schema_metadata_schema_name:
@@ -438,11 +456,11 @@ to specify that it should not be schema qualified may use the special symbol
         schema=BLANK_SCHEMA  # will not use "remote_banks"
     )
 
-
 .. seealso::
 
     :paramref:`_schema.MetaData.schema`
 
+
 .. _schema_dynamic_naming_convention:
 
 Applying Dynamic Schema Naming Conventions
@@ -454,11 +472,11 @@ basis, so that for example in multi-tenant situations, each transaction or
 statement may be targeted at a specific set of schema names that change.
 
 The section :ref:`schema_translating` describes how this feature is used.
 
-
 .. seealso::
 
     :ref:`schema_translating`
 
+
 .. _schema_set_default_connections:
 
 Setting a Default Schema for New Connections
@@ -506,6 +524,17 @@ for specific information regarding how default schemas are configured.
     :ref:`postgresql_alternate_search_path` - in the :ref:`postgresql_toplevel`
     dialect documentation.
 
+
+
+
+Schemas and Reflection
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The schema feature of SQLAlchemy interacts with the table reflection
+feature introduced at :ref:`metadata_reflection_toplevel`.  See the section
+:ref:`metadata_reflection_schemas` for additional details on how this works.
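As a runnable aside (not part of the patch itself), the two behaviors the metadata.rst changes above describe — a fixed schema name on a :class:`_schema.Table`, and "dynamic" per-connection translation of that name — can be sketched as follows. The ``financial_info`` table and ``remote_banks`` schema echo the examples above; translating the schema to ``None`` is an assumption made here so the sketch can run against a plain SQLite database:

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine, select

metadata_obj = MetaData()

# a Table associated with an explicit schema name; the string is rendered
# into SQL as a qualifier, e.g. "remote_banks.financial_info"
financial_info = Table(
    "financial_info",
    metadata_obj,
    Column("id", Integer, primary_key=True),
    schema="remote_banks",
)

# the compiled SELECT refers to the table as "remote_banks.financial_info"
print(select(financial_info))

# "dynamic" schema translation: remap "remote_banks" per connection.
# Translating to None drops the qualifier entirely, which also lets this
# sketch run on SQLite, which has no "remote_banks" schema.
engine = create_engine("sqlite://")
with engine.connect().execution_options(
    schema_translate_map={"remote_banks": None}
) as conn:
    metadata_obj.create_all(conn)
    conn.execute(financial_info.insert().values(id=1))
    rows = conn.execute(select(financial_info)).all()

print(rows)
```

In a multi-tenant setup, the same translation map would instead point ``"remote_banks"`` at a per-tenant schema name rather than ``None``.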
+
+
 Backend-Specific Options
 ------------------------
diff --git a/doc/build/core/reflection.rst b/doc/build/core/reflection.rst
index 796bd414f9..ec9073138a 100644
--- a/doc/build/core/reflection.rst
+++ b/doc/build/core/reflection.rst
@@ -13,7 +13,7 @@ existing within the database. This process is called *reflection*. In the most
 simple case you need only specify the table name, a
 :class:`~sqlalchemy.schema.MetaData` object, and the ``autoload_with``
 argument::
 
-    >>> messages = Table('messages', meta, autoload_with=engine)
+    >>> messages = Table('messages', metadata_obj, autoload_with=engine)
     >>> [c.name for c in messages.columns]
     ['message_id', 'message_name', 'date']
 
@@ -30,8 +30,8 @@ Below, assume the table ``shopping_cart_items`` references a table named
 ``shopping_carts``. Reflecting the ``shopping_cart_items`` table has the
 effect such that the ``shopping_carts`` table will also be loaded::
 
-    >>> shopping_cart_items = Table('shopping_cart_items', meta, autoload_with=engine)
-    >>> 'shopping_carts' in meta.tables:
+    >>> shopping_cart_items = Table('shopping_cart_items', metadata_obj, autoload_with=engine)
+    >>> 'shopping_carts' in metadata_obj.tables
     True
 
 The :class:`~sqlalchemy.schema.MetaData` has an interesting "singleton-like"
@@ -43,7 +43,7 @@ you the already-existing :class:`~sqlalchemy.schema.Table` object if one
 already exists with the given name. Such as below, we can access the already
 generated ``shopping_carts`` table just by naming it::
 
-    shopping_carts = Table('shopping_carts', meta)
+    shopping_carts = Table('shopping_carts', metadata_obj)
 
 Of course, it's a good idea to use ``autoload_with=engine`` with the above
 table regardless.  This is so that the table's attributes will be loaded if
 they have
@@ -61,7 +61,7 @@ Individual columns can be overridden with explicit values when reflecting
 tables; this is handy for specifying custom datatypes, constraints such
 as primary keys that may not be configured within the database, etc.::
 
-    >>> mytable = Table('mytable', meta,
+    >>> mytable = Table('mytable', metadata_obj,
     ...    Column('id', Integer, primary_key=True),   # override reflected 'id' to have primary key
     ...    Column('mydata', Unicode(50)),    # override reflected 'mydata' to be Unicode
     ...    # additional Column objects which require no change are reflected normally
@@ -119,6 +119,219 @@ object's dictionary of tables::
 
     for table in reversed(metadata_obj.sorted_tables):
         someengine.execute(table.delete())
 
+.. _metadata_reflection_schemas:
+
+Reflecting Tables from Other Schemas
+------------------------------------
+
+The section :ref:`schema_table_schema_name` introduces the concept of table
+schemas, which are namespaces within a database that contain tables and other
+objects, and which can be specified explicitly. The "schema" for a
+:class:`_schema.Table` object, as well as for other objects like views, indexes and
+sequences, can be set up using the :paramref:`_schema.Table.schema` parameter,
+and also as the default schema for a :class:`_schema.MetaData` object using the
+:paramref:`_schema.MetaData.schema` parameter.
+
+The use of this schema parameter directly affects where the table reflection
+feature will look when it is asked to reflect objects. For example, given
+a :class:`_schema.MetaData` object configured with a default schema name
+"project" via its :paramref:`_schema.MetaData.schema` parameter::
+
+    >>> metadata_obj = MetaData(schema="project")
+
+The :meth:`.MetaData.reflect` method will then utilize that configured
+``.schema`` for reflection::
+
+    >>> # uses `schema` configured in metadata_obj
+    >>> metadata_obj.reflect(someengine)
+
+The end result is that :class:`_schema.Table` objects from the "project"
+schema will be reflected, and they will be populated as schema-qualified
+with that name::
+
+    >>> metadata_obj.tables['project.messages']
+    Table('messages', MetaData(), Column('message_id', INTEGER(), table=<messages>), schema='project')
+
+Similarly, an individual :class:`_schema.Table` object that includes the
+:paramref:`_schema.Table.schema` parameter will also be reflected from that
+database schema, overriding any default schema that may have been configured on the
+owning :class:`_schema.MetaData` collection::
+
+    >>> messages = Table('messages', metadata_obj, schema="project", autoload_with=someengine)
+    >>> messages
+    Table('messages', MetaData(), Column('message_id', INTEGER(), table=<messages>), schema='project')
+
+Finally, the :meth:`_schema.MetaData.reflect` method itself also allows a
+:paramref:`_schema.MetaData.reflect.schema` parameter to be passed, so we
+could also load tables from the "project" schema for a default configured
+:class:`_schema.MetaData` object::
+
+    >>> metadata_obj = MetaData()
+    >>> metadata_obj.reflect(someengine, schema="project")
+
+We can call :meth:`_schema.MetaData.reflect` any number of times with different
+:paramref:`_schema.MetaData.reflect.schema` arguments (or none at all) to continue
+populating the :class:`_schema.MetaData` object with more objects::
+
+    >>> # add tables from the "customer" schema
+    >>> metadata_obj.reflect(someengine, schema="customer")
+    >>> # add tables from the default schema
+    >>> metadata_obj.reflect(someengine)
+
+.. _reflection_schema_qualified_interaction:
+
+Interaction of Schema-qualified Reflection with the Default Schema
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. admonition:: Section Best Practices Summarized
+
+    In this section, we discuss SQLAlchemy's reflection behavior regarding
+    tables that are visible in the "default schema" of a database session,
+    and how these interact with SQLAlchemy directives that include the schema
+    explicitly.  As a best practice, ensure the "default" schema for a database
+    is just a single name, and not a list of names; for tables that are
+    part of this "default" schema and can be named without schema qualification
+    in DDL and SQL, leave corresponding :paramref:`_schema.Table.schema` and
+    similar schema parameters set to their default of ``None``.
+
+As described at :ref:`schema_metadata_schema_name`, databases that have
+the concept of schemas usually also include the concept of a "default" schema.
+The reason for this is naturally that when one refers to table objects without
+a schema as is common, a schema-capable database will still consider that
+table to be in a "schema" somewhere.  Some databases such as PostgreSQL
+take this concept further into the notion of a
+`schema search path
+<https://www.postgresql.org/docs/current/ddl-schemas.html#DDL-SCHEMAS-PATH>`_
+where *multiple* schema names can be considered in a particular database
+session to be "implicit"; referring to a table name that's in any of those
+schemas will not require that the schema name be present (while at the same time
+it's also perfectly fine if the schema name *is* present).
+
+Since most relational databases therefore have the concept of a particular
+table object which can be referred towards both in a schema-qualified way, as
+well as an "implicit" way where no schema is present, this presents a
+complexity for SQLAlchemy's reflection
+feature.  Reflecting a table in
+a schema-qualified manner will always populate its :attr:`_schema.Table.schema`
+attribute and additionally affect how this :class:`_schema.Table` is organized
+into the :attr:`_schema.MetaData.tables` collection, that is, in a schema
+qualified manner.  Conversely, reflecting the **same** table in a non-schema
+qualified manner will organize it into the :attr:`_schema.MetaData.tables`
+collection **without** being schema qualified.  The end result is that there
+would be two separate :class:`_schema.Table` objects in the single
+:class:`_schema.MetaData` collection representing the same table in the
+actual database.
+
+To illustrate the ramifications of this issue, consider tables from the
+"project" schema in the previous example, and suppose also that the "project"
+schema is the default schema of our database connection, or if using a database
+such as PostgreSQL suppose the "project" schema is set up in the PostgreSQL
+``search_path``.  This would mean that the database accepts the following
+two SQL statements as equivalent::
+
+    -- schema qualified
+    SELECT message_id FROM project.messages
+
+    -- non-schema qualified
+    SELECT message_id FROM messages
+
+This is not a problem as the table can be found in both ways.  However
+in SQLAlchemy, it's the **identity** of the :class:`_schema.Table` object
+that determines its semantic role within a SQL statement.
Based on the current
+decisions within SQLAlchemy, this means that if we reflect the same "messages" table in
+both a schema-qualified as well as a non-schema qualified manner, we get
+**two** :class:`_schema.Table` objects that will **not** be treated as
+semantically equivalent::
+
+    >>> # reflect in non-schema qualified fashion
+    >>> messages_table_1 = Table("messages", metadata_obj, autoload_with=someengine)
+    >>> # reflect in schema qualified fashion
+    >>> messages_table_2 = Table("messages", metadata_obj, schema="project", autoload_with=someengine)
+    >>> # two different objects
+    >>> messages_table_1 is messages_table_2
+    False
+    >>> # stored in two different ways
+    >>> metadata_obj.tables["messages"] is messages_table_1
+    True
+    >>> metadata_obj.tables["project.messages"] is messages_table_2
+    True
+
+The above issue becomes more complicated when the tables being reflected contain
+foreign key references to other tables.  Suppose "messages" has a "project_id"
+column which refers to rows in another schema-local table "projects", meaning
+there is a :class:`_schema.ForeignKeyConstraint` object that is part of the
+definition of the "messages" table.
+
+We can find ourselves in a situation where one :class:`_schema.MetaData`
+collection may contain as many as four :class:`_schema.Table` objects
+representing these two database tables, where one or two of the additional
+tables were generated by the reflection process; this is because when
+the reflection process encounters a foreign key constraint on a table
+being reflected, it branches out to reflect that referenced table as well.
+The decision-making it uses to assign the schema to this referenced
+table is that SQLAlchemy will **omit a default schema** from the reflected
+:class:`_schema.ForeignKeyConstraint` object if the owning
+:class:`_schema.Table` also omits its schema name (and these two objects
+are in the same schema), but will **include** it if
+it was not omitted.
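The "two objects for the same database table" behavior described above can be reproduced on any backend. The following sketch is not part of the patch; it uses SQLite, whose built-in default schema is literally named ``main``, as a stand-in for the "project" schema of the narrative:

```python
from sqlalchemy import MetaData, Table, create_engine, text

engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE messages (message_id INTEGER PRIMARY KEY)"))

metadata_obj = MetaData()

# reflect in non-schema qualified fashion
messages_table_1 = Table("messages", metadata_obj, autoload_with=engine)

# reflect the same database table in schema qualified fashion; "main" is
# SQLite's default schema, playing the role of "project" in the text above
messages_table_2 = Table(
    "messages", metadata_obj, schema="main", autoload_with=engine
)

# two distinct Table objects, stored under two different MetaData keys
print(messages_table_1 is messages_table_2)
print(sorted(metadata_obj.tables))
```

Because the two objects have distinct identities, statements built against one will not be considered to reference the other, which is exactly the hazard the section describes.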
+
+The common scenario is when the reflection of a table in a schema qualified
+fashion then loads a related table that will also be performed in a schema
+qualified fashion::
+
+    >>> # reflect "messages" in a schema qualified fashion
+    >>> messages_table_1 = Table("messages", metadata_obj, schema="project", autoload_with=someengine)
+
+The above ``messages_table_1`` will refer to ``projects`` also in a schema
+qualified fashion.  This "projects" table will be reflected automatically by
+the fact that "messages" refers to it::
+
+    >>> messages_table_1.c.project_id
+    Column('project_id', INTEGER(), ForeignKey('project.projects.project_id'), table=<messages>)
+
+If some other part of the code reflects "projects" in a non-schema qualified
+fashion, there are now two projects tables that are not the same::
+
+    >>> # reflect "projects" in a non-schema qualified fashion
+    >>> projects_table_1 = Table("projects", metadata_obj, autoload_with=someengine)
+
+    >>> # messages does not refer to projects_table_1 above
+    >>> messages_table_1.c.project_id.references(projects_table_1.c.project_id)
+    False
+
+    >>> # it refers to this one
+    >>> projects_table_2 = metadata_obj.tables["project.projects"]
+    >>> messages_table_1.c.project_id.references(projects_table_2.c.project_id)
+    True
+
+    >>> # they're different, as one is non-schema qualified and the other one is not
+    >>> projects_table_1 is projects_table_2
+    False
+
+The above confusion can cause problems within applications that use table
+reflection to load up application-level :class:`_schema.Table` objects, as
+well as within migration scenarios, in particular such as when using Alembic
+Migrations to detect new tables and foreign key constraints.
+
+The above behavior can be remedied by sticking to one simple practice:
+
+* Don't include the :paramref:`_schema.Table.schema` parameter for any
+  :class:`_schema.Table` that expects to be located in the **default** schema
+  of the database.
+
+For PostgreSQL and other databases that support a "search" path for schemas,
+add the following additional practice:
+
+* Keep the "search path" narrowed down to **one schema only, which is the
+  default schema**.
+
+
+.. seealso::
+
+    :ref:`postgresql_schema_reflection` - additional details of this behavior
+    as regards the PostgreSQL database.
+
+
 .. _metadata_reflection_inspector:
 
 Fine Grained Reflection with Inspector
diff --git a/doc/build/core/type_basics.rst b/doc/build/core/type_basics.rst
index b938cc5eee..3ec50cc003 100644
--- a/doc/build/core/type_basics.rst
+++ b/doc/build/core/type_basics.rst
@@ -232,7 +232,7 @@ such as `collation` and `charset`::
 
     from sqlalchemy.dialects.mysql import VARCHAR, TEXT
 
-    table = Table('foo', meta,
+    table = Table('foo', metadata_obj,
         Column('col1', VARCHAR(200, collation='binary')),
         Column('col2', TEXT(charset='latin1'))
     )
diff --git a/lib/sqlalchemy/dialects/postgresql/base.py b/lib/sqlalchemy/dialects/postgresql/base.py
index a00c26e87d..8b0b87d559 100644
--- a/lib/sqlalchemy/dialects/postgresql/base.py
+++ b/lib/sqlalchemy/dialects/postgresql/base.py
@@ -273,20 +273,22 @@ be reverted when the DBAPI connection has a rollback.
 Remote-Schema Table Introspection and PostgreSQL search_path
 ------------------------------------------------------------
 
-**TL;DR;**: keep the ``search_path`` variable set to its default of ``public``,
-name schemas **other** than ``public`` explicitly within ``Table`` definitions.
-
-The PostgreSQL dialect can reflect tables from any schema. The
-:paramref:`_schema.Table.schema` argument, or alternatively the
-:paramref:`.MetaData.reflect.schema` argument determines which schema will
-be searched for the table or tables. The reflected :class:`_schema.Table`
-objects
-will in all cases retain this ``.schema`` attribute as was specified.
-However, with regards to tables which these :class:`_schema.Table`
-objects refer to
-via foreign key constraint, a decision must be made as to how the ``.schema``
-is represented in those remote tables, in the case where that remote
-schema name is also a member of the current
+.. admonition:: Section Best Practices Summarized
+
+    Keep the ``search_path`` variable set to its default of ``public``, without
+    any other schema names.  For other schema names, name these explicitly
+    within :class:`_schema.Table` definitions.  Alternatively, the
+    ``postgresql_ignore_search_path`` option will cause all reflected
+    :class:`_schema.Table` objects to have a :attr:`_schema.Table.schema`
+    attribute set up.
+
+The PostgreSQL dialect can reflect tables from any schema, as outlined in
+:ref:`schema_table_reflection`.
+
+With regards to tables which these :class:`_schema.Table`
+objects refer to via foreign key constraint, a decision must be made as to how
+the ``.schema`` is represented in those remote tables, in the case where that
+remote schema name is also a member of the current
 `PostgreSQL search path
 <https://www.postgresql.org/docs/current/ddl-schemas.html#DDL-SCHEMAS-PATH>`_.
@@ -349,8 +351,8 @@ reflection process as follows::
 
     >>> engine = create_engine("postgresql+psycopg2://scott:tiger@localhost/test")
     >>> with engine.connect() as conn:
     ...     conn.execute(text("SET search_path TO test_schema, public"))
-    ...     meta = MetaData()
-    ...     referring = Table('referring', meta,
+    ...     metadata_obj = MetaData()
+    ...     referring = Table('referring', metadata_obj,
     ...                       autoload_with=conn)
     ...
@@ -359,7 +361,7 @@ The above process would deliver to the :attr:`_schema.MetaData.tables`
 collection
 ``referred`` table named **without** the schema::
 
-    >>> meta.tables['referred'].schema is None
+    >>> metadata_obj.tables['referred'].schema is None
     True
 
 To alter the behavior of reflection such that the referred schema is
@@ -370,8 +372,8 @@ dialect-specific argument to both :class:`_schema.Table` as well as
 
     >>> with engine.connect() as conn:
     ...     conn.execute(text("SET search_path TO test_schema, public"))
-    ...     meta = MetaData()
-    ...     referring = Table('referring', meta,
+    ...     metadata_obj = MetaData()
+    ...     referring = Table('referring', metadata_obj,
    ...                       autoload_with=conn,
     ...                       postgresql_ignore_search_path=True)
     ...
@@ -379,7 +381,7 @@ dialect-specific argument to both :class:`_schema.Table` as well as
 
 We will now have ``test_schema.referred`` stored as schema-qualified::
 
-    >>> meta.tables['test_schema.referred'].schema
+    >>> metadata_obj.tables['test_schema.referred'].schema
     'test_schema'
 
 .. sidebar:: Best Practices for PostgreSQL Schema reflection
@@ -401,13 +403,11 @@ installation, this is the name ``public``.  So a table that refers to another
 which is in the ``public`` (i.e. default) schema will always have the
 ``.schema`` attribute set to ``None``.
 
-.. versionadded:: 0.9.2 Added the ``postgresql_ignore_search_path``
-   dialect-level option accepted by :class:`_schema.Table` and
-   :meth:`_schema.MetaData.reflect`.
-
-
 .. seealso::
 
+    :ref:`reflection_schema_qualified_interaction` - discussion of the issue
+    from a backend-agnostic perspective
+
     `The Schema Search Path
     <https://www.postgresql.org/docs/current/ddl-schemas.html#DDL-SCHEMAS-PATH>`_
     - on the PostgreSQL website.
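As a runnable aside, the "default schema" name that the text above discusses (``public`` on a typical PostgreSQL installation) can be queried in a backend-agnostic way through the runtime inspection API's ``default_schema_name`` attribute. A minimal sketch using SQLite, whose default schema is named ``main``:

```python
from sqlalchemy import create_engine, inspect

engine = create_engine("sqlite://")
insp = inspect(engine)

# the name this dialect treats as the "default" schema; tables located in
# this schema are reflected with their .schema attribute left as None
print(insp.default_schema_name)
```

Against a PostgreSQL engine the same attribute would typically report ``public``, matching the best-practice guidance above.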
diff --git a/lib/sqlalchemy/sql/schema.py b/lib/sqlalchemy/sql/schema.py
index 641e62be3e..c8f26f9065 100644
--- a/lib/sqlalchemy/sql/schema.py
+++ b/lib/sqlalchemy/sql/schema.py
@@ -4877,7 +4877,7 @@ class Computed(FetchedValue, SchemaItem):
 
         from sqlalchemy import Computed
 
-        Table('square', meta,
+        Table('square', metadata_obj,
             Column('side', Float, nullable=False),
             Column('area', Float, Computed('side * side'))
         )
@@ -4974,7 +4974,7 @@ class Identity(IdentityOptions, FetchedValue, SchemaItem):
 
         from sqlalchemy import Identity
 
-        Table('foo', meta,
+        Table('foo', metadata_obj,
             Column('id', Integer, Identity()),
             Column('description', Text),
         )
diff --git a/lib/sqlalchemy/sql/type_api.py b/lib/sqlalchemy/sql/type_api.py
index f588512687..34f23fb0cf 100644
--- a/lib/sqlalchemy/sql/type_api.py
+++ b/lib/sqlalchemy/sql/type_api.py
@@ -838,7 +838,7 @@ class UserDefinedType(util.with_metaclass(VisitableCheckKWArg, TypeEngine)):
 
     Once the type is made, it's immediately usable::
 
-        table = Table('foo', meta,
+        table = Table('foo', metadata_obj,
             Column('id', Integer, primary_key=True),
             Column('data', MyType(16))
         )
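The ``UserDefinedType`` docstring above stops at a table using ``MyType``. A minimal sketch of such a type — ``MyType`` and its ``MYTYPE`` DDL name are hypothetical, following the docstring, not taken from the patch — can be exercised without any database by compiling DDL:

```python
from sqlalchemy import Column, Integer, MetaData, Table
from sqlalchemy.schema import CreateTable
from sqlalchemy.types import UserDefinedType


class MyType(UserDefinedType):
    """Hypothetical custom type rendering as MYTYPE(<precision>) in DDL."""

    cache_ok = True

    def __init__(self, precision=8):
        self.precision = precision

    def get_col_spec(self, **kw):
        # the column specification string emitted into CREATE TABLE
        return "MYTYPE(%d)" % self.precision


metadata_obj = MetaData()
table = Table(
    "foo",
    metadata_obj,
    Column("id", Integer, primary_key=True),
    Column("data", MyType(16)),
)

# the custom column specification appears in the generated CREATE TABLE
print(CreateTable(table))
```

Compiling ``CreateTable`` this way is a convenient smoke test for a custom type's ``get_col_spec`` before running it against a real backend.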