From: jonathan vanasco
Date: Mon, 27 Sep 2021 16:41:24 +0000 (-0400)
Subject: Add new sections regarding schemas and reflection
X-Git-Tag: rel_2_0_0b1~648^2
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=0fa0beacb465c61e792c97d530a0e8fdd7139256;p=thirdparty%2Fsqlalchemy%2Fsqlalchemy.git

Add new sections regarding schemas and reflection

* add a new section to reflection.rst `Schemas and Reflection`.
* this contains some text from the ticket
* migrate some text from `Specifying the Schema Name` to new section
* migrate some text from PostgreSQL dialect to new section
* target text is made more generic
* cross-reference the postgres and new sections to one another,
  to avoid duplication of docs
* update some docs 'meta' to 'metadata_obj'

Fixes: #4387
Co-authored-by: Mike Bayer
Change-Id: I2b08672753fb2575d30ada07ead77587468fdade
---

diff --git a/doc/build/changelog/migration_12.rst b/doc/build/changelog/migration_12.rst
index f0b88c4936..bc1d0739e9 100644
--- a/doc/build/changelog/migration_12.rst
+++ b/doc/build/changelog/migration_12.rst
@@ -1048,7 +1048,7 @@ localized to the current VALUES clause being processed::
     def mydefault(context):
         return context.get_current_parameters()['counter'] + 12
 
-    mytable = Table('mytable', meta,
+    mytable = Table('mytable', metadata_obj,
         Column('counter', Integer),
         Column('counter_plus_twelve', Integer,
                default=mydefault, onupdate=mydefault)
diff --git a/doc/build/core/metadata.rst b/doc/build/core/metadata.rst
index 86a8f6de34..c7316d1b65 100644
--- a/doc/build/core/metadata.rst
+++ b/doc/build/core/metadata.rst
@@ -284,11 +284,11 @@ remote servers (Oracle DBLINK with synonyms).
 
 What all of the above approaches have (mostly) in common is that there's a way
 of referring to this alternate set of tables using a string name.  SQLAlchemy
-refers to this name as the **schema name**. Within SQLAlchemy, this is nothing more than
-a string name which is associated with a :class:`_schema.Table` object, and
-is then rendered into SQL statements in a manner appropriate to the target
-database such that the table is referred towards in its remote "schema", whatever
-mechanism that is on the target database.
+refers to this name as the **schema name**. Within SQLAlchemy, this is nothing
+more than a string name which is associated with a :class:`_schema.Table`
+object, and is then rendered into SQL statements in a manner appropriate to the
+target database such that the table is referred towards in its remote "schema",
+whatever mechanism that is on the target database.
 
 The "schema" name may be associated directly with a :class:`_schema.Table`
 using the :paramref:`_schema.Table.schema` argument; when using the ORM
@@ -298,11 +298,27 @@ the parameter is passed using the ``__table_args__`` parameter dictionary.
 
 The "schema" name may also be associated with the :class:`_schema.MetaData`
 object where it will take effect automatically for all :class:`_schema.Table`
 objects associated with that :class:`_schema.MetaData` that don't otherwise
-specify their own name. Finally, SQLAlchemy also supports a "dynamic" schema name
+specify their own name. Finally, SQLAlchemy also supports a "dynamic" schema name
 system that is often used for multi-tenant applications such that a single set
 of :class:`_schema.Table` metadata may refer to a dynamically configured set of
 schema names on a per-connection or per-statement basis.
 
+.. topic:: What's "schema"?
+
+    SQLAlchemy's support for database "schema" was designed with first party
+    support for PostgreSQL-style schemas.  In this style, there is first a
+    "database" that typically has a single "owner".  Within this database there
+    can be any number of "schemas" which then contain the actual table objects.
+
+    A table within a specific schema is referred towards explicitly using the
+    syntax "<schemaname>.<tablename>".  Contrast this to an architecture such
+    as that of MySQL, where there are only "databases"; however, SQL statements
+    can refer to multiple databases at once, using the same syntax except it
+    is "<database>.<tablename>".  On Oracle, this syntax refers to yet another
+    concept, the "owner" of a table.  Regardless of which kind of database is
+    in use, SQLAlchemy uses the phrase "schema" to refer to the qualifying
+    identifier within the general syntax of "<qualifier>.<tablename>".
+
 .. seealso::
 
     :ref:`orm_declarative_table_schema_name` - schema name specification when
     using the ORM
@@ -368,6 +384,8 @@ at once, such as::
     :ref:`multipart_schema_names` - describes use of dotted schema names with
     the SQL Server dialect.
 
+    :ref:`schema_table_reflection`
+
 
 .. _schema_metadata_schema_name:
@@ -438,11 +456,11 @@ to specify that it should not be schema qualified may use the special symbol
         schema=BLANK_SCHEMA  # will not use "remote_banks"
     )
 
-
 .. seealso::
 
     :paramref:`_schema.MetaData.schema`
 
+
 .. _schema_dynamic_naming_convention:
 
 Applying Dynamic Schema Naming Conventions
@@ -454,11 +472,11 @@ basis, so that for example in multi-tenant situations, each transaction or
 statement may be targeted at a specific set of schema names that change.
 
 The section :ref:`schema_translating` describes how this feature is used.
 
-
 .. seealso::
 
     :ref:`schema_translating`
 
+
 .. _schema_set_default_connections:
 
 Setting a Default Schema for New Connections
@@ -506,6 +524,17 @@ for specific information regarding how default schemas are configured.
     :ref:`postgresql_alternate_search_path` - in the :ref:`postgresql_toplevel`
     dialect documentation.
 
+
+
+
+Schemas and Reflection
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The schema feature of SQLAlchemy interacts with the table reflection
+feature introduced at :ref:`metadata_reflection_toplevel`.  See the section
+:ref:`metadata_reflection_schemas` for additional details on how this works.
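As a runnable aside (not part of the patch itself), the two behaviors the metadata.rst changes above describe — a fixed schema name on a :class:`_schema.Table`, and "dynamic" per-connection translation of that name — can be sketched as follows. The ``financial_info`` table and ``remote_banks`` schema echo the examples above; translating the schema to ``None`` is an assumption made here so the sketch can run against a plain SQLite database:

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine, select

metadata_obj = MetaData()

# a Table associated with an explicit schema name; the string is rendered
# into SQL as a qualifier, e.g. "remote_banks.financial_info"
financial_info = Table(
    "financial_info",
    metadata_obj,
    Column("id", Integer, primary_key=True),
    schema="remote_banks",
)

# the compiled SELECT refers to the table as "remote_banks.financial_info"
print(select(financial_info))

# "dynamic" schema translation: remap "remote_banks" per connection.
# Translating to None drops the qualifier entirely, which also lets this
# sketch run on SQLite, which has no "remote_banks" schema.
engine = create_engine("sqlite://")
with engine.connect().execution_options(
    schema_translate_map={"remote_banks": None}
) as conn:
    metadata_obj.create_all(conn)
    conn.execute(financial_info.insert().values(id=1))
    rows = conn.execute(select(financial_info)).all()

print(rows)
```

In a multi-tenant setup, the same translation map would instead point ``"remote_banks"`` at a per-tenant schema name rather than ``None``.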
+
+
 Backend-Specific Options
 ------------------------
diff --git a/doc/build/core/reflection.rst b/doc/build/core/reflection.rst
index 796bd414f9..ec9073138a 100644
--- a/doc/build/core/reflection.rst
+++ b/doc/build/core/reflection.rst
@@ -13,7 +13,7 @@ existing within the database. This process is called *reflection*. In the most
 simple case you need only specify the table name, a
 :class:`~sqlalchemy.schema.MetaData` object, and the ``autoload_with``
 argument::
 
-    >>> messages = Table('messages', meta, autoload_with=engine)
+    >>> messages = Table('messages', metadata_obj, autoload_with=engine)
     >>> [c.name for c in messages.columns]
     ['message_id', 'message_name', 'date']
 
@@ -30,8 +30,8 @@ Below, assume the table ``shopping_cart_items`` references a table named
 ``shopping_carts``. Reflecting the ``shopping_cart_items`` table has the
 effect such that the ``shopping_carts`` table will also be loaded::
 
-    >>> shopping_cart_items = Table('shopping_cart_items', meta, autoload_with=engine)
-    >>> 'shopping_carts' in meta.tables:
+    >>> shopping_cart_items = Table('shopping_cart_items', metadata_obj, autoload_with=engine)
+    >>> 'shopping_carts' in metadata_obj.tables
     True
 
 The :class:`~sqlalchemy.schema.MetaData` has an interesting "singleton-like"
@@ -43,7 +43,7 @@ you the already-existing :class:`~sqlalchemy.schema.Table` object if one
 already exists with the given name. Such as below, we can access the already
 generated ``shopping_carts`` table just by naming it::
 
-    shopping_carts = Table('shopping_carts', meta)
+    shopping_carts = Table('shopping_carts', metadata_obj)
 
 Of course, it's a good idea to use ``autoload_with=engine`` with the above
 table regardless.  This is so that the table's attributes will be loaded if
 they have
@@ -61,7 +61,7 @@ Individual columns can be overridden with explicit values when reflecting
 tables; this is handy for specifying custom datatypes, constraints such
 as primary keys that may not be configured within the database, etc.::
 
-    >>> mytable = Table('mytable', meta,
+    >>> mytable = Table('mytable', metadata_obj,
     ...    Column('id', Integer, primary_key=True),   # override reflected 'id' to have primary key
     ...    Column('mydata', Unicode(50)),    # override reflected 'mydata' to be Unicode
     ...    # additional Column objects which require no change are reflected normally
@@ -119,6 +119,219 @@ object's dictionary of tables::
 
     for table in reversed(metadata_obj.sorted_tables):
         someengine.execute(table.delete())
 
+.. _metadata_reflection_schemas:
+
+Reflecting Tables from Other Schemas
+------------------------------------
+
+The section :ref:`schema_table_schema_name` introduces the concept of table
+schemas, which are namespaces within a database that contain tables and other
+objects, and which can be specified explicitly. The "schema" for a
+:class:`_schema.Table` object, as well as for other objects like views, indexes and
+sequences, can be set up using the :paramref:`_schema.Table.schema` parameter,
+and also as the default schema for a :class:`_schema.MetaData` object using the
+:paramref:`_schema.MetaData.schema` parameter.
+
+The use of this schema parameter directly affects where the table reflection
+feature will look when it is asked to reflect objects. For example, given
+a :class:`_schema.MetaData` object configured with a default schema name
+"project" via its :paramref:`_schema.MetaData.schema` parameter::
+
+    >>> metadata_obj = MetaData(schema="project")
+
+The :meth:`.MetaData.reflect` method will then utilize that configured
+``.schema`` for reflection::
+
+    >>> # uses `schema` configured in metadata_obj
+    >>> metadata_obj.reflect(someengine)
+
+The end result is that :class:`_schema.Table` objects from the "project"
+schema will be reflected, and they will be populated as schema-qualified
+with that name::
+
+    >>> metadata_obj.tables['project.messages']
+    Table('messages', MetaData(), Column('message_id', INTEGER(), table=<messages>), schema='project')
+
+Similarly, an individual :class:`_schema.Table` object that includes the
+:paramref:`_schema.Table.schema` parameter will also be reflected from that
+database schema, overriding any default schema that may have been configured on the
+owning :class:`_schema.MetaData` collection::
+
+    >>> messages = Table('messages', metadata_obj, schema="project", autoload_with=someengine)
+    >>> messages
+    Table('messages', MetaData(), Column('message_id', INTEGER(), table=<messages>), schema='project')
+
+Finally, the :meth:`_schema.MetaData.reflect` method itself also allows a
+:paramref:`_schema.MetaData.reflect.schema` parameter to be passed, so we
+could also load tables from the "project" schema for a default configured
+:class:`_schema.MetaData` object::
+
+    >>> metadata_obj = MetaData()
+    >>> metadata_obj.reflect(someengine, schema="project")
+
+We can call :meth:`_schema.MetaData.reflect` any number of times with different
+:paramref:`_schema.MetaData.reflect.schema` arguments (or none at all) to continue
+populating the :class:`_schema.MetaData` object with more objects::
+
+    >>> # add tables from the "customer" schema
+    >>> metadata_obj.reflect(someengine, schema="customer")
+    >>> # add tables from the default schema
+    >>> metadata_obj.reflect(someengine)
+
+.. _reflection_schema_qualified_interaction:
+
+Interaction of Schema-qualified Reflection with the Default Schema
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. admonition:: Section Best Practices Summarized
+
+    In this section, we discuss SQLAlchemy's reflection behavior regarding
+    tables that are visible in the "default schema" of a database session,
+    and how these interact with SQLAlchemy directives that include the schema
+    explicitly.  As a best practice, ensure the "default" schema for a database
+    is just a single name, and not a list of names; for tables that are
+    part of this "default" schema and can be named without schema qualification
+    in DDL and SQL, leave corresponding :paramref:`_schema.Table.schema` and
+    similar schema parameters set to their default of ``None``.
+
+As described at :ref:`schema_metadata_schema_name`, databases that have
+the concept of schemas usually also include the concept of a "default" schema.
+The reason for this is naturally that when one refers to table objects without
+a schema as is common, a schema-capable database will still consider that
+table to be in a "schema" somewhere.  Some databases such as PostgreSQL
+take this concept further into the notion of a
+`schema search path
+<https://www.postgresql.org/docs/current/ddl-schemas.html#DDL-SCHEMAS-PATH>`_
+where *multiple* schema names can be considered in a particular database
+session to be "implicit"; referring to a table name that's in any of those
+schemas will not require that the schema name be present (while at the same time
+it's also perfectly fine if the schema name *is* present).
+
+Since most relational databases therefore have the concept of a particular
+table object which can be referred towards both in a schema-qualified way, as
+well as an "implicit" way where no schema is present, this presents a
+complexity for SQLAlchemy's reflection
+feature.  Reflecting a table in
+a schema-qualified manner will always populate its :attr:`_schema.Table.schema`
+attribute and additionally affect how this :class:`_schema.Table` is organized
+into the :attr:`_schema.MetaData.tables` collection, that is, in a schema
+qualified manner.  Conversely, reflecting the **same** table in a non-schema
+qualified manner will organize it into the :attr:`_schema.MetaData.tables`
+collection **without** being schema qualified.  The end result is that there
+would be two separate :class:`_schema.Table` objects in the single
+:class:`_schema.MetaData` collection representing the same table in the
+actual database.
+
+To illustrate the ramifications of this issue, consider tables from the
+"project" schema in the previous example, and suppose also that the "project"
+schema is the default schema of our database connection, or if using a database
+such as PostgreSQL suppose the "project" schema is set up in the PostgreSQL
+``search_path``.  This would mean that the database accepts the following
+two SQL statements as equivalent::
+
+    -- schema qualified
+    SELECT message_id FROM project.messages
+
+    -- non-schema qualified
+    SELECT message_id FROM messages
+
+This is not a problem as the table can be found in both ways.  However
+in SQLAlchemy, it's the **identity** of the :class:`_schema.Table` object
+that determines its semantic role within a SQL statement.
Based on the current
+decisions within SQLAlchemy, this means that if we reflect the same "messages" table in
+both a schema-qualified as well as a non-schema qualified manner, we get
+**two** :class:`_schema.Table` objects that will **not** be treated as
+semantically equivalent::
+
+    >>> # reflect in non-schema qualified fashion
+    >>> messages_table_1 = Table("messages", metadata_obj, autoload_with=someengine)
+    >>> # reflect in schema qualified fashion
+    >>> messages_table_2 = Table("messages", metadata_obj, schema="project", autoload_with=someengine)
+    >>> # two different objects
+    >>> messages_table_1 is messages_table_2
+    False
+    >>> # stored in two different ways
+    >>> metadata_obj.tables["messages"] is messages_table_1
+    True
+    >>> metadata_obj.tables["project.messages"] is messages_table_2
+    True
+
+The above issue becomes more complicated when the tables being reflected contain
+foreign key references to other tables.  Suppose "messages" has a "project_id"
+column which refers to rows in another schema-local table "projects", meaning
+there is a :class:`_schema.ForeignKeyConstraint` object that is part of the
+definition of the "messages" table.
+
+We can find ourselves in a situation where one :class:`_schema.MetaData`
+collection may contain as many as four :class:`_schema.Table` objects
+representing these two database tables, where one or two of the additional
+tables were generated by the reflection process; this is because when
+the reflection process encounters a foreign key constraint on a table
+being reflected, it branches out to reflect that referenced table as well.
+The decision-making it uses to assign the schema to this referenced
+table is that SQLAlchemy will **omit a default schema** from the reflected
+:class:`_schema.ForeignKeyConstraint` object if the owning
+:class:`_schema.Table` also omits its schema name (and these two objects
+are in the same schema), but will **include** it if
+it was not omitted.
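The "two objects for the same database table" behavior described above can be reproduced on any backend. The following sketch is not part of the patch; it uses SQLite, whose built-in default schema is literally named ``main``, as a stand-in for the "project" schema of the narrative:

```python
from sqlalchemy import MetaData, Table, create_engine, text

engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE messages (message_id INTEGER PRIMARY KEY)"))

metadata_obj = MetaData()

# reflect in non-schema qualified fashion
messages_table_1 = Table("messages", metadata_obj, autoload_with=engine)

# reflect the same database table in schema qualified fashion; "main" is
# SQLite's default schema, playing the role of "project" in the text above
messages_table_2 = Table(
    "messages", metadata_obj, schema="main", autoload_with=engine
)

# two distinct Table objects, stored under two different MetaData keys
print(messages_table_1 is messages_table_2)
print(sorted(metadata_obj.tables))
```

Because the two objects have distinct identities, statements built against one will not be considered to reference the other, which is exactly the hazard the section describes.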
+
+The common scenario is when the reflection of a table in a schema qualified
+fashion then loads a related table that will also be performed in a schema
+qualified fashion::
+
+    >>> # reflect "messages" in a schema qualified fashion
+    >>> messages_table_1 = Table("messages", metadata_obj, schema="project", autoload_with=someengine)
+
+The above ``messages_table_1`` will refer to ``projects`` also in a schema
+qualified fashion.  This "projects" table will be reflected automatically by
+the fact that "messages" refers to it::
+
+    >>> messages_table_1.c.project_id
+    Column('project_id', INTEGER(), ForeignKey('project.projects.project_id'), table=<messages>)
+
+If some other part of the code reflects "projects" in a non-schema qualified
+fashion, there are now two projects tables that are not the same::
+
+    >>> # reflect "projects" in a non-schema qualified fashion
+    >>> projects_table_1 = Table("projects", metadata_obj, autoload_with=someengine)
+
+    >>> # messages does not refer to projects_table_1 above
+    >>> messages_table_1.c.project_id.references(projects_table_1.c.project_id)
+    False
+
+    >>> # it refers to this one
+    >>> projects_table_2 = metadata_obj.tables["project.projects"]
+    >>> messages_table_1.c.project_id.references(projects_table_2.c.project_id)
+    True
+
+    >>> # they're different, as one is non-schema qualified and the other one is not
+    >>> projects_table_1 is projects_table_2
+    False
+
+The above confusion can cause problems within applications that use table
+reflection to load up application-level :class:`_schema.Table` objects, as
+well as within migration scenarios, in particular such as when using Alembic
+Migrations to detect new tables and foreign key constraints.
+
+The above behavior can be remedied by sticking to one simple practice:
+
+* Don't include the :paramref:`_schema.Table.schema` parameter for any
+  :class:`_schema.Table` that expects to be located in the **default** schema
+  of the database.
+
+For PostgreSQL and other databases that support a "search" path for schemas,
+add the following additional practice:
+
+* Keep the "search path" narrowed down to **one schema only, which is the
+  default schema**.
+
+
+.. seealso::
+
+    :ref:`postgresql_schema_reflection` - additional details of this behavior
+    as regards the PostgreSQL database.
+
+
 .. _metadata_reflection_inspector:
 
 Fine Grained Reflection with Inspector
diff --git a/doc/build/core/type_basics.rst b/doc/build/core/type_basics.rst
index b938cc5eee..3ec50cc003 100644
--- a/doc/build/core/type_basics.rst
+++ b/doc/build/core/type_basics.rst
@@ -232,7 +232,7 @@ such as `collation` and `charset`::
 
     from sqlalchemy.dialects.mysql import VARCHAR, TEXT
 
-    table = Table('foo', meta,
+    table = Table('foo', metadata_obj,
         Column('col1', VARCHAR(200, collation='binary')),
         Column('col2', TEXT(charset='latin1'))
     )
diff --git a/lib/sqlalchemy/dialects/postgresql/base.py b/lib/sqlalchemy/dialects/postgresql/base.py
index a00c26e87d..8b0b87d559 100644
--- a/lib/sqlalchemy/dialects/postgresql/base.py
+++ b/lib/sqlalchemy/dialects/postgresql/base.py
@@ -273,20 +273,22 @@ be reverted when the DBAPI connection has a rollback.
 Remote-Schema Table Introspection and PostgreSQL search_path
 ------------------------------------------------------------
 
-**TL;DR;**: keep the ``search_path`` variable set to its default of ``public``,
-name schemas **other** than ``public`` explicitly within ``Table`` definitions.
-
-The PostgreSQL dialect can reflect tables from any schema. The
-:paramref:`_schema.Table.schema` argument, or alternatively the
-:paramref:`.MetaData.reflect.schema` argument determines which schema will
-be searched for the table or tables. The reflected :class:`_schema.Table`
-objects
-will in all cases retain this ``.schema`` attribute as was specified.
-However, with regards to tables which these :class:`_schema.Table`
-objects refer to
-via foreign key constraint, a decision must be made as to how the ``.schema``
-is represented in those remote tables, in the case where that remote
-schema name is also a member of the current
+.. admonition:: Section Best Practices Summarized
+
+    Keep the ``search_path`` variable set to its default of ``public``, without
+    any other schema names.  For other schema names, name these explicitly
+    within :class:`_schema.Table` definitions.  Alternatively, the
+    ``postgresql_ignore_search_path`` option will cause all reflected
+    :class:`_schema.Table` objects to have a :attr:`_schema.Table.schema`
+    attribute set up.
+
+The PostgreSQL dialect can reflect tables from any schema, as outlined in
+:ref:`schema_table_reflection`.
+
+With regards to tables which these :class:`_schema.Table`
+objects refer to via foreign key constraint, a decision must be made as to how
+the ``.schema`` is represented in those remote tables, in the case where that
+remote schema name is also a member of the current
 `PostgreSQL search path
 <https://www.postgresql.org/docs/current/ddl-schemas.html#DDL-SCHEMAS-PATH>`_.
@@ -349,8 +351,8 @@ reflection process as follows::
 
     >>> engine = create_engine("postgresql+psycopg2://scott:tiger@localhost/test")
     >>> with engine.connect() as conn:
     ...     conn.execute(text("SET search_path TO test_schema, public"))
-    ...     meta = MetaData()
-    ...     referring = Table('referring', meta,
+    ...     metadata_obj = MetaData()
+    ...     referring = Table('referring', metadata_obj,
     ...                       autoload_with=conn)
     ...
@@ -359,7 +361,7 @@ The above process would deliver to the :attr:`_schema.MetaData.tables`
 collection
 ``referred`` table named **without** the schema::
 
-    >>> meta.tables['referred'].schema is None
+    >>> metadata_obj.tables['referred'].schema is None
     True
 
 To alter the behavior of reflection such that the referred schema is
@@ -370,8 +372,8 @@ dialect-specific argument to both :class:`_schema.Table` as well as
 
     >>> with engine.connect() as conn:
     ...     conn.execute(text("SET search_path TO test_schema, public"))
-    ...     meta = MetaData()
-    ...     referring = Table('referring', meta,
+    ...     metadata_obj = MetaData()
+    ...     referring = Table('referring', metadata_obj,
    ...                       autoload_with=conn,
     ...                       postgresql_ignore_search_path=True)
     ...
@@ -379,7 +381,7 @@ dialect-specific argument to both :class:`_schema.Table` as well as
 
 We will now have ``test_schema.referred`` stored as schema-qualified::
 
-    >>> meta.tables['test_schema.referred'].schema
+    >>> metadata_obj.tables['test_schema.referred'].schema
     'test_schema'
 
 .. sidebar:: Best Practices for PostgreSQL Schema reflection
@@ -401,13 +403,11 @@ installation, this is the name ``public``.  So a table that refers to another
 which is in the ``public`` (i.e. default) schema will always have the
 ``.schema`` attribute set to ``None``.
 
-.. versionadded:: 0.9.2 Added the ``postgresql_ignore_search_path``
-   dialect-level option accepted by :class:`_schema.Table` and
-   :meth:`_schema.MetaData.reflect`.
-
-
 .. seealso::
 
+    :ref:`reflection_schema_qualified_interaction` - discussion of the issue
+    from a backend-agnostic perspective
+
     `The Schema Search Path
     <https://www.postgresql.org/docs/current/ddl-schemas.html#DDL-SCHEMAS-PATH>`_
     - on the PostgreSQL website.
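As a runnable aside, the "default schema" name that the text above discusses (``public`` on a typical PostgreSQL installation) can be queried in a backend-agnostic way through the runtime inspection API's ``default_schema_name`` attribute. A minimal sketch using SQLite, whose default schema is named ``main``:

```python
from sqlalchemy import create_engine, inspect

engine = create_engine("sqlite://")
insp = inspect(engine)

# the name this dialect treats as the "default" schema; tables located in
# this schema are reflected with their .schema attribute left as None
print(insp.default_schema_name)
```

Against a PostgreSQL engine the same attribute would typically report ``public``, matching the best-practice guidance above.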
diff --git a/lib/sqlalchemy/sql/schema.py b/lib/sqlalchemy/sql/schema.py
index 641e62be3e..c8f26f9065 100644
--- a/lib/sqlalchemy/sql/schema.py
+++ b/lib/sqlalchemy/sql/schema.py
@@ -4877,7 +4877,7 @@ class Computed(FetchedValue, SchemaItem):
 
         from sqlalchemy import Computed
 
-        Table('square', meta,
+        Table('square', metadata_obj,
             Column('side', Float, nullable=False),
             Column('area', Float, Computed('side * side'))
         )
@@ -4974,7 +4974,7 @@ class Identity(IdentityOptions, FetchedValue, SchemaItem):
 
         from sqlalchemy import Identity
 
-        Table('foo', meta,
+        Table('foo', metadata_obj,
             Column('id', Integer, Identity()),
             Column('description', Text),
         )
diff --git a/lib/sqlalchemy/sql/type_api.py b/lib/sqlalchemy/sql/type_api.py
index f588512687..34f23fb0cf 100644
--- a/lib/sqlalchemy/sql/type_api.py
+++ b/lib/sqlalchemy/sql/type_api.py
@@ -838,7 +838,7 @@ class UserDefinedType(util.with_metaclass(VisitableCheckKWArg, TypeEngine)):
 
     Once the type is made, it's immediately usable::
 
-        table = Table('foo', meta,
+        table = Table('foo', metadata_obj,
             Column('id', Integer, primary_key=True),
             Column('data', MyType(16))
         )
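The ``UserDefinedType`` docstring above stops at a table using ``MyType``. A minimal sketch of such a type — ``MyType`` and its ``MYTYPE`` DDL name are hypothetical, following the docstring, not taken from the patch — can be exercised without any database by compiling DDL:

```python
from sqlalchemy import Column, Integer, MetaData, Table
from sqlalchemy.schema import CreateTable
from sqlalchemy.types import UserDefinedType


class MyType(UserDefinedType):
    """Hypothetical custom type rendering as MYTYPE(<precision>) in DDL."""

    cache_ok = True

    def __init__(self, precision=8):
        self.precision = precision

    def get_col_spec(self, **kw):
        # the column specification string emitted into CREATE TABLE
        return "MYTYPE(%d)" % self.precision


metadata_obj = MetaData()
table = Table(
    "foo",
    metadata_obj,
    Column("id", Integer, primary_key=True),
    Column("data", MyType(16)),
)

# the custom column specification appears in the generated CREATE TABLE
print(CreateTable(table))
```

Compiling ``CreateTable`` this way is a convenient smoke test for a custom type's ``get_col_spec`` before running it against a real backend.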