Elliott Shugerman created AIRFLOW-3973:
------------------------------------------

             Summary: `airflow initdb` logs errors when `Variable` is used in 
DAGs and Postgres is used for the internal database
                 Key: AIRFLOW-3973
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3973
             Project: Apache Airflow
          Issue Type: Bug
            Reporter: Elliott Shugerman
            Assignee: Elliott Shugerman


h2. Example

{{ERROR [airflow.models.DagBag] Failed to import: /home/elliott/clean-airflow/dags/dag.py
Traceback (most recent call last):
  File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1236, in _execute_context
    cursor, statement, parameters, context
  File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 536, in do_execute
    cursor.execute(statement, parameters)
psycopg2.ProgrammingError: relation "variable" does not exist
LINE 2: FROM variable}}

h2. Explanation

The first thing {{airflow initdb}} does is run the Alembic migrations. All 
migrations are run in one transaction. Most tables, including the {{variable}} 
table, are defined in the initial migration. A [later 
migration|https://github.com/apache/airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py]
 imports and initializes {{models.DagBag}}. Upon initialization, {{DagBag}} 
calls its {{collect_dags}} method, which scans the DAGs directory and attempts 
to load all DAGs it finds. When it loads a DAG that uses a {{Variable}}, it 
will query the database to see if that {{Variable}} is defined in the 
{{variable}} table. It's not clear to me how exactly the connection for that 
query is created, but I think it is a fair assumption that it does _not_ use 
the same transaction that is used to run the migrations. Since the migrations 
are not yet complete, and all migrations are run in one transaction, the 
migration that creates the {{variable}} table has not yet been committed, and 
therefore the table is not visible to any other connection or transaction. The 
query therefore raises {{ProgrammingError}}, which is caught and logged by 
{{collect_dags}}.
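
The visibility problem described above can be reproduced outside Airflow. The following is a minimal sketch, not Airflow or Alembic code: it uses the standard-library {{sqlite3}} module with an explicitly opened transaction purely so that it runs without a database server, and it says nothing about how Airflow's own SQLite setup behaves. The names {{table_visible}} and {{demo}} and the two-column layout of the table are made up for illustration.

```python
import os
import sqlite3
import tempfile


def table_visible(conn, name):
    """Return True if a SELECT against `name` succeeds on this connection."""
    try:
        conn.execute(f"SELECT 1 FROM {name}").fetchall()
        return True
    except sqlite3.OperationalError:
        # Raised when the table is not visible to this connection.
        return False


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None: the sqlite3 module adds no implicit
        # BEGIN/COMMIT, so the transaction boundaries are the ones below.
        migrator = sqlite3.connect(path, isolation_level=None)
        other = sqlite3.connect(path, isolation_level=None)
        try:
            # Stand-in for "all migrations run in one transaction".
            migrator.execute("BEGIN")
            migrator.execute("CREATE TABLE variable (key TEXT, val TEXT)")

            # Stand-in for DagBag querying over a separate connection
            # while the migration transaction is still open.
            before_commit = table_visible(other, "variable")

            migrator.execute("COMMIT")
            after_commit = table_visible(other, "variable")
        finally:
            migrator.close()
            other.close()
    return before_commit, after_commit


if __name__ == "__main__":
    print(demo())
```

Here {{demo()}} returns {{(False, True)}}: the second connection cannot see the table until the transaction that created it has been committed, which is the situation the {{DagBag}} query appears to be in during {{airflow initdb}} on Postgres.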

NOTE: This does not occur with the default SQLite database.

h2. Proposed Solution

Run each Alembic migration in its own transaction. I will be opening a pull 
request which accomplishes this shortly.
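
For reference, Alembic exposes a {{transaction_per_migration}} option on {{context.configure()}} that commits after each migration script. The following is a sketch of how an {{env.py}} could enable it, not necessarily what the pull request will do and not Airflow's actual {{env.py}}. The connection URL is a hypothetical placeholder, and the file only runs when Alembic itself loads it (for example via {{alembic upgrade head}}).

```python
# Sketch of an Alembic env.py "online" runner (illustrative only).
from alembic import context
from sqlalchemy import create_engine

# Hypothetical connection URL, for illustration only.
DATABASE_URL = "postgresql://user:password@localhost/airflow"


def run_migrations_online():
    connectable = create_engine(DATABASE_URL)
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=None,
            # Commit after each migration script rather than once at the
            # end, so tables created by earlier migrations are visible to
            # other connections while later migrations run.
            transaction_per_migration=True,
        )
        with context.begin_transaction():
            context.run_migrations()


# Alembic executes env.py itself; this call triggers the migration run.
run_migrations_online()
```

With this setting, a failure in a later migration leaves the earlier ones committed, so a partially migrated database becomes possible where a single transaction would have rolled everything back.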



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
