[
https://issues.apache.org/jira/browse/AIRFLOW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Elliott Shugerman updated AIRFLOW-3973:
---------------------------------------
Description:
h2. Notes:
* This does not occur if the database is already initialized. If it is, run
{{resetdb}} instead to observe the bug.
* This does not occur with the default SQLite database.
h2. Example
{{ERROR [airflow.models.DagBag] Failed to import: /home/elliott/clean-airflow/dags/dag.py
Traceback (most recent call last):
  File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1236, in _execute_context
    cursor, statement, parameters, context
  File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 536, in do_execute
    cursor.execute(statement, parameters)
psycopg2.ProgrammingError: relation "variable" does not exist
LINE 2: FROM variable}}
h2. Explanation
The first thing {{airflow initdb}} does is run the Alembic migrations. All
migrations are run in one transaction. Most tables, including the {{variable}}
table, are defined in the initial migration. A [later
migration|https://github.com/apache/airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py]
imports and initializes {{models.DagBag}}. Upon initialization, {{DagBag}}
calls its {{collect_dags}} method, which scans the DAGs directory and attempts
to load all DAGs it finds. When it loads a DAG that uses a {{Variable}}, it
will query the database to see if that {{Variable}} is defined in the
{{variable}} table. It's not clear to me how exactly the connection for that
query is created, but I think it is apparent that it does _not_ use the same
transaction that is used to run the migrations. Since all migrations are run in
one transaction and that transaction is not yet complete, the migration that
creates the {{variable}} table has not yet been committed, and therefore the
table is not visible to any other connection/transaction. The query raises
{{ProgrammingError}}, which is caught and logged by {{collect_dags}}.
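The visibility rule behind this explanation can be reproduced in isolation. The following is a minimal sketch, not Airflow or Alembic code: it uses Python's standard-library {{sqlite3}} module with explicit {{BEGIN}}/{{COMMIT}} purely as a stand-in for Postgres, and the names {{table_visible}}, {{demo}}, {{migrator}} and {{other}} are hypothetical. It says nothing about how Airflow configures SQLite, where the bug does not occur.

```python
import os
import sqlite3
import tempfile


def table_visible(conn, table):
    """Return True if `table` can be queried on this connection."""
    try:
        conn.execute(f"SELECT 1 FROM {table} LIMIT 1")
        return True
    except sqlite3.OperationalError:
        # SQLite reports a missing table as OperationalError.
        return False


def demo():
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    # isolation_level=None turns off the driver's implicit transaction
    # handling, so the BEGIN/COMMIT statements below are the only ones issued.
    migrator = sqlite3.connect(path, isolation_level=None)  # plays the migration run
    other = sqlite3.connect(path, isolation_level=None)     # plays the DagBag query
    try:
        migrator.execute("BEGIN")
        migrator.execute("CREATE TABLE variable (id INTEGER PRIMARY KEY, val TEXT)")
        # Inside the open transaction the creating connection sees the table...
        seen_by_migrator = table_visible(migrator, "variable")
        # ...but a second connection does not, because nothing is committed yet.
        seen_before_commit = table_visible(other, "variable")
        migrator.execute("COMMIT")
        seen_after_commit = table_visible(other, "variable")
    finally:
        migrator.close()
        other.close()
    return seen_by_migrator, seen_before_commit, seen_after_commit


if __name__ == "__main__":
    print(demo())
```

The second connection corresponds to whatever connection {{Variable}} lookups use during {{collect_dags}}: it can only see the table once the transaction that created it has been committed.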
h2. Proposed Solution
Run each Alembic migration in its own transaction. I will shortly open a pull
request that does this.
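For reference, Alembic exposes a {{transaction_per_migration}} option on {{context.configure}} that commits after each migration. The fragment below is only a sketch of how an {{env.py}} could enable it; the function signature and the {{connectable}} and {{target_metadata}} parameters are assumptions for illustration, and this is not necessarily what the pull request will contain.

```python
# Sketch of an Alembic env.py fragment (runs only inside an Alembic
# migration environment, e.g. when invoked through `alembic upgrade`).
from alembic import context


def run_migrations_online(connectable, target_metadata=None):
    """Run migrations, committing after each one.

    `connectable` is assumed to be a SQLAlchemy engine supplied by the caller.
    """
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            # Each migration gets its own transaction, so tables created by
            # an earlier migration are committed before a later one runs.
            transaction_per_migration=True,
        )
        with context.begin_transaction():
            context.run_migrations()
```

With this setting, the initial migration that creates the {{variable}} table would be committed before the later migration initializes {{DagBag}}, so a query on a separate connection should find the table.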
was:
h2. Notes:
* This does not occur if the database is already initialized. If it is, run
{{resetdb}} instead to observe the bug.
* This does not occur with the default SQLite database.
h2. Example
{{ERROR [airflow.models.DagBag] Failed to import: /home/elliott/clean-airflow/dags/dag.py
Traceback (most recent call last):
  File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1236, in _execute_context
    cursor, statement, parameters, context
  File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 536, in do_execute
    cursor.execute(statement, parameters)
psycopg2.ProgrammingError: relation "variable" does not exist
LINE 2: FROM variable}}
h2. Explanation
The first thing {{airflow initdb}} does is run the Alembic migrations. All
migrations are run in one transaction. Most tables, including the {{variable}}
table, are defined in the initial migration. A [later
migration|https://github.com/apache/airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py]
imports and initializes {{models.DagBag}}. Upon initialization, {{DagBag}}
calls its {{collect_dags}} method, which scans the DAGs directory and attempts
to load all DAGs it finds. When it loads a DAG that uses a {{Variable}}, it
will query the database to see if that {{Variable}} is defined in the
{{variable}} table. It's not clear to me how exactly the connection for that
query is created, but I think it is a fair assumption that it does _not_ use
the same transaction that is used to run the migrations. Since the migrations
are not yet complete, and all migrations are run in one transaction, the
migration that creates the {{variable}} table has not yet been committed, and
therefore the table does not exist to any other connection/transaction. This
raises {{ProgrammingError}}, which is caught and logged by {{collect_dags}}.
h2. Proposed Solution
Run each Alembic migration in its own transaction. I will open a pull request
which accomplishes this shortly.
> `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is
> used for the internal database
> -----------------------------------------------------------------------------------------------------------
>
> Key: AIRFLOW-3973
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3973
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: Elliott Shugerman
> Assignee: Elliott Shugerman
> Priority: Minor