AetherUnbound opened a new issue, #36602:

URL: https://github.com/apache/airflow/issues/36602
### Apache Airflow Provider(s)

http

### Versions of Apache Airflow Providers

`apache-airflow-providers-http>=4.7.0`

### Apache Airflow version

Any version that supports the above provider version

### Operating System

Linux/Docker

### Deployment

Docker-Compose

### Deployment details

_No response_

### What happened

This PR appears to have changed how the hook inside the `HttpOperator` is initialized: https://github.com/apache/airflow/pull/34669

Previously, an `HttpHook` was initialized explicitly. Since both the connection ID _and_ the hook type were already defined, connections defined via environment variables that used `https` as the URI prefix were parsed correctly.

After the above change, `BaseHook.get_connection` is used instead, and the hook is obtained from the returned connection with `conn.get_hook()`. That lookup goes through the `ProvidersManager`'s list of registered connection types, which only includes `http` (presumably because the HTTP provider registers only that connection type). Note the absence of `https` below:

```python
In [1]: from airflow.providers_manager import ProvidersManager

In [2]: ProvidersManager().hooks
Out[2]: <airflow.providers_manager.LazyDictWithCache at 0x7f40120a4190>

In [3]: dict(ProvidersManager().hooks)
Out[3]:
{'generic': HookInfo(hook_class_name=None, connection_id_attribute_name=None, package_name=None, hook_name='Generic', connection_type=None, connection_testable=False),
 'email': HookInfo(hook_class_name=None, connection_id_attribute_name=None, package_name=None, hook_name='Email', connection_type=None, connection_testable=False),
 'fs': HookInfo(hook_class_name='airflow.hooks.filesystem.FSHook', connection_id_attribute_name='fs_conn_id', package_name='airflow.hooks.filesystem', hook_name='File (path)', connection_type='fs', connection_testable=True),
 'package_index': HookInfo(hook_class_name='airflow.hooks.package_index.PackageIndexHook', connection_id_attribute_name='pi_conn_id', package_name='airflow.hooks.package_index', hook_name='Package Index (Python)', connection_type='package_index', connection_testable=True),
 'aws': HookInfo(hook_class_name='airflow.providers.amazon.aws.hooks.base_aws.AwsGenericHook', connection_id_attribute_name='aws_conn_id', package_name='apache-airflow-providers-amazon', hook_name='Amazon Web Services', connection_type='aws', connection_testable=True),
 'chime': HookInfo(hook_class_name='airflow.providers.amazon.aws.hooks.chime.ChimeWebhookHook', connection_id_attribute_name='chime_conn_id', package_name='apache-airflow-providers-amazon', hook_name='Amazon Chime Webhook', connection_type='chime', connection_testable=True),
 'emr': HookInfo(hook_class_name='airflow.providers.amazon.aws.hooks.emr.EmrHook', connection_id_attribute_name='emr_conn_id', package_name='apache-airflow-providers-amazon', hook_name='Amazon Elastic MapReduce', connection_type='emr', connection_testable=True),
 'redshift': HookInfo(hook_class_name='airflow.providers.amazon.aws.hooks.redshift_sql.RedshiftSQLHook', connection_id_attribute_name='redshift_conn_id', package_name='apache-airflow-providers-amazon', hook_name='Amazon Redshift', connection_type='redshift', connection_testable=True),
 'elasticsearch': HookInfo(hook_class_name='airflow.providers.elasticsearch.hooks.elasticsearch.ElasticsearchHook', connection_id_attribute_name='elasticsearch_conn_id', package_name='apache-airflow-providers-elasticsearch', hook_name='Elasticsearch', connection_type='elasticsearch', connection_testable=True),
 'ftp': HookInfo(hook_class_name='airflow.providers.ftp.hooks.ftp.FTPHook', connection_id_attribute_name='ftp_conn_id', package_name='apache-airflow-providers-ftp', hook_name='FTP', connection_type='ftp', connection_testable=True),
 'http': HookInfo(hook_class_name='airflow.providers.http.hooks.http.HttpHook', connection_id_attribute_name='http_conn_id', package_name='apache-airflow-providers-http', hook_name='HTTP', connection_type='http', connection_testable=True),
 'imap': HookInfo(hook_class_name='airflow.providers.imap.hooks.imap.ImapHook', connection_id_attribute_name='imap_conn_id', package_name='apache-airflow-providers-imap', hook_name='IMAP', connection_type='imap', connection_testable=False),
 'postgres': HookInfo(hook_class_name='airflow.providers.postgres.hooks.postgres.PostgresHook', connection_id_attribute_name='postgres_conn_id', package_name='apache-airflow-providers-postgres', hook_name='Postgres', connection_type='postgres', connection_testable=True),
 'sqlite': HookInfo(hook_class_name='airflow.providers.sqlite.hooks.sqlite.SqliteHook', connection_id_attribute_name='sqlite_conn_id', package_name='apache-airflow-providers-sqlite', hook_name='Sqlite', connection_type='sqlite', connection_testable=True)}
```

As a result, connections of the form `AIRFLOW_CONN_...=https://...` can no longer be used by the `HttpOperator`.

### What you think should happen instead

The [documentation for v4.8.0](https://airflow.apache.org/docs/apache-airflow-providers-http/4.8.0/connections/http.html) specifies:

> **Schema (optional)**:
> Specify the service type etc: http/https.

Thus it seems reasonable to expect connection URIs that start with `https` to be allowed, especially since this is a breaking change from previous behavior that was not immediately obvious when upgrading from provider version <=4.6.0.

### How to reproduce

The following command simulates an HTTPS hook being instantiated directly with the `HttpHook` class, which was the previous internal behavior of the `HttpOperator`. No errors are raised.

```bash
$ docker run -e AIRFLOW_CONN_SAMPLE_HOOK='https://google.com' --rm docker.io/apache/airflow:slim-2.8.0-python3.10 python -c 'from airflow.providers.http.hooks.http import HttpHook; hook = HttpHook(http_conn_id="sample_hook")'
```

This command shows the newer behavior, which uses `BaseHook` to look up the connection and then resolve a hook from its connection type; it fails because `https` is not a registered connection type.
```bash
$ docker run -e AIRFLOW_CONN_SAMPLE_HOOK='https://google.com' --rm docker.io/apache/airflow:slim-2.8.0-python3.10 python -c 'from airflow.hooks.base import BaseHook; conn = BaseHook.get_connection("sample_hook"); hook = conn.get_hook()'
```

### Anything else

We were able to apply a patch in our own code as a workaround in https://github.com/WordPress/openverse/pull/3624

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
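The difference between the two reproduction commands can be illustrated with a small, self-contained Python model. This is a simplified sketch, not Airflow code: `SimpleConnection`, `parse_conn_uri`, `HOOK_REGISTRY`, `hook_via_explicit_class` and `hook_via_registry` are hypothetical names. The parsing is deliberately minimal (the URI scheme becomes the connection type and the path becomes the schema; credentials, ports, extras and Airflow's other normalizations are ignored), and the registry only mirrors the `ProvidersManager` output above, where the HTTP provider contributes just `http`.

```python
from dataclasses import dataclass
from urllib.parse import unquote, urlsplit


@dataclass
class SimpleConnection:
    """Hypothetical stand-in for an Airflow connection (only the fields used here)."""

    conn_type: str
    host: str
    schema: str


def parse_conn_uri(uri: str) -> SimpleConnection:
    """Simplified URI parsing (an assumption, not Airflow's parser): the scheme
    becomes conn_type and the path, minus its leading slash, becomes schema."""
    parts = urlsplit(uri)
    return SimpleConnection(
        conn_type=parts.scheme,
        host=parts.hostname or "",
        schema=unquote(parts.path[1:]),
    )


# Hypothetical registry modeled on the ProvidersManager output above:
# the HTTP provider registers only the "http" connection type.
HOOK_REGISTRY = {"http": "HttpHook"}


def hook_via_explicit_class(conn: SimpleConnection) -> str:
    """Old path (modeled): the operator builds HttpHook directly,
    so conn.conn_type is never consulted."""
    return "HttpHook"


def hook_via_registry(conn: SimpleConnection) -> str:
    """New path (modeled): the hook is resolved from conn.conn_type,
    so an unregistered type such as "https" is rejected."""
    hook = HOOK_REGISTRY.get(conn.conn_type)
    if hook is None:
        raise LookupError(f'Unknown hook type "{conn.conn_type}"')
    return hook


if __name__ == "__main__":
    conn = parse_conn_uri("https://google.com")
    print(conn)
    print(hook_via_explicit_class(conn))
    try:
        hook_via_registry(conn)
    except LookupError as err:
        print(err)
```

In this model, `https://google.com` yields a connection type of `https`, which the explicit-class path never checks but the registry path rejects. A URI such as `http://google.com/https` keeps the registered `http` type while carrying `https` in the schema field described in the documentation; whether Airflow's real parser and `HttpHook` treat it identically is an assumption worth verifying.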
