AetherUnbound opened a new issue, #36602:
URL: https://github.com/apache/airflow/issues/36602

   ### Apache Airflow Provider(s)
   
   http
   
   ### Versions of Apache Airflow Providers
   
   `apache-airflow-providers-http>=4.7.0`
   
   ### Apache Airflow version
   
   Any version that supports the above provider version
   
   ### Operating System
   
   Linux/Docker
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   This PR appears to have altered the way that the hook within the 
`HttpOperator` is initialized: https://github.com/apache/airflow/pull/34669
   
Previously, an `HttpHook` was initialized explicitly. Since both the 
connection ID _and_ the hook type were already defined, connections 
defined via environment variables whose URIs used `https` as the scheme were parsed 
correctly. After the above change, `BaseHook.get_connection` is used instead. That 
path consults the `ProvidersManager`'s registry of known connection types, which 
only includes `http` (presumably because it is derived from the provider's 
registered connection type). Note the absence of `https` below:
   
   ```python
   In [1]: from airflow.providers_manager import ProvidersManager
   
   In [2]: ProvidersManager().hooks
   Out[2]: <airflow.providers_manager.LazyDictWithCache at 0x7f40120a4190>
   
   In [3]: dict(ProvidersManager().hooks)
   Out[3]: 
   {'generic': HookInfo(hook_class_name=None, 
connection_id_attribute_name=None, package_name=None, hook_name='Generic', 
connection_type=None, connection_testable=False),
    'email': HookInfo(hook_class_name=None, connection_id_attribute_name=None, 
package_name=None, hook_name='Email', connection_type=None, 
connection_testable=False),
    'fs': HookInfo(hook_class_name='airflow.hooks.filesystem.FSHook', 
connection_id_attribute_name='fs_conn_id', 
package_name='airflow.hooks.filesystem', hook_name='File (path)', 
connection_type='fs', connection_testable=True),
    'package_index': 
HookInfo(hook_class_name='airflow.hooks.package_index.PackageIndexHook', 
connection_id_attribute_name='pi_conn_id', 
package_name='airflow.hooks.package_index', hook_name='Package Index (Python)', 
connection_type='package_index', connection_testable=True),
    'aws': 
HookInfo(hook_class_name='airflow.providers.amazon.aws.hooks.base_aws.AwsGenericHook',
 connection_id_attribute_name='aws_conn_id', 
package_name='apache-airflow-providers-amazon', hook_name='Amazon Web 
Services', connection_type='aws', connection_testable=True),
    'chime': 
HookInfo(hook_class_name='airflow.providers.amazon.aws.hooks.chime.ChimeWebhookHook',
 connection_id_attribute_name='chime_conn_id', 
package_name='apache-airflow-providers-amazon', hook_name='Amazon Chime 
Webhook', connection_type='chime', connection_testable=True),
    'emr': 
HookInfo(hook_class_name='airflow.providers.amazon.aws.hooks.emr.EmrHook', 
connection_id_attribute_name='emr_conn_id', 
package_name='apache-airflow-providers-amazon', hook_name='Amazon Elastic 
MapReduce', connection_type='emr', connection_testable=True),
    'redshift': 
HookInfo(hook_class_name='airflow.providers.amazon.aws.hooks.redshift_sql.RedshiftSQLHook',
 connection_id_attribute_name='redshift_conn_id', 
package_name='apache-airflow-providers-amazon', hook_name='Amazon Redshift', 
connection_type='redshift', connection_testable=True),
    'elasticsearch': 
HookInfo(hook_class_name='airflow.providers.elasticsearch.hooks.elasticsearch.ElasticsearchHook',
 connection_id_attribute_name='elasticsearch_conn_id', 
package_name='apache-airflow-providers-elasticsearch', 
hook_name='Elasticsearch', connection_type='elasticsearch', 
connection_testable=True),
    'ftp': HookInfo(hook_class_name='airflow.providers.ftp.hooks.ftp.FTPHook', 
connection_id_attribute_name='ftp_conn_id', 
package_name='apache-airflow-providers-ftp', hook_name='FTP', 
connection_type='ftp', connection_testable=True),
    'http': 
HookInfo(hook_class_name='airflow.providers.http.hooks.http.HttpHook', 
connection_id_attribute_name='http_conn_id', 
package_name='apache-airflow-providers-http', hook_name='HTTP', 
connection_type='http', connection_testable=True),
    'imap': 
HookInfo(hook_class_name='airflow.providers.imap.hooks.imap.ImapHook', 
connection_id_attribute_name='imap_conn_id', 
package_name='apache-airflow-providers-imap', hook_name='IMAP', 
connection_type='imap', connection_testable=False),
    'postgres': 
HookInfo(hook_class_name='airflow.providers.postgres.hooks.postgres.PostgresHook',
 connection_id_attribute_name='postgres_conn_id', 
package_name='apache-airflow-providers-postgres', hook_name='Postgres', 
connection_type='postgres', connection_testable=True),
    'sqlite': 
HookInfo(hook_class_name='airflow.providers.sqlite.hooks.sqlite.SqliteHook', 
connection_id_attribute_name='sqlite_conn_id', 
package_name='apache-airflow-providers-sqlite', hook_name='Sqlite', 
connection_type='sqlite', connection_testable=True)}
   ```
   
   This causes connections of the form `AIRFLOW_CONN_...=https://...` to no 
longer be allowed for use by the `HttpOperator`.
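
For context, Airflow derives a connection's `conn_type` from the URI scheme of the `AIRFLOW_CONN_*` value, so an `https://` URI yields a `conn_type` of `https`, which is absent from the registry shown above. A minimal stdlib sketch of the scheme parsing (illustrative only, not Airflow's actual implementation; the `registered_conn_types` set is a hand-picked subset of the registry above):

   ```python
   from urllib.parse import urlsplit

   # Airflow derives conn_type from the URI scheme of AIRFLOW_CONN_* variables,
   # so an https:// URI produces conn_type "https" rather than "http".
   uri = "https://google.com"
   conn_type = urlsplit(uri).scheme

   # Simulated lookup against the hook registry shown above: "https" is missing,
   # so conn.get_hook() cannot resolve a hook class for it.
   registered_conn_types = {"http", "ftp", "imap", "postgres", "sqlite"}
   print(conn_type)                           # https
   print(conn_type in registered_conn_types)  # False
   ```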
   
   ### What you think should happen instead
   
   The [documentation for 
v4.8.0](https://airflow.apache.org/docs/apache-airflow-providers-http/4.8.0/connections/http.html)
 specifies:
   
   > **Schema** (optional):
   >     Specify the service type etc: http/https.
   
   Thus it seems reasonable to expect connection URIs that start with `https` 
to be allowed, especially since this is a breaking change from the previous 
behavior that was not immediately obvious when upgrading from provider versions 
<=4.6.0.
   
   ### How to reproduce
   
   The following command simulates an HTTPS hook being instantiated with the 
`HttpHook` class directly, which was the previous internal behavior of the 
`HttpOperator`. No errors are raised.
   
   ```bash
   $ docker run -e AIRFLOW_CONN_SAMPLE_HOOK='https://google.com' --rm 
docker.io/apache/airflow:slim-2.8.0-python3.10 python -c 'from 
airflow.providers.http.hooks.http import HttpHook; hook = 
HttpHook(http_conn_id="sample_hook")'
   ```
   
   This command shows the newer behavior, which uses `BaseHook` to determine 
the connection type and fails to do so.
   
   ```bash
   $ docker run -e AIRFLOW_CONN_SAMPLE_HOOK='https://google.com' --rm 
docker.io/apache/airflow:slim-2.8.0-python3.10 python -c 'from 
airflow.hooks.base import BaseHook; conn = 
BaseHook.get_connection("sample_hook"); hook = conn.get_hook()'
   ```
   
   ### Anything else
   
   We were able to apply a patch in our own code as a workaround in 
https://github.com/WordPress/openverse/pull/3624.
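
One general shape such a workaround can take is to normalize the connection type before hook resolution, treating `https` as an alias of the registered `http` type. This is an illustrative sketch only, not the exact patch linked above; `normalize_conn_type` and the alias table are hypothetical names:

   ```python
   # Hypothetical normalization step: map unregistered scheme aliases onto the
   # connection type that ProvidersManager actually knows about, before calling
   # anything like conn.get_hook().
   _CONN_TYPE_ALIASES = {"https": "http"}  # assumption: https should resolve to HttpHook

   def normalize_conn_type(conn_type: str) -> str:
       """Return the registered connection type for a possibly-aliased scheme."""
       return _CONN_TYPE_ALIASES.get(conn_type, conn_type)

   print(normalize_conn_type("https"))  # http
   print(normalize_conn_type("ftp"))    # ftp (unaliased types pass through)
   ```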
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

