Hi Pradeep

Could you give https://github.com/OSGeo/gdal/pull/10857 a try?

Even

On 22/09/2024 at 07:37, Pradeep kumar via gdal-dev wrote:

Dear GDAL Developers,


I hope this message finds you well.


I am experiencing an issue with GDAL’s VSICURL virtual file system where the Authorization header is being passed to redirected pre-signed S3 URLs, leading to errors when accessing AWS S3.


*Background:*


In my use case, I have a proxy URL that, when called with an Authorization header, validates the token, verifies authorization, and then generates a pre-signed URL to an S3 object. The proxy then redirects the client to this pre-signed URL. When I use GDAL with this reverse proxy server and set the configuration option --config CPL_VSIL_CURL_USE_S3_REDIRECT NO, everything works correctly. However, GDAL then makes too many round trips to the proxy, and each one makes me generate a new pre-signed URL. Ideally, I would like GDAL to send the initial request and then reuse the pre-signed URL for subsequent requests.


This desired behavior is achieved with --config CPL_VSIL_CURL_USE_S3_REDIRECT YES. However, the problem now is that GDAL, on subsequent requests to the pre-signed URL, is also passing the Authorization header. I am using the configuration --config GDAL_HTTP_HEADERS "Authorization: Bearer xxxx" to set the initial header, so the Authorization header is being sent with every request.


When the Authorization header is sent to AWS S3 along with the pre-signed URL, AWS returns the following error:


```Only one auth mechanism allowed; only the X-Amz-Algorithm query parameter, Signature query string parameter or the Authorization header should be specified.```
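To make the conflict concrete, here is a minimal Python sketch of a client-side check for a request that carries both kinds of credentials. It is only an approximation based on the wording of the error message above, not S3's actual validation logic; the function name `has_conflicting_auth` and the key set are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Query parameters named in the S3 error message (compared case-insensitively).
# Assumption: the presence of either one marks the URL as pre-signed.
PRESIGNED_QUERY_KEYS = {"x-amz-algorithm", "signature"}


def has_conflicting_auth(url, headers):
    """Return True if a request would carry both query-string and header auth."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    query_keys = {key.lower() for key in query}
    has_query_auth = bool(PRESIGNED_QUERY_KEYS & query_keys)
    has_header_auth = any(name.lower() == "authorization" for name in headers)
    return has_query_auth and has_header_auth
```

With the redirect URL from this report, the check is true when the Authorization header is attached and false when it is left out.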


*Observed Behavior:*


• *First Request (301 redirect, then 200 OK):* GDAL sends a request to the proxy URL with the Authorization header. The proxy validates the token and redirects to the pre-signed S3 URL. GDAL follows the redirect, and since the pre-signed URL is accessed without the Authorization header, AWS S3 responds with a *200 OK*. This behavior is as expected.

• *Subsequent Requests (400 Bad Request):* GDAL reuses the pre-signed URL but includes the Authorization header in the request. AWS S3, seeing both the pre-signed URL’s query parameters and the Authorization header, returns a *400 Bad Request* error, stating that only one authentication mechanism is allowed.


*Expected Behavior:*


I expect GDAL’s VSICURL to send the initial request with the Authorization header to the proxy URL. After receiving the redirect to the pre-signed URL, it should not include the Authorization header in subsequent requests to AWS S3, so that AWS S3 accepts the pre-signed URL without conflicts.
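The expected behavior can be sketched in a few lines of Python. This is only an illustration of the rule "keep the Authorization header for the proxy host, drop it for any other host", assuming a host comparison is an acceptable criterion; it is not GDAL's implementation, and `headers_for_target` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def headers_for_target(original_url, target_url, headers):
    """Return the headers to send to target_url.

    The Authorization header is kept only when target_url is on the same
    host as original_url (the URL the headers were configured for).
    """
    same_host = urlsplit(original_url).hostname == urlsplit(target_url).hostname
    if same_host:
        return dict(headers)
    # Different host (e.g. the pre-signed S3 URL): drop Authorization only.
    return {k: v for k, v in headers.items() if k.lower() != "authorization"}


proxy = "https://example.com/stac/collections/items/assets?path=s3://your-bucket/red.tif"
s3 = "https://s3-bucket-placeholder/red.tif?Signature=x&Expires=1726983868"
configured = {"Authorization": "Bearer xxxx", "Accept": "*/*"}

print(headers_for_target(proxy, proxy, configured))  # Authorization kept
print(headers_for_target(proxy, s3, configured))     # Authorization dropped
```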


*Questions:*


• Is there a way to configure GDAL so that it does not pass the Authorization header to the redirected pre-signed URLs while retaining it for the initial request?

• If this feature is not currently available, would it be feasible to implement such functionality?

• Are there any plans to address the handling of HTTP redirect response codes 301 and 307 in future GDAL releases to better support this use case?



*GDAL CLI Command:*

```
gdalinfo --debug on \
  --config CPL_CURL_VERBOSE YES \
  --config GDAL_DISABLE_READDIR_ON_OPEN EMPTY_DIR \
  --config CPL_VSIL_CURL_USE_S3_REDIRECT YES \
  --config GDAL_HTTP_HEADERS="Authorization: Bearer xxxx" \
  /vsicurl/https://example.com/stac/collections/items/assets?path=s3://your-bucket/red.tif
```


*Example GDAL Logs:*


Below are the GDAL logs illustrating the issue (sensitive information has been redacted):


*First Request (200 OK):*

```

HTTP: libcurl/8.7.1 (SecureTransport) LibreSSL/3.3.6 zlib/1.2.12 nghttp2/1.61.0
HTTP: GDAL was built against curl 8.4.0, but is running against 8.7.1.
CURL_INFO_TEXT: [HTTP/2] [1] OPENED stream for https://example.com/stac/collections/items/assets?path=s3://your-bucket/red.tif
CURL_INFO_TEXT: [HTTP/2] [1] [:method: HEAD]
CURL_INFO_TEXT: [HTTP/2] [1] [:scheme: https]
CURL_INFO_TEXT: [HTTP/2] [1] [:authority: example.com]
CURL_INFO_TEXT: [HTTP/2] [1] [:path: /stac/collections/items/assets/foo?path=s3://your-bucket/red.tif]
CURL_INFO_TEXT: [HTTP/2] [1] [user-agent: GDAL/3.9.2]
CURL_INFO_TEXT: [HTTP/2] [1] [accept: */*]
CURL_INFO_TEXT: [HTTP/2] [1] [authorization: Bearer [REDACTED]]
CURL_INFO_HEADER_OUT: HEAD /stac/collections/items/assets/foo?path=s3://your-bucket/red.tif HTTP/2
Host: example.com
User-Agent: GDAL/3.9.2
Accept: */*
Authorization: Bearer [REDACTED]

CURL_INFO_TEXT: Request completely sent off
CURL_INFO_HEADER_IN: HTTP/2 301
CURL_INFO_HEADER_IN: date: Sun, 22 Sep 2024 04:44:28 GMT
CURL_INFO_HEADER_IN: content-type: text/plain; charset=utf-8
CURL_INFO_HEADER_IN: content-length: 43
CURL_INFO_HEADER_IN: location: https://s3-bucket-placeholder/red.tif?response-content-type=image%2Ftiff&AWSAccessKeyId=[REDACTED]&Signature=[REDACTED]&x-amz-security-token=[REDACTED]&Expires=1726983868
CURL_INFO_HEADER_IN: x-content-length: 176995703
CURL_INFO_HEADER_IN: apigw-requestid: [REDACTED]
CURL_INFO_HEADER_IN:
CURL_INFO_TEXT: Ignoring the response-body
CURL_INFO_TEXT: Connection #0 to host example.com left intact
CURL_INFO_TEXT: Issue another request to this URL: 'https://s3-bucket-placeholder/red.tif?response-content-type=image%2Ftiff&AWSAccessKeyId=[REDACTED]&Signature=[REDACTED]&x-amz-security-token=[REDACTED]&Expires=1726983868'
CURL_INFO_TEXT: Couldn't find host s3-bucket-placeholder in the .netrc file; using defaults
CURL_INFO_TEXT: Host s3-bucket-placeholder:443 was resolved.
CURL_INFO_TEXT:   Trying [REDACTED]...
CURL_INFO_TEXT: Connected to s3-bucket-placeholder ([REDACTED]) port 443
CURL_INFO_TEXT: SSL connection using TLSv1.3 / [REDACTED]
CURL_INFO_TEXT: Server certificate:
CURL_INFO_TEXT:  subject: CN=*.s3.amazonaws.com
CURL_INFO_TEXT:  start date: Apr 22 00:00:00 2024 GMT
CURL_INFO_TEXT:  expire date: Apr  7 23:59:59 2025 GMT
CURL_INFO_TEXT:  issuer: C=US; O=Amazon; CN=Amazon RSA 2048 M01
CURL_INFO_TEXT:  SSL certificate verify ok.
CURL_INFO_HEADER_OUT: HEAD /red.tif?response-content-type=image%2Ftiff&AWSAccessKeyId=[REDACTED]&Signature=[REDACTED]&x-amz-security-token=[REDACTED]&Expires=1726983868 HTTP/1.1
Host: s3-bucket-placeholder
User-Agent: GDAL/3.9.2
Accept: */*

CURL_INFO_TEXT: Request completely sent off
CURL_INFO_HEADER_IN: HTTP/1.1 200 OK
CURL_INFO_HEADER_IN: x-amz-id-2: [REDACTED]
CURL_INFO_HEADER_IN: x-amz-request-id: [REDACTED]
CURL_INFO_HEADER_IN: Date: Sun, 22 Sep 2024 04:44:29 GMT
CURL_INFO_HEADER_IN: Last-Modified: Tue, 17 Sep 2024 17:44:40 GMT
CURL_INFO_HEADER_IN: ETag: "[REDACTED]"
CURL_INFO_HEADER_IN: x-amz-server-side-encryption: AES256
CURL_INFO_HEADER_IN: Accept-Ranges: bytes
CURL_INFO_HEADER_IN: Content-Type: image/tiff
CURL_INFO_HEADER_IN: Server: AmazonS3
CURL_INFO_HEADER_IN: Content-Length: 176995703
CURL_INFO_HEADER_IN:
CURL_INFO_TEXT: Connection #1 to host s3-bucket-placeholder left intact
VSICURL: Effective URL: https://s3-bucket-placeholder/red.tif?response-content-type=image%2Ftiff&AWSAccessKeyId=[REDACTED]&Signature=[REDACTED]&x-amz-security-token=[REDACTED]&Expires=1726983868
VSICURL: Will use redirect URL for the next 3599 seconds
VSICURL: GetFileSize(https://example.com/stac/collections/items/assets/foo?path=s3://your-bucket/red.tif)=176995703  response_code=200

```
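The "Will use redirect URL for the next 3599 seconds" line is consistent with the Expires query parameter of the pre-signed URL. As a rough Python sketch (an assumption about how such a validity window could be derived, not GDAL's actual code; `seconds_until_expiry` is a hypothetical helper), the remaining lifetime is the difference between Expires and the current time:

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs


def seconds_until_expiry(presigned_url, now):
    """Return the seconds left before the URL's Expires timestamp, or None if absent."""
    values = parse_qs(urlsplit(presigned_url).query).get("Expires")
    if not values:
        return None
    return int(values[0]) - int(now.timestamp())


url = "https://s3-bucket-placeholder/red.tif?response-content-type=image%2Ftiff&Expires=1726983868"
# Date header of the proxy's 301 response in the log above.
now = datetime(2024, 9, 22, 4, 44, 28, tzinfo=timezone.utc)
print(seconds_until_expiry(url, now))  # 3600
```

Measured from the Date header of the S3 response one second later (04:44:29 GMT), the same calculation gives 3599 seconds, matching the log.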


*Subsequent Request (400 Bad Request):*

```

VSICURL: Using redirect URL as it looks to be still valid (3599 seconds left)
VSICURL: Downloading 0-16383 (https://s3-bucket-placeholder/red.tif?response-content-type=image%2Ftiff&AWSAccessKeyId=[REDACTED]&Signature=[REDACTED]&x-amz-security-token=[REDACTED]&Expires=1726983868)...
CURL_INFO_TEXT: Couldn't find host s3-bucket-placeholder in the .netrc file; using defaults
CURL_INFO_TEXT: Found bundle for host: 0x600001aa8270 [serially]
CURL_INFO_TEXT: Can not multiplex, even if we wanted to
CURL_INFO_TEXT: Re-using existing connection with host s3-bucket-placeholder
CURL_INFO_HEADER_OUT: GET /red.tif?response-content-type=image%2Ftiff&AWSAccessKeyId=[REDACTED]&Signature=[REDACTED]&x-amz-security-token=[REDACTED]&Expires=1726983868 HTTP/1.1
Host: s3-bucket-placeholder
User-Agent: GDAL/3.9.2
Accept: */*
Authorization: Bearer [REDACTED]
Range: bytes=0-16383

CURL_INFO_TEXT: Request completely sent off
CURL_INFO_HEADER_IN: HTTP/1.1 400 Bad Request
CURL_INFO_HEADER_IN: x-amz-request-id: [REDACTED]
CURL_INFO_HEADER_IN: x-amz-id-2: [REDACTED]
CURL_INFO_HEADER_IN: Content-Type: application/xml
CURL_INFO_HEADER_IN: Transfer-Encoding: chunked
CURL_INFO_HEADER_IN: Date: Sun, 22 Sep 2024 04:44:28 GMT
CURL_INFO_HEADER_IN: Server: AmazonS3
CURL_INFO_HEADER_IN: Connection: close
CURL_INFO_HEADER_IN:
CURL_INFO_TEXT: Closing connection
VSICURL: Got response_code=400
ERROR 4: `/vsicurl/https://example.com/stac/collections/items/assets/foo?path=s3://your-bucket/red.tif' not recognized as being in a supported file format.
gdalinfo failed - unable to open '/vsicurl/https://example.com/stac/collections/items/assets/foo?path=s3://your-bucket/red.tif'.

```

*Additional Information:*


If you require any further details or clarification, please let me know. I would be happy to provide more information. Additionally, if necessary, I can create an issue on the GDAL GitHub repository to track this problem.


Thank you very much for your time and assistance. I appreciate any guidance or suggestions you can provide to help resolve this issue.


Best Regards,
Pradeep Gulla

_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev

--
http://www.spatialys.com
My software is free, but my time generally not.