Hi Pradeep
Can you give https://github.com/OSGeo/gdal/pull/10857 a try?
Even
On 22/09/2024 at 07:37, Pradeep kumar via gdal-dev wrote:
Dear GDAL Developers,
I hope this message finds you well.
I am experiencing an issue with GDAL’s VSICURL virtual file system
where the Authorization header is being passed to redirected
pre-signed S3 URLs, leading to errors when accessing AWS S3.
*Background:*
In my use case, I have a proxy URL that, when called with an
Authorization header, validates the token, verifies authorization, and
then generates a pre-signed URL to an S3 object. The proxy then
redirects the client to this pre-signed URL. When using GDAL with this
reverse proxy server and setting the configuration option --config
CPL_VSIL_CURL_USE_S3_REDIRECT NO, everything works correctly. However,
the issue is that GDAL makes too many round trips, and each time I end
up generating a new pre-signed URL. Ideally, I would like GDAL to send
the initial request and then reuse the pre-signed URLs for subsequent
requests.
This desired behavior is achieved with --config
CPL_VSIL_CURL_USE_S3_REDIRECT YES. However, the problem now is that
GDAL, on subsequent requests to the pre-signed URL, is also passing
the Authorization header. I am using the configuration --config
GDAL_HTTP_HEADERS "Authorization: Bearer xxxx" to set the initial
header, so the Authorization header is being sent with every request.
When the Authorization header is sent to AWS S3 along with the
pre-signed URL, AWS returns the following error:
```Only one auth mechanism allowed; only the X-Amz-Algorithm query
parameter, Signature query string parameter or the Authorization
header should be specified.```
*Observed Behavior:*
•*First Request (301 redirect, then 200 OK):* GDAL sends a request to the proxy URL with
the Authorization header. The proxy validates the token and redirects
to the pre-signed S3 URL. GDAL follows the redirect, and since the
pre-signed URL is accessed without the Authorization header, AWS S3
responds with a *200 OK*. This behavior is as expected.
•*Subsequent Requests (400 Bad Request):* GDAL reuses the pre-signed
URL but includes the Authorization header in the request. AWS S3,
seeing both the pre-signed URL’s query parameters and the
Authorization header, returns a *400 Bad Request* error, stating that
only one authentication mechanism is allowed.
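As a rough illustration of the conflict described above, the following Python sketch flags a request that carries both query-string credentials and an Authorization header. The function name `has_conflicting_auth` and the set of query keys are assumptions taken from the wording of the S3 error message; this is not S3's actual validation logic.

```python
from urllib.parse import urlsplit, parse_qs

# Query parameters that the S3 error message names as an auth mechanism
# (assumption: matched case-insensitively for this illustration).
PRESIGNED_QUERY_KEYS = {"x-amz-algorithm", "signature"}


def has_conflicting_auth(url: str, headers: dict) -> bool:
    """Return True if the request mixes query-string auth with an Authorization header."""
    query_keys = {key.lower() for key in parse_qs(urlsplit(url).query)}
    has_query_auth = bool(query_keys & PRESIGNED_QUERY_KEYS)
    has_header_auth = any(name.lower() == "authorization" for name in headers)
    return has_query_auth and has_header_auth
```

With the pre-signed URL from the logs, passing `{"Authorization": "Bearer xxxx"}` would be flagged, while the same URL without that header would not.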
*Expected Behavior:*
I expect that GDAL’s VSICURL should send the initial request with the
Authorization header to the proxy URL. Upon receiving the redirect to
the pre-signed URL, it should not include the Authorization header in
subsequent requests to AWS S3. This would allow AWS S3 to accept the
pre-signed URL without conflicts.
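The expected behavior could be sketched as follows in Python. This is only an illustration of the idea, not GDAL's implementation: the helper name `headers_for_request` is hypothetical, and comparing host names is an assumed rule for deciding when a request targets the redirected location rather than the proxy.

```python
from urllib.parse import urlsplit


def headers_for_request(original_url: str, request_url: str, headers: dict) -> dict:
    """Return the headers to send, dropping Authorization for a different host."""
    # hostname is lower-cased by urlsplit and excludes any port.
    same_host = urlsplit(original_url).hostname == urlsplit(request_url).hostname
    if same_host:
        return dict(headers)
    # Request goes to the redirect target: keep everything except Authorization.
    return {k: v for k, v in headers.items() if k.lower() != "authorization"}
```

Under this rule, requests to example.com keep the bearer token, while requests to the cached S3 URL are sent without it.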
*Questions:*
•Is there a way to configure GDAL so that it does not pass the
Authorization header to the redirected pre-signed URLs while retaining
it for the initial request?
•If this feature is not currently available, would it be feasible to
implement such functionality?
•Are there any plans to address the handling of HTTP redirect response
codes 301 and 307 in future GDAL releases to better support this use case?
*GDAL CLI Command:*
```
gdalinfo --debug on \
  --config CPL_CURL_VERBOSE YES \
  --config GDAL_DISABLE_READDIR_ON_OPEN EMPTY_DIR \
  --config CPL_VSIL_CURL_USE_S3_REDIRECT YES \
  --config GDAL_HTTP_HEADERS="Authorization: Bearer xxxx" \
  "/vsicurl/https://example.com/stac/collections/items/assets?path=s3://your-bucket/red.tif"
```
*Example GDAL Logs:*
Below are the GDAL logs illustrating the issue (sensitive information
has been redacted):
*First Request (200 OK):*
```
HTTP: libcurl/8.7.1 (SecureTransport) LibreSSL/3.3.6 zlib/1.2.12
nghttp2/1.61.0
HTTP: GDAL was built against curl 8.4.0, but is running against 8.7.1.
CURL_INFO_TEXT: [HTTP/2] [1] OPENED stream for
https://example.com/stac/collections/items/assets?path=s3://your-bucket/red.tif
CURL_INFO_TEXT: [HTTP/2] [1] [:method: HEAD]
CURL_INFO_TEXT: [HTTP/2] [1] [:scheme: https]
CURL_INFO_TEXT: [HTTP/2] [1] [:authority: example.com]
CURL_INFO_TEXT: [HTTP/2] [1] [:path:
/stac/collections/items/assets/foo?path=s3://your-bucket/red.tif]
CURL_INFO_TEXT: [HTTP/2] [1] [user-agent: GDAL/3.9.2]
CURL_INFO_TEXT: [HTTP/2] [1] [accept: */*]
CURL_INFO_TEXT: [HTTP/2] [1] [authorization: Bearer [REDACTED]]
CURL_INFO_HEADER_OUT: HEAD
/stac/collections/items/assets/foo?path=s3://your-bucket/red.tif HTTP/2
Host: example.com
User-Agent: GDAL/3.9.2
Accept: */*
Authorization: Bearer [REDACTED]
CURL_INFO_TEXT: Request completely sent off
CURL_INFO_HEADER_IN: HTTP/2 301
CURL_INFO_HEADER_IN: date: Sun, 22 Sep 2024 04:44:28 GMT
CURL_INFO_HEADER_IN: content-type: text/plain; charset=utf-8
CURL_INFO_HEADER_IN: content-length: 43
CURL_INFO_HEADER_IN: location:
https://s3-bucket-placeholder/red.tif?response-content-type=image%2Ftiff&AWSAccessKeyId=[REDACTED]&Signature=[REDACTED]&x-amz-security-token=[REDACTED]&Expires=1726983868
CURL_INFO_HEADER_IN: x-content-length: 176995703
CURL_INFO_HEADER_IN: apigw-requestid: [REDACTED]
CURL_INFO_HEADER_IN:
CURL_INFO_TEXT: Ignoring the response-body
CURL_INFO_TEXT: Connection #0 to host example.com left intact
CURL_INFO_TEXT: Issue another request to this URL:
'https://s3-bucket-placeholder/red.tif?response-content-type=image%2Ftiff&AWSAccessKeyId=[REDACTED]&Signature=[REDACTED]&x-amz-security-token=[REDACTED]&Expires=1726983868'
CURL_INFO_TEXT: Couldn't find host s3-bucket-placeholder in the .netrc
file; using defaults
CURL_INFO_TEXT: Host s3-bucket-placeholder:443 was resolved.
CURL_INFO_TEXT: Trying [REDACTED]...
CURL_INFO_TEXT: Connected to s3-bucket-placeholder ([REDACTED]) port 443
CURL_INFO_TEXT: SSL connection using TLSv1.3 / [REDACTED]
CURL_INFO_TEXT: Server certificate:
CURL_INFO_TEXT: subject: CN=*.s3.amazonaws.com
CURL_INFO_TEXT: start date: Apr 22 00:00:00 2024 GMT
CURL_INFO_TEXT: expire date: Apr 7 23:59:59 2025 GMT
CURL_INFO_TEXT: issuer: C=US; O=Amazon; CN=Amazon RSA 2048 M01
CURL_INFO_TEXT: SSL certificate verify ok.
CURL_INFO_HEADER_OUT: HEAD
/red.tif?response-content-type=image%2Ftiff&AWSAccessKeyId=[REDACTED]&Signature=[REDACTED]&x-amz-security-token=[REDACTED]&Expires=1726983868
HTTP/1.1
Host: s3-bucket-placeholder
User-Agent: GDAL/3.9.2
Accept: */*
CURL_INFO_TEXT: Request completely sent off
CURL_INFO_HEADER_IN: HTTP/1.1 200 OK
CURL_INFO_HEADER_IN: x-amz-id-2: [REDACTED]
CURL_INFO_HEADER_IN: x-amz-request-id: [REDACTED]
CURL_INFO_HEADER_IN: Date: Sun, 22 Sep 2024 04:44:29 GMT
CURL_INFO_HEADER_IN: Last-Modified: Tue, 17 Sep 2024 17:44:40 GMT
CURL_INFO_HEADER_IN: ETag: "[REDACTED]"
CURL_INFO_HEADER_IN: x-amz-server-side-encryption: AES256
CURL_INFO_HEADER_IN: Accept-Ranges: bytes
CURL_INFO_HEADER_IN: Content-Type: image/tiff
CURL_INFO_HEADER_IN: Server: AmazonS3
CURL_INFO_HEADER_IN: Content-Length: 176995703
CURL_INFO_HEADER_IN:
CURL_INFO_TEXT: Connection #1 to host s3-bucket-placeholder left intact
VSICURL: Effective URL:
https://s3-bucket-placeholder/red.tif?response-content-type=image%2Ftiff&AWSAccessKeyId=[REDACTED]&Signature=[REDACTED]&x-amz-security-token=[REDACTED]&Expires=1726983868
VSICURL: Will use redirect URL for the next 3599 seconds
VSICURL:
GetFileSize(https://example.com/stac/collections/items/assets/foo?path=s3://your-bucket/red.tif)=176995703
response_code=200
```
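The "3599 seconds" figure in the log above is consistent with the `Expires` query parameter of the pre-signed URL. A minimal Python sketch of that arithmetic, assuming the validity window is simply `Expires` minus the current time (the function name `seconds_until_expiry` is hypothetical, and GDAL's own computation may differ):

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlsplit, parse_qs


def seconds_until_expiry(presigned_url: str, now: datetime) -> Optional[int]:
    """Seconds left before the URL's Expires timestamp, or None if it is absent."""
    values = parse_qs(urlsplit(presigned_url).query).get("Expires")
    if not values:
        return None
    return int(values[0]) - int(now.timestamp())


# Expires value from the log; "now" is the Date header of the S3 response.
url = "https://s3-bucket-placeholder/red.tif?Signature=REDACTED&Expires=1726983868"
now = datetime(2024, 9, 22, 4, 44, 29, tzinfo=timezone.utc)
print(seconds_until_expiry(url, now))  # 3599
```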
*Subsequent Request (400 Bad Request):*
```
VSICURL: Using redirect URL as it looks to be still valid (3599
seconds left)
VSICURL: Downloading 0-16383
(https://s3-bucket-placeholder/red.tif?response-content-type=image%2Ftiff&AWSAccessKeyId=[REDACTED]&Signature=[REDACTED]&x-amz-security-token=[REDACTED]&Expires=1726983868)...
CURL_INFO_TEXT: Couldn't find host s3-bucket-placeholder in the .netrc
file; using defaults
CURL_INFO_TEXT: Found bundle for host: 0x600001aa8270 [serially]
CURL_INFO_TEXT: Can not multiplex, even if we wanted to
CURL_INFO_TEXT: Re-using existing connection with host
s3-bucket-placeholder
CURL_INFO_HEADER_OUT: GET
/red.tif?response-content-type=image%2Ftiff&AWSAccessKeyId=[REDACTED]&Signature=[REDACTED]&x-amz-security-token=[REDACTED]&Expires=1726983868
HTTP/1.1
Host: s3-bucket-placeholder
User-Agent: GDAL/3.9.2
Accept: */*
Authorization: Bearer [REDACTED]
Range: bytes=0-16383
CURL_INFO_TEXT: Request completely sent off
CURL_INFO_HEADER_IN: HTTP/1.1 400 Bad Request
CURL_INFO_HEADER_IN: x-amz-request-id: [REDACTED]
CURL_INFO_HEADER_IN: x-amz-id-2: [REDACTED]
CURL_INFO_HEADER_IN: Content-Type: application/xml
CURL_INFO_HEADER_IN: Transfer-Encoding: chunked
CURL_INFO_HEADER_IN: Date: Sun, 22 Sep 2024 04:44:28 GMT
CURL_INFO_HEADER_IN: Server: AmazonS3
CURL_INFO_HEADER_IN: Connection: close
CURL_INFO_HEADER_IN:
CURL_INFO_TEXT: Closing connection
VSICURL: Got response_code=400
ERROR 4:
`/vsicurl/https://example.com/stac/collections/items/assets/foo?path=s3://your-bucket/red.tif'
not recognized as being in a supported file format.
gdalinfo failed - unable to open
'/vsicurl/https://example.com/stac/collections/items/assets/foo?path=s3://your-bucket/red.tif'.
```
*Additional Information:*
If you require any further details or clarification, please let me
know. I would be happy to provide more information. Additionally, if
necessary, I can create an issue on the GDAL GitHub repository to
track this problem.
Thank you very much for your time and assistance. I appreciate any
guidance or suggestions you can provide to help resolve this issue.
Best Regards,
Pradeep Gulla
_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev
--
http://www.spatialys.com
My software is free, but my time generally not.