On 10/09/2024 at 16:10, Rahkonen Jukka via gdal-dev wrote:

Hi,

Have you tried the configuration option “CPL_VSIL_CURL_USE_HEAD=[YES/NO]: Defaults to YES. Controls whether to use a HEAD request when opening a remote URL.”?

I was just going to suggest that too. It "works", but not really. It just postpones the core issue: the server doesn't support GET Range requests, so it can't be used with /vsicurl/.

As the file has a COG organization with overview data first, if you want to read the smallest overview(s), you can use /vsicurl_streaming/ instead, but that won't be efficient for reading the bottom-right-most tile of the full resolution layer, which will require reading the whole file...

Nothing GDAL can do about that.

Actually... digging further... it somehow supports Range requests, but in what I believe is a non-compliant way. It does return the expected content, but returns HTTP 200 and not HTTP 206 (Partial Content). And it never returns the Content-Length header.
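
To make the distinction concrete, here is a minimal sketch of how the reply to a GET carrying a Range header could be classified. The function name and the category labels are made up for illustration; this is not how GDAL itself decides.

```python
from typing import Mapping


def classify_range_response(status: int, headers: Mapping[str, str]) -> str:
    """Classify a server's reply to a GET request that carried a Range header.

    Hypothetical helper for illustration only; the labels are arbitrary.
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if status == 206 and "content-range" in lowered:
        # The expected reply to a satisfiable range request.
        return "partial-content"
    if status == 200 and "content-length" not in lowered:
        # The behaviour described above: 200 and no Content-Length.
        return "200-without-length"
    if status == 200:
        # The server likely ignored the range and sent the full body.
        return "200-with-length"
    return "other"
```

To try it against a live server, one could send `Range: bytes=0-15` with `urllib.request` and pass `response.status` and `dict(response.headers.items())` to the function.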

Well, I've implemented a workaround in https://github.com/OSGeo/gdal/pull/10760 that might be useful in other similar cases too.

With that, the following works:

gdal_translate "/vsicurl?file_size=unlimited&url=https://data.source.coop/earthgenome/sentinel2-temporal-mosaics/20NMH_2024-04-01_2024-08-01/B08.tif" --config GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR out.tif -srcwin 5000 5000 50 50

file_size=unlimited works here because the GTiff driver doesn't really need the right file size: it only checks, at some points, that we don't try to read beyond the end of the file, so unlimited is OK. In other situations/drivers, the exact value could be needed.
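
For anyone building such filenames from a script, a small sketch of how the /vsicurl?option=value&url=... form used in the command above can be assembled. The helper name is hypothetical, and file_size=unlimited assumes a GDAL build that includes the pull request mentioned above. No URL-encoding is applied, which matches the command above but may not be right for URLs containing `&` or other reserved characters.

```python
def vsicurl_path(url: str, **options: str) -> str:
    """Build a "/vsicurl?opt=val&url=..." filename (hypothetical helper).

    The url parameter is placed last, after all other options.
    """
    parts = [f"{name}={value}" for name, value in options.items()]
    parts.append(f"url={url}")
    return "/vsicurl?" + "&".join(parts)


path = vsicurl_path(
    "https://data.source.coop/earthgenome/sentinel2-temporal-mosaics/"
    "20NMH_2024-04-01_2024-08-01/B08.tif",
    file_size="unlimited",  # assumption: only understood by builds with the PR
)
print(path)
```

The resulting string is what gets passed as the source dataset name, for example to gdal_translate.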

But they should really fix their servers

Even

-Jukka Rahkonen-

*From:* gdal-dev <gdal-dev-boun...@lists.osgeo.org> *On Behalf Of* Daniel Evans via gdal-dev
*Sent:* Tuesday, 10 September 2024 16.57
*To:* 'gdal-dev@lists.osgeo.org' (gdal-dev@lists.osgeo.org) <gdal-dev@lists.osgeo.org>
*Subject:* [gdal-dev] Ignore content-length in vsicurl?

Hi all,

I am attempting to read a dataset via /vsicurl/ where I believe the server is incorrectly returning `content-length: 0` in response to HEAD requests. This causes GDAL to believe it's a zero-length file, and it therefore can't be read.

If I download the file via HTTP GET, it's valid, and GDAL can read it locally. I've also confirmed I can use /vsicurl/ on some test datasets in the GDAL repo.

Is it possible to force GDAL to work around the faulty content-length header, or is it too fundamental a problem to ignore?

I've separately got in touch with the data provider to see if they are able to fix the issue at their end.

Cheers,

Daniel

URL of the troublesome dataset:

https://data.source.coop/earthgenome/sentinel2-temporal-mosaics/20NMH_2024-04-01_2024-08-01/B08.tif

Example HTTP header responses I'm seeing:

GET

HTTP/2 200
date: Tue, 10 Sep 2024 13:47:54 GMT
content-type: binary/octet-stream
content-length: 278198294
vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers
etag: "a79f3f685281d6681e4d362536c5b3eb-34"
last-modified: Thu, 25 Jul 2024 13:16:08 GMT
x-version: 0.0.16
access-control-allow-credentials: true

HEAD

HTTP/2 200
date: Tue, 10 Sep 2024 13:48:08 GMT
content-type: binary/octet-stream
content-length: 0
x-version: 0.0.16
access-control-allow-credentials: true
etag: "a79f3f685281d6681e4d362536c5b3eb-34"
last-modified: Thu, 25 Jul 2024 13:16:08 GMT
vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers
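
A rough sketch of how the mismatch between the two responses can be checked automatically from header dumps like the ones above. The function names are made up for illustration, and the header text is abbreviated from the dumps above.

```python
from typing import Optional


def content_length(raw_headers: str) -> Optional[int]:
    """Return the Content-Length from a raw header dump, or None if absent."""
    for line in raw_headers.splitlines():
        # Status lines such as "HTTP/2 200" contain no colon and are skipped.
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-length":
            return int(value.strip())
    return None


def head_length_is_suspect(head_headers: str, get_headers: str) -> bool:
    """True when HEAD and GET both report a length and the two differ."""
    head_len = content_length(head_headers)
    get_len = content_length(get_headers)
    return head_len is not None and get_len is not None and head_len != get_len


GET_HEADERS = """HTTP/2 200
content-type: binary/octet-stream
content-length: 278198294
"""

HEAD_HEADERS = """HTTP/2 200
content-type: binary/octet-stream
content-length: 0
"""

print(head_length_is_suspect(HEAD_HEADERS, GET_HEADERS))
```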


_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev

--
http://www.spatialys.com
My software is free, but my time generally not.