On 10/09/2024 at 16:10, Rahkonen Jukka via gdal-dev wrote:
Hi,
Have you tried with configuration option
“CPL_VSIL_CURL_USE_HEAD=[YES/NO]: Defaults to YES. Controls whether to
use a HEAD request when opening a remote URL.”
I was just going to suggest that too. It "works", but not really. It
just postpones the core issue: the server doesn't support GET Range
requests, so it can't be used with /vsicurl/.
As the file has a COG organization with the overview data first, if
you want to read the smallest overview(s), you can use
/vsicurl_streaming/ instead, but that won't be efficient for reading the
bottom-right-most tile of the full resolution layer, which will require
reading the whole file...
Nothing GDAL can do about that.
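To make the two options above concrete, here is a minimal Python sketch that only builds the gdalinfo argument list. The helper name gdalinfo_command and its parameters are made up for illustration; the --config KEY VALUE form, CPL_VSIL_CURL_USE_HEAD and the /vsicurl_streaming/ prefix are the GDAL pieces discussed above.

```python
def gdalinfo_command(url, streaming=False, use_head=True):
    """Build an argv list for gdalinfo on a remote file.

    Hypothetical helper: it only assembles strings, it does not run GDAL.
    """
    # /vsicurl_streaming/ reads the file sequentially, /vsicurl/ uses range requests
    prefix = "/vsicurl_streaming/" if streaming else "/vsicurl/"
    cmd = ["gdalinfo"]
    if not use_head:
        # Ask GDAL not to issue a HEAD request when opening the URL
        cmd += ["--config", "CPL_VSIL_CURL_USE_HEAD", "NO"]
    cmd.append(prefix + url)
    return cmd
```

The resulting list can be passed to subprocess.run() on a machine where gdalinfo is installed.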
Actually... digging further... it does somehow support Range requests, but
in what I believe is a non-compliant way. It returns the expected
content, but with HTTP 200 instead of HTTP 206 (Partial Content). And it
never returns the Content-Length header.
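The behaviour described above can be told apart from a compliant server with a check along these lines. This is only a sketch: the function name and the category labels are invented for illustration, and it assumes you already have the status code, headers and body length of a response to a "Range: bytes=first-last" request.

```python
def classify_range_response(status, headers, body_len, first, last):
    """Classify a server's answer to 'Range: bytes=first-last'.

    Hypothetical helper; the returned labels are illustrative only.
    """
    names = {key.lower(): value for key, value in headers.items()}
    requested = last - first + 1
    if status == 206 and "content-range" in names:
        # What a compliant server sends for a satisfiable range
        return "compliant"
    if status == 200 and body_len == requested:
        # Right bytes, wrong status code: the case described in this thread
        # (ambiguous if the whole file happens to be exactly that long)
        return "partial-body-with-200"
    if status == 200:
        # The server ignored the Range header and sent something else,
        # typically the whole file
        return "range-ignored"
    return "other"
```

For the server in question, a 100-byte range request would fall into the second category: 100 bytes of body, status 200, no Content-Length.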
Well, I've implemented a workaround in
https://github.com/OSGeo/gdal/pull/10760 that might be useful in other
similar cases too.
With that, the following works:
gdal_translate \
  "/vsicurl?file_size=unlimited&url=https://data.source.coop/earthgenome/sentinel2-temporal-mosaics/20NMH_2024-04-01_2024-08-01/B08.tif" \
  --config GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR out.tif -srcwin 5000 5000 50 50
file_size=unlimited works here because the GTiff driver doesn't really
need to know the right file size: it only checks in a few places that we
don't try to read beyond the end of the file, so unlimited is OK. In other
situations/drivers, the exact value could be needed.
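If you build such filenames from a script, a small helper like the one below reproduces the string used in the command above. The name vsicurl_path is made up; file_size=unlimited is assumed to require a GDAL build that includes the pull request mentioned above. The values are left unencoded, exactly as in the command; as far as I know the GDAL documentation describes option values as URL-encoded, which would matter for URLs containing characters such as "&".

```python
def vsicurl_path(url, **options):
    """Build a '/vsicurl?opt=val&...&url=...' filename.

    Hypothetical helper: plain string assembly, no encoding is applied.
    """
    parts = [f"{key}={value}" for key, value in options.items()]
    # The url parameter goes last, as in the command shown in this thread
    parts.append(f"url={url}")
    return "/vsicurl?" + "&".join(parts)


path = vsicurl_path(
    "https://data.source.coop/earthgenome/sentinel2-temporal-mosaics/"
    "20NMH_2024-04-01_2024-08-01/B08.tif",
    file_size="unlimited",
)
```

The resulting path is what would be passed to gdal_translate or to gdal.Open() in place of a plain /vsicurl/ filename.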
But they should really fix their servers.
Even
-Jukka Rahkonen-
*From:* gdal-dev <gdal-dev-boun...@lists.osgeo.org> *On Behalf Of
*Daniel Evans via gdal-dev
*Sent:* Tuesday, 10 September 2024 16:57
*To:* 'gdal-dev@lists.osgeo.org' (gdal-dev@lists.osgeo.org)
<gdal-dev@lists.osgeo.org>
*Subject:* [gdal-dev] Ignore content-length in vsicurl?
Hi all,
I am attempting to read a dataset via /vsicurl/ where I believe the
server is incorrectly returning `content-length: 0` in response to
HEAD requests. This causes GDAL to believe it's a zero-length file,
and it therefore can't be read.
If I download the file via HTTP GET, it's valid, and GDAL can read it
locally. I've also confirmed I can use /vsicurl/ on some test datasets
in the GDAL repo.
Is it possible to force GDAL to work around the faulty content-length
header, or is it too fundamental a problem to ignore?
I've separately got in touch with the data provider to see if they are
able to fix the issue at their end.
Cheers,
Daniel
URL of the troublesome dataset:
https://data.source.coop/earthgenome/sentinel2-temporal-mosaics/20NMH_2024-04-01_2024-08-01/B08.tif
Example HTTP header responses I'm seeing:
GET
HTTP/2 200
date: Tue, 10 Sep 2024 13:47:54 GMT
content-type: binary/octet-stream
content-length: 278198294
vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers
etag: "a79f3f685281d6681e4d362536c5b3eb-34"
last-modified: Thu, 25 Jul 2024 13:16:08 GMT
x-version: 0.0.16
access-control-allow-credentials: true
HEAD
HTTP/2 200
date: Tue, 10 Sep 2024 13:48:08 GMT
content-type: binary/octet-stream
content-length: 0
x-version: 0.0.16
access-control-allow-credentials: true
etag: "a79f3f685281d6681e4d362536c5b3eb-34"
last-modified: Thu, 25 Jul 2024 13:16:08 GMT
vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers
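For anyone wanting to check other servers for the same discrepancy, here is a rough Python sketch that parses header dumps like the two above and compares the content-length values. The function names are invented for illustration, and the dumps are abbreviated copies of the responses listed above.

```python
GET_DUMP = """HTTP/2 200
content-type: binary/octet-stream
content-length: 278198294
etag: "a79f3f685281d6681e4d362536c5b3eb-34"
"""

HEAD_DUMP = """HTTP/2 200
content-type: binary/octet-stream
content-length: 0
etag: "a79f3f685281d6681e4d362536c5b3eb-34"
"""


def parse_response(dump):
    """Return (status, headers) from a raw 'HTTP/x NNN' + headers dump."""
    lines = [ln.strip() for ln in dump.strip().splitlines() if ln.strip()]
    status = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:  # skip wrapped continuation lines that carry no header name
            headers[name.strip().lower()] = value.strip()
    return status, headers


def content_length_mismatch(get_dump, head_dump):
    """True when HEAD and GET report different content-length values."""
    _, get_headers = parse_response(get_dump)
    _, head_headers = parse_response(head_dump)
    return get_headers.get("content-length") != head_headers.get("content-length")


mismatch = content_length_mismatch(GET_DUMP, HEAD_DUMP)
```

With the responses above, the GET reports 278198294 bytes while the HEAD reports 0, so the comparison flags a mismatch.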
_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev
--
http://www.spatialys.com
My software is free, but my time generally not.