On Thursday, 24 October 2019 17:42:23 CEST, Rahkonen Jukka (MML) wrote:
> Hi,
>
> I was experimenting with accessing some vector files through HTTP (the
> same data in FlatGeobuf, GeoPackage, and shapefile format). The file size
> in each format was about 850 MB and the data amounted to about 240000
> linestrings. I made an ogrinfo request with a spatial filter that selects
> one feature and checked the number of HTTP requests and the amount of
> requested data.
>
> FlatGeobuf
> 19 HTTP requests
> 33046509 bytes read
Looking at the debug log, the FlatGeobuf driver currently loads the whole
feature-offsets index ("Reading feature offsets index"), which accounts for
32.7 MB of the 33 MB above. This could probably be avoided by loading only
the offsets of the selected features. The shapefile driver had the same
issue a few years ago; it was fixed by initializing the offset array to
zeroes and loading offsets on demand when needed (see the sketch appended
at the end of this mail).

> If somebody really finds a use case for reading vector data from the web,
> it seems obvious that having the possibility to cache and re-use the
> spatial index would be very beneficial. I can imagine that with a
> shapefile it would mean downloading the .qix file, with GeoPackage
> reading the contents of the rtree index table, and with FlatGeobuf
> probably extracting the static packed Hilbert R-tree index.

A general caching logic in /vsicurl/ would be preferable (although the
download of the 'data' part of files might potentially evict the indexes,
and having dedicated logic in each driver to tell which files / regions of
files should be cached would be a bit annoying). Basically, doing a HEAD
request on the file to get its last update date, and keeping a local cache
of the downloaded pieces, would be a more general solution (second sketch
at the end of this mail).

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com
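P.S.: for illustration, a minimal C++ sketch of the on-demand offset loading
mentioned above. It assumes a flat array of little-endian uint64 offsets at
a known position in the file, and that a valid offset is never 0 (true when
offsets point past a file header); this is hypothetical code, not the actual
shapefile or FlatGeobuf driver logic.

// Hypothetical sketch: load feature offsets one by one, on demand,
// instead of reading the whole offsets index up front.
#include <cstdint>
#include <cstdio>
#include <vector>

class LazyOffsetIndex
{
  public:
    // fp: open file; indexStart: byte position of the offsets array;
    // featureCount: number of entries in the array.
    LazyOffsetIndex(FILE *fp, uint64_t indexStart, size_t featureCount)
        : m_fp(fp), m_indexStart(indexStart),
          m_offsets(featureCount, 0)  // 0 == "not loaded yet"
    {
    }

    // Byte offset of feature i, reading only its 8-byte index entry on
    // first use (one small ranged read rather than the full index).
    uint64_t GetFeatureOffset(size_t i)
    {
        if (m_offsets[i] == 0)
        {
            uint64_t v = 0;
            // Real code would use a 64-bit-safe seek (fseeko/_fseeki64)
            // and handle big-endian hosts.
            if (fseek(m_fp,
                      static_cast<long>(m_indexStart + i * sizeof(uint64_t)),
                      SEEK_SET) != 0 ||
                fread(&v, sizeof(v), 1, m_fp) != 1)
                return 0;
            m_offsets[i] = v;
        }
        return m_offsets[i];
    }

  private:
    FILE *m_fp;
    uint64_t m_indexStart;
    std::vector<uint64_t> m_offsets;
};

With this, a spatially filtered read touches only a handful of 8-byte index
entries (each a small ranged request when going through /vsicurl/) instead
of pulling the full multi-MB index.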
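And a sketch of the HEAD-based validation that such a general cache would
need, using libcurl directly. The URL and the cache-key scheme are
placeholders of mine, not the actual /vsicurl/ implementation:

// Hypothetical sketch: validate a remote file via its Last-Modified time
// (obtained with a HEAD request) and derive a cache key under which
// downloaded byte ranges could be stored and reused across runs.
#include <cstdio>
#include <ctime>
#include <string>

#include <curl/curl.h>

// Returns the Last-Modified time of the resource, or -1 if unavailable.
static time_t GetRemoteLastModified(const std::string &url)
{
    CURL *curl = curl_easy_init();
    if (!curl)
        return -1;
    long filetime = -1;
    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_NOBODY, 1L);   // HEAD: headers only
    curl_easy_setopt(curl, CURLOPT_FILETIME, 1L); // parse Last-Modified
    if (curl_easy_perform(curl) == CURLE_OK)
        curl_easy_getinfo(curl, CURLINFO_FILETIME, &filetime);
    curl_easy_cleanup(curl);
    return static_cast<time_t>(filetime);
}

int main()
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    const std::string url = "https://example.com/data.fgb"; // placeholder
    const time_t lastModified = GetRemoteLastModified(url);
    if (lastModified < 0)
    {
        fprintf(stderr, "No Last-Modified header: cannot validate cache\n");
        curl_global_cleanup();
        return 1;
    }
    // Downloaded pieces (e.g. fixed-size blocks addressed by their start
    // offset) would be stored on disk under a key combining the URL and
    // the validator, so a change on the server invalidates the entry.
    const std::string cacheKey =
        url + "@" + std::to_string(static_cast<long long>(lastModified));
    printf("Cache key: %s\n", cacheKey.c_str());
    curl_global_cleanup();
    return 0;
}

A change of the Last-Modified date on the server changes the key, so stale
cached ranges are never reused, whatever the format or driver involved.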