Putting something like mapserver/mapcache in front of your requests might work, if I understand correctly. We serve COGs out of s3 via WMS like this and the performance is pretty nice.
See https://github.com/pedros007/mapserver-docker for some discussion on such a setup. Best, Patrick On Thu, Jan 23, 2020, 6:44 AM Daniel Evans <daniel.ev...@jbarisk.com> wrote: > Hi, > > > > I have a large (global, 30m resolution, 50GB+) GeoTIFF dataset, from which > I need to read many (millions) of pixel values at given input coordinates. > I’ve got reasonable performance out of the code, about a million queries > over five minutes, but: > > > > 1. There are actually twelve separate datasets of this size to query, > not just one, so it takes approximately an hour. > 2. This is by far the slowest portion of the program, and the users > demand speed! > 3. The users would also like to move towards higher resolution > datasets, which we see run about 5x slower. > 4. When querying the data on a particular piece of network storage > mounted as part of the local filesystem, we see a slowdown approaching two > orders of magnitude – bulk file copies off the network storage are > reasonable, but each IO request shows a significant overhead (up to a > second), and GDAL is sending one for each coordinate queried. > > > > The implementation is in Python, directly calling down to GDAL. The short, > long-running snippet of code which performs the actual queries the dataset, > having converted real-world coordinates to pixels, is: > > > > value_arrays = ( > > raster_ds.ReadAsArray( > > xoff=coord[0] - buffer_size, > > yoff=coord[1] - buffer_size, > > xsize=npix, > > ysize=npix > > ) for coord in offsets > > ) > > > > There are a few things that are probably worth noting: > > 1. It is not necessarily a single pixel that is being read – for each > coordinate, the program may be asked to get all pixel values within a given > radius (typically a couple of pixels), and use some function to summarise > these into a single value (mean, median, …). GDAL currently returns a numpy > array for each query, which is passed to the user-specified function after > the snippet above. > 2. The dataset is made up of 2048x2048 LZW-Compressed tiles containing > floating point data (essentially conforming to COG, but with no overviews), > grouped together in a VRT (performance is identical with plain GeoTIFFs, > though). > > > > Multiprocessing has not been found to help - we actually lose throughput > as the disk read head is moving back and forth constantly. Better hardware > (especially SSDs) is known to help, but no one wants to pay for that. > > > > We see no particular performance difference from setting > GDAL_DISABLE_READDIR_ON_OPEN=TRUE, and GDAL_CACHEMAX is left at the default > 5% (64GB+ RAM available). > > > > > > > > Does the Python interface to GDAL provide a way to supply a large number > of offsets and get blocks of pixels back, avoiding the need to come back up > to Python after each query? (I suspect not) > > Is there some way to optimise GDAL so that queries of files on the mounted > network storage are more efficient? > > > > > > > > *Dr. Daniel Evans* > > Software Developer > > > > *Skype* > > > > *T* +44 (0) 1756 799919 > www.jbarisk.com > > [image: Visit our website] <http://www.jbarisk.com> [image: Follow us on > Twitter] <https://twitter.com/jbarisk> > > Our postal address and registered office is JBA Risk Management Limited, > 1 Broughton Park, Old Lane North, Broughton, Skipton, North Yorkshire BD23 > 3FD. > > *Find out more about us here: www.jbarisk.com <http://www.jbarisk.com/> > and **follow us on Twitter @JBARisk <http://twitter.com/JBARisk> and > LinkedIn > <https://www.linkedin.com/company/2370847?trk=tyah&trkInfo=clickedVertical%3Acompany%2CclickedEntityId%3A2370847%2Cidx%3A2-1-2%2CtarId%3A1447414259786%2Ctas%3AJBA%20RISK%20MANAGEMENT> > * > > The JBA Group supports the JBA Trust. > > All JBA Risk Management's email messages contain confidential information > and are intended only for the individual(s) named. If you are not the named > addressee you should not disseminate, distribute or copy this e-mail. > Please notify the sender immediately by email if you have received this > email by mistake and delete this email from your system. > > > JBA Risk Management Limited is registered in England, company number > 07732946, 1 Broughton Park, Old Lane North, Broughton, Skipton, North > Yorkshire, BD23 3FD, Telephone: +441756799919 > > > _______________________________________________ > gdal-dev mailing list > gdal-dev@lists.osgeo.org > https://lists.osgeo.org/mailman/listinfo/gdal-dev
_______________________________________________ gdal-dev mailing list gdal-dev@lists.osgeo.org https://lists.osgeo.org/mailman/listinfo/gdal-dev