Hi,

Simultaneous writers would be a better long-term solution, as we often improve the raster after its initial creation. The improvement may be a filter run on a sub-region (e.g. a despeckle), an update of part of the DTM with better information, or even some manual edits as a last resort. I can imagine a Hadoop-style map/reduce would fit nicely into your sub-window idea.

Regards
pk

On 12/01/2013, at 11:16 PM, "Even Rouault" <even.roua...@mines-paris.org> wrote:

> On Saturday 12 January 2013 02:38:55, Kennedy, Paul wrote:
> > Hi,
> > Yes, we are pretty sure we will see a significant benefit. The processing
> > algorithms are CPU bound, not I/O bound. Our digital terrain model
> > interpolations often run for many hours (we do them overnight) but the
> > underlying file is only a few gigabytes.
>
> OK, my understanding is that you don't really need writers to write
> simultaneously. You need to compute tiles or subwindows of the whole raster in
> parallel, but the writing itself of the result of that computation could
> well be done in a serialized way.
>
> That's a bit like what is done with gdalwarp -wo NUM_THREADS=xxxx. Having
> parallelized I/O could perhaps give some extra performance when you have so
> many threads that the time spent in I/O becomes of the same order of magnitude
> as the time spent in computing, but at the expense of probably significant
> complexity in GDAL core and drivers.
>
> > If we split them into multiple
> > files of tiles and run each on a dedicated process the whole thing is
> > quicker, but this is messy and results in a stitching error.
> >
> > Another example is gdalwarp. It takes quite some time with a large data set
> > and would be a good candidate for parallelisation, as would gdaladdo.
> >
> > I believe slower cores but more of them in PCs are the future. My PC has 8
> > but they rarely get used to their potential.
> >
> > I am certain there are some challenges here, that's why it is interesting ;)
> >
> > Regards
> > pk
> >
> > On 11/01/2013, at 6:54 PM, "Even Rouault" <even.roua...@mines-paris.org> wrote:
> > > Hi,
> > >
> > > This is an interesting topic, with many "intersecting" issues to deal with
> > > at different levels.
> > >
> > > First, are you confident that, in the use cases you imagine, I/O
> > > access won't be the limiting factor? In that case serialization of I/O
> > > could be acceptable, and this would just require an API with a
> > > dataset-level mutex.
> > >
> > > There are several places where parallel write should be addressed:
> > > - The GDAL core mechanisms that deal with the block cache
> > > - Each GDAL driver where parallel write would be supported. I guess that
> > >   GDAL drivers should advertise a specific capability
> > > - The low-level library used by the driver. In the case of GDAL, libtiff
> > >
> > > And finally, as Frank underlined, there are intrinsic limitations due to
> > > the format itself. For a compressed TIFF, at some point, you have to
> > > serialize the writing of the tile, because you cannot know in advance
> > > the size of the compressed data, or at least have some coordination of
> > > the writers so that a "next offset available" is properly synchronized
> > > between them. The compression itself could be serialized.
> > >
> > > I'm not sure however if what Jan mentioned, different processes writing
> > > the same dataset, is doable.
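
For illustration only, the "compute in parallel, write serialized" idea from the quoted discussion can be sketched in plain Python. This is not GDAL or libtiff code: the names compute_tile, SerializedTileWriter, build_raster, read_tile and the TILE constant are all hypothetical, the tile computation is a trivial stand-in for a real interpolation, and an in-memory stream stands in for the file. Tiles are computed and compressed concurrently, while a single lock guards the "next offset available", since the compressed size is only known once compression has finished. Threads are used for brevity; a genuinely CPU-bound pure-Python workload would probably need processes instead.

```python
import io
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor

TILE = 4  # hypothetical tile edge length, in pixels


def compute_tile(tx, ty):
    """Stand-in for a CPU-bound computation of one tile (hypothetical)."""
    values = [(tx * TILE + x) * (ty * TILE + y) % 251
              for y in range(TILE) for x in range(TILE)]
    # The compressed size is not known until compression has finished.
    return zlib.compress(bytes(values))


class SerializedTileWriter:
    """Appends compressed tiles to one stream under a single lock."""

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.Lock()
        self._next_offset = 0  # the shared "next offset available"
        self.index = {}        # (tx, ty) -> (offset, size)

    def write(self, key, payload):
        # Only this short section is serialized; computing and
        # compressing the payload happened outside the lock.
        with self._lock:
            offset = self._next_offset
            self._stream.seek(offset)
            self._stream.write(payload)
            self._next_offset = offset + len(payload)
            self.index[key] = (offset, len(payload))


def build_raster(tiles_x, tiles_y, workers=4):
    """Compute all tiles concurrently and write them through one writer."""
    stream = io.BytesIO()
    writer = SerializedTileWriter(stream)

    def job(key):
        writer.write(key, compute_tile(*key))

    keys = [(tx, ty) for ty in range(tiles_y) for tx in range(tiles_x)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(job, keys))  # drain the iterator to surface errors
    return stream.getvalue(), writer.index


def read_tile(data, index, key):
    """Look up one tile by its recorded offset and size, then decompress."""
    offset, size = index[key]
    return zlib.decompress(data[offset:offset + size])
```

Calling build_raster(3, 2) returns the packed bytes and an index of tile offsets. The order of tiles in the stream depends on which worker finishes first, which is why the index is needed to find them again.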

_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev