Hi,

For those interested in parallelizing algorithms that generate a big dataset (let's say a GeoTIFF), I've just committed to trunk an improvement to the existing, not-so-well-known I guess, GDAL "API proxy" mechanism (http://www.gdal.org/gdal_api_proxy.html), which was initially designed to handle datasets in isolation from the main process.

The improvement consists of adding a "-nofork" option to the gdalserver utility (for Unix builds only for now, but it could relatively easily be extended to Windows if needed). It causes all actions of the different client connections to be run (sequentially) in the same thread, which makes it possible to share the same dataset object if clients open it with the same name. Consequently, safe parallel (in fact serialized) update of a dataset is possible.

Demo:

1) Create a target dataset:

    gdalwarp in.tif /tmp/out.tif -overwrite -co TILED=YES

   (Ctrl+C almost immediately to just create the file. This could be done more cleanly, but that's enough for the demo.)

2) Launch the server:

    gdalserver -unixserver /tmp/mysocket -nofork -v

   (You could also use "-tcpserver 8080", in which case you would set localhost:8080 as the value of GDAL_API_PROXY_SERVER below.)

3) Launch in parallel in 2 terminals:

   a) GDAL_API_PROXY_SERVER=/tmp/mysocket gdalwarp upper.vrt API_PROXY:/tmp/out.tif

   b) GDAL_API_PROXY_SERVER=/tmp/mysocket gdalwarp lower.vrt API_PROXY:/tmp/out.tif

   where upper.vrt and lower.vrt are two VRTs that cover the upper and lower parts of in.tif.

A cool aspect is that you can violently interrupt any client at any time and the integrity of the output dataset will still be preserved. However, you can only safely kill the server once all clients connected to the same output dataset have terminated, which the server will tell you when run with the verbose -v flag. So you can resume part of the processing later (assuming clients deal with separate parts of the output raster).

You can also display the result with QGIS while it is being processed (this will slow things down of course, and QGIS should be launched AFTER a first client so that the dataset is not opened in read-only mode):

    $ ln -s API_PROXY:/tmp/out.tif proxied_out.tif
    $ GDAL_API_PROXY_SERVER=/tmp/mysocket qgis proxied_out.tif

The server, and thus the output file, could also be on a completely different machine, when using TCP mode of course.
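
Regarding step 3 of the demo: if you need to build upper.vrt and lower.vrt yourself, one possible way is gdal_translate with -of VRT and -srcwin. This is only a sketch and not necessarily how the demo files were made. The helper name print_split_commands is made up for illustration, and the raster width and height in pixels are assumed to be supplied by you (for example as reported by gdalinfo in.tif). The function only prints the commands, so nothing is executed until you decide to run them.

```shell
#!/bin/sh
# Hypothetical helper: print the gdal_translate commands that would split
# in.tif into an upper and a lower VRT. It does not run GDAL itself.
# Arguments: raster width and height in pixels.
print_split_commands() {
    width=$1
    height=$2
    upper_rows=$((height / 2))            # rows in the upper half
    lower_rows=$((height - upper_rows))   # remaining rows (covers odd heights)
    # -srcwin takes: xoff yoff xsize ysize, all in pixels
    echo "gdal_translate -of VRT -srcwin 0 0 $width $upper_rows in.tif upper.vrt"
    echo "gdal_translate -of VRT -srcwin 0 $upper_rows $width $lower_rows in.tif lower.vrt"
}
```

For a 1024x769 raster, print_split_commands 1024 769 prints one command with "-srcwin 0 0 1024 384" and one with "-srcwin 0 384 1024 385". Piping the output to sh would run them. Since gdal_translate adjusts the georeferencing for the selected window, each VRT should land on its own part of the output when warped.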

The clients could also be on different machines. They could also be 2 threads of the same process (assuming each uses a dedicated dataset handle obtained with a GDALOpen("API_PROXY:/tmp/out.tif", GA_Update) call).

This demo is probably not very exciting (you could get better performance with the -multi -wo NUM_THREADS=ALL_CPUS options of gdalwarp), but it should give an idea of what this is about. Of course, as the communication of all clients with the server in -nofork mode is serialized, this is only interesting if writing the output dataset itself is not the bottleneck of the processing. This also works for read/update scenarios (which is in fact what gdalwarp does, since it asks for the content of the blocks it will update).

Enjoy,

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com
_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev