I've tried multiprocessing a bit; here's my log on that.

My test case was computing the min and max of a 35989 x 61978 integer raster (Finland in 20 m x 20 m cells). The data is an LZW-compressed GTiff with 128 x 128 blocks. The file size is ~200 MB.

I used block-based access, Perl and PDL (Perl Data Language). Each block is read into a PDL object, and the min and max of the block are then computed by PDL.
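To make the block-based access concrete, here is a minimal Python sketch of how the block windows for such a raster could be enumerated. It is not the Perl/PDL code I ran. The function name block_windows is hypothetical, and I am assuming 35989 is the width and 61978 the height. Edge blocks are clipped to the raster extent.

```python
def block_windows(width, height, block_w=128, block_h=128):
    """Yield (xoff, yoff, xsize, ysize) for every block of a tiled raster.

    Blocks on the right and bottom edges are clipped so that they do not
    extend past the raster.
    """
    for yoff in range(0, height, block_h):
        ysize = min(block_h, height - yoff)
        for xoff in range(0, width, block_w):
            xsize = min(block_w, width - xoff)
            yield xoff, yoff, xsize, ysize


# Assumed orientation: 35989 columns by 61978 rows, 128 x 128 blocks.
windows = list(block_windows(35989, 61978))
print(len(windows))   # 282 block columns * 485 block rows
print(windows[-1])    # the clipped bottom-right block
```

Each window would then be read and reduced to a (min, max) pair, and the pairs combined into the global min and max.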

I used MCE first. MCE is the "Many-Core Engine for Perl" (a module available on CPAN). It can use threads, but since my Perl is not compiled to use them (the usual case), it spawns child processes as workers.

The first experiment went fine: the computing time went from 214 secs with one worker to 125 secs with 5 workers (I have 4 CPUs). However, each worker processed one block at a time (opening the file anew each time), which I thought was not optimal because of the overhead of spawning and opening. I then changed the setup so that the blocks were arranged into as many batches as I had workers, so that each worker would work only once. I could not get that setup to work: I got low-level errors from PDL.

The second experiment was to take the second setup from the first experiment (each worker works only once, on the batch of blocks assigned to it) and use vanilla fork() from the Perl core. Passing input to the spawned children is easy, but for output I used files. This time there were no errors from PDL or elsewhere, and everything worked fine. The computing time went from 62 secs with one worker to 36 secs with 4 workers.
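For illustration, the same pattern can be sketched in Python with os.fork() (Unix only). This is not the Perl code I ran. The helper names are hypothetical, and the "blocks" here are plain lists of integers standing in for raster blocks read from the file. Each child gets one batch, writes its result to a temporary file, and the parent merges the results.

```python
import json
import os
import tempfile


def split_into_batches(items, n_workers):
    """Deal items round-robin into n_workers batches (some may be empty)."""
    return [items[i::n_workers] for i in range(n_workers)]


def batch_min_max(batch):
    """Stand-in for the per-block work: min and max over a batch of blocks."""
    lo = min(min(block) for block in batch)
    hi = max(max(block) for block in batch)
    return lo, hi


def forked_min_max(blocks, n_workers=4):
    """Fork one child per batch; children report their results via files."""
    batches = [b for b in split_into_batches(blocks, n_workers) if b]
    with tempfile.TemporaryDirectory() as tmpdir:
        children = []
        for i, batch in enumerate(batches):
            out_path = os.path.join(tmpdir, f"worker_{i}.json")
            pid = os.fork()
            if pid == 0:
                # Child: `batch` is inherited from the parent's memory.
                status = 1
                try:
                    with open(out_path, "w") as fh:
                        json.dump(batch_min_max(batch), fh)
                    status = 0
                finally:
                    # Leave without running the parent's cleanup handlers.
                    os._exit(status)
            children.append((pid, out_path))

        results = []
        for pid, out_path in children:
            _, wait_status = os.waitpid(pid, 0)
            ok = os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
            if not ok:
                raise RuntimeError(f"worker {pid} failed")
            with open(out_path) as fh:
                results.append(json.load(fh))

    return min(r[0] for r in results), max(r[1] for r in results)


blocks = [[3, 7, 1], [10, -4], [8], [2, 99], [5]]
print(forked_min_max(blocks, n_workers=4))
```

With a real raster, each child would open the file itself and read only the blocks in its batch, so the file is opened once per worker rather than once per block.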

It seems that using plain fork is quite easy and useful. I'd expect that similar results can be obtained with Python and its equivalent of Perl's fork(). I'm using Linux. Windows is a bit of a different story, since at least for Perl the fork() on Windows is some kind of emulated version of the Unix fork, and that may cause issues.

The MCE module seems to be highly praised, but it did not work well for me.

Ari


_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev
