On Saturday, January 12, 2013 at 15:08:55, Jan Hartmann wrote:

> You probably know this, but there is an option to let gdalwarp use more
> cores: -wo NUM_THREADS=ALL_CPUS. It gives some improvement, but not
> really staggering.

Do you use Proj 4.8.0? If not, that might explain why you don't see a
significant improvement. The performance gain is also much more significant
with complex resampling kernels; with nearest resampling, most of the time
is spent in I/O. Increasing the warping memory buffer (-wm) might also help
to benefit from the parallelization. For example (debug, non-optimized
build):

- 1 thread, nearest:

$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=1 -wm 512
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.

real    0m6.390s
user    0m5.940s
sys     0m0.440s

- 4 threads, nearest:

$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=4 -wm 512
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.

real    0m3.482s
user    0m6.330s
sys     0m0.700s

- 1 thread, bilinear:

$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=1 -wm 512 -rb
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.

real    0m18.387s
user    0m17.840s
sys     0m0.510s

- 4 threads, bilinear:

$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=4 -wm 512 -rb
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.

real    0m8.052s
user    0m20.000s
sys     0m0.550s

- 1 thread, cubic:

$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=1 -wm 512 -rc
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.

real    0m35.724s
user    0m35.010s
sys     0m0.620s

- 4 threads, cubic:

$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=4 -wm 512 -rc
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.

real    0m13.274s
user    0m39.530s
sys     0m0.560s

- 1 thread, lanczos:

$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=1 -wm 512 -r lanczos
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.

real    2m21.269s
user    2m20.460s
sys     0m0.400s

- 4 threads, lanczos:

$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=4 -wm 512 -r lanczos
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.

real    0m51.852s
user    2m36.520s
sys     0m0.750s
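To sum up the wall-clock ("real") times, the benefit of 4 threads grows with
the cost of the resampling kernel (speedups computed from the figures above):

resampling | 1 thread | 4 threads | speedup
-----------+----------+-----------+--------
nearest    |   6.390s |    3.482s |  ~1.8x
bilinear   |  18.387s |    8.052s |  ~2.3x
cubic      |  35.724s |   13.274s |  ~2.7x
lanczos    | 141.269s |   51.852s |  ~2.7x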
> Splitting up operations over individual tiles would really speed things
> up. Even if I use only one VM, I can define 32 cores, and it would
> certainly be interesting to experiment with programs like MPI to
> integrate multiple VMs into one computing cluster.
>
> Jan
>
> On 01/12/2013 02:38 AM, Kennedy, Paul wrote:
> > Hi,
> >
> > Yes, we are pretty sure we will see a significant benefit. The
> > processing algorithms are CPU-bound, not I/O-bound. Our digital
> > terrain model interpolations often run for many hours (we do them
> > overnight), but the underlying file is only a few gigabytes. If we
> > split them into multiple files of tiles and run each on a dedicated
> > process, the whole thing is quicker, but this is messy and results
> > in stitching errors.
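For reference, that split/warp/merge workflow can already be scripted with
existing GDAL tools. A rough sketch, assuming a hypothetical 8192x8192
input.tif split into four quadrants (a real workflow would need overlapping
tile edges to avoid exactly the stitching artifacts you mention):

$ # extract four 4096x4096 quadrants (-srcwin xoff yoff xsize ysize)
$ gdal_translate -srcwin    0    0 4096 4096 input.tif tile_00.tif
$ gdal_translate -srcwin 4096    0 4096 4096 input.tif tile_10.tif
$ gdal_translate -srcwin    0 4096 4096 4096 input.tif tile_01.tif
$ gdal_translate -srcwin 4096 4096 4096 4096 input.tif tile_11.tif
$ # warp each tile in its own process, all four running in parallel
$ for t in tile_*.tif; do gdalwarp -t_srs EPSG:3857 "$t" "warped_$t" & done; wait
$ # stitch the results back together as a single virtual mosaic
$ gdalbuildvrt merged.vrt warped_tile_*.tif

(gdalbuildvrt only writes a small .vrt index; gdal_merge.py, or a
gdal_translate of the VRT, would produce a single physical file.)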
> > Another example is gdalwarp. It takes quite some time with a large
> > dataset and would be a good candidate for parallelisation, as would
> > gdaladdo.
> >
> > I believe slower cores, but more of them, are the future for PCs. My
> > PC has 8, but they rarely get used to their potential.
> >
> > I am certain there are some challenges here; that's why it is
> > interesting ;)
> >
> > Regards
> > pk
> >
> > On 11/01/2013, at 6:54 PM, "Even Rouault"
> > <even.roua...@mines-paris.org> wrote:
> >
> >> Re: [gdal-dev] does gdal support multiple simultaneous writers to raster
> >>
> >> Hi,
> >>
> >> This is an interesting topic, with many "intersecting" issues to
> >> deal with at different levels.
> >>
> >> First, are you confident that, in the use cases you imagine, I/O
> >> access won't be the limiting factor? In that case, serialization of
> >> I/O could be acceptable, and it would just require an API with a
> >> dataset-level mutex.
> >>
> >> There are several places where parallel writes would have to be
> >> addressed:
> >> - the GDAL core mechanisms that deal with the block cache;
> >> - each GDAL driver in which parallel writes would be supported (I
> >>   guess such drivers should advertise a specific capability);
> >> - the low-level library used by the driver; in the case of GeoTIFF,
> >>   libtiff.
> >>
> >> And finally, as Frank underlined, there are intrinsic limitations
> >> due to the format itself. For a compressed TIFF, at some point you
> >> have to serialize the writing of the tiles, because you cannot know
> >> in advance the size of the compressed data, or at least you need
> >> some coordination of the writers so that a "next available offset"
> >> is properly synchronized between them. The compression itself,
> >> though, could be done in parallel.
> >>
> >> I'm not sure, however, whether what Jan mentioned (different
> >> processes writing to the same dataset) is doable.

_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev