Hi all,

I've been experiencing some behavior with the GDAL Python bindings where I
occasionally see what appear to be random unwritten blocks in GeoTIFFs I've
pushed to S3.  A small block (or a few blocks) in one of the bands will be
all zeros while everything else is fine.

My setup is a thread pool crunching through gdal.Warp calls.  The main
thread polls for completed jobs and then uploads each file to S3.  My theory
is that Python's garbage collector hasn't destroyed the dataset I've set to
None before I start uploading.  Is this plausible?  Calling FlushCache
didn't solve the problem for me, and I'm not aware of another way via the
Python bindings to ensure the dataset is closed.  I'm on Ubuntu 19.10 (which
ships GDAL 2.4.2).  Any thoughts or ideas to try are greatly appreciated;
as you can imagine, this is hard to reproduce.
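
The only variation I can think of trying is forcing a collection after
dropping the reference, on the assumption that something might be delaying
destruction of the dataset.  Roughly (warp_and_close and the explicit
gc.collect() are just illustrative, not anything from the GDAL docs):

from osgeo import gdal
import gc

def warp_and_close(f_in, f_out, opts):
    # opts is a gdal.WarpOptions object like the one built in warp_tile below
    ds = gdal.Warp(f_out, f_in, options=opts)
    ds.FlushCache()
    ds = None      # drop the last reference; the destructor should call GDALClose and write everything out
    gc.collect()   # assumption: only matters if a stray reference or cycle is delaying destruction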

The code looks something like this:

def warp_tile(f_in, f_out, warp_opts):
    gdal_warp_opts = gdal.WarpOptions(**warp_opts,
                                      creationOptions=["TILED=YES", "COMPRESS=DEFLATE"])
    try:
        warp_ds = gdal.Warp(f_out, f_in, options=gdal_warp_opts)
        warp_ds.FlushCache()
    finally:
        # drop the reference so the dataset is closed and flushed to disk
        warp_ds = None


with ThreadPoolExecutor(max_workers=max_workers) as executor:

    job_d = {}
    for job in jobs:
        job_d[executor.submit(warp_tile, job.in_f, job.out_f,
                              job.warp_opts)] = job.out_f

    for future in as_completed(job_d):
        out_f = job_d[future]
        try:
            future.result()
        except Exception as e:
            ...
        else:
            boto3.resource('s3').Bucket(bucket_name).upload_file(
                Filename=out_f, Key=key)
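
One thing I'm considering to narrow this down is re-opening each finished
file read-only and recording per-band checksums right before the upload, so
the local copy can be compared with whatever ends up on S3 (band_checksums
is just a helper I'd add, not an existing API):

from osgeo import gdal

def band_checksums(path):
    # Re-open the finished GeoTIFF read-only and checksum each band so the
    # local file can be compared against the copy downloaded back from S3.
    ds = gdal.Open(path, gdal.GA_ReadOnly)
    try:
        return [ds.GetRasterBand(i + 1).Checksum() for i in range(ds.RasterCount)]
    finally:
        ds = None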

Thanks,
Patrick