Hi Jeremy,

I’ve definitely identified that it’s the mask generation that takes more time 
and not the JPEG compression. If I force the mask in an LZW COG the time goes 
from ~2.5 minutes to a couple hours, and if I just generate a 3-band jpeg with 
no mask, it similarly only takes about 3 minutes and exits cleanly and quickly. 
So any format with an alpha layer or three bands works great, but any format 
with a mask seems to choke, at least at the size that I’m working.

Thanks for sharing your pipeline. I like it! You only use the default quality 
though? I’ve found that I can generally perceive artifacts at around 85% and 
more like 90% if I look hard or it’s the right kind of imagery. We try to save 
as much detail as is reasonable since we’re generating imagery that fits into 
classification and mapping processes and working on machine learning workflows.
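
For reference, the COG driver takes the JPEG/WebP quality through the QUALITY creation option, which defaults to 75. A minimal sketch of a command with the quality raised, assuming the same band and mask layout as the LZW command quoted further down. The helper name jpeg_cog_cmd and the file names are made up for illustration, and the function only prints the command rather than running it:

```shell
# Sketch only: prints a gdal_translate command for a JPEG COG with an explicit
# QUALITY. jpeg_cog_cmd is a hypothetical helper, not part of GDAL.
jpeg_cog_cmd() {
  # $1 = input, $2 = output, $3 = JPEG quality (1-100; COG driver default is 75)
  echo gdal_translate "$1" "$2" \
    -b 1 -b 2 -b 3 -mask 4 \
    -of COG -co COMPRESS=JPEG -co "QUALITY=${3:-90}" \
    -co NUM_THREADS=ALL_CPUS
}

jpeg_cog_cmd in.tif out_q90.tif 90
```

Dropping the leading echo would run the command instead of printing it.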

My LZW COGs are around 1-2GB, and the JPEG COGs are about 200-300MB. But I am 
producing data for areas easily 10x this size, so I worry what that means if we 
stay with the JPEG pipeline. I generated some WebP images a few years ago but 
hadn’t tried with COGs yet because (1) it’s incompatible with 
ArcMap/GlobalMapper (used by our org.) and (2) we get resistance with any file 
format that’s not old enough to vote. But the alpha layer support of WebP and 
the internal-mask-taking-just-shy-of-forever issue with JPEG might be enough to 
convince them. I’ll raise the issue but I’m guessing it won’t be an option in 
the near term. There’s a lot of momentum building for cloud-based service 
though, so I could be wrong.
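
As a rough back-of-the-envelope check on what 10x means for file size, the sketch below scales the sizes mentioned above and compares them with the roughly 4 GiB ceiling of classic (non-BigTIFF) TIFF files. It assumes size grows linearly with area and treats MB as MiB, both of which are simplifications; needs_bigtiff is a hypothetical helper name:

```shell
# Rough estimate only: assumes file size scales linearly with area and
# treats MB as MiB.
CLASSIC_TIFF_LIMIT_MB=4096   # classic TIFF uses 32-bit offsets, so about 4 GiB

needs_bigtiff() {
  # $1 = current COG size in MB, $2 = area scale factor
  scaled=$(( $1 * $2 ))
  if [ "$scaled" -ge "$CLASSIC_TIFF_LIMIT_MB" ]; then
    echo "$scaled MB: BigTIFF needed"
  else
    echo "$scaled MB: fits in classic TIFF"
  fi
}

needs_bigtiff 300 10    # upper end of the JPEG COG sizes
needs_bigtiff 2000 10   # upper end of the LZW COG sizes (2 GB taken as 2000 MB)
```

Under these assumptions the JPEG COGs would stay below the classic TIFF limit at 10x, while the LZW COGs would not.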

I just modified my command to make webp at the same quality setting and it 
looks great in QGIS, and shrinks my test COG from 287MB to 195MB, but ArcMap 
hates it and so does GlobalMapper. Unfortunately as far as I can tell the only 
one that all three of them like is the LZW COGs but those are huge. I’m working 
with GlobalMapper on COGs right now, and I’ll see if I can get the ear of our 
people who talk with ESRI.



From: Jeremy Palmer <palmer...@gmail.com>
Sent: Wednesday, April 22, 2020 12:22 AM
To: Ritchie, Andrew C <aritc...@usgs.gov>
Cc: Even Rouault <even.roua...@spatialys.com>; gdal-dev@lists.osgeo.org
Subject: [EXTERNAL] Re: [gdal-dev] gdal_translate (3.1.0dev) "never" finishes 
on large jpeg cogs... REALLLLLY long time to unload.

Hi Andy,

On Wed, Apr 22, 2020 at 8:33 AM Ritchie, Andrew C <aritc...@usgs.gov> wrote:

Sorry I should’ve run more tests to clarify the situation re BIGTIFFs. It looks 
like gdal_translate honors -co BIGTIFF=NO for the raster but not the mask.

What's the output size of your COG when it successfully completes?


Incidentally, when I kill the process with ctrl-C (on a windoze machine) GDAL 
fails to exit gracefully (2 of 2 times this run) with the following as the 
final debug message

GDAL: Flushing dirty blocks: 0
GTIFF: Waiting for worker job to finish handling block 0

In my experience, the progress reporting in GDAL is not very good and can spend 
a lot of time in the flushing dirty blocks process. It might be that you can't 
interrupt GDAL at this point. I would wait a little longer. Even will be able 
to comment further on this.

My cmd:
gdal_translate <infile.tif> <outfile.tif> -b 1 -b 2 -b 3 -mask 4 -of COG -co 
COMPRESS=LZW -co PREDICTOR=2 -co NUM_THREADS=ALL_CPUS -co RESAMPLING=AVERAGE 
-co BIGTIFF=NO --config GDAL_TIF_OVR_BLOCKSIZE 128 --debug ON

Seems ok to me. For our processing of aerial RGB photo COGs, when we are 
interested in web mapping use and a good balance between storage size and 
quality, we go for something like:

gdalbuildvrt \
  -addalpha -hidenodata \
  $PWD/$TIF_FOLDER.vrt \
  $PWD/$TIF_FOLDER/*.tif

gdal_translate \
  -of COG \
  -co COMPRESS=WebP \
  -co NUM_THREADS=ALL_CPUS \
  -co BIGTIFF=YES \
  -co TILING_SCHEME=GoogleMapsCompatible \
  --config BIGTIFF_OVERVIEW YES \
  -co ALIGNED_LEVELS=3 \
  -co ADD_ALPHA=YES \
  -co BLOCKSIZE=512 \
  -co RESAMPLING=CUBIC \
  $PWD/$TIF_FOLDER.vrt $PWD/$TIF_FOLDER.webp.google.aligned.cog.tif


Jeremy – to clarify, I have confirmed that if I wait long enough, the COG will 
finish, so generating in the background is feasible if slow. I was just 
surprised that including a transparency mask increases the processing time so 
much. It’s necessary to reduce the file size using jpeg or webp compression and 
still provide transparency I guess, but it’s a huge performance penalty to pay. 
I don’t have enough programming experience (or time) to do profiling and figure 
out what the bottleneck is, and don’t get me wrong – I ❤ gdal x 10^10, but I 
thought this was worth mentioning because of the increase in time (which is so 
long I initially thought it was actually a hang).

First, I would consider using WebP if you think your users can handle that. 
It's way better than JPEG+Mask. Note I'm surprised that adding the mask to the 
tiff is adding heaps of additional time. Can you generate your dataset with and 
without the mask to see the time difference? As mentioned before, most of the 
processing time is taken up in the overview generation (especially when 
compared to the data compression stage, which can use all of your CPU cores). 
Hopefully, some upcoming GDAL improvements can improve this situation.
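
One way to set up the with/without-mask comparison is sketched below. It assumes a 4-band source where band 4 carries the transparency, as in the command quoted above; cog_cmd and the file names are hypothetical, and the function only prints the two commands so each can be prefixed with `time` when run for real:

```shell
# Sketch only: prints the two commands to compare. Prefix each printed
# command with `time` to measure it. cog_cmd is a hypothetical helper.
cog_cmd() {
  # $1 = "mask" or "nomask", $2 = input, $3 = output
  if [ "$1" = "mask" ]; then mask_opt="-mask 4"; else mask_opt=""; fi
  # $mask_opt is left unquoted on purpose so it splits into two words,
  # or vanishes entirely when empty.
  echo gdal_translate "$2" "$3" -b 1 -b 2 -b 3 $mask_opt \
    -of COG -co COMPRESS=JPEG -co NUM_THREADS=ALL_CPUS
}

cog_cmd mask   in.tif with_mask.tif
cog_cmd nomask in.tif no_mask.tif
```

Keeping every other option identical between the two runs means any difference in wall-clock time can be attributed to the mask.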


As far as the steps to generate a COG – I output tiled tiffs, then create a 
VRT, then create a RGBA LZW cog, preview, and generate a JPEG COG. I only added 
the RGBA LZW cog because of the issues I was having generating the JPG cog – 
it’s actually a good point to delete the tiles in my workflow because I can go 
back to the LZW cog again and again if I need to since it’s lossless.
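
The steps above could look roughly like the sketch below. It assumes the tiles already carry an alpha band as band 4 and leaves out the manual preview step; cog_workflow, the folder name, and the output names are placeholders, and the function prints the commands instead of running them:

```shell
# Sketch only: prints the workflow steps instead of running them.
# cog_workflow and all file names are hypothetical placeholders.
cog_workflow() {
  # $1 = folder of tiled TIFFs, $2 = output basename
  # 1. Mosaic the tiles into a VRT (the glob is quoted so it prints literally).
  echo gdalbuildvrt "$2.vrt" "$1/*.tif"
  # 2. Lossless RGBA LZW COG, kept as the master copy once tiles are deleted.
  echo gdal_translate "$2.vrt" "$2.lzw.cog.tif" -of COG -co COMPRESS=LZW -co PREDICTOR=2
  # 3. JPEG COG derived from the LZW COG, with band 4 turned into a mask.
  echo gdal_translate "$2.lzw.cog.tif" "$2.jpeg.cog.tif" -b 1 -b 2 -b 3 -mask 4 -of COG -co COMPRESS=JPEG
}

cog_workflow tiles mosaic
```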

What was the issue you were having with JPEG compression? Just time to process? 
I would try the above command to see if that gives a good result (remove the 
warping to the Google Maps projection, i.e. TILING_SCHEME=GoogleMapsCompatible, 
if you don't need it, as it adds a lot to processing times).

Cheers,
Jeremy
_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev
