Re: PDFBox 2.0.23 MergeUtility

Andreas Lehmkuehler Mon, 29 Mar 2021 10:14:49 -0700

Hi,

On 29.03.21 at 03:17, viraf wrote:

I am using PDFBox 2.0.23 to merge a large number of single-page searchable PDF 
files.  As these files are stored in the cloud, I had to make a copy of 
PDFMergerUtility::optimizedMergeDocuments so that I could pass in a source 
object other than a File or InputStream.
In support of cloud environments, I am requesting an enhancement to PDFBox 
allowing one to pass in an object that implements an interface such as the 
Resource in the Spring Framework.

I'm afraid that won't happen, as it would add one or more Spring jars as dependencies just to support a Resource.

When merging a large number of files, I frequently get OOM errors.  An 
examination in VisualVM indicates a large number of ScratchFile objects.
I am looking for suggestions on how best to merge a large number (say 100K) of 
searchable PDF files generated during OCR (i.e. image + text).

It is possible to reduce the usage of ScratchFile objects by using main memory instead. Have a look at org.apache.pdfbox.io.MemoryUsageSetting for further details. Maybe it is a good idea to close the files from time to time and not to wait until all are merged together.
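To illustrate the "close the files from time to time" idea, here is a minimal sketch of merging in batches and rounds rather than in one pass. It is not part of PDFBox: BatchMerger, BatchedMerge, the batch size and the "roundN-partM.pdf" names are all hypothetical. The actual merge step is left as a callback; in PDFBox 2.0.x it could plausibly wrap a fresh PDFMergerUtility per batch (addSource, setDestinationFileName, then mergeDocuments with a MemoryUsageSetting), so that each batch's documents are closed before the next batch starts.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback: merges the sources into destination and returns the result's name. */
interface BatchMerger {
    String merge(List<String> sources, String destination) throws IOException;
}

final class BatchedMerge {
    /** Splits items into consecutive batches of at most batchSize, preserving order. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    /**
     * Merges the sources in rounds until a single document remains.
     * Each round merges consecutive batches into intermediate files, so only
     * batchSize documents need to be open at any one time.
     */
    static String mergeAll(List<String> sources, int batchSize, BatchMerger merger)
            throws IOException {
        if (sources.isEmpty()) {
            throw new IllegalArgumentException("no sources to merge");
        }
        if (batchSize < 2) {
            // With batches of one, a round would never shrink the list.
            throw new IllegalArgumentException("batchSize must be >= 2");
        }
        List<String> current = sources;
        int round = 0;
        while (current.size() > 1) {
            List<String> next = new ArrayList<>();
            int index = 0;
            for (List<String> batch : partition(current, batchSize)) {
                if (batch.size() == 1) {
                    // A leftover single document is carried forward unchanged.
                    next.add(batch.get(0));
                } else {
                    // Intermediate file name is an assumption for illustration only.
                    next.add(merger.merge(batch, "round" + round + "-part" + index + ".pdf"));
                }
                index++;
            }
            current = next;
            round++;
        }
        return current.get(0);
    }
}
```

Because batches are consecutive and order-preserving, the page order of the final document matches the order of the input list. The sketch does not delete intermediate files; a real implementation would need to clean those up after each round.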

If you are able to experiment a little, you might want to use the upcoming new major release 3.0.0. A first release candidate will be available in a few days. It provides an on-demand parser which no longer uses ScratchFiles for reading; those are limited to writing.


Andreas

Thanks
- viraf


