I was not suggesting adding Spring to PDFBox.  I was suggesting that we add an
interface like Resource and allow developers to provide the implementation of
getInputStream.  This way we can support File, InputStream and various cloud
environments with little to no impact on PDFBox.  Furthermore, the PDFBox code
would be cleaner, as it would not need to check the type of each source.
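To make the proposal concrete, here is a minimal sketch. It assumes a hypothetical `PdfSource` interface, loosely modelled on Spring's `Resource`, that is NOT part of PDFBox. `LazyInputStream` (also hypothetical) defers opening the source until the first read, so many cloud objects can be registered without holding many connections open at once.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical Resource-like abstraction; not part of PDFBox. */
@FunctionalInterface
interface PdfSource {
    /** Opens a fresh stream over the PDF bytes; the caller closes it. */
    InputStream getInputStream() throws IOException;

    static PdfSource ofFile(File file) {
        return () -> new FileInputStream(file);
    }

    static PdfSource ofBytes(byte[] data) {
        return () -> new ByteArrayInputStream(data);
    }
}

/**
 * InputStream that opens its PdfSource only on the first read, so many
 * sources can be registered up front without many open connections.
 */
class LazyInputStream extends InputStream {
    private final PdfSource source;
    private InputStream delegate; // null until the first read
    private boolean closed;

    LazyInputStream(PdfSource source) {
        this.source = source;
    }

    boolean isOpened() {
        return delegate != null;
    }

    private InputStream delegate() throws IOException {
        if (closed) {
            throw new IOException("Stream closed");
        }
        if (delegate == null) {
            delegate = source.getInputStream(); // open on demand
        }
        return delegate;
    }

    @Override
    public int read() throws IOException {
        return delegate().read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return delegate().read(b, off, len);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        if (delegate != null) { // never opens a source just to close it
            delegate.close();
        }
    }
}
```

With 2.0.x, such a stream could be handed to the existing PDFMergerUtility.addSource(InputStream), assuming the merger reads each source only when it reaches it. A cloud implementation of PdfSource would wrap whatever download call the storage SDK provides.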
Could you expand on "Maybe it is a good idea to close the files from time to
time and not to wait until all are merged together."?  Each source (input
stream) being merged is closed once merged.  The consolidated PDF is not closed
until all files are merged.  Are you suggesting closing the consolidated PDF?
If so, is there a way to reopen it for merging?
Happy to try the new release.
Thanks.
- viraf


    On Monday, March 29, 2021, 01:14:49 PM EDT, Andreas Lehmkuehler 
<[email protected]> wrote:  
 
 Hi,

On 29.03.21 at 03:17, viraf wrote:
> I am using PDFBox 2.0.23 to merge a large number of single page searchable 
> PDF files.  As these files are stored in the cloud, and to make a copy of 
> PDFMergerUtility::optimizedMergeDocuments passing in sourceObject other than 
> a File or InputStream.
> In support of cloud environments, I am requesting an enhancement to PDFBox 
> allowing one to pass in an object that implements an interface such as the 
> Resource in the SpringFramework.
I'm afraid that won't happen, as it would add one or more Spring jars as
dependencies just to support a Resource.

> When merging a large number of files, I frequently get OOM errors.  An examination
> in VisualVM indicates a large number of ScratchFile objects.
> Looking for suggestions on how best to merge a large number (say 100K) of 
> searchable PDF files generated during OCR (i.e. image + text).
It is possible to reduce the usage of ScratchFile objects by using main
memory instead. Have a look at org.apache.pdfbox.io.MemoryUsageSetting for
further details.
Maybe it is a good idea to close the files from time to time and not to wait
until all are merged together.
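One possible reading of "close the files from time to time" is a batched, tree-style merge. Sources are merged in groups of at most batchSize into intermediate documents, each of which is written and closed. Those intermediates then become the inputs of the next round, so no single open destination accumulates everything at once. This is a minimal sketch: `BatchMerger` and `TreeMerge` are hypothetical names, not PDFBox API, and the actual merge is left to a callback.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hook: merge the inputs (in order) into one new document and return it. */
@FunctionalInterface
interface BatchMerger<T> {
    T merge(List<T> inputs) throws Exception;
}

final class TreeMerge {
    /**
     * Merges sources in rounds of at most batchSize inputs per merge.
     * Each round's outputs become the next round's inputs, preserving order,
     * until a single result remains.
     */
    static <T> T mergeInBatches(List<T> sources, int batchSize, BatchMerger<T> merger)
            throws Exception {
        if (sources.isEmpty()) {
            throw new IllegalArgumentException("no sources");
        }
        if (batchSize < 2) {
            throw new IllegalArgumentException("batchSize must be >= 2");
        }
        List<T> level = sources;
        while (level.size() > 1) {
            List<T> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += batchSize) {
                List<T> batch = level.subList(i, Math.min(i + batchSize, level.size()));
                // A lone leftover item is carried forward unchanged.
                next.add(batch.size() == 1 ? batch.get(0) : merger.merge(new ArrayList<>(batch)));
            }
            level = next;
        }
        return level.get(0);
    }
}
```

With 2.0.x, a BatchMerger<File> could create a PDFMergerUtility, call addSource(File) for each input, setDestinationFileName(...) for a new intermediate file, run mergeDocuments(MemoryUsageSetting.setupTempFileOnly()) and return that file. Intermediates can be deleted once the next round has consumed them. The trade-off is that each page is rewritten once per round: with 100K sources and batchSize 1000 that is two rounds, in exchange for a bounded size of any one open document.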

If you are able to experiment a little, you might want to try the upcoming new
major release 3.0.0. A first release candidate will be available in a few days.
It provides an on-demand parser that no longer uses ScratchFiles for reading;
they are limited to writing.

Andreas

> Thanks
> - viraf
> 

