(This was started on the users list, but I am switching over to the dev list.)

I found the issue.  I have a bunch of small pages.  The COSDocument keeps a 
list of the streams that have been created.  The problem is that the 
currentPage of each ScratchFileBuffer is always in memory.  With 40,000 
pages, this adds up to 40,000 * the page size (4096 bytes), which is over 
160,000,000 bytes.
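
As a rough check of that figure, here is a minimal sketch of the arithmetic.  
The class and method names are made up for illustration and are not part of 
PDFBox; the only inputs are the stream count and page size mentioned above.

```java
// Back-of-the-envelope estimate; names here are illustrative, not PDFBox API.
final class ScratchEstimate {
    // Bytes held if every stream keeps one page-sized buffer on the heap.
    static long totalBytes(long streamCount, long pageSizeBytes) {
        return streamCount * pageSizeBytes;
    }

    public static void main(String[] args) {
        long bytes = totalBytes(40_000, 4096);
        // Integer division gives whole mebibytes.
        System.out.println(bytes + " bytes, about " + bytes / (1024 * 1024) + " MiB");
    }
}
```

The estimate grows linearly with the number of streams, so halving the page 
size would halve the total but not remove the growth.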

So, now I am not sure how to deal with this.  Each page has a 
PDPageContentStream, which creates a ScratchFileBuffer.
This ScratchFileBuffer is kept in the list of streams.  I could recompile with 
a smaller page size, but that would only shrink the problem proportionally, 
not eliminate it.  Does anyone think it may be possible to change this to not 
maintain the list of streams?  Or maybe clear the currentPage byte array for 
the items in the list?
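
To make the second idea concrete, here is a minimal sketch of a buffer that 
flushes its in-memory page and drops the array when it goes idle.  Everything 
in it is hypothetical: ReleasableBuffer, release() and holdsPageInMemory() are 
invented names, the ByteArrayOutputStream merely stands in for the scratch 
file, and only sequential writes are modelled.  It is not how 
ScratchFileBuffer is actually implemented.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-in for a scratch-file-backed buffer; not the PDFBox class.
final class ReleasableBuffer {
    static final int PAGE_SIZE = 4096;

    // Stands in for the on-disk scratch file.
    private final ByteArrayOutputStream backingStore = new ByteArrayOutputStream();
    private byte[] currentPage = new byte[PAGE_SIZE];
    private int positionInPage = 0;

    void write(int b) {
        if (currentPage == null) {
            currentPage = new byte[PAGE_SIZE]; // re-allocate lazily after a release
        }
        if (positionInPage == PAGE_SIZE) {
            flushPage(); // page is full, push it out before writing more
        }
        currentPage[positionInPage++] = (byte) b;
    }

    // Push the buffered bytes to the backing store and drop the heap array.
    void release() {
        if (currentPage != null) {
            flushPage();
            currentPage = null;
        }
    }

    boolean holdsPageInMemory() {
        return currentPage != null;
    }

    long length() {
        return backingStore.size() + positionInPage;
    }

    private void flushPage() {
        backingStore.write(currentPage, 0, positionInPage);
        positionInPage = 0;
    }
}
```

In this sketch, calling release() once a content stream is finished would 
leave only the small object on the heap, and a later write would allocate the 
page again.  Whether something equivalent is safe in the real class depends on 
how reads and seeks use currentPage, which this sketch does not cover.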

I am willing to do some work on this, but a little guidance (or realism) would 
be helpful before I get too deep into this.

Thanks,

Mark Claassen
Senior Software Engineer

Donnell Systems, Inc.
130 South Main Street
Leighton Plaza Suite 375
South Bend, IN  46601
E-mail: mailto:[email protected]
Voice: (574)232-3784
Fax: (574)232-4014

Disclaimer:
The opinions provided herein do not necessarily state or reflect 
those of Donnell Systems, Inc.(DSI). DSI makes no warranty for and 
assumes no legal liability or responsibility for the posting. 
-----Original Message-----
From: Mark A. Claassen <[email protected]> 
Sent: Wednesday, June 9, 2021 4:53 PM
To: [email protected]
Subject: [Possible Spam] RE: PDF Memory issue
Importance: Low

In looking at this further, it seems that the ScratchFileBuffer.close method is 
only called when the document is closed.  ScratchFileBuffer.clear is never 
called.  

These are the only places where pageHandler.markPagesAsFree is called.  I 
believe this is the issue: since markPagesAsFree is not called until the 
document is closed, this content just keeps building up until then.

Any guidance would be greatly appreciated.  I can't seem to find a 
configuration workaround for this issue. 

Mark Claassen

-----Original Message-----
From: Mark A. Claassen <[email protected]>
Sent: Wednesday, June 9, 2021 1:39 PM
To: [email protected]
Subject: [Possible Spam] PDF Memory issue
Importance: Low

Hi.  Thanks for your time.

I am using PDFBox and am having trouble creating large PDFs (50,000+ pages).  
The heap size of the process is capped, but with the temp file active (which I 
can see being created) I didn't think this would matter.

Here is what I am doing in a very condensed form:
        MEMORY_SETTING = MemoryUsageSetting.setupTempFileOnly();
        PDDocument doc = new PDDocument(MEMORY_SETTING);

        for (...) {
                String text = [generate page text]
                PDPage page = new PDPage(PDRectangle.LETTER);
                // Overwrite mode, no compression; the stream is closed by try-with-resources
                try (PDPageContentStream contentStream = new PDPageContentStream(
                                doc, page, PDPageContentStream.AppendMode.OVERWRITE, false)) {
                        contentStream.beginText();
                        // [text drawing omitted in this condensed form]
                        contentStream.endText();
                        doc.addPage(page);
                }
        }

When I do a heap dump, I see over 100 MB of memory taken by 42,000 instances of 
ScratchFileBuffer.currentPage.

Is there something I am doing wrong here?  Or is this a bug?  It seems like I 
must be doing something wrong / forgetting to do something, since this is a 
problem in both 2 and 3-RC1.

Thanks again,

Mark Claassen

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

