(This was started on the users list, but I am switching over to the dev list.)
I found the issue. I have a bunch of small pages. The COSDocument keeps a list of the streams that have been created. The problem is that the currentPage in the ScratchFileBuffer is always in memory. If there are 40,000 pages, then this will add up to 40,000 * the page size (4096), which is over 160,000,000 bytes.

So, now I am not sure how to deal with this. Each page has a PDPageContentStream, which creates a ScratchFileBuffer. This ScratchFileBuffer is kept in the list of streams. I could recompile with a smaller page size, but that will only cut the problem by a percentage.

Does anyone think it may be possible to change this to not maintain the list of streams? Or maybe clear the currentPage byte array for the items in the list?

I am willing to do some work on this, but a little guidance (or realism) would be helpful before I get too deep into this.

Thanks,

Mark Claassen
Senior Software Engineer

Donnell Systems, Inc.
130 South Main Street
Leighton Plaza Suite 375
South Bend, IN 46601
E-mail: mailto:[email protected]
Voice: (574)232-3784
Fax: (574)232-4014

Disclaimer:
The opinions provided herein do not necessarily state or reflect those of Donnell Systems, Inc.(DSI). DSI makes no warranty for and assumes no legal liability or responsibility for the posting.

-----Original Message-----
From: Mark A. Claassen <[email protected]>
Sent: Wednesday, June 9, 2021 4:53 PM
To: [email protected]
Subject: [Possible Spam] RE: PDF Memory issue
Importance: Low

In looking at this further, it seems that the ScratchFileBuffer.close method is only called when the document is closed. ScratchFileBuffer.clear is never called. These are the only places where pageHandler.markPagesAsFree is called.

I believe this is the issue: since markPagesAsFree is never called before the document is closed, this content just keeps building up until then.

Any guidance would be greatly appreciated. I can't seem to find a configuration workaround for this issue.
Mark Claassen

-----Original Message-----
From: Mark A. Claassen <[email protected]>
Sent: Wednesday, June 9, 2021 1:39 PM
To: [email protected]
Subject: [Possible Spam] PDF Memory issue
Importance: Low

Hi. Thanks for your time.

I am using PDFBox and am having trouble creating large PDFs (50,000+ pages). The heap size of the process is capped, but with the temp file active (which I can see being created) I didn't think this would matter.

Here is what I am doing in a very condensed form:

    MemoryUsageSetting MEMORY_SETTING = MemoryUsageSetting.setupTempFileOnly();
    PDDocument doc = new PDDocument(MEMORY_SETTING);
    for (...) {
        String text = [generate page text]
        PDPage page = new PDPage(PDRectangle.LETTER);
        try (PDPageContentStream contentStream = new PDPageContentStream(doc, page, PDPageContentStream.AppendMode.OVERWRITE, false)) {
            contentStream.endText();
            doc.addPage(page);
        }
    }

When I do a heap dump, I see over 100 MB of memory taken by 42,000 instances of ScratchFileBuffer.currentPage.

Is there something I am doing wrong here? Or is this a bug? It seems like I must be doing something wrong / forgetting to do something, since this is a problem in both 2 and 3-RC1.

Thanks again,

Mark Claassen
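
For readers following the thread without a heap dump at hand, here is a minimal, self-contained sketch of the retention pattern discussed above. It is a model only, not PDFBox code: `ScratchBufferModel`, `ModelBuffer`, `releaseCurrentPage`, and the other names are hypothetical, and the only figures taken from the thread are the 4096-byte page size and the 40,000-stream count. It checks the size estimate and illustrates the "clear the currentPage byte array" idea on a list that keeps every buffer reachable.

```java
import java.util.ArrayList;
import java.util.List;

public class ScratchBufferModel {

    /** Page size quoted in the thread. */
    static final int PAGE_SIZE = 4096;

    /** Hypothetical stand-in for a scratch-file-backed buffer (not the PDFBox class). */
    static final class ModelBuffer {
        private byte[] currentPage = new byte[PAGE_SIZE];

        /**
         * Hypothetical mitigation: drop the in-memory page.
         * A real buffer would first have to write the page to the scratch
         * file and reload it on the next access; this model skips both.
         */
        void releaseCurrentPage() {
            currentPage = null;
        }

        int heldBytes() {
            return currentPage == null ? 0 : currentPage.length;
        }
    }

    /** Bytes pinned if every stream keeps one page of pageSize bytes in memory. */
    static long estimateRetainedBytes(int streamCount, int pageSize) {
        return (long) streamCount * pageSize;
    }

    /** Models a document-level list that keeps every created buffer reachable. */
    static List<ModelBuffer> createBuffers(int count) {
        List<ModelBuffer> buffers = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            buffers.add(new ModelBuffer());
        }
        return buffers;
    }

    /** Sums the page bytes still held by the buffers in the list. */
    static long totalHeldBytes(List<ModelBuffer> buffers) {
        long total = 0;
        for (ModelBuffer buffer : buffers) {
            total += buffer.heldBytes();
        }
        return total;
    }

    public static void main(String[] args) {
        // Estimate from the thread: 40,000 streams * 4096 bytes.
        System.out.println(estimateRetainedBytes(40_000, PAGE_SIZE)); // 163840000

        // Small-scale model: 1,000 buffers, each holding one page.
        List<ModelBuffer> streams = createBuffers(1_000);
        System.out.println(totalHeldBytes(streams)); // 4096000

        // After releasing the pages, the list itself remains but pins no page data.
        streams.forEach(ModelBuffer::releaseCurrentPage);
        System.out.println(totalHeldBytes(streams)); // 0
    }
}
```

The model suggests that keeping the list of streams is not by itself the expensive part; the cost comes from each entry holding a full page. Whether the real ScratchFileBuffer can safely drop its page between uses is exactly the open question in this thread.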
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

