On 15.11.2015 at 17:08, Sridhar So wrote:
Dear PDFBox team

Thanks for the response to my earlier query on the performance issue. The
suggested workaround was to use the ctor PDFPageable(pdDocument,
Orientation.AUTO, true, 300) when setting up PrinterJob pjob, instead of
PDFPageable(pdDocument). This is supposed to force
java.awt.print.PrinterJob pjob.print() to use rasterization to render the
image, which is supposed to be faster.

// In my test this caused a performance issue (40 to 80 secs)
//pjob.setPageable(new PDFPageable(pdDocument));
// In my test this is about 3 times faster, but still 18 to 30 sec.
pjob.setPageable(new PDFPageable(pdDocument, Orientation.AUTO, true, 0));

but you had dpi = 0 above. Except for the page border, this is the same as the 1-parameter constructor.


Below are my observations from the unit test.
Compared to the first call above (40 to 80 sec), the second was faster, but it
still took 18 to 28 sec over multiple test rounds, which is also slow.
If I used 300 dpi (the last argument to the PDFPageable ctor), the quality of
the output file (redirected to an MS XPS file) was not good.

But was it faster at 300 dpi?

With 600 dpi, java.awt.print.PrinterJob pjob.print() throws an out-of-memory
error, as below

Exception in thread "Thread-3" java.lang.OutOfMemoryError: Java heap space
        at java.awt.image.DataBufferInt.<init>(Unknown Source)
        at java.awt.image.Raster.createPackedRaster(Unknown Source)
        at java.awt.image.DirectColorModel.createCompatibleWritableRaster(Unknown Source)
        at java.awt.image.BufferedImage.<init>(Unknown Source)
        at org.apache.pdfbox.printing.PDFPrintable.print(PDFPrintable.java:169)
        at sun.print.RasterPrinterJob.printPage(Unknown Source)
        at sun.print.RasterPrinterJob.print(Unknown Source)
        at GPDFBox2AppletClass.run(GPDFBox2AppletClass.java:160)
        at java.lang.Thread.run(Unknown Source)
Nov 15, 2015 8:15:58 PM org.apache.pdfbox.cos.COSDocument finalize
WARNING: Warning: You did not close a PDF Document

Then you'll need more memory. And don't forget to close your documents.


The input PDF file size is 118 KB, whereas the output PDF generated by the
latest PDFBox 2.0.0 SNAPSHOT build (redirecting to the MS XPS printer) is
about 1 MB.
With 2.0 there was no alignment or font issue, but it is slower.
The output PDF generated with PDFBox 1.8.10 was about 500 KB, much smaller,
and performance was better (12 sec in 1.8.10).
PDFBox 1.8.10 is faster, but the fonts are lighter (it used default fonts and
threw an exception).

Please clarify the following:
Is 300, or any nonzero dpi, mandatory in pjob.setPageable(new
PDFPageable(pdDocument, Orientation.AUTO, true, 300)) to force AWT to
rasterize?

Yes, non zero is mandatory to force rasterizing (this is mentioned in the javadoc).


In the 2.0.0-RC1 code I saw that the other ctor, PDFPageable(pdDocument), uses
default values for the other 3 arguments. So what made pjob.setPageable(new
PDFPageable(pdDocument, Orientation.AUTO, true, 0)) faster than
pjob.setPageable(new PDFPageable(pdDocument)), which internally uses the
default of 0 dpi as per the RC1 source code? Am I missing something? Are the
ctor implementations different in the SNAPSHOT code after 2.0.0-RC1?

The best is to just download the source code. The API wasn't changed recently, but something related to rotation was.

Why does pjob.setPageable(new PDFPageable(pdDocument, Orientation.AUTO, true,
600)) cause PrinterJob.print() to throw an out-of-memory error? Is there any
way to increase the dpi without increasing the memory footprint?

No. A 600dpi raster image is 4x as large as a 300dpi raster image. So the memory has to come from somewhere.
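
[Editor's sketch] The 4x figure above follows from simple arithmetic: doubling the dpi doubles both the pixel width and the pixel height. The stack trace earlier in this mail shows a DataBufferInt being allocated, so the sketch below assumes 4 bytes per pixel. The US Letter page size (8.5 x 11 in) and the class and method names are assumptions made for illustration only; they are not taken from PDFBox.

```java
public class RasterMemoryEstimate {

    /** Bytes for an int-per-pixel raster (4 bytes per pixel) of one page at the given dpi. */
    static long rasterBytes(double widthInches, double heightInches, int dpi) {
        long widthPx = Math.round(widthInches * dpi);
        long heightPx = Math.round(heightInches * dpi);
        return widthPx * heightPx * 4L;
    }

    public static void main(String[] args) {
        // US Letter is an assumed page size; substitute the real page dimensions.
        for (int dpi : new int[] {300, 600}) {
            long bytes = rasterBytes(8.5, 11.0, dpi);
            System.out.printf("%d dpi: %d bytes (about %.0f MB)%n", dpi, bytes, bytes / 1e6);
        }
    }
}
```

Under these assumptions a single page needs about 34 MB at 300 dpi and about 135 MB at 600 dpi, before any other allocations made while printing.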

Why is printing slow (18 to 28 sec) even with pjob.setPageable(new
PDFPageable(pdDocument, Orientation.AUTO, true, 0)), and why is there only a
marginal improvement, to about 15 to 18 sec, with pjob.setPageable(new
PDFPageable(pdDocument, Orientation.AUTO, true, 0))?
Why does the printable image or PDFBox output PDF file size increase to 1 MB,
while the original input PDF file is only 115 KB?

Because PDF is a more efficient format for vector graphics than a raster image. You could as well say that a .TXT file of Hamlet's "To be or not to be" monologue has a size of less than 2 KB. But a screenshot of its display will be much larger.

PDFBox 1.8.10 was able to generate a smaller PDF file and performance is
better (is that at the cost of fonts, as the fonts are lighter?). Will a
smaller initial PDF passed to PrinterJob.print() make it print faster?

No, it depends on the contents.

Will setting the Java property pdfbox.fontcache to a folder with write
permission remove the admin privilege requirement for PDFBox 2.0.0 to build
the font cache the first time and store it on disk for subsequent runs?

Yes
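
[Editor's sketch] A minimal illustration of pointing pdfbox.fontcache at a directory the current user can write to. The property name comes from this thread; the "pdfbox-fontcache" folder under java.io.tmpdir and the class and method names are hypothetical choices for the example. As described in the earlier mail quoted below, the property is set before any PDFBox API call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FontCacheLocation {

    /** Creates the directory if needed and points the pdfbox.fontcache property at it. */
    static Path useFontCacheDir(Path dir) throws IOException {
        Files.createDirectories(dir);
        if (!Files.isWritable(dir)) {
            throw new IOException("Font cache directory is not writable: " + dir);
        }
        // Set this before the first PDFBox call, as done in the thread.
        System.setProperty("pdfbox.fontcache", dir.toAbsolutePath().toString());
        return dir;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical location chosen for this sketch; any user-writable folder would do.
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"), "pdfbox-fontcache");
        useFontCacheDir(dir);
        System.out.println("pdfbox.fontcache = " + System.getProperty("pdfbox.fontcache"));
    }
}
```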

Is there a way, or a roadmap, to improve performance in 2.0.0? 2.0.0 has a lot
of fixes and improvements over 1.8.10, but 2.0 performance is the main blocker
for many users.
Thanks a lot in advance to the PDFBox team

No there is no "roadmap". We try to do our best. The slow printing with some files is a known problem, see the issue I mentioned earlier.

We may find improvements by looking at the file. But there is no guarantee.

Tilman



Regards
Sridhar Sowmiyanarayanan
Tata Consultancy Services
Website: http://www.tcs.com
____________________________________________
Experience certainty.   IT Services
Business Solutions
Consulting
____________________________________________


-----Sridhar So/BLR/TCS wrote: -----
To: [email protected]
From: Sridhar So/BLR/TCS
Date: 11/14/2015 07:01PM
Subject: Re: Performance Issue with 2.0.0 SNAPSHOT latest builds

Dear PDFBox team

Thanks for your response to below query.

I set the Java property pdfbox.fontcache to the JDK lib directory where the
fonts are (D:\\Softwares\\Java\\jre7\\lib\\fonts).
I set the property within the code, before the PDFBox API calls, using
System.setProperty("pdfbox.fontcache", "D:\\Softwares\\Java\\jre7\\lib\\fonts");
The first execution took 140 sec and created the cache file .pdfbox in the
given directory.
Subsequent executions took the same 24 to 60 sec, with no improvement in
performance.
The only difference noticed was that the .pdfbox file was earlier created in
the user.home directory and is now created in the given directory; on the
performance side there was no noticeable difference.

My OS is MS Windows 7 Home Basic; since the user ID has admin privilege, I did
not have the admin issue.
On another (work) machine with an admin user ID/privilege, PDFBox could create
the font cache, but performance was slow (30 sec+).

We use only Arial and Times New Roman.
If the default font chosen in the 1.8.10 code is Arial, that should be fine, as
1.8.10 performs well, but its fonts are lighter.
Alternatively, if 2.0.0 is optimized for performance, that will be great.

FYI
For the PDF file (redirecting the print to an MS XPS printer type file), the
size of the PDF file generated in 1.8.10 is about 151 KB, whereas the size of
the PDF file created using 2.0 is 1152 KB (roughly 8 times larger).
It is job.print() that takes most of the time. Time profile output below

System.getproperty of pdfbox.fontcache = D:\Softwares\Java\jre7\lib\fonts
System.getproperty of user.home = C:\Users\Rangarajan
Nov 14, 2015 6:58:22 PM org.apache.pdfbox.cos.COSDocument finalize
WARNING: Warning: You did not close a PDF Document
PDDocument load time = 30ms PrinterJob creation time = 10ms job.setPageable (
new PDFPageable(pdDocument) time = 0ms job.print(); Printing Time = 58129ms
Total time = 58.169 seconds



Regards
Sridhar Sowmiyanarayanan


-----Sridhar So/BLR/TCS wrote: -----
To: [email protected]
From: Sridhar So/BLR/TCS
Date: 11/14/2015 01:36AM
Cc: [email protected]
Subject: Performance Issue with 2.0.0 SNAPSHOT latest builds

Subject line changed.


Sridhar Sowmiyanarayanan


-----Sridhar So/BLR/TCS wrote: -----
To: [email protected]
From: Sridhar So/BLR/TCS
Date: 11/14/2015 01:32AM
Cc: [email protected]
Subject: Re: Returned post for [email protected]

Dear PDFBox Developers/Contributors


Thanks for the reply. I tested with the latest SNAPSHOT builds
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/preflight-app/2.0.0-SNAPSHOT/
 ------> build 1823
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.0-SNAPSHOT/
 ---------> build 1800 and 1801
The font cache is no longer rebuilt every time and can be used from the local
store; however, builds 1800 and 1801 require admin privilege.
Compared to PDFBox version 1.8.10, the 2.0.0 SNAPSHOT builds are slow
(1.8.10 takes about 13 seconds, but 2.0.0 SNAPSHOT 1800 and 1801 take 30
seconds).

Measured the time difference between 1.8.10 and 2.0.0 SNAPSHOT builds with
following code
Getting 13 to 19 seconds in 1.8.10, whereas in 2.0.0 SNAPSHOT builds 22 to 36
seconds

        long t1 = System.currentTimeMillis();
        pdDocument = PDDocument.load(is);
        long t2 = System.currentTimeMillis();
        PrinterJob job = PrinterJob.getPrinterJob();
        job.setPageable(new PDFPageable(pdDocument));   // Version 2.0.0 SNAPSHOT builds
        //job.setPageable(new PDPageable(pdDocument));  // Version 1.8.10
        job.print();
        long t3 = System.currentTimeMillis();
        //printWithPaper(pdDocument, "A4");
        System.out.println(" PDDocument load time = " + String.valueOf(t2 - t1)
                + " Printing Time = " + String.valueOf(t3 - t2)
                + " Total time = " + String.valueOf((t3 - t1) / 1000.0) + " seconds ");


Do we have a performance fix available in the 2.0.0 SNAPSHOTs? If so, please
give the full path and the build.
Is there a fix available where admin privilege is not required?
Thanks a lot in advance for your reply.

FYI
----
The alignment issue is still there in 2.0.0, and my attempt to adjust the
alignment using the code below takes a lot of time.

        PageFormat pageFormat = new PageFormat();
        pageFormat.setOrientation(PageFormat.PORTRAIT);
        Paper paper = pageFormat.getPaper();

        if ("SLEEK".equalsIgnoreCase(receiptType)) {
            paperWidth = 3.14;
            paperHeight = 50;
        } else if ("LETTER".equalsIgnoreCase(receiptType)) {
            paperWidth = 8.5;
            paperHeight = 11;
        } else if ("LEGAL".equalsIgnoreCase(receiptType)) {
            paperWidth = 8.5;
            paperHeight = 14;
        } else {
            paperWidth = 8.3;
            paperHeight = 11.7;
        }

        paper.setSize(paperWidth * 72.0, paperHeight * 72.0);
        paper.setImageableArea(-2000, 0, paper.getWidth(), paper.getHeight());
        pageFormat.setPaper(paper);


        PrinterJob job = PrinterJob.getPrinterJob();
        job.setPageable(new PDFPageable(document));

        Book book = new Book();
        book.append(new PDFPrintable(document), getPageFormat(receiptType),
                document.getNumberOfPages());

        job.setPageable(book);
        job.print();
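
[Editor's sketch] The page-format part of the snippet above can be written as a self-contained helper using only java.awt.print. The paper sizes in inches and the 72 points-per-inch conversion are taken from the original code. Placing the imageable area at the origin (0, 0) instead of x = -2000 is an assumption made for this sketch; whether it resolves the alignment issue is not established anywhere in this thread. The class and method names are hypothetical.

```java
import java.awt.print.PageFormat;
import java.awt.print.Paper;

public class ReceiptPageFormat {

    static final double POINTS_PER_INCH = 72.0;

    /** Builds a portrait PageFormat whose imageable area spans the whole sheet. */
    static PageFormat pageFormatFor(String receiptType) {
        double widthIn;
        double heightIn;
        if ("SLEEK".equalsIgnoreCase(receiptType)) {
            widthIn = 3.14;
            heightIn = 50;
        } else if ("LETTER".equalsIgnoreCase(receiptType)) {
            widthIn = 8.5;
            heightIn = 11;
        } else if ("LEGAL".equalsIgnoreCase(receiptType)) {
            widthIn = 8.5;
            heightIn = 14;
        } else {
            // Fallback used in the original code (approximately A4).
            widthIn = 8.3;
            heightIn = 11.7;
        }

        Paper paper = new Paper();
        paper.setSize(widthIn * POINTS_PER_INCH, heightIn * POINTS_PER_INCH);
        // Assumption: start at the sheet origin rather than x = -2000 as in the original.
        paper.setImageableArea(0, 0, paper.getWidth(), paper.getHeight());

        PageFormat pageFormat = new PageFormat();
        pageFormat.setOrientation(PageFormat.PORTRAIT);
        pageFormat.setPaper(paper);
        return pageFormat;
    }

    public static void main(String[] args) {
        PageFormat letter = pageFormatFor("LETTER");
        System.out.println(letter.getWidth() + " x " + letter.getHeight() + " pt");
    }
}
```

The returned PageFormat could then be passed to Book.append in place of getPageFormat(receiptType).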


Regards
Sridhar Sowmiyanarayanan






To:     [email protected]
cc:     [email protected]
Subject:        Re: Speedup Font Cache: Performance Issue in PDFBox 2.0.0-RC1

Dear PDFBox Community

Adding Commits, John and Tilman in the mail request

Details are in below mail.

In brief again

In our system, PDF files are generated on the server and sent to the client,
where applet code uses PDFBox to print them.
With PDFBox 1.8.10, we have an alignment issue, as some characters are missing
on the left side.
With PDFBox 2.0.0-RC1, we face a performance issue (slow).

Do we have a fix or patch available
either for performance in 2.0.0-RC1 OR
for the alignment issue in 1.8.10?
Our PDF documents use TrueType fonts, mostly Arial Unicode.

Thanks a lot for your help and support.

Regards
Sridhar Sowmiyanarayanan


-----Sridhar So/BLR/TCS wrote: -----
To: [email protected]
From: Sridhar So/BLR/TCS
Date: 11/12/2015 06:23PM
Subject: Speedup Font Cache: Performance Issue in PDFBox 2.0.0-RC1

Dear PDFBox Developers/Contributors

I am unable to subscribe to the users mailing list, as the link tries to open
Outlook rather than the subscription page, hence a separate mail on the
similar/same issue discussed.

Issue:
-------
PDFBox 2.0.0-RC1 is very slow in printing (taking 35 to 50 seconds), as it
tries to load fonts each time with the following message

WARNING: New fonts found, font cache will be re-built
Nov 12, 2015 3:17:26 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider <init>
WARNING: Building font cache, this may take a while
Nov 12, 2015 3:17:32 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider saveCache
WARNING: Finished building font cache, found 522 fonts


Is a fix or patch available to avoid the slow performance due to the above
(rebuilding the font cache every time)?
If a fix is not available in 2.0.0-RC1, is there any way to fix the alignment
issue in PDFBox 1.8.10? In 1.8.10 the left margin is too small and the first
few characters are cut off in the printout.

With PDFBox 1.8.10, there is no performance issue, but the alignment in the
printout is not proper.
With PDFBox 2.0.0-RC1, we are facing a performance issue.

The PDF document used has Arial Unicode or TrueType fonts.

A similar discussion thread is pasted below, as I was unable to reply to that
thread or subscribe to the users mailing list, hence a separate mail.

Regards
Sridhar

Subject:        Re: Speedup Font Cache  
From:   John Hewson ([email protected])
Date:   Oct 21, 2015 5:26:41 pm
List:   org.apache.pdfbox.users

On 21 Oct 2015, at 09:43, Maruan Sahyoun <[email protected]> wrote:

Hi,

On 21.10.2015 at 18:40, Tilman Hausherr <[email protected]> wrote:

On 21.10.2015 at 14:10, Roberto Nibali wrote:
Hi John

On Wed, Oct 21, 2015 at 12:35 AM, John Hewson <[email protected]> wrote:

Yes, I'm able to replicate that issue on Windows. It's apparently related
to administrator ownership of that registry key's parent node. Looks like
it'll be necessary to log in as admin and create that key with user access.
I guess that's far from ideal?

The whole issue also happens on MacOSX. When you introduced this on-disk
cache a couple of months back, it worked fine; however, one of the recent
changes to SVN must have wrecked the initially intended functionality. Not
only does the font caching setup take 5-10 times as long as it used to, it
also does not seem to persist the cache anymore. Version used:

$ svn info | grep -i changed
Last Changed Author: tilman
Last Changed Rev: 1709647
Last Changed Date: 2015-10-20 19:04:02 +0200 (Tue, 20 Oct 2015)

Running my test tool indicates:

Oct 21, 2015 2:08:29 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider loadCache
WARNING: New fonts found, font cache will be re-built
Oct 21, 2015 2:08:29 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider <init>
WARNING: Building font cache, this may take a while
Oct 21, 2015 2:08:39 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider saveCache
WARNING: Finished building font cache, found 654 fonts
[INFO, ctx=./ccalt.pdf]: Opening Source ./ccalt.pdf
[INFO, ctx=./ccalt.pdf]: Opening Template ./cctemp.pdf
[INFO, ctx=./ccalt.pdf]: Writing Output ./ccmig.pdf
[INFO, ctx=./ccalt.pdf]: Completed in 15037.02ms

This used to be anything between 1200ms and 2300ms and once it was
persisted onto disk, it was rather fast in subsequent calls. Unfortunately,
SVN does not provide the handy tool of "git bisect" to quickly find out
which change actually caused this regression.

There were only 4 changes since then, so it might be worth a try to just revert
that file.

(I can't help; for me, it has always been slow.)

Could it be that 1) you installed new stuff on your computer, 2) that MacOS has
many of its fonts in .ttc files? In Windows there are only 10.

on my OS X I have 92 ttc files (out of 384) :-)

Yep, OS X uses ttc much more heavily than Windows and some of those are big
Asian fonts which PDFBox parses relatively slowly.

-- John

BR
Maruan

Tilman

Let me know if you need any further input.

Cheers
Roberto



Regards
Sridhar Sowmiyanarayanan



