On 16.05.2019 at 22:29, Christopher Schultz wrote:

All,

A simple tweak to the getFullUnicodeFont method to cache the loaded
font made a huge difference. The resulting file is now only 20% of the
original size when not embedding the same font over and over again.

Just so I have things sorted in my own mind: each font used will still
show on each page where it's used, right?


Yes!


In the "smaller" file, I can
still see the font mentioned on more than one page, but it's got the
same "CID" and the same font name ("AAAROV+ArialUnicodeMS" -- no more
"AAA???+ArialUnicodeMS" coming up multiple times with slightly
different names).


Yes, that and the object number (e.g. "[29 0 R]").



Of course, I'm also seeing the Type1 fonts show up repeated on
multiple pages as well -- that's normal, right?

Yes. These are still the same object.




Thanks,
-chris

On 5/16/19 16:06, Christopher Schultz wrote:
Tilman,

On 5/16/19 12:17, Tilman Hausherr wrote:
PDFDebugger.
Look at the resources. If the same font occurs several times,
then you did something wrong. It should occur only once in a
document.

Okay, it looks like it is indeed showing multiple times. Here's
what I can see in the document:

Page 1
  Contents
  MediaBox
  Parent
  Resources (1) [8 0 R]
    Font (12) [15 0 R]
      F1 (6) [19 0 R] /T:Font /S:Type0 (AAAGXI+ArialUnicodeMS)
      F10 (4) [28 0 R] /T:Font /S:Type1 (Times-Italic)
      F11 (6) [29 0 R] /T:Font /S:Type0 (AAABJI+ArialUnicodeMS)
      (9 more listed: 3 total type 1 fonts, 9 total type 0 fonts
      including those above)

The font AAA???I+ArialUnicodeMS shows up for all of the "type 0"
entries.

Page 2 [...]
  Resources
    Font (3)
      F1 (4) [20 0 R] /T:Font /S:Type1 (Times-Roman)
      F2 (6) [31 0 R] /T:Font /S:Type0 (AAAYGI+ArialUnicodeMS)
      F3 (4) [28 0 R] /T:Font /S:Type1 (Times-Italic)

Page 3 [...]
  Resources
    Font (2)
      F1 (4) [20 0 R] /T:Font /S:Type1 (Times-Roman)
      F2 (4) [28 0 R] /T:Font /S:Type1 (Times-Italic)

Page 4 [...]
  Resources
    Font (2)
      F1 (4) [20 0 R] /T:Font /S:Type1 (Times-Roman)
      F2 (4) [28 0 R] /T:Font /S:Type1 (Times-Italic)
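
The duplication visible in the listing above can also be checked mechanically once the font names have been collected. A subset font name in a PDF is a six-letter uppercase tag, a plus sign, and the base font name, so several different tags for the same base font suggest the font was embedded more than once. The following is only a sketch under that assumption; SubsetTagReport and tagsByBaseFont are hypothetical names, and how the names are collected from the document is left open.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class SubsetTagReport {
    /**
     * Groups subset tags by base font name. Names without a subset tag
     * (for example the built-in "Times-Italic") are skipped.
     */
    static Map<String, Set<String>> tagsByBaseFont(Iterable<String> fontNames) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (String name : fontNames) {
            // Subset names look like "AAAGXI+ArialUnicodeMS":
            // six uppercase letters, '+', then the base font name.
            if (name.matches("[A-Z]{6}\\+.+")) {
                String tag = name.substring(0, 6);
                String baseFont = name.substring(7);
                result.computeIfAbsent(baseFont, k -> new TreeSet<>()).add(tag);
            }
        }
        return result;
    }
}
```

Fed the names from the listing, a base font that maps to more than one tag (here ArialUnicodeMS) is the one to look at.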

So perhaps I am even using the built-in fonts incorrectly if they
are being mentioned on every page. Or is each page which uses a
font expected to have its own Font entry in the resources?

Does this mean I am "adding" the font too many times somehow?

My code looks like this:

private void writeWrappedText(PDFont font, int fontSize, String text,
        Color color) throws IOException {
    int paragraphWidth = 500;
    boolean indented = false;

    String strippedText = sanitizeString(text);
    int start = 0;
    int end = 0;
    int wrappedLineCnt = 1;

    if(!isAnsiEncoding(strippedText)) {
        if(logger.isDebugEnabled())
            logger.debug("Text contains non-ansi characters: " + text);

        font = getFullUnicodeFont();
    }

    for ( int i : getPossibleWrapPoints(strippedText) ) {
        float width = font.getStringWidth(strippedText.substring(start,i))
                / 1000 * fontSize;
        if ( start < end && width > paragraphWidth ) {
            if (wrappedLineCnt == 1)
                setOffsetX(getOffsetXforMargin());
            printSanitizedLine(font, fontSize,
                    strippedText.substring(start,end),
                    indented ? _pageIndent : 0, color);
            wrappedLineCnt++;
            start = end;
        }
        end = i;
    }
    if (wrappedLineCnt == 1)
        setOffsetX(getOffsetXforMargin());
    // Last piece of text
    printSanitizedLine(font, fontSize, strippedText.substring(start),
            indented ? _pageIndent : 0, color);
}
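
The wrapping loop in writeWrappedText can be looked at separately from the PDF drawing calls. The following is a minimal sketch, not the code from this thread: GreedyWrapper, wrap, and the widthOf callback are hypothetical names. With PDFBox, widthOf would presumably compute font.getStringWidth(s) / 1000 * fontSize as in the method above, with the IOException handled inside the callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

final class GreedyWrapper {
    /**
     * Splits text into lines, breaking only at the given wrap points.
     * wrapPoints must be ascending indices into text; widthOf measures
     * a candidate line in the same unit as maxWidth.
     */
    static List<String> wrap(String text, int[] wrapPoints, double maxWidth,
                             ToDoubleFunction<String> widthOf) {
        List<String> lines = new ArrayList<>();
        int start = 0; // index where the current line begins
        int end = 0;   // most recent wrap point already examined
        for (int i : wrapPoints) {
            double width = widthOf.applyAsDouble(text.substring(start, i));
            // Extending the line to i would overflow: emit up to the
            // previous wrap point and start a new line there.
            if (start < end && width > maxWidth) {
                lines.add(text.substring(start, end));
                start = end;
            }
            end = i;
        }
        lines.add(text.substring(start)); // last piece of text
        return lines;
    }
}
```

As in the original loop, a single segment that is wider than maxWidth on its own is emitted as is rather than split further.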

The getFullUnicodeFont method is:

private PDFont getFullUnicodeFont() {
    if(null == _doc)
        throw new IllegalStateException(
                "Document has not yet been created; cannot load a new font");

    InputStream in = null;
    try {
        String fullUnicodeFontFile = "/resources/fonts/ARIALUNI.TTF";
        in = getClass().getResourceAsStream(fullUnicodeFontFile);
        if(null == in)
            throw new MissingResourceException(
                    "Cannot load font file " + fullUnicodeFontFile,
                    this.getClass().getName(), fullUnicodeFontFile);

        PDFont font = PDType0Font.load(_doc, in);

        return font;
    } catch (IOException ioe) {
        throw new RuntimeException("Cannot load font", ioe);
    }
}

Re-reading that code, it's obvious that I should be storing the
font once loaded and re-using it. I'm guessing that
PDType0Font.load(PDDocument,InputStream) doesn't recognize that
the font has already been loaded and just adds it a second (or
third, etc.) time. Can anyone confirm that?


Yes!
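
The reuse confirmed above amounts to loading once and handing back the same object afterwards. The following is a minimal, library-independent sketch of that idea, not the actual change made in this thread: CachedResource, get, and getLoadCount are hypothetical names. In the code from this thread, the loader would presumably wrap the PDType0Font.load(_doc, in) call, with one cache per document, since the font is loaded for a specific document.

```java
import java.util.function.Supplier;

/** Loads a value on first request and returns the same instance afterwards. */
final class CachedResource<T> {
    private final Supplier<T> loader; // e.g. a wrapper around the font-loading call
    private T value;                  // null until the first load has happened
    private int loadCount;            // how many times the loader actually ran

    CachedResource(Supplier<T> loader) {
        this.loader = loader;
    }

    /** Not thread-safe; synchronize if several threads share one cache. */
    T get() {
        if (value == null) {
            value = loader.get();
            loadCount++;
        }
        return value;
    }

    int getLoadCount() {
        return loadCount;
    }
}
```

However many times get() is called, the loader runs once, so a font obtained this way would be added to the document a single time.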


Tilman



I know that my code isn't the best in terms of only choosing to
render certain glyphs in this "full" font. I am working to improve
that, and I know there is example code for choosing the "best" font
for each character in a string, which I'll be reviewing
separately.
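
One way to limit the "full" font to the glyphs that need it is to split the text into runs first. The following is only a sketch of that idea and not the example code mentioned above: FontRuns, Run, and split are hypothetical names, and ISO-8859-1 is used as a stand-in for whatever isAnsiEncoding actually checks. Each run marked as not basic would be drawn with the Unicode font, the rest with the built-in font.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class FontRuns {
    /** A stretch of text plus whether the basic encoding can represent it. */
    static final class Run {
        final String text;
        final boolean basic;

        Run(String text, boolean basic) {
            this.text = text;
            this.basic = basic;
        }
    }

    /** Splits text into maximal runs of encodable and non-encodable characters. */
    static List<Run> split(String text) {
        // Assumption: ISO-8859-1 approximates the "ANSI" check in the thread.
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        List<Run> runs = new ArrayList<>();
        int runStart = 0;
        for (int i = 1; i <= text.length(); i++) {
            boolean runIsBasic = encoder.canEncode(text.charAt(runStart));
            boolean atEnd = i == text.length();
            // Close the run at the end of the text or when encodability flips.
            if (atEnd || encoder.canEncode(text.charAt(i)) != runIsBasic) {
                runs.add(new Run(text.substring(runStart, i), runIsBasic));
                runStart = i;
            }
        }
        return runs;
    }
}
```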

Thanks, -chris

On 16.05.2019 at 18:09, Christopher Schultz wrote:
All,

We have a process that generates PDF documents usually using the
default Type-1 built-in fonts, so the documents do not embed the
font information.

We recently added the ability for the documents to include font
information if certain glyphs were not available in the default
font(s) and, as expected, the file sizes end up being bigger
when that happens.

What is the best tool to look at a particular document to see
why it ended up being so large? I'm not sure I can visually tell
by looking at the document which character triggered the
inclusion of the font, and then why that font was used for what I
can only assume was a lot of text. By inspecting the file, I'm
sure I can improve my code so that we have fewer uses of this
additional font and therefore keep the file sizes to a minimum.

Thanks,
-chris
---------------------------------------------------------------------

To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]




