Thanks for the feedback. It turns out there was another error: the checksum was empty because MessageDigest doesn't support CRC32. This has been fixed now; please test again (delete the cache file first). The second-to-last field should no longer be empty.

It also teaches an important lesson: a "// never happens" branch should still produce some output, such as a log message.
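A minimal sketch of that lesson, not the actual PDFBox fix: the class name, the digestHex helper and the use of java.util.logging are assumptions made for illustration. It shows a catch block that logs instead of silently returning an empty string, so that requesting an algorithm MessageDigest does not provide leaves a trace.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class NeverHappensSketch
{
    private static final Logger LOG = Logger.getLogger(NeverHappensSketch.class.getName());

    // Hypothetical helper: returns the digest as lowercase hex,
    // or "" if the requested algorithm is not available.
    static String digestHex(String algorithm, byte[] data)
    {
        try
        {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data))
            {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        }
        catch (NoSuchAlgorithmException ex)
        {
            // "never happens", but if it does, say so instead of hiding it
            LOG.log(Level.SEVERE, "Digest algorithm not available: " + algorithm, ex);
            return "";
        }
    }

    public static void main(String[] args)
    {
        byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
        // SHA-256 produces 32 bytes, i.e. 64 hex characters
        System.out.println(digestHex("SHA-256", data).length());
        // As described above, a stock JDK is not expected to offer CRC32
        // through MessageDigest, so this should log and return ""
        System.out.println(digestHex("CRC32", data).isEmpty());
    }
}
```

With the log call in place, an empty checksum field in the cache file would have come with an error message pointing at the cause.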

Tilman

On 05.12.2023 11:34, Kjetil Ødegaard wrote:
Nice! Tested it now and I can confirm that it fixes the issue. I see good
performance even from the first operation.

Checked the cache file and there is a line for this font there now:

➜  ~ grep -i NotoSansKannada .pdfbox.cache
*skipexception*|TTF||0|0|0|0|0||/System/Library/Fonts/NotoSansKannada.ttc||1700331239000
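For anyone who wants to check their own cache file, here is a small sketch that splits such a line on the "|" separator and looks at the second-to-last field. The class and method names are hypothetical, and the only field meaning assumed is the one mentioned in this thread (the second-to-last field holds the checksum).

```java
public class CacheLineSketch
{
    // Splits one cache line on '|'; the -1 limit keeps empty fields.
    static String[] splitFields(String line)
    {
        return line.split("\\|", -1);
    }

    // Returns the second-to-last field, which this thread describes as the checksum.
    static String secondToLastField(String line)
    {
        String[] fields = splitFields(line);
        return fields[fields.length - 2];
    }

    public static void main(String[] args)
    {
        String line = "*skipexception*|TTF||0|0|0|0|0||"
                + "/System/Library/Fonts/NotoSansKannada.ttc||1700331239000";
        String[] fields = splitFields(line);
        System.out.println("fields: " + fields.length);
        System.out.println("font file: " + fields[fields.length - 3]);
        System.out.println("checksum empty: " + secondToLastField(line).isEmpty());
    }
}
```

For the line above, the second-to-last field is empty.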

Thanks for the quick response, great work!

BR Kjetil

On Tue, 5 Dec 2023 at 09:55, Tilman Hausherr <[email protected]> wrote:

Thanks, new snapshot build here:

https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.2-SNAPSHOT/


Ticket:
https://issues.apache.org/jira/browse/PDFBOX-5727

Tilman

On 05.12.2023 08:41, Kjetil Ødegaard wrote:
To clarify, this stack trace is not printed anywhere. I got it from
stepping into the code and invoking printStackTrace() on the exception to
get the whole stack. See complete stack trace below.

I agree with your theory, it matches what I'm seeing. These fonts are never
added to the cache file, so the cache file is always rebuilt.

I double checked the cache file again and there is no trace of these two
fonts, but lots of entries for other fonts (of different weights). I see
from the timestamp on the file that it is rebuilt on every run.

BR Kjetil

java.io.EOFException
    at org.apache.fontbox.ttf.TTFDataStream.readUnsignedShort(TTFDataStream.java:154)
    at org.apache.fontbox.ttf.TTFDataStream.readUnsignedShortArray(TTFDataStream.java:188)
    at org.apache.fontbox.ttf.GlyphSubstitutionTable.readMultipleSubstitutionSubtable(GlyphSubstitutionTable.java:412)
    at org.apache.fontbox.ttf.GlyphSubstitutionTable.readLookupSubtable(GlyphSubstitutionTable.java:263)
    at org.apache.fontbox.ttf.GlyphSubstitutionTable.readLookupTable(GlyphSubstitutionTable.java:313)
    at org.apache.fontbox.ttf.GlyphSubstitutionTable.readLookupList(GlyphSubstitutionTable.java:247)
    at org.apache.fontbox.ttf.GlyphSubstitutionTable.read(GlyphSubstitutionTable.java:102)
    at org.apache.fontbox.ttf.TrueTypeFont.readTable(TrueTypeFont.java:365)
    at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:165)
    at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:144)
    at org.apache.fontbox.ttf.TrueTypeCollection.getFontAtIndex(TrueTypeCollection.java:127)
    at org.apache.fontbox.ttf.TrueTypeCollection.processAllFonts(TrueTypeCollection.java:109)
    at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.addTrueTypeCollection(FileSystemFontProvider.java:665)
    at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.scanFonts(FileSystemFontProvider.java:396)
    at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.<init>(FileSystemFontProvider.java:367)
    at org.apache.pdfbox.pdmodel.font.FontMapperImpl$DefaultFontProvider.<clinit>(FontMapperImpl.java:139)
    at org.apache.pdfbox.pdmodel.font.FontMapperImpl.getProvider(FontMapperImpl.java:158)
    at org.apache.pdfbox.pdmodel.font.FontMapperImpl.findFont(FontMapperImpl.java:416)
    at org.apache.pdfbox.pdmodel.font.FontMapperImpl.findFontBoxFont(FontMapperImpl.java:379)
    at org.apache.pdfbox.pdmodel.font.FontMapperImpl.getFontBoxFont(FontMapperImpl.java:353)
    at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:127)

On Tue, 5 Dec 2023 at 05:03, Tilman Hausherr <[email protected]> wrote:

Please do also post the full (for pdfbox / fontbox) stack trace. I have
a theory why it happens, which is that addTrueTypeCollection() does not
add the font as "*skipexception*" to the cache file because it's not
done in the exception handler.

Tilman

On 04.12.2023 21:17, Tilman Hausherr wrote:
Does the stack trace appear at every start? If yes then it's a bug.
The intent of the current code is that bad fonts aren't retried. The
font cache file should contain a line with "*skipexception*" for that
font. Can you look at it for the two font files?

I could change SHA512 to CRC32. It has the advantage that it won't
trigger people who heard about MD5 😂

I made a test and CRC32 is 20% faster.
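For reference, a minimal sketch of computing a CRC32 checksum over a byte array with java.util.zip.CRC32 from the JDK. The class and method names are made up for illustration, and this is not necessarily how the change ends up in FileSystemFontProvider.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class Crc32Sketch
{
    // Hypothetical helper: CRC32 of the data as 8 lowercase hex digits.
    static String crc32Hex(byte[] data)
    {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return String.format("%08x", crc.getValue());
    }

    public static void main(String[] args)
    {
        // "123456789" is the usual CRC check input; CRC32 gives cbf43926
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.println(crc32Hex(data));
    }
}
```

CRC32 lives in java.util.zip, not in java.security, so it has no checked NoSuchAlgorithmException to handle.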

Tilman

On 04.12.2023 18:48, Gili Tzabari wrote:
I think the commit contains a typo:


https://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/font/FileSystemFontProvider.java?revision=1912514&view=markup&pathrev=1912514#l872

      private static String computeHash(byte[] ba)
      {
          MessageDigest md;
          try
          {
              md = MessageDigest.getInstance("SHA512");
              byte[] md5 = md.digest(ba);
              return Hex.getString(md5);
          }
          catch (NoSuchAlgorithmException ex)
          {
              // never happens
              return "";
          }
      }

You shouldn't need to use SHA512 to detect changes by a non-malicious
actor. MD5 should be plenty, and even CRC32 would be enough. I
suggest downgrading the hash complexity.

Gili

On 2023-12-04 10:21, Kjetil Ødegaard wrote:
Hi,

I tried to upgrade an app to PDFBox 3.0.1 and I see a performance issue.

It only affects the first PDF operation (after that it's quite fast), but
it's a bit annoying since it takes about 20 seconds (on my M1 MacBook).
Profiling reveals that this Kotlin code triggers the delay:

       val font = PDType1Font(Standard14Fonts.FontName.COURIER)

The thread dump shows that almost all time is spent in this method:

org.apache.pdfbox.pdmodel.font.FileSystemFontProvider#computeHash

I assume that this is related to PDFBOX-5684.

Is this possible to work around? Or is it possible to fix?

BR Kjetil

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
