Maybe the PDF is somehow broken and PDFBox repairs it. If the problem goes away by just opening and saving the PDF, then why modify it?
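
For reference, the "open and save" round trip mentioned above can be sketched as follows. This is a minimal sketch, assuming the PDFBox 2.x API that the rest of this thread uses (`PDDocument.load`); the class name `ResavePdf` and the command-line paths are hypothetical. Whether the re-saved file still triggers the OutOfMemoryError in the other library is something only a test can tell.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;

// Hypothetical helper: loads a PDF with PDFBox 2.x and writes it back unchanged.
public class ResavePdf
{
    public static void resave(File in, File out) throws IOException
    {
        // try-with-resources closes the document even if save() fails
        try (PDDocument doc = PDDocument.load(in))
        {
            doc.save(out);
        }
    }

    public static void main(String[] args) throws IOException
    {
        // args[0] = input PDF, args[1] = output PDF (both assumed to be given)
        resave(new File(args[0]), new File(args[1]));
    }
}
```

If the re-saved file behaves well, comparing it with the original may help to narrow down what was wrong.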

I've never heard of fonts being a problem... rather patterns, big images or very complex vector graphics.

Tilman

On 06.05.2021 at 14:16, jorgeeflorez wrote:
Hi Tilman,
thank you for your reply.

It's more complicated because form XObjects, patterns, annotations,
softmasks (and maybe more) can also have fonts. I also doubt that you
can detect CJK fonts this way.

I see... I have a nasty PDF file that causes an OutOfMemoryError when used
by another library, and I reached the conclusion that it is (somehow)
because of the text and the fonts it uses...

I saw the RemoveAllText example and maybe it is what I need. I modified it
so that, instead of removing text, it does nothing, and the new PDF file
seems to have the "corruption" removed...

One last question: how could I modify the RemoveAllText example to remove
all images from the PDF file?

Thanks.

Jorge



On Thu, 6 May 2021 at 1:07, Tilman Hausherr (<[email protected]>) wrote:

On 05.05.2021 at 18:39, jorgeeflorez wrote:
Hi,
I would like to know what would be the best way to detect whether a PDF
file has CID fonts. As far as I understand, these fonts are used in Asian
texts (Japanese, Chinese, Korean, etc.). I have the following code:

          PDDocument doc = PDDocument.load(myFile);
          for (int i = 0; i < doc.getNumberOfPages(); ++i)
          {
              PDPage page = doc.getPage(i);
              // only the page's own resource dictionary is inspected here
              PDResources res = page.getResources();
              for (COSName fontName : res.getFontNames())
              {
                  PDFont font = res.getFont(fontName);
                  COSName subType = font.getCOSObject().getCOSName(COSName.SUBTYPE);
                  System.out.println("CID? " + COSName.TYPE0.equals(subType));
                  System.out.println("font instanceof PDType0Font? " + (font instanceof PDType0Font));
              }
          }
          doc.close();
Would this be the right way to do it?

It's more complicated because form XObjects, patterns, annotations,
softmasks (and maybe more) can also have fonts. I also doubt that you
can detect CJK fonts this way.
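
To illustrate the point about nested resources, here is a rough sketch of a recursive scan, assuming the PDFBox 2.x API. The class `Type0FontScanner` and its method names are hypothetical and not part of PDFBox. It follows page resources, form XObjects, tiling patterns, soft mask groups, Type 3 font resources and the normal appearance streams of annotations; other places where fonts may hide (for example rollover and down appearances) are deliberately left out. Note that a Type 0 font only means a composite (CID-keyed) font. Such fonts are also common for non-Asian text, so a hit does not by itself prove that the file contains CJK text.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.apache.pdfbox.pdmodel.font.PDType3Font;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.pattern.PDAbstractPattern;
import org.apache.pdfbox.pdmodel.graphics.pattern.PDTilingPattern;
import org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState;
import org.apache.pdfbox.pdmodel.graphics.state.PDSoftMask;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAppearanceDictionary;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAppearanceEntry;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAppearanceStream;

// Hypothetical helper; use one instance per document.
public class Type0FontScanner
{
    // resource dictionaries already visited, compared by identity to avoid endless loops
    private final Set<COSBase> visited = Collections.newSetFromMap(new IdentityHashMap<>());

    public boolean hasType0Font(PDDocument doc) throws IOException
    {
        for (PDPage page : doc.getPages())
        {
            if (scan(page.getResources()))
                return true;
            for (PDAnnotation ann : page.getAnnotations())
            {
                PDAppearanceDictionary ap = ann.getAppearance();
                if (ap != null && scanEntry(ap.getNormalAppearance()))
                    return true;
            }
        }
        return false;
    }

    // an appearance entry is either a single stream or a map of named streams
    private boolean scanEntry(PDAppearanceEntry entry) throws IOException
    {
        if (entry == null)
            return false;
        if (entry.isStream())
            return scan(entry.getAppearanceStream().getResources());
        for (PDAppearanceStream stream : entry.getSubDictionary().values())
        {
            if (scan(stream.getResources()))
                return true;
        }
        return false;
    }

    private boolean scan(PDResources res) throws IOException
    {
        if (res == null || !visited.add(res.getCOSObject()))
            return false;
        for (COSName name : res.getFontNames())
        {
            PDFont font = res.getFont(name);
            if (font instanceof PDType0Font)
                return true;
            // Type 3 glyph procedures have their own resources
            if (font instanceof PDType3Font && scan(((PDType3Font) font).getResources()))
                return true;
        }
        for (COSName name : res.getXObjectNames())
        {
            PDXObject xobject = res.getXObject(name);
            if (xobject instanceof PDFormXObject && scan(((PDFormXObject) xobject).getResources()))
                return true;
        }
        for (COSName name : res.getPatternNames())
        {
            PDAbstractPattern pattern = res.getPattern(name);
            if (pattern instanceof PDTilingPattern && scan(((PDTilingPattern) pattern).getResources()))
                return true;
        }
        for (COSName name : res.getExtGStateNames())
        {
            PDExtendedGraphicsState gs = res.getExtGState(name);
            PDSoftMask mask = gs == null ? null : gs.getSoftMask();
            if (mask != null && mask.getGroup() != null && scan(mask.getGroup().getResources()))
                return true;
        }
        return false;
    }
}
```

A call such as `new Type0FontScanner().hasType0Font(doc)` would then replace the per-page loop. A damaged file may still make `getFont` or `getXObject` throw an IOException, which this sketch simply propagates.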

Re removing the text, see the RemoveAllText example in the source code
download. IIRC this one only does the page content stream.

Tilman


I need to detect this and try to create a pdf file from the original, but
without the text.

Any indication is appreciated.

Regards,

Jorge


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



