Re: Detecting CID fonts

Tilman Hausherr Thu, 06 May 2021 12:33:19 -0700

On 06.05.2021 at 20:52, jorgeeflorez wrote:

If the problem goes away by just opening and saving the PDF, then why
modify it?

I cannot share the PDF file with the support team of that library. So I was
wondering if I could share the PDF without images, so that technically I
would not be sharing the file, only the part that is causing them problems.
But it probably won't work...

Obviously not, if the problem goes away by saving.


I've never heard of fonts being a problem... rather patterns, big images
or very complex vector graphics.

An internal call to Arrays.copyOf (doing God knows what) takes all
available memory. Strange indeed.

You would have to see the rest of the stack trace. Also try to run the
whole thing with a bigger -Xmx value.
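
A minimal sketch of the two checks suggested above, assuming the failing
call can be wrapped in your own code: it prints the maximum heap the JVM was
actually started with (to confirm that a larger -Xmx took effect) and prints
the full stack trace if an OutOfMemoryError is thrown. HeapCheck and
runConversion are hypothetical names, not part of PDFBox or of the other
library.

```java
class HeapCheck {
    /** Maximum heap the JVM will try to use, in MiB (reflects the -Xmx setting). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /** Hypothetical placeholder: call the library that runs out of memory here. */
    static void runConversion() {
        // intentionally empty in this sketch
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
        try {
            runConversion();
        } catch (OutOfMemoryError e) {
            // Prints every frame, including the callers above Arrays.copyOf.
            e.printStackTrace();
        }
    }
}
```

It can be started with, for example, `java -Xmx2g HeapCheck.java` (single-file
source launch, Java 11 or later); the 2g value is only an example.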


Tilman


On Thu, 6 May 2021 at 12:34, Tilman Hausherr (<[email protected]>)
wrote:

Maybe the PDF is somehow broken and PDFBox repairs it. If the problem
goes away by just opening and saving the PDF, then why modify it?

I've never heard of fonts being a problem... rather patterns, big images
or very complex vector graphics.

Tilman

On 06.05.2021 at 14:16, jorgeeflorez wrote:

Hi Tilman,
thank you for your reply.

It's more complicated because form XObjects, patterns, annotations,
softmasks (and maybe more) can also have fonts. I also doubt that you
can detect CJK fonts this way.

I see... I have a nasty PDF file that is causing an OutOfMemoryError when
used by another library, and I reached the conclusion that it is (somehow)
because of the text and the fonts it uses...

I saw the RemoveAllText example and maybe it is what I need. I modified it
so that instead of removing text it does nothing, and the new PDF file seems
to have the "corruption" removed...

One last question: how could I modify the RemoveAllText example to remove
all images from the PDF file?

Thanks.

Jorge



On Thu, 6 May 2021 at 1:07, Tilman Hausherr (<[email protected]>)
wrote:

On 05.05.2021 at 18:39, jorgeeflorez wrote:

Hi,
I would like to know what would be the best way to detect whether a PDF
file has CID fonts. As far as I understand, these fonts are used in Asian
texts (Japanese, Chinese, Korean, etc.). I have the following code:

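           // PDFBox 2.x; needs org.apache.pdfbox.cos.COSName,
           // org.apache.pdfbox.pdmodel.{PDDocument, PDPage, PDResources},
           // org.apache.pdfbox.pdmodel.font.{PDFont, PDType0Font}
           try (PDDocument doc = PDDocument.load(myFile))
           {
               for (int i = 0; i < doc.getNumberOfPages(); ++i)
               {
                   PDPage page = doc.getPage(i);
                   PDResources res = page.getResources();
                   if (res == null)
                   {
                       continue; // page without a resources dictionary
                   }
                   for (COSName fontName : res.getFontNames())
                   {
                       PDFont font = res.getFont(fontName);
                       COSName subType = font.getCOSObject().getCOSName(COSName.SUBTYPE);
                       System.out.println("CID? " + COSName.TYPE0.equals(subType));
                       System.out.println("font instanceof PDType0Font? " + (font instanceof PDType0Font));
                   }
               }
           }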
Would this be the right way to do it?

It's more complicated because form XObjects, patterns, annotations,
softmasks (and maybe more) can also have fonts. I also doubt that you
can detect CJK fonts this way.
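
To illustrate why checking only the page resources is not enough, here is a
rough sketch of a recursive walk that visits every nested dictionary and
collects those with /Type /Font and /Subtype /Type0. It deliberately uses
plain java.util maps as a stand-in for PDF dictionaries so that it runs
with the JDK alone; Type0FontWalk, dict and the sample structure are
hypothetical and not PDFBox API. With PDFBox the same idea would have to be
applied to the real objects (for example COSDictionary), which is an
assumption not shown here. Note also that a Type0 font is a composite font
and is not necessarily a CJK font.

```java
import java.util.*;

class Type0FontWalk {
    /** Returns every dictionary reachable from root that looks like a Type0 font. */
    static List<Map<?, ?>> findType0Fonts(Object root) {
        List<Map<?, ?>> found = new ArrayList<>();
        // Identity-based set: guards against cycles such as parent references.
        Set<Object> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        walk(root, visited, found);
        return found;
    }

    private static void walk(Object node, Set<Object> visited, List<Map<?, ?>> found) {
        if (node instanceof Map) {
            if (!visited.add(node)) {
                return; // already seen, do not loop forever
            }
            Map<?, ?> dict = (Map<?, ?>) node;
            if ("Font".equals(dict.get("Type")) && "Type0".equals(dict.get("Subtype"))) {
                found.add(dict);
            }
            for (Object value : dict.values()) {
                walk(value, visited, found);
            }
        } else if (node instanceof List) {
            if (!visited.add(node)) {
                return;
            }
            for (Object value : (List<?>) node) {
                walk(value, visited, found);
            }
        }
    }

    /** Hypothetical helper: builds a mutable dictionary from key/value pairs. */
    static Map<String, Object> dict(Object... kv) {
        Map<String, Object> m = new HashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, Object> type1 = dict("Type", "Font", "Subtype", "Type1");
        Map<String, Object> type0 = dict("Type", "Font", "Subtype", "Type0");
        // The Type0 font is only reachable through a form XObject's own resources.
        Map<String, Object> form = dict("Subtype", "Form",
                "Resources", dict("Font", dict("F1", type0)));
        Map<String, Object> page = dict("Type", "Page",
                "Resources", dict("Font", dict("F0", type1),
                                  "XObject", dict("Fm0", form)));
        form.put("Parent", page); // artificial cycle to exercise the visited set
        System.out.println("Type0 fonts found: " + findType0Fonts(page).size());
    }
}
```

In the sample, the only Type0 font sits inside a form XObject, so a check
limited to the page's own font names would miss it.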

Re removing the text, see the RemoveAllTexts example in the source code
download. IIRC this one only does the page content stream.

Tilman

I need to detect this and try to create a PDF file from the original, but
without the text.

Any indication is appreciated.

Regards,

Jorge

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

