On 06.05.2021 at 20:52, jorgeeflorez wrote:
If the problem goes away by just opening and saving the PDF, then why
modify it?
I cannot share the PDF file with the support team of that library. So I was
wondering whether I could share the PDF without its images; technically I
would not be sharing the file, only the part that is causing them problems.
But it probably won't work...
Obviously not, if the problem goes away by saving.
I've never heard of fonts being a problem... rather patterns, big images
or very complex vector graphics.
An internal call to Arrays.copyOf (doing God knows what) takes all
available memory. Strange indeed.
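For context, a minimal sketch of one common reason Arrays.copyOf shows up at the top of an OutOfMemoryError stack trace: growable buffers (in the style of ArrayList or ByteArrayOutputStream) enlarge themselves by copying into a bigger array. This is only an assumption about what the other library might be doing; GrowableBuffer is a hypothetical class written for illustration, not code from PDFBox or from that library.

```java
import java.util.Arrays;

// Hypothetical illustration class, not part of any library.
final class GrowableBuffer
{
    private byte[] data = new byte[16];
    private int size = 0;

    void append(byte b)
    {
        if (size == data.length)
        {
            // While the copy runs, the old and the new array are both alive,
            // so peak memory is roughly three times the old capacity.
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = b;
    }

    int capacity()
    {
        return data.length;
    }

    int size()
    {
        return size;
    }
}
```

If a broken length or count in the file makes such a buffer grow far beyond the real data size, the allocation inside Arrays.copyOf is where the heap runs out, even though the actual cause is further down the stack.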
You would have to see the rest of the stack trace. Also try to run the
whole thing with a bigger -Xmx value.
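A minimal sketch of both suggestions, assuming the failing call can be wrapped in your own code: it prints the heap limit the JVM is actually running with (to confirm that -Xmx took effect) and renders the complete stack trace of the error. HeapDiagnostics and runProblematicCall are hypothetical names; the simulated error stands in for the real call into the other library.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical helper class for illustration.
final class HeapDiagnostics
{
    /** Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx). */
    static long maxHeapMiB()
    {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /** Renders the full stack trace, including any "Caused by" chain. */
    static String fullStackTrace(Throwable t)
    {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }

    /** Placeholder for the call into the other library that fails. */
    static void runProblematicCall()
    {
        throw new OutOfMemoryError("simulated for illustration");
    }

    public static void main(String[] args)
    {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
        try
        {
            runProblematicCall();
        }
        catch (OutOfMemoryError e)
        {
            System.err.println(fullStackTrace(e));
        }
    }
}
```

Running it as, for example, java -Xmx4g HeapDiagnostics should report a limit close to 4096 MiB; the frames below Arrays.copyOf in the printed trace show which part of the library requested the allocation.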
Tilman
On Thu, 6 May 2021 at 12:34, Tilman Hausherr (<[email protected]>)
wrote:
Maybe the PDF is somehow broken and PDFBox repairs it. If the problem
goes away by just opening and saving the PDF, then why modify it?
I've never heard of fonts being a problem... rather patterns, big images
or very complex vector graphics.
Tilman
On 06.05.2021 at 14:16, jorgeeflorez wrote:
Hi Tilman,
thank you for your reply.
It's more complicated because form XObjects, patterns, annotations,
softmasks (and maybe more) can also have fonts. I also doubt that you
can detect CJK fonts this way.
I see... I have a nasty PDF file that causes an OutOfMemoryError when used
by another library, and I reached the conclusion that it is (somehow)
because of the text and the fonts it uses...
I saw the RemoveAllText example and maybe it is what I need. I modified it
so that instead of removing text it does nothing, and the new PDF file seems
to have the "corruption" removed...
One last question: how could I modify the RemoveAllText example to remove
all images from the PDF file?
Thanks.
Jorge
On Thu, 6 May 2021 at 1:07, Tilman Hausherr (<[email protected]>)
wrote:
On 05.05.2021 at 18:39, jorgeeflorez wrote:
Hi,
I would like to know the best way to detect whether a PDF file has CID
fonts. As far as I understand, these fonts are used in Asian texts
(Japanese, Chinese, Korean, etc.). I have the following code:
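    import org.apache.pdfbox.cos.COSName;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.PDResources;
    import org.apache.pdfbox.pdmodel.font.PDFont;
    import org.apache.pdfbox.pdmodel.font.PDType0Font;

    // PDFBox 2.x API; myFile is the java.io.File to inspect.
    try (PDDocument doc = PDDocument.load(myFile))
    {
        for (int i = 0; i < doc.getNumberOfPages(); ++i)
        {
            PDPage page = doc.getPage(i);
            PDResources res = page.getResources();
            if (res == null)
            {
                continue; // page has no resources
            }
            for (COSName fontName : res.getFontNames())
            {
                PDFont font = res.getFont(fontName);
                COSName subType = font.getCOSObject().getCOSName(COSName.SUBTYPE);
                // Composite (CID-keyed) fonts have /Subtype /Type0
                System.out.println("CID? " + COSName.TYPE0.equals(subType));
                System.out.println("font instanceof PDType0Font? " + (font instanceof PDType0Font));
            }
        }
    }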
Would this be the right way to do it?
It's more complicated because form XObjects, patterns, annotations,
softmasks (and maybe more) can also have fonts. I also doubt that you
can detect CJK fonts this way.
Regarding removing the text, see the RemoveAllText example in the source
code download. IIRC it only handles the page content stream.
Tilman
I need to detect this and try to create a PDF file from the original, but
without the text.
Any indication is appreciated.
Regards,
Jorge
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]