Hi,
If you only need to know whether the count is > 0, then you don't need a
count at all: a count > 0 just means that text was extracted, so checking
whether the extracted text is non-empty is enough.
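For the line being migrated, the check could look roughly like the sketch
below. It assumes the text has already been extracted with
PDFTextStripper.getText(document). The names TextPresence, hasText and
update are made up for illustration, and treating whitespace-only output as
"no text" is an assumption you may want to change.

```java
// Hypothetical helper, not part of PDFBox: decides whether extraction produced text.
final class TextPresence {
    private TextPresence() {
    }

    // 'extracted' is assumed to be the String returned by
    // PDFTextStripper.getText(document); null or whitespace-only counts as no text.
    static boolean hasText(String extracted) {
        return extracted != null && !extracted.trim().isEmpty();
    }

    // Same accumulation as the original line, without the redundant "? true : false".
    static boolean update(boolean hasTextSoFar, String extracted) {
        return hasTextSoFar || hasText(extracted);
    }
}
```

With this, the old line would become something like
_hasText = TextPresence.update(_hasText, stripper.getText(document));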
If you want a count, then subclass the stripper and override showGlyph().
Here is how our "big sister project" Apache Tika does it:
    @Override
    protected void showGlyph(Matrix textRenderingMatrix, PDFont font,
            int code, String unicode, Vector displacement) throws IOException
    {
        super.showGlyph(textRenderingMatrix, font, code, unicode,
                displacement);
        if (unicode == null || unicode.isEmpty()) {
            unmappedUnicodeCharsPerPage++;
        }
        totalCharsPerPage++;
    }
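To turn those two counters into something resembling the old
getValidCharCnt(), the bookkeeping could live in a small helper like the one
sketched below, written in plain Java so it runs without PDFBox.
GlyphCounter and its methods are hypothetical names, and defining "valid" as
total minus unmapped is an assumption, not necessarily what 1.8 computed.

```java
// Hypothetical helper: mirrors the counting done in the showGlyph() override.
final class GlyphCounter {
    private int totalChars;
    private int unmappedChars;

    // Call once per glyph, passing the 'unicode' argument of showGlyph().
    void record(String unicode) {
        if (unicode == null || unicode.isEmpty()) {
            unmappedChars++;
        }
        totalChars++;
    }

    int totalChars() {
        return totalChars;
    }

    int unmappedChars() {
        return unmappedChars;
    }

    // Assumption: a "valid" character is one that has a Unicode mapping.
    int validChars() {
        return totalChars - unmappedChars;
    }

    // Reset when a new page starts if per-page numbers are wanted.
    void reset() {
        totalChars = 0;
        unmappedChars = 0;
    }
}
```

The override would then call counter.record(unicode) after
super.showGlyph(...), and the caller would read counter.validChars() once
getText() has returned.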
Tilman
On 10.09.2020 at 14:12, Karen Madore wrote:
Hello,
We are migrating PDFBox from 1.8.16 to 2.0.21 and I am looking for the
equivalent of the getValidCharCnt() method. Previously this method was
accessible via the PDFTextStripper class, but it is no longer recognized
and I am unable to find a suitable replacement. Below is the line of code
I am trying to migrate.
_hasText = (_hasText || stripper.getValidCharCnt() > 0) ? true : false;
Imports used are:
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.multipdf.Splitter;
I am new to PDFBox so this might be a newbie question.
Cheers,
Karen Madore