Hello,
I think I solved this myself. In case anyone else is interested: I had to
create my own version of PDFTextStripperByArea and make the
getregionCharacterList public. Then I could access the information about the
characters in each region and determine the direction/rotation etc.
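For anyone following along, here is a rough sketch of the rotation arithmetic
once the per-region characters are accessible. It is not the PDFBox code and
not my actual class: Glyph is a hypothetical stand-in for TextPosition, and I
am assuming the usual PDF convention that a text matrix [a b c d e f] rotated
by an angle t has a = cos(t) and b = sin(t) (times the font size). In PDFBox
the matrix would presumably come from TextPosition.getTextMatrix(); the sign
of the angle may need flipping depending on which coordinate system you use.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: Glyph is a hypothetical stand-in for PDFBox's TextPosition. */
final class RegionRotationSketch {

    /** Glyph origin plus the a and b entries of its text matrix [a b c d e f]. */
    static final class Glyph {
        final double x, y, a, b;

        Glyph(double x, double y, double a, double b) {
            this.x = x;
            this.y = y;
            this.a = a;
            this.b = b;
        }
    }

    /** Counterclockwise angle in degrees, normalised to the range [0, 360). */
    static double angleDegrees(double a, double b) {
        double deg = Math.toDegrees(Math.atan2(b, a));
        return (deg % 360.0 + 360.0) % 360.0;
    }

    /** Angles of the glyphs whose origin lies inside the region, in input order. */
    static List<Double> anglesInRegion(List<Glyph> glyphs, Rectangle2D region) {
        List<Double> angles = new ArrayList<>();
        for (Glyph g : glyphs) {
            if (region.contains(g.x, g.y)) {
                angles.add(angleDegrees(g.a, g.b));
            }
        }
        return angles;
    }

    public static void main(String[] args) {
        // Made-up sample data: two glyphs inside the region, one outside it.
        List<Glyph> glyphs = new ArrayList<>();
        glyphs.add(new Glyph(120, 110, 0, 12));  // reads upwards
        glyphs.add(new Glyph(10, 10, 12, 0));    // outside the region
        glyphs.add(new Glyph(150, 120, -12, 0)); // upside down
        Rectangle2D region = new Rectangle2D.Double(100, 100, 200, 50);
        System.out.println(anglesInRegion(glyphs, region));
    }
}
```

Using atan2 gives the actual angle rather than one of four fixed directions,
so text rotated by, say, 45 degrees is reported as such; round to the nearest
90 if only the four main directions matter.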
PDFDev
On Tuesday, March 3, 2020, 7:31:39 PM GMT, PDF Developer
<[email protected]> wrote:
Hello,
I am trying to understand two classes, PDFTextStripper and
PDFTextStripperByArea. I am using them to obtain the properties of the text in
a PDF. For what it is worth, I have some PDFs that are marked up with "regions"
which I can reliably detect. Since I know the area in question, I thought it
would be enough to use PDFTextStripperByArea and get the text within the
bounded area. That works quite well. However, there is now a requirement to
get the rotation of the text, as there are use cases where the text has been
rotated as part of an upstream process.
So I tried to get the TextPosition properties via an override of writeString,
and I thought it was all working, but a colleague pointed out that the
rotation was always "0".
Going back to basics, for test purposes, I used PDFTextStripper (again with an
override of the writeString method) basically to dump the properties of the
TextPositions. That appears to give me the results I am looking for. However,
if I use a similar override for PDFTextStripperByArea I never see a rotation
other than 0.
Since there can be a lot of text on a page and the pages are very large, I
would prefer to use PDFTextStripperByArea (mainly because I know exactly where
the text will be and the overhead will be less).
Have I misunderstood something along the way? Made a naive assumption? Any
suggestions on how to get the PDFTextStripperByArea to return the string
contained within an area/region and the rotation (or other properties) of the
text?
PDFDev