On 04.12.2015 at 21:39, britt fitch wrote:
Thanks very much for the quick replies!
I think setting startPage & endPage will correctly extract only the
pages you want, but every extraction will still iterate over all
pages first.
For example, if you have a 100 page document and want to extract page
2 & page 90, you will iterate over all 100 pages and process page 2,
then iterate over all 100 pages and process page 90.
The 1.8 version allowed you to pass a single page to be processed. I’m
curious if that functionality was removed because of an issue or if it
was just a bug.
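
The iteration pattern described above can be modelled with a small toy
class. This is only a sketch of the behaviour discussed in this thread,
not PDFBox code: PageLoopSketch, extractSinglePages, pagesVisited and
pagesProcessed are hypothetical names made up for illustration, and the
loop is assumed to skip pages outside the start/end range.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only, NOT PDFBox code. It mimics a stripper that walks every
// page of a document and processes only pages inside [startPage, endPage].
class PageLoopSketch {
    int startPage = 1;
    int endPage = Integer.MAX_VALUE;
    int pagesVisited = 0;                                  // loop iterations
    final List<Integer> pagesProcessed = new ArrayList<>(); // pages actually handled

    // Walks all pages; only pages inside the configured range are processed.
    void processPages(int pageCount) {
        for (int currentPageNo = 1; currentPageNo <= pageCount; currentPageNo++) {
            pagesVisited++;
            if (currentPageNo >= startPage && currentPageNo <= endPage) {
                pagesProcessed.add(currentPageNo);
            }
        }
    }

    // One full pass per requested page, as in the 100-page example.
    static PageLoopSketch extractSinglePages(int pageCount, int... pages) {
        PageLoopSketch sketch = new PageLoopSketch();
        for (int page : pages) {
            sketch.startPage = page;
            sketch.endPage = page;
            sketch.processPages(pageCount);
        }
        return sketch;
    }

    public static void main(String[] args) {
        PageLoopSketch sketch = extractSinglePages(100, 2, 90);
        // prints: visited=200 processed=[2, 90]
        System.out.println("visited=" + sketch.pagesVisited
                + " processed=" + sketch.pagesProcessed);
    }
}
```

In this model the loop runs 200 times to process two pages, which is
the overhead being asked about. The skipped iterations do nothing
beyond a range check.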
Really? I looked at processPage(), and it does use currentPageNo and I
don't see a way to set that one from outside.
On a second look, I think I understand what you mean: processPages()
uses a list of pages, so you would set your own list. But this would
mean trouble if you had set other variables.
I assume this was changed in 2.0 as part of the page tree refactoring.
Btw, this looping does indeed look weird, but I doubt it will cost you
any noticeable time. The text extraction itself does much more work: it
has to loop through every glyph on the page you're extracting.
Tilman
It looks like I can get around this a bit by overriding
startPage(PDPage) and endPage(PDPage) though.
Thanks again, I really appreciate all your feedback.
Cheers,
Britt
Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
On Dec 4, 2015, at 3:07 PM, Tilman Hausherr wrote:
On 04.12.2015 at 20:56, britt fitch wrote:
Awesome, thanks. That takes care of #1 & 2.
For #3, is the check on currentPageNo necessary?
Right now processPage must be called from processPages or nothing
happens.
This has a negative effect for cases like mine where I want to
override processTextPosition and handle different pages or even if
you only want to extract data from particular pages.
You can set the start and end pages through the setters setStartPage()
and setEndPage(). That's the official way to do it.
Tilman