Hi Ismael,

First of all, try loading the PDF with PDDocument doc = PDDocument.load(file). You don't have to parse the document yourself. See org.apache.pdfbox.examples.util.ExtractTextByArea for an example of extracting text areas.

Why are you trying to extract the same region twice? Wouldn't it be easier to just copy the result string?
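To illustrate the "copy the result string" idea, here is a minimal sketch that runs the extraction once per distinct rectangle and reuses the text for every region name that shares it. It uses only the JDK: the class RegionTextCopier, the method extractOncePerRectangle, and the extractor function are hypothetical names for illustration and are not part of PDFBox. In real code the extractor would presumably wrap your PDFTextStripperByArea calls.

```java
import java.awt.geom.Rectangle2D;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of PDFBox.
class RegionTextCopier {

    /**
     * Calls the extractor once per distinct rectangle and copies the
     * resulting string to every region name that uses that rectangle.
     * Rectangle2D defines equals/hashCode on x, y, width and height, so
     * identical rectangles share one map key (do not mutate them afterwards).
     */
    static Map<String, String> extractOncePerRectangle(
            Map<String, Rectangle2D> regions,
            Function<Rectangle2D, String> extractor) {
        Map<Rectangle2D, String> textByRectangle = new HashMap<>();
        Map<String, String> textByName = new LinkedHashMap<>();
        for (Map.Entry<String, Rectangle2D> entry : regions.entrySet()) {
            // The extractor only runs the first time a rectangle is seen.
            String text = textByRectangle.computeIfAbsent(entry.getValue(), extractor);
            textByName.put(entry.getKey(), text);
        }
        return textByName;
    }

    public static void main(String[] args) {
        Map<String, Rectangle2D> regions = new LinkedHashMap<>();
        regions.put("1", new Rectangle2D.Float(0, 0, 500, 100));
        regions.put("2", new Rectangle2D.Float(0, 0, 500, 100));

        // Placeholder extractor (assumption): stands in for the real PDF text extraction.
        int[] extractions = {0};
        Function<Rectangle2D, String> placeholderExtractor = area -> {
            extractions[0]++;
            return "text from area of height " + area.getHeight();
        };

        Map<String, String> result = extractOncePerRectangle(regions, placeholderExtractor);
        System.out.println(result.get("1").equals(result.get("2"))); // prints "true"
        System.out.println("extractor calls: " + extractions[0]);    // prints "extractor calls: 1"
    }
}
```

With this approach both names end up with the same text, regardless of how the stripper behaves when two regions overlap completely.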
BR
Andreas Lehmkühler

----- Original Message -----
Subject: PDFTextStripperByArea extracts text only from 1 region, despite several regions being defined
Sent: Tue, 21 Jul 2009
From: Ismael Hasan <[email protected]>

> Hello. I have a problem with the class
> "org.apache.pdfbox.util.PDFTextStripperByArea":
>
> If I add several regions to this class to extract the text from, it is
> only retrieved from one of them. The example I built creates two
> regions with the same values (with different names), adds them to the
> text stripper, and uses the "extractRegions" function.
>
> I would really appreciate it if someone could tell me what I am doing
> wrong, or whether this is a bug in the tool.
>
> Please see the code at the end of this message that reproduces the
> issue; the final result buffers (localResult1 and localResult2) have
> different content (one of them is empty). If you need a PDF document
> to reproduce this, please ask me for it.
>
> Thanks in advance,
> Ismael
>
>
>
> // Opening the document and getting the page
> PDFParser parser = new PDFParser(new ByteArrayInputStream(documentInBytes));
> parser.parse();
> PDDocument doc = parser.getPDDocument();
> PDPage page = (PDPage) doc.getDocumentCatalog().getAllPages().get(pageNumber);
>
> // Creating the stripper
> PDFTextStripperByArea areaStripper = new PDFTextStripperByArea();
>
> // Creation and addition of the regions to the stripper
> Rectangle2D rectangle = new Rectangle2D.Float();
> rectangle.setRect(0, 0, 500, 100);
> areaStripper.addRegion("1", rectangle);
>
> Rectangle2D rectangle2 = new Rectangle2D.Float();
> rectangle2.setRect(0, 0, 500, 100);
> areaStripper.addRegion("2", rectangle2);
>
> // Extracting the regions and getting the results
> areaStripper.extractRegions(page);
> String localResult1 = areaStripper.getTextForRegion("1");
> String localResult2 = areaStripper.getTextForRegion("2");
>
----- End of Original Message -----
