Re: Extracting table data from a PDF file?

Dexter Mishra Sun, 22 Mar 2009 03:55:33 -0700

Hi Hanan,
I don't think there is a way to tell that a given piece of data is table data.
One thing you can do is use the article/bead feature shown in the
PDFTextStripper example. We also have a similar requirement: we keep some
metadata in PDF comments, so I am modifying the PDFBox library to use those
PDF comments as user-space metadata.
Another approach you can try is working with the x,y coordinates of the text
in the PDF.
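A minimal sketch of the x,y-coordinate approach, assuming you already have each word with its position and width. The `Fragment` class, the `toRows` method and the two tolerances (`yTolerance`, `minGap`) are hypothetical names chosen for illustration, not part of PDFBox. Fragments whose y values are close are treated as one row, and within a row a large horizontal gap is treated as a column boundary. The thresholds would need tuning for each document.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TableRows {

    /** Hypothetical positioned piece of text (for example, one word). */
    public static final class Fragment {
        final double x;     // left edge
        final double y;     // baseline; assumed to grow downward the page
        final double width; // horizontal extent
        final String text;

        public Fragment(double x, double y, double width, String text) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.text = text;
        }
    }

    /**
     * Groups fragments into rows (y within yTolerance of the row's first
     * fragment) and splits each row into cells wherever the gap between
     * neighbouring fragments exceeds minGap.
     */
    public static List<List<String>> toRows(List<Fragment> fragments,
                                            double yTolerance, double minGap) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        // Top-to-bottom, then left-to-right.
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y)
                              .thenComparingDouble((Fragment f) -> f.x));

        List<List<Fragment>> rows = new ArrayList<>();
        double rowY = Double.NaN;
        for (Fragment f : sorted) {
            if (rows.isEmpty() || Math.abs(f.y - rowY) > yTolerance) {
                rows.add(new ArrayList<>());
                rowY = f.y; // the first fragment anchors the row's y
            }
            rows.get(rows.size() - 1).add(f);
        }

        List<List<String>> table = new ArrayList<>();
        for (List<Fragment> row : rows) {
            // Slightly different baselines can disturb x order, so re-sort.
            row.sort(Comparator.comparingDouble((Fragment f) -> f.x));
            List<String> cells = new ArrayList<>();
            StringBuilder cell = new StringBuilder();
            Fragment prev = null;
            for (Fragment f : row) {
                if (prev != null && f.x - (prev.x + prev.width) > minGap) {
                    cells.add(cell.toString()); // big gap: close the cell
                    cell.setLength(0);
                } else if (prev != null) {
                    cell.append(' ');           // small gap: same cell
                }
                cell.append(f.text);
                prev = f;
            }
            cells.add(cell.toString());
            table.add(cells);
        }
        return table;
    }
}
```

With PDFBox, the fragments might be collected by subclassing `PDFTextStripper`, calling `setSortByPosition(true)`, and overriding `writeString(String, List<TextPosition>)` to read positions such as `getXDirAdj()` and `getYDirAdj()`. Accessor names differ between PDFBox releases, so check the Javadoc for the version you use.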


On Sat, Mar 21, 2009 at 9:01 PM, Hanan Harush <[email protected]> wrote:

> Hi,
>
> My name is Hanan and I am developing an in-house application that needs to
> read a PDF file and extract the text of its tables into a local database.
>
> Of course, the number of rows in a table might change from time to time.
>
> After reading a lot about PDF as well as PDFBox, I have managed to:
>
>     Load a PDF document
>     Iterate through its pages
>
> My questions are:
>
> 1. Is there a way to identify a table in a PDF file?
>
> 2. What are the alternatives for extracting only table data using PDFBox?
>
> 3. How can I step through a table?
>
> Best Regards,
>
> Hanan Harush
