This is well beyond the scope of Tutor, but let me mention the following: the code for pdftables disappeared from GitHub some time back. What is on SourceForge is old, and the same goes for PyPI. I wouldn't create a project using pdftables based on that...

As for what you are trying to do, it looks like they might have the data in Excel spreadsheets. That is trivial to load in pandas. If you have any choice at all, avoid PDF as a source of data. See some detail of the complexity here:
http://ieg.ifs.tuwien.ac.at/pub/yildiz_iicai_2005.pdf

For your two documents, if you cannot find the data in the Excel sheets, I think the tabula (ruby based application) approach is the best bet.

Francois

On Wed, Nov 25, 2015 at 8:41 AM, Python Beginner <
pythonbeginner...@gmail.com> wrote:

> Oh, I forgot to mention that I am using Python 3.4. Thanks again for your
> help pointing me in the right direction.
>
> ~Chris
>
> On Tue, Nov 24, 2015 at 1:36 PM, Python Beginner <
> pythonbeginner...@gmail.com> wrote:
>
> > Hi,
> >
> > I am looking for the best way to scrape the following PDFs:
> >
> > (1)
> > http://minerals.usgs.gov/minerals/pubs/commodity/gold/mcs-2015-gold.pdf
> > (table on page 1)
> >
> > (2)
> > http://minerals.usgs.gov/minerals/pubs/commodity/gold/myb1-2013-gold.pdf
> > (table 1)
> >
> > I have done a lot of research and have read that pdftables 0.0.4 is an
> > excellent way to scrape tabular data from PDFs (see
> > https://blog.scraperwiki.com/2013/07/pdftables-a-python-library-for-getting-tables-out-of-pdf-files/
> > ).
> >
> > I downloaded pdftables 0.0.4 (see https://pypi.python.org/pypi/pdftables).
> >
> > I am new to Python and having trouble finding good documentation for how
> > to use this library.
> >
> > Has anybody used pdftables before who could help me get started or point
> > me to the ideal library for scraping the PDF links above? I have read that
> > different PDF libraries are used depending on the format of the PDF. What
> > library would be best for the PDF formats above? Knowing this will help me
> > get started, then I can write up some code and ask further questions if
> > needed.
> >
> > Thanks in advance for your help!
> >
> > ~Chris
> >
> _______________________________________________
> Tutor maillist - Tutor@python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor

--
raspberry-python.blogspot.com - www.pyptug.org - www.3DFutureTech.info - @f_dion
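
For reference, here is a minimal sketch of the pandas route suggested in the reply, for the case where the same table is available as an Excel spreadsheet. It assumes a recent pandas with an Excel reader engine (such as openpyxl) installed. `load_sheet` and its parameters are hypothetical names made up for this illustration, not part of pandas. The `header_row` argument is there on the assumption that report-style spreadsheets may carry title lines above the real column names.

```python
import pandas as pd


def load_sheet(path, sheet_name=0, header_row=0):
    """Read one worksheet of an Excel workbook into a DataFrame.

    header_row is the zero-based row holding the column names.
    (Hypothetical helper for illustration only.)
    """
    table = pd.read_excel(path, sheet_name=sheet_name, header=header_row)
    # Drop rows, then columns, that are entirely empty (spacer rows/columns).
    return table.dropna(how="all").dropna(axis=1, how="all")
```

Calling `load_sheet("gold.xlsx")` (a placeholder file name, replace it with the spreadsheet you actually downloaded) should return a DataFrame you can inspect with `.head()`. If the column names look wrong, adjusting `header_row` is the first thing to try.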