Re: [Tutor] PDF Scrapping

2015-11-25 Thread shawn wilson
On Nov 25, 2015 12:44 PM, "Francois Dion" wrote:
>
> if you
> have any choice at all, avoid PDF at all cost to get data.
>
Agreed, and IIRC all of that data should be in XML somewhere (look for their RPC pages). Probably start by searching for similar table names (and Google dorking their site fo

Re: [Tutor] PDF Scrapping

2015-11-25 Thread Laura Creighton
In a message of Wed, 25 Nov 2015 12:43:51 -0500, Francois Dion writes:
>This is well beyond the scope of Tutor, but let me mention the following:
>
>The code to pdftables disappeared from github some time back. What is on
>sourceforge is old, same with pypi. I wouldn't create a project using
>pdfta

Re: [Tutor] PDF Scrapping

2015-11-25 Thread Francois Dion
This is well beyond the scope of Tutor, but let me mention the following:
The code for pdftables disappeared from GitHub some time back. What is on SourceForge is old, and the same goes for PyPI. I wouldn't create a project using pdftables based on that...
As far as what you are trying to do, it looks like th

Re: [Tutor] PDF Scrapping

2015-11-25 Thread Python Beginner
Oh, I forgot to mention that I am using Python 3.4. Thanks again for your help pointing me in the right direction.
~Chris
On Tue, Nov 24, 2015 at 1:36 PM, Python Beginner <pythonbeginner...@gmail.com> wrote:
> Hi,
>
> I am looking for the best way to scrape the following PDF's:
>
> (1)
> http:/

[Tutor] PDF Scrapping

2015-11-24 Thread Python Beginner
Hi,
I am looking for the best way to scrape the following PDFs:
(1) http://minerals.usgs.gov/minerals/pubs/commodity/gold/mcs-2015-gold.pdf (table on page 1)
(2) http://minerals.usgs.gov/minerals/pubs/commodity/gold/myb1-2013-gold.pdf (table 1)
I have done a lot of research and have read that
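For what it's worth, one low-tech route to tables like these is to convert the PDF to layout-preserved text first and then split each line into columns. The sketch below is only an illustration, not something proposed in the thread: it assumes the text was already produced by an external tool (for example poppler's `pdftotext -layout`), that columns are separated by runs of two or more spaces, and the names `split_columns` and `parse_table` as well as the sample rows are made up (they are not USGS figures). It should run on Python 3.4.

```python
import re


def split_columns(line):
    """Split one line of layout-preserved text into cells.

    Assumption: columns are separated by two or more whitespace
    characters, while a single space stays inside a cell.
    """
    return [cell for cell in re.split(r"\s{2,}", line.strip()) if cell]


def parse_table(text):
    """Return a list of rows (lists of cell strings), skipping blank lines."""
    return [split_columns(line) for line in text.splitlines() if line.strip()]


# Made-up sample standing in for converted PDF text (not real data).
sample = (
    "Item            2013    2014\n"
    "Widgets         10      12\n"
    "\n"
    "Gadgets, other  3       4\n"
)

rows = parse_table(sample)
for row in rows:
    print(row)
```

Real tables with footnote markers, wrapped cells, or empty columns will likely need extra cleanup, so treat this as a starting point only.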