Ista,

Thank you. That more or less did the trick. I got the data, though it's in a weird format compared to how it appears on the page and needs a lot of cleanup. But I was kind of expecting that.

Dan

-----Original Message-----
From: Ista Zahn [mailto:istaz...@gmail.com]
Sent: Tuesday, January 15, 2013 3:18 PM
To: Lopez, Dan
Cc: R help (r-help@r-project.org)
Subject: Re: [R] readHTMLTable (XML package)

Hi Dan,

On Tue, Jan 15, 2013 at 5:31 PM, Lopez, Dan <lopez...@llnl.gov> wrote:
> Hi Ista,
>
> It does exist. It’s a page in our company intranet.

Ah, good.

>
> It is https so it looks like I can't use RCurl either. I tried RCurl BTW and
> got the below error.
>

Well that error is not because RCurl doesn't work with https protocol. In my original example I meant to show

tabs <- readHTMLTable(getURL("https://en.wikipedia.org/wiki/List_of_countries_by_population"))

i.e., getURL() does work with https. (Well, maybe depending on your version of libcurl. See the getURL help page for details.)

> Do you have experience with pulling a table of an https site?

Yes, I do :)

> If so how do I do that?

See below

>
>
>> tabs <-
>> readHTMLTable(getURL("https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html"))
> Error in
> readHTMLTable(getURL("https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html"))
> :
> error in evaluating the argument 'doc' in selecting a method for function
> 'readHTMLTable': Error in function (type, msg, asError = TRUE) :
> SSL certificate problem, verify that the CA cert is OK. Details:
> error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate
> verify failed

This is an RCurl FAQ (see http://www.omegahat.org/RCurl/FAQ.html). The quick and dirty way is

getURL("https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html", ssl.verifypeer = FALSE)

Best,
Ista

>
>
> Thanks.
> Dan
>
> -----Original Message-----
> From: Ista Zahn [mailto:istaz...@gmail.com]
> Sent: Tuesday, January 15, 2013 12:22 PM
> To: Lopez, Dan
> Cc: R help (r-help@r-project.org)
> Subject: Re: [R] readHTMLTable (XML package)
>
> Hi Dan,
>
> A couple of things: first, I think that file really does not exist (at
> least I can't open it in my web browser). Second, even if it did,
> url() cannot download from https, according to the details section of
> ?url, which points you to RCurl. So, once you verify that your URL
> actually exists you can do something like
>
> library(XML)
> library(RCurl)
> tabs <-
> readHTMLTable(getURL("http://en.wikipedia.org/wiki/List_of_countries_by_population"))
>
> Best,
> Ista
>
> On Tue, Jan 15, 2013 at 2:59 PM, Lopez, Dan <lopez...@llnl.gov> wrote:
>> Hi,
>>
>> I am using XML::readHTMLTable and getting the below error. Does anyone know
>> why? Does this function not work with https? I didn't see anything in help
>> about that.
>>
>>> library(XML)
>>> wampage<-readHTMLTable('https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html',1)
>> Error in htmlParse(doc) :
>> File https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html does
>> not exist
>>
>> Dan
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
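
For anyone finding this thread later, the fetch-then-parse approach discussed above, plus the kind of cleanup Dan mentions, might look roughly like the sketch below. The helper name fetch_tables, its verify_peer argument, and the sample HTML are made up for illustration and are not from the thread; the cleanup steps are only a guess at what a scraped table typically needs.

```r
library(XML)  # htmlParse(), readHTMLTable()

# Hypothetical helper (not from the thread): download a page with RCurl and
# return every <table> on it as a list of data frames.
fetch_tables <- function(url, verify_peer = TRUE) {
  # ssl.verifypeer = FALSE is the "quick and dirty" workaround from the thread.
  # It turns off certificate checking, so only use it for hosts you trust.
  html <- RCurl::getURL(url, ssl.verifypeer = verify_peer)
  readHTMLTable(htmlParse(html, asText = TRUE), stringsAsFactors = FALSE)
}

# Offline demonstration on made-up HTML, so nothing here touches the network.
sample_html <- '<html><body><table>
  <tr><th>Name</th><th>Count</th></tr>
  <tr><td> Alpha </td><td>1,200</td></tr>
  <tr><td>Beta</td><td> 35 </td></tr>
</table></body></html>'

tabs <- readHTMLTable(htmlParse(sample_html, asText = TRUE),
                      header = TRUE, stringsAsFactors = FALSE)
tab <- tabs[[1]]

# Typical cleanup: strip stray whitespace and thousands separators,
# then convert the numeric column from character to numeric.
tab$Name  <- trimws(tab$Name)
tab$Count <- as.numeric(gsub("[^0-9.]", "", tab$Count))
```

For a real intranet page the call would be something like fetch_tables(url, verify_peer = FALSE), with the caveat that the RCurl FAQ describes the safer alternative of pointing curl at a proper CA certificate bundle.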