Eureka! I wish I could send a box of digital donuts. Thanks so much!

On Tue, Oct 11, 2016 at 9:21 AM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
> On 11/10/2016 7:59 AM, Ryan Utz wrote:
>>
>> Bob/Duncan,
>>
>> Thanks for writing. I think some of the things Bob mentioned might work,
>> but I'm still not quite getting there. Below is the example I'm working
>> with:
>
> It worked for me when I replaced the browseURL call with a readLines call,
> as I suggested the other day. What went wrong for you?
>
> Duncan Murdoch
>
>> #1
>> browseURL('http://pick18.discoverlife.org/mp/20m?plot=2&kind=Hypoprepia+fucosa&site=33.9+-83.3&date1=2011,2012,2013&flags=build_txt:')
>> # This opens the URL and creates a link to machine-readable data on the
>> # page, which I can then download by simply doing this:
>>
>> #2
>> read.delim('http://pick18.discoverlife.org/tmp/Hypoprepia_fucosa_33.9_-83.3_2011,2012,2013.txt')
>> # This is the data I need to read, but the URL only exists if the URL
>> # in #1 has been requested first.
>>
>> So, for example, try running line #2 without line #1: it won't work.
>> Then run #1 followed by #2: it works fine.
>>
>> See what I mean?
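
[A minimal sketch of the readLines() fix Duncan describes above, wrapped
up for the many-species case Ryan mentions further down the thread. The
helper name is made up, the tmp-file naming pattern is inferred from the
single example above (so treat that construction as an assumption), and
the Sys.sleep() follows Duncan's "wait a bit" advice quoted later:]

# Hypothetical helper; 'kind' is the species part of the query string,
# e.g. "Hypoprepia+fucosa". Everything else is fixed as in Ryan's example.
get_species <- function(kind) {
  build_url <- paste0(
    "http://pick18.discoverlife.org/mp/20m?plot=2",
    "&kind=", kind,
    "&site=33.9+-83.3&date1=2011,2012,2013&flags=build_txt:")
  # Requesting the page (no browser needed) makes the server write the
  # .txt file; we discard the HTML that comes back.
  invisible(readLines(build_url, warn = FALSE))
  Sys.sleep(1)  # give the server a moment to finish building the file
  # Assumed tmp-file naming pattern, inferred from the one known example:
  read.delim(paste0(
    "http://pick18.discoverlife.org/tmp/",
    gsub("\\+", "_", kind), "_33.9_-83.3_2011,2012,2013.txt"))
}

dat <- get_species("Hypoprepia+fucosa")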
>>
>> On Thu, Sep 29, 2016 at 5:09 PM, Bob Rudis <b...@rud.is> wrote:
>>>
>>> The rvest/httr/curl trio can do the cookie management pretty well.
>>> Make the initial connection via rvest::html_session() and then,
>>> hopefully, use the other rvest function calls; curl and httr calls
>>> will use the cached in-memory handle info seamlessly. You'd need to
>>> store and retrieve cookies yourself if you need them preserved
>>> between R sessions.
>>>
>>> Failing the above, and assuming this doesn't need to be lightning
>>> fast, use the phantomjs or firefox web driver (either with RSelenium
>>> or some new stuff rOpenSci is cooking up), which will do what
>>> browsers do best and maintain all this state for you. You can still
>>> slurp the page contents up with xml2::read_html() and use the super
>>> handy processing idioms of the scraping tidyverse (it needs its own
>>> name).
>>>
>>> A concrete example (assuming the URLs aren't sensitive) would enable
>>> me or someone else to mock up something for you.
>>>
>>> On Thu, Sep 29, 2016 at 4:59 PM, Duncan Murdoch
>>> <murdoch.dun...@gmail.com> wrote:
>>>>
>>>> On 29/09/2016 3:29 PM, Ryan Utz wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I've got a situation that involves activating a URL so that a link
>>>>> to some data becomes available for download. I can easily use
>>>>> 'browseURL' to do so, but I'm hoping to make this batch-processable,
>>>>> and I would prefer not to have hundreds of browser windows open when
>>>>> I go to download multiple data sets.
>>>>>
>>>>> Here's the example:
>>>>>
>>>>> #1
>>>>> browseURL('http://pick18.discoverlife.org/mp/20m?plot=2&kind=Hypoprepia+fucosa&site=33.9+-83.3&date1=2011,2012,2013&flags=build_txt:')
>>>>> # This opens the URL and creates a link to machine-readable data on
>>>>> # the page, which I can then download by simply doing this:
>>>>>
>>>>> #2
>>>>> read.delim('http://pick18.discoverlife.org/tmp/Hypoprepia_fucosa_33.9_-83.3_2011,2012,2013.txt')
>>>>>
>>>>> However, I can only get the second line above to work if the URL in
>>>>> line #1 has already been opened in a browser. Is there any way to
>>>>> either 1) close the browser after it has been opened or 2) execute
>>>>> line #2 without having to open a browser at all? We have hundreds of
>>>>> species that can appear after the '&kind=' bit of the URL, so I'm
>>>>> trying to keep the browsing situation sane.
>>>>>
>>>>> Thanks!
>>>>> R
>>>>
>>>> You'll need to figure out what happens when you open the first page.
>>>> Does it set a cookie? Does it record your IP address? Does it just
>>>> build the file but record nothing about you?
>>>>
>>>> If it's one of the simpler versions, you can just read the first
>>>> page, wait a bit, then read the second one.
>>>>
>>>> If you need to manage cookies, you'll need something more
>>>> complicated. I don't know the easiest way to do that.
>>>>
>>>> Duncan Murdoch

--
Ryan Utz, Ph.D.
Assistant professor of water resources
Chatham University
Home/Cell: (724) 272-7769
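
[For the case Duncan flags at the end, where the server does tie the
built file to a cookie, a minimal sketch of the session-based route Bob
describes. It assumes the 2016-era rvest API, html_session() and
jump_to(), since renamed session() and session_jump_to(); the second
request goes through the same handle, so any cookies set by the first
are re-sent:]

library(rvest)   # brings the httr/curl machinery along

build_url <- paste0(
  "http://pick18.discoverlife.org/mp/20m?plot=2",
  "&kind=Hypoprepia+fucosa&site=33.9+-83.3",
  "&date1=2011,2012,2013&flags=build_txt:")
txt_url <- paste0(
  "http://pick18.discoverlife.org/tmp/",
  "Hypoprepia_fucosa_33.9_-83.3_2011,2012,2013.txt")

# First request: opens a session (cookies are cached in the handle)
# and triggers the server-side file build.
s <- html_session(build_url)

# Second request through the same session, so cookies are re-sent.
s <- jump_to(s, txt_url)
dat <- read.delim(text = httr::content(s$response, as = "text"))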