Bob/Duncan,

Thanks for writing. I think some of the things Bob mentioned might work, but I'm still not quite getting there. Below is the example I'm working with:
#1
browseURL('http://pick18.discoverlife.org/mp/20m?plot=2&kind=Hypoprepia+fucosa&site=33.9+-83.3&date1=2011,2012,2013&flags=build_txt:')
# This opens the URL and creates a link to machine-readable data on the page,
# which I can then download by simply doing this:

#2
read.delim('http://pick18.discoverlife.org/tmp/Hypoprepia_fucosa_33.9_-83.3_2011,2012,2013.txt')
# This is what I need to read in terms of data, but this URL only exists if
# the URL above has been requested first.

So, for example, try running line #2 without the first line: it won't work. Now run #1 and then #2: works fine. See what I mean? (A sketch of a possible browserless version appears below the quoted messages.)

On Thu, Sep 29, 2016 at 5:09 PM, Bob Rudis <b...@rud.is> wrote:

> The rvest/httr/curl trio can do the cookie management pretty well. Make
> the initial connection via rvest::html_session() and then hopefully be able
> to use other rvest function calls, but curl and httr calls will use the
> cached in-memory handle info seamlessly. You'd need to store and retrieve
> cookies if you need them preserved between R sessions.
>
> Failing the above, and assuming this would not need to be lightning fast,
> use the phantomjs or firefox web driver (either with RSelenium or some new
> stuff rOpenSci is cooking up), which will then do what browsers do best and
> maintain all this state for you. You can still slurp the page contents up
> with xml2::read_html() and use the super handy processing idioms in the
> scraping tidyverse (it needs its own name).
>
> A concrete example (assuming the URLs aren't sensitive) would enable me or
> someone else to mock up something for you.
>
> On Thu, Sep 29, 2016 at 4:59 PM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
>
>> On 29/09/2016 3:29 PM, Ryan Utz wrote:
>>
>>> Hi all,
>>>
>>> I've got a situation that involves activating a URL so that a link to
>>> some data becomes available for download. I can easily use 'browseURL'
>>> to do so, but I'm hoping to make this batch-process-able, and I would
>>> prefer not to have hundreds of browser windows open when I go to
>>> download multiple data sets.
>>>
>>> Here's the example:
>>>
>>> #1
>>> browseURL('http://pick18.discoverlife.org/mp/20m?plot=2&kind=Hypoprepia+fucosa&site=33.9+-83.3&date1=2011,2012,2013&flags=build_txt:')
>>> # This opens the URL and creates a link to machine-readable data on the
>>> # page, which I can then download by simply doing this:
>>>
>>> #2
>>> read.delim('http://pick18.discoverlife.org/tmp/Hypoprepia_fucosa_33.9_-83.3_2011,2012,2013.txt')
>>>
>>> However, I can only get the second line above to work if the thing in
>>> line #1 has been opened in a browser already. Is there any way to allow
>>> me to either 1) close the browser after it's been opened or 2) execute
>>> line #2 above without having to open a browser? We have hundreds of
>>> species that you can see after the '&kind=' bit of the URL, so I'm
>>> trying to keep the browsing situation sane.
>>>
>>> Thanks!
>>> R
>>>
>>
>> You'll need to figure out what happens when you open the first page. Does
>> it set a cookie? Does it record your IP address? Does it just build the
>> file but record nothing about you?
>>
>> If it's one of the simpler versions, you can just read the first page,
>> wait a bit, then read the second one.
>>
>> If you need to manage cookies, you'll need something more complicated. I
>> don't know the easiest way to do that.
>>
>> Duncan Murdoch
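Assuming Duncan's simplest scenario applies here (the server builds the /tmp file as soon as the first URL is requested, with no cookie check), a minimal, untested sketch of a browserless, batch-friendly version might look like the code below. The species vector, the helper function name, and the 2-second pause are placeholders of mine, not anything taken from the site:

# Placeholder species list -- substitute the real names used after '&kind='
species <- c("Hypoprepia fucosa", "Hypoprepia miniata")

get_moth_counts <- function(sp, site = "33.9 -83.3", years = "2011,2012,2013") {
  sp_plus    <- gsub(" ", "+", sp,   fixed = TRUE)
  sp_under   <- gsub(" ", "_", sp,   fixed = TRUE)
  site_plus  <- gsub(" ", "+", site, fixed = TRUE)
  site_under <- gsub(" ", "_", site, fixed = TRUE)

  # Step 1: request the build URL (as in #1, but without opening a browser);
  # we only want the side effect of the server writing the /tmp file
  build_url <- paste0("http://pick18.discoverlife.org/mp/20m?plot=2",
                      "&kind=", sp_plus, "&site=", site_plus,
                      "&date1=", years, "&flags=build_txt:")
  readLines(build_url, warn = FALSE)

  # Step 2: give the server a moment, then read the file it built (as in #2)
  Sys.sleep(2)   # a guess; lengthen if the build takes longer
  data_url <- paste0("http://pick18.discoverlife.org/tmp/",
                     sp_under, "_", site_under, "_", years, ".txt")
  read.delim(data_url)
}

all_dat <- lapply(species, get_moth_counts)
names(all_dat) <- species

If that fails even though the browser version works, the build step is probably tied to a cookie or session, in which case the handle-based variant at the end of this message may be closer to what's needed.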
--
Ryan Utz, Ph.D.
Assistant professor of water resources
Chatham University
Home/Cell: (724) 272-7769
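If the build request does turn out to set a cookie that the /tmp download checks, one untested way to apply Bob's point about reusing cached handle/cookie state is to send both requests through a single httr handle (URLs as in the example above; the pause length is again a guess):

library(httr)

base <- "http://pick18.discoverlife.org"
h <- handle(base)   # one curl handle, so cookies persist across both requests

build_url <- paste0(base, "/mp/20m?plot=2&kind=Hypoprepia+fucosa",
                    "&site=33.9+-83.3&date1=2011,2012,2013&flags=build_txt:")
GET(build_url, handle = h)          # triggers the server-side build

Sys.sleep(2)                        # guess at how long the build needs

data_url <- paste0(base, "/tmp/Hypoprepia_fucosa_33.9_-83.3_2011,2012,2013.txt")
resp <- GET(data_url, handle = h)   # same handle, so any cookie is sent back
dat  <- read.delim(text = content(resp, as = "text"))

rvest::html_session() sits on top of the same httr/curl machinery, so an rvest-flavoured version should behave the same way; and if neither approach works, the page probably needs real browser behaviour (JavaScript), which is where the RSelenium/phantomjs route Bob mentions comes in.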