Thank you. The output I get from that example is below:

> d = debugGatherer()
> getURL("http://uk.youtube.com",
+        debugfunction = d$update, verbose = TRUE)
[1] ""
> d$value()
text
"About to connect() to uk.youtube.com port 80 (#0)\n  Trying 208.117.236.72... connected\nConnected to uk.youtube.com (208.117.236.72) port 80 (#0)\nConnection #0 to host uk.youtube.com left intact\n"
headerIn
"HTTP/1.1 400 Bad Request\r\nVia: 1.1 PFO-FIREWALL\r\nConnection: Keep-Alive\r\nProxy-Connection: Keep-Alive\r\nTransfer-Encoding: chunked\r\nExpires: Tue, 27 Apr 1971 19:44:06 EST\r\nDate: Tue, 27 Jan 2009 15:31:25 GMT\r\nContent-Type: text/plain\r\nServer: Apache\r\nX-Content-Type-Options: nosniff\r\nCache-Control: no-cache\r\nCneonction: close\r\n\r\n"
headerOut
"GET / HTTP/1.1\r\nHost: uk.youtube.com\r\nAccept: */*\r\n\r\n"
dataIn
"0\r\n\r\n"
dataOut
""
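(In case it helps anyone reading this, I found the headers easier to read by splitting them out. I am assuming here that d$value() returns a named character vector with the components printed above; if that is right, the response headers can be pulled out by name and split on "\r\n":

> ## assumption: d$value() is a named character vector as printed above,
> ## so the raw response headers can be extracted and split per line
> cat(strsplit(d$value()["headerIn"], "\r\n")[[1]], sep = "\n")

which prints one response header per line.)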
So the critical information from this is the '400 Bad Request'. A Google search defines this for me as:

   The request could not be understood by the server due to malformed
   syntax. The client SHOULD NOT repeat the request without modifications.

Looking through both sort(listCurlOptions()) and
http://curl.haxx.se/libcurl/c/curl_easy_setopt.htm doesn't really help me
this time (unless I missed something). Any advice?

Thank you for your time,
C.C

P.S. I can get the download to work if I use:

> toString(readLines("http://www.uk.youtube.com"))
[1] "<html>, \t<head>, \t\t<title>OpenDNS</title>, \t</head>, , \t<body id=\"mainbody\" onLoad=\"testforbanner();\" style=\"margin: 0px;\">, \t\t<script language=\"JavaScript\">, \t\t\tfunction testforbanner() {, \t\t\t\tvar width;, \t\t\t\tvar height;, \t\t\t\tvar x = 0;, \t\t\t\tvar isbanner = false;, \t\t\t\tvar bannersizes = new Array(16), \t\t\t\tbannersizes[0] = [etc]
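P.P.S. For what it is worth, this is roughly what I was planning to try next. The option names (useragent, httpheader, followlocation) come from listCurlOptions(), but whether any of them actually addresses the 400 is only a guess on my part, and the "Mozilla/5.0" value is just an illustrative placeholder:

> ## guesses only: send a User-Agent and an explicit Accept header, and
> ## follow redirects, in case the proxy/firewall is rejecting the bare
> ## request shown in headerOut above
> getURL("http://uk.youtube.com",
+        useragent      = "Mozilla/5.0",
+        httpheader     = c(Accept = "text/html"),
+        followlocation = TRUE,
+        debugfunction  = d$update, verbose = TRUE)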
On 27 Jan, 13:52, Duncan Temple Lang <dun...@wald.ucdavis.edu> wrote:
> clair.crossup...@googlemail.com wrote:
> > Thank you Duncan.
> >
> > I remember seeing in your documentation that you have used this
> > 'verbose=TRUE' argument in functions before when trying to see what is
> > going on. This is good. However, I have not been able to get it to
> > work for me. Does the output appear in R or do you use some other
> > external window (i.e. MS DOS window?)?
>
> The libcurl code typically defaults to print on the console.
> So on the Windows GUI, this will not show up. Using
> a shell (MS DOS window or Unix-like shell) should
> cause the output to be displayed.
>
> A more general way however is to use the debugfunction
> option.
>
>   d = debugGatherer()
>
>   getURL("http://uk.youtube.com",
>          debugfunction = d$update, verbose = TRUE)
>
> When this completes, use
>
>   d$value()
>
> and you have the entire contents that would be displayed on the console.
>
>  D.
>
> > >> library(RCurl)
> > >> my.url <-
> > >>   'http://www.nytimes.com/2009/01/07/technology/business-computing/07pro...'
> > >> getURL(my.url, verbose = TRUE)
> > [1] ""
> >
> > I am having a problem with a new webpage (http://uk.youtube.com/) but
> > if i can get this verbose to work, then i think i will be able to
> > google the right action to take based on the information it gives.
> >
> > Many thanks for your time,
> > C.C.
> >
> > On 26 Jan, 16:12, Duncan Temple Lang <dun...@wald.ucdavis.edu> wrote:
> >> clair.crossup...@googlemail.com wrote:
> >>> Dear R-help,
> >>>
> >>> There seems to be a web page I am unable to download using RCurl. I
> >>> don't understand why it won't download:
> >>>
> >>>> library(RCurl)
> >>>> my.url <-
> >>>>   "http://www.nytimes.com/2009/01/07/technology/business-computing/07pro..."
> >>>> getURL(my.url)
> >>> [1] ""
> >>
> >> I like the irony that RCurl seems to have difficulties downloading an
> >> article about R. Good thing it is just a matter of additional arguments
> >> to getURL() or it would be bad news.
> >>
> >> The followlocation parameter defaults to FALSE, so
> >>
> >>   getURL(my.url, followlocation = TRUE)
> >>
> >> gets what you want.
> >>
> >> The way I found this is
> >>
> >>   getURL(my.url, verbose = TRUE)
> >>
> >> and take a look at the information being sent from R
> >> and received by R from the server.
> >>
> >> This gives
> >>
> >> * About to connect() to www.nytimes.com port 80 (#0)
> >> *   Trying 199.239.136.200... connected
> >> * Connected to www.nytimes.com (199.239.136.200) port 80 (#0)
> >> > GET /2009/01/07/technology/business-computing/07program.html?_r=2 HTTP/1.1
> >> Host: www.nytimes.com
> >> Accept: */*
> >>
> >> < HTTP/1.1 301 Moved Permanently
> >> < Server: Sun-ONE-Web-Server/6.1
> >> < Date: Mon, 26 Jan 2009 16:10:51 GMT
> >> < Content-length: 0
> >> < Content-type: text/html
> >> < Location: http://www.nytimes.com/glogin?URI=http://www.nytimes.com/2009/01/07/t...
> >> <
> >>
> >> And the 301 is the critical thing here.
> >>
> >>  D.
> >>
> >>> Other web pages are ok to download but this is the first time I have
> >>> been unable to download a web page using the very nice RCurl package.
> >>> While i can download the webpage using the RDCOMClient, i would like
> >>> to understand why it doesn't work as above please?
> >>>
> >>>> library(RDCOMClient)
> >>>> my.url <-
> >>>>   "http://www.nytimes.com/2009/01/07/technology/business-computing/07pro..."
> >>>> ie <- COMCreate("InternetExplorer.Application")
> >>>> txt <- list()
> >>>> ie$Navigate(my.url)
> >>> NULL
> >>>> while(ie[["Busy"]]) Sys.sleep(1)
> >>>> txt[[my.url]] <- ie[["document"]][["body"]][["innerText"]]
> >>>> txt
> >>> $`http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?_r=2`
> >>> [1] "Skip to article Try Electronic Edition Log ...
> >>>
> >>> Many thanks for your time,
> >>> C.C
> >>>
> >>> Windows Vista, running with administrator privileges.
> >>>
> >>>> sessionInfo()
> >>> R version 2.8.1 (2008-12-22)
> >>> i386-pc-mingw32
> >>>
> >>> locale:
> >>> LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
> >>>
> >>> attached base packages:
> >>> [1] stats     graphics  grDevices utils     datasets  methods   base
> >>>
> >>> other attached packages:
> >>> [1] RDCOMClient_0.92-0 RCurl_0.94-0
> >>>
> >>> loaded via a namespace (and not attached):
> >>> [1] tools_2.8.1

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.