Oops, I meant: toString(readLines("http://uk.youtube.com"))

> toString(readLines("http://uk.youtube.com"))
[1] "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd\">, , , \t<html lang=\"en\">, , <!-- machid: 302 -->, <head>, , \t, \t<title>YouTube - Broadcast Yourself.</title>, [etc]
Warning message:
In readLines("http://uk.youtube.com") :
  incomplete final line found on 'http://uk.youtube.com'
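When the verbose output is captured with debugGatherer(), the response headers come back as one raw string (the headerIn element quoted below). As a minimal sketch, assuming CRLF line endings as in that output, a hypothetical helper parseRawHeader() (not part of RCurl) can split that string into a numeric status code and named header fields, so that the 400 status or a field such as Via can be checked in code rather than by eye.

```r
# parseRawHeader() is a hypothetical helper, not an RCurl function.
# It assumes `raw` is a single string of CRLF-separated header lines,
# like the "headerIn" element returned by d$value().
parseRawHeader <- function(raw) {
  # Split on CRLF and drop the empty lines that terminate the header block
  lines <- strsplit(raw, "\r\n", fixed = TRUE)[[1]]
  lines <- lines[nzchar(lines)]

  # The first line is the status line, e.g. "HTTP/1.1 400 Bad Request"
  status <- suppressWarnings(
    as.integer(sub("^HTTP/[0-9.]+ ([0-9]{3}).*$", "\\1", lines[1]))
  )

  # Remaining lines are "Name: value" fields; split at the first colon
  fields <- lines[-1]
  keys <- sub(":.*$", "", fields)
  values <- sub("^[^:]*:[[:space:]]*", "", fields)

  list(status = status, headers = setNames(values, keys))
}

# Abbreviated copy of the headerIn string from the output in this thread
headerIn <- paste0(
  "HTTP/1.1 400 Bad Request\r\nVia: 1.1 PFO-FIREWALL\r\n",
  "Connection: Keep-Alive\r\nProxy-Connection: Keep-Alive\r\n",
  "Content-Type: text/plain\r\nServer: Apache\r\n\r\n"
)

h <- parseRawHeader(headerIn)
h$status            # 400
h$headers[["Via"]]  # "1.1 PFO-FIREWALL"
```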
On 27 Jan, 16:02, "clair.crossup...@googlemail.com" <clair.crossup...@googlemail.com> wrote:
> Thank you. The output I get from that example is below:
>
> > d = debugGatherer()
> > getURL("http://uk.youtube.com",
> + debugfunction = d$update, verbose = TRUE )
> [1] ""
>
> > d$value()
>
> text
> "About to connect() to uk.youtube.com port 80 (#0)\n  Trying 208.117.236.72... connected\nConnected to uk.youtube.com (208.117.236.72) port 80 (#0)\nConnection #0 to host uk.youtube.com left intact\n"
>
> headerIn
> "HTTP/1.1 400 Bad Request\r\nVia: 1.1 PFO-FIREWALL\r\nConnection: Keep-Alive\r\nProxy-Connection: Keep-Alive\r\nTransfer-Encoding: chunked\r\nExpires: Tue, 27 Apr 1971 19:44:06 EST\r\nDate: Tue, 27 Jan 2009 15:31:25 GMT\r\nContent-Type: text/plain\r\nServer: Apache\r\nX-Content-Type-Options: nosniff\r\nCache-Control: no-cache\r\nCneonction: close\r\n\r\n"
>
> headerOut
> "GET / HTTP/1.1\r\nHost: uk.youtube.com\r\nAccept: */*\r\n\r\n"
>
> dataIn
> "0\r\n\r\n"
>
> dataOut
> ""
>
> So the critical information here is the '400 Bad Request'. A Google
> search defines this for me as:
>
>   The request could not be understood by the server due to malformed
>   syntax. The client SHOULD NOT repeat the request without
>   modifications.
>
> Looking through both sort(listCurlOptions()) and
> http://curl.haxx.se/libcurl/c/curl_easy_setopt.htm doesn't really help
> me this time (unless I missed something). Any advice?
>
> Thank you for your time,
> C.C
>
> P.S. I can get the d/l to work if I use:
>
> > toString(readLines("http://www.uk.youtube.com"))
> [1] "<html>, \t<head>, \t\t<title>OpenDNS</title>, \t</head>, , \t<body id=\"mainbody\" onLoad=\"testforbanner();\" style=\"margin: 0px;\">, \t\t<script language=\"JavaScript\">, \t\t\tfunction testforbanner() {, \t\t\t\tvar width;, \t\t\t\tvar height;, \t\t\t\tvar x = 0;, \t\t\t\tvar isbanner = false;, \t\t\t\tvar bannersizes = new Array(16), \t\t\t\tbannersizes[0] = [etc]
>
> On 27 Jan, 13:52, Duncan Temple Lang <dun...@wald.ucdavis.edu> wrote:
> > clair.crossup...@googlemail.com wrote:
> > > Thank you Duncan.
> > >
> > > I remember seeing in your documentation that you have used this
> > > 'verbose=TRUE' argument in functions before when trying to see what is
> > > going on. This is good. However, I have not been able to get it to
> > > work for me. Does the output appear in R or do you use some other
> > > external window (e.g. an MS DOS window)?
> >
> > The libcurl code typically defaults to printing on the console.
> > So on the Windows GUI, this will not show up. Using
> > a shell (MS DOS window or Unix-like shell) should
> > cause the output to be displayed.
> >
> > A more general way, however, is to use the debugfunction
> > option.
> >
> >   d = debugGatherer()
> >
> >   getURL("http://uk.youtube.com",
> >          debugfunction = d$update, verbose = TRUE)
> >
> > When this completes, use
> >
> >   d$value()
> >
> > and you have the entire contents that would be displayed on the console.
> >
> >  D.
> >
> > > > library(RCurl)
> > > > my.url <-
> > > > 'http://www.nytimes.com/2009/01/07/technology/business-computing/07pro...
> > > > getURL(my.url, verbose = TRUE)
> > > [1] ""
> > >
> > > I am having a problem with a new webpage (http://uk.youtube.com/) but
> > > if I can get this verbose to work, then I think I will be able to
> > > google the right action to take based on the information it gives.
> > >
> > > Many thanks for your time,
> > > C.C.
> > >
> > > On 26 Jan, 16:12, Duncan Temple Lang <dun...@wald.ucdavis.edu> wrote:
> > > > clair.crossup...@googlemail.com wrote:
> > > > > Dear R-help,
> > > > > There seems to be a web page I am unable to download using RCurl. I
> > > > > don't understand why it won't download:
> > > > > > library(RCurl)
> > > > > > my.url <-
> > > > > > "http://www.nytimes.com/2009/01/07/technology/business-computing/07pro..."
> > > > > > getURL(my.url)
> > > > > [1] ""
> > > >
> > > > I like the irony that RCurl seems to have difficulties downloading an
> > > > article about R. Good thing it is just a matter of additional arguments
> > > > to getURL() or it would be bad news.
> > > >
> > > > The followlocation parameter defaults to FALSE, so
> > > >
> > > >   getURL(my.url, followlocation = TRUE)
> > > >
> > > > gets what you want.
> > > >
> > > > The way I found this is to call
> > > >
> > > >   getURL(my.url, verbose = TRUE)
> > > >
> > > > and take a look at the information being sent from R
> > > > and received by R from the server.
> > > >
> > > > This gives
> > > >
> > > > * About to connect() to www.nytimes.com port 80 (#0)
> > > > *   Trying 199.239.136.200... * connected
> > > > * Connected to www.nytimes.com (199.239.136.200) port 80 (#0)
> > > > > GET /2009/01/07/technology/business-computing/07program.html?_r=2 HTTP/1.1
> > > > Host: www.nytimes.com
> > > > Accept: */*
> > > >
> > > > < HTTP/1.1 301 Moved Permanently
> > > > < Server: Sun-ONE-Web-Server/6.1
> > > > < Date: Mon, 26 Jan 2009 16:10:51 GMT
> > > > < Content-length: 0
> > > > < Content-type: text/html
> > > > < Location: http://www.nytimes.com/glogin?URI=http://www.nytimes.com/2009/01/07/t...
> > > > <
> > > >
> > > > And the 301 is the critical thing here.
> > > >
> > > >  D.
> > > >
> > > > > Other web pages are OK to download, but this is the first time I have
> > > > > been unable to download a web page using the very nice RCurl package.
> > > > > While I can download the webpage using RDCOMClient, I would like
> > > > > to understand why it doesn't work as above, please.
> > > > > > library(RDCOMClient)
> > > > > > my.url <-
> > > > > > "http://www.nytimes.com/2009/01/07/technology/business-computing/07pro..."
> > > > > > ie <- COMCreate("InternetExplorer.Application")
> > > > > > txt <- list()
> > > > > > ie$Navigate(my.url)
> > > > > NULL
> > > > > > while(ie[["Busy"]]) Sys.sleep(1)
> > > > > > txt[[my.url]] <- ie[["document"]][["body"]][["innerText"]]
> > > > > > txt
> > > > > $`http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?_r=2`
> > > > > [1] "Skip to article Try Electronic Edition Log ...
> > > > >
> > > > > Many thanks for your time,
> > > > > C.C
> > > > >
> > > > > Windows Vista, running with administrator privileges.
> > > > > > sessionInfo()
> > > > > R version 2.8.1 (2008-12-22)
> > > > > i386-pc-mingw32
> > > > >
> > > > > locale:
> > > > > LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
> > > > >
> > > > > attached base packages:
> > > > > [1] stats     graphics  grDevices utils     datasets  methods   base
> > > > >
> > > > > other attached packages:
> > > > > [1] RDCOMClient_0.92-0 RCurl_0.94-0
> > > > >
> > > > > loaded via a namespace (and not attached):
> > > > > [1] tools_2.8.1
> > > > >
> > > > > ______________________________________________
> > > > > R-help@r-project.org mailing list
> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > > and provide commented, minimal, self-contained, reproducible code.
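Duncan's diagnosis of the nytimes.com failure rests on two parts of the verbose output: the 301 status line and the Location field naming the new address. With followlocation = TRUE, libcurl follows that Location itself. As a hedged illustration only, the hypothetical helper redirectTarget() below (not part of RCurl) makes the same check explicit on a raw header string, for instance the headerIn element captured by debugGatherer(): it returns the Location value for a 3xx status and NA otherwise.

```r
# redirectTarget() is a hypothetical helper, not part of RCurl.
# It assumes `raw` is one string of CRLF-separated response header lines.
redirectTarget <- function(raw) {
  lines <- strsplit(raw, "\r\n", fixed = TRUE)[[1]]

  # Extract the three-digit status code from the status line
  code <- suppressWarnings(
    as.integer(sub("^HTTP/[0-9.]+ ([0-9]{3}).*$", "\\1", lines[1]))
  )

  # Only 3xx responses are redirects
  if (is.na(code) || code < 300L || code > 399L) return(NA_character_)

  # Header names are case-insensitive, so match "Location:" ignoring case
  loc <- grep("^Location:", lines, ignore.case = TRUE, value = TRUE)
  if (length(loc) == 0L) return(NA_character_)
  sub("^Location:[[:space:]]*", "", loc[1], ignore.case = TRUE)
}

# Abbreviated copy of the 301 response from the verbose output above
# (the Location URL is left truncated, exactly as in the thread)
moved <- paste0(
  "HTTP/1.1 301 Moved Permanently\r\n",
  "Server: Sun-ONE-Web-Server/6.1\r\n",
  "Content-length: 0\r\n",
  "Location: http://www.nytimes.com/glogin?URI=http://www.nytimes.com/2009/01/07/t...\r\n\r\n"
)
redirectTarget(moved)
```

In practice getURL(my.url, followlocation = TRUE) lets libcurl do this for you, including chains of several redirects; a helper like this is only useful if you want to inspect or log the redirect target yourself.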