Re: [R] Parsing aspects of a url path in R

2014-03-06 Thread arun
Hi,

In addition, you could also do:

gsub(".*www\\.([[:alnum:]]+\\.[[:alnum:]]+).*","\\1",url)
#[1] "mdd.com"    "mdd.com"    "mdd.com"    "genius.com" "google.com"

gsub(".*www\\.([[:alnum:]]+\\.[[:alnum:]]+).*","\\1",url2)
#[1] "mdd.com"    "mdd.com"    "mdd.edu"    "genius.gov" "google.com"

gs
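The patterns above assume that every URL contains "www." followed by exactly two dot-separated labels. As a rough sketch that relaxes the first assumption, one could isolate the host first and then strip an optional "www." prefix. The url vector is reconstructed from the thread (its last entry is inferred from the outputs shown), and the names host and domain are illustrative only.

```r
# URLs as used in the thread; the last entry is inferred from the printed results
url <- c("http://www.mdd.com/food/pizza/index.html",
         "http://www.mdd.com/build-your-own/index.html",
         "http://www.mdd.com/special-deals.html",
         "http://www.genius.com/find-a-location.html",
         "http://www.google.com/hello.html")

# Keep only the host: drop the scheme and everything from the first "/" after it
host <- sub("^[[:alpha:]]+://([^/]+).*$", "\\1", url)

# Remove a leading "www." if there is one
domain <- sub("^www\\.", "", host)
print(domain)
```

This still treats whatever follows "www." as the domain, so hosts with extra subdomains would need more handling.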

Re: [R] Parsing aspects of a url path in R

2014-03-06 Thread arun
Try:

gsub(".*\\.com","",url)
[1] "/food/pizza/index.html"     "/build-your-own/index.html"
[3] "/special-deals.html"        "/find-a-location.html"
[5] "/hello.html"

gsub(".*www\\.([[:alpha:]]+\\.com).*","\\1",url)
#[1] "mdd.com"    "mdd.com"    "mdd.com"    "genius.com" "go
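Note that the first call relies on the host ending in ".com". A possible variant, sketched here with base R only, removes the scheme and host regardless of the top-level domain and then splits the remaining path. The url vector is reconstructed from the thread, and path, segments and page are hypothetical names.

```r
# URLs as used in the thread; the last entry is inferred from the printed results
url <- c("http://www.mdd.com/food/pizza/index.html",
         "http://www.mdd.com/build-your-own/index.html",
         "http://www.mdd.com/special-deals.html",
         "http://www.genius.com/find-a-location.html",
         "http://www.google.com/hello.html")

# Drop "scheme://host", keeping everything from the first "/" of the path
path <- sub("^[[:alpha:]]+://[^/]+", "", url)

# Split each path into its components (leading "/" removed first)
segments <- strsplit(sub("^/", "", path), "/", fixed = TRUE)

# Last component of each path, i.e. the page name
page <- basename(path)

print(path)
print(page)
```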

Re: [R] Parsing aspects of a url path in R

2014-03-06 Thread Ben Tupper
Hi,

The XML package has a nice function, parseURI(), that slices and dices the URL.

library(XML)
parseURI('http://www.mdd.com/food/pizza/index.html')

Might that help?

Cheers,
Ben

On Mar 6, 2014, at 12:23 PM, Abraham Mathew wrote:
> Let's say that I have the following character vecto
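As a small illustration of what parseURI() returns, the sketch below pulls individual components out of the result. It assumes the XML package is installed; the variable name u is arbitrary.

```r
library(XML)

# parseURI() handles one URL at a time and returns a list-like "URI" object
u <- parseURI("http://www.mdd.com/food/pizza/index.html")

# Individual components can be accessed by name
print(u$scheme)  # protocol part of the URL
print(u$server)  # host name
print(u$path)    # path after the host
```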

Re: [R] Parsing aspects of a url path in R

2014-03-06 Thread Abraham Mathew
Oh, that's perfect. I can just use one of the apply functions to run that on each URL and then extract the components that I need.

Thanks!

On Thu, Mar 6, 2014 at 11:52 AM, Ben Tupper wrote:
> Hi,
>
> The XML package has a nice function, parseURI(), that slices and
> dices the URL.
>
> libr
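One way the apply idea might look, as a sketch only: parse each URL with lapply() and collect the pieces of interest into a data frame. It assumes the XML package is installed; the url vector is reconstructed from the thread, and parsed and parts are hypothetical names.

```r
library(XML)

# URLs as used in the thread; the last entry is inferred from the printed results
url <- c("http://www.mdd.com/food/pizza/index.html",
         "http://www.mdd.com/build-your-own/index.html",
         "http://www.mdd.com/special-deals.html",
         "http://www.genius.com/find-a-location.html",
         "http://www.google.com/hello.html")

# Parse every URL individually
parsed <- lapply(url, parseURI)

# Extract one component per URL; vapply() checks that each result is a single string
parts <- data.frame(
  server = vapply(parsed, function(u) u$server, character(1)),
  path   = vapply(parsed, function(u) u$path, character(1)),
  stringsAsFactors = FALSE
)
print(parts)
```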

Re: [R] Parsing aspects of a url path in R

2014-03-06 Thread Ista Zahn
See the parse_url function in the httr package. It does all this and more.

On Mar 6, 2014 2:45 PM, "Sarah Goslee" wrote:
> There are many ways to do this. Here's a simple version and a slightly
> fancier version:
>
>
> url = c("http://www.mdd.com/food/pizza/index.html",
> "http://www.mdd.com/bui
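A minimal sketch of the httr approach, assuming the httr package is installed. parse_url() takes a single URL, so it is applied element by element here; the url vector is shortened from the thread, and parsed and hosts are illustrative names.

```r
library(httr)

# A subset of the URLs from the thread
url <- c("http://www.mdd.com/food/pizza/index.html",
         "http://www.genius.com/find-a-location.html")

# parse_url() works on one URL at a time, so loop over the vector
parsed <- lapply(url, parse_url)

# Each result is a list with elements such as scheme, hostname and path
hosts <- vapply(parsed, function(p) p$hostname, character(1))
print(hosts)
print(parsed[[1]]$path)
```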

Re: [R] Parsing aspects of a url path in R

2014-03-06 Thread Sarah Goslee
There are many ways to do this. Here's a simple version and a slightly fancier version:

url = c("http://www.mdd.com/food/pizza/index.html",
"http://www.mdd.com/build-your-own/index.html",
"http://www.mdd.com/special-deals.html",
"http://www.genius.com/find-a-location.html",
"http://www.google

[R] Parsing aspects of a url path in R

2014-03-06 Thread Abraham Mathew
Let's say that I have the following character vector with a series of URL strings. I'm interested in extracting some information from each string.

url = c("http://www.mdd.com/food/pizza/index.html",
"http://www.mdd.com/build-your-own/index.html",
"http://www.mdd.com/special-deals.html"