Really? I don't usually think of Vectorize as a performance enhancement, probably because I use it with a complex function that then gets applied to 4.5 million records. I need to go out, get a cup of coffee, and leave it alone for about half an hour. I recently tried to figure out how to do the matrix look-up and function application without the Vectorize route, but gave up after a couple of hours when I realized that I already had a method that worked and had spent far more time on the alternative than just running it would have taken.
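For what it's worth, Vectorize() is a wrapper around mapply(), so the wrapped function is still called once per element at the R level. For the whitelist check in this thread, here is a minimal sketch of a loop-free alternative, assuming the whitelist entries are regular expressions that can be joined with "|" and passed to grepl() in a single call. The data frame is retyped from the thread as plain strings (without the mail-client link residue), and the names pattern, hits, via_vectorize and via_grepl are illustrative, not from the original posts.

```r
# Example data retyped from the thread (plain strings, no factors).
dta <- data.frame(
  id = 1:5,
  url = c("www.yahoo.com", "www.google.com/?search=", "www.google.com",
          "www.yahoo.com/?query=", "www.gmail.com"),
  urlType = c(1, 2, 1, 2, 1),
  stringsAsFactors = FALSE
)

WHITELIST <- c("[?]query=", "[?]search=")

# Per-element check from the thread, wrapped with Vectorize
# (an mapply loop underneath).
checkBaseLine <- function(s) {
  for (listItem in WHITELIST) {
    if (regexpr(listItem, s)[1] > -1) return(TRUE)
  }
  FALSE
}
vcheck <- Vectorize(checkBaseLine)

# Assumed alternative: join the patterns with "|" and let grepl()
# test the whole column in one call.
pattern <- paste(WHITELIST, collapse = "|")
hits <- grepl(pattern, dta$url)

via_vectorize <- dta[dta$urlType != 1 & vcheck(dta$url), "url"]
via_grepl     <- dta[dta$urlType != 1 & hits, "url"]
```

Both subsets should select the same rows; whether the grepl() version is meaningfully faster on millions of records would need to be timed on the real data.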
Glad it helped.

David.

On Jul 9, 2010, at 11:01 AM, harsh yadav wrote:

> Hi,
>
> Thanks a lot.
> The Vectorize method worked and it's much faster than looping through
> the data frame.
>
> Regards,
> Harsh Yadav
>
> On Thu, Jul 8, 2010 at 11:06 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
> On Jul 8, 2010, at 10:33 PM, Erik Iverson wrote:
>
> I have a data frame:
>   id url urlType
> 1 1 www.yahoo.com <http://www.yahoo.com> 1
> 2 2 www.google.com/?search= <http://www.google.com/?search=> 2
> 3 3 www.google.com <http://www.google.com> 1
> 4 4 www.yahoo.com/?query= <http://www.yahoo.com/?query=> 2
> 5 5 www.gmail.com <http://www.gmail.com> 1
>
> This is not output from ?dput, which means more work to read it in.
>
> Yeah it was kind of a pain, but ...
>
> dta <- read.table(textConnection(' id url urlType
> 1 1 "www.yahoo.com <http://www.yahoo.com>" 1
> 2 2 "www.google.com/?search= <http://www.google.com/?search=>" 2
> 3 3 "www.google.com <http://www.google.com>" 1
> 4 4 "www.yahoo.com/?query= <http://www.yahoo.com/?query=>" 2
> 5 5 "www.gmail.com <http://www.gmail.com>" 1') )
>
> Here is the definition for WHITELIST:-
> WHITELIST = "[?]query=, [?]search="
> WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))
>
> What is the 'trim' function? I do not have that defined.
>
> Perhaps David's answer will work for you...
>
> Seems to ... after I fixed my incorrect cmd-V paste of the function
> name and guessing that trim was the one in gdata:
>
> > require(gdata)
> > checkBaseLine <- function(s){
> +   for (listItem in WHITELIST){
> +     if(regexpr(as.character(listItem), s)[1] > -1){
> +       return(TRUE)
> +     }
> +   }
> +   return(FALSE)
> + }
> >
> > #Here is the definition for WHITELIST:-
> > WHITELIST = "[?]query=, [?]search="
> > WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))
> > vcheck <- Vectorize(checkBaseLine)
> >
> > dta[ dta$urlType != 1 & vcheck(dta$url) , "url" ]
> [1] www.google.com/?search= <http://www.google.com/?search=>
>     www.yahoo.com/?query= <http://www.yahoo.com/?query=>
> 5 Levels: www.gmail.com <http://www.gmail.com> www.google.com <http://www.google.com> ... www.yahoo.com/?query= <http://www.yahoo.com/?query=>
>
> --
> David.
> David Winsemius, MD
> West Hartford, CT

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.