Hi,

Thanks a lot. The Vectorize method worked, and it's much faster than looping through the data frame.

Regards,
Harsh Yadav

On Thu, Jul 8, 2010 at 11:06 PM, David Winsemius <dwinsem...@comcast.net> wrote:

> On Jul 8, 2010, at 10:33 PM, Erik Iverson wrote:
>
>>> I have a data frame:
>>>   id url                                                      urlType
>>> 1  1 www.yahoo.com <http://www.yahoo.com>                     1
>>> 2  2 www.google.com/?search= <http://www.google.com/?search=> 2
>>> 3  3 www.google.com <http://www.google.com>                   1
>>> 4  4 www.yahoo.com/?query= <http://www.yahoo.com/?query=>     2
>>> 5  5 www.gmail.com <http://www.gmail.com>                     1
>>
>> This is not output from ?dput, which means more work to read it in.
>
> Yeah it was kind of pain, but ...
>
> dta <- read.table(textConnection(' id url urlType
> 1 1 "www.yahoo.com <http://www.yahoo.com>" 1
> 2 2 "www.google.com/?search= <http://www.google.com/?search=>" 2
> 3 3 "www.google.com <http://www.google.com>" 1
> 4 4 "www.yahoo.com/?query= <http://www.yahoo.com/?query=>" 2
> 5 5 "www.gmail.com <http://www.gmail.com>" 1') )
>
>>> Here is the definition for WHITELIST:-
>>> WHITELIST = "[?]query=, [?]search="
>>> WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))
>>
>> What is the 'trim' function? I do not have that defined.
>>
>> Perhaps David's answer will work for you...
>
> Seems to ... after I fixed my incorrect cmd-V paste of the function name
> and guessing that trim was the one in gdata:
>
> > require(gdata)
> > checkBaseLine <- function(s){
> +   for (listItem in WHITELIST){
> +     if(regexpr(as.character(listItem), s)[1] > -1){
> +       return(TRUE)
> +     }
> +   }
> +   return(FALSE)
> + }
> > #Here is the definition for WHITELIST:-
> > WHITELIST = "[?]query=, [?]search="
> > WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))
> > vcheck <- Vectorize(checkBaseLine)
> > dta[ dta$urlType != 1 & vcheck(dta$url) , "url" ]
> [1] www.google.com/?search= <http://www.google.com/?search=>
> www.yahoo.com/?query= <http://www.yahoo.com/?query=>
> 5 Levels: www.gmail.com <http://www.gmail.com> www.google.com <http://www.google.com> ... www.yahoo.com/?query= <http://www.yahoo.com/?query=>
>
> --
> David.

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
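
For anyone reading this thread in the archive, here is a minimal self-contained sketch of the Vectorize approach discussed above. It is not the exact session from the thread. The data frame is a hypothetical stand-in with plain URLs (no link residue), base R's trimws() is assumed as a substitute for gdata's trim(), and the names hits_vectorize and hits_grepl are made up for illustration. The last step is a swapped-in alternative, a single grepl() call with the patterns joined by "|", which should select the same rows for this whitelist without needing Vectorize at all.

```r
# Hypothetical stand-in for the thread's data frame (plain URLs, character column).
dta <- data.frame(
  id      = 1:5,
  url     = c("www.yahoo.com", "www.google.com/?search=", "www.google.com",
              "www.yahoo.com/?query=", "www.gmail.com"),
  urlType = c(1, 2, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Same whitelist string as in the thread.
# Assumption: trimws() (base R >= 3.2.0) replaces gdata::trim().
WHITELIST <- trimws(strsplit("[?]query=, [?]search=", ",")[[1]])

# Scalar check, as in the thread: TRUE if any whitelist pattern matches s.
checkBaseLine <- function(s) {
  for (listItem in WHITELIST) {
    if (regexpr(listItem, s)[1] > -1) {
      return(TRUE)
    }
  }
  FALSE
}

# Vectorize() wraps the scalar function so it can take the whole column.
vcheck <- Vectorize(checkBaseLine, USE.NAMES = FALSE)
hits_vectorize <- dta[dta$urlType != 1 & vcheck(dta$url), "url"]

# Alternative (not from the thread): one grepl() call over the joined patterns.
pattern <- paste(WHITELIST, collapse = "|")
hits_grepl <- dta[dta$urlType != 1 & grepl(pattern, dta$url), "url"]

print(hits_vectorize)
print(identical(hits_vectorize, hits_grepl))
```

Vectorize() is a convenience wrapper around mapply(), so it still calls checkBaseLine once per element. The grepl() form hands the whole column to the regex engine in one call, which may be worth trying if the data frame is large.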