Really? I don't usually think of Vectorize as a performance
enhancement, probably because I typically use it with a complex function
that then gets applied to 4.5 million records. I need to go out, get a cup of
coffee, and leave it alone for about half an hour. I recently tried
to figure out how to do the matrix look-up and function application
without the Vectorize route, but gave up after a couple of hours when I
realized that I already had a method that worked and had spent far more
time on the alternative than just running it would have taken.
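
For this particular whitelist check, one possible way to skip Vectorize
altogether is sketched here. Vectorize() is a wrapper around mapply(), so
the wrapped function is still called once per element, whereas grepl() is
vectorized over the strings it searches. This is only a sketch under
assumptions: the `urls` and `urlType` vectors are stand-ins for the columns
of the data frame quoted below, and the whitelist entries are joined into a
single alternation pattern, which is not what the quoted code does.

```r
# Assumed stand-ins for the url and urlType columns quoted below
urls <- c("www.yahoo.com", "www.google.com/?search=", "www.google.com",
          "www.yahoo.com/?query=", "www.gmail.com")
urlType <- c(1, 2, 1, 2, 1)

# Whitelist entries written out directly as a character vector
WHITELIST <- c("[?]query=", "[?]search=")

# One regular expression that matches any whitelist entry
pattern <- paste(WHITELIST, collapse = "|")

# grepl() is vectorized over the strings being searched, so no explicit
# loop or Vectorize() wrapper is needed
hits <- grepl(pattern, urls)

selected <- urls[urlType != 1 & hits]
selected
```

Whether this is actually faster on a given data set would need to be
checked with system.time() on the real data.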

Glad it helped.
David.

On Jul 9, 2010, at 11:01 AM, harsh yadav wrote:

> Hi,
>
> Thanks a lot.
> The Vectorize method worked and it's much faster than looping through
> the data frame.
>
> Regards,
> Harsh Yadav
>
> On Thu, Jul 8, 2010 at 11:06 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
> On Jul 8, 2010, at 10:33 PM, Erik Iverson wrote:
>
>
> I have a data frame:
>     id      url                        urlType
> 1     1      www.yahoo.com              1
> 2     2      www.google.com/?search=    2
> 3     3      www.google.com             1
> 4     4      www.yahoo.com/?query=      2
> 5     5      www.gmail.com              1
>
> This is not output from ?dput, which means more work to read it in.
>
>
> Yeah, it was kind of a pain, but ...
>
> dta <- read.table(textConnection('     id      url      urlType
> 1     1      "www.yahoo.com"      1
> 2     2      "www.google.com/?search=" 2
> 3     3      "www.google.com" 1
> 4     4      "www.yahoo.com/?query="   2
> 5     5      "www.gmail.com" 1') )
>
>
>
>
> Here is the definition for WHITELIST:-
> WHITELIST = "[?]query=, [?]search="
> WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))
>
> What is the 'trim' function?  I do not have that defined.
>
> Perhaps David's answer will work for you...
>
> Seems to ... after I fixed my incorrect cmd-V paste of the function
> name and guessed that trim was the one in gdata:
>
> > require(gdata)
>
> > checkBaseLine <- function(s){
> + for (listItem in WHITELIST){
> + if(regexpr(as.character(listItem), s)[1] > -1){
> + return(TRUE)
> + }
> + }
> + return(FALSE)
> + }
> >
> > #Here is the definition for WHITELIST:-
>
> >
> > WHITELIST = "[?]query=, [?]search="
> > WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))
> > vcheck <- Vectorize(checkBaseLine)
> >
> > dta[ dta$urlType != 1 & vcheck(dta$url) , "url" ]
> [1] www.google.com/?search= www.yahoo.com/?query=
> 5 Levels: www.gmail.com www.google.com ... www.yahoo.com/?query=
>
> -- 
> David.
>
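
For anyone who wants to rerun the quoted session without gdata, here is a
rough, self-contained sketch. It makes several assumptions that are not in
the original exchange: the data frame is built directly with data.frame()
instead of read.table(), url is kept as character with
stringsAsFactors = FALSE, and base R's trimws() (available from R 3.2.0)
stands in for gdata's trim().

```r
# Sample data mirroring the quoted data frame; url is kept as character
dta <- data.frame(
  id = 1:5,
  url = c("www.yahoo.com", "www.google.com/?search=", "www.google.com",
          "www.yahoo.com/?query=", "www.gmail.com"),
  urlType = c(1, 2, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Split the comma-separated whitelist and trim whitespace with base R
WHITELIST <- trimws(strsplit("[?]query=, [?]search=", ",")[[1]])

# Returns TRUE if the single string s matches any whitelist pattern
checkBaseLine <- function(s) {
  for (listItem in WHITELIST) {
    if (regexpr(listItem, s)[1] > -1) return(TRUE)
  }
  FALSE
}

# Vectorize() wraps the scalar function so it accepts a whole column
vcheck <- Vectorize(checkBaseLine)

result <- dta[dta$urlType != 1 & vcheck(dta$url), "url"]
result
```

Because url is character here rather than a factor, the result prints as a
plain character vector with no Levels line.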

David Winsemius, MD
West Hartford, CT



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
