Hi,

Thanks a lot.
The Vectorize method worked, and it's much faster than looping through the
data frame.

Regards,
Harsh Yadav

On Thu, Jul 8, 2010 at 11:06 PM, David Winsemius <dwinsem...@comcast.net> wrote:

>
> On Jul 8, 2010, at 10:33 PM, Erik Iverson wrote:
>
>
>>> I have a data frame:
>>>   id url                     urlType
>>> 1  1 www.yahoo.com                 1
>>> 2  2 www.google.com/?search=       2
>>> 3  3 www.google.com                1
>>> 4  4 www.yahoo.com/?query=         2
>>> 5  5 www.gmail.com                 1
>>>
>>
>> This is not output from ?dput, which means more work to read it in.
>>
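For reference, a minimal sketch of what posting the data with dput() could look like. The object name dta and the values come from the table above; building it with data.frame() and stringsAsFactors = FALSE is an assumption made here so that url stays plain character.

```r
# Rebuild the example data frame from the table in the original post.
# stringsAsFactors = FALSE is an assumption, used to keep url as character.
dta <- data.frame(
  id      = 1:5,
  url     = c("www.yahoo.com", "www.google.com/?search=", "www.google.com",
              "www.yahoo.com/?query=", "www.gmail.com"),
  urlType = c(1L, 2L, 1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# dput() prints an R expression that recreates the object;
# paste that output into the email instead of the printed table.
dput(dta)

# A reader can turn the pasted text back into an object.
# deparse() produces the same text that dput() prints.
dta_text <- paste(deparse(dta), collapse = "\n")
dta_copy <- eval(parse(text = dta_text))
identical(dta, dta_copy)
```

The point is that the reader gets column types and values exactly as the poster had them, with no guessing about quoting or wrapped lines.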
>>
> Yeah, it was kind of a pain, but ...
>
> dta <- read.table(textConnection('     id     url     urlType
> 1     1      "www.yahoo.com"              1
> 2     2      "www.google.com/?search="    2
> 3     3      "www.google.com"             1
> 4     4      "www.yahoo.com/?query="      2
> 5     5      "www.gmail.com"              1') )
>
>
>
>
>>  Here is the definition for WHITELIST:-
>>> WHITELIST = "[?]query=, [?]search="
>>> WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))
>>>
>>
>> What is the 'trim' function?  I do not have that defined.
>>
>> Perhaps David's answer will work for you...
>>
>
> Seems to ... after I fixed my incorrect cmd-V paste of the function name
> and guessed that trim was the one in gdata:
>
> > require(gdata)
>
> > checkBaseLine <- function(s){
> + for (listItem in WHITELIST){
> + if(regexpr(as.character(listItem), s)[1] > -1){
> + return(TRUE)
> + }
> + }
> + return(FALSE)
> + }
> >
> > #Here is the definition for WHITELIST:-
>
> >
> > WHITELIST = "[?]query=, [?]search="
> > WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))
> > vcheck <- Vectorize(checkBaseLine)
> >
> > dta[ dta$urlType != 1 & vcheck(dta$url) , "url" ]
> [1] www.google.com/?search= www.yahoo.com/?query=
> 5 Levels: www.gmail.com www.google.com ... www.yahoo.com/?query=
>
> --
> David.
>
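For anyone finding this thread later, here is a base-R sketch of the same filter that needs neither gdata nor Vectorize. It assumes R 3.2.0 or later for trimws(), which stands in for gdata's trim here. Joining the whitelist entries with "|" and calling grepl() once is a swapped-in technique, not what was posted above. The names patterns, combined, hits_vectorize, hits_grepl and result are illustrative only.

```r
# Example data from the thread; url kept as character (an assumption).
dta <- data.frame(
  id      = 1:5,
  url     = c("www.yahoo.com", "www.google.com/?search=", "www.google.com",
              "www.yahoo.com/?query=", "www.gmail.com"),
  urlType = c(1, 2, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Split the comma-separated whitelist and strip surrounding whitespace.
WHITELIST <- "[?]query=, [?]search="
patterns <- trimws(strsplit(WHITELIST, ",")[[1]])

# Approach from the thread: a scalar check wrapped in Vectorize().
checkBaseLine <- function(s) {
  for (listItem in patterns) {
    if (regexpr(listItem, s)[1] > -1) return(TRUE)
  }
  FALSE
}
vcheck <- Vectorize(checkBaseLine, USE.NAMES = FALSE)
hits_vectorize <- vcheck(dta$url)

# Alternative: one alternation regex and a single grepl() call,
# which is already vectorized over its input.
combined <- paste(patterns, collapse = "|")
hits_grepl <- grepl(combined, dta$url)

# Both approaches should agree; then apply the filter from the thread.
stopifnot(identical(hits_vectorize, hits_grepl))
result <- dta[dta$urlType != 1 & hits_grepl, "url"]
result
```

Vectorize() is a wrapper around mapply(), so it still calls the function once per element; the single grepl() call hands the whole column to the regex engine at once, which may matter on larger data frames.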


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
