On Jul 8, 2010, at 10:09 PM, harsh yadav wrote:
Hi,
Here is a somewhat detailed explanation of what I want to achieve:
I have a data frame:
  id url                     urlType
1  1 www.yahoo.com                 1
2  2 www.google.com/?search=       2
3  3 www.google.com                1
4  4 www.yahoo.com/?query=         2
5  5 www.gmail.com                 1
I want to get all the URLs that are not of type `1` and satisfy the
condition defined by the following function:
checkBaseLine <- function(s){
  for (listItem in WHITELIST){
    if(regexpr(as.character(listItem), s)[1] > -1){
      return(TRUE)
    }
  }
  return(FALSE)
}
Here is the definition for WHITELIST:-
WHITELIST = "[?]query=, [?]search="
WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))
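A note on the lines above: trim() is not part of base R, so it presumably comes from an add-on package that the post does not name. As a minimal sketch, assuming a version of R recent enough to have trimws(), the same whitelist can be built with base functions only:

```r
# Base-R sketch; trimws() stands in for the non-base trim() used in the post.
WHITELIST <- "[?]query=, [?]search="

# Split on commas, flatten the resulting list, then strip surrounding blanks.
WHITELIST <- trimws(unlist(strsplit(WHITELIST, ",")))

print(WHITELIST)
```

Each element is then a regular expression in which [?] matches a literal question mark.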
Now, for the given data frame, I want to apply the above function to
every row value of a given column.
It works fine when I define a condition like:
data <- data[data$urlType != 1,]
Arrrgh. Why do people keep using "data" as an object name? Is there
some water pump from which I can remove the handle?
Anyway ... try:
vcheck <- Vectorize(checkBaseLine)
data[ data$urlType != 1 & vcheck(data$url) , "url" ]
--
David
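To make the suggestion above concrete, here is a small self-contained sketch. The data frame is retyped from the post and called urls (a name chosen here, not one from the thread), and the whitelist is written out directly. It shows why the unvectorized call selects nothing, then the Vectorize() approach, then a grepl() variant that is offered only as one possible alternative:

```r
# Example data retyped from the post; the object name urls is an assumption.
urls <- data.frame(
  id      = 1:5,
  url     = c("www.yahoo.com", "www.google.com/?search=", "www.google.com",
              "www.yahoo.com/?query=", "www.gmail.com"),
  urlType = c(1, 2, 1, 2, 1),
  stringsAsFactors = FALSE
)
WHITELIST <- c("[?]query=", "[?]search=")

checkBaseLine <- function(s) {
  for (listItem in WHITELIST) {
    if (regexpr(as.character(listItem), s)[1] > -1) {
      return(TRUE)
    }
  }
  return(FALSE)
}

# Called on the whole column, the function looks only at the first URL
# (because of the [1]) and returns a single value for all rows.
single <- checkBaseLine(urls$url)

# Vectorize() wraps the function so that it is called once per element.
vcheck <- Vectorize(checkBaseLine)
viaVectorize <- urls[urls$urlType != 1 & vcheck(urls$url), "url"]

# Alternative sketch: grepl() is already vectorized over its input,
# so the whitelist can be collapsed into one alternation pattern.
pattern <- paste(WHITELIST, collapse = "|")
viaGrepl <- urls[urls$urlType != 1 & grepl(pattern, urls$url), "url"]

print(viaVectorize)
print(viaGrepl)
```

Both selections pick the two URLs that carry a query string. The grepl() form avoids the explicit loop, at the cost of merging all whitelist entries into a single regular expression.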
However, I want to combine two logical conditions together like:
data <- data[data$urlType != 1 & checkBaseLine(data$url),]
This should keep only the rows where the column `urlType` is not 1 and
the column `url` satisfies the function definition.
Any ideas how this can be done?
Thanks in advance.
Regards,
Harsh Yadav
On Thu, Jul 8, 2010 at 9:43 PM, Erik Iverson <er...@ccbr.umn.edu> wrote:
It will be a lot easier to help you if you follow the posting guide and
provide commented, minimal, self-contained, reproducible code.
You gave your function definition, which is good. Use ?dput to give us a
small data.frame that can accurately show what you want.
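As a rough illustration of the dput() advice, the sketch below uses a hypothetical two-row data frame (not the poster's actual data). dput() prints R code that recreates the object, and that printed text is what gets pasted into an email:

```r
# Hypothetical small data frame standing in for the poster's data.
urls <- data.frame(
  id      = 1:2,
  url     = c("www.yahoo.com", "www.google.com/?search="),
  urlType = c(1, 2),
  stringsAsFactors = FALSE
)

# Print a text representation that can be pasted into a message.
dput(urls)

# A reader can rebuild the object by evaluating the pasted text;
# here the text is captured and evaluated in one step to show the round trip.
pasted  <- paste(capture.output(dput(urls)), collapse = "\n")
rebuilt <- eval(parse(text = pasted))
```

The rebuilt object has the same columns and values as the original, which is what makes the example reproducible for other readers.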
harsh yadav wrote:
Hi all,
I have a data frame for which I want to limit the output by checking
whether the row values of a specific column meet particular conditions.
Here are the more specific details:
I have a function that checks whether an input string exists in a
defined list:-
checkBaseLine <- function(s){
  for (listItem in WHITELIST){
    if(regexpr(as.character(listItem), s)[1] > -1){
      return(TRUE)
    }
  }
  return(FALSE)
}
Now, I have a data frame for which I want to apply the above function to
all row values of a given column:-
This works fine when I define a condition like:
data <- data[data$urlType != 1,]
However, I want to combine two logical conditions together like:
data <- data[data$urlType != 1 & checkBaseLine(data$url),]
This should keep only the rows where the column `urlType` is not 1 and
the column `url` passes the check made by the defined function.
Any ideas how this can be done?
Thanks in advance.
Regards,
Harsh Yadav
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.