On 14-01-07 3:21 PM, Ron Michael wrote:
Hi,
I have to perform some formula-driven calculations on a data.frame (as defined
below). Let's say I have the following DF:
DF <- data.frame(A1 = c('a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'), A2 =
c('m', 'n', 'p', 'm', 'n', 'p', 'm', 'n', 'p'), A3 = c(1,2,3,4,5,6,7,8,9))
DF
  A1 A2 A3
1  a  m  1
2  a  n  2
3  a  p  3
4  b  m  4
5  b  n  5
6  b  p  6
7  c  m  7
8  c  n  8
9  c  p  9
Now let's say the user gives a formula to be applied to the elements of the A1
column. Let's say the formula looks like:
z = a + 2*b + c (in fact the formula will be arbitrary, like z = f(a, b, c))
Once such a formula is given, the result will look like this (for the columns
A1, A2, A3 respectively):
z m 16
z n 20
z p 24
The last column comes from the fact that 1 + 2*4 + 7 = 16, 2 + 2*5 + 8 = 20, and
3 + 2*6 + 9 = 24.
Given that the formula will be user-defined and applied to a data.frame like
DF, I am looking for an automated way to accomplish the task for a really big
DF of this kind and a fairly complex formula.
Can somebody suggest an efficient way to perform this task in R?
A dataframe isn't really the best structure for this problem. What you
really have in R terms are three environments, indexed by A2, each
containing bindings to a, b and c. Within each of those environments
you want to create a new binding to z, according to the user-supplied
formula.
The way I'd implement that pretty much matches my description. Have a
named list of environments, write a function to evaluate the formula and
assign the value, then just lapply it to your list.
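A rough sketch of that idea, using the example data from the question. The names envs, user_formula and add_z are made up for illustration, and the formula is assumed to arrive as an unevaluated R expression; treat this as one possible reading of the description rather than a finished implementation.

```r
# Example data from the question.
DF <- data.frame(
  A1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  A2 = c("m", "n", "p", "m", "n", "p", "m", "n", "p"),
  A3 = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  stringsAsFactors = FALSE
)

# One environment per value of A2, each binding a, b and c to the A3 values.
envs <- lapply(split(DF, DF$A2), function(d) {
  list2env(setNames(as.list(d$A3), d$A1), envir = new.env())
})

# The user-supplied formula, held as an unevaluated expression (assumed form).
user_formula <- quote(a + 2 * b + c)

# Evaluate the formula inside one environment and bind the result to z there.
add_z <- function(env, expr) {
  assign("z", eval(expr, envir = env), envir = env)
  invisible(env)
}

# Environments are modified in place, so the return value can be discarded.
invisible(lapply(envs, add_z, expr = user_formula))

# Collect z from each environment: m = 16, n = 20, p = 24.
z_values <- vapply(envs, function(e) get("z", envir = e), numeric(1))
print(z_values)
```

Because environments have reference semantics, each call to add_z updates the environment itself, so nothing needs to be copied back into the list.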
If you really do want things in the dataframe format, then write
functions to convert to it at the beginning, and from it at the very
end. Don't work with that format if efficiency matters to you.
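The conversion functions could look something like the sketch below. The names df_to_envs and envs_to_df are hypothetical, the column names A1, A2 and A3 are taken from the example in the question, and the result variable is assumed to be numeric and called z.

```r
# Data frame -> named list of environments, one per value of the key column.
df_to_envs <- function(df, name_col = "A1", key_col = "A2", value_col = "A3") {
  pieces <- split(df, df[[key_col]])
  lapply(pieces, function(d) {
    list2env(
      setNames(as.list(d[[value_col]]), as.character(d[[name_col]])),
      envir = new.env()
    )
  })
}

# Named list of environments -> data frame holding one variable per key.
envs_to_df <- function(envs, var = "z") {
  data.frame(
    A1 = var,
    A2 = names(envs),
    A3 = unname(vapply(envs, function(e) get(var, envir = e), numeric(1))),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Example data from the question.
DF <- data.frame(
  A1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  A2 = c("m", "n", "p", "m", "n", "p", "m", "n", "p"),
  A3 = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
)

# Convert once at the start, do the work on environments, convert back at the end.
envs <- df_to_envs(DF)
for (e in envs) {
  assign("z", eval(quote(a + 2 * b + c), envir = e), envir = e)
}
result <- envs_to_df(envs)
print(result)  # rows: z m 16, z n 20, z p 24
```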
Duncan Murdoch
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help