Hello, please help me with this basic question. I have already spent two days searching the internet and textbooks for an answer. I will simplify my question to an example rather than use the original variable names. I have a dataset (Dataset) with 4 variables and 20000 cases. Variable A is an ID. Variable B is a continuous numerical variable, unique to each A. Variable C is a categorical factor with 6 possible levels. Variable D is also a categorical factor, with 300 different levels.
I would like to create a new variable E: the standard deviation of B around the group mean of B, with the groups defined by the combinations of C and D. I had no problem creating such a column for the group means (with the ave() function), but I cannot find a way to do the same with another function such as sd, so that each case is assigned the value for its own group.

I tried Dataset$E <- with(Dataset, tapply(B, list(C,D),FUN=sd)) but it is wrong: it takes the 1800 different SD values, puts them in column E, then puts the same array of numbers below it, and repeats this as many times as possible until the column is filled. The SD values do not correspond to the proper groups.

How can I match these 1800 different SD values to their corresponding cases in my original data? Is there a shortcut to do this all in one line, as there is for the means with the ave() function? I also tried ddply, but I am doing something wrong (my R is on Linux and I do not yet know how to get error messages, so I do not know what is wrong with my lines).

Thank you for any help! Please give me as detailed a script as possible.

Zsuzsa
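One possible approach, sketched here on made-up toy data rather than the real Dataset: ave() takes a FUN argument, so the one-line form used for the means may carry over to sd. The column names A, B, C, D and E follow the example above; the toy values and the smaller group counts are assumptions for illustration only.

```r
# Toy data standing in for the real Dataset (values are invented).
set.seed(1)
Dataset <- data.frame(
  A = 1:12,
  B = rnorm(12),
  C = factor(rep(c("c1", "c2"), each = 6)),
  D = factor(rep(c("d1", "d2", "d3"), times = 4))
)

# ave() returns a vector as long as B, so every case receives the value
# computed for its own C-by-D group. FUN has to be passed by name;
# an unnamed sd would be taken as one more grouping variable.
Dataset$E <- with(Dataset, ave(B, C, D, FUN = sd))

print(Dataset)
```

Unlike tapply(), which returns one value per group in a C-by-D table, ave() returns one value per case, so no separate matching step is needed. Note that sd() returns NA for a group containing a single case.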