Hi: To illustrate the idea of vectorization that the previous posters raised, here's a quick example of finding the z-scores that you requested:
# Define a vectorized function to do the standardization; the argument
# x below is a vector. We'll keep it simple and ignore the possibility of
# missing values and other complications...
std <- function(x) (x - mean(x))/sd(x)

# Create a new column in the original data frame for the z-scores,
# where df is the name of your data frame...
df <- transform(df, zscore = std(df[, 'cost']))
df
  Loc cost     zscore
1   A    1 -1.0912993
2   C    3 -0.1925822
3   D    2 -0.6419407
4   F    3 -0.1925822
5   H    4  0.2567763
6   K    3 -0.1925822
7   M    8  2.0542104

transform() is a function used to add one or more columns to an existing data frame, usually by computing them from the columns it already has. Since a data frame can be indexed by its rows and columns, the empty slot before the comma in df[, 'cost'] means "all rows", so the expression selects the column of df named cost, with all of its rows. (BTW, indexing is a very powerful feature of R that can be used to great advantage in data processing.)

Also notice how std() takes advantage of the fact that its input is a vector: it computes the mean and standard deviation in-line, and the arithmetic in the function body then applies them to every element of x without any explicit loop. It also implicitly applies the 'recycling rule', since mean(x) and sd(x) are scalars (length-one vectors) that R recycles to the length of x.

I find this more intuitive than the 'SAS way'. It takes three lines to read in the data, define the standardization function, apply it and attach it to the data frame. How many lines of SAS code would this take?

HTH,
Dennis

On Thu, May 20, 2010 at 2:42 PM, egc <forum.qu...@gmail.com> wrote:
> Greetings -
>
> While I've used R a fair bit for basic statistical machinations, I've
> not used it for data manipulation - I've used SAS for 20+ years (and
> SAS really shines in data handling). So, I've started the process of
> trying to figure out 'how to do in R what I can do in my sleep in SAS'
> - specifically wrt data manipulation. So, these are decidedly
> 'newbie' level questions.
>
> So, starting very simple.
> Created a tiny example CSV file, which I
> call test.csv.
>
> Loc,cost
> A,1
> C,3
> D,2
> F,3
> H,4
> K,3
> M,8
>
> Now, all I want to do is read it in, and derive a new variable which
> is a Z-transform of 'cost'. Here is what I've tried so far:
>
> > prices <- read.csv("c:/documents and settings/user/desktop/test.csv",header=TRUE,sep=",",na.strings=".");
> > print(prices$cost);
>
> So far, so good (being able to pull in the data is a good thing).
>
> Now, while I'm sure there are lots of ways to do what I want, I'm
> going to brute force it, by calculating the column mean and column SD
> for 'cost', generating the Z-transformed value, and then adding it to
> the dataframe. However, here is where I'm having problems. After about
> an hour of searching, I realized I need to use an 'apply' function to
> apply a function (say, mean) to a column in a dataframe. But I can't
> seem to get it to work successfully (and this is the gist of the
> question).
>
> If I try
>
> > result <- sapply(prices['cost'],MARGIN=2,FUN=mean,na.rm=TRUE);
> > print(result);
>
> it works perfectly.
>
> But if I simply change FUN=mean to FUN=sd, not so successful.
>
> If I try
>
> > result <- sapply(prices['cost'],MARGIN=2,FUN=sd,na.rm=TRUE);
> > print(result);
>
> it throws the following error:
>
> Error in FUN(X[[1L]], ...) : unused argument(s) (MARGIN = 2)
>
> Further, if I try
>
> > result <- sapply(prices$cost,MARGIN=2,FUN=mean,na.rm=TRUE);
> > print(result);
>
> it prints 8 values corresponding to the value of each element of the
> data set - meaning, it's treating prices$cost as a row vector. Which
> makes no sense to me. What do I have to do to use prices$cost as the
> first argument in the sapply call? If I can't, why not?
> is.vector(prices$cost) shows TRUE, so why can't I take the mean over
> the vector?
>
> At any rate, I'll start from here. Being able to apply functions to
> column(s) of a dataframe seems pretty fundamental, so I'd like to
> start by understanding the basics.
>
> Thanks in advance.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
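The three sapply() puzzles in the quoted message can be reproduced in a short, self-contained sketch. It assumes the test.csv data is typed in directly with data.frame() instead of being read from the quoted path, and the names err, res_mean, res_sd, per_element, cost_mean and cost_sd are made up for illustration. The key point, from the documented signatures: sapply() has no MARGIN argument (that belongs to apply()), so MARGIN = 2 is passed on to FUN through '...'. mean() accepts and ignores extra arguments, while sd(x, na.rm) does not, which would explain why only the sd call fails.

```r
# Hypothetical in-memory copy of test.csv from the quoted message
# (stands in for the read.csv() call, whose path is user-specific).
prices <- data.frame(Loc  = c("A", "C", "D", "F", "H", "K", "M"),
                     cost = c(1, 3, 2, 3, 4, 3, 8))

# The standardization function from the reply above.
std <- function(x) (x - mean(x)) / sd(x)

# 1. sapply() has no MARGIN argument, so MARGIN = 2 travels through '...'
#    to FUN. sd() has no '...' to absorb it, hence "unused argument".
err <- tryCatch(sapply(prices["cost"], MARGIN = 2, FUN = sd, na.rm = TRUE),
                error = function(e) conditionMessage(e))
print(err)

# 2. mean() does accept '...', so the same call with FUN = mean "works":
#    MARGIN is silently ignored. Without MARGIN, sd works too, because
#    sapply() treats the one-column data frame prices["cost"] as a list
#    of columns and calls FUN once per column.
res_mean <- sapply(prices["cost"], MARGIN = 2, FUN = mean, na.rm = TRUE)
res_sd   <- sapply(prices["cost"], FUN = sd, na.rm = TRUE)
print(res_mean)
print(res_sd)

# 3. prices$cost is a plain numeric vector, and sapply() over a vector
#    calls FUN on each element in turn, so the "mean" of each single
#    number is just that number: one result per element.
per_element <- sapply(prices$cost, FUN = mean)
print(per_element)

# For a single column, no apply function is needed at all.
cost_mean <- mean(prices$cost)
cost_sd   <- sd(prices$cost)

# Base R's scale() centres by the mean and divides by the standard
# deviation, giving the same z-scores as std(); as.vector() drops the
# one-column matrix shape that scale() returns.
prices$zscore <- as.vector(scale(prices$cost))
print(prices)
```

In short, for one column it is simplest to call mean() or sd() on prices$cost directly; sapply() is handy for running a function over several columns of a data frame, and apply() with MARGIN = 2 is meant for matrices (a data frame is converted to one first).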