Hi:

To illustrate the idea of vectorization that the previous posters raised,
here's a quick example of finding the z-scores that you requested:

# Define a vectorized function to do the standardization - the argument
# x below is a vector. We'll keep it simple and ignore the possibility of
# missing values and other complications...
std <- function(x) (x - mean(x))/sd(x)

# Create a new column in the original data frame for the z-scores,
# where df is the name of your data frame...
df <- transform(df, zscore = std(df[, 'cost']))
df
  Loc cost     zscore
1   A    1 -1.0912993
2   C    3 -0.1925822
3   D    2 -0.6419407
4   F    3 -0.1925822
5   H    4  0.2567763
6   K    3 -0.1925822
7   M    8  2.0542104
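
For anyone who wants to reproduce the output above without the CSV file, here is a minimal self-contained sketch. It assumes the seven rows from the original post are typed in directly with data.frame() rather than read with read.csv(), and it uses scale() from base R as an independent cross-check on std(); that check is an addition for illustration and not part of the approach above.

```r
# Rebuild the seven-row example inline instead of reading test.csv
df <- data.frame(Loc  = c("A", "C", "D", "F", "H", "K", "M"),
                 cost = c(1, 3, 2, 3, 4, 3, 8))

# Same standardization function as above
std <- function(x) (x - mean(x)) / sd(x)

# Inside transform(), a column can also be referred to by its bare name
df <- transform(df, zscore = std(cost))
print(df)

# scale() returns a one-column matrix with attributes, so flatten it
# to a plain numeric vector before comparing
check <- all.equal(df$zscore, as.numeric(scale(df$cost)))
print(check)
```

By construction the z-scores have mean 0 (up to rounding error) and standard deviation 1, which is another quick sanity check.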

transform() is a function used to add one or more columns to an existing data
frame, usually by computing on its existing columns. Since a data frame can be
indexed by its rows and columns, the empty slot before the comma in
df[, 'cost'] means all rows, and 'cost' selects the column of df with that
name. (BTW, indexing is a very powerful feature of R that can be used to great
advantage in data processing.)
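
To make the indexing point concrete, here is a small sketch of the common ways to pull the cost column out of a data frame. The three-row data frame and the variable names (by_comma, by_dollar, and so on) are made up for illustration. The thing to notice is that df[, 'cost'], df$cost and df[['cost']] all return a plain numeric vector, while df['cost'] (no comma) returns a one-column data frame.

```r
# Small illustrative data frame (hypothetical subset of the example data)
df <- data.frame(Loc = c("A", "C", "D"), cost = c(1, 3, 2))

by_comma  <- df[, "cost"]    # all rows, column "cost": a numeric vector
by_dollar <- df$cost         # same vector, via $
by_double <- df[["cost"]]    # same vector, via list-style extraction
as_frame  <- df["cost"]      # single bracket, no comma: a one-column data frame

same_vectors <- identical(by_comma, by_dollar) && identical(by_dollar, by_double)
print(same_vectors)
print(is.data.frame(as_frame))

# Row indexing works the same way: rows where cost exceeds 1, all columns
pricey <- df[df$cost > 1, ]
print(pricey)
```

The vector versus data frame distinction matters for the apply family: sapply() over a data frame visits each column, while sapply() over a vector visits each element.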

Also notice how the std() function takes advantage of the vector property of
its input argument: mean(x) and sd(x) are computed in-line, and the arithmetic
is then applied to every element of the vector at once. It also implicitly
applies the 'recycling rule', since mean(x) and sd(x) are single values that
get recycled to the length of x. I find this more intuitive than the 'SAS way'.
It takes three lines to read in the data, define the standardization function,
and apply it and attach the result to the data frame. How many lines of SAS
code would this take?
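
Here is a rough sketch of what recycling does under the hood, and of why the sapply() calls in the original post behaved as they did. The explicit loop is only there for comparison, and the data are typed in rather than read from the CSV. As far as I can tell, the relevant facts are that sapply() has no MARGIN argument (that belongs to apply()), so MARGIN = 2 is passed on to FUN; mean() accepts extra arguments through '...' and silently ignores it, whereas sd() has no '...' and complains.

```r
x <- c(1, 3, 2, 3, 4, 3, 8)

# Vectorized: mean(x) and sd(x) have length 1 and are recycled across x
vectorized <- (x - mean(x)) / sd(x)

# The same computation written element by element, for comparison only
looped <- numeric(length(x))
for (i in seq_along(x)) {
  looped[i] <- (x[i] - mean(x)) / sd(x)
}
same_result <- all.equal(vectorized, looped)
print(same_result)

# mean() and sd() work directly on a column; no apply function is needed
prices <- data.frame(cost = x)
col_mean <- mean(prices$cost, na.rm = TRUE)
col_sd   <- sd(prices$cost, na.rm = TRUE)

# sapply() over a vector calls FUN once per element, so the mean of each
# single value is just that value
per_element <- sapply(prices$cost, mean)

# MARGIN = 2 is forwarded to FUN: mean() tolerates it, sd() raises an error
ok  <- sapply(prices["cost"], MARGIN = 2, FUN = mean, na.rm = TRUE)
msg <- tryCatch(sapply(prices["cost"], MARGIN = 2, FUN = sd, na.rm = TRUE),
                error = function(e) conditionMessage(e))
print(msg)
```

Dropping MARGIN = 2 from the sd() call, or simply calling sd(prices$cost), avoids the error.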

HTH,
Dennis

On Thu, May 20, 2010 at 2:42 PM, egc <forum.qu...@gmail.com> wrote:

> Greetings -
>
> While I've used R a fair bit for basic statistical machinations, I've
> not used it for data manipulation - I've used SAS for 20+ years (and
> SAS really shines in data handling). So, I've started the process of
> trying to figure out 'how to do in R what I can do in my sleep in SAS'
> - specifically wrt data manipulation. So, these are decidedly
> 'newbie' level questions.
>
> So, starting very simple. Created a tiny example CSV file, which I
> call test.csv.
>
> Loc,cost
> A,1
> C,3
> D,2
> F,3
> H,4
> K,3
> M,8
>
> Now, all I want to do is read it in, and derive a new variable which
> is a Z-transform of 'cost'. Here is what I've tried so far:
>
> > prices <- read.csv("c:/documents and settings/user/desktop/test.csv",header=TRUE,sep=",",na.strings=".");
> > print(prices$cost);
>
> So far, so good (being able to pull in the data is a good thing).
>
> Now, while I'm sure there are lots of ways to do what I want, I'm
> going to brute force it, by calculating column mean and column SD for
> 'cost', generate the Z-transformed value, and then add it to the
> dataframe. However, here is where I'm having problems. After about an
> hour of searching, I realized I need to use an 'apply' function to
> apply a function (say, mean) to a column in a dataframe. But, I can't
> seem to get it to work successfully (and this is the gist of the
> question).
>
> If I try
>
> > result <- sapply(prices['cost'],MARGIN=2,FUN=mean,na.rm=TRUE);
> > print(result);
>
>
> Works perfectly.
>
> But, if I simply change FUN=mean to FUN=sd, not so successful:
>
> If I try
>
> > result <- sapply(prices['cost'],MARGIN=2,FUN=sd,na.rm=TRUE);
> > print(result);
>
> Throws the following error:
>
> Error in FUN(X[[1L]], ...) : unused argument(s) (MARGIN = 2)
>
> Further, If I try
>
> > result <- sapply(prices$cost,MARGIN=2,FUN=mean,na.rm=TRUE);
> > print(result);
>
> it prints 8 values corresponding to the value of each element of the
> data set - meaning, it's treating prices$cost as a row vector. Which
> makes no sense to me. What do I have to do to use prices$cost as the
> first argument in the sapply call? If I can't, why not?
> is.vector(prices$cost) shows TRUE, so why can't I take the mean over
> the vector?
>
> At any rate, I'll start from here. Being able to apply functions to
> column(s) of a dataframe seems pretty fundamental, so I'd like to
> start by understanding the basics.
>
> Thanks in advance.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
