On May 20, 2010, at 5:42 PM, egc wrote:
Greetings -
While I've used R a fair bit for basic statistical machinations, I've
not used it for data manipulation - I've used SAS for 20+ years (and
SAS really shines in data handling). So, I've started the process of
trying to figure out 'how to do in R what I can do in my sleep in SAS'
- specifically with respect to data manipulation. So, these are decidedly
'newbie' level questions.
So, starting very simple. I created a tiny example CSV file, which I
call test.csv.
Loc,cost
A,1
C,3
D,2
F,3
H,4
K,3
M,8
Now, all I want to do is read it in, and derive a new variable which
is a Z-transform of 'cost'. Here is what I've tried so far:
prices <- read.csv("c:/documents and settings/user/desktop/test.csv",header=TRUE,sep=",",na.strings=".");
print(prices$cost);
So far, so good (being able to pull in the data is a good thing).
Now, while I'm sure there are lots of ways to do what I want, I'm
going to brute force it, by calculating column mean and column SD for
'cost', generate the Z-transformed value, and then add it to the
dataframe. However, here is where I'm having problems. After about an
hour of searching, I realized I need to use an 'apply' function to
apply a function (say, mean) to a column in a dataframe. But, I can't
seem to get it to work successfully (and this is the gist of the
question).
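For reference, the brute-force plan described above can be sketched in a few lines. This is only a minimal sketch under some assumptions: the data are typed inline with read.csv(text = ...) so it runs without the file on disk, and the column name z_cost is an illustrative choice, not something from the original post.

```r
# Inline copy of test.csv so the example runs without the file on disk
prices <- read.csv(text = "Loc,cost
A,1
C,3
D,2
F,3
H,4
K,3
M,8", header = TRUE, na.strings = ".")

# Column mean and SD, computed directly on the numeric vector
cost_mean <- mean(prices$cost, na.rm = TRUE)
cost_sd   <- sd(prices$cost, na.rm = TRUE)

# Z-transform, stored as a new column (z_cost is an illustrative name)
prices$z_cost <- (prices$cost - cost_mean) / cost_sd

# scale() performs the same centering and scaling in one call
z_alt <- as.numeric(scale(prices$cost))
```

Both routes should give the same values; scale() is simply the shorter way to write it.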
If I try
result <- sapply(prices['cost'],MARGIN=2,FUN=mean,na.rm=TRUE);
print(result);
I suspect you are missing the easy way to do this;
mean(prices$cost)
Works perfectly.
But, if I simply change FUN=mean to FUN=sd, not so successful:
If I try
result <- sapply(prices['cost'],MARGIN=2,FUN=sd,na.rm=TRUE);
print(result);
Try:
result <- sd(prices$cost)
R functions often expect to work on vectors without an explicit loop
or apply function.
Throws the following error:
Error in FUN(X[[1L]], ...) : unused argument(s) (MARGIN = 2)
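A likely explanation for the difference, sketched below: MARGIN is an argument of apply(), not of sapply(), so sapply() hands it on to FUN through "...". mean() accepts extra arguments and quietly ignores it, while sd(x, na.rm = FALSE) has no "..." and rejects it. The data frame here is retyped by hand from test.csv, and the variable names are illustrative only.

```r
# Hand-typed copy of test.csv
prices <- data.frame(
  Loc  = c("A", "C", "D", "F", "H", "K", "M"),
  cost = c(1, 3, 2, 3, 4, 3, 8)
)

# sapply() without MARGIN: FUN is called once per column of the data frame
col_sd <- sapply(prices["cost"], sd, na.rm = TRUE)

# apply() is the function that takes MARGIN; it expects a matrix or array
cost_matrix  <- as.matrix(prices["cost"])
col_sd_apply <- apply(cost_matrix, MARGIN = 2, FUN = sd, na.rm = TRUE)

# Reproduce the failure and capture its message instead of stopping
err <- tryCatch(
  sapply(prices["cost"], MARGIN = 2, FUN = sd, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
```

Dropping MARGIN from the sapply() call, or switching to apply() on a matrix, should both give the column SD.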
Further, if I try
result <- sapply(prices$cost,MARGIN=2,FUN=mean,na.rm=TRUE);
print(result);
it prints 8 values corresponding to the value of each element of the
data set - meaning, it's treating prices$cost as a row vector. Which
makes no sense to me. What do I have to do to use prices$cost as the
first argument in the sapply call?
Not use sapply. "sapply" generally will be used to produce a vector or
list as a result. If you only want a scalar, then it's not the right
tool.
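To illustrate the distinction, here is a small sketch: a direct call when one number is wanted, and sapply() when one result per column is wanted. The tax column is hypothetical, added only so that there is more than one numeric column to summarize; it is not part of test.csv.

```r
prices <- data.frame(
  Loc  = c("A", "C", "D", "F", "H", "K", "M"),
  cost = c(1, 3, 2, 3, 4, 3, 8),
  tax  = c(0.1, 0.3, 0.2, 0.3, 0.4, 0.3, 0.8)  # hypothetical column
)

# One scalar: call the function on the column vector directly
cost_mean <- mean(prices$cost, na.rm = TRUE)

# One value per column: sapply() over the numeric columns only
numeric_cols <- sapply(prices, is.numeric)
col_means    <- sapply(prices[numeric_cols], mean, na.rm = TRUE)
```

Here col_means is a named vector with one entry per numeric column, which is the kind of result sapply() is meant to produce.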
If I can't, why not?
is.vector(prices$cost) shows TRUE, so why can't I take the mean over
the vector?
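The behaviour follows from what sapply() iterates over, as the sketch below tries to show. Given a plain vector, it calls FUN once per element, and the mean of a single number is that number. Given a data frame, which is a list of columns, it calls FUN once per column. The variable names are illustrative only.

```r
cost   <- c(1, 3, 2, 3, 4, 3, 8)
prices <- data.frame(cost = cost)

# A vector: FUN runs once per element, so the input comes back unchanged
per_element <- sapply(prices$cost, mean)

# A one-column data frame: FUN runs once, on the whole column
per_column <- sapply(prices["cost"], mean)

# No iteration needed at all when a single summary is wanted
direct <- mean(prices$cost)
```

So the mean over the vector can be taken; it just does not need sapply() to do it.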
At any rate, I'll start from here. Being able to apply functions to
column(s) of a dataframe seems pretty fundamental, so I'd like to
start by understanding the basics.
Thanks in advance.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT