Greetings - While I've used R a fair bit for basic statistical machinations, I've not used it for data manipulation - I've used SAS for 20+ years (and SAS really shines in data handling). So, I've started the process of trying to figure out 'how to do in R what I can do in my sleep in SAS' - specifically wrt data manipulation. So, these are decidedly 'newbie' level questions.
So, starting very simple. I created a tiny example CSV file, which I call test.csv:

Loc,cost
A,1
C,3
D,2
F,3
H,4
K,3
M,8

Now, all I want to do is read it in and derive a new variable which is a Z-transform of 'cost'. Here is what I've tried so far:

> prices <- read.csv("c:/documents and settings/user/desktop/test.csv",header=TRUE,sep=",",na.strings=".");
> print(prices$cost);

So far, so good (being able to pull in the data is a good thing). Now, while I'm sure there are lots of ways to do what I want, I'm going to brute force it: calculate the column mean and column SD for 'cost', generate the Z-transformed value, and then add it to the dataframe.

However, here is where I'm having problems. After about an hour of searching, I realized I need to use an 'apply' function to apply a function (say, mean) to a column in a dataframe. But I can't seem to get it to work successfully (and this is the gist of the question).

If I try

> result <- sapply(prices['cost'],MARGIN=2,FUN=mean,na.rm=TRUE);
> print(result);

it works perfectly. But if I simply change FUN=mean to FUN=sd, it is not so successful. If I try

> result <- sapply(prices['cost'],MARGIN=2,FUN=sd,na.rm=TRUE);
> print(result);

it throws the following error:

Error in FUN(X[[1L]], ...) : unused argument(s) (MARGIN = 2)

Further, if I try

> result <- sapply(prices$cost,MARGIN=2,FUN=mean,na.rm=TRUE);
> print(result);

it prints 8 values corresponding to the value of each element of the data set - meaning it's treating prices$cost as a row vector. Which makes no sense to me. What do I have to do to use prices$cost as the first argument in the sapply call? If I can't, why not? is.vector(prices$cost) shows TRUE, so why can't I take the mean over the vector?

At any rate, I'll start from here. Being able to apply functions to column(s) of a dataframe seems pretty fundamental, so I'd like to start by understanding the basics.

Thanks in advance.
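For reference, here is a minimal sketch of the computation described above (column mean, column SD, Z-transformed value added to the dataframe). It is one possible approach, not necessarily the recommended one. The data are rebuilt inline rather than read from the file path, and the names cost_mean, cost_sd, cost_z and col_sd are made up for illustration. It also assumes the relevant point about the calls above is that MARGIN is an argument of apply(), not sapply(), so sapply() hands it on to FUN.

```r
# Rebuild the small test data inline so the sketch does not depend on a file path.
prices <- read.csv(text = "Loc,cost
A,1
C,3
D,2
F,3
H,4
K,3
M,8", header = TRUE, na.strings = ".")

# prices$cost is a plain numeric vector, so mean() and sd() can be called on it directly.
cost_mean <- mean(prices$cost, na.rm = TRUE)  # illustrative name
cost_sd   <- sd(prices$cost, na.rm = TRUE)    # illustrative name

# Z-transform: centre on the mean, then scale by the sample standard deviation.
prices$cost_z <- (prices$cost - cost_mean) / cost_sd

# Column-wise alternative: sapply() treats a data frame as a list of columns.
# MARGIN is left out because it belongs to apply(), not sapply().
col_sd <- sapply(prices["cost"], sd, na.rm = TRUE)

print(prices)
print(col_sd)
```

In this sketch, sapply(prices$cost, ...) is avoided on purpose: given a bare vector, sapply() calls the function once per element, which would explain getting one value back per row.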
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.