If you are willing to rethink the definition of your special function, the process can be simplified. The function lmc() log-mean centers a single grouped numeric vector. Then sapply() can be used to center a batch of them.
> lmc <- function(x, g) unsplit(lapply(split(log(x), g), scale, scale=FALSE), g)
> dat2 <- data.frame(dat[,1:2], sapply(dat[,3:4], lmc, g=dat[,1]))
> dat2
     Location X..TimePeriod        Units AveragePrice
1 Los Angeles        5/1/11  0.213682659  0.071790268
2 Los Angeles        5/8/11 -0.005370907 -0.072872965
3 Los Angeles       5/15/11 -0.208311751  0.001082696
4    New York        5/1/11  0.235465925  0.101474328
5    New York        5/8/11 -0.090253520 -0.087116841
6    New York       5/15/11 -0.145212404 -0.014357487
7       Paris        5/1/11  0.219331999  0.117331641
8       Paris        5/8/11 -0.048703076 -0.049141723
9       Paris       5/15/11 -0.170628923 -0.068189918

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org]
> On Behalf Of arun
> Sent: Sunday, December 09, 2012 10:27 AM
> To: Ray DiGiacomo, Jr.
> Cc: R help
> Subject: Re: [R] Mean-Centering Question
>
> Hi,
>
> You could also use:
> newFunction1<-function(x) {t(t(log(x))-colMeans(log(x)))}
>
> res1<-by(dat1[c("Units","AveragePrice")],dat1["Location"],newFunction1)
> res1
> #Location: Los Angeles
> #         Units AveragePrice
> #1  0.213682659  0.071790268
> #2 -0.005370907 -0.072872965
> #3 -0.208311751  0.001082696
> #------------------------------------------------------------
> #Location: New York
> #        Units AveragePrice
> #4  0.23546592   0.10147433
> #5 -0.09025352  -0.08711684
> #6 -0.14521240  -0.01435749
> #------------------------------------------------------------
> #Location: Paris
> #        Units AveragePrice
> #7  0.21933200   0.11733164
> #8 -0.04870308  -0.04914172
> #9 -0.17062892  -0.06818992
>
>
> newFunction <- function(x) { sweep(log(x), 2, colMeans(log(x)), "-") }
> res<-by(dat1[c("Units","AveragePrice")],dat1["Location"],newFunction)
> res
> #Location: Los Angeles
> #         Units AveragePrice
> #1  0.213682659  0.071790268
> #2 -0.005370907 -0.072872965
> #3 -0.208311751  0.001082696
> #------------------------------------------------------------
> #Location: New York
> #        Units AveragePrice
> #4  0.23546592   0.10147433
> #5 -0.09025352  -0.08711684
> #6 -0.14521240  -0.01435749
> #------------------------------------------------------------
> #Location: Paris
> #        Units AveragePrice
> #7  0.21933200   0.11733164
> #8 -0.04870308  -0.04914172
> #9 -0.17062892  -0.06818992
>
> #identical(res, res1) will be FALSE, as the list elements of res are
> #data.frames and those of res1 are matrices.
>
> A.K.
>
>
> ----- Original Message -----
> From: "Ray DiGiacomo, Jr." <r...@liondatasystems.com>
> To: R Help <r-help@r-project.org>
> Cc:
> Sent: Saturday, December 8, 2012 11:11 PM
> Subject: Re: [R] Mean-Centering Question
>
> Hi David and Arun,
>
> Thanks for looking into this. I think I have found a solution.
>
> The "by" function will run ok without errors but the values returned in the
> second row of the "Los Angeles" output are both incorrect. These incorrect
> values are shown below in red.
>
> I think my original custom function was causing the incorrect values
> because the subtraction inside the original custom function was subtracting
> frames that had different dimensions and I think there was some "recycling"
> happening.
>
> Using the "sweep" function fixes the problem. This is what I did to fix
> things:
>
> # here is my "new" custom function
> newFunction <- function(x) { sweep(log(x), 2, colMeans(log(x)), "-") }
>
> # this gives the correct values
> by(PullData[c("Units","AveragePrice")],
>    PullData[c("StoreLocation")],
>    newFunction)
>
> - Ray
>
>
> On Sat, Dec 8, 2012 at 7:12 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
> > On Dec 8, 2012, at 3:54 PM, Ray DiGiacomo, Jr. wrote:
> >
> >> Hello,
> >>
> >> I'm trying to create a custom function that "mean-centers" data and can be
> >> applied across many columns.
> >>
> >> Here is an example dataset, which is similar to my dataset:
> >>
> >> dat <- read.table(text="Location,**TimePeriod,Units,AveragePrice
> >> Los Angeles,5/1/11,61,5.42
> >> Los Angeles,5/8/11,49,4.69
> >> Los Angeles,5/15/11,40,5.05
> >> New York,5/1/11,259,6.4
> >> New York,5/8/11,187,5.3
> >> New York,5/15/11,177,5.7
> >> Paris,5/1/11,672,6.26
> >> Paris,5/8/11,514,5.3
> >> Paris,5/15/11,455,5.2", header=TRUE, sep=",")
> >>
> >> I want to mean-center the "Units" and "AveragePrice" columns.
> >>
> >> So, I created this function:
> >>
> >> specialFunction <- function(x){ log(x) - colMeans(log(x), na.rm = T) }
> >>
> >
> > I needed to modify this to avoid errors relating to how colMeans is
> > expecting its arguments:
> >
> > specialFunction2 <- function(x){ log(x) - mean(log(x), na.rm = T) }
> >
> > aggregate(dat[3:4], dat[1], FUN=specialFunction2)
> >
> >      Location   Units.1    Units.2    Units.3 AveragePrice.1 AveragePrice.2
> > 1 Los Angeles 0.2136827 -0.0053709 -0.2083118      0.0717903     -0.0728730
> > 2    New York 0.2354659 -0.0902535 -0.1452124      0.1014743     -0.0871168
> > 3       Paris 0.2193320 -0.0487031 -0.1706289      0.1173316     -0.0491417
> >   AveragePrice.3
> > 1      0.0010827
> > 2     -0.0143575
> > 3     -0.0681899
> >
> >
> >> If I use only "one" column in the first argument of the "by" function,
> >> everything is fine. For example, the following code will work fine:
> >>
> >> by(data[c("Units")],
> >>    data["Location"],
> >>    specialFunction)
> >>
> >> But the following code will "not" work, because I have "two" columns in
> >> the first argument...
> >>
> >> by(data[c("Units", "AveragePrice")],
> >>    data["Location"],
> >>    specialFunction)
> >>
> >
> > OK.
> > So then I tried this with your function and was surprised to see that
> > it also works:
> >
> > > by(dat[c("Units", "AveragePrice")],
> > +    dat["Location"],
> > +    specialFunction)
> > Location: Los Angeles
> >      Units AveragePrice
> > 1  0.21368    0.0717903
> > 2 *2.27351   -2.3517586*
> > 3 -0.20831    0.0010827
> > ----------------------------------------------------------------------
> > Location: New York
> >      Units AveragePrice
> > 4  0.23547     0.101474
> > 5  3.47628    -3.653655
> > 6 -0.14521    -0.014357
> > ----------------------------------------------------------------------
> > Location: Paris
> >      Units AveragePrice
> > 7  0.21933      0.11733
> > 8  4.52537     -4.62322
> > 9 -0.17063     -0.06819
> >
> >
> >> Does anyone have any ideas as to what I am doing wrong?
> >>
> >
> > I guess I don't. Cannot reproduce and my other methods worked as well. This
> > also works with your version and with mine but I get the deprecation
> > message for `mean.data.frame` from mine:
> >
> > > lapply( split(dat[3:4], dat[1]) , FUN=specialFunction )
> > $`Los Angeles`
> >      Units AveragePrice
> > 1  0.21368    0.0717903
> > 2  2.27351   -2.3517586
> > 3 -0.20831    0.0010827
> >
> > $`New York`
> >      Units AveragePrice
> > 4  0.23547     0.101474
> > 5  3.47628    -3.653655
> > 6 -0.14521    -0.014357
> >
> > $Paris
> >      Units AveragePrice
> > 7  0.21933      0.11733
> > 8  4.52537     -4.62322
> > 9 -0.17063     -0.06819
> >
> >
> >> Please note that I'm trying to get the following results (for the "Los
> >> Angeles" group):
> >>
> >> Los Angeles "Units" variable (Mean-Centered)
> >> 0.213682659
> >> -0.005370907
> >> -0.208311751
> >>
> >> Los Angeles "AveragePrice" variable (Mean-Centered)
> >> 0.071790268
> >> -0.072872965
> >> 0.001082696
> >>
> >
> > --
> >
> > David Winsemius, MD
> > Alameda, CA, USA
> >
> >
> >         [[alternative HTML version deleted]]
> >
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
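
The "recycling" that Ray describes in the thread can be reproduced on the Los Angeles rows alone. What follows is only a minimal sketch, not code posted by anyone in the thread; the object names la, m, wrong and right are made up for the illustration. It assumes the usual behaviour of arithmetic between a data frame and a shorter plain vector, where the vector is recycled down the columns, so with three rows and two column means only the middle row ends up with the wrong mean subtracted.

```r
# Los Angeles rows from the example data in the thread
la <- data.frame(Units = c(61, 49, 40),
                 AveragePrice = c(5.42, 4.69, 5.05))

# one mean per column: a named vector of length 2
m <- colMeans(log(la))

# data frame minus a length-2 vector: m is recycled down the columns,
# so row 2 of Units gets the AveragePrice mean subtracted and vice versa
wrong <- log(la) - m

# sweep() subtracts m[j] from every element of column j
right <- sweep(log(la), 2, m, "-")

print(wrong)
print(right)
```

The second row of wrong should match the values flagged in David Winsemius's by() output for Los Angeles, and right should match the target values listed in the original question.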