Hi,

You could also use:

newFunction1 <- function(x) {t(t(log(x)) - colMeans(log(x)))}
res1 <- by(dat1[c("Units","AveragePrice")], dat1["Location"], newFunction1)
res1
#Location: Los Angeles
#         Units AveragePrice
#1  0.213682659  0.071790268
#2 -0.005370907 -0.072872965
#3 -0.208311751  0.001082696
#------------------------------------------------------------
#Location: New York
#        Units AveragePrice
#4  0.23546592   0.10147433
#5 -0.09025352  -0.08711684
#6 -0.14521240  -0.01435749
#------------------------------------------------------------
#Location: Paris
#        Units AveragePrice
#7  0.21933200   0.11733164
#8 -0.04870308  -0.04914172
#9 -0.17062892  -0.06818992

newFunction <- function(x) {
  sweep(log(x), 2, colMeans(log(x)), "-")
}
res <- by(dat1[c("Units","AveragePrice")], dat1["Location"], newFunction)
res
#Location: Los Angeles
#         Units AveragePrice
#1  0.213682659  0.071790268
#2 -0.005370907 -0.072872965
#3 -0.208311751  0.001082696
#------------------------------------------------------------
#Location: New York
#        Units AveragePrice
#4  0.23546592   0.10147433
#5 -0.09025352  -0.08711684
#6 -0.14521240  -0.01435749
#------------------------------------------------------------
#Location: Paris
#        Units AveragePrice
#7  0.21933200   0.11733164
#8 -0.04870308  -0.04914172
#9 -0.17062892  -0.06818992

#identical(res, res1) will be FALSE (see ?identical), because the list elements of res are data frames and those of res1 are matrices.

A.K.

----- Original Message -----
From: "Ray DiGiacomo, Jr." <r...@liondatasystems.com>
To: R Help <r-help@r-project.org>
Cc:
Sent: Saturday, December 8, 2012 11:11 PM
Subject: Re: [R] Mean-Centering Question

Hi David and Arun,

Thanks for looking into this. I think I have found a solution.

The "by" function will run without errors, but the values returned in the second row of the "Los Angeles" output are both incorrect. These incorrect values are highlighted below (between asterisks).
I think my original custom function was causing the incorrect values because the subtraction inside it combined objects of different dimensions (a data frame and a vector of column means), and I think there was some "recycling" happening. Using the "sweep" function fixes the problem.

This is what I did to fix things:

# here is my "new" custom function
newFunction <- function(x) {
  sweep(log(x), 2, colMeans(log(x)), "-")
}

# this gives the correct values
by(PullData[c("Units","AveragePrice")], PullData[c("StoreLocation")], newFunction)

- Ray

On Sat, Dec 8, 2012 at 7:12 PM, David Winsemius <dwinsem...@comcast.net> wrote:

>
> On Dec 8, 2012, at 3:54 PM, Ray DiGiacomo, Jr. wrote:
>
>> Hello,
>>
>> I'm trying to create a custom function that "mean-centers" data and can be
>> applied across many columns.
>>
>> Here is an example dataset, which is similar to my dataset:
>>
>> dat <- read.table(text="Location,TimePeriod,Units,AveragePrice
> Los Angeles,5/1/11,61,5.42
> Los Angeles,5/8/11,49,4.69
> Los Angeles,5/15/11,40,5.05
> New York,5/1/11,259,6.4
> New York,5/8/11,187,5.3
> New York,5/15/11,177,5.7
> Paris,5/1/11,672,6.26
> Paris,5/8/11,514,5.3
> Paris,5/15/11,455,5.2", header=TRUE, sep=",")
>
>> I want to mean-center the "Units" and "AveragePrice" columns.
>>
>> So, I created this function:
>>
>> specialFunction <- function(x){ log(x) - colMeans(log(x), na.rm = T) }
>>
>
> I needed to modify this to avoid errors relating to how colMeans is
> expecting its arguments:
>
> specialFunction2 <- function(x){ log(x) - mean(log(x), na.rm = T) }
>
> aggregate(dat[3:4], dat[1], FUN=specialFunction2)
>
>      Location   Units.1    Units.2    Units.3 AveragePrice.1 AveragePrice.2 AveragePrice.3
> 1 Los Angeles 0.2136827 -0.0053709 -0.2083118      0.0717903     -0.0728730      0.0010827
> 2    New York 0.2354659 -0.0902535 -0.1452124      0.1014743     -0.0871168     -0.0143575
> 3       Paris 0.2193320 -0.0487031 -0.1706289      0.1173316     -0.0491417     -0.0681899
>
>> If I use only "one" column in the first argument of the "by" function,
>> everything is fine. For example, the following code will work fine:
>>
>> by(data[c("Units")],
>>    data["Location"],
>>    specialFunction)
>>
>> But the following code will "not" work, because I have "two" columns in
>> the first argument...
>>
>> by(data[c("Units", "AveragePrice")],
>>    data["Location"],
>>    specialFunction)
>>
>
> OK. So then I tried this with your function and was surprised to see that
> it also works:
>
> > by(dat[c("Units", "AveragePrice")],
> +    dat["Location"],
> +    specialFunction)
> Location: Los Angeles
>      Units AveragePrice
> 1  0.21368    0.0717903
> 2 *2.27351   -2.3517586*
> 3 -0.20831    0.0010827
> ------------------------------------------------------------------
> Location: New York
>      Units AveragePrice
> 4  0.23547     0.101474
> 5  3.47628    -3.653655
> 6 -0.14521    -0.014357
> ------------------------------------------------------------------
> Location: Paris
>      Units AveragePrice
> 7  0.21933      0.11733
> 8  4.52537     -4.62322
> 9 -0.17063     -0.06819
>
>> Does anyone have any ideas as to what I am doing wrong?
>>
>
> I guess I don't. Cannot reproduce and my other methods worked as well. This
> also works with your version and with mine, but I get the deprecation
> message for `mean.data.frame` from mine:
>
> > lapply( split(dat[3:4], dat[1]) , FUN=specialFunction )
> $`Los Angeles`
>      Units AveragePrice
> 1  0.21368    0.0717903
> 2  2.27351   -2.3517586
> 3 -0.20831    0.0010827
>
> $`New York`
>      Units AveragePrice
> 4  0.23547     0.101474
> 5  3.47628    -3.653655
> 6 -0.14521    -0.014357
>
> $Paris
>      Units AveragePrice
> 7  0.21933      0.11733
> 8  4.52537     -4.62322
> 9 -0.17063     -0.06819
>
>> Please note that I'm trying to get the following results (for the "Los
>> Angeles" group):
>>
>> Los Angeles "Units" variable (Mean-Centered)
>> 0.213682659
>> -0.005370907
>> -0.208311751
>>
>> Los Angeles "AveragePrice" variable (Mean-Centered)
>> 0.071790268
>> -0.072872965
>> 0.001082696
>>
>
> --
>
> David Winsemius, MD
> Alameda, CA, USA

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
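
For reference, the recycling discussed in this thread can be reproduced with just the Los Angeles rows of the example data. This is only a sketch for illustration, not code from any of the posters; the names la, logged, col_means, recycled, centered and centered_t are made up for the example.

```r
# Los Angeles rows from the example data posted in the thread
la <- data.frame(Units = c(61, 49, 40),
                 AveragePrice = c(5.42, 4.69, 5.05))

logged    <- log(la)
col_means <- colMeans(logged)   # named numeric vector of length 2

# Original approach: the length-2 vector is recycled down the columns
# (mean1, mean2, mean1 for Units; mean2, mean1, mean2 for AveragePrice),
# so row 2 of each column has the other column's mean subtracted.
recycled <- logged - col_means

# sweep() subtracts col_means[j] from every element of column j
# and returns a data frame.
centered <- sweep(logged, 2, col_means, "-")

# Double-transpose alternative from the thread: after t(), the variables
# are rows, so recycling lines up with them. The result is a matrix.
centered_t <- t(t(logged) - col_means)

print(round(recycled, 5))
print(round(centered, 5))
```

Row 2 of recycled reproduces the 2.27351 and -2.3517586 in David's output, whereas row 2 of centered gives the -0.005370907 and -0.072872965 that Ray was after. centered and centered_t hold the same numbers but differ in class (data frame versus matrix), which is why identical() on the two "by" results returns FALSE.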