Peter,

For suggestion 1, what advantages are there to using coef() rather than $coef?
For suggestion 2, thanks! I'm new to the plyr package and wasn't aware of the mutate() function.

Jean

On Wed, Apr 3, 2013 at 1:01 PM, Peter Ehlers <ehl...@ucalgary.ca> wrote:

> A few minor improvements to Jean's post suggested inline below.
>
> On 2013-04-03 05:41, Adams, Jean wrote:
>
>> Cecilia,
>>
>> Thanks for providing a reproducible example. Excellent.
>>
>> You could use the ddply() function in the plyr package to fit the model
>> for each industry and year, keep the coefficients, and then estimate the
>> fitted and residual values.
>>
>> Jean
>>
>> library(plyr)
>> coef <- ddply(final3, .(industry, year), function(dat) lm(Y ~ X + Z, data=dat)$coef)
>> names(coef) <- c("industry", "year", "b0", "b1", "b2")
>> final4 <- merge(final3, coef)
>> newdata1 <- transform(final4, Yhat = b0 + b1*X + b2*Z)
>> newdata2 <- transform(newdata1, residual = Y-Yhat)
>> plot(as.factor(newdata2$firm), newdata2$residual)
>
> Suggestion 1:
> Use the extractor function coef() and also avoid using the name
> of an R function as a variable name:
>
>   Coef <- ddply(...., function(dat) coef(lm(....)))
>
> Suggestion 2:
> Use plyr's mutate() to do both transforms at once:
>
>   newdata <- mutate(final4,
>                     Yhat = b0 + b1*X + b2*Z,
>                     residual = Y-Yhat)
>
> [Or you could use within(), but I now find mutate handier, mainly
> because it doesn't 'reverse' the order of the new variables.]
>
> Suggestion 3:
> Use the 'data=' argument in the plot:
>
>   boxplot(residual ~ firm, data = newdata)
>
> Peter Ehlers
>
>> On Wed, Apr 3, 2013 at 3:38 AM, Cecilia Carmo <cecilia.ca...@ua.pt>
>> wrote:
>>
>>> Hi R-helpers,
>>>
>>> My real data is a panel (unbalanced and with gaps in years) of thousands
>>> of firms, by year and industry, and with financial information (variables
>>> X, Y, Z, for example). The number of firms by year and industry is not
>>> always equal, and the number of years by industry is not always equal.
>>>
>>> #reproducible example
>>> firm1<-sort(rep(1:10,5),decreasing=F)
>>> year1<-rep(2000:2004,10)
>>> industry1<-rep(20,50)
>>> X<-rnorm(50)
>>> Y<-rnorm(50)
>>> Z<-rnorm(50)
>>> data1<-data.frame(firm1,year1,industry1,X,Y,Z)
>>> data1
>>> colnames(data1)<-c("firm","year","industry","X","Y","Z")
>>>
>>> firm2<-sort(rep(11:15,3),decreasing=F)
>>> year2<-rep(2001:2003,5)
>>> industry2<-rep(30,15)
>>> X<-rnorm(15)
>>> Y<-rnorm(15)
>>> Z<-rnorm(15)
>>> data2<-data.frame(firm2,year2,industry2,X,Y,Z)
>>> data2
>>> colnames(data2)<-c("firm","year","industry","X","Y","Z")
>>>
>>> firm3<-sort(rep(16:20,4),decreasing=F)
>>> year3<-rep(2001:2004,5)
>>> industry3<-rep(40,20)
>>> X<-rnorm(20)
>>> Y<-rnorm(20)
>>> Z<-rnorm(20)
>>> data3<-data.frame(firm3,year3,industry3,X,Y,Z)
>>> data3
>>> colnames(data3)<-c("firm","year","industry","X","Y","Z")
>>>
>>> final1<-rbind(data1,data2)
>>> final2<-rbind(final1,data3)
>>> final2
>>> final3<-final2[order(final2$industry,final2$year),]
>>> final3
>>>
>>> I need to estimate a linear model Y = b0 + b1X + b2Z by industry and
>>> year, to obtain the estimates of b0, b1 and b2 by industry and year (for
>>> example, I need to have the b0 for industry 20 and year 2000, for
>>> industry 20 and year 2001...). Then I need to calculate the fitted values
>>> and the residuals by firm, so I need to keep b0, b1 and b2 in a way that
>>> I could do something like
>>>
>>> newdata1<-transform(final3,Y'=b0+b1.X+b2.Z)
>>> newdata2<-transform(newdata1,residual=Y-Y')
>>>
>>> or another way to keep Y' and the residuals in a dataframe with the
>>> columns firm and year.
>>>
>>> Until now I have been doing this in a very hard way, and because I need
>>> to do it several times, I need your help to get an easier way.
>>>
>>> Thank you,
>>>
>>> Cecília Carmo
>>> Universidade de Aveiro
>>> Portugal
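For reference, the three suggestions in the thread can be combined into one short script. What follows is only a sketch, not code posted by anyone above: the data frame `toy` is a hypothetical stand-in for `final3`, the seed is arbitrary, and the names `fits`, `one` and `direct` are made up for illustration. It assumes the plyr package is installed. On the coef() question, the last lines show that for an lm fit `$coef` gives the same values as coef(); `$coef` works only because `$` partially matches the list component `coefficients`, whereas coef() is the generic extractor and does not depend on how the object is stored.

```r
library(plyr)  # assumes plyr is installed

set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Hypothetical stand-in for final3: two industries with different year ranges
toy <- rbind(
  data.frame(firm = rep(1:6, each = 3), year = rep(2000:2002, 6), industry = 20),
  data.frame(firm = rep(7:11, each = 2), year = rep(2001:2002, 5), industry = 30)
)
toy$X <- rnorm(nrow(toy))
toy$Y <- rnorm(nrow(toy))
toy$Z <- rnorm(nrow(toy))

# Suggestion 1: one lm() per industry-year, coefficients pulled with coef(),
# and the result not stored under the name of an R function
Coef <- ddply(toy, .(industry, year),
              function(dat) coef(lm(Y ~ X + Z, data = dat)))
names(Coef) <- c("industry", "year", "b0", "b1", "b2")

# Suggestion 2: merge the coefficients back and add both columns in one mutate()
fits <- mutate(merge(toy, Coef),
               Yhat = b0 + b1 * X + b2 * Z,
               residual = Y - Yhat)

# Suggestion 3: use the formula interface with 'data='
boxplot(residual ~ firm, data = fits)

# Cross-check one group against a direct fit
one <- subset(fits, industry == 20 & year == 2000)
direct <- lm(Y ~ X + Z, data = one)
all.equal(one$residual, unname(residuals(direct)))

# coef() versus $coef on the same fit
identical(coef(direct), direct$coef)
```

The cross-check refits a single industry-year group directly and compares its residuals with the ones computed from the merged coefficients, which is a cheap way to confirm that the merge lined up the right b0, b1 and b2 with each row.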