For Sectors the results are correct, but for some (Country, Industry, FISCALYEAR) combinations or (Country, FISCALYEAR) combinations the result don't match the spreadsheet (Excel) computation, so verifying from experts, whether I am using ddply correctly with the right intention?
On Fri, Mar 1, 2013 at 2:38 PM, John Kane <jrkrid...@inbox.com> wrote: > Is there any chance you meant > ddply (dat1 , .(Sector, FISCALYEAR), summarise, WROE=wavg(ROE, MKT)) > ?? > > It gives a result but I have no idea if it makes sense. > > John Kane > Kingston ON Canada > > >> -----Original Message----- >> From: anandpu...@gmail.com >> Sent: Fri, 1 Mar 2013 14:13:37 -0500 >> To: r-help@r-project.org >> Subject: Re: [R] Conditional Weighted Average (ddply or any other >> function) >> >> Hi John, >> >> The sample size is huge involving 10,000 + firms. I have put a >> representative sample using dput ( Name, ticker and country have been >> changed so that firms cannot be identified due to proprietary data >> set, also EPS is not required and removed from the dataset) >> >> structure(list(NAME = structure(c(8L, 8L, 8L, 8L, 8L, 8L, 7L, >> 7L, 7L, 7L, 7L, 7L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, >> 6L, 6L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, >> 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CCC", "CTAX", >> "INN", "NOB", "SH", "SZ", "WASH", "WILLSON"), class = "factor"), >> Ticker = structure(c(7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, >> 8L, 8L, 8L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, >> 6L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, >> 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CC13", >> "CT56", "INN12", "NB12", "SH12", "SZ12", "W12", "W15"), class = >> "factor"), >> Industry = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, >> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, >> 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 5L, 5L, >> 5L, 5L, 5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Commercial & >> Professional Serv", >> "Energy", "Media", "Retail", "Transportation"), class = "factor"), >> Sector = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, >> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, >> 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, >> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Consumer >> Discretionary", >> "Energy", "Industrials"), class = "factor"), Country = >> structure(c(4L, >> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, >> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, >> 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, >> 2L, 2L, 2L), .Label = c("Brazil", "China", "India", "UK"), class = >> "factor"), >> FISCALYEAR = structure(c(3L, 2L, 1L, 4L, 5L, 6L, 3L, 2L, >> 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L, 3L, 2L, 1L, 4L, >> 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L, >> 2L, 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L), .Label = c("FY-1", >> "FY-2", "FY-3", "FY0", "FY1", "FY2"), class = "factor"), >> ROE = c(0.026, 0.0656, 0.1621, 0.1885, 0.1968, 0.2126, 0.0207, >> 0.0319, 0.0963, 0.0431, 0.066, 0.066, 0.0707, 0.0797, 0.0781, >> 0.078, 0.098, 0.126, 0.0352, 0.2625, 0.3714, 0.2929, 0.3133, >> 0.2509, 0.2398, 0.2779, 0.1109, 0.0509, 0.069, 0.1017, 0.1298, >> 0.5842, 0.3953, 0.4429, 0.3616, 0.26, 0.2, 0.4472, 0.2912, >> 0.21, 0.2849, 0.3553, 0.4347, 0.3289, 0.3846, 0.2643, 0.0458, >> 0.1265, 0.28), MKT = c(2919236084, 836858582, 2015182617, >> 3399344971, 4324821777, 4324821777, 7619453125, 3579844727, >> 4132238281, 3712239990, 2879757813, 2879757813, 1525237793, >> 700357605, 1814942993, 1858225342, 1242890503, 1242890503, >> 1879700000, 557093400, 224900300, 1634700000, 1443200000, >> 3582664735, 3582664735, 5830366211, 10660833984, 9024061523, >> 7628660645, 9154108398, 9154108398, 7064532227, 1804380005, >> 6331067871, 10445639648, 9153587891, 9153587891, 6231200000, >> 4.078e+09, 10107500000, 12460300000, 17800051556, 17800051556, >> 513478700, 260993500, 882575400, 1.151e+09, 855938413, 855938413 >> )), .Names = c("NAME", "Ticker", "Industry", "Sector", "Country", >> "FISCALYEAR", "ROE", "MKT"), class = "data.frame", row.names = c(NA, >> -49L)) >> >> Thanks, >> Punit >> >>> On Fri, Mar 1, 2013 at 12:51 PM, John Kane <jrkrid...@inbox.com> wrote: >>> See below >>> >>> >>>> -----Original Message----- >>>> From: anandpu...@gmail.com >>>> Sent: Fri, 1 Mar 2013 12:36:53 -0500 >>>> To: jrkrid...@inbox.com >>>> Subject: Re: [R] Conditional Weighted Average (ddply or any other >>>> function) >>>> >>>> Hi John, >>>> >>>> I was using symbols, Column ROE, EPS, MKTCAP are numeric, Name, >>>> Ticker, Sector, Country, FISCALYEAR or Year are character strings. >>>> >>>> and column "Year" is referring to "FISCALYEAR" >>>> >>> Definitely a no-no in R-help. :) We really need some representative >>> sample data to play with. See >>> https://github.com/hadley/devtools/wiki/Reproducibility for some general >>> pointers on how to compose a good question. The fact that you included >>> the code you are using was excellent but without some data it is rather >>> useless. >>> >>> The easiest way to supply data is to use the dput() function. Example >>> with your file named "testfile": >>> dput(testfile) >>> Then copy the output and paste into your email. This is what I did with >>> your data that I pasted into my email . I added the dat1 <- to it. >>> >>> For large data sets, you can just supply a representative sample. >>> Usually, dput(head(testfile, 100)) will be sufficient. >>> >>> I hope this is of some help. >>> >>> >>>> >>>> On Fri, Mar 1, 2013 at 12:31 PM, John Kane <jrkrid...@inbox.com> wrote: >>>>> It is not at all clear what you are doing. You state that the data >>>>> set >>>>> you are using is what I have called dat1 : see dput form below. >>>>> >>>>> As far as I can see there is no numerical value in there. >>>>> >>>>> ##===========data set in dput form================# >>>>> dat1 <- structure(list(Name = c("N1", "N1", "N1", "N1", "N1", "N1", >>>>> "N1", >>>>> "N2", "N2", "N2", "N2", "N2", "N2", "N2"), Ticker = c("T1", >>>>> "T1", >>>>> "T1", "T1", "T1", "T1", "T1", "T2", "T2", "T2", "T2", "T2", >>>>> "T2", >>>>> "T2"), Sector = c("S1", "S1", "S1", "S1", "S1", "S1", "S1", >>>>> "S2", >>>>> "S2", "S2", "S2", "S2", "S2", "S2"), Industry = c("I1", >>>>> "I1", >>>>> "I1", "I1", "I1", "I1", "I1", "I2", "I2", "I2", "I2", "I2", "I2", >>>>> "I2"), Country = c("C1", "C1", "C1", "C1", "C1", "C1", "C1", >>>>> "C2", "C2", "C2", "C2", "C2", "C2", "C2"), Year = c("FY-4", >>>>> "FY-3", >>>>> "FY-2", "FY-1", "FY0", "FY1", "FY2", "FY-4", "FY-3", "FY-2", >>>>> "FY-2", "FY0", "FY2", "FY2"), ROE = c("ROE11", "ROE12", >>>>> "ROE13", >>>>> "ROE14", "ROE15", "ROE16", "ROE17", "ROE21", "ROE22", >>>>> "ROE23", >>>>> "ROE24", "ROE25", "ROE26", "ROE27"), EPS = c("EPS11", >>>>> "EPS12", >>>>> "EPS13", "EPS14", "EPS15", "EPS16", "EPS17", "EPS21", >>>>> "EPS22", >>>>> "EPS23", "EPS24", "EPS25", "EPS26", "EPS27"), MKTCAP = >>>>> c("MKT11", >>>>> "MKT12", "MKT13", "MKT14", "MKT15", "MKT16", "MKT17", >>>>> "MKT21", >>>>> "MKT22", "MKT23", "MKT24", "MKT25", "MKT26", "MKT27")), >>>>> .Names >>>>> = c("Name", >>>>> "Ticker", "Sector", "Industry", "Country", "Year", "ROE", >>>>> "EPS", >>>>> "MKTCAP"), class = "data.frame", row.names = c(NA, -14L)) >>>>> ## =================end of dataset==================# >>>>> >>>>> There is no FISCALYEAR variable that you specifed below >>>>> >>>>>> ddply (dataread , .(Sector, FISCALYEAR), summarise, > WROE=wavg(ROE, >>>>>> MKTCAP))) >>>>> >>>>> I think we need a bit more information. >>>>> >>>>> John Kane >>>>> Kingston ON Canada >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: anandpu...@gmail.com >>>>>> Sent: Fri, 1 Mar 2013 11:01:42 -0500 >>>>>> To: r-help@r-project.org >>>>>> Subject: [R] Conditional Weighted Average (ddply or any other >>>>>> function) >>>>>> >>>>>> Hello R community, >>>>>> >>>>>> I am computing weighted average statistic by using ddply function: >>>>>> >>>>>> My data set is: >>>>>> N1 T1 S1 I1 C1 FY-4 ROE11 EPS11 MKT11 >>>>>> N1 T1 S1 I1 C1 FY-3 ROE12 EPS12 MKT12 >>>>>> N1 T1 S1 I1 C1 FY-2 ROE13 EPS13 MKT13 >>>>>> N1 T1 S1 I1 C1 FY-1 ROE14 EPS14 MKT14 >>>>>> N1 T1 S1 I1 C1 FY0 ROE15 EPS15 MKT15 >>>>>> N1 T1 S1 I1 C1 FY1 ROE16 EPS16 MKT16 >>>>>> N1 T1 S1 I1 C1 FY2 ROE17 EPS17 MKT17 >>>>>> N2 T2 S2 I2 C2 FY-4 ROE21 EPS21 MKT21 >>>>>> N2 T2 S2 I2 C2 FY-3 ROE22 EPS22 MKT22 >>>>>> N2 T2 S2 I2 C2 FY-2 ROE23 EPS23 MKT23 >>>>>> N2 T2 S2 I2 C2 FY-2 ROE24 EPS24 MKT24 >>>>>> N2 T2 S2 I2 C2 FY0 ROE25 EPS25 MKT25 >>>>>> N2 T2 S2 I2 C2 FY2 ROE26 EPS26 MKT26 >>>>>> N2 T2 S2 I2 C2 FY2 ROE27 EPS27 MKT27 >>>>>> >>>>>> with colnames: >>>>>> (Name,Ticker,Sector,Industry,Country,Year,ROE,EPS,MKTCAP) >>>>>> >>>>>> I want to compute >>>>>> 1) Weighted ROE based on Sector and Fiscal Year. >>>>>> For firm N1 of Sector S1 and Fiscalyear FY-3 weight is >>>>>> MKT1 / SUM(MKT, where Sector = S1, Fiscalyear FY-3) >>>>>> >>>>>> 2) Weighted ROE based on Country and Fiscal Year. >>>>>> For firm N1 of Country C1 and Fiscalyear FY-3 weight is >>>>>> MKT1 / SUM(MKT, where Country = C1, Fiscalyear FY-3) >>>>>> >>>>>> 3) Weighted ROE based on Country, Sector and Fiscal Year. >>>>>> For firm N1 of Country C1, Sector S1 and Fiscalyear FY-3 >>>>>> weight is MKT1 / SUM(MKT, where Country = C1, Sector = S1, Fiscalyear >>>>>> FY-3) >>>>>> >>>>>> 4) Weighted ROE based on Country, Industry and Fiscal Year. >>>>>> For firm N1 of Country C1, Industry I1 and Fiscalyear FY-3 >>>>>> weight is MKT1 / SUM(MKT, where Country = C1, Industry = I1, >>>>>> Fiscalyear >>>>>> FY-3) >>>>>> >>>>>> >>>>>> I tried using ddply function: >>>>>> ddply (dataread , .(Sector, FISCALYEAR), summarise, WROE=wavg(ROE, >>>>>> MKTCAP))) >>>>>> >>>>>> where wavg <- function(x, wt) x %*% wt/sum(wt) >>>>>> but this doesn't give me the right answer. >>>>>> >>>>>> I could try subseting the data into different sectors and compute the >>>>>> weighted average which doesn't look like an elegant solution and >>>>>> would >>>>>> defeat the purpose of ddply >>>>>> >>>>>> I coudn't think of properly using melt and cast functions to solve >>>>>> this issue. Any help will be highly appreciated. >>>>>> >>>>>> Thanks and Regards, >>>>>> Punit >>>>>> >>>>>> ______________________________________________ >>>>>> R-help@r-project.org mailing list >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>>> PLEASE do read the posting guide >>>>>> http://www.R-project.org/posting-guide.html >>>>>> and provide commented, minimal, self-contained, reproducible code. >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > ____________________________________________________________ > GET FREE SMILEYS FOR YOUR IM & EMAIL - Learn more at > http://www.inbox.com/smileys > Works with AIM®, MSN® Messenger, Yahoo!® Messenger, ICQ®, Google Talk™ and > most webmails > > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.