Dear R helpers, Thanks a lot for your earlier guidance esp. Mr Davind Winsemius Sir. However, there seems to be mis-communication from my end corresponding to my requirement. As I had mentioned in my earlier mail, I am dealing with a very large database of borrowers and I had given a part of it in my earlier mail as given below. For a given rating say "A", I needed to have the bad-wise sums of ead's (where bands are constructed using the ead size itself.) and not the number of borrowers falling in a particular band.
I am reproducing the data and solution as provided by Winsemius Sir (which generates the number of band-wise borrowers for a given rating. rating <- c("A", "AAA", "A", "BBB","AA","A","BB", "BBB", "AA", "AA", "AA", "A", "A", "AA","BB","BBB","AA", "A", "AAA","BBB","BBB", "BB", "A", "BB", "A", "AA", "B","A", "AA", "BBB", "A", "BBB") ead <- c(169229.93,100, 5877794.25, 9530148.63, 75040962.06, 21000, 1028360, 6000000, 17715000, 14430325.24, 1180946.57, 150000, 167490, 81255.16, 54812.5, 3000, 1275702.94, 9100, 1763142.3, 3283048.61, 1200000, 11800, 3000, 96894.02, 453671.72, 7590, 106065.24, 940711.67, 2443000, 9500000, 39000, 1501939.67) df$ead.cat <- cut(df$ead, breaks=c(0, 100000, 500000, 1000000, 2000000, 5000000 , 10000000, 100000000) ) df df_sorted <- df[order(df$rating),] # the output is as given below. > df_sorted rating ead ead.cat 1 A 169229.93 (1e+05,5e+05] 3 A 5877794.25 (5e+06,1e+07] 6 A 21000.00 (0,1e+05] 12 A 150000.00 (1e+05,5e+05] 13 A 167490.00 (1e+05,5e+05] 18 A 9100.00 (0,1e+05] 23 A 3000.00 (0,1e+05] 25 A 453671.72 (1e+05,5e+05] 28 A 940711.67 (5e+05,1e+06] 31 A 39000.00 (0,1e+05] 5 AA 75040962.06 (1e+07,1e+08] 9 AA 17715000.00 (1e+07,1e+08] 10 AA 14430325.24 (1e+07,1e+08] 11 AA 1180946.57 (1e+06,2e+06] 14 AA 81255.16 (0,1e+05] 17 AA 1275702.94 (1e+06,2e+06] 26 AA 7590.00 (0,1e+05] 29 AA 2443000.00 (2e+06,5e+06] 2 AAA 100.00 (0,1e+05] 19 AAA 1763142.30 (1e+06,2e+06] 27 B 106065.24 (1e+05,5e+05] 7 BB 1028360.00 (1e+06,2e+06] 15 BB 54812.50 (0,1e+05] 22 BB 11800.00 (0,1e+05] 24 BB 96894.02 (0,1e+05] 4 BBB 9530148.63 (5e+06,1e+07] 8 BBB 6000000.00 (5e+06,1e+07] 16 BBB 3000.00 (0,1e+05] 20 BBB 3283048.61 (2e+06,5e+06] 21 BBB 1200000.00 (1e+06,2e+06] 30 BBB 9500000.00 (5e+06,1e+07] 32 BBB 1501939.67 (1e+06,2e+06] ## The following command fetches rating-wise and ead size no of borrowers. Thus, for rating A, there are 4 borrowers in the ead range (0, 1e+05], 4 borrowers in the range (1e+05 to 5e+05] and so on...... > with(df, tapply(ead.cat, rating, table)) $A (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 4 4 1 0 0 1 0 $AA (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 2 0 0 2 1 0 3 $AAA (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 1 0 0 1 0 0 0 $B (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 0 1 0 0 0 0 0 $BB (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 3 0 0 1 0 0 0 $BBB (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 1 0 0 2 1 3 0 #### My ACTUAL REQUIREMENT Actually for a given rating, I don't want the number of borrowers falling in each of the ead_range. What I want is sum of eads falling in each range. Thus, say for rating "A", I need following. rating ead.cat ead_total 1 A (0,1e+05] 72100.00 # (21000+9100+3000+39000) 2 A (1e+05, 5e+05] 940391.65 #(169229.93+150000.00+167490.00+453671.72) and so on. I am extremely sorry for any mis-communication in my earlier mail. I could test the reply sent to me earlier by Winsemius Sir only today as I was traveling over weekends. Also, I have tried to go through earlier emails dealing with such conditional sums. Unfortunately, I couldn't understand as I have recently started my venture with R. Thanking you in advance and sincerely apologize for any mis-communication if it had occurred in my earlier mail. Regards Vincy --- On Fri, 8/27/10, David Winsemius <dwinsem...@comcast.net> wrote: From: David Winsemius <dwinsem...@comcast.net> Subject: Re: [R] Band-wise Sum To: "Vincy Pyne" <vincy_p...@yahoo.ca> Cc: r-help@r-project.org Received: Friday, August 27, 2010, 2:36 PM On Aug 27, 2010, at 9:49 AM, Vincy Pyne wrote: > Hi > > I have a large credit portfolio (exceeding 50000 borrowers). For particular > process I need to add up the exposures based on the bands. I am giving a > small test data below. I would think that cut() would be the accepted method for defining a factor variable based on specified cutpoints. If you then wanted to see what the cumsum() was across the range of possible levels, that to would be a fairly simple task. df$ead.cat <- cut(df$ead, breaks=c(0, 100000, 500000, 1000000, 2000000, 5000000 , 10000000, 100000000) ) df with(df, tapply(ead.cat, rating, length)) # A AA AAA B BB BBB # 10 8 2 1 4 7 with(df, tapply(ead.cat, rating, table)) # returns a list of table objects by bond rating lapply( with(df, tapply(ead.cat, rating, table)) , cumsum) #returns the cumsum of those tables # sapply gives a more compact output of that result: sapply( with(df, tapply(ead.cat, rating, table)) , cumsum) A AA AAA B BB BBB (0,1e+05] 4 2 1 0 3 1 (1e+05,5e+05] 8 2 1 1 3 1 (5e+05,1e+06] 9 2 1 1 3 1 (1e+06,2e+06] 9 4 2 1 4 3 (2e+06,5e+06] 9 5 2 1 4 4 (5e+06,1e+07] 10 5 2 1 4 7 (1e+07,1e+08] 10 8 2 1 4 7 Loops, you say we need loops? We don't need no stinkin' loops. --David. > > rating <- c("A", "AAA", "A", "BBB","AA","A","BB", "BBB", "AA", "AA", "AA", > "A", "A", "AA","BB","BBB","AA", "A", "AAA","BBB","BBB", "BB", "A", "BB", "A", > "AA", "B","A", "AA", "BBB", "A", "BBB") > > ead <- c(169229.93,100, 5877794.25, 9530148.63, 75040962.06, 21000, 1028360, > 6000000, 17715000, 14430325.24, 1180946.57, 150000, 167490, 81255.16, 54812.5, 3000, 1275702.94, 9100, 1763142.3, 3283048.61, 1200000, 11800, 3000, 96894.02, 453671.72, 7590, 106065.24, 940711.67, 2443000, 9500000, 39000, 1501939.67) > > ## First I have sorted the data rating-wise as > > df <- data.frame(rating, ead) > > df_sorted <- > df[order(df$rating),] > > df_sorted_AAA <- subset(df_sorted, rating=="AAA") > df_sorted_AA <- subset(df_sorted, rating=="AA") > df_sorted_A <- subset(df_sorted, rating=="A") > df_sorted_BBB <- subset(df_sorted, rating=="BBB") > df_sorted_BB <- subset(df_sorted, rating=="BB") > df_sorted_B <- subset(df_sorted, rating=="B") > df_sorted_CCC <- subset(df_sorted, rating=="CCC") > > ## we begin with BBB rating. The R output for df_sorted_BBB is as follows > >> df_sorted_BBB > rating ead > 4 BBB 9530149 > 8 BBB 6000000 > 16 BBB 3000 > 20 BBB 3283049 > 21 BBB 1200000 > 30 BBB 9500000 > 32 BBB 1501940 > > My problem is I need to totals of eads falling in the respective bands > > I > am defining bands in millions as > > seq_BBB <- seq(1000000, max(df_sorted_BBB$ead), by = 1000000) > > # The output is > [1] 1e+06 2e+06 3e+06 4e+06 5e+06 6e+06 7e+06 8e+06 9e+06 > > So for the sub data pertaining to Rating "BBB", I want corresponding ead > totals i.e. I want ead totals where ead < 1e+06, then I want ead totals where 1+e06 < ead < 2e+06, 2e+06 < ead < 3e+06 ...and so on. > > I have tried the following code > > s_BBB <- NULL > > for (i in 1:length(s_BBB)) > { > s_BBB[i] = sum(subset(df_sorted_BBB$ead, df_sorted_BBB$ead < s_BBB[i])) > } > > I was trying to find totals ofads < 1e+06, ead < 2e+06, ead<3e+06and so on. > > but the result is > >> s_BBB > [1] 0 > > > I apologize if I am not able to express my problem properly. My only > objective is first to sort the whole portfolio rating-wise and then within > each of these rating-wise sorted data, I wish to find out total of eads based > on various bands starting <1000000, 1000000 - 200000, 2000000 - 3000000, > 3000000 - 4000000 and so on. Since the database contains more than 50000 > records, various ead amounts ranging from few 000's to billion are available. > > Please guide > > Thanking you all in advance > > Vincy > > > > > > > > > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT [[alternative HTML version deleted]]
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.