Re: [R] Band-wise Conditional Sum - Actual problem

David Winsemius Mon, 30 Aug 2010 07:43:57 -0700


On Aug 30, 2010, at 4:05 AM, Vincy Pyne wrote:

Dear R helpers,
Thanks a lot for your earlier guidance, esp. Mr David Winsemius Sir. However, there seems to be a mis-communication from my end regarding my requirement. As I had mentioned in my earlier mail, I am dealing with a very large database of borrowers and I had given a part of it in my earlier mail as given below. For a given rating, say "A", I needed to have the band-wise sums of ead's (where bands are constructed using the ead size itself) and not the number of borrowers falling in a particular band.
I am reproducing the data and solution as provided by Winsemius Sir (which generates the number of band-wise borrowers for a given rating).
rating <- c("A", "AAA", "A", "BBB", "AA", "A", "BB", "BBB", "AA", "AA", "AA", "A", "A", "AA", "BB", "BBB", "AA", "A", "AAA", "BBB", "BBB", "BB", "A", "BB", "A", "AA", "B", "A", "AA", "BBB", "A", "BBB")
ead <- c(169229.93, 100, 5877794.25, 9530148.63, 75040962.06, 21000, 1028360, 6000000, 17715000, 14430325.24, 1180946.57, 150000, 167490, 81255.16, 54812.5, 3000, 1275702.94, 9100, 1763142.3, 3283048.61, 1200000, 11800, 3000, 96894.02, 453671.72, 7590, 106065.24, 940711.67, 2443000, 9500000, 39000, 1501939.67)
df <- data.frame(rating, ead) # as in my earlier mail
df$ead.cat <- cut(df$ead, breaks=c(0, 100000, 500000, 1000000, 2000000, 5000000, 10000000, 100000000))
df
df_sorted <- df[order(df$rating),] # the output is as given below.
> df_sorted
   rating         ead       ead.cat
1       A   169229.93 (1e+05,5e+05]
3       A  5877794.25 (5e+06,1e+07]
6       A    21000.00     (0,1e+05]
12      A   150000.00 (1e+05,5e+05]
13      A   167490.00 (1e+05,5e+05]
18      A     9100.00     (0,1e+05]
23      A     3000.00     (0,1e+05]
25      A   453671.72 (1e+05,5e+05]
28      A   940711.67 (5e+05,1e+06]
31      A    39000.00     (0,1e+05]
5      AA 75040962.06 (1e+07,1e+08]
9      AA 17715000.00 (1e+07,1e+08]
10     AA 14430325.24 (1e+07,1e+08]
11     AA  1180946.57 (1e+06,2e+06]
14     AA    81255.16     (0,1e+05]
17     AA  1275702.94 (1e+06,2e+06]
26     AA     7590.00     (0,1e+05]
29     AA  2443000.00 (2e+06,5e+06]
2     AAA      100.00     (0,1e+05]
19    AAA  1763142.30 (1e+06,2e+06]
27      B   106065.24 (1e+05,5e+05]
7      BB  1028360.00 (1e+06,2e+06]
15     BB    54812.50     (0,1e+05]
22     BB    11800.00     (0,1e+05]
24     BB    96894.02     (0,1e+05]
4     BBB  9530148.63 (5e+06,1e+07]
8     BBB  6000000.00 (5e+06,1e+07]
16    BBB     3000.00     (0,1e+05]
20    BBB  3283048.61 (2e+06,5e+06]
21    BBB  1200000.00 (1e+06,2e+06]
30    BBB  9500000.00 (5e+06,1e+07]
32    BBB  1501939.67 (1e+06,2e+06]
## The following command fetches the rating-wise and ead-size-wise number of borrowers. Thus, for rating A, there are 4 borrowers in the ead range (0, 1e+05], 4 borrowers in the range (1e+05, 5e+05] and so on...
> with(df, tapply(ead.cat, rating, table))
$A
    (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08]
            4             4             1             0             0             1             0

$AA
    (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08]
            2             0             0             2             1             0             3

$AAA
    (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08]
            1             0             0             1             0             0             0

$B
    (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08]
            0             1             0             0             0             0             0

$BB
    (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08]
            3             0             0             1             0             0             0

$BBB
    (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08]
            1             0             0             2             1             3             0
#### My ACTUAL REQUIREMENT
Actually, for a given rating, I don't want the number of borrowers falling in each of the ead ranges. What I want is the sum of eads falling in each range. Thus, say for rating "A", I need the following.
  rating        ead.cat   ead_total
1      A      (0,1e+05]    72100.00   # (21000 + 9100 + 3000 + 39000)
2      A  (1e+05,5e+05]   940391.65   # (169229.93 + 150000.00 + 167490.00 + 453671.72)


So you just wanted simple sums within rating and ead.cat:

with(df_sorted, tapply(ead, list(rating,ead.cat), sum, na.rm=TRUE))

    (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07]
A    72100.00      940391.6      940711.7            NA            NA       5877794
AA   88845.16            NA            NA       2456650       2443000            NA
AAA    100.00            NA            NA       1763142            NA            NA
B          NA      106065.2            NA            NA            NA            NA
BB  163506.52            NA            NA       1028360            NA            NA
BBB   3000.00            NA            NA       2701940       3283049      25030149

    (1e+07,1e+08]
A              NA
AA      107186287
AAA            NA
B              NA
BB             NA
BBB            NA
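
If the long layout sketched in the question (one row per rating and band, with a column of totals) is preferred over the matrix above, aggregate() is one way to get it. This is only a minimal sketch and not part of the original reply: it rebuilds df from the sample data in this thread, and the names ead_totals and ead_total are illustrative.

```r
# Sample data copied from the thread
rating <- c("A", "AAA", "A", "BBB", "AA", "A", "BB", "BBB", "AA", "AA",
            "AA", "A", "A", "AA", "BB", "BBB", "AA", "A", "AAA", "BBB",
            "BBB", "BB", "A", "BB", "A", "AA", "B", "A", "AA", "BBB",
            "A", "BBB")
ead <- c(169229.93, 100, 5877794.25, 9530148.63, 75040962.06, 21000,
         1028360, 6000000, 17715000, 14430325.24, 1180946.57, 150000,
         167490, 81255.16, 54812.5, 3000, 1275702.94, 9100, 1763142.3,
         3283048.61, 1200000, 11800, 3000, 96894.02, 453671.72, 7590,
         106065.24, 940711.67, 2443000, 9500000, 39000, 1501939.67)
df <- data.frame(rating, ead)
df$ead.cat <- cut(df$ead, breaks = c(0, 1e5, 5e5, 1e6, 2e6, 5e6, 1e7, 1e8))

# One row per (rating, band) combination that actually occurs in the data
ead_totals <- aggregate(ead ~ rating + ead.cat, data = df, FUN = sum)

# Rename the summed column (illustrative name) and sort by rating, then band
names(ead_totals)[names(ead_totals) == "ead"] <- "ead_total"
ead_totals <- ead_totals[order(ead_totals$rating, ead_totals$ead.cat), ]
rownames(ead_totals) <- NULL
ead_totals
```

Unlike the tapply() matrix, this form simply omits combinations with no borrowers instead of showing NA.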

--
David.

and so on.
I am extremely sorry for any mis-communication in my earlier mail. I could test the reply sent to me earlier by Winsemius Sir only today, as I was traveling over the weekend. Also, I have tried to go through earlier emails dealing with such conditional sums. Unfortunately, I couldn't understand them, as I have only recently started my venture with R.
Thanking you in advance, and I sincerely apologize for any mis-communication if it occurred in my earlier mail.
Regards

Vincy


--- On Fri, 8/27/10, David Winsemius <dwinsem...@comcast.net> wrote:

From: David Winsemius <dwinsem...@comcast.net>
Subject: Re: [R] Band-wise Sum
To: "Vincy Pyne" <vincy_p...@yahoo.ca>
Cc: r-help@r-project.org
Received: Friday, August 27, 2010, 2:36 PM


On Aug 27, 2010, at 9:49 AM, Vincy Pyne wrote:

> Hi
>
> I have a large credit portfolio (exceeding 50000 borrowers). For a particular process I need to add up the exposures based on the bands. I am giving a small test data set below.

I would think that cut() would be the accepted method for defining a factor variable based on specified cutpoints. If you then wanted to see what the cumsum() was across the range of possible levels, that too would be a fairly simple task.

df$ead.cat <- cut(df$ead, breaks=c(0, 100000, 500000, 1000000, 2000000, 5000000, 10000000, 100000000))
df
with(df, tapply(ead.cat, rating, length))
#  A  AA AAA   B  BB BBB
# 10   8   2   1   4   7
with(df, tapply(ead.cat, rating, table))
# returns a list of table objects by bond rating

lapply( with(df, tapply(ead.cat, rating, table)) , cumsum)
#returns the cumsum of those tables

# sapply gives a more compact output of that result:
sapply( with(df, tapply(ead.cat, rating, table)) , cumsum)
               A AA AAA B BB BBB
(0,1e+05]      4  2   1 0  3   1
(1e+05,5e+05]  8  2   1 1  3   1
(5e+05,1e+06]  9  2   1 1  3   1
(1e+06,2e+06]  9  4   2 1  4   3
(2e+06,5e+06]  9  5   2 1  4   4
(5e+06,1e+07] 10  5   2 1  4   7
(1e+07,1e+08] 10  8   2 1  4   7
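
The same idea carries over from counts to amounts. The following is a minimal sketch, assuming the sample data from this thread and treating bands with no borrowers as contributing zero; sums and cum_ead are hypothetical names, not from the original mails.

```r
# Sample data copied from the thread
rating <- c("A", "AAA", "A", "BBB", "AA", "A", "BB", "BBB", "AA", "AA",
            "AA", "A", "A", "AA", "BB", "BBB", "AA", "A", "AAA", "BBB",
            "BBB", "BB", "A", "BB", "A", "AA", "B", "A", "AA", "BBB",
            "A", "BBB")
ead <- c(169229.93, 100, 5877794.25, 9530148.63, 75040962.06, 21000,
         1028360, 6000000, 17715000, 14430325.24, 1180946.57, 150000,
         167490, 81255.16, 54812.5, 3000, 1275702.94, 9100, 1763142.3,
         3283048.61, 1200000, 11800, 3000, 96894.02, 453671.72, 7590,
         106065.24, 940711.67, 2443000, 9500000, 39000, 1501939.67)
df <- data.frame(rating, ead)
df$ead.cat <- cut(df$ead, breaks = c(0, 1e5, 5e5, 1e6, 2e6, 5e6, 1e7, 1e8))

# Band-wise ead sums: ratings in rows, bands in columns, NA where a band is empty
sums <- with(df, tapply(ead, list(rating, ead.cat), sum))

# Assumption: an empty band adds nothing to the running total
sums[is.na(sums)] <- 0

# Cumulate across bands within each rating; apply() returns bands in rows
# and ratings in columns, the same layout as the sapply() result above
cum_ead <- apply(sums, 1, cumsum)
cum_ead
```

The last row of cum_ead should equal the total ead for each rating, which gives a quick sanity check.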

Loops, you say we need loops? We don't need no stinkin' loops.

--David.

>
> rating <- c("A", "AAA", "A", "BBB","AA","A","BB", "BBB", "AA","AA", "AA", "A", "A", "AA","BB","BBB","AA", "A", "AAA","BBB","BBB","BB", "A", "BB", "A", "AA", "B","A", "AA", "BBB", "A", "BBB")
>
> ead <- c(169229.93,100, 5877794.25, 9530148.63, 75040962.06,21000, 1028360, 6000000, 17715000, 14430325.24, 1180946.57,150000, 167490, 81255.16, 54812.5, 3000, 1275702.94, 9100,1763142.3, 3283048.61, 1200000, 11800, 3000, 96894.02, 453671.72,7590, 106065.24, 940711.67, 2443000, 9500000, 39000, 1501939.67)
>
> ## First I have sorted the data rating-wise as
>
> df <- data.frame(rating, ead)
>
> df_sorted <- df[order(df$rating),]
>
> df_sorted_AAA <- subset(df_sorted, rating=="AAA")
> df_sorted_AA <- subset(df_sorted, rating=="AA")
> df_sorted_A <- subset(df_sorted, rating=="A")
> df_sorted_BBB <- subset(df_sorted, rating=="BBB")
> df_sorted_BB <- subset(df_sorted, rating=="BB")
> df_sorted_B <- subset(df_sorted, rating=="B")
> df_sorted_CCC <- subset(df_sorted, rating=="CCC")
>
> ## we begin with BBB rating. The R output for df_sorted_BBB is as follows
>
>> df_sorted_BBB
>       rating      ead
> 4     BBB      9530149
> 8     BBB      6000000
> 16    BBB     3000
> 20    BBB     3283049
> 21    BBB     1200000
> 30    BBB     9500000
> 32    BBB     1501940
>
> My problem is I need the totals of eads falling in the respective bands.
>
> I am defining bands in millions as
>
> seq_BBB <- seq(1000000, max(df_sorted_BBB$ead), by = 1000000)
>
> # The output is
> [1] 1e+06 2e+06 3e+06 4e+06 5e+06 6e+06 7e+06 8e+06 9e+06
>
> So for the sub data pertaining to Rating "BBB", I want the corresponding ead totals, i.e. I want ead totals where ead < 1e+06, then I want ead totals where 1e+06 < ead < 2e+06, 2e+06 < ead < 3e+06 ... and so on.
>
> I have tried the following code
>
> s_BBB <- NULL
>
> for (i in 1:length(s_BBB))
> {
> s_BBB[i] = sum(subset(df_sorted_BBB$ead, df_sorted_BBB$ead <s_BBB[i]))
> }
>
> I was trying to find totals of eads < 1e+06, ead < 2e+06, ead < 3e+06 and so on.
>
> but the result is
>
>> s_BBB
> [1] 0
>
>
> I apologize if I am not able to express my problem properly. My only objective is first to sort the whole portfolio rating-wise and then, within each of these rating-wise sorted data sets, I wish to find out the total of eads based on various bands starting <1000000, 1000000 - 2000000, 2000000 - 3000000, 3000000 - 4000000 and so on. Since the database contains more than 50000 records, ead amounts ranging from a few thousand to billions are present.
>
> Please guide
>
> Thanking  you all in advance
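
On the loop quoted above: s_BBB starts as NULL, so length(s_BBB) is 0 and 1:length(s_BBB) runs over c(1, 0); the comparison is also made against s_BBB[i] rather than seq_BBB[i], which appears to be why the result is a single 0. A vectorised sketch for the million-wide bands on the "BBB" subset follows. It assumes right-closed bands starting at 0, and band_edges, band_totals and running_totals are illustrative names.

```r
# ead values for rating "BBB", copied from the sample data in the thread
ead_BBB <- c(9530148.63, 6000000, 3000, 3283048.61, 1200000, 9500000,
             1501939.67)

# Band edges every 1e6, from 0 up to the first multiple of 1e6 at or
# above the largest ead, so that every value falls inside some band
band_edges <- seq(0, ceiling(max(ead_BBB) / 1e6) * 1e6, by = 1e6)

# cut() uses right-closed intervals by default: 6000000 lands in (5e6, 6e6]
bands <- cut(ead_BBB, breaks = band_edges, dig.lab = 10)

# Total ead within each band; bands with no borrowers come back as NA
band_totals <- tapply(ead_BBB, bands, sum)
band_totals

# Running totals (ead <= 1e6, ead <= 2e6, ...), close to what the loop attempted
running_totals <- sapply(band_edges[-1],
                         function(edge) sum(ead_BBB[ead_BBB <= edge]))
running_totals
```

For the full portfolio the same cut() call can be applied to df$ead and combined with tapply(ead, list(rating, band), sum), so no per-rating subsets are needed.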


David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
