Re: [R] Band-wise Conditional Sum - Actual problem

Vincy Pyne Mon, 30 Aug 2010 01:07:26 -0700

Dear R helpers,

Thanks a lot for your earlier guidance esp. Mr Davind Winsemius Sir. However, 
there seems to be mis-communication from my end corresponding to my 
requirement. As I had mentioned in my earlier mail, I am dealing with a very 
large database of borrowers and I had given a part of it in my earlier mail as 
given below. For a given rating say "A", I needed to have the bad-wise sums of
 ead's (where bands are constructed using the ead size itself.) and not the 
number of borrowers falling in a particular band.


I am reproducing the data and solution as provided by Winsemius Sir (which 
generates the number of band-wise
 borrowers for a given rating.

rating <- c("A", "AAA", "A", "BBB","AA","A","BB", "BBB", "AA", "AA", "AA", "A", 
"A", "AA","BB","BBB","AA", "A", "AAA","BBB","BBB", "BB", "A", "BB", "A",
 "AA", "B","A", "AA", "BBB", "A", "BBB")

ead <- c(169229.93,100, 5877794.25, 9530148.63, 75040962.06, 21000, 1028360,  
6000000, 17715000,  14430325.24, 1180946.57, 150000, 167490, 81255.16, 54812.5, 
3000, 1275702.94, 9100, 1763142.3, 3283048.61, 1200000, 11800, 3000,  
96894.02,  453671.72,  7590, 106065.24, 940711.67,  2443000, 9500000, 39000, 
1501939.67)

df$ead.cat <- cut(df$ead, breaks=c(0, 100000, 500000, 1000000, 2000000, 5000000 
, 10000000, 100000000) )

df          

df_sorted <- df[order(df$rating),]       # the output is as given below.

> df_sorted
   rating         ead                     ead.cat
1      
 A          169229.93        (1e+05,5e+05]
3       A         5877794.25        (5e+06,1e+07]
6       A            21000.00               (0,1e+05]
12      A          150000.00       (1e+05,5e+05]
13      A          167490.00       (1e+05,5e+05]
18      A             9100.00              
 (0,1e+05]
23      A             3000.00               (0,1e+05]
25      A          453671.72       (1e+05,5e+05]
28      A          940711.67       (5e+05,1e+06]
31      A            39000.00              (0,1e+05]
5      AA       75040962.06      (1e+07,1e+08]
9      AA       17715000.00      (1e+07,1e+08]
10     AA     
 14430325.24      (1e+07,1e+08]
11     AA        1180946.57      (1e+06,2e+06]
14     AA            81255.16             (0,1e+05]
17     AA         1275702.94     (1e+06,2e+06]
26     AA              7590.00            (0,1e+05]
29     AA         2443000.00     (2e+06,5e+06]
2     AAA              
 100.00             (0,1e+05]
19    AAA       1763142.30      (1e+06,2e+06]
27      B           106065.24      (1e+05,5e+05]
7      BB         1028360.00      (1e+06,2e+06]
15     BB            54812.50             (0,1e+05]
22     BB            11800.00             (0,1e+05]
24    
 BB            96894.02             (0,1e+05]
4     BBB        9530148.63      (5e+06,1e+07]
8     BBB        6000000.00      (5e+06,1e+07]
16    BBB            3000.00              (0,1e+05]
20    BBB       3283048.61       (2e+06,5e+06]
21    BBB       1200000.00       (1e+06,2e+06]
30    BBB      
 9500000.00       (5e+06,1e+07]
32    BBB       1501939.67       (1e+06,2e+06]


## The following command fetches rating-wise and ead size no of borrowers. 
Thus, for rating A, there are 4 borrowers in the ead range (0, 1e+05], 4 
borrowers in the range (1e+05 to 5e+05] and so on......

> with(df, tapply(ead.cat, rating, table))
$A

    (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] 
(5e+06,1e+07] (1e+07,1e+08] 
            4             4             1             0            
 0             1             0 

$AA

    (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] 
(5e+06,1e+07] (1e+07,1e+08] 
            2             0             0             2             
1             0             3 

$AAA

    (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] 
(5e+06,1e+07] (1e+07,1e+08]
 
            1             0             0             1             
0             0             0 

$B

    (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] 
(5e+06,1e+07] (1e+07,1e+08] 
            0             1             0            
 0             0             0             0 

$BB

    (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] 
(5e+06,1e+07] (1e+07,1e+08] 
            3             0             0             1             
0             0             0 

$BBB

    (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06]
 (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 
            1             0             0             2             
1             3             0


#### My ACTUAL REQUIREMENT

Actually for a given rating, I don't want the number of borrowers falling in 
each of the ead_range. What I want is sum of eads falling in each range. Thus, 
say for rating "A", I need following.


         rating       
 ead.cat                  ead_total  
1          A           (0,1e+05]                 72100.00    # 
(21000+9100+3000+39000)
2          A           (1e+05, 5e+05]       940391.65    

#(169229.93+150000.00+167490.00+453671.72)

and so on.

I am extremely sorry for any mis-communication in my earlier mail. I could test 
the reply sent to me earlier by Winsemius Sir only today as I was traveling 
over weekends. Also, I have tried to go through earlier emails dealing with 
such conditional sums. Unfortunately, I couldn't understand as I have recently 
started my
 venture with R.


Thanking you in advance and sincerely apologize for any mis-communication if it 
had occurred in my earlier mail. 

Regards

Vincy


--- On Fri, 8/27/10, David Winsemius <dwinsem...@comcast.net> wrote:

From: David Winsemius <dwinsem...@comcast.net>
Subject: Re: [R] Band-wise Sum
To: "Vincy Pyne" <vincy_p...@yahoo.ca>
Cc: r-help@r-project.org
Received: Friday, August 27, 2010, 2:36 PM


On Aug 27, 2010, at 9:49 AM, Vincy Pyne wrote:

> Hi
> 
> I have a large credit portfolio (exceeding 50000 borrowers). For particular 
> process I need to add up the exposures based on the bands. I am giving a 
> small test data below.

I would think that cut() would be the accepted method for defining
 a factor
 variable based
 on specified cutpoints. If you then wanted to see what the cumsum() was across 
the range of possible levels, that to would be a fairly simple task.

df$ead.cat <- cut(df$ead, breaks=c(0, 100000, 500000, 1000000, 2000000, 5000000 
, 10000000, 100000000) )
df
with(df, tapply(ead.cat, rating, length))
#  A  AA AAA   B  BB BBB
# 10   8   2   1   4   7
with(df, tapply(ead.cat, rating, table))
# returns a list of table objects by bond rating

lapply( with(df, tapply(ead.cat, rating, table)) , cumsum)
#returns the cumsum of those tables

# sapply gives a more compact output of that result:
 sapply( with(df, tapply(ead.cat, rating, table)) , cumsum)
               A AA AAA B BB BBB
(0,1e+05]      4  2   1 0 
 3   1
(1e+05,5e+05]  8  2   1 1  3   1
(5e+05,1e+06]  9  2   1 1  3   1
(1e+06,2e+06]  9  4   2 1  4   3
(2e+06,5e+06]  9  5   2 1  4   4
(5e+06,1e+07] 10  5   2 1  4   7
(1e+07,1e+08] 10  8   2 1  4   7

Loops, you say we need loops? We don't need no stinkin' loops.

--David.

> 
> rating <- c("A", "AAA", "A", "BBB","AA","A","BB", "BBB", "AA", "AA", "AA", 
> "A", "A", "AA","BB","BBB","AA", "A", "AAA","BBB","BBB", "BB", "A", "BB", "A", 
> "AA", "B","A", "AA", "BBB", "A", "BBB")
> 
> ead <- c(169229.93,100, 5877794.25, 9530148.63, 75040962.06, 21000, 1028360,  
> 6000000, 17715000,  14430325.24, 1180946.57, 150000,
 167490, 81255.16, 54812.5, 3000, 1275702.94, 9100, 1763142.3, 3283048.61, 
1200000, 11800, 3000,  96894.02,  453671.72,  7590, 106065.24, 940711.67,  
2443000, 9500000, 39000, 1501939.67)
> 
> ## First I have sorted the data rating-wise as
> 
> df <- data.frame(rating, ead)
> 
> df_sorted <-
> df[order(df$rating),]
> 
> df_sorted_AAA <- subset(df_sorted, rating=="AAA")
> df_sorted_AA <- subset(df_sorted, rating=="AA")
> df_sorted_A <- subset(df_sorted, rating=="A")
> df_sorted_BBB <- subset(df_sorted, rating=="BBB")
> df_sorted_BB <- subset(df_sorted, rating=="BB")
> df_sorted_B <- subset(df_sorted, rating=="B")
> df_sorted_CCC <- subset(df_sorted, rating=="CCC")
> 
> ## we begin with BBB rating. The R output for df_sorted_BBB is as follows
> 
>> df_sorted_BBB
>   
    rating      ead
> 4     BBB      9530149
> 8     BBB      6000000
> 16    BBB     3000
> 20    BBB     3283049
> 21    BBB     1200000
> 30    BBB     9500000
> 32    BBB     1501940
> 
> My problem is I need to totals of eads falling in the respective bands
> 
> I
> am defining bands in millions as
> 
> seq_BBB <- seq(1000000, max(df_sorted_BBB$ead), by = 1000000)
> 
> # The output is
> [1] 1e+06 2e+06 3e+06 4e+06 5e+06 6e+06 7e+06 8e+06 9e+06
> 
> So for the sub data pertaining to Rating "BBB", I want corresponding ead 
> totals i.e. I want ead totals where ead < 1e+06, then I want ead
 totals where 1+e06 < ead < 2e+06, 2e+06 < ead < 3e+06 ...and so on.
> 
> I have tried the following code
> 
> s_BBB <- NULL
> 
> for (i in 1:length(s_BBB))
> {
> s_BBB[i] = sum(subset(df_sorted_BBB$ead, df_sorted_BBB$ead < s_BBB[i]))
> }
> 
> I was trying to find totals ofads < 1e+06, ead < 2e+06, ead<3e+06and so on.
> 
> but the result is
> 
>> s_BBB
> [1] 0
> 
> 
> I apologize if I am not able to express my problem properly. My only 
> objective is first to sort the whole portfolio rating-wise and then within 
> each of these rating-wise sorted data, I wish to find out total of eads based
> on various bands starting <1000000,  1000000 - 200000, 2000000 - 3000000, 
> 3000000 - 4000000 and so on. Since the database contains more than 50000 
> records, various ead amounts ranging from few 000's to billion are
 available.
> 
> Please guide
> 
> Thanking  you all in advance
> 
> Vincy
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>     [[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT




        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Band-wise Conditional Sum - Actual problem

Reply via email to