Re: [R] Band-wise Conditional Sum - Actual problem

Vincy Pyne Mon, 30 Aug 2010 18:42:37 -0700

Dear David and Dennis Sir,

Thanks a lot for your guidance.


As guided by Mr Dennis Murphy Sir in his reply ....

Replace table in the tapply call with sum. While you're at it, typing 
?tapply to find out what the function does wouldn't hurt...

I had  really tried earlier to understand the apply, tapply, mapply and sapply 
commands before writing back to the R forum. But I was not able to figure out 
where was the problem. But Mr Dennis Sir really inspired me and when I 
revisited 'tapply', I realized that instead of using 'ead' for getting sum, I 
was using 'ead.cat', and that solved my problem. 

Then I had a new problem of 'How to get rid of NA's' , Again instead of  
posting to the group, I had accessed the earlier R mails and in the end got the 
solution. I sincerely thank both of you for taking so much efforts and guiding 
me.

I will certainly take efforts to understand 'R' at the earliest.

Regards

Vincy

________________________________________________________

Replace table in the tapply call with sum. While you're at it, typing 
?tapply to find out what the function does wouldn't hurt...

HTH,
Dennis

 

--- On Mon, 8/30/10, David Winsemius <dwinsem...@comcast.net> wrote:

From: David Winsemius <dwinsem...@comcast.net>
Subject: Re: [R] Band-wise Conditional Sum - Actual problem

Cc: r-help@r-project.org
Received: Monday, August 30, 2010, 2:43 PM


On Aug 30, 2010, at 4:05 AM, Vincy Pyne wrote:

> Dear R helpers,
> 
> Thanks a lot for your earlier guidance esp. Mr Davind Winsemius Sir. However, 
> there seems to be mis-communication from my end
 corresponding to my requirement. As I had mentioned in my earlier mail, I am 
dealing with a very large database of borrowers and I had given a part of it in 
my earlier mail as given below. For a given rating say "A", I needed to have 
the bad-wise sums of ead's (where bands are constructed using the ead size 
itself.) and not the number of borrowers falling in a particular band.
> 
> I am reproducing the data and solution as provided by Winsemius Sir (which 
> generates the number of band-wise borrowers for a given rating.
> 
> rating <- c("A", "AAA", "A", "BBB","AA","A","BB", "BBB", "AA", "AA", "AA", 
> "A", "A", "AA","BB","BBB","AA", "A", "AAA","BBB","BBB", "BB", "A", "BB", "A", 
> "AA", "B","A", "AA", "BBB", "A", "BBB")
> 
> ead <- c(169229.93,100, 5877794.25, 9530148.63, 75040962.06, 21000, 1028360,  
> 6000000, 17715000,  14430325.24, 1180946.57, 150000, 167490, 81255.16, 
> 54812.5, 3000, 1275702.94, 9100,
 1763142.3, 3283048.61, 1200000, 11800, 3000,  96894.02,  453671.72,  7590, 
106065.24, 940711.67,  2443000, 9500000, 39000, 1501939.67)
> 
> df$ead.cat <- cut(df$ead, breaks=c(0, 100000, 500000, 1000000, 2000000, 
> 5000000 , 10000000, 100000000) )
> 
> df
> 
> df_sorted <- df[order(df$rating),]      # the output is as given below.
> 
> > df_sorted
>    rating         ead                     ead.cat
> 1       A          169229.93        (1e+05,5e+05]
> 3       A         5877794.25        (5e+06,1e+07]
> 6       A            21000.00   
            (0,1e+05]
> 12      A          150000.00       (1e+05,5e+05]
> 13      A          167490.00       (1e+05,5e+05]
> 18      A             9100.00               (0,1e+05]
> 23      A             3000.00               (0,1e+05]
> 25      A          453671.72       (1e+05,5e+05]
> 28      A          940711.67       (5e+05,1e+06]
> 31      A            39000.00   
           (0,1e+05]
> 5      AA       75040962.06      (1e+07,1e+08]
> 9      AA       17715000.00      (1e+07,1e+08]
> 10     AA      14430325.24      (1e+07,1e+08]
> 11     AA        1180946.57      (1e+06,2e+06]
> 14     AA            81255.16             (0,1e+05]
> 17     AA         1275702.94     (1e+06,2e+06]
> 26     AA              7590.00            (0,1e+05]
> 29     AA     
    2443000.00     (2e+06,5e+06]
> 2     AAA               100.00             (0,1e+05]
> 19    AAA       1763142.30      (1e+06,2e+06]
> 27      B           106065.24      (1e+05,5e+05]
> 7      BB         1028360.00      (1e+06,2e+06]
> 15     BB            54812.50             (0,1e+05]
> 22     BB            11800.00             (0,1e+05]
> 24     BB           
 96894.02             (0,1e+05]
> 4     BBB        9530148.63      (5e+06,1e+07]
> 8     BBB        6000000.00      (5e+06,1e+07]
> 16    BBB            3000.00              (0,1e+05]
> 20    BBB       3283048.61       (2e+06,5e+06]
> 21    BBB       1200000.00       (1e+06,2e+06]
> 30    BBB       9500000.00       (5e+06,1e+07]
> 32    BBB       1501939.67       (1e+06,2e+06]
> 
> 
> ## The following command fetches rating-wise and ead
 size no of borrowers. Thus, for rating A, there are 4 borrowers in the ead 
range (0, 1e+05], 4 borrowers in the range (1e+05 to 5e+05] and so on......
> 
> > with(df, tapply(ead.cat, rating, table))
> $A
> 
>     (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] 
>(5e+06,1e+07] (1e+07,1e+08]
>             4             4             1             0             0         
>    1             0
> 
> $AA
> 
>     (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] 
>(5e+06,1e+07] (1e+07,1e+08]
>         
    2             0             0             2             1             0     
        3
> 
> $AAA
> 
>     (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] 
>(5e+06,1e+07] (1e+07,1e+08]
>             1             0             0             1             0         
>    0             0
> 
> $B
> 
>     (0,1e+05]
 (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] 
(1e+07,1e+08]
>             0             1             0             0             0         
>    0             0
> 
> $BB
> 
>     (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] 
>(5e+06,1e+07] (1e+07,1e+08]
>             3             0             0             1             0         
>    0 
            0
> 
> $BBB
> 
>     (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] 
>(5e+06,1e+07] (1e+07,1e+08]
>             1             0             0             2             1         
>    3             0
> 
> 
> #### My ACTUAL REQUIREMENT
> 
> Actually for a given rating, I don't want the number of borrowers falling in 
> each of the ead_range. What I want is sum of eads falling in each range. 
> Thus, say for rating "A", I need following.
> 
> 
>          rating        ead.cat 
                 ead_total
> 1          A           (0,1e+05]                 72100.00    # 
> (21000+9100+3000+39000)
> 2          A           (1e+05, 5e+05]       940391.65
> 
> #(169229.93+150000.00+167490.00+453671.72)

So you just wanted simple sums within rating and ead.cat:

with(df_sorted, tapply(ead, list(rating,ead.cat), sum, na.rm=TRUE))

    (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] 
(5e+06,1e+07]
A    72100.00      940391.6      940711.7            NA            NA   
    5877794
AA   88845.16            NA            NA       2456650       2443000           
 NA
AAA    100.00            NA            NA       1763142            NA           
 NA
B          NA      106065.2            NA            NA            NA           
 NA
BB  163506.52            NA            NA       1028360            NA   
         NA
BBB   3000.00            NA            NA       2701940       3283049      
25030149
    (1e+07,1e+08]
A              NA
AA      107186287
AAA            NA
B              NA
BB             NA
BBB            NA

--David.



> and so on.
> 
> I am extremely sorry for any mis-communication in my earlier mail. I could 
> test the reply sent to me earlier by Winsemius Sir only today as I was 
> traveling over weekends. Also, I have tried to go through earlier emails 
> dealing with such conditional sums. Unfortunately, I couldn't understand as I 
> have
 recently started my venture with R.
> 
> 
> Thanking you in advance and sincerely apologize for any mis-communication if 
> it had occurred in my earlier mail.
> 
> Regards
> 
> Vincy
> 
> 
> --- On Fri, 8/27/10, David Winsemius <dwinsem...@comcast.net> wrote:
> 
> From: David Winsemius <dwinsem...@comcast.net>
> Subject: Re: [R] Band-wise Sum

> Cc: r-help@r-project.org
> Received: Friday, August 27, 2010, 2:36 PM
> 
> 
> On Aug 27,
 2010, at 9:49 AM, Vincy Pyne wrote:
> 
> > Hi
> >
> > I have a large credit portfolio (exceeding 50000 borrowers). For particular 
> > process I need to add up the exposures based on the bands. I am giving a 
> > small test data below.
> 
> I would think that cut() would be the accepted method for defining a factor 
> variable based on specified cutpoints. If you then wanted to see what the 
> cumsum() was across the range of possible levels, that to would be a fairly 
> simple task.
> 
> df$ead.cat <- cut(df$ead, breaks=c(0, 100000, 500000, 1000000, 2000000, 
> 5000000 , 10000000, 100000000) )
> df
> with(df, tapply(ead.cat, rating, length))
> #  A  AA AAA   B  BB BBB
> # 10   8   2   1   4   7
> with(df, tapply(ead.cat, rating, table))
> # returns a list of table objects by
 bond rating
> 
> lapply( with(df, tapply(ead.cat, rating, table)) , cumsum)
> #returns the cumsum of those tables
> 
> # sapply gives a more compact output of that result:
> sapply( with(df, tapply(ead.cat, rating, table)) , cumsum)
>                A AA AAA B BB BBB
> (0,1e+05]      4  2   1 0  3   1
> (1e+05,5e+05]  8  2   1 1  3   1
> (5e+05,1e+06]  9  2   1 1  3   1
> (1e+06,2e+06]  9  4   2 1  4   3
> (2e+06,5e+06]  9  5   2 1  4   4
> (5e+06,1e+07] 10  5   2 1  4   7
> (1e+07,1e+08] 10  8   2 1  4   7
>

> Loops, you say we need loops? We don't need no stinkin' loops.
> 
> --David.
> 
> >
> > rating <- c("A", "AAA", "A", "BBB","AA","A","BB", "BBB", "AA", "AA", "AA", 
> > "A", "A", "AA","BB","BBB","AA", "A", "AAA","BBB","BBB", "BB", "A", "BB", 
> > "A", "AA", "B","A", "AA", "BBB", "A", "BBB")
> >
> > ead <- c(169229.93,100, 5877794.25, 9530148.63, 75040962.06, 21000, 
> > 1028360,  6000000, 17715000,  14430325.24, 1180946.57, 150000, 167490, 
> > 81255.16, 54812.5, 3000, 1275702.94, 9100, 1763142.3, 3283048.61, 1200000, 
> > 11800, 3000,  96894.02,  453671.72,  7590, 106065.24, 940711.67,  2443000, 
> > 9500000, 39000, 1501939.67)
> >
> > ## First I have sorted the data rating-wise as
> >
> > df <- data.frame(rating, ead)
> >
> > df_sorted <-
> > df[order(df$rating),]
> >
> > df_sorted_AAA <-
 subset(df_sorted, rating=="AAA")
> > df_sorted_AA <- subset(df_sorted, rating=="AA")
> > df_sorted_A <- subset(df_sorted, rating=="A")
> > df_sorted_BBB <- subset(df_sorted, rating=="BBB")
> > df_sorted_BB <- subset(df_sorted, rating=="BB")
> > df_sorted_B <- subset(df_sorted, rating=="B")
> > df_sorted_CCC <- subset(df_sorted, rating=="CCC")
> >
> > ## we begin with BBB rating. The R output for df_sorted_BBB is as follows
> >
> >> df_sorted_BBB
> >       rating      ead
> > 4     BBB      9530149
> > 8     BBB      6000000
> > 16    BBB     3000
> > 20    BBB     3283049
> > 21    BBB 
    1200000
> > 30    BBB     9500000
> > 32    BBB     1501940
> >
> > My problem is I need to totals of eads falling in the respective bands
> >
> > I
> > am defining bands in millions as
> >
> > seq_BBB <- seq(1000000, max(df_sorted_BBB$ead), by = 1000000)
> >
> > # The output is
> > [1] 1e+06 2e+06 3e+06 4e+06 5e+06 6e+06 7e+06 8e+06 9e+06
> >
> > So for the sub data pertaining to Rating "BBB", I want corresponding ead 
> > totals i.e. I want ead totals where ead < 1e+06, then I want ead totals 
> > where 1+e06 < ead < 2e+06, 2e+06 < ead < 3e+06 ...and so on.
> >
> > I have tried the following code
> >
> > s_BBB <- NULL
> >
> > for (i in 1:length(s_BBB))
> > {
> >
 s_BBB[i] = sum(subset(df_sorted_BBB$ead, df_sorted_BBB$ead < s_BBB[i]))
> > }
> >
> > I was trying to find totals ofads < 1e+06, ead < 2e+06, ead<3e+06and so on.
> >
> > but the result is
> >
> >> s_BBB
> > [1] 0
> >
> >
> > I apologize if I am not able to express my problem properly. My only 
> > objective is first to sort the whole portfolio rating-wise and then within 
> > each of these rating-wise sorted data, I wish to find out total of eads 
> > based
> > on various bands starting <1000000,  1000000 - 200000, 2000000 - 3000000, 
> > 3000000 - 4000000 and so on. Since the database contains more than 50000 
> > records, various ead amounts ranging from few 000's to billion are 
> > available.
> >
> > Please guide
> >
> > Thanking  you all in advance

> 

David Winsemius, MD
West Hartford,
 CT




        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Band-wise Conditional Sum - Actual problem

Reply via email to