On Aug 30, 2010, at 4:05 AM, Vincy Pyne wrote:
Dear R helpers,
Thanks a lot for your earlier guidance esp. Mr Davind Winsemius Sir.
However, there seems to be mis-communication from my end
corresponding to my requirement. As I had mentioned in my earlier
mail, I am dealing with a very large database of borrowers and I had
given a part of it in my earlier mail as given below. For a given
rating say "A", I needed to have the bad-wise sums of ead's (where
bands are constructed using the ead size itself.) and not the number
of borrowers falling in a particular band.
I am reproducing the data and solution as provided by Winsemius Sir
(which generates the number of band-wise borrowers for a given rating.
rating <- c("A", "AAA", "A", "BBB","AA","A","BB", "BBB", "AA", "AA",
"AA", "A", "A", "AA","BB","BBB","AA", "A", "AAA","BBB","BBB", "BB",
"A", "BB", "A", "AA", "B","A", "AA", "BBB", "A", "BBB")
ead <- c(169229.93,100, 5877794.25, 9530148.63, 75040962.06, 21000,
1028360, 6000000, 17715000, 14430325.24, 1180946.57, 150000,
167490, 81255.16, 54812.5, 3000, 1275702.94, 9100, 1763142.3,
3283048.61, 1200000, 11800, 3000, 96894.02, 453671.72, 7590,
106065.24, 940711.67, 2443000, 9500000, 39000, 1501939.67)
df$ead.cat <- cut(df$ead, breaks=c(0, 100000, 500000, 1000000,
2000000, 5000000 , 10000000, 100000000) )
df
df_sorted <- df[order(df$rating),] # the output is as given
below.
> df_sorted
rating ead ead.cat
1 A 169229.93 (1e+05,5e+05]
3 A 5877794.25 (5e+06,1e+07]
6 A 21000.00 (0,1e+05]
12 A 150000.00 (1e+05,5e+05]
13 A 167490.00 (1e+05,5e+05]
18 A 9100.00 (0,1e+05]
23 A 3000.00 (0,1e+05]
25 A 453671.72 (1e+05,5e+05]
28 A 940711.67 (5e+05,1e+06]
31 A 39000.00 (0,1e+05]
5 AA 75040962.06 (1e+07,1e+08]
9 AA 17715000.00 (1e+07,1e+08]
10 AA 14430325.24 (1e+07,1e+08]
11 AA 1180946.57 (1e+06,2e+06]
14 AA 81255.16 (0,1e+05]
17 AA 1275702.94 (1e+06,2e+06]
26 AA 7590.00 (0,1e+05]
29 AA 2443000.00 (2e+06,5e+06]
2 AAA 100.00 (0,1e+05]
19 AAA 1763142.30 (1e+06,2e+06]
27 B 106065.24 (1e+05,5e+05]
7 BB 1028360.00 (1e+06,2e+06]
15 BB 54812.50 (0,1e+05]
22 BB 11800.00 (0,1e+05]
24 BB 96894.02 (0,1e+05]
4 BBB 9530148.63 (5e+06,1e+07]
8 BBB 6000000.00 (5e+06,1e+07]
16 BBB 3000.00 (0,1e+05]
20 BBB 3283048.61 (2e+06,5e+06]
21 BBB 1200000.00 (1e+06,2e+06]
30 BBB 9500000.00 (5e+06,1e+07]
32 BBB 1501939.67 (1e+06,2e+06]
## The following command fetches rating-wise and ead size no of
borrowers. Thus, for rating A, there are 4 borrowers in the ead
range (0, 1e+05], 4 borrowers in the range (1e+05 to 5e+05] and so
on......
> with(df, tapply(ead.cat, rating, table))
$A
(0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e
+06] (5e+06,1e+07] (1e+07,1e+08]
4 4 1 0
0 1 0
$AA
(0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e
+06] (5e+06,1e+07] (1e+07,1e+08]
2 0 0 2
1 0 3
$AAA
(0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e
+06] (5e+06,1e+07] (1e+07,1e+08]
1 0 0 1
0 0 0
$B
(0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e
+06] (5e+06,1e+07] (1e+07,1e+08]
0 1 0 0
0 0 0
$BB
(0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e
+06] (5e+06,1e+07] (1e+07,1e+08]
3 0 0 1
0 0 0
$BBB
(0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e
+06] (5e+06,1e+07] (1e+07,1e+08]
1 0 0 2
1 3 0
#### My ACTUAL REQUIREMENT
Actually for a given rating, I don't want the number of borrowers
falling in each of the ead_range. What I want is sum of eads falling
in each range. Thus, say for rating "A", I need following.
rating ead.cat ead_total
1 A (0,1e+05] 72100.00 #
(21000+9100+3000+39000)
2 A (1e+05, 5e+05] 940391.65
#(169229.93+150000.00+167490.00+453671.72)
So you just wanted simple sums within rating and ead.cat:
with(df_sorted, tapply(ead, list(rating,ead.cat), sum, na.rm=TRUE))
(0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06]
(5e+06,1e+07]
A 72100.00 940391.6 940711.7 NA
NA 5877794
AA 88845.16 NA NA 2456650
2443000 NA
AAA 100.00 NA NA 1763142
NA NA
B NA 106065.2 NA NA
NA NA
BB 163506.52 NA NA 1028360
NA NA
BBB 3000.00 NA NA 2701940
3283049 25030149
(1e+07,1e+08]
A NA
AA 107186287
AAA NA
B NA
BB NA
BBB NA
--
David.
and so on.
I am extremely sorry for any mis-communication in my earlier mail. I
could test the reply sent to me earlier by Winsemius Sir only today
as I was traveling over weekends. Also, I have tried to go through
earlier emails dealing with such conditional sums. Unfortunately, I
couldn't understand as I have recently started my venture with R.
Thanking you in advance and sincerely apologize for any mis-
communication if it had occurred in my earlier mail.
Regards
Vincy
--- On Fri, 8/27/10, David Winsemius <dwinsem...@comcast.net> wrote:
From: David Winsemius <dwinsem...@comcast.net>
Subject: Re: [R] Band-wise Sum
To: "Vincy Pyne" <vincy_p...@yahoo.ca>
Cc: r-help@r-project.org
Received: Friday, August 27, 2010, 2:36 PM
On Aug 27, 2010, at 9:49 AM, Vincy Pyne wrote:
> Hi
>
> I have a large credit portfolio (exceeding 50000 borrowers). For
particular process I need to add up the exposures based on the
bands. I am giving a small test data below.
I would think that cut() would be the accepted method for defining a
factor variable based on specified cutpoints. If you then wanted to
see what the cumsum() was across the range of possible levels, that
to would be a fairly simple task.
df$ead.cat <- cut(df$ead, breaks=c(0, 100000, 500000, 1000000,
2000000, 5000000 , 10000000, 100000000) )
df
with(df, tapply(ead.cat, rating, length))
# A AA AAA B BB BBB
# 10 8 2 1 4 7
with(df, tapply(ead.cat, rating, table))
# returns a list of table objects by bond rating
lapply( with(df, tapply(ead.cat, rating, table)) , cumsum)
#returns the cumsum of those tables
# sapply gives a more compact output of that result:
sapply( with(df, tapply(ead.cat, rating, table)) , cumsum)
A AA AAA B BB BBB
(0,1e+05] 4 2 1 0 3 1
(1e+05,5e+05] 8 2 1 1 3 1
(5e+05,1e+06] 9 2 1 1 3 1
(1e+06,2e+06] 9 4 2 1 4 3
(2e+06,5e+06] 9 5 2 1 4 4
(5e+06,1e+07] 10 5 2 1 4 7
(1e+07,1e+08] 10 8 2 1 4 7
Loops, you say we need loops? We don't need no stinkin' loops.
--David.
>
> rating <- c("A", "AAA", "A", "BBB","AA","A","BB", "BBB", "AA",
"AA", "AA", "A", "A", "AA","BB","BBB","AA", "A", "AAA","BBB","BBB",
"BB", "A", "BB", "A", "AA", "B","A", "AA", "BBB", "A", "BBB")
>
> ead <- c(169229.93,100, 5877794.25, 9530148.63, 75040962.06,
21000, 1028360, 6000000, 17715000, 14430325.24, 1180946.57,
150000, 167490, 81255.16, 54812.5, 3000, 1275702.94, 9100,
1763142.3, 3283048.61, 1200000, 11800, 3000, 96894.02, 453671.72,
7590, 106065.24, 940711.67, 2443000, 9500000, 39000, 1501939.67)
>
> ## First I have sorted the data rating-wise as
>
> df <- data.frame(rating, ead)
>
> df_sorted <-
> df[order(df$rating),]
>
> df_sorted_AAA <- subset(df_sorted, rating=="AAA")
> df_sorted_AA <- subset(df_sorted, rating=="AA")
> df_sorted_A <- subset(df_sorted, rating=="A")
> df_sorted_BBB <- subset(df_sorted, rating=="BBB")
> df_sorted_BB <- subset(df_sorted, rating=="BB")
> df_sorted_B <- subset(df_sorted, rating=="B")
> df_sorted_CCC <- subset(df_sorted, rating=="CCC")
>
> ## we begin with BBB rating. The R output for df_sorted_BBB is as
follows
>
>> df_sorted_BBB
> rating ead
> 4 BBB 9530149
> 8 BBB 6000000
> 16 BBB 3000
> 20 BBB 3283049
> 21 BBB 1200000
> 30 BBB 9500000
> 32 BBB 1501940
>
> My problem is I need to totals of eads falling in the respective
bands
>
> I
> am defining bands in millions as
>
> seq_BBB <- seq(1000000, max(df_sorted_BBB$ead), by = 1000000)
>
> # The output is
> [1] 1e+06 2e+06 3e+06 4e+06 5e+06 6e+06 7e+06 8e+06 9e+06
>
> So for the sub data pertaining to Rating "BBB", I want
corresponding ead totals i.e. I want ead totals where ead < 1e+06,
then I want ead totals where 1+e06 < ead < 2e+06, 2e+06 < ead < 3e
+06 ...and so on.
>
> I have tried the following code
>
> s_BBB <- NULL
>
> for (i in 1:length(s_BBB))
> {
> s_BBB[i] = sum(subset(df_sorted_BBB$ead, df_sorted_BBB$ead <
s_BBB[i]))
> }
>
> I was trying to find totals ofads < 1e+06, ead < 2e+06, ead<3e
+06and so on.
>
> but the result is
>
>> s_BBB
> [1] 0
>
>
> I apologize if I am not able to express my problem properly. My
only objective is first to sort the whole portfolio rating-wise and
then within each of these rating-wise sorted data, I wish to find
out total of eads based
> on various bands starting <1000000, 1000000 - 200000, 2000000 -
3000000, 3000000 - 4000000 and so on. Since the database contains
more than 50000 records, various ead amounts ranging from few 000's
to billion are available.
>
> Please guide
>
> Thanking you all in advance
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.