Re: [R] aggregating along bins and bin-quantiles

Ivan Alves Wed, 22 Oct 2008 01:54:20 -0700

Dear Mark and all interested,

Unfortunately the code provided by Mark does not work - there is a syntax error when run as provided. I looked at possibly solving the problem, but without much knowledge of the output of "split" (looks like a list of lists, and not a list of data frames), it is difficult to identify where in the call to lapply the problem arises. The problem both in Mark's code and my original (with tapply) lies in the format of the output of the call to an implicit loop. In fact I find this area of R one of the most obscure to my simplistic way of thinking (I would expect the output to have the same format as the input (data.frame to data.frame), but I am certain there must be good reasons for the way implicit loop functions return what they do).
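
For reference, a minimal sketch of what the implicit-loop functions hand back, using made-up toy data rather than example.csv (the values and the "total" column are assumptions for illustration only). It suggests that split() on a data frame gives a list of data frames, lapply() gives a list, and do.call(rbind, ...) turns that list back into one data frame:

```r
# Toy data (made up for illustration): two dates, three values each
a <- data.frame(
  Date  = as.Date(rep(c("2008-01-31", "2008-02-29"), each = 3)),
  value = c(1, 2, 3, 10, 20, 30)
)

pieces <- split(a, a$Date)              # a named list, one data frame per date
piece_classes <- sapply(pieces, class)  # class of every element of the list
print(piece_classes)

# lapply() always returns a list, here one small data frame per date
sums <- lapply(pieces, function(.df) {
  data.frame(Date = .df$Date[1], total = sum(.df$value))
})

# do.call(rbind, ...) stacks the list back into a single data frame
result <- do.call(rbind, sums)
print(result)
```

So the data.frame-to-data.frame round trip is possible, it just takes the explicit rbind step at the end.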

Any further help would be appreciated, as I may have to resort to some (less elegant) loop...


Kind regards,
Ivan

On 22 Oct 2008, at 00:22, [EMAIL PROTECTED] wrote:

Hi Ivan: I think I understand better, so below is some new code, but I'm still not totally sure that it's what you want. If not, then I think it brings you closer anyway. The split function is very useful and I think that's what you need. Let me know if below is what you needed. If it's close but not quite right, I can look at it again. It's not a problem. If I'm totally off, maybe you should resend to the list, because that means I probably can't fix it.
#================================================================================
a <- read.csv(file = "/opt/mark/research/equity/projects/R_mails/example.csv", colClasses = c("Date", "numeric")) # beware of the path
# SPLIT BY DATE
# TO CREATE A LIST OF
# DATAFRAMES
DFlist <- split(a,a$Date)
str(DFlist) # str() prints by itself; wrapping it in print() only adds a stray NULL

# USE LAPPLY TO CALL cut AND
# THEN aggregate ON EACH COMPONENT
# DATAFRAME IN THE LIST
tempresult <- lapply(DFlist, function(.df) {
  .df$quantile <- cut(.df$value, breaks = quantile(.df$value, probs = seq(0, 1, 0.1), na.rm = TRUE))
  aggregate(.df$value, list(DATE = .df$Date, QUANTILE = .df$quantile), sum)
})

# CHECK IF IT WORKED
print(tempresult)

# RBIND EVERYTHING BACK TOGETHER
# SO THAT IT'S ONE DATAFRAME
finalresult <- do.call(rbind,tempresult)
print(finalresult)
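
A self-contained variant of the same split / lapply / rbind idea, offered only as a sketch. It runs on synthetic data rather than example.csv, and it makes two changes that are assumptions on my part, not something confirmed in this thread: include.lowest = TRUE, so that the smallest value of each date is not left out of the first bin as NA, and labels = FALSE, so that the bins come back as decile numbers 1..10.

```r
set.seed(1)  # synthetic data, NOT the example.csv from the thread
a <- data.frame(
  Date  = as.Date(rep(c("2008-01-31", "2008-02-29", "2008-03-31"), each = 50)),
  value = rnorm(150)
)

decile_sums <- lapply(split(a, a$Date), function(.df) {
  # Decile breaks computed from this date's values only
  brks <- quantile(.df$value, probs = seq(0, 1, 0.1), na.rm = TRUE)
  # include.lowest = TRUE keeps the minimum in the first bin instead of NA;
  # labels = FALSE returns the decile number rather than an interval label
  .df$decile <- cut(.df$value, breaks = brks, labels = FALSE, include.lowest = TRUE)
  aggregate(value ~ Date + decile, data = .df, FUN = sum)
})

# Stack the per-date results into one data frame
finalresult <- do.call(rbind, decile_sums)
rownames(finalresult) <- NULL
print(head(finalresult))
```

With real data that contains ties, quantile() can return duplicated breaks and cut() will then stop with an error, so this sketch would need adapting in that case.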




On Tue, Oct 21, 2008 at  5:47 PM, Ivan Alves wrote:
Hello Mark,
Many thanks for the reply. Your suggestion is essentially equivalent to my first attempt: the quantiles are estimated for the WHOLE of the a.value column. Essentially what I would need is to first break down the value column by "bins" determined by the a.date column and THEN estimate the quantile for each "bin". You see, I would need the quantiles for each data entry, not for all the entries, thus if there are 12 dates (or "bins"), then I would need 12 x 10 deciles, not just 10.
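
To make the distinction concrete, here is a small sketch on synthetic data (two dates instead of 12, with the second date shifted upward on purpose). The helper name decile() and the use of ave() are choices made for this illustration, not something proposed elsewhere in the thread:

```r
set.seed(42)  # synthetic data; the second date is deliberately shifted upward
a <- data.frame(
  Date  = as.Date(rep(c("2008-01-31", "2008-02-29"), each = 100)),
  value = c(rnorm(100, mean = 0), rnorm(100, mean = 5))
)

# Hypothetical helper: decile number (1..10) of each element of v,
# with the breaks taken from v itself
decile <- function(v) {
  cut(v, breaks = quantile(v, probs = seq(0, 1, 0.1), na.rm = TRUE),
      labels = FALSE, include.lowest = TRUE)
}

a$global_decile <- decile(a$value)                     # breaks from all rows
a$date_decile   <- ave(a$value, a$Date, FUN = decile)  # breaks per date

# Rows per (date, decile) cell under each scheme
print(table(a$Date, a$global_decile))
print(table(a$Date, a$date_decile))
```

With whole-population breaks the rows of each date pile up in a few deciles, while with per-date breaks every date gets its own full set of 10.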
Kind regards,
Ivan

On 21 Oct 2008, at 22:20, [EMAIL PROTECTED] wrote:
Hi: I still wasn't very clear on what you wanted, but that might be because I didn't save your original email. I doubt that below helps. I used cut instead of cut2 because I didn't have Hmisc loaded, and I think cut does what you want? Jim will probably reply later with a better answer. He's the real expert with this type of thing. I just like to practice.
a <- read.csv(file = "/opt/mark/research/equity/projects/R_mails/example.csv", colClasses = c("Date", "numeric"))
a$quantile <- cut(a$value, breaks = quantile(a$value, probs = seq(0, 1, 0.1), na.rm = TRUE))
aggregate(a$value, list(DATE = a$Date, QUANTILE = a$quantile), sum)

On 21 Oct 2008, at 09:25, Ivan Alves wrote:

Dear all,

Thanks to Jim and Mark for suggesting including the reproducible code. Please note that the enclosed file would need to go into the home folder or that the path for reading the CSV file be changed. I hope no encoding issues emerge when reading it.


And the code

library(Hmisc) # need the cut2 function to mark the quantile a given line belongs to
a <- read.csv(file = "~/example.csv", colClasses = c("Date", "numeric")) # beware of the path

dim(a) #should give "[1] 5076    2"

aggregate(a$value, list(Date = a[,"Date"], Quantile = cut2(a$value, g=10)), sum) # should give the sum by year but on the quantiles for the whole population
aggregate(a$value, list(Date = a[,"Date"], Quantile = tapply(a$value, use.filter$Date, cut2, g=10)), sum) # gives error mentioned below


Once again, many thanks for any help
Ivan

On 21 Oct 2008, at 02:40, jim holtman wrote:

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

You need to at least post a subset of your data so that we can
understand the data structures that you are using. 'dput' will create
an easily readable format for posting your data (much easier than if
you post the listing of a table).  Usually it is some 'type mismatch'
which says you really have to have the data to run the script against.
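
As an illustration of the 'dput' advice, a short sketch with a made-up three-row data frame (the values are invented; only the column names follow this thread):

```r
# Made-up stand-in for the real data; column names follow the thread
a <- data.frame(
  Date  = as.Date(c("2008-01-31", "2008-01-31", "2008-02-29")),
  value = c(1.5, 2.5, 4)
)

# dput() writes an R expression that rebuilds the object,
# so readers can paste it straight into their own session
dput(head(a, 3))

# Round trip: capture the text, evaluate it, and compare with the original
txt  <- capture.output(dput(a))
back <- eval(parse(text = txt))
print(isTRUE(all.equal(a, back)))
```

Because the output carries the column classes along, a reader gets the Date column as a Date and not as text, which is where a 'type mismatch' tends to hide.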


On Mon, Oct 20, 2008 at 6:38 PM, Ivan Alves <[EMAIL PROTECTED]> wrote:

Dear all,

I would like to aggregate a data frame (consisting of 2 columns - one
for the bins, say factors, and one for the values) along bins and
quantiles within the bins.

I have tried

aggregate(data.frame$values, list(bin = data.frame$bin,Quantile=cut2(data.frame$bin,g=10)),sum)

but then the quantiles apply to the population as a whole and not the
individual bins. Upon this realisation I have tried

aggregate(data.frame$values, list(bin = data.frame$bin,Quantile=tapply(data.frame$values,data.frame$bin,cut2,g=10)),sum)


which gives the following error:

Error in sort.list(unique.default(x), na.last = TRUE) :
'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

Clearly I am doing something wrong, but cannot figure out what.  I
believe the error stems either from a. the output of tapply being a
list of a dimension equal to the number of bins, and not a list of
equal dimension as the values, or b. that somehow aggregate does not
like that the second list (of the quantiles within the bins) is not
sorted nicely.

1. Do you have a reference for doing the summation on both bins and
quantiles within the bins?
2. If not, can you give me some guidance as to what I am doing wrong
and how I can solve the sort/list issue?
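
A small sketch that may help with possibility a. above, on made-up data with two bins. The helper halves() is a hypothetical stand-in for cut2(x, g = 10) so that the example runs without Hmisc; it splits at the median, not into deciles. The sketch shows tapply returning one list element per bin, and one possible way (unsplit) to get back to one entry per row:

```r
# Toy data (made up): two bins with three values each, rows interleaved
d <- data.frame(
  bin    = factor(c("a", "b", "a", "b", "a", "b")),
  values = c(3, 30, 1, 10, 2, 20)
)

# Stand-in for cut2(x, g = 10): split x at its median into two groups.
# This is an assumption for illustration; Hmisc is not required here.
halves <- function(x) {
  cut(x, breaks = quantile(x, probs = c(0, 0.5, 1)),
      labels = FALSE, include.lowest = TRUE)
}

per_bin <- tapply(d$values, d$bin, halves)
str(per_bin)            # a list with one element per bin, not one entry per row
print(length(per_bin))  # number of bins, which differs from nrow(d)

# unsplit() puts the per-bin results back in the original row order
d$half <- unsplit(per_bin, d$bin)
res <- aggregate(values ~ bin + half, data = d, FUN = sum)
print(res)
```

A list of per-bin vectors is not an atomic grouping variable, which would be consistent with the "'x' must be atomic" message, although I have not checked this against the original data.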

Any help would be greatly appreciated

Kind regards,

Ivan Alves



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


