Dear Mark and all interested,

Unfortunately the code provided by Mark does not work: there is a syntax error when it is run as provided. I looked at possibly solving the problem, but without much knowledge of the output of "split" (it looks like a list of lists, and not a list of data frames), it is difficult to identify where in the call to lapply the problem arises. The problem, both in Mark's code and in my original (with tapply), lies in the format of the output of the call to an implicit loop. In fact I find this area of R one of the most obscure to my simplistic way of thinking. I would expect the output to have the same format as the input (data.frame to data.frame), but I am certain there must be good reasons for the way implicit loop functions return what they do.
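
For reference, the shape of what split() and lapply() return can be checked on a small data frame. The following is only a sketch: the `toy` data frame and its values are made up for illustration and are not the contents of example.csv.

```r
# Toy data standing in for example.csv (hypothetical values)
toy <- data.frame(
  Date  = as.Date(rep(c("2008-01-31", "2008-02-29"), each = 4)),
  value = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# split() on a data frame returns a named list with one data frame per date
pieces <- split(toy, toy$Date)
class(pieces)                   # the container itself is a plain list
sapply(pieces, is.data.frame)   # each component is still a data frame

# lapply() always returns a list, whatever the function returns per piece
sums <- lapply(pieces, function(d) {
  data.frame(Date = d$Date[1], total = sum(d$value))
})

# do.call(rbind, ...) stacks the per-date pieces back into one data frame
result <- do.call(rbind, sums)
print(result)
```

So the round trip is data frame, then list of data frames, then data frame again, with do.call(rbind, ...) as the step that restores the input format.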

Any further help would be appreciated, as I may have to resort to some (less elegant) loop...

Kind regards,
Ivan

On 22 Oct 2008, at 00:22, [EMAIL PROTECTED] wrote:
Hi Ivan: I think I understand better now, so below is some new code, but I'm still not totally sure that it's what you want. If not, then I think it brings you closer anyway. The split function is very useful and I think that's what you need. Let me know if below is what you needed. If it's close but not quite right, I can look at it again; it's not a problem. If I'm totally off, maybe you should resend to the list, because that means I probably can't fix it.

#======================================================================
a <- read.csv(file = "/opt/mark/research/equity/projects/R_mails/example.csv", colClasses = c("Date", "numeric")) # beware of the path

# SPLIT BY DATE
# TO CREATE A LIST OF
# DATAFRAMES
DFlist <- split(a,a$Date)
str(DFlist) # str() prints the structure itself; wrapping it in print() only adds a trailing NULL

# USE LAPPLY TO CALL cut AND
# THEN aggregate ON EACH COMPONENT
# DATAFRAME IN THE LIST
tempresult <- lapply(DFlist,function(.df) {
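  # include.lowest=TRUE keeps the minimum of each date in the first bin instead of turning it into NA
  .df$quantile <- cut(.df$value,breaks=quantile(.df$value,probs=seq(0,1,0.1),na.rm=TRUE),include.lowest=TRUE)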
  aggregate(.df$value,list(DATE=.df$Date,QUANTILE=.df$quantile),sum)
})

# CHECK IF IT WORKED
print(tempresult)

# RBIND EVERYTHING BACK TOGETHER
# SO THAT IT'S ONE DATAFRAME
finalresult <- do.call(rbind,tempresult)
print(finalresult)




On Tue, Oct 21, 2008 at  5:47 PM, Ivan Alves wrote:

Hello Mark,
Many thanks for the reply. Your suggestion is essentially equivalent to my first attempt: the quantiles are estimated for the WHOLE of the a$value column. What I need is to first break down the value column into "bins" determined by the a$Date column and THEN estimate the quantiles within each "bin". You see, I need the quantiles for each date, not for all the entries together. Thus if there are 12 dates (or "bins"), then I need 12 x 10 deciles, not just 10.
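
One possible way to get deciles within each date, sketched here with base R only (ave() plus cut(), rather than cut2 from Hmisc). The data are simulated, and the helper name `decile_within` is hypothetical, introduced just for this illustration.

```r
set.seed(1)  # hypothetical data: 3 dates with 50 values each
toy <- data.frame(
  Date  = rep(as.Date(c("2008-01-31", "2008-02-29", "2008-03-31")), each = 50),
  value = rnorm(150)
)

# Decile number (1..10) of each value relative to its own group
decile_within <- function(x) {
  brks <- quantile(x, probs = seq(0, 1, 0.1), na.rm = TRUE)
  as.integer(cut(x, breaks = brks, include.lowest = TRUE))
}

# ave() applies the function date by date and returns a vector
# aligned with the rows of the original data frame
toy$decile <- ave(toy$value, toy$Date, FUN = decile_within)

# Sum of the values by date and by decile within that date
by_date_decile <- aggregate(toy$value,
                            list(Date = toy$Date, Decile = toy$decile),
                            sum)
nrow(by_date_decile)  # number of (date, decile) combinations that occur
```

Because ave() returns one decile label per row, the result can be passed straight to aggregate() as a grouping variable. Note that cut() fails with "'breaks' are not unique" when a date has many tied values, which is a case cut2 handles differently.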
Kind regards,
Ivan

On 21 Oct 2008, at 22:20, [EMAIL PROTECTED] wrote:

Hi: I still wasn't very clear on what you wanted, but that might be because I didn't save your original email. I doubt that below helps. I used cut instead of cut2 because I didn't have Hmisc loaded, and I think cut does what you want. Jim will probably reply later with a better answer. He's the real expert with this type of thing. I just like to practice.

a <- read.csv(file = "/opt/mark/research/equity/projects/R_mails/example.csv", colClasses = c("Date", "numeric"))
a$quantile <- cut(a$value,breaks=quantile(a$value,probs=seq(0,1,0.1),na.rm=TRUE))
aggregate(a$value,list(DATE=a$Date,QUANTILE=a$quantile),sum)

On 21 Oct 2008, at 09:25, Ivan Alves wrote:

Dear all,

Thanks to Jim and Mark for suggesting that I include reproducible code. Please note that the enclosed file needs to go into the home folder, or else the path for reading the CSV file must be changed. I hope no encoding issues emerge when reading it.

And the code

library(Hmisc) # need the cut2 function to mark the quantile a given line belongs to
a <- read.csv(file = "~/example.csv", colClasses=c("Date","numeric")) # beware of the path
dim(a) #should give "[1] 5076    2"
aggregate(a$value, list(Date = a[,"Date"],Quantile=cut2(a$value,g=10)),sum) # should give the sum by year but on the quantiles for the whole population
aggregate(a$value, list(Date = a[,"Date"],Quantile=tapply(a$value,use.filter$Date,cut2,g=10)),sum) # gives error mentioned below

Once again, many thanks for any help
Ivan

On 21 Oct 2008, at 02:40, jim holtman wrote:

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

You need to at least post a subset of your data so that we can
understand the data structures that you are using. 'dput' will create
an easily readable format for posting your data (much easier than if
you post the listing of a table).  Usually it is some 'type mismatch'
which says you really have to have the data to run the script against.

On Mon, Oct 20, 2008 at 6:38 PM, Ivan Alves <[EMAIL PROTECTED]> wrote:
Dear all,

I would like to aggregate a data frame (consisting of 2 columns - one
for the bins, say factors, and one for the values) along bins and
quantiles within the bins.

I have tried

aggregate(data.frame$values, list(bin = data.frame$bin,Quantile=cut2(data.frame$bin,g=10)),sum)

but then the quantiles apply to the population as a whole and not the
individual bins. Upon this realisation I have tried

aggregate(data.frame$values, list(bin = data.frame$bin,Quantile=tapply(data.frame$values,data.frame$bin,cut2,g=10)),sum)

which gives the following error:

Error in sort.list(unique.default(x), na.last = TRUE) :
'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

Clearly I am doing something wrong, but I cannot figure out what.  I
believe the error stems either from (a) the output of tapply being a
list with one element per bin, and not a vector of the same length as
the values, or (b) aggregate somehow not liking that the second
grouping list (the quantiles within the bins) is not sorted nicely.
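
The first hypothesis can be checked on a small example. This is a sketch under assumptions: the data are made up, and `tercile` is a hypothetical base-R stand-in for cut2(x, g = 3) so that Hmisc is not required.

```r
# Hypothetical data: 2 bins with 6 values each
bin    <- factor(rep(c("a", "b"), each = 6))
values <- c(1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50, 60)

# Stand-in for cut2(x, g = 3): tercile number of each value within its group
tercile <- function(x) {
  cut(x, breaks = quantile(x, probs = seq(0, 1, length.out = 4)),
      include.lowest = TRUE, labels = FALSE)
}

# When FUN returns a vector per group, tapply() returns a list-mode array
per_bin <- tapply(values, bin, tercile)
length(per_bin)    # one element per bin, not one per value
is.list(per_bin)   # not atomic, so it cannot serve as a grouping variable

# unsplit() puts the per-bin results back in the original row order
aligned <- unsplit(lapply(split(values, bin), tercile), bin)
length(aligned) == length(values)
```

The tapply() result has one list element per bin, which is consistent with the "'x' must be atomic" error quoted above. The `aligned` vector has one entry per value and could be used as the second grouping variable in aggregate().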

1. Do you have a reference for doing the summation on both bins and
quantiles within the bins?
2. If not, can you give me some guidance as to what I am doing wrong
and how I can solve the sort/list issue?

Any help would be greatly appreciated

Kind regards,

Ivan Alves



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
