Re: [R] Correctly applying aggregate.ts()

Rui Barradas Sat, 08 Sep 2018 03:55:05 -0700

Hello,

Like Bert said, your data is a data.frame, so there is no need to call aggregate.ts. Besides, R will call the right method, so unless you want to change the standard behaviour, it is enough to call aggregate and let the method dispatch code do its job.
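Here is a minimal sketch of that dispatch. It assumes the sample data from the question below, rebuilt compactly here as `dp` so the snippet runs on its own. Because `dp` is a data.frame, calling the generic `aggregate()` with a `by =` list ends up in `aggregate.data.frame()`, so no method has to be named. The grouping name `Month` and the object name `monthly` are illustrative choices and do not come from the original code.

```r
# Sample data from the question, rebuilt compactly:
# 62 daily values from 2005-01-01 to 2005-03-03
dp <- data.frame(
  sampdate = as.character(seq(as.Date("2005-01-01"), as.Date("2005-03-03"), by = "day")),
  prcp = c(0.59, 0.08, 0.1, 0, 0, 0.02, 0.05, 0.1, 0, 0.02, 0, 0.05, 0.2, 0, 0,
           0.5, 0.41, 0.84, 0.01, 0.1, 0.01, 0, 0, 0, 0, 0.21, 0.24, 0.13, 1.12,
           0.01, 0.09, 0, 0, 0, 0.35, 0.18, 0.65, 0.16, 0, 0, 0, 0, 0.55, 0.21,
           rep(0, 14), 0.17, 0.05, 0.01, 0),
  stringsAsFactors = FALSE
)

# dp is a plain data.frame, not a time series ("ts") object
class(dp)

# aggregate() is generic: with a data.frame as its first argument it
# dispatches to aggregate.data.frame(), which groups rows by the 'by' list
# and passes na.rm = TRUE on to FUN
monthly <- aggregate(dp["prcp"],
                     by = list(Month = substr(dp$sampdate, 1, 7)),
                     FUN = sum, na.rm = TRUE)
monthly
```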

As for the problem, first an example of the formula interface, which I almost always prefer.



aggregate(prcp ~ substr(sampdate, 1, 7), data = dp, FUN = sum, na.rm = TRUE)
#  substr(sampdate, 1, 7) prcp
#1                2005-01 4.88
#2                2005-02 2.27
#3                2005-03 0.06

Now, you would have to rename the first column to Month, but it worked as expected; there were no NA issues. And there is no need to subset the data.frame: R will find the columns where they are, by their names, as long as you pass the argument data = dp to aggregate.
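To make the subsetting point concrete, here is a sketch with the same rebuilt `dp`. With single brackets and two indices, `dp['sampdate','prcp']` means [row, column]. R therefore looks for a row named "sampdate", finds none and returns NA, which is consistent with the NA result in the question. Computing a `Month` column first (an illustrative name, as is `res`) lets the formula result carry that name directly, so no renaming is needed.

```r
# Sample data from the question, rebuilt compactly (62 daily values)
dp <- data.frame(
  sampdate = as.character(seq(as.Date("2005-01-01"), as.Date("2005-03-03"), by = "day")),
  prcp = c(0.59, 0.08, 0.1, 0, 0, 0.02, 0.05, 0.1, 0, 0.02, 0, 0.05, 0.2, 0, 0,
           0.5, 0.41, 0.84, 0.01, 0.1, 0.01, 0, 0, 0, 0, 0.21, 0.24, 0.13, 1.12,
           0.01, 0.09, 0, 0, 0, 0.35, 0.18, 0.65, 0.16, 0, 0, 0, 0, 0.55, 0.21,
           rep(0, 14), 0.17, 0.05, 0.01, 0),
  stringsAsFactors = FALSE
)

# Two indices mean [row, column]: there is no row named "sampdate",
# so this returns NA instead of the two columns
dp["sampdate", "prcp"]

# Selecting both columns would need a vector of column names instead
head(dp[c("sampdate", "prcp")], 3)

# With data = dp the formula finds columns by name; computing the month
# as its own column gives the grouping column a readable name
dp$Month <- substr(dp$sampdate, 1, 7)
res <- aggregate(prcp ~ Month, data = dp, FUN = sum, na.rm = TRUE)
res
```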

If you want several statistics at the same time, it's a bit trickier, but with practice it becomes intuitive. (So to speak.)

Define a custom summary function. I haven't changed the default na.rm setting, but setting na.rm = TRUE right now would make the rest of the code simpler.


customSmry <- function(x, na.rm = FALSE){
  c(Sum = sum(x, na.rm = na.rm),
    Median = median(x, na.rm = na.rm),
    Max = max(x, na.rm = na.rm)
  )
}
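Calling the summary function by hand on one month shows what aggregate() gets back from FUN for each group: a named numeric vector of length 3. This is a sketch assuming the same rebuilt `dp`. `jan` and `customSmry2()` are hypothetical names, the latter being the variant with na.rm = TRUE as its default.

```r
customSmry <- function(x, na.rm = FALSE){
  c(Sum = sum(x, na.rm = na.rm),
    Median = median(x, na.rm = na.rm),
    Max = max(x, na.rm = na.rm)
  )
}

# Sample data from the question, rebuilt compactly (62 daily values)
dp <- data.frame(
  sampdate = as.character(seq(as.Date("2005-01-01"), as.Date("2005-03-03"), by = "day")),
  prcp = c(0.59, 0.08, 0.1, 0, 0, 0.02, 0.05, 0.1, 0, 0.02, 0, 0.05, 0.2, 0, 0,
           0.5, 0.41, 0.84, 0.01, 0.1, 0.01, 0, 0, 0, 0, 0.21, 0.24, 0.13, 1.12,
           0.01, 0.09, 0, 0, 0, 0.35, 0.18, 0.65, 0.16, 0, 0, 0, 0, 0.55, 0.21,
           rep(0, 14), 0.17, 0.05, 0.01, 0),
  stringsAsFactors = FALSE
)

# One group by hand: the January values
jan <- dp$prcp[substr(dp$sampdate, 1, 7) == "2005-01"]
customSmry(jan)   # a named vector with Sum, Median and Max

# Hypothetical variant with na.rm = TRUE as the default
customSmry2 <- function(x, na.rm = TRUE) customSmry(x, na.rm = na.rm)
customSmry2(c(1, NA, 3))   # NA values are dropped
customSmry(c(1, NA, 3))    # with the original default, every statistic is NA
```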


Now call aggregate:

agg <- aggregate(prcp ~ substr(sampdate, 1, 7), dp, FUN = customSmry, na.rm = TRUE)

But be VERY careful: the result is not a df with 4 columns. It's a df with only two columns, the second being a matrix, as you can see in the output of str.



str(agg)
#'data.frame':  3 obs. of  2 variables:
# $ substr(sampdate, 1, 7): chr  "2005-01" "2005-02" "2005-03"
# $ prcp                  : num [1:3, 1:3] 4.88 2.27 0.06 0.05 0 0.01 1.12 0.65 0.05
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : NULL
# .. ..$ : chr  "Sum" "Median" "Max"
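The structure reported by str() can also be checked directly. This sketch rebuilds `dp`, `customSmry()` and `agg` as above so it runs on its own. It shows that `agg` has two columns and that the second is a 3 x 3 matrix whose columns can be selected by statistic name.

```r
customSmry <- function(x, na.rm = FALSE){
  c(Sum = sum(x, na.rm = na.rm),
    Median = median(x, na.rm = na.rm),
    Max = max(x, na.rm = na.rm)
  )
}

# Sample data from the question, rebuilt compactly (62 daily values)
dp <- data.frame(
  sampdate = as.character(seq(as.Date("2005-01-01"), as.Date("2005-03-03"), by = "day")),
  prcp = c(0.59, 0.08, 0.1, 0, 0, 0.02, 0.05, 0.1, 0, 0.02, 0, 0.05, 0.2, 0, 0,
           0.5, 0.41, 0.84, 0.01, 0.1, 0.01, 0, 0, 0, 0, 0.21, 0.24, 0.13, 1.12,
           0.01, 0.09, 0, 0, 0, 0.35, 0.18, 0.65, 0.16, 0, 0, 0, 0, 0.55, 0.21,
           rep(0, 14), 0.17, 0.05, 0.01, 0),
  stringsAsFactors = FALSE
)

agg <- aggregate(prcp ~ substr(sampdate, 1, 7), dp, FUN = customSmry, na.rm = TRUE)

ncol(agg)             # only two columns
is.matrix(agg$prcp)   # the second column is a matrix
dim(agg$prcp)         # one row per month, one column per statistic
agg$prcp[, "Max"]     # a single statistic, selected by column name
```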


So the final steps will be to cbind those two "columns" into a df.

"columns" is between quotes because I am not cbinding the first column, I'm cbinding the sub-df agg[1]. This way the cbind method that is called is cbind.data.frame, and the result is a df. Also, since dfs are lists, the second column is an actual column but not a vector; it is an object of class matrix. This column is a list member, like all df columns, so I will subset the df 'agg' as a list, agg[[2]].

As a bonus, the colnames of the matrix are immediately right, with no prcp prefix. The first column's name comes from the call to substr and is not part of this story; just rename it when it's all done.
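Whether agg[1] or agg[[1]] is passed determines which cbind method runs. This sketch rebuilds `dp`, `customSmry()` and `agg` as above, and `good` and `bad` are hypothetical names. The one-column data frame agg[1] makes cbind() dispatch to cbind.data.frame, which keeps the column types. The bare character vector agg[[1]] instead sends it to the default method, which builds a matrix and coerces every value to character.

```r
customSmry <- function(x, na.rm = FALSE){
  c(Sum = sum(x, na.rm = na.rm),
    Median = median(x, na.rm = na.rm),
    Max = max(x, na.rm = na.rm)
  )
}

# Sample data from the question, rebuilt compactly (62 daily values)
dp <- data.frame(
  sampdate = as.character(seq(as.Date("2005-01-01"), as.Date("2005-03-03"), by = "day")),
  prcp = c(0.59, 0.08, 0.1, 0, 0, 0.02, 0.05, 0.1, 0, 0.02, 0, 0.05, 0.2, 0, 0,
           0.5, 0.41, 0.84, 0.01, 0.1, 0.01, 0, 0, 0, 0, 0.21, 0.24, 0.13, 1.12,
           0.01, 0.09, 0, 0, 0, 0.35, 0.18, 0.65, 0.16, 0, 0, 0, 0, 0.55, 0.21,
           rep(0, 14), 0.17, 0.05, 0.01, 0),
  stringsAsFactors = FALSE
)

agg <- aggregate(prcp ~ substr(sampdate, 1, 7), dp, FUN = customSmry, na.rm = TRUE)

class(agg[1])     # a sub-data frame with one column
class(agg[[1]])   # the column itself, a character vector

good <- cbind(agg[1], agg[[2]])    # cbind.data.frame: a df with 4 columns
bad  <- cbind(agg[[1]], agg[[2]])  # default method: a character matrix
str(good)
str(bad)
```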



agg <- cbind(agg[1], agg[[2]])
str(agg)
#'data.frame':  3 obs. of  4 variables:
# $ substr(sampdate, 1, 7): chr  "2005-01" "2005-02" "2005-03"
# $ Sum                   : num  4.88 2.27 0.06
# $ Median                : num  0.05 0 0.01
# $ Max                   : num  1.12 0.65 0.05

names(agg)[1] <- "Month"
agg
#    Month  Sum Median  Max
#1 2005-01 4.88   0.05 1.12
#2 2005-02 2.27   0.00 0.65
#3 2005-03 0.06   0.01 0.05
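For comparison, another common way to flatten the matrix column is do.call(data.frame, agg). This sketch rebuilds `dp`, `customSmry()` and `agg` so it runs on its own, and `flat` is an illustrative name. It shows that this route does produce the prcp prefix, so the cbind() approach above saves some renaming.

```r
customSmry <- function(x, na.rm = FALSE){
  c(Sum = sum(x, na.rm = na.rm),
    Median = median(x, na.rm = na.rm),
    Max = max(x, na.rm = na.rm)
  )
}

# Sample data from the question, rebuilt compactly (62 daily values)
dp <- data.frame(
  sampdate = as.character(seq(as.Date("2005-01-01"), as.Date("2005-03-03"), by = "day")),
  prcp = c(0.59, 0.08, 0.1, 0, 0, 0.02, 0.05, 0.1, 0, 0.02, 0, 0.05, 0.2, 0, 0,
           0.5, 0.41, 0.84, 0.01, 0.1, 0.01, 0, 0, 0, 0, 0.21, 0.24, 0.13, 1.12,
           0.01, 0.09, 0, 0, 0, 0.35, 0.18, 0.65, 0.16, 0, 0, 0, 0, 0.55, 0.21,
           rep(0, 14), 0.17, 0.05, 0.01, 0),
  stringsAsFactors = FALSE
)

agg <- aggregate(prcp ~ substr(sampdate, 1, 7), dp, FUN = customSmry, na.rm = TRUE)

# data.frame() splits the named matrix argument 'prcp' into one column per
# statistic, naming each one <argument name>.<matrix column name>
flat <- do.call(data.frame, agg)
names(flat)

# The first name is mangled by check.names, so rename it as before
names(flat)[1] <- "Month"
flat
```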

Finally, try to get some practice with the formula interface; you will see that it pays off in code simplicity and readability.



Hope this helps,

Rui Barradas

On 07/09/2018 at 22:19, Rich Shepard wrote:

   I've read ?aggregate and several blog posts on using aggregate() yet I
still haven't applied it correctly to my dataframe. The sample data are:
structure(list(sampdate = c("2005-01-01", "2005-01-02", "2005-01-03","2005-01-04", "2005-01-05", "2005-01-06", "2005-01-07", "2005-01-08","2005-01-09", "2005-01-10", "2005-01-11", "2005-01-12", "2005-01-13","2005-01-14", "2005-01-15", "2005-01-16", "2005-01-17", "2005-01-18","2005-01-19", "2005-01-20", "2005-01-21", "2005-01-22", "2005-01-23","2005-01-24", "2005-01-25", "2005-01-26", "2005-01-27", "2005-01-28","2005-01-29", "2005-01-30", "2005-01-31", "2005-02-01", "2005-02-02","2005-02-03", "2005-02-04", "2005-02-05", "2005-02-06", "2005-02-07","2005-02-08", "2005-02-09", "2005-02-10", "2005-02-11", "2005-02-12","2005-02-13", "2005-02-14", "2005-02-15", "2005-02-16", "2005-02-17","2005-02-18", "2005-02-19", "2005-02-20", "2005-02-21", "2005-02-22","2005-02-23", "2005-02-24", "2005-02-25", "2005-02-26", "2005-02-27","2005-02-28", "2005-03-01", "2005-03-02", "2005-03-03"), prcp = c(0.59,0.08, 0.1, 0, 0, 0.02, 0.05, 0.1, 0, 0.02, 0, 0.05, 0.2, 0, 0, 0.5,0.41, 0.84, 0.01, 0.1, 0.01, 0, 0, 0, 0, 0.21, 0.24, 0.13, 1.12, 0.01,0.09, 0, 0, 0, 0.35, 0.18, 0.65, 0.16, 0, 0, 0, 0, 0.55, 0.21, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.17, 0.05, 0.01, 0)), row.names =c(NA, 62L), class = "data.frame")
   What I need to learn how to do is to calculate monthly sum, median, and
maximum rainfall amounts from the full data set which has daily rainfall
amounts. My most current effort to calculate monthly sums uses this syntax:

monthly.rain <- aggregate.ts(x = dp['sampdate','prcp'], by = list(month =
substr(dp$sampdate, 1, 7)), FUN = sum, na.rm = TRUE)

(entered on a single line) which produces this result:

head(monthly.rain)
[1] NA
The sample data are 62 of the 113K rows in the dataframe. A larger set can
be provided if needed.

   An explanation of what I've missed is needed.

Regards,

Rich

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

