Sorry about the cross posting - i didn't realise it was bad etiquette. my sessioninfo was as follows:
> sessionInfo() R version 2.14.1 (2011-12-22) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods [7] base other attached packages: [1] xts_0.8-2 zoo_1.7-6 loaded via a namespace (and not attached): [1] grid_2.14.1 lattice_0.20-0 i have now updated to R 2.15, and my session info is: > sessionInfo() R version 2.15.0 (2012-03-30) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods [7] base other attached packages: [1] xts_0.8-6 zoo_1.7-7 loaded via a namespace (and not attached): [1] grid_2.15.0 lattice_0.20-6 tools_2.15.0 For the XTS object mn your suggestion still fails with an error: > adf <- aggregate(mn[,-1]~mn[,1], data=mn, sum); adf Error in aggregate.formula(mn[, -1] ~ mn[, 1], data = mn, sum) : 'names' attribute [3] must be the same length as the vector [1] however when i convert to a zoo with > mnz <- as.zoo(mn) I get some errors, but it works > adf <- aggregate(mnz[,-1]~mnz[,1], data=mnz, sum); adf Warning messages: 1: In zoo(rval, index(x)[i]) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique 2: In zoo(rval, index(x)[i]) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique 3: In zoo(rval[i], index(x)[i]) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique 4: In zoo(rval[i], index(x)[i]) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique 5: In zoo(xc[ind], ix[ind]) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique 6: In zoo(xc[ind], ix[ind]) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique mnz[, 1] mnz[, -1] 1 97.90 408 2 97.89 208 So is this a bug in XTS? thanks for your patience mj On 13 June 2012 15:53, David Winsemius <dwinsem...@comcast.net> wrote: > > On Jun 13, 2012, at 9:38 AM, Matthew Johnson wrote: > >> Sorry, i'll try and put more flesh on the bones. >> >> please note, i changed the data in the example, as fiddling has raised >> another question that's best illustrated with a slightly different >> data set. >> >> first of all, when i do as you suggest, i obtain the following error: >> >>> PxMat <- aggregate(mm[,-1] ~ mm[,1], data=mm, sum) >> >> >> Error in aggregate.formula(mm[, -1] ~ mm[, 1], data = mm, sum) : >> 'names' attribute [3] must be the same length as the vector [1] > > > Very strange. When I just did it with the structure you (cross-) posted on > SO I got: > >> adf <- aggregate(mm[,-1]~mm[,1], data=mm, sum); adf > snipped warning messages > mm[, 1] mm[, -1] > > 1 97.91 538 > 2 97.92 918 > > I had earlier tested it with a zoo object I had constructed and did it again > with the structure below. > > mm[, 1] mm[, -1] > > 1 97.91 538 > 2 97.92 918 > > I'm using zoo_1.7-6 and R version 2.14.2 on a Mac. I do not remember you > posting the requested information about your versions. > > -- > David. > > >> >> my data.frame is an xts, and it looks like this: >> >> px_ym1 vol_ym1 >> 2012-06-01 09:30:00 97.90 9 >> 2012-06-01 09:30:00 97.90 60 >> 2012-06-01 09:30:00 97.90 71 >> 2012-06-01 09:30:00 97.90 5 >> 2012-06-01 09:30:00 97.90 3 >> 2012-06-01 09:30:00 97.90 21 >> 2012-06-01 09:31:00 97.90 5 >> 2012-06-01 09:31:00 97.89 192 >> 2012-06-01 09:31:00 97.89 65 >> 2012-06-01 09:31:00 97.89 73 >> 2012-06-01 09:31:00 97.89 1 >> 2012-06-01 09:31:00 97.89 1 >> 2012-06-01 09:31:00 97.89 39 >> 2012-06-01 09:31:00 97.90 15 >> 2012-06-01 09:31:00 97.90 1 >> 2012-06-01 09:31:00 97.89 1 >> 2012-06-01 09:31:00 97.90 18 >> 2012-06-01 09:31:00 97.89 1 >> 2012-06-01 09:32:00 97.89 33 >> 2012-06-01 09:34:00 97.89 1 >> 2012-06-01 09:34:00 97.89 1 >> >> dput(mn) returns: >> >>> dput(mn) >> >> structure(c(97.9, 97.9, 97.9, 97.9, 97.9, 97.9, 97.9, 97.89, >> 97.89, 97.89, 97.89, 97.89, 97.89, 97.9, 97.9, 97.89, 97.9, 97.89, >> 97.89, 97.89, 97.89, 9, 60, 71, 5, 3, 21, 5, 192, 65, 73, 1, >> 1, 39, 15, 1, 1, 18, 1, 33, 1, 1), .indexCLASS = c("POSIXct", >> "POSIXt"), .indexTZ = "GMT", class = c("xts", "zoo"), index = >> structure(c(1338543000, >> 1338543000, 1338543000, 1338543000, 1338543000, 1338543000, 1338543060, >> 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, >> 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543120, >> 1338543240, 1338543240), tzone = "GMT", tclass = c("POSIXct", >> "POSIXt")), .Dim = c(21L, 2L), .Dimnames = list(NULL, c("px_ym1", >> "vol_ym1"))) >> >> as you can see, the xts data.frame xts data.frame that contains dates, >> prices and volumes. There is much more data over a long time period, >> and i'm interested in various sub-setting and then aggregate >> operations. >> >> I would like to split the data by time period and aggregate the data, >> such that i obtain a table which reports the volume traded at each >> price, for each of the time-period splits that i have chosen. >> >> I have employed the following approach: >> >> PxMat <- aggregate(.~px_ym1, data=mn, sum) >> >> >> which yields: >> >> px_ym1 vol_ym1 >> 1 97.89 408 >> 2 97.90 208 >> >> and for subsets, i use the following grouping: >> >>> PxMat30 <- aggregate(.~px_ym1, data=mn[.indexmin(mn) == '30'], sum) >> >> >> Which yields: >> >> px_ym1 vol_ym1 >> 1 97.9 169 >> >> and >> >>> PxMat31 <- aggregate(.~px_ym1, data=mn[.indexmin(mn) == '31'], sum) >> >> >> which yields: >> >> px_ym1 vol_ym1 >> 1 97.89 373 >> 2 97.90 39 >> >> and so on and so forth for each minute. >> >> when i try and sub-set using general notation, as follows: >> >> PxMat <- aggregate(.~mn[,1], data=mn, sum) >> >> this yields a different form of output: >> >> px_ym1 px_ym1 vol_ym1 >> 1 97.90 1076.79 408 >> 2 97.89 979.00 208 >> >> the problem is that i now have the sum of the px_ym1 data (the sum of >> mn[,1]) >> >> hopefully things are now clearer - sorry to have wasted your time up >> until now. >> >> assuming that i have now made my situation clear, i am hope you can >> help with four specific questions. >> >> 1/ My data-sets are HUGE, so speed is an issue - is this the fastest >> way to sub-set and aggregate an xts? >> >> 2/ is there a way to do this for multiple splits? say a table for each >> minute, day, week, or month? the return would potentially be a list >> with a table for each day / minute etc showing volume traded at each >> price -- but it doesn't have to be a list ... >> >> i am writing a function with loops that would generate a table that >> reports volume traded at each price for each case of a specified time >> split (say for four tables, one for each minute in the example data, >> returned as a list). my solution is slow, it seems like something that >> someone would have done better already. is this the case? >> >> 3/ is there a way to do the sub-setting with templated variables? i >> would like to obtain the table i get with the named aggregate >> functions (reproduced above) with multiple data frames, as the column >> names will differ from time to time. i cannot figure out how to stop >> the command from summing the mn[,1] column when i stop using variable >> names. >> >> 4/ on a related note, is it possible to apply different functions to >> different columns of data? It would be nice, for example, if the table >> returned from an aggregate command could be made to be: >> >> px_ym1 count vol_ym1 >> 1 97.90 11 408 >> 2 97.89 10 208 >> >> where we have the price traded, the number of trades (a count of >> px_ym1 / mn[,1], and the sum of vol_ym1 (mn[,2]). >> >> thanks and best regards >> >> matt johnson >> >> On 13 June 2012 15:06, David Winsemius <dwinsem...@comcast.net> wrote: >>> >>> >>> >>> On Jun 12, 2012, at 11:32 PM, Matthew Johnson wrote: >>> >>>> Dear R-help, >>>> >>>> I have an xts data set that i have subset by date. >>>> >>>> now it contains a date-time-stamp, and two columns (price and volume >>>> traded): my objective is to create tables of volume traded at a price - >>>> and >>>> i've been successfully using aggregate to do so in interactive use. >>>> >>>> say the data looks as follows: >>>> >>>> px_ym1 vol_ym1 >>>> 2012-06-01 09:37:00 97.91 437 >>>> 2012-06-01 09:37:00 97.91 64 >>>> 2012-06-01 09:37:00 97.91 1 >>>> 2012-06-01 09:37:00 97.91 5 >>>> 2012-06-01 09:37:00 97.91 5 >>>> 2012-06-01 09:37:00 97.92 174 >>>> 2012-06-01 09:37:00 97.92 64 >>>> 2012-06-01 09:37:00 97.92 125 >>>> 2012-06-01 09:37:00 97.92 124 >>>> 2012-06-01 09:37:00 97.92 64 >>>> 2012-06-01 09:37:00 97.92 109 >>>> 2012-06-01 09:37:00 97.92 64 >>>> 2012-06-01 09:37:00 97.92 19 >>>> 2012-06-01 09:37:00 97.92 45 >>>> 2012-06-01 09:37:00 97.92 75 >>>> 2012-06-01 09:37:00 97.92 3 >>>> 2012-06-01 09:37:00 97.92 47 >>>> 2012-06-01 09:37:00 97.91 26 >>>> 2012-06-01 09:37:00 97.92 4 >>>> 2012-06-01 09:37:00 97.92 1 >>>> >>>> the the following gives me what i'm looking for: >>>> >>>>> adf <- aggregate(.~px_ym1, data=mm, sum) >>>> >>>> >>>> >>>> which is this table: >>>> >>>> px_ym1 vol_ym1 >>>> 1 97.91 538 >>>> 2 97.92 918 >>>> >>>> however now i'm trying to code it to run automatically, and use of the >>>> templated version: >>>> >>>>> adf <- aggregate(.~mm[,1], data=mm, sum) >>> >>> >>> >>> Did you try: >>> >>> adf <- aggregate(mm[,-1] ~ mm[,1], data=mm, sum) >>> adf >>> >>> I would have used names: >>> >>> adf <- aggregate(vol_ym1 ~ px_ym1, data=mm, sum) >>> adf >>> >>> >>> >>>> yields the following - which contains what i'd like, but is has also >>>> summed >>>> across the price column (not ideal). >>>> >>>> px_ym1 px_ym1 vol_ym1 >>>> 1 97.91 587.46 538 >>>> 2 97.92 1370.88 918 >>>> >>>> how do i code this so that i can enter an xts data-frame with arbitrary >>>> names and still obtain the table with only the information i desire? >>> >>> >>> >>> That is too far to the vague side of the vague-specific continuum. >>> >>> >>>> >>>> on a related point, is there a way to combine the two steps? >>> >>> >>> >>> Er, which two steps would that be? >>> >>> >>>> the function >>>> i've written splits by date and then returns a list containing >>>> data-frames >>>> that report the volume traded at each price on each date >>>> >>>> - am i re-creating the wheel here? is there canned function that does >>>> this? >>>> >>>> thanks + best regards >>>> >>>> matt johnson >>> >>> >>> >>> >>> David Winsemius, MD >>> West Hartford, CT >>> > > David Winsemius, MD > West Hartford, CT > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.