Abby,

Vectors do have an internal mechanism for knowing that they are sorted, via ALTREP (it was one of two core motivating features for 'smart vectors', the other being knowledge about the presence of NAs).
Currently I don't think we expose it at the R level, though it is part of the official C API. I don't know of any plans for this to change, but I suppose it could. Plus, for functions in R itself, we could even use it without exposing it more widely. A number of functions, including sort itself, already do this in fact, but more could. I'd be interested in hearing which functions you think would particularly benefit from this.

~G

On Mon, Mar 15, 2021 at 12:01 PM SOEIRO Thomas <thomas.soe...@ap-hm.fr> wrote:

> Hi Abby,
>
> Thank you for your positive feedback.
>
> I agree with your general comment about sorting.
>
> For ave specifically, ordering may not help because the output must
> maintain the order of the input (as ave returns only x and not the entire
> data.frame).
>
> Thanks,
>
> Thomas
> ________________________________________
> From: Abby Spurdle <spurdl...@gmail.com>
> Sent: Monday, March 15, 2021 10:22
> To: SOEIRO Thomas
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] Potential improvements of ave?
>
> Hi Thomas,
>
> These are some great suggestions.
> But I can't help but feel there's a much bigger problem here.
>
> Intuitively, the ave function could (or should) sort the data.
> Then the indexing step becomes almost trivial, in terms of both time
> and space complexity.
> And the ave function is not the only example of where a problem
> becomes much simpler, if the data is sorted.
>
> Historically, I've never found base R functions user-friendly for
> aggregation purposes, or for sorting.
> (At least, not by comparison to SQL).
>
> But that's not the main problem.
> It would seem preferable to sort the data, only once.
> (Rather than sorting it repeatedly, or not at all).
>
> Perhaps, objects such as vectors and data.frame(s) could have a
> boolean attribute, to indicate if they're sorted.
> Or functions such as ave could have a sorted argument.
> In either case, if true, the function assumes the data is sorted and
> applies a more efficient algorithm.
>
>
> B.
>
>
> On Sat, Mar 13, 2021 at 1:07 PM SOEIRO Thomas <thomas.soe...@ap-hm.fr>
> wrote:
> >
> > Dear all,
> >
> > I have two questions/suggestions about ave, but I am not sure if it's
> > relevant for bug reports.
> >
> >
> >
> > 1) I have performance issues with ave in a case where I didn't expect
> > it. The following code runs as expected:
> >
> > set.seed(1)
> >
> > df1 <- data.frame(id1 = sample(1:1e2, 5e2, TRUE),
> >                   id2 = sample(1:3, 5e2, TRUE),
> >                   id3 = sample(1:5, 5e2, TRUE),
> >                   val = sample(1:300, 5e2, TRUE))
> >
> > df1$diff <- ave(df1$val,
> >                 df1$id1,
> >                 df1$id2,
> >                 df1$id3,
> >                 FUN = function(i) c(diff(i), 0))
> >
> > head(df1[order(df1$id1,
> >                df1$id2,
> >                df1$id3), ])
> >
> > But when expanding the data.frame (* 1e4), ave fails (Error: cannot
> > allocate vector of size 1110.0 Gb):
> >
> > df2 <- data.frame(id1 = sample(1:(1e2 * 1e4), 5e2 * 1e4, TRUE),
> >                   id2 = sample(1:3, 5e2 * 1e4, TRUE),
> >                   id3 = sample(1:(5 * 1e4), 5e2 * 1e4, TRUE),
> >                   val = sample(1:300, 5e2 * 1e4, TRUE))
> >
> > df2$diff <- ave(df2$val,
> >                 df2$id1,
> >                 df2$id2,
> >                 df2$id3,
> >                 FUN = function(i) c(diff(i), 0))
> >
> > This use case does not seem extreme to me (e.g. aggregate et al work
> > perfectly on this data.frame).
> > So my question is: Is this expected/intended/reasonable? i.e. Does ave
> > need to be optimized?
> >
> >
> >
> > 2) Gabor Grothendieck pointed out in 2011 that drop = TRUE is needed to
> > avoid warnings in case of unused levels
> > (https://stat.ethz.ch/pipermail/r-devel/2011-February/059947.html).
> > Is it relevant/possible to expose the drop argument explicitly?
> >
> > Thanks,
> >
> > Thomas

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
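
For readers following the thread, the two points about sorting (sort the data only once, yet return the result in the order of the input) can be combined. Below is a minimal sketch under stated assumptions, not a proposal for how base R's ave is or should be implemented. The name ave_sorted is hypothetical and is not part of base R. The sketch assumes at least one grouping key, keys of the same length as x, and no NA values in the keys.

```r
# Hypothetical helper (NOT part of base R): a grouped transform that sorts
# once, processes contiguous runs of equal keys, then restores input order.
# Assumes: at least one key, every key as long as x, no NA in the keys.
ave_sorted <- function(x, ..., FUN = mean) {
  keys <- unname(list(...))
  n <- length(x)
  stopifnot(length(keys) > 0L, all(lengths(keys) == n))
  if (n == 0L) return(x)

  o <- do.call(order, keys)          # a single sort; ties keep input order

  # TRUE at position i if any key differs between sorted rows i and i + 1
  changed <- Reduce(`|`, lapply(keys, function(k) {
    ks <- k[o]
    ks[-1L] != ks[-n]
  }))
  starts <- which(c(TRUE, changed))  # first sorted row of each group
  ends <- c(starts[-1L] - 1L, n)     # last sorted row of each group

  xs <- x[o]
  out <- xs
  for (j in seq_along(starts)) {
    idx <- starts[j]:ends[j]
    out[idx] <- FUN(xs[idx])         # FUN sees one group at a time
  }

  res <- x
  res[o] <- out                      # undo the permutation
  res
}

# Usage on the small example from the original post
set.seed(1)
df1 <- data.frame(id1 = sample(1:1e2, 5e2, TRUE),
                  id2 = sample(1:3, 5e2, TRUE),
                  id3 = sample(1:5, 5e2, TRUE),
                  val = sample(1:300, 5e2, TRUE))
df1$diff2 <- ave_sorted(df1$val, df1$id1, df1$id2, df1$id3,
                        FUN = function(i) c(diff(i), 0))
```

Because the groups are found as runs in the sorted keys, the sketch never builds the full cross of key levels. Whether that helps with the allocation failure reported for the larger data.frame has not been measured here.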