On Fri, Dec 12, 2008 at 8:41 AM, Duncan Murdoch <murd...@stats.uwo.ca> wrote:
> On 12/12/2008 8:25 AM, hadley wickham wrote:
>>>
>>> From which you might conclude that I don't like the design of subset, and
>>> you'd be right.  However, I don't think this is a counterexample to my
>>> general rule.  In the subset function, the select argument is treated as
>>> an unevaluated expression, and then there are rules about what to do with
>>> it.  (I.e. try to look up name `a` in the data frame, if that fails, ...)
>>>
>>> For the requested behaviour to similarly fall within the general rule,
>>> we'd have to treat all indices to all kinds of things (vectors, matrices,
>>> dataframes, etc.) as unevaluated expressions, with special handling for
>>> the particular symbol `end`.
>>
>> Except you wouldn't have to necessarily change indexing - you could
>> change seq instead.  Then 5:end could produce some kind of special
>> data structure (maybe an iterator) that was recognised by the various
>> indexing functions.
>
> Ummm, doesn't that require changes to *both* indexing and seq?

Oops, yes.  I meant it wouldn't require indexing to use unevaluated
expressions.

>> This would still be a lot of work for not a lot
>> of payoff, but it would be a logically consistent way of adding this
>> behaviour to indexing, and the basic work would make it possible to
>> develop other sorts of indexing, eg df[evens(), ], or df[last(5),
>> last(3)].
>
> I agree: it would be a nice addition, but a fair bit of work.  I think it
> would be quite doable for the indexable things in the base packages, but
> there are a lot of contributed packages that define [ methods, and those
> methods would all need to be modified too.

That's true, although I suspect many contributed `[` methods eventually
delegate to the base methods and might work without further modification.

> (Just to be clear, when I say doable, I'm thinking that your iterators
> return functions that compute subsets of index ranges.  For example, evens()
> might be implemented as
>
>  evens <- function() {
>    result <- function(indices) {
>      indices[indices %% 2 == 0]
>    }
>    class(result) <- "iterator"
>    return(result)
>  }
>
> and then `[` in v[evens()] would recognize that it had been passed an
> iterator, and would pass 1:length(v) to the iterator to get the subset of
> even indices.  Is that what you had in mind?)

Yes, that's exactly what I was thinking, although you'd have to put
some thought into the conventions.  Would it be better to pass in the
length of the vector instead of a vector of indices?  Should all
iterators return logical vectors?  That way you could do
x[evens() & last(5)] to get the even indices out of the last 5, as
opposed to x[evens()][last(5)], which would return the last 5 even
indices.

You could also imagine similar iterators for random sampling, like
samp(0.2) to choose 20% of the indices, or boot(0.8) to choose 80%
with replacement.  first(n) could also be useful, selecting the first
min(n, length(vector)) observations.  An iterator version of rev()
would also be handy.
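
To make the logical-vector convention concrete, here is a minimal
sketch.  It assumes each of these objects is a function that takes the
length of the vector and returns a logical vector of that length.  Base
`[` is left untouched; a hypothetical helper pick() stands in for the
dispatch that `[` would have to do.  The names selector() and pick() and
the class "selector" are illustrative assumptions, not an existing API.

```r
# Hypothetical constructor: tag a function of n (the vector length) that
# returns a logical vector of length n.
selector <- function(f) structure(f, class = "selector")

evens <- function() selector(function(n) seq_len(n) %% 2 == 0)
first <- function(k) { force(k); selector(function(n) seq_len(n) <= k) }
last  <- function(k) { force(k); selector(function(n) seq_len(n) > n - k) }

# S3 method for `&`: combine two selectors into one that ANDs their masks.
"&.selector" <- function(e1, e2) {
  selector(function(n) e1(n) & e2(n))
}

# Stand-in for what `[` would do on receiving a selector (assumption):
# resolve the selector against length(x), then index as usual.
pick <- function(x, sel) x[sel(length(x))]

x <- 1:20
pick(x, evens() & last(5))        # even indices among the last 5: 16 18 20
pick(pick(x, evens()), last(5))   # last 5 even indices: 12 14 16 18 20
pick(x, first(100))               # k > length(x) simply selects everything
```

Because every selector yields a logical mask of the same length, & and |
compose without any special cases.  The cost is that order-dependent
selections such as rev(), or sampling with replacement as in boot(),
cannot be expressed as a logical mask and would need integer indices
instead.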

Maybe selector would be a better name than iterator though, as these
don't have the same feel as iterators in other languages.

Hadley

-- 
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.