Re: [Rd] head/tail breaking change
> Jan Gorecki
>     on Thu, 19 Dec 2019 11:49:11 +0530 writes:

> Thank you Gabriel,
> I agree that new behaviour makes much more sense. Just wanted to confirm
> before resolving compatibility of my unit tests.
> Best,
> Jan

Indeed, Gabe's explanation is right-on-spot:
With the generalization of head() / tail(), we really found it
undesirable to stay "internally inconsistent".

We do have to grab the chance for not-quite-back-compatible
improvements -- when the costs look comparably small -- for R 4.0.0.

Martin

> On Wed 18 Dec, 2019, 10:46 PM Gabriel Becker, wrote:

>> Jan,
>>
>> That is an intentional change as you can see in the documentation for
>> head/tail in R-devel. Last time I discussed it with Martin, this behavior
>> was desired and thus is unlikely to change unless "our" (ie his) mind does.
>>
>> The hope is that the new behavior is actually what people would want (note
>> it already behaves this way for data.frames and for matrices, which are now
>> explicitly array objects with 2 dimensions as well as classed as matrices,
>> so it's more consistent now, and more reasonable for the object).
>>
>> Best,
>> ~G
>>
>> On Wed, Dec 18, 2019 at 2:44 AM Jan Gorecki wrote:
>>
>>> Hi R-devel community,
>>>
>>> I am aware of changes in R-devel in head/tail methods but I was not
>>> expecting that to be a breaking change.
>>>
>>> # R 3.6.1
>>> ar = array(1:27, c(3,3,3))
>>> tail(ar, 1)
>>> #[1] 27
>>>
>>> The current output of R-devel is something that I would expect from a
>>>
>>> tail(ar, c(1, Inf, Inf))
>>>
>>> or
>>>
>>> tail(ar, c(1, NA, NA))
>>>
>>> calls.
>>> Is it going to stay like this or are there plans to mitigate this
>>> breaking change?
>>>
>>> # R-devel 2019-12-17 r77592
>>> ar = array(1:27, c(3,3,3))
>>> tail(ar, 1)
>>> #, , 1
>>> #
>>> #     [,1] [,2] [,3]
>>> #[3,]    3    6    9
>>> #
>>> #, , 2
>>> #
>>> #     [,1] [,2] [,3]
>>> #[3,]   12   15   18
>>> #
>>> #, , 3
>>> #
>>> #     [,1] [,2] [,3]
>>> #[3,]   21   24   27
>>>
>>> Best,
>>> Jan Gorecki
>>>
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
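For anyone adjusting unit tests to the change discussed in this thread, here is a minimal sketch in R. It assumes R >= 4.0.0 for the dimension-aware results; the two element-wise forms at the end should give the R 3.6.1 answer (27) on either version. The variable names are only illustrative.

```r
ar <- array(1:27, c(3, 3, 3))

if (getRversion() >= "4.0.0") {
  # tail() on an array now slices along the first dimension and keeps dims
  slab <- tail(ar, 1)
  print(dim(slab))

  # NA entries in 'n' select all indices in that dimension, so this is
  # the explicit spelling of the same request
  slab_explicit <- tail(ar, c(1, NA, NA))
  print(dim(slab_explicit))
}

# Version-independent ways to get the last element,
# which is what tail(ar, 1) returned in R 3.6.1
last_elem <- tail(as.vector(ar), 1)
last_elem_idx <- ar[length(ar)]
```

Dropping the dim attribute with as.vector() before calling tail() is probably the smallest change for tests that relied on the old vector-style result.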
[Rd] ALTREP string methods for substr and nchar
A useful extension of ALTREP would be two new string methods: one that
returns the number of characters of a given string element, and one that
returns a substring of an element.

Having these methods would allow retrieving these values without
needing to create a CHARSXP for the full element data, which could
potentially be costly for long elements.

For example, say you have an ALTREP altstring vector where each element
holds the sequence of a single chromosome. It would be useful to query
the length of each chromosome, retrieve the first 100 characters,
etc. without having to put the whole chromosome in memory. I realize
there are tools in Bioconductor to handle this particular case, but it
seems the general case would be perfect for ALTREP.

Jim
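To make the idea concrete, here is a rough sketch in plain R of what "length and substring without materializing the element" could look like. It is not the ALTREP C API and proposes no method names; make_lazy_strings, lazy_nchar and lazy_substr are hypothetical helpers, and the sketch assumes a single-byte encoding so that byte offsets equal character offsets.

```r
# Hypothetical file-backed string vector: only the path, offsets and lengths are kept in memory
make_lazy_strings <- function(path, n_chars) {
  offsets <- c(0, cumsum(n_chars))[seq_along(n_chars)]
  structure(list(path = path, offsets = offsets, n_chars = n_chars),
            class = "lazy_strings")
}

# Lengths come from metadata alone; no element data is read
lazy_nchar <- function(x) x$n_chars

# Read only the requested window of element i
lazy_substr <- function(x, i, first, last) {
  last <- min(last, x$n_chars[i])
  if (last < first) return("")
  con <- file(x$path, open = "rb")
  on.exit(close(con))
  seek(con, where = x$offsets[i] + first - 1)
  readChar(con, nchars = last - first + 1, useBytes = TRUE)
}

# Demo data: two short "chromosomes" written back to back into a temp file
seqs <- c(strrep("ACGT", 50), strrep("GGCA", 25))
path <- tempfile()
writeBin(charToRaw(paste(seqs, collapse = "")), path)

x <- make_lazy_strings(path, nchar(seqs))
lens <- lazy_nchar(x)
head8 <- lazy_substr(x, 2, 1, 8)
tail2 <- lazy_substr(x, 1, 199, 300)
```

In an ALTREP class the same two queries would presumably be answered by C-level methods, so that nchar() and substr() never force a full CHARSXP for the element.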
Re: [Rd] ALTREP string methods for substr and nchar
Hi Jim,

Thanks for posting this. Honestly, the methods list I initially proposed
years ago, and the one Luke eventually put in (which had some of what I had
proposed plus a bunch of new stuff), were pretty heavily focused on numeric
values (exclusively so, in my case, I think). I agree that there is a lot of
space to beef up how AltStrings behave.

I agree that nchar and substring also make a lot of sense. Perhaps nzchar
as well.

There are some other things that might be good as well. In particular,
Michael Lawrence and I have talked about things like AltStrings that *know*
all their elements have the same encoding, in the same way that numeric,
integer, or now logical ALTREP vectors can know they don't have any NAs.
The suspicion is that this would make certain expensive (I think) encoding
checks, and possibly conversions, much cheaper. I'm far from an expert on
encodings and the costs/difficulties therein, but the concept seems pretty
straightforward and reasonable to me.

I hope to get back to the matching logic and get that hooked in (I did a
bunch of work on it some time ago but it ended up having problems at the
time, so it either never went in or it did go in but Luke had to pull it
back out, I don't recall which). When(/if) that does happen I'd suspect
that matching would be another one that we'd want AltStrings to have
first-class support for.

Regexes in general are probably another big area, since I'd think it would
be nice to not need to wrap and unwrap the elements when the underlying
library doesn't want them wrapped as CHARSXPs anyway...

Another area that is more fraught, but that my intuition suggests might be
really nice, is pasting. A paste(x, collapse="bla") method would be easily
achievable and potentially useful. paste(x, y, z) where x is an ALTREP with
a paste method could also be nice, potentially returning the same type of
AltString representation of the concatenation. If there were both
paste-before and paste-after methods then it would be possible to
potentially support arbitrary pastes, though things would get complicated
(perhaps fatally so?) if more than one argument was an AltString.

Overall, though, I agree. It is looking like I'll have some time shortly to
get back to some R things I've been wanting to do, so I'll put a proposal
for some string altmethods together and see what people (mostly Luke, tbh)
think.

Best,
~G

On Thu, Dec 19, 2019 at 11:39 AM Jim Hester wrote:

> A useful extension of ALTREP is having two new string methods which
> return the number of characters of a given string element and to
> return a substring of an element.
>
> Having these methods would allow retrieving these values without
> needing to create a CHARSXP for the full element data, which could
> potentially be costly for long elements.
>
> For example say you have an ALTREP altstring vector where each element
> holds the sequence of a single chromosome, it would be useful to query
> the lengths of each chromosome and retrieve the first 100 characters
> etc. without having to put the whole chromosome in memory. I realize
> there are tools in Bioconductor to handle this particular case, but it
> seems the general case would be perfect for ALTREP.
>
> Jim
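The "known common encoding" idea from the reply above can be illustrated at the R level. This is only a sketch under assumptions: it uses an ordinary attribute rather than ALTREP metadata, the attribute name uniform_encoding and the helpers tag_encoding and to_utf8 are hypothetical, and the check is deliberately naive (it just compares the declared Encoding() marks).

```r
# Record once whether every element carries the same declared encoding mark
tag_encoding <- function(x) {
  enc <- unique(Encoding(x))
  attr(x, "uniform_encoding") <- if (length(enc) == 1L) enc else NA_character_
  x
}

# A consumer can skip the per-element conversion when the flag says "all UTF-8"
to_utf8 <- function(x) {
  if (identical(attr(x, "uniform_encoding"), "UTF-8")) return(x)
  out <- enc2utf8(x)
  # After conversion every element is valid UTF-8 (ASCII elements included)
  attr(out, "uniform_encoding") <- "UTF-8"
  out
}

utf8_only <- tag_encoding(c(intToUtf8(233), intToUtf8(252)))
mixed <- tag_encoding(c(intToUtf8(233),
                        iconv(intToUtf8(252), "UTF-8", "latin1")))

flag_utf8 <- attr(utf8_only, "uniform_encoding")
flag_mixed <- attr(mixed, "uniform_encoding")
converted <- to_utf8(mixed)
```

This mirrors how a no-NA flag lets numeric code skip an NA scan: the expensive check is done once, where the vector is created, rather than on every use.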