Hi Jim,

Thanks for posting this. Honestly, both the methods list I initially proposed years ago and the one Luke eventually put in (which had some of what I had suggested plus a bunch of new stuff) were pretty heavily focused on numeric values (exclusively so, in my case, I think).
I agree that there is a lot of space to beef up how AltStrings behave. I agree that nchar and substring also make a lot of sense. Perhaps nzchar as well.

There are some other things that might be good as well. In particular, Michael Lawrence and I have talked about things like AltStrings that *know* all their elements have the same encoding, in the same way that numeric, integer, or now logical ALTREP vectors can know they don't have any NAs. The suspicion is that this would make certain expensive (I think) encoding checks, and possibly conversions, much cheaper. I'm far from an expert on encodings and the costs/difficulties therein, but the concept seems pretty straightforward and reasonable to me.

I hope to get back to the matching logic and get that hooked in. (I did a bunch of work on it some time ago, but it ended up having problems at the time, so it either never went in or it did go in but Luke had to pull it back out; I don't recall which.) When (/if) that does happen, I'd suspect that matching would be another one that we'd want AltStrings to have first-class support for.

Regexes in general are probably another big area, since I'd think it would be nice to not need to wrap and unwrap the elements when the underlying library doesn't want them wrapped as CHARSXPs anyway...

Another area that is more fraught, but that my intuition suggests might be really nice, is pasting. A paste(x, collapse="bla") method would be easily achievable and potentially useful. paste(x, y, z), where x is an ALTREP with a paste method, could also be nice, potentially returning the same type of AltString representation of the concatenation. If there were both a paste-before and a paste-after method, then it would be possible to potentially support arbitrary pastes, though things would get complicated (perhaps fatally so?) if more than one argument was an AltString.

Overall, though, I agree.
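As a rough illustration of the nchar/substring idea, here is a sketch in plain C of a string vector whose elements are stored compactly and can answer length and substring queries without ever building the full string. This is not R's ALTREP API: `lazy_elt`, `lazy_strvec`, `lazy_nchar`, `lazy_substring`, and the `known_ascii` flag are all hypothetical names invented for the example, and the run-length storage is just a convenient stand-in for "something expensive to materialize". The `known_ascii` flag plays the role of the "knows its encoding" metadata discussed above, on the assumption that an all-ASCII vector can treat bytes and characters interchangeably.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical compact element: `motif` repeated `reps` times.
   The expanded string is never stored. */
typedef struct {
    const char *motif;
    size_t reps;
} lazy_elt;

/* Hypothetical stand-in for an AltString vector (not R's real API). */
typedef struct {
    const lazy_elt *elts;
    size_t n;
    int known_ascii;   /* metadata: every element is known to be ASCII */
} lazy_strvec;

/* nchar-style method: answered from metadata, no materialization. */
size_t lazy_nchar(const lazy_strvec *x, size_t i)
{
    assert(i < x->n);
    /* Bytes equal characters only because the vector declares itself
       ASCII; without that flag a per-element encoding check would be
       needed here. */
    assert(x->known_ascii);
    return strlen(x->elts[i].motif) * x->elts[i].reps;
}

/* substring-style method: writes at most `len` characters starting at
   0-based offset `start` into `out` (NUL-terminated, never more than
   `out_size` bytes). Returns the number of characters written. */
size_t lazy_substring(const lazy_strvec *x, size_t i, size_t start,
                      size_t len, char *out, size_t out_size)
{
    size_t total, mlen, k;

    if (out_size == 0)
        return 0;
    total = lazy_nchar(x, i);
    mlen = strlen(x->elts[i].motif);
    if (start >= total) {          /* also covers empty elements */
        out[0] = '\0';
        return 0;
    }
    if (len > total - start)       /* clip to the end of the element */
        len = total - start;
    if (len > out_size - 1)        /* clip to the caller's buffer */
        len = out_size - 1;
    for (k = 0; k < len; k++)
        out[k] = x->elts[i].motif[(start + k) % mlen];
    out[len] = '\0';
    return len;
}
```

With an element defined as "ACGT" repeated 1,000,000 times, `lazy_nchar` reports 4000000 and `lazy_substring` can return the first 100 characters into a 101-byte buffer, while the 4 MB string itself is never allocated. A real ALTREP method would presumably fall back to materializing the element when the class does not supply such a method.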
It is looking like I'll have some time shortly to get back to some R things I've been wanting to do, so I'll put a proposal for some string altmethods together and see what people (mostly Luke, tbh) think.

Best,
~G

On Thu, Dec 19, 2019 at 11:39 AM Jim Hester <james.f.hes...@gmail.com> wrote:

> A useful extension of ALTREP is having two new string methods which
> return the number of characters of a given string element and to
> return a substring of an element.
>
> Having these methods would allow retrieving these values without
> needing to create a CHARSXP for the full element data, which could
> potentially be costly for long elements.
>
> For example say you have an ALTREP altstring vector where each element
> holds the sequence of a single chromosome, it would be useful to query
> the lengths of each chromosome and retrieve the first 100 characters
> etc. without having to put the whole chromosome in memory. I realize
> there are tools in Bioconductor to handle this particular case, but it
> seems the general case would be perfect for ALTREP.
>
> Jim

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel