date:20191219

Re: [Rd] head/tail breaking change

2019-12-19 Thread Martin Maechler

> Jan Gorecki 
> on Thu, 19 Dec 2019 11:49:11 +0530 writes:

> Thank you Gabriel,
> I agree that new behaviour makes much more sense. Just wanted to confirm
> before resolving compatibility of my unit tests.
> Best,
> Jan

Indeed, Gabe's explanation is spot-on: with the
generalization of head() / tail(), we really found it undesirable to
stay "internally inconsistent".

We do have to grab the chance for not-quite-back-compatible
improvements -- when the costs look comparatively small -- for R 4.0.0.

Martin

> On Wed 18 Dec, 2019, 10:46 PM Gabriel Becker wrote:

>> Jan,
>> 
>> That is an intentional change as you can see in the documentation for
>> head/tail in R-devel. Last time I discussed it with Martin, this behavior
>> was desired and thus is unlikely to change unless "our" (ie his) mind does.
>> 
>> The hope is that the new behavior is actually what people would want (note
>> it already behaves this way for data.frames and for matrices, which are now
>> explicitly array objects with 2 dimensions as well as classed as matrices,
>> so it's more consistent now, and more reasonable for the object).
>> 
>> Best,
>> ~G
>> 
>> On Wed, Dec 18, 2019 at 2:44 AM Jan Gorecki  wrote:
>> 
>>> Hi R-devel community,
>>> 
>>> I am aware of changes in R-devel in head/tail methods but I was not
>>> expecting that to be a breaking change.
>>> 
>>> # R 3.6.1
>>> ar = array(1:27, c(3,3,3))
>>> tail(ar, 1)
>>> #[1] 27
>>> 
>>> The current output of R-devel is something that I would expect from a
>>> 
>>> tail(ar, c(1, Inf, Inf))
>>> 
>>> or
>>> 
>>> tail(ar, c(1, NA, NA))
>>> 
>>> calls.
>>> Is it going to stay like this, or are there plans to mitigate this
>>> breaking change?
>>> 
>>> # R-devel 2019-12-17 r77592
>>> ar = array(1:27, c(3,3,3))
>>> tail(ar, 1)
>>> #, , 1
>>> #
>>> #     [,1] [,2] [,3]
>>> #[3,]    3    6    9
>>> #
>>> #, , 2
>>> #
>>> #     [,1] [,2] [,3]
>>> #[3,]   12   15   18
>>> #
>>> #, , 3
>>> #
>>> #     [,1] [,2] [,3]
>>> #[3,]   21   24   27
>>> 
>>> Best,
>>> Jan Gorecki
>>> 
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> 
>> 


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
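
For unit tests that relied on the R 3.6.x result, one possible way to stay
independent of the R version is to say explicitly which of the two meanings
is intended. The following is a minimal sketch, not something proposed in
the thread: the variable names last_elem and last_slice are made up for
illustration, and only base R indexing is assumed.

```r
# Same array as in the thread.
ar <- array(1:27, c(3, 3, 3))

# Meaning 1: the last element in storage order (what R 3.6.1 returned).
# Dropping the dim attribute first makes tail() treat ar as a plain vector,
# so this does not depend on how tail() handles arrays.
last_elem <- tail(as.vector(ar), 1)

# Meaning 2: the last slice along the first dimension, with all other
# dimensions kept. Plain indexing gives the same values that R-devel's
# tail(ar, 1) prints above (dimnames may differ).
last_slice <- ar[dim(ar)[1], , , drop = FALSE]

print(last_elem)
print(dim(last_slice))
```

Writing the intent out this way keeps a test meaningful under both the old
and the new tail() behaviour.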

[Rd] ALTREP string methods for substr and nchar

2019-12-19 Thread Jim Hester

A useful extension of ALTREP would be two new string methods: one that
returns the number of characters of a given string element, and one that
returns a substring of an element.

Having these methods would allow retrieving these values without
needing to create a CHARSXP for the full element data, which could
potentially be costly for long elements.

For example, say you have an ALTREP altstring vector where each element
holds the sequence of a single chromosome. It would be useful to query
the length of each chromosome and retrieve the first 100 characters,
etc., without having to put the whole chromosome in memory. I realize
there are tools in Bioconductor to handle this particular case, but it
seems the general case would be perfect for ALTREP.

Jim

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
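
To make the use case concrete, here is a minimal R-level sketch of the two
queries in question. It uses an ordinary character vector as a stand-in
(the names chrom, chrA and chrB and the toy sequences are invented for
illustration); with a real ALTREP string class these same calls are what
the proposed methods, which do not exist in the ALTREP API discussed here,
would be meant to answer without materializing each full element.

```r
# Toy stand-in for a vector with one long sequence per chromosome.
chrom <- c(chrA = strrep("ACGT", 250),     # 1000 characters
           chrB = strrep("GATTACA", 100))  #  700 characters

# Query 1: the length of each element.
lens <- nchar(chrom)

# Query 2: the first 100 characters of each element.
heads <- substr(chrom, 1, 100)

print(lens)
print(nchar(heads))
```

Both calls only need a small amount of information per element, which is
why dedicated methods could avoid building the full string.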

Re: [Rd] ALTREP string methods for substr and nchar

2019-12-19 Thread Gabriel Becker

Hi Jim,

Thanks for posting this. Honestly, the methods list I initially proposed
years ago, and the one Luke eventually put in (which had some of what I had
said and a bunch of new stuff), were pretty heavily focused on numeric values
(exclusively so, in my case, I think).

I agree that there is a lot of space to beef up how AltStrings behave. I
agree that nchar and substring also make a lot of sense. Perhaps nzchar as
well.

There are some other things that might be good as well. In particular,
Michael Lawrence and I have talked about things like AltStrings that *know* all
their elements have the same encoding, in the same way that numeric,
integer, or now logical ALTREP vectors can know they don't have any NAs.
The suspicion is that this would make certain expensive (I think)
encoding checks, and possibly conversions, much cheaper. I'm far from an
expert on encodings and the costs/difficulties therein, but the concept
seems pretty straightforward and reasonable to me.

I hope to get back to the matching logic and get that hooked in (I did a
bunch of work on it some time ago but it ended up having problems at the
time, so it either never went in or it did go in but Luke had to pull it
back out, I don't recall which). When(/if) that does happen I'd suspect
that matching would be another one that we'd want AltStrings to have first
class support for.

Regexes in general are probably another big area, since I'd think it would
be nice to not need to wrap and unwrap the elements when the underlying
library doesn't want them wrapped as CHARSXPs anyway...

Another area that is more fraught, but my intuition suggests might be
really nice, is pasting. A paste(x,collapse="bla") method would be easily
achievable and potentially useful. paste(x,y, z) where x is an ALTREP with
a paste method could also be nice, potentially returning the same type of
AltString representation of the concatenation. If there were both paste
before and paste after then it would be possible to potentially support
arbitrary pastes, though things would get complicated (perhaps fatally so?)
if more than one argument was an AltString.

Overall, though, I agree. It is looking like I'll have some time shortly to
get back to some R things I've been wanting to do, so I'll put a proposal
for some string altmethods together and see what people (mostly Luke, tbh)
think.

Best,
~G

On Thu, Dec 19, 2019 at 11:39 AM Jim Hester wrote:

> A useful extension of ALTREP is having two new string methods which
> return the number of characters of a given string element and to
> return a substring of an element.
>
> Having these methods would allow retrieving these values without
> needing to create a CHARSXP for the full element data, which could
> potentially be costly for long elements.
>
> For example say you have an ALTREP altstring vector where each element
> holds the sequence of a single chromosome, it would be useful to query
> the lengths of each chromosome and retrieve the first 100 characters
> etc. without having to put the whole chromosome in memory. I realize
> there are tools in Bioconductor to handle this particular case, but it
> seems the general case would be perfect for ALTREP.
>
> Jim
