Re: [Rd] [External] Possible ALTREP bug

2021-06-16 Thread Toby Hocking
By the way, where is the documentation for INTEGER_ELT, REAL_ELT, etc? I
looked in Writing R Extensions and R Internals but I did not see any
mention.
REAL_ELT is briefly mentioned on
https://svn.r-project.org/R/branches/ALTREP/ALTREP.html
Would it be possible to please add some mention of them to Writing R
Extensions?
- how many of these _ELT functions are there? INTEGER, REAL, ... ?
- in what version of R were they introduced?
- I guess input types are always SEXP and int?
- What are the output types for each?

On Fri, May 28, 2021 at 5:16 PM  wrote:

> Since the INTEGER_ELT, REAL_ELT, etc, functions are fairly new it may
> be possible to check that places where they are used allow for them to
> allocate. I have fixed the one that got caught by Gabor's example, and
> a rchk run might be able to pick up others if rchk knows these could
> allocate. (I may also be forgetting other places where the _ELt
> methods are used.)  Fixing all call sites for REAL, INTEGER, etc, was
> never realistic so there GC has to be suspended during the method
> call, and that is done in the dispatch mechanism.
>
> The bigger problem is jumps from inside things that existing code
> assumes will not do that. Catching those jumps is possible but
> expensive; doing anything sensible if one is caught is really not
> possible.
>
> Best,
>
> luke
>
> On Fri, 28 May 2021, Gabriel Becker wrote:
>
> > Hi Jim et al,
> > Just to hopefully add a bit to what Luke already answered, from what I am
> > recalling looking back at that bioconductor thread Elt methods are used
> in
> > places where there are hard implicit assumptions that no garbage
> collection
> > will occur (ie they are called on things that aren't PROTECTed), and
> beyond
> > that, in places where there are hard assumptions that no error (longjmp)
> > will occur. I could be wrong, but I don't know that suspending garbage
> > collection would protect from the second one. Ie it is possible that an
> > error *ever* being raised from R code that implements an elt method could
> > cause all hell to break loose.
> >
> > Luke or Tomas Kalibera would know more.
> >
> > I was disappointed that implementing ALTREPs in R code was not in the
> cards
> > (it was in my original proposal back in 2016 to the DSC) but I trust Luke
> > that there are important reasons we can't safely allow that.
> >
> > Best,
> > ~G
> >
> > On Fri, May 28, 2021 at 8:31 AM Jim Hester 
> wrote:
> >   From reading the discussion on the Bioconductor issue tracker it
> >   seems like
> >   the reason the GC is not suspended for the non-string ALTREP Elt
> >   methods is
> >   primarily due to performance concerns.
> >
> >   If this is the case perhaps an additional flag could be added to
> >   the
> >   `R_set_altrep_*()` functions so ALTREP authors could indicate if
> >   GC should
> >   be halted when that particular method is called for that
> >   particular ALTREP
> >   class.
> >
> >   This would avoid the performance hit (other than a boolean
> >   check) for the
> >   standard case when no allocations are expected, but allow
> >   authors to
> >   indicate that R should pause GC if needed for methods in their
> >   class.
> >
> >   On Fri, May 28, 2021 at 9:42 AM  wrote:
> >
> >   > integer and real Elt methods are not expected to allocate. You
> >   would
> >   > have to suspend GC to be able to do that. This currently can't
> >   be done
> >   > from package code.
> >   >
> >   > Best,
> >   >
> >   > luke
> >   >
> >   > On Fri, 28 May 2021, Gábor Csárdi wrote:
> >   >
> >   > > I have found some weird SEXP corruption behavior with
> >   ALTREP, which
> >   > > could be a bug. (Or I could be doing something wrong.)
> >   > >
> >   > > I have an integer ALTREP vector that calls back to R from
> >   the Elt
> >   > > method. When this vector is indexed in a lapply(), its first
> >   element
> >   > > gets corrupted. Sometimes it's just a type change to
> >   logical, but
> >   > > sometimes the corruption causes a crash.
> >   > >
> >   > > I saw this on macOS from R 3.5.3 to 4.2.0. I created a small
> >   package
> >   > > that demonstrates this:
> >   https://github.com/gaborcsardi/redfish
> >   > >
> >   > > The R callback in this package calls
> >   `loadNamespace("Matrix")`, but
> >   > > the same crash happens for other packages as well, and
> >   sometimes it
> >   > > also happens if I don't load any packages at all. (But that
> >   example
> >   > > was much more complicated, so I went with the package
> >   loading.)
> >   > >
> >   > > It is somewhat random, and sometimes turning off the JIT
> >   avoids the
> >   > > crash, but not always.
> >   > >
> >   > > Hopefully I am just doing something wrong in the ALTREP code
> >   (see
> >   > >
> >  

Re: [Rd] [External] Possible ALTREP bug

2021-06-16 Thread Oliver Madsen
*_ELT accessor functions are described in "vector accessor functions" in
Writing R extensions.

https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Vector-accessor-functions

On Wed, Jun 16, 2021 at 4:22 PM Toby Hocking  wrote:

> By the way, where is the documentation for INTEGER_ELT, REAL_ELT, etc? I
> looked in Writing R Extensions and R Internals but I did not see any
> mention.
> REAL_ELT is briefly mentioned on
> https://svn.r-project.org/R/branches/ALTREP/ALTREP.html
> Would it be possible to please add some mention of them to Writing R
> Extensions?
> - how many of these _ELT functions are there? INTEGER, REAL, ... ?
> - in what version of R were they introduced?
> - I guess input types are always SEXP and int?
> - What are the output types for each?
>
> On Fri, May 28, 2021 at 5:16 PM  wrote:
>
> > Since the INTEGER_ELT, REAL_ELT, etc, functions are fairly new it may
> > be possible to check that places where they are used allow for them to
> > allocate. I have fixed the one that got caught by Gabor's example, and
> > a rchk run might be able to pick up others if rchk knows these could
> > allocate. (I may also be forgetting other places where the _ELt
> > methods are used.)  Fixing all call sites for REAL, INTEGER, etc, was
> > never realistic so there GC has to be suspended during the method
> > call, and that is done in the dispatch mechanism.
> >
> > The bigger problem is jumps from inside things that existing code
> > assumes will not do that. Catching those jumps is possible but
> > expensive; doing anything sensible if one is caught is really not
> > possible.
> >
> > Best,
> >
> > luke
> >
> > On Fri, 28 May 2021, Gabriel Becker wrote:
> >
> > > Hi Jim et al,
> > > Just to hopefully add a bit to what Luke already answered, from what I
> am
> > > recalling looking back at that bioconductor thread Elt methods are used
> > in
> > > places where there are hard implicit assumptions that no garbage
> > collection
> > > will occur (ie they are called on things that aren't PROTECTed), and
> > beyond
> > > that, in places where there are hard assumptions that no error
> (longjmp)
> > > will occur. I could be wrong, but I don't know that suspending garbage
> > > collection would protect from the second one. Ie it is possible that an
> > > error *ever* being raised from R code that implements an elt method
> could
> > > cause all hell to break loose.
> > >
> > > Luke or Tomas Kalibera would know more.
> > >
> > > I was disappointed that implementing ALTREPs in R code was not in the
> > cards
> > > (it was in my original proposal back in 2016 to the DSC) but I trust
> Luke
> > > that there are important reasons we can't safely allow that.
> > >
> > > Best,
> > > ~G
> > >
> > > On Fri, May 28, 2021 at 8:31 AM Jim Hester 
> > wrote:
> > >   From reading the discussion on the Bioconductor issue tracker it
> > >   seems like
> > >   the reason the GC is not suspended for the non-string ALTREP Elt
> > >   methods is
> > >   primarily due to performance concerns.
> > >
> > >   If this is the case perhaps an additional flag could be added to
> > >   the
> > >   `R_set_altrep_*()` functions so ALTREP authors could indicate if
> > >   GC should
> > >   be halted when that particular method is called for that
> > >   particular ALTREP
> > >   class.
> > >
> > >   This would avoid the performance hit (other than a boolean
> > >   check) for the
> > >   standard case when no allocations are expected, but allow
> > >   authors to
> > >   indicate that R should pause GC if needed for methods in their
> > >   class.
> > >
> > >   On Fri, May 28, 2021 at 9:42 AM  wrote:
> > >
> > >   > integer and real Elt methods are not expected to allocate. You
> > >   would
> > >   > have to suspend GC to be able to do that. This currently can't
> > >   be done
> > >   > from package code.
> > >   >
> > >   > Best,
> > >   >
> > >   > luke
> > >   >
> > >   > On Fri, 28 May 2021, Gábor Csárdi wrote:
> > >   >
> > >   > > I have found some weird SEXP corruption behavior with
> > >   ALTREP, which
> > >   > > could be a bug. (Or I could be doing something wrong.)
> > >   > >
> > >   > > I have an integer ALTREP vector that calls back to R from
> > >   the Elt
> > >   > > method. When this vector is indexed in a lapply(), its first
> > >   element
> > >   > > gets corrupted. Sometimes it's just a type change to
> > >   logical, but
> > >   > > sometimes the corruption causes a crash.
> > >   > >
> > >   > > I saw this on macOS from R 3.5.3 to 4.2.0. I created a small
> > >   package
> > >   > > that demonstrates this:
> > >   https://github.com/gaborcsardi/redfish
> > >   > >
> > >   > > The R callback in this package calls
> > >   `loadNamespace("Matrix")`, but
> > >   > > the same crash happens for other packages as well,

Re: [Rd] [External] Possible ALTREP bug

2021-06-16 Thread Simon Urbanek


The usual quote applies: "use the source, Luke":

$ grep _ELT *.h | sort
Rdefines.h:#define SET_ELEMENT(x, i, val)   SET_VECTOR_ELT(x, i, val)
Rinternals.h:   The function STRING_ELT is used as an argument to arrayAssign 
even
Rinternals.h:#define VECTOR_ELT(x,i)((SEXP *) DATAPTR(x))[i]
Rinternals.h://SEXP (STRING_ELT)(SEXP x, R_xlen_t i);
Rinternals.h:Rbyte (RAW_ELT)(SEXP x, R_xlen_t i);
Rinternals.h:Rbyte ALTRAW_ELT(SEXP x, R_xlen_t i);
Rinternals.h:Rcomplex (COMPLEX_ELT)(SEXP x, R_xlen_t i);
Rinternals.h:Rcomplex ALTCOMPLEX_ELT(SEXP x, R_xlen_t i);
Rinternals.h:SEXP (STRING_ELT)(SEXP x, R_xlen_t i);
Rinternals.h:SEXP (VECTOR_ELT)(SEXP x, R_xlen_t i);
Rinternals.h:SEXP ALTSTRING_ELT(SEXP, R_xlen_t);
Rinternals.h:SEXP SET_VECTOR_ELT(SEXP x, R_xlen_t i, SEXP v);
Rinternals.h:double (REAL_ELT)(SEXP x, R_xlen_t i);
Rinternals.h:double ALTREAL_ELT(SEXP x, R_xlen_t i);
Rinternals.h:int (INTEGER_ELT)(SEXP x, R_xlen_t i);
Rinternals.h:int (LOGICAL_ELT)(SEXP x, R_xlen_t i);
Rinternals.h:int ALTINTEGER_ELT(SEXP x, R_xlen_t i);
Rinternals.h:int ALTLOGICAL_ELT(SEXP x, R_xlen_t i);
Rinternals.h:void ALTCOMPLEX_SET_ELT(SEXP x, R_xlen_t i, Rcomplex v);
Rinternals.h:void ALTINTEGER_SET_ELT(SEXP x, R_xlen_t i, int v);
Rinternals.h:void ALTLOGICAL_SET_ELT(SEXP x, R_xlen_t i, int v);
Rinternals.h:void ALTRAW_SET_ELT(SEXP x, R_xlen_t i, Rbyte v);
Rinternals.h:void ALTREAL_SET_ELT(SEXP x, R_xlen_t i, double v);
Rinternals.h:void ALTSTRING_SET_ELT(SEXP, R_xlen_t, SEXP);
Rinternals.h:void SET_INTEGER_ELT(SEXP x, R_xlen_t i, int v);
Rinternals.h:void SET_LOGICAL_ELT(SEXP x, R_xlen_t i, int v);
Rinternals.h:void SET_REAL_ELT(SEXP x, R_xlen_t i, double v);
Rinternals.h:void SET_STRING_ELT(SEXP x, R_xlen_t i, SEXP v);

So the indexing is with R_xlen_t and they return the value itself as one would 
expect.

Cheers,
Simon


> On Jun 17, 2021, at 2:22 AM, Toby Hocking  wrote:
> 
> By the way, where is the documentation for INTEGER_ELT, REAL_ELT, etc? I
> looked in Writing R Extensions and R Internals but I did not see any
> mention.
> REAL_ELT is briefly mentioned on
> https://svn.r-project.org/R/branches/ALTREP/ALTREP.html
> Would it be possible to please add some mention of them to Writing R
> Extensions?
> - how many of these _ELT functions are there? INTEGER, REAL, ... ?
> - in what version of R were they introduced?
> - I guess input types are always SEXP and int?
> - What are the output types for each?
> 
> On Fri, May 28, 2021 at 5:16 PM  wrote:
> 
>> Since the INTEGER_ELT, REAL_ELT, etc, functions are fairly new it may
>> be possible to check that places where they are used allow for them to
>> allocate. I have fixed the one that got caught by Gabor's example, and
>> a rchk run might be able to pick up others if rchk knows these could
>> allocate. (I may also be forgetting other places where the _ELt
>> methods are used.)  Fixing all call sites for REAL, INTEGER, etc, was
>> never realistic so there GC has to be suspended during the method
>> call, and that is done in the dispatch mechanism.
>> 
>> The bigger problem is jumps from inside things that existing code
>> assumes will not do that. Catching those jumps is possible but
>> expensive; doing anything sensible if one is caught is really not
>> possible.
>> 
>> Best,
>> 
>> luke
>> 
>> On Fri, 28 May 2021, Gabriel Becker wrote:
>> 
>>> Hi Jim et al,
>>> Just to hopefully add a bit to what Luke already answered, from what I am
>>> recalling looking back at that bioconductor thread Elt methods are used
>> in
>>> places where there are hard implicit assumptions that no garbage
>> collection
>>> will occur (ie they are called on things that aren't PROTECTed), and
>> beyond
>>> that, in places where there are hard assumptions that no error (longjmp)
>>> will occur. I could be wrong, but I don't know that suspending garbage
>>> collection would protect from the second one. Ie it is possible that an
>>> error *ever* being raised from R code that implements an elt method could
>>> cause all hell to break loose.
>>> 
>>> Luke or Tomas Kalibera would know more.
>>> 
>>> I was disappointed that implementing ALTREPs in R code was not in the
>> cards
>>> (it was in my original proposal back in 2016 to the DSC) but I trust Luke
>>> that there are important reasons we can't safely allow that.
>>> 
>>> Best,
>>> ~G
>>> 
>>> On Fri, May 28, 2021 at 8:31 AM Jim Hester 
>> wrote:
>>>  From reading the discussion on the Bioconductor issue tracker it
>>>  seems like
>>>  the reason the GC is not suspended for the non-string ALTREP Elt
>>>  methods is
>>>  primarily due to performance concerns.
>>> 
>>>  If this is the case perhaps an additional flag could be added to
>>>  the
>>>  `R_set_altrep_*()` functions so ALTREP authors could indicate if
>>>  GC should
>>>  be halted when that particular method is called for that
>>>  particular ALTREP
>>>  class.
>>> 
>>>  This would avoid the performance hit (other than a boolean
>>>