[Rd] Objectsize function visiting every element for alt-rep strings

2019-01-16 Thread Travers Ching
I have a toy alt-rep string package that generates randomly seeded strings.

example:
library(altstringisode)
x <- altrandomStrings(1e8)
head(x)
[1] "2PN0bdwPY7CA8M06zVKEkhHgZVgtV1" "5PN2qmWqBlQ9wQj99nsQzldVI5ZuGX" ... etc
object.size(x)

object.size will call the set_altstring_Elt_method for every single
element, materializing (slowly) every element of the vector.  This is
a problem mostly in RStudio, since it calls object.size
automatically, defeating the purpose of alt-rep.

Is there a way to avoid the problem of forced materialization in RStudio?

PS: Is there a way to tell if a post has been received by the mailing
list?  How long does it take to show up in the archives?

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Objectsize function visiting every element for alt-rep strings

2019-01-19 Thread Travers Ching
Thanks for the detailed response, Gabriel!

I think that an object_size alt-rep method that package developers
need to implement might be hard to get right.  One alternative could
be an alt-rep method that returns the number of bytes/characters in a
given string element since I believe the object size of a CHARSXP
depends only on string length?  I think two optional alt-string
methods would be nice:

`alt_string_elt_nchars` -- for the `nchar` function in R
`alt_string_elt_nbytes` -- for `object.size` (which might be different
than nchars due to encoding)
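
As a rough illustration of why the two proposed methods could return
different values, base R already distinguishes characters from bytes in
`nchar`.  The names `alt_string_elt_nchars` and `alt_string_elt_nbytes`
above are only a proposal and are not part of the ALTREP API; the sketch
below uses ordinary (non-ALTREP) strings.

```r
s <- c("abc", "\u00e9t\u00e9")     # second element is "été", stored as UTF-8

nchar(s, type = "chars")           # 3 3: characters per element
nchar(s, type = "bytes")           # 3 5: each "é" takes two bytes in UTF-8

# object.size grows with the byte length of the strings it has to count
short <- "a"
long  <- strrep("a", 1000)
object.size(long) > object.size(short)   # TRUE
```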

Also, since it's an issue that mainly affects RStudio, I opened an
issue on their GitHub, and it sounds like they'll avoid calling
object.size on alt-rep objects automatically.  That would fix the main
problem I've been having.

Thanks,
Travers

On Fri, Jan 18, 2019 at 2:49 PM Gabriel Becker  wrote:
>
> Travers,
>
> Great to hear you're trying out the ALTREP stuff, good on you :).
>
> Did you mean the get_altstring_Elt_method? I see the code in size.c within 
> utils that grabs each element, but I don't see any setting (and the setters 
> are noops currently anyway they just do things the old way).
>
> One thing we have to decide is what object.size means for an altrep. I tend 
> to think it should mean the size of the alternative representation currently 
> in use in memory, but I see that a small note in ?object.size indicates that 
> size of objects with compact internal representations may be overestimated, 
> so technically this is "as currently documented". The "we" here, of course, 
> is the R-core team so we'll have to see how they feel on the matter.
>
> As for what to do about it, one possibility is to add an object.size method 
> to the ALTREP method table that gets called if object.size is called on an 
> ALTREP object.  In this case, it would be up  to the class to define an 
> appropriate object.size method. That would be relatively easy to do from a 
> technical standpoint on R's side, but what comes out of object.size would be 
> a bit "Wild West-y", without the consistency and correctness guarantees one 
> might expect from a function in utils.
>
> Another option is to have object.size recurse to calling object.size on
> the two parts (SEXPs which together make up a CONS cell, I believe) that make
> up an ALTREP internally. Roughly speaking, one of these is usually the
> alternative representation while the other is the spot to put an object with
> the traditional representation if the payload is ever fully materialized in
> an altrep-unsafe way - e.g., C code grabs a writable dataptr via INTEGER,
> REAL, DATAPTR, etc. Note there are exceptions to what I said above,
> though, such as the wrapper ALTREP classes which always have the parent object
> (typically a traditionally laid-out vector), because the "alternative
> representation" part is strictly a metadata annotation in that case and
> contains no representation of the payload data for those classes.
>
> In this second case the result of object.size would be consistent across all 
> ALTREP classes, but in both cases the result of object.size would no longer 
> give any information about the size of a vector payload. This is consistent 
> with how object.size deals with external pointers now, but could lead to some 
> surprise in the case of vectors which the end user may not even know are 
> ALTREPs.
>
> Thoughts from anyone else on this list?
>
> Anyway, thanks for pointing this out. I'll talk with Luke and see what makes 
> sense to do here.
>
> Best,
> ~G


Re: [Rd] Objectsize function visiting every element for alt-rep strings

2019-01-23 Thread Travers Ching
It should be possible to calculate object.size in the presence of
sharing, at least with respect to all sub-nodes of a SEXP.  E.g.,
during calculation, keep a hash of all SEXP pointers visited.  If a
pointer has already been visited, add only the size of the pointer to
the total object size.
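
A minimal sketch of that bookkeeping, on a made-up toy heap rather than
real SEXPs: the ids stand in for pointers, and the sizes, the 8-byte
pointer size and the function name `shared_size` are all assumptions
chosen for illustration.

```r
# Toy heap: each node has a size in bytes and the ids of its children.
heap <- list(
  a = list(size = 56,  children = c("s", "s")),   # refers to the same node twice
  s = list(size = 120, children = character(0))
)
pointer_size <- 8                                  # assumed pointer width in bytes

shared_size <- function(id, heap, visited = new.env(hash = TRUE)) {
  if (exists(id, envir = visited, inherits = FALSE)) {
    return(pointer_size)           # already counted: charge only the pointer
  }
  assign(id, TRUE, envir = visited)               # remember this "pointer"
  node  <- heap[[id]]
  total <- node$size
  for (child in node$children) {
    total <- total + shared_size(child, heap, visited)
  }
  total
}

shared_size("a", heap)             # 56 + 120 + 8 = 184
```

The environment serves as the hash of visited pointers; the second
reference to `s` adds only the pointer size.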

Travers

On Wed, Jan 23, 2019 at 1:33 AM Tomas Kalibera  wrote:
>
> On 1/22/19 6:17 PM, Kevin Ushey wrote:
> > I think that object.size() is most commonly used to answer the question,
> > "what R objects are consuming the most memory currently in my R session?"
> > and for that reason I think returning the size of the internal
> > representations of objects (for e.g. ALTREP objects; unevaluated promises)
> > is the right default behavior.
>
> I don't think one could answer that question at all in the presence of
> sharing (of objects with value semantics due to copy on write, string
> cache or other caches, sharing of objects with referential semantics
> such as environments, etc). Also the mapping from R objects (SEXPs) to
> what users might understand as objects would not be clear (which SEXPs
> belong to which "object", which SEXPs are too low-level for the user to
> be considered, etc). In principle, there could be a memory profiler
> working at SEXP level and exposing all the intricacies of the memory
> layout, answering reachability questions on a heap dump (so one could
> find out about a 1G integer vector and then list all bindings say in
> namespace environments from which it is reachable), but of course that
> would be a lot of work to implement and to maintain. The problem is not
> unique to R (e.g. see Java with the same problems of sharing that
> prevent meaningful definition for object size). I am not persuaded it
> makes sense to add more options to a function that does not have and
> cannot have a well defined user-level semantics, and I would discourage
> writing code that is trying to build on that function as I think that it
> might lead to confusion and frustration. I think equality for example is
> easier to define (just that one could come up with multiple meaningful
> definitions, so it makes sense to have multiple options).
>
> Best
> Tomas
> >
> > I also agree it would be worth considering adding arguments that control
> > how object.size() is computed for different kinds of R objects, since users
> > might want to use object.size() to answer different types of questions.
> >
> > All that said, if the ultimate goal here is to avoid having RStudio
> > materialize ALTREP objects in the background, then perhaps that change
> > should happen in RStudio :-)
> >
> > Best,
> > Kevin
> >
> > On Tue, Jan 22, 2019 at 8:21 AM Tierney, Luke 
> > wrote:
> >
> >> On Mon, 21 Jan 2019, Martin Maechler wrote:
> >>
> >>>>>>>> Travers Ching
> >>>>>>>>  on Tue, 15 Jan 2019 12:50:45 -0800 writes:
> >>> > I have a toy alt-rep string package that generates
> >>> > randomly seeded strings.  example:
> >>> > library(altstringisode)
> >>> > x <- altrandomStrings(1e8)
> >>> > head(x)
> >>> > [1] "2PN0bdwPY7CA8M06zVKEkhHgZVgtV1" "5PN2qmWqBlQ9wQj99nsQzldVI5ZuGX" ... etc
> >>> > object.size(x)
> >>>
> >>> > Object.size will call the set_altstring_Elt_method for
> >>> > every single element, materializing (slowly) every element
> >>> > of the vector.  This is a problem mostly in R-studio since
> >>> > object.size is called automatically, defeating the purpose
> >>> > of alt-rep.
> >> There is no sensible way in general to figure out how large the
> >> strings would be without computing them. There might be specifically
> >> for a deferred sequence conversion but it would require a fair bit of
> >> effort to figure out that would be better spent elsewhere.
> >>
> >> I've never been a big fan of object.size since what it is trying to
> >> compute isn't very well defined in the context of sharing and possible
> >> internal state changes (even before ALTREP, byte code compilation could
> >> change the internals of a function [which object.size sees] and
> >> assigning into environments or evaluating promises can change
> >> environments [which object.size ignores]). The issue is not unlike the
> >> one faced by identical(), which has a bunch of options for the
> >> different ways objects can be identical, and might need even more.
> >>
> >> We could in general have object.size for a

[Rd] Object.size() should not visit every element for alt-rep strings, or there should be an altstring_objectsize_method

2019-01-31 Thread Travers Ching


Below is a toy alt-rep string example, that generates N random strings:

https://gist.github.com/traversc/a48a504eb062554f2d6ff8043ca16f9c

example:
`x <- altrandomStrings(1e8)`
`head(x)`
[1] "2PN0bdwPY7CA8M06zVKEkhHgZVgtV1" "5PN2qmWqBlQ9wQj99nsQzldVI5ZuGX" ...
`object.size(x)`

object.size will call the `set_altstring_Elt_method` for every single
element, materializing (slowly) every element of the vector.  This is
a problem mostly in RStudio, since it calls object.size
automatically, defeating the purpose of alt-rep entirely.


Re: [Rd] Object.size() should not visit every element for alt-rep strings, or there should be an altstring_objectsize_method

2019-01-31 Thread Travers Ching
Hi Luke,

Thanks for the response.  For some reason, this is a duplicate of a
post I sent WEEKS ago that is only showing up now.  I initially
thought it had been filtered out as spam because of the GitHub link,
so I re-wrote the email (several times, in fact), and you can see the
other thread.  Very weird.

Also, the good people at RStudio seem to have fixed the issue!

Thanks
Travers

On Thu, Jan 31, 2019 at 5:35 AM Tierney, Luke  wrote:
>
> You should really take this up with RStudio. Calling object.size on
> every top level assignment as they appear to do is a bad idea, even
> without ALTREP. object.size is only a cheap operation for simple
> atomic vectors. For anything with recursive structure it needs to walk
> the object, so the effort is proportional to object size:
>
> > x <- rep("A", 1e8)
> > system.time(object.size(x))
>    user  system elapsed
>   1.222   0.624   1.850
> > x <- rep(list(1), 1e8)
> > system.time(object.size(x))
>    user  system elapsed
>   1.247   0.022   1.273
>
> The current help for object.size says
>
>   Provides an estimate of the memory that is being used to store an
>   R object.
>
> If this is interpreted as the current memory use, which could change
> in the ALTREP context (or for environments, though there the changes
> are ignored), then we could define object.size for ALTREP objects to
> avoid any ALTREP-specific computation. I'm not convinced yet that this
> is a good idea, but even if we do change this at the R level,
> RStudio would still be well-advised to have another look at what they
> are doing.
>
> Best,
>
> luke
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                  Phone:  319-335-3386
> Department of Statistics and        Fax:    319-335-3017
>    Actuarial Science
> 241 Schaeffer Hall                  email:  luke-tier...@uiowa.edu
> Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu
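
Luke's comparison above can be tried at a smaller scale.  This is only
a sketch: the vector length of 1e6 (instead of 1e8) and the variable
names are choices made here, and the timings will differ by machine and
R version, so none are shown.

```r
n   <- 1e6                 # much smaller than the 1e8 used in the thread
num <- numeric(n)          # simple atomic vector
chr <- rep("A", n)         # character vector: every element is visited
lst <- rep(list(1), n)     # list: every element is visited

t_num <- system.time(s_num <- object.size(num))
t_chr <- system.time(s_chr <- object.size(chr))
t_lst <- system.time(s_lst <- object.size(lst))

# elapsed seconds for each call
print(c(num = t_num[["elapsed"]],
        chr = t_chr[["elapsed"]],
        lst = t_lst[["elapsed"]]))

# reported sizes
print(s_num); print(s_chr); print(s_lst)
```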


Re: [Rd] Intermittent crashes with inset `[<-` command

2019-02-27 Thread Travers Ching
On an Azure CentOS VM, I can reproduce this bug, which reports either:

 *** caught segfault ***
address 0x7006a, cause 'memory not mapped' (crash)

Or

incompatible types (from builtin to integer) in subassignment type fix
(no crash)

Like Gabriel, I could not reproduce the bug on a Mac laptop.  Both
machines run R 3.5.1.

Travers

On Wed, Feb 27, 2019 at 9:08 AM William Dunlap via R-devel
 wrote:
>
> Valgrind (without gctorture) reports memory misuse:
>
> % R --debugger=valgrind --debugger-args="--leak-check=full --num-callers=18"
> ...
> > x <- 1:20
> > y <- rep(letters[1:5], length(x) / 5L)
> > for (i in 1:1000) {
> +   # x[y == 'a'] <- x[y == 'b']
> +   x <- `[<-`(x, y == 'a', x[y == 'b'])
> +   cat(i, '')
> + }
> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
> 29 30 31 32 33 34 35 36 37 ==4711== Invalid read of size 1
> ==4711==    at 0x501A40F: Rf_xlength (Rinlinedfuns.h:542)
> ==4711==    by 0x501A40F: VectorAssign (subassign.c:658)
> ==4711==    by 0x501CDFE: do_subassign_dflt (subassign.c:1641)
> ==4711==    by 0x5020100: do_subassign (subassign.c:1571)
> ==4711==    by 0x4F66398: bcEval (eval.c:6795)
> ==4711==    by 0x4F7D86D: R_compileAndExecute (eval.c:1407)
> ==4711==    by 0x4F7DA70: do_for (eval.c:2185)
> ==4711==    by 0x4F7741C: Rf_eval (eval.c:691)
> ==4711==    by 0x4FA7181: Rf_ReplIteration (main.c:258)
> ==4711==    by 0x4FA7570: R_ReplConsole (main.c:308)
> ==4711==    by 0x4FA760E: run_Rmainloop (main.c:1082)
> ==4711==    by 0x40075A: main (Rmain.c:29)
> ==4711==  Address 0x19b3ab90 is 0 bytes inside a block of size 160,048 free'd
> ==4711==    at 0x4C2ACBD: free (vg_replace_malloc.c:530)
> ==4711==    by 0x4FAFCB2: ReleaseLargeFreeVectors (memory.c:1055)
> ==4711==    by 0x4FAFCB2: RunGenCollect (memory.c:1825)
> ==4711==    by 0x4FAFCB2: R_gc_internal (memory.c:2998)
> ==4711==    by 0x4FB166F: Rf_allocVector3 (memory.c:2682)
> ==4711==    by 0x4FB2310: Rf_allocVector (Rinlinedfuns.h:577)
> ==4711==    by 0x4FB2310: R_alloc (memory.c:2197)
> ==4711==    by 0x5023F7A: logicalSubscript (subscript.c:575)
> ==4711==    by 0x5026DA3: Rf_makeSubscript (subscript.c:994)
> ==4711==    by 0x501A2F3: VectorAssign (subassign.c:656)
> ==4711==    by 0x501CDFE: do_subassign_dflt (subassign.c:1641)
> ==4711==    by 0x5020100: do_subassign (subassign.c:1571)
> ==4711==    by 0x4F66398: bcEval (eval.c:6795)
> ==4711==    by 0x4F7D86D: R_compileAndExecute (eval.c:1407)
> ==4711==    by 0x4F7DA70: do_for (eval.c:2185)
> ==4711==    by 0x4F7741C: Rf_eval (eval.c:691)
> ==4711==    by 0x4FA7181: Rf_ReplIteration (main.c:258)
> ==4711==    by 0x4FA7570: R_ReplConsole (main.c:308)
> ==4711==    by 0x4FA760E: run_Rmainloop (main.c:1082)
> ==4711==    by 0x40075A: main (Rmain.c:29)
> ==4711==  Block was alloc'd at
> ==4711==    at 0x4C29BC3: malloc (vg_replace_malloc.c:299)
> ==4711==    by 0x4FB1B04: Rf_allocVector3 (memory.c:2712)
> ==4711==    by 0x5027574: Rf_allocVector (Rinlinedfuns.h:577)
> ==4711==    by 0x5027574: Rf_ExtractSubset (subset.c:115)
> ==4711==    by 0x502ADCD: VectorSubset (subset.c:198)
> ==4711==    by 0x502ADCD: do_subset_dflt (subset.c:823)
> ==4711==    by 0x502BE90: do_subset (subset.c:661)
> ==4711==    by 0x4F7741C: Rf_eval (eval.c:691)
> ==4711==    by 0x4F7BAC3: Rf_evalListKeepMissing (eval.c:2955)
> ==4711==    by 0x50200CB: R_DispatchOrEvalSP (subassign.c:1535)
> ==4711==    by 0x50200CB: do_subassign (subassign.c:1567)
> ==4711==    by 0x4F66398: bcEval (eval.c:6795)
> ==4711==    by 0x4F7D86D: R_compileAndExecute (eval.c:1407)
> ==4711==    by 0x4F7DA70: do_for (eval.c:2185)
> ==4711==    by 0x4F7741C: Rf_eval (eval.c:691)
> ==4711==    by 0x4FA7181: Rf_ReplIteration (main.c:258)
> ==4711==    by 0x4FA7570: R_ReplConsole (main.c:308)
> ==4711==    by 0x4FA760E: run_Rmainloop (main.c:1082)
> ==4711==    by 0x40075A: main (Rmain.c:29)
> ==4711==
> ==4711== Invalid read of size 8
> ==4711==    at 0x501A856: XLENGTH_EX (Rinlinedfuns.h:189)
> ==4711==    by 0x501A856: Rf_xlength (Rinlinedfuns.h:554)
> ==4711==    by 0x501A856: VectorAssign (subassign.c:658)
> ==4711==    by 0x501CDFE: do_subassign_dflt (subassign.c:1641)
> ==4711==    by 0x5020100: do_subassign (subassign.c:1571)
> ==4711==    by 0x4F66398: bcEval (eval.c:6795)
> ==4711==    by 0x4F7D86D: R_compileAndExecute (eval.c:1407)
> ==4711==    by 0x4F7DA70: do_for (eval.c:2185)
> ==4711==    by 0x4F7741C: Rf_eval (eval.c:691)
> ==4711==    by 0x4FA7181: Rf_ReplIteration (main.c:258)
> ==4711==    by 0x4FA7570: R_ReplConsole (main.c:308)
> ==4711==    by 0x4FA760E: run_Rmainloop (main.c:1082)
> ==4711==    by 0x40075A: main (Rmain.c:29)
> ==4711==  Address 0x19b3abb0 is 32 bytes inside a block of size 160,048 free'd
> ==4711==    at 0x4C2ACBD: free (vg_replace_malloc.c:530)
> ==4711==    by 0x4FAFCB2: ReleaseLargeFreeVectors (memory.c:1055)
> ==4711==    by 0x4FAFCB2: RunGenCollect (memory.c:1825)
> ==4711==    by 0x4FAFCB2: R_gc_internal (memory.c:2998)
> ==4711==    by 0x4FB16

Re: [Rd] Intermittent crashes with inset `[<-` command

2019-02-27 Thread Travers Ching
Some testing:

Adding `gc()` inside the for loop prevented a crash for 10,000+
iterations, whereas adding `Sys.sleep(.2)` (which takes longer) did
not.  I couldn't wrap my head around the `VectorAssign` source code,
but I suspect it is a matter of an intermediate object not being
protected and being gc'ed.  Hope that helps someone.
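
For reference, a sketch of the loop with the `gc()` variant, based on
the reproduction posted earlier in the thread.  Where exactly the call
was placed in the loop body is an assumption, and the iteration count is
cut to 100 here to keep the run short.

```r
x <- 1:20
y <- rep(letters[1:5], length(x) / 5L)
for (i in 1:100) {             # far fewer iterations than the 10,000+ in the post
  gc()                         # force a full collection before each sub-assignment
  x <- `[<-`(x, y == 'a', x[y == 'b'])
}
x[y == 'a']                    # the 'a' positions now hold the 'b' values
```

On an R build without the bug, the loop simply finishes and the values
at the `'a'` positions equal those at the `'b'` positions.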

Travers



Re: [Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()

2019-04-12 Thread Travers Ching
Just throwing my two cents in:

I think removing/deprecating fork would be a bad idea for two reasons:

1) There are no performant alternatives
2) Removing fork would break existing workflows

Even if replaced with something using the same interface (e.g., a
function that automatically detects variables to export as in the
amazing `future` package), the lack of copy-on-write functionality
would cause scripts everywhere to break.

A simple example illustrating these two points:
`x <- 5e8; mclapply(1:24, sum, x, 8)`

Using fork, `mclapply` takes 5 seconds.  Using "psock", `clusterApply`
does not complete.
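
One reading of the one-liner above, written out as a sketch.  It
assumes that `x` was meant to be a large vector and that the trailing 8
was meant as the number of cores; the data is scaled down to 1e6
elements and 2 workers so that both variants finish quickly.

```r
library(parallel)

n <- 1e6                      # scaled down; 5e8 doubles would need about 4 GB
x <- rep(1, n)                # assumption: `x` was meant to be a large vector
workers <- 2L                 # assumption: the trailing 8 was meant as the core count
fork_ok <- .Platform$OS.type != "windows"   # mclapply cannot fork on Windows

# Fork: children read the parent's `x` via copy-on-write, nothing is serialized.
res_fork <- mclapply(1:4, function(i) sum(x) + i,
                     mc.cores = if (fork_ok) workers else 1L)

# PSOCK: fresh R processes, so `x` must be serialized and shipped to each worker.
cl <- makeCluster(workers)    # the default cluster type is "PSOCK"
clusterExport(cl, "x", envir = environment())
res_psock <- parLapply(cl, 1:4, function(i) sum(x) + i)
stopCluster(cl)

identical(res_fork, res_psock)   # both routes compute the same values
```

The results agree; the difference is in how `x` reaches the workers,
which is where the time and memory cost of the PSOCK route comes from
as `x` grows.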

Travers

On Fri, Apr 12, 2019 at 2:32 AM Iñaki Ucar  wrote:
>
> On Thu, 11 Apr 2019 at 22:07, Henrik Bengtsson
>  wrote:
> >
> > ISSUE:
> > Using *forks* for parallel processing in R is not always safe.
> > [...]
> > Comments?
>
> Using fork() is never safe. The reference provided by Kevin [1] is
> pretty compelling (I kindly encourage anyone who ever forked a process
> to read it). Therefore, I'd go beyond Henrik's suggestion, and I'd
> advocate for deprecating fork clusters and eventually removing them
> from parallel.
>
> [1] 
> https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf
>
> --
> Iñaki Úcar


Re: [Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()

2019-04-12 Thread Travers Ching
Hi Iñaki,

> "Performant"... in terms of what. If the cost of copying the data
> predominates over the computation time, maybe you didn't need
> parallelization in the first place.

Performant in terms of speed.  There's no copying in that example
using `mclapply` and so it is significantly faster than other
alternatives.

It is a very simple and contrived example, but there are lots of
applications that depend on processing large data and benefit from
parallel processing.  One example is reading in large sequencing data
with `Rsamtools` and checking the sequences for a set of motifs.

> I don't see why mclapply could not be rewritten using PSOCK clusters.

Because it would be much slower.

> To implement copy-on-write, Linux overcommits virtual memory, and this
> is what causes scripts to break unexpectedly: everything works fine,
> until you change a small unimportant bit and... boom, out of memory.
> And in general, running forks in any GUI would cause things everywhere
> to break.

> I'm not sure how you set that up, but it does complete. Or do you
> mean that you ran out of memory? Then try replacing "x" with, e.g.,
> "x+1" in your mclapply example and see what happens (hint: save your
> work first).

Yes, I meant that it ran out of memory on my desktop.  I understand
the limits, and it is not perfect because of the GUI issue you
mention, but I don't see a better alternative in terms of speed.

Regards,
Travers




On Fri, Apr 12, 2019 at 3:45 PM Iñaki Ucar  wrote:
>
> On Fri, 12 Apr 2019 at 21:32, Travers Ching  wrote:
> >
> > Just throwing my two cents in:
> >
> > I think removing/deprecating fork would be a bad idea for two reasons:
> >
> > 1) There are no performant alternatives
>
> "Performant"... in terms of what. If the cost of copying the data
> predominates over the computation time, maybe you didn't need
> parallelization in the first place.
>
> > 2) Removing fork would break existing workflows
>
> I don't see why mclapply could not be rewritten using PSOCK clusters.
> And as a side effect, this would enable those workflows on Windows,
> which doesn't support fork.
>
> > Even if replaced with something using the same interface (e.g., a
> > function that automatically detects variables to export as in the
> > amazing `future` package), the lack of copy-on-write functionality
> > would cause scripts everywhere to break.
>
> To implement copy-on-write, Linux overcommits virtual memory, and this
> is what causes scripts to break unexpectedly: everything works fine,
> until you change a small unimportant bit and... boom, out of memory.
> And in general, running forks in any GUI would cause things everywhere
> to break.
>
> > A simple example illustrating these two points:
> > `x <- 5e8; mclapply(1:24, sum, x, 8)`
> >
> > Using fork, `mclapply` takes 5 seconds.  Using "psock", `clusterApply`
> > does not complete.
>
> I'm not sure how you set that up, but it does complete. Or do you
> mean that you ran out of memory? Then try replacing "x" with, e.g.,
> "x+1" in your mclapply example and see what happens (hint: save your
> work first).
>
> --
> Iñaki Úcar


Re: [Rd] [R] Open a file which name contains a tilde

2019-06-11 Thread Travers Ching
Hi Gabriel,

It may be bad practice, but you don't always have control over the file
name.

E.g. if someone shares a file with a tilde in it -- yes it is simple to
rename but it is extra time, and you might not bother to rename a file
without foreknowledge of this bug in the first place.

Even worse, if someone points you to a read only location on a shared
server, you won't even be able to rename the file, and copying might be
prohibitive if it's a large file.

There are also tilde files created automatically by other programs, notably
Microsoft Office.
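
A quick way to see whether a given R build rewrites such names is to
pass a name with an interior tilde to `path.expand`.  This is only a
sketch: the file name is made up, and the result depends on the platform
and on whether R was built with readline, so no output is shown.

```r
tricky   <- "a ~ b"                   # a tilde that is not at the start of the path
expanded <- path.expand(tricky)

if (identical(expanded, tricky)) {
  message("path.expand left the name alone on this build")
} else {
  message("path.expand rewrote the name to: ", expanded)
}
```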

Travers




On Tue, Jun 11, 2019 at 9:49 AM Gabriel Becker 
wrote:

> Hi Frank,
>
> I'm hesitant to be "that guy", but in case no one else has brought this up
> to you, having files with a tilde in their names (generally but especially
> on a linux system, where ~ in file names has a very important special
> meaning in some cases, as we know) strikes me as an exceptionally bad
> practice anyway. In light of that, the solution with the smallest amount of
> pain for you is almost surely to just... not do that. Your filenames will
> be better for it anyway.
>
> There is a reason no one has complained about this before, and while I
> haven't run a study or anything, I strongly suspect its that "everyone"
> else is already on the "no tildes in filenames" bandwagon, so this
> behavior, even if technically a bug, has no ability to cause them problems.
>
> Best,
> ~G
>
> On Tue, Jun 11, 2019 at 8:25 AM Frank Schwidom  wrote:
>
> > Hi,
> >
> > yes, I have seen this package and it has the same tilde expanding
> > problem.
> >
> > Please excuse me I will cc this answer to r-help and r-devel to keep the
> > discussion running.
> >
> > Kind regards,
> > Frank Schwidom
> >
> > On 2019-06-11 09:12:36, Gábor Csárdi wrote:
> > > Just in case, have you seen the fs package?
> > > https://fs.r-lib.org/
> > >
> > > Gabor
> > >
> > > On Tue, Jun 11, 2019 at 7:51 AM Frank Schwidom wrote:
> > > >
> > > > Hi,
> > > >
> > > > to get rid of any possible filename modification I started a little
> > > > project to cover my usecase:
> > > >
> > > > https://github.com/schwidom/simplefs
> > > >
> > > > This is my first R package, suggestions and a review are welcome.
> > > >
> > > > Thanks in advance
> > > > Frank Schwidom
> > > >
> > > > On 2019-06-07 09:04:06, Richard O'Keefe wrote:
> > > > >How can expanding tildes anywhere but the beginning of a file
> > name NOT be
> > > > >considered a bug?
> > > > >On Thu, 6 Jun 2019 at 23:04, Ivan Krylov <[1]
> > krylov.r...@gmail.com> wrote:
> > > > >
> > > > >  On Wed, 5 Jun 2019 18:07:15 +0200
> > > > >  Frank Schwidom <[2]schwi...@gmx.net> wrote:
> > > > >
> > > > >  > +> path.expand("a ~ b")
> > > > >  > [1] "a /home/user b"
> > > > >
> > > > >  > How can I switch off any file crippling activity?
> > > > >
> > > > >  It doesn't seem to be possible if readline is enabled and
> works
> > > > >  correctly.
> > > > >
> > > > >  Calls to path.expand [1] end up [2] in R_ExpandFileName [3],
> > which
> > > > >  calls R_ExpandFileName_readline [4], which uses libreadline
> > function
> > > > >  tilde_expand [5]. tilde_expand seems to be designed to expand
> > '~'
> > > > >  anywhere in the string it is handed, i.e. operate on whole
> > command
> > > > >  lines, not file paths.
> > > > >
> > > > >  I am taking the liberty of Cc-ing R-devel in case this can be
> > > > >  considered a bug.
> > > > >
> > > > >  --
> > > > >  Best regards,
> > > > >  Ivan
> > > > >
> > > > >  [1]
> > > > >  [3]
> >
> https://github.com/wch/r-source/blob/12d1d2d232d84aa355e48b81180a0e2c6f2f/src/main/names.c#L807
> > > > >
> > > > >  [2]
> > > > >  [4]
> >
> https://github.com/wch/r-source/blob/12d1d2d232d84aa355e48b81180a0e2c6f2f/src/main/platform.c#L1915
> > > > >
> > > > >  [3]
> > > > >  [5]
> >
> https://github.com/wch/r-source/blob/12d1d2d232d84aa355e48b81180a0e2c6f2f/src/unix/sys-unix.c#L147
> > > > >
> > > > >  [4]
> > > > >  [6]
> >
> https://github.com/wch/r-source/blob/12d1d2d232d84aa355e48b81180a0e2c6f2f/src/unix/sys-std.c#L494
> > > > >
> > > > >  [5]
> > > > >  [7]
> > https://git.savannah.gnu.org/cgit/readline.git/tree/tilde.c?h=devel#n187
> > > > >
> > > > >  __
> > > > >  [8]r-h...@r-project.org mailing list -- To UNSUBSCRIBE and
> > more, see
> > > > >  [9]https://stat.ethz.ch/mailman/listinfo/r-help
> > > > >  PLEASE do read the posting guide
> > > > >  [10]http://www.R-project.org/posting-guide.html
> > > > >  and provide commented, minimal, self-contained, reproducible
> > code.
> > > > >

[Rd] Use of restricted c++ keywords as variable names in headers

2019-07-19 Thread Travers Ching
I was trying to use one of the headers in R_ext/ from C++, but ran into
trouble.  I determined that it was due to the header using reserved C++
keywords as variable names.  To include the header, I needed to do this:

#define class klass
#define private krivate
#include 
#undef class
#undef private

I know that the altrep.h header previously had the same issue, but was
fixed.  Could this be changed as well?



[Rd] S4SXP type vs S4 object bit?

2019-10-22 Thread Travers Ching
I'm trying to understand the R internals a bit better and reading over the
documentation.

I see that there is a bit related to whether an object is S4
(S4_OBJECT_MASK), and also the object type S4SXP (25).  The documentation
makes clear that these two things aren't the same.

But in practice, will the S4-bit and object type ever disagree for S4
objects?  I know that one can set the bit manually in C; are there any
practical applications for doing so?

Thank you
Travers



Re: [Rd] S4SXP type vs S4 object bit?

2019-10-22 Thread Travers Ching
Thank you, Jiefei and Michael!

Travers

On Tue, Oct 22, 2019 at 8:14 AM Wang Jiefei  wrote:

> Hi Travers,
>
> Just an additional remark on Michael's answer: if your S4 class inherits
> from one of R's basic types, say integer, the resulting object will be an
> INTSXP. If your S4 class does not inherit from any such type, it will be an
> S4SXP. You can think about this question from an object-oriented point of
> view: if a class inherits from the integer class, what should R do to make
> all the integer-related functions compatible with the new class at the C level?
>
> Best,
> Jiefei
>
> On Tue, Oct 22, 2019 at 4:28 AM Travers Ching  wrote:
>
>> I'm trying to understand the R internals a bit better and reading over the
>> documentation.
>>
>> I see that there is a bit related to whether an object is S4
>> (S4_OBJECT_MASK), and also the object type S4SXP (25).  The documentation
>> makes clear that these two things aren't the same.
>>
>> But in practice, will the S4-bit and object type ever disagree for S4
>> objects?  I know that one can set the bit manually in C; are there any
>> practical applications for doing so?
>>
>> Thank you
>> Travers
>>
>>
>
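
Jiefei's point can be checked from R itself. The following is a minimal
sketch; the class names IntLike and Plain are made up for this illustration.
Both objects carry the S4 bit, but only the one without a basic-type parent
has type S4SXP:

```r
library(methods)

# An S4 class that extends a basic type, and one that only has slots.
setClass("IntLike", contains = "integer")
setClass("Plain", representation(x = "numeric"))

a <- new("IntLike", 1:3)
b <- new("Plain", x = 1)

typeof(a)  # "integer": the underlying SEXP is an INTSXP
isS4(a)    # TRUE: the S4 bit is set anyway
typeof(b)  # "S4": no basic type to build on, so an S4SXP
isS4(b)    # TRUE
```

So the type and the bit answer different questions: typeof() reports how the
data is stored, isS4() reports whether S4 dispatch applies.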



[Rd] Apple M1 CRAN checks

2021-02-22 Thread Travers Ching
I noticed CRAN is now doing checks against Apple M1, and some packages are
failing including a dependency I use.

Is building on M1 now a requirement, or can the check be ignored? If it's a
requirement, how can one test it out?

Travers



Re: [Rd] Apple M1 CRAN checks

2021-02-22 Thread Travers Ching
Hi Prof Ripley,

Here is the automated message from CRAN which I thought meant needing to
fix an M1 issue:

"The auto-check found additional issues for the *last* version released on
CRAN:
  M1mac <https://www.stats.ox.ac.uk/pub/bdr/M1mac/stringfish.out>
CRAN incoming checks do not test for these additional issues and you will
need an appropriately instrumented build of R to reproduce these.
Hence please reply-all and explain: Have these been fixed? "

However, RcppParallel (a dependency) isn't building on M1:
https://www.stats.ox.ac.uk/pub/bdr/M1mac/RcppParallel.out

If I understand you correctly, I can ignore the M1 "Additional issues"
until official R support?

Thank you,
Travers

On Mon, Feb 22, 2021 at 11:25 PM Prof Brian Ripley 
wrote:

> On 22/02/2021 08:30, Travers Ching wrote:
> > I noticed CRAN is now doing checks against Apple M1, and some packages are
> > failing including a dependency I use.
>
> I don't know what this refers to: M1 Mac CRAN checks are planned but
> AFAICS not yet included in the main results tables.
>
> OTOH, 'Additional issues' on M1 Mac have been reported on the results
> pages since early December.
>
> > Is building on M1 now a requirement, or can the check be ignored? If it's a
> > requirement, how can one test it out?
>
> 'requirement' for what?
>
> I am not aware of any CRAN package for which 'R CMD build' does not work
> on an M1 Mac.
>
> *Checking* might need an M1 Mac machine.  CRAN has only been notifying
> issues which can easily be corrected without access to M1 hardware (such
> as using suggested packages unconditionally or using optional
> capabilities without checking).
>
> > Travers
> >
> >   [[alternative HTML version deleted]]
>
> Please do re-read the posting guide (and 'Writing R Extensions').
> Also, this is not r-package-devel 
>
> --
> Brian D. Ripley,  rip...@stats.ox.ac.uk
> Emeritus Professor of Applied Statistics, University of Oxford
>
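
The two kinds of easily corrected issues mentioned above (using suggested
packages unconditionally, and using optional capabilities without checking)
can be guarded against roughly as follows. This is only a sketch: the helper
name with_suggested and the package name notARealPackage are made up for
illustration, and the tolerance values are arbitrary.

```r
# Run `f` only when the suggested package `pkg` is installed;
# otherwise return `fallback`. Hypothetical helper, not part of R.
with_suggested <- function(pkg, f, fallback = NULL) {
  if (requireNamespace(pkg, quietly = TRUE)) f() else fallback
}

# 'stats' ships with R, so the function is called.
m <- with_suggested("stats", function() stats::median(c(1, 3, 5)))

# A package that is not installed: the function is never called.
skipped <- with_suggested("notARealPackage",
                          function() stop("never run"),
                          fallback = NA)

# Optional capability: check before relying on extended precision.
has_long_double <- capabilities("long.double")[[1]]
tol <- if (has_long_double) 1e-12 else 1e-8  # arbitrary example values
```

The same pattern applies in tests and examples: check first, then skip or
relax the expectation when the package or capability is absent.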



[Rd] Crash/bug when calling match on latin1 strings

2021-10-10 Thread Travers Ching
Here's a brief example:

# A bunch of words in UTF8; replace *'s
words <- readLines("h://pastebin.c**/raw/MFCQfhpY", encoding = "UTF-8")
words2 <- iconv(words, "utf-8", "latin1")
gctorture(TRUE)
y <- match(words2, words2)


I searched bugzilla but didn't see anything. Apologies if this is already
reported.

The bug appears in both R-devel and the release, but doesn't seem to affect
R 4.0.5.



[Rd] Minor bug with stats::isoreg

2023-09-27 Thread Travers Ching
Hello, I'd like to file a small bug report. I searched and didn't find a
duplicate report.

Calling isoreg with an Inf value causes a segmentation fault, tested on R
4.3.1 and R 4.2. A reproducible example is: `isoreg(c(0,Inf))`



[Rd] Bug report: parLapply with capture.output(type="message") produces an error

2023-10-05 Thread Travers Ching
Hello, I have tested this on a fresh ubuntu image with R 4.3.1.

Rscript -e 'library(parallel)
cl = makeCluster(2)
x = parLapply(cl, 1:100, function(i) {
  capture.output(message("hello"), type = "message")
})
print("bye")'

This produces the following output:

[1] "bye"
Error in unserialize(node$con) : error reading from connection
Calls:  ... doTryCatch -> recvData -> recvData.SOCKnode ->
unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls:  ... doTryCatch -> recvData -> recvData.SOCKnode ->
unserialize
Execution halted

The error does not occur interactively or if stopCluster gets called at the
end.



Re: [Rd] Bug report: parLapply with capture.output(type="message") produces an error

2023-10-06 Thread Travers Ching
e.  If we use the latter, we will see all output from
> the parallel worker(s).  Let's try that:
>
> $ Rscript --vanilla -e 'library(parallel); cl <- makeCluster(1,
> outfile = ""); x <- clusterEvalQ(cl, { })'
> starting worker pid=349252 on localhost:11036 at 17:45:05.125
> Error in unserialize(node$con) : error reading from connection
> Calls:  ... doTryCatch -> recvData -> recvData.SOCKnode ->
> unserialize
> Execution halted
>
> You see. There's a "starting worker ..." output that we now see.  But
> more importantly, we now also see that "error reading from connection"
> message.  So, as you see, that error message is there regardless of us
> capturing or sinking the "message" output.  Instead, what it tells us
> is that there is an error taking place at the very end, but we
> normally don't see it.
>
> This error is because when the main R session shuts down, the parallel
> workers are still running and trying to listen to the socket
> connection that they use to communicate with the main R session.  But
> that is now broken, so each parallel worker will fail when it tries to
> communicate.
>
> How to fix it? Make sure to close the 'cl' cluster before exiting the
> main R session, i.e.
>
> $ Rscript --vanilla -e 'library(parallel); cl <- makeCluster(1,
> outfile = ""); x <- clusterEvalQ(cl, { }); stopCluster(cl)'
> starting worker pid=349703 on localhost:11011 at 17:50:20.357
>
> The error is no longer there, because the main R session will tell the
> parallel workers to shut down *before* terminating itself. This means
> there are no stray parallel workers trying to reach a non-existing
> main R session.
>
> In a way, your example revealed that you forgot to call
> stopCluster(cl) at the end.
>
> But, the real message here is: Do not mess with the "message" output in R!
>
> I'll take the moment to rant about this: I think sink(..., type =
> "message") should not be part of the public R API; it's simply
> impossible to use safely, because there is no one owner controlling
> it. To prevent it being used by mistake, at least it could throw an
> error if there's already an active "message" sink.  Oh, well ...
>
>
> Almost finally, on to what you're probably trying to achieve here. When you
> call:
>
>  out <- capture.output({ message("hello"); message("world") }, type = "message")
>
> What you really want to do is:
>
> capture_messages <- function(expr, envir = parent.frame()) {
>   msgs <- list()
>   withCallingHandlers({
>     eval(expr, envir = envir)
>   }, message = function(m) {
>     msgs <<- c(msgs, list(m))
>     invokeRestart("muffleMessage")
>   })
>   msgs
> }
>
> msgs <- capture_messages({ message("hello"); message("world") })
>
> When you capture 'message' conditions this way, you can decide to
> resignal them later, e.g.
>
> > void <- lapply(msgs, message)
> hello
> world
>
> You can capture 'warning' conditions in the same way.
>
>
>
> Finally, if you've got to this because you wanted to
> capture/see/display/view output that is taking place on parallel
> workers, I recommend using the Futureverse (https://futureverse.org)
> for parallelization. Disclaimer, I'm the author.  The Futureverse
> takes care of relaying stdout, messages, warnings, errors, and other
> types of conditions automatically. Here's an example that resembles
> your original example:
>
> > cl <- parallel::makeCluster(2)
> > future::plan("cluster", workers = cl)
> > y <- future.apply::future_lapply(1:3, function(i) message("hello"))
> hello
> hello
> hello
> > parallel::stopCluster(cl)
>
> Note that those "hello" messages are truly relayed versions of the
> original 'message' conditions. Warnings work the same way.
>
> A cleaner and slightly better version of the above example is:
>
> > library(future.apply)
> > plan(multisession, workers = 2)
> > y <- future.apply::future_lapply(1:3, function(i) message("hello"))
> hello
> hello
> hello
> > plan(sequential)
>
> Over and out,
>
> Henrik
>
> On Thu, Oct 5, 2023 at 4:07 PM Travers Ching  wrote:
> >
> > Hello, I have tested this on a fresh ubuntu image with R 4.3.1.
> >
> > Rscript -e 'library(parallel)
> > cl = makeCluster(2)
> > x = parLapply(cl, 1:100, function(i) {
> >   capture.output(message("hello"), type = "message")
> > })
> > print("bye")'
> >
> > This produces the following output:
> >
> > [1] "bye"
> > Error in unserialize(node$con) : error reading from connection
> > Calls:  ... doTryCatch -> recvData -> recvData.SOCKnode ->
> > unserialize
> > Execution halted
> > Error in unserialize(node$con) : error reading from connection
> > Calls:  ... doTryCatch -> recvData -> recvData.SOCKnode ->
> > unserialize
> > Execution halted
> >
> > The error does not occur interactively or if stopCluster gets called at the
> > end.
> >
>
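
Putting the two suggestions from the quoted reply together (capture 'message'
conditions with a calling handler instead of sinking the message stream, and
stop the cluster before the script exits), the original example might be
rewritten along these lines. This is a sketch using only base R and the
parallel package; returning the message text via conditionMessage() is a
choice made here, not something the reply prescribes.

```r
library(parallel)

cl <- makeCluster(2)

x <- tryCatch(
  parLapply(cl, 1:4, function(i) {
    msgs <- character()
    withCallingHandlers(
      message("hello ", i),
      message = function(m) {
        # Record the text and stop the message from reaching stderr.
        msgs <<- c(msgs, conditionMessage(m))
        invokeRestart("muffleMessage")
      }
    )
    msgs
  }),
  # Always shut the workers down, even if parLapply() fails.
  finally = stopCluster(cl)
)

print("bye")
```

Each element of x holds the text of the messages signalled by that task,
including the trailing newline that message() appends.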



[Rd] R hang/bug with circular references and promises

2024-05-10 Thread Travers Ching
The following code snippet causes R to hang. This example might be a
bit contrived as I was experimenting and trying to understand
promises, but uses only base R.

It looks like it is looking for "not_a_variable" recursively but since
it doesn't exist it goes on indefinitely.

x0 <- new.env()
x1 <- new.env(parent = x0)
parent.env(x0) <- x1
delayedAssign("v", not_a_variable, eval.env=x1)
delayedAssign("w", v, assign.env=x1, eval.env=x0)
x1$w
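
In the snippet above, x0 and x1 are each other's parent, so a lookup that
starts in either environment and misses can never reach the empty
environment. A cycle like this can be detected before anything is evaluated.
The following is a minimal sketch; has_env_cycle is a hypothetical helper
written for this illustration, not a base R function.

```r
# Walk the chain of enclosing environments, remembering each one visited.
# Returns TRUE if an environment is seen twice, FALSE if the walk reaches
# the empty environment.
has_env_cycle <- function(env) {
  seen <- list()
  while (!identical(env, emptyenv())) {
    for (e in seen) {
      if (identical(e, env)) return(TRUE)
    }
    seen[[length(seen) + 1L]] <- env
    env <- parent.env(env)
  }
  FALSE
}

x0 <- new.env()
x1 <- new.env(parent = x0)
parent.env(x0) <- x1

cyclic <- has_env_cycle(x1)
normal <- has_env_cycle(globalenv())
```

With the two environments from the report, cyclic is TRUE, while the ordinary
search path starting at the global environment gives FALSE.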



Re: [Rd] clarifying and adjusting the C API for R

2024-06-09 Thread Travers Ching
Hi Luke, thanks for all your work on R!

I'd like to ask specifically about R_serialize / R_unserialize (and
associated helper functions). These are used by at least a handful of
packages and I don't see them in the list from Yutani.

Are these API functions considered "stable"?

Best,
Travers
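
For reference, checking an entry point against the data frame returned by
tools:::funAPI() (described in the quoted message below) could look roughly
like this. Because that function is experimental and needs R-devel plus a C
compiler on the search path, the sketch uses a small hand-made data frame
with rows copied from the output quoted in this thread; api_status is a
hypothetical helper.

```r
# Three illustrative rows in the same layout as head(tools:::funAPI()).
# A real check would use the full data frame instead.
api <- data.frame(
  name    = c("Rf_AdobeSymbol2utf8", "alloc3DArray", "allocArray"),
  loc     = c("R_ext/GraphicsDevice.h", "WRE", "WRE"),
  apitype = c("eapi", "api", "api"),
  stringsAsFactors = FALSE
)

# Return the apitype of an entry point, or NA if it is not listed.
api_status <- function(api, fn) {
  hit <- api$apitype[api$name == fn]
  if (length(hit) == 0L) NA_character_ else hit[[1L]]
}

api_status(api, "allocArray")
api_status(api, "R_serialize")  # NA here only because the sample is tiny
```

An NA from the full table would mean the entry point is not currently marked
as part of any API, which is the situation the question above is about.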

On Sat, Jun 8, 2024 at 9:29 PM Hiroaki Yutani  wrote:
>
> Thanks so much for your wonderful work, Luke!
> I didn't expect such a clarification to happen this soon. This is really
> great.
>
> For convenience, I created a quick web page to search the result of
> tools:::funAPI().
>
> https://yutannihilation.github.io/R-fun-API/
>
> Hope this helps those who are too lazy to install R-devel to check.
>
> Best,
> Yutani
>
> 2024年6月6日(木) 23:47 luke-tierney--- via R-devel :
>
> > This is an update on some current work on the C API for use in R
> > extensions.
> >
> > The internal R implementation makes use of tens of thousands of C
> > entry points. On Linux and Windows, which support visibility
> > restrictions, most of these are visible only within the R executble or
> > shared library. About 1500 are not hidden and are visible to
> > dynamically loaded shared libraries, such as ones in packages, and to
> > embedding applications.
> >
> > There are two main reasons for limiting access to entry points in a
> > software framework:
> >
> > - Some entry points are very easy to use in ways that corrupt internal
> >data, leading to segfaults or, worse, incorrect computations without
> >segfaults.
> >
> > - Some entry points expose internal structure and other implementation
> >details, which makes it hard to make improvements without breaking
> >client code that has come to depend on these details.
> >
> > The API of C entry points that can be used in R extensions, both for
> > packages and embedding, has evolved organically over many years. The
> > definition for the current release expressed in the Writing R
> > Extensions manual (WRE) is roughly:
> >
> >  An entry point can be used if (1) it is declared in a header file
> >  in R.home("include"), and (2) if it is documented for use in WRE.
> >
> > Ideally, (1) would be necessary and sufficient, but for a variety of
> > reasons that isn't achievable, at least not in the near term. (2) can
> > be challenging to determine; in particular, it is not amenable to a
> > computational answer.
> >
> > An experimental effort is underway to add annotations to the WRE
> > Texinfo source to allow (2) to be answered unambiguously. The
> > annotations so far mostly reflect my reading of WRE and may be revised
> > as they are reviewed by others. The annotated document can be used for
> > programmatically identifying what is currently considered part of the C
> > API. The result so far is an experimental function tools:::funAPI():
> >
> >  > head(tools:::funAPI())
> >                   name                    loc apitype
> >  1 Rf_AdobeSymbol2utf8 R_ext/GraphicsDevice.h    eapi
> >  2        alloc3DArray                    WRE     api
> >  3          allocArray                    WRE     api
> >  4           allocLang                    WRE     api
> >  5           allocList                    WRE     api
> >  6         allocMatrix                    WRE     api
> >
> > The 'apitype' field has three possible levels
> >
> >  | api  | stable (ideally) API |
> >  | eapi | experimental API     |
> >  | emb  | embedding API        |
> >
> > Entry points in the embedded API would typically only be used in
> > applications embedding R or providing new front ends, but might be
> > reasonable to use in packages that support embedding.
> >
> > The 'loc' field indicates how the entry point is identified as part of
> > an API: explicit mention in WRE, or declaration in a header file
> > identified as fully part of an API.
> >
> > [tools:::funAPI() may not be completely accurate as it relies on
> > regular expressions for examining header files considered part of the
> > API rather than proper parsing. But it seems to be pretty close to
> > what can be achieved with proper parsing.  Proper parsing would add
> > dependencies on additional tools, which I would like to avoid for
> > now. One dependency already present is that a C compiler has to be on
> > the search path and cc -E has to run the C pre-processor.]
> >
> > Two additional experimental functions are available for analyzing
> > package compliance: tools:::checkPkgAPI and tools:::checkAllPkgsAPI.
> > These examine installed packages.
> >
> > [These may produce some false positives on macOS; they may or may not
> > work on Windows at this point.]
> >
> > Using these tools initially showed around 200 non-API entry points
> > used across packages on CRAN and BIOC. Ideally this number should be
> > reduced to zero. This will require a combination of additions to the
> > API and changes in packages.
> >
> > Some entry points can safely be added to the API. Around 40 have
> > already be