[Rd] removeSource() vs. function literals

2023-03-30 Thread Ivan Krylov
Dear R-devel,

In a package of mine, I use removeSource on expression objects in order
to make expressions that are semantically the same serialize to the
same byte sequences:
https://github.com/cran/depcache/blob/854d68a/R/fixup.R#L8-L34

Today I learned that expressions containing function definitions also
contain the source references for the functions, not as an attribute,
but as a separate argument to the `function` call:

str(quote(function() NULL)[[4]])
# 'srcref' int [1:8] 1 11 1 25 11 25 1 1
# - attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile'
#   

This means that removeSource() on an expression that would define a
function when evaluated doesn't actually remove the source reference
from the object.
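
A minimal sketch of the observation above. It assumes the code is parsed
with keep.source = TRUE, passed explicitly because non-interactive
sessions default to FALSE. The variable names are illustrative only.

```r
# Parse with source references kept explicitly; Rscript and R CMD BATCH
# default to options(keep.source = FALSE).
ex <- parse(text = "function() NULL", keep.source = TRUE)
fn_call <- ex[[1]]                 # an unevaluated call to `function`

# The call has four elements: `function`, the formals, the body, and the
# srcref, stored as a hidden fourth argument rather than as an attribute.
length(fn_call)
inherits(fn_call[[4]], "srcref")
is.null(attr(fn_call, "srcref"))
```

Since the srcref is an ordinary element of the call, code that only
clears attributes will not reach it.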

Do you think it would be appropriate to teach removeSource() to remove
such source references? What could be a good way to implement that?
if (is.call(fn) && identical(fn[[1]], as.name('function'))) fn[[4]] <- NULL
sounds too arbitrary. if (inherits(fn, 'srcref')) return(NULL) sounds
too broad.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] removeSource() vs. function literals

2023-03-30 Thread Duncan Murdoch

I don't think there's a simple way to do that.  Functions can define 
functions within themselves.  If you're talking about code that was 
constructed by messing with language objects, it could contain both 
function objects and calls to `function` to construct them.  You'd need 
to recurse through all expressions in the object.  Some of those 
expressions might be environments, so your changes could leak out of the 
function you're working on.


Things are simpler if you know the expression is the unmodified result 
of parsing source code, but if you know that, wouldn't you usually be 
able to control things by setting keep.source = FALSE?


Maybe a workable solution is something like parse(text = deparse(expr,
control = "exact"), keep.source = FALSE).  Wouldn't work on environments or
various exotic types, but would probably warn you if it wasn't working.
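
A minimal sketch of that round trip, assuming the input is a single call
rather than an expression vector. The helper name strip_by_reparse is
hypothetical. Note that parse() takes code through its `text` argument;
its first positional argument is a file name or connection.

```r
# An unevaluated function literal carrying a srcref as its 4th element.
expr <- parse(text = "function(x) x + 1", keep.source = TRUE)[[1]]

# Hypothetical helper: deparse to text, then parse again without
# keeping source references.
strip_by_reparse <- function(expr) {
  txt <- deparse(expr, control = "exact")
  parse(text = txt, keep.source = FALSE)[[1]]
}

clean <- strip_by_reparse(expr)
is.null(clean[[4]])                       # is the hidden srcref gone?
identical(eval(clean)(1), eval(expr)(1))  # same result when evaluated?
```

This relies on deparse() producing text that parses back to equivalent
code, which is why it cannot cover environments or other objects that
have no source representation.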


Duncan Murdoch



Re: [Rd] write.csv performance improvements?

2023-03-30 Thread Gabriel Becker
Hi Toby et al,



On Wed, Mar 29, 2023 at 10:24 PM Toby Hocking wrote:

> Dear R-devel,
> I did a systematic comparison of write.csv with similar functions, and
> observed two asymptotic inefficiencies that could be improved.
>
> 1. write.csv is quadratic time (N^2) in the number of columns N.
> Can write.csv be improved to use a linear time algorithm, so it can handle
> CSV files with larger numbers of columns?
>

Yes, I think there is a narrow fix and a wider discussion to be had.

I've posted a discussion and the narrow fix at:
https://bugs.r-project.org/show_bug.cgi?id=18500

For "normal data", i.e. data that doesn't have classed object columns, the
narrow change I propose in the patch gives us the performance we might
expect (see the attached, admittedly very ugly plots).

The fact remains, though, that with the patch, write.table is still
quadratic in the number of *object-classed* columns.

It doesn't seem like it should be, but I haven't (yet) had a chance to dig
deeper to attack that.  Might be a good subject for the R developer sprint,
if R-core agrees.
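
For anyone who wants to reproduce the scaling locally, here is a rough
timing sketch. It is not the benchmark from the bug report or from the
atime issues; the helper time_write, the column counts, and the use of
factor columns as a stand-in for "object-classed" columns are all
assumptions made for illustration.

```r
# Hypothetical helper: time write.csv() on a data frame with n_cols
# columns, optionally converting every column to a factor.
time_write <- function(n_cols, classed = FALSE, n_rows = 10L) {
  df <- as.data.frame(matrix(0, nrow = n_rows, ncol = n_cols))
  if (classed) df[] <- lapply(df, factor)
  out <- tempfile(fileext = ".csv")
  on.exit(unlink(out))
  elapsed <- system.time(write.csv(df, out, row.names = FALSE))
  elapsed[["elapsed"]]
}

n_cols <- c(250L, 500L, 1000L, 2000L)
timings <- data.frame(
  n_cols  = n_cols,
  plain   = vapply(n_cols, time_write, numeric(1)),
  classed = vapply(n_cols, time_write, numeric(1), classed = TRUE)
)

# Under quadratic scaling, doubling n_cols roughly quadruples the time;
# under linear scaling it roughly doubles.
print(timings)
```

Single runs at these sizes are noisy, so larger column counts or
repeated runs are needed before reading much into the ratios.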

~G

> For more details including figures and session info, please see
> https://github.com/tdhock/atime/issues/9
>
> 2. write.csv uses memory that is linear in the number of rows, whereas
> similar R functions for writing CSV use only constant memory. This is not
> as important of an issue to fix, because anyway linear memory is used to
> store the data in R. But since the other functions use constant memory,
> could write.csv also? Is there some copying happening that could be
> avoided? (this memory measurement uses bench::mark, which in turn uses
> utils::Rprofmem)
> https://github.com/tdhock/atime/issues/10
>
> Sincerely,
> Toby Dylan Hocking


Re: [Rd] removeSource() vs. function literals

2023-03-30 Thread Lionel Henry via R-devel
If you can afford a dependency on rlang, `rlang::zap_srcref()` deals
with this. It's recursive over expression vectors, calls (including
calls to `function` and their hidden srcref arg), and function
objects. It's implemented in C for efficiency as we found it to be a
bottleneck in some applications (IIRC caching). I'd be happy to
upstream this in base if R core is interested.
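
For readers without rlang, the recursion described above can be sketched
in plain R. This is not rlang's implementation, and the name strip_srcref
is hypothetical. As a simplification it walks only expression vectors
and calls; it does not descend into formals, function objects, or
environments.

```r
# Hypothetical helper: recursively drop source references from
# expression vectors and calls, including the hidden 4th argument of
# calls to `function`.
strip_srcref <- function(x) {
  if (is.null(x) || is.name(x)) return(x)
  attr(x, "srcref") <- NULL
  attr(x, "wholeSrcref") <- NULL
  attr(x, "srcfile") <- NULL
  if (is.call(x) && identical(x[[1]], as.name("function")) &&
      length(x) >= 4L) {
    x[4] <- list(NULL)             # blank the srcref, keep the length
  }
  if (is.call(x) || is.expression(x)) {
    # x[i] <- list(...) keeps elements that become NULL in place
    for (i in seq_along(x)) x[i] <- list(strip_srcref(x[[i]]))
  }
  x
}

code  <- "f <- function(x) { g <- function() x; g() }"
clean <- strip_srcref(parse(text = code, keep.source = TRUE))
plain <- parse(text = code, keep.source = FALSE)

is.null(clean[[1]][[3]][[4]])      # outer function literal
identical(clean, plain)            # compare with a source-free parse
```

A pure R version like this will be slower than a C implementation on
large inputs, which matters if it sits on a caching hot path.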

Best,
Lionel

