Re: [Rd] sum() vs cumsum() implicit type coercion

2020-08-25 Thread Tomas Kalibera

On 8/23/20 5:02 PM, Rory Winston wrote:

Hi

I noticed a small inconsistency when using sum() vs cumsum()

  I have a char-based series

  > tryjpy$long

  [1] "0.0022"  "-0.0002" "-0.0149" "-0.0023" "-0.0342" "-0.0245" "-0.0022"

  [8] "0.0003"  "-0.0001" "-0.0004" "-0.0036" "-0.001"  "-0.0011" "-0.0012"

[15] "-0.0006" "0.0016"  "0.0006"

When I run sum() vs cumsum(), sum() fails but cumsum() converts the
series to numeric before summing:


sum(tryjpy$long)

Error in sum(tryjpy$long) : invalid 'type' (character) of argument


cumsum(tryjpy$long)

  [1]  0.0022  0.0020 -0.0129 -0.0152 -0.0494 -0.0739 -0.0761 -0.0758 -0.0759
[10] -0.0763 -0.0799 -0.0809 -0.0820 -0.0832 -0.0838 -0.0822 -0.0816

Which I guess is due to the following line in do_cum():

PROTECT(t = coerceVector(CAR(args), REALSXP));
This might be fine, and there may be very good reasons why there is no
coercion in sum() - it just seems a little inconsistent in usage.


Yes. I don't know the reason for this design, but please note that it is 
documented in ?sum and in ?cumsum, which would also make it harder to 
change. One can always use a consistent subset (i.e., not rely on the 
coercion, e.g. from character).
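
For example, a minimal sketch of the explicit-coercion workaround (the variable names are illustrative, taken from the report):

```r
## Coerce explicitly once, instead of relying on cumsum()'s implicit coercion
x <- c("0.0022", "-0.0002", "-0.0149")  # character data, as in the report
long <- as.numeric(x)
sum(long)     # now works: -0.0129
cumsum(long)  # 0.0022 0.0020 -0.0129
```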


Best
Tomas



Cheers
-- Rory

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




Re: [Rd] sum() vs cumsum() implicit type coercion

2020-08-25 Thread Martin Maechler

Indeed.
Further note that most arithmetic/math *fails* on
character vectors, so if a change had to be made, it
should rather be that cumsum() also rejects character
input.

We would have consistency then, but potentially break user code,
even package code which has hitherto assumed cumsum() to coerce
to numeric first.

If a majority of commentators and R core thinks we should make
such a change, I'd agree to consider it.

Otherwise, we save (ourselves and others) a bit of time.
Martin



Re: [Rd] sum() vs cumsum() implicit type coercion

2020-08-25 Thread Hugh Parsonage
(If I may be so bold: although I think it's unlikely that a majority
would be in favour of this change, and I doubt anyone is actually
proposing it, I think quite a bit more than "a majority" should be
required before a change like this is allowed.

Considering that the coercion to numeric in cumsum() is documented,
that consistency of type coercion between sum() and cumsum() has never
been advertised, and that a custom version of cumsum() that addresses
the inconsistency would be very easy for users to write themselves, I'd
struggle to see how the change could ever have merit. Even public
unanimity would probably not be enough.)
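
(For reference, the kind of user-level wrapper alluded to above really is a one-liner; the name here is hypothetical:)

```r
## A cumsum() that mirrors the coercion cumsum() already does, but explicitly
cumsum_num <- function(x) cumsum(as.numeric(x))
cumsum_num(c("0.0022", "-0.0002"))  # 0.0022 0.0020
```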




Re: [Rd] R 4.0.2 64-bit Windows hangs

2020-08-25 Thread Tomas Kalibera

On 8/22/20 9:33 PM, Jeroen Ooms wrote:

On Sat, Aug 22, 2020 at 9:10 PM Tomas Kalibera  wrote:

On 8/22/20 8:26 PM, Tomas Kalibera wrote:

On 8/22/20 7:58 PM, Jeroen Ooms wrote:

On Sat, Aug 22, 2020 at 8:39 AM Tomas Kalibera
 wrote:

On 8/21/20 11:45 PM, m19tdn+9alxwj7d2bmk--- via R-devel wrote:

Ah yes, this is related. I reported v2010 below, but it looks like
I was updated to this Insider Build overnight without my knowledge,
and conflated it with the new installation of R v4 this morning.

I will continue to look into the issue with the methods Tomas
mentioned.

It is interesting that a rare five-year-old problem would re-appear on
current Insider builds. Which build of Windows are you running exactly?
I've seen another report about a crash on 20190.1000. It'd be nice to
know whether it is also present in newer builds, i.e. in 20197.

I installed the latest 20197 build in a vm, and I can indeed reproduce
this problem.

What seems to be happening is that R triggers an infinite recursion in
the Windows unwinding mechanism, and eventually dies with a stack
overflow. Attached is a backtrace of the first 100 frames of the main
thread (the pattern in the top ~30 frames continues forever).

The Microsoft blog doesn't mention that anything related to exception
handling has changed in recent versions:
https://docs.microsoft.com/en-us/windows-insider/at-home/active-dev-branch


Thanks, unfortunately that does not ring any bells (except as below); I
can't guess from this what the underlying cause of the problem is.
There may be something wrong in how we use setjmp/longjmp, or in how
setjmp/longjmp works on Windows.

It reminds me of a problem I was debugging a few days ago, where the
longjmp implementation segfaults on Windows 10 (a recent but not
Insider build), probably soon after unwinding the stack, but only with
GCC 10 / MinGW 7, only in one of the no-segfault tests, and only
with -O3 (not -O2, and not with -O3 -fno-split-loops). Interestingly,
the problem was sensitive to these optimization options at the call
site of the longjmp (do_abs), even when it was not an immediate caller
of the longjmp. I've not tracked this down yet; it will require
looking at the assembly level. I was suspecting a compiler error
causing the compiler to generate code that messes with the stack or
registers in a way that impacts the upcoming jump. But now that we have
this other problem with setjmp/longjmp, the compiler may not be the top
suspect anymore.

I may not be able to work on this in the next few days or a week, so
if anyone gets there first, please let me know what you find out.

Btw, could you please check whether the UCRT build of R crashes as well
on the Insider Windows build?

Yes, it hangs in exactly the same way, except that the backtrace shows

  ucrtbase!.intrinsic_setjmpex () from C:\WINDOWS\System32\ucrtbase.dll

instead of msvcrt!_setjmpex (as expected, of course).


Thanks. I found what is causing the problem I observed with GCC 10 / stock 
Windows 10; I expect it is the same one as in the Insider build.

I will investigate further,

Tomas



[Rd] trace creates object in base namespace if called on function argument

2020-08-25 Thread Antoine Fabri
Dear R-devel,

I don't think this is expected:

foo <- function() "hello"
trace2 <- function(fun) trace(fun, quote(print("!!!")))
base::fun
# Object with tracing code, class "functionWithTrace"
# Original definition:
# function() "hello"
#
# ## (to see the tracing code, look at body(object))

`untrace()` has the same behavior.

This is inconsistent with how debug works :

foo <- function() "hello"
debug2 <- function(fun) debug(fun)
debug2(foo)
isdebugged(foo)
# [1] TRUE

This can be worked around by defining:

trace2 <- function(fun) eval.parent(substitute(trace(fun, quote(print("!!!")))))

but I believe the current behavior is undesired, and it'd be better to make
it behave like `debug()`, or to throw an error.
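
(The eval.parent(substitute(...)) trick is the standard way to forward a caller's expression to a function that uses non-standard evaluation; a minimal, generic illustration, with names f, g, h of my own choosing:)

```r
f <- function(x) substitute(x)   # returns the expression passed for x
g <- function(x) f(x)            # naive forwarding: f() sees only the symbol `x`
h <- function(x) eval.parent(substitute(f(x)))  # rebuilds the call in the caller

g(1 + 2)  # the symbol `x`
h(1 + 2)  # the expression 1 + 2
```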

Best,

Antoine




Re: [Rd] trace creates object in base namespace if called on function argument

2020-08-25 Thread Antoine Fabri
Apologies, there was one line missing in my last email; the code should be:

foo <- function() "hello"
trace2 <- function(fun) trace(fun, quote(print("!!!")))
trace2(foo) # <- THIS LINE WAS MISSING
base::fun

Best,

Antoine





[Rd] NAs and rle

2020-08-25 Thread Gabriel Becker
Hi All,

A Twitter user, Mike fc (@coolbutuseless), mentioned today that he was
surprised that repeated NAs weren't treated as a run by the rle() function.

Now I know why they are not. NAs represent values that could be the same
as or different from each other if they were known, so from a purely
conceptual standpoint there is no way to tell whether they are the same
and thus constitute a run or not.

This conceptual strictness isn't universally observed, though, because we
get the following:

> unique(c(1, 2, 3, NA, NA, NA))

[1]  1  2  3 NA


This means that rle(sort(x))$values is not guaranteed to be the same as
unique(x), which is a little strange (though likely of little practical
impact).


Personally, to me it also seems that, from a purely data-compression
standpoint, it would be valid to collapse those missing values into a run
of missing, as it reduces size in-memory/on disk without losing any
information.

Now, none of this is to say that I suggest the default behavior be changed
(that would surely disrupt some non-trivial amount of existing code), but
what do people think of a group.nas argument, defaulting to FALSE, that
controls the behavior?
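
A sketch of what such behavior could look like at user level (the function name and details are my own illustration, not a proposal for the actual interface):

```r
## An rle() variant that treats consecutive NAs as one run
rle_group_nas <- function(x) {
  n <- length(x)
  if (n == 0L)
    return(structure(list(lengths = integer(), values = x), class = "rle"))
  same <- x[-1L] == x[-n]                  # NA wherever a neighbour is NA
  na_pair <- is.na(x[-1L]) & is.na(x[-n])  # NA next to NA counts as "same"
  same[is.na(same)] <- na_pair[is.na(same)]
  ends <- c(which(!same), n)               # last index of each run
  structure(list(lengths = diff(c(0L, ends)), values = x[ends]),
            class = "rle")
}

rle_group_nas(c(1, 2, 2, NA, NA, NA, 3))
## lengths: 1 2 3 1   values: 1 2 NA 3
```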

As a final point, there is some precedent here (though obviously not at all
binding), as Bioconductor's Rle functionality does group NAs.

Best,
~G

