Re: [Rd] sum() vs cumsum() implicit type coercion
On 8/23/20 5:02 PM, Rory Winston wrote:
> Hi,
>
> I noticed a small inconsistency when using sum() vs cumsum(). I have a
> character-based series:
>
> > tryjpy$long
>  [1] "0.0022"  "-0.0002" "-0.0149" "-0.0023" "-0.0342" "-0.0245" "-0.0022"
>  [8] "0.0003"  "-0.0001" "-0.0004" "-0.0036" "-0.001"  "-0.0011" "-0.0012"
> [15] "-0.0006" "0.0016"  "0.0006"
>
> When I run sum() vs cumsum(), sum() fails but cumsum() converts the
> series to numeric before summing:
>
> > sum(tryjpy$long)
> Error in sum(tryjpy$long) : invalid 'type' (character) of argument
>
> > cumsum(tryjpy$long)
>  [1]  0.0022  0.0020 -0.0129 -0.0152 -0.0494 -0.0739 -0.0761 -0.0758 -0.0759
> [10] -0.0763 -0.0799 -0.0809 -0.0820 -0.0832 -0.0838 -0.0822 -0.0816
>
> Which I guess is due to the following line in do_cum():
>
>     PROTECT(t = coerceVector(CAR(args), REALSXP));
>
> This might be fine, and there may be very good reasons why there is no
> coercion in sum(); it just seems a little inconsistent in usage.
>
> Cheers
> -- Rory

Yes. I don't know the reason for this design, but please note it is
documented in ?sum and in ?cumsum, which would also make it harder to
change. One can always use a consistent subset (not rely on the
coercion, e.g. from characters).

Best
Tomas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
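[Editorial note: Tomas's advice not to rely on the implicit coercion can
be sketched as follows. `long` below is a small made-up stand-in for the
`tryjpy$long` vector from the report; this is an illustration, not code
from the thread.]

```r
# Convert once, explicitly; then sum() and cumsum() see the same type
# and behave consistently.
long <- c("0.0022", "-0.0002", "-0.0149")  # hypothetical character series
x <- as.numeric(long)

sum(x)     # -0.0129
cumsum(x)  # 0.0022 0.0020 -0.0129, the same values cumsum(long) gives
           # via its silent coercion
```

The explicit as.numeric() call also surfaces malformed strings as an NA
with a warning, rather than hiding the conversion inside cumsum().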
Re: [Rd] sum() vs cumsum() implicit type coercion
>>>>> Tomas Kalibera
>>>>>     on Tue, 25 Aug 2020 09:29:05 +0200 writes:

    > On 8/23/20 5:02 PM, Rory Winston wrote:
    >> I noticed a small inconsistency when using sum() vs cumsum()
    >> [... original report elided; quoted in full in the previous message ...]
    >> This might be fine and there may be very good reasons why there is no
    >> coercion in sum - just seems a little inconsistent in usage

    > Yes. I don't know the reason for this design, but please note it is
    > documented in ?sum and in ?cumsum, which would also make it harder to
    > change. One can always use a consistent subset (not rely on the
    > coercion e.g. from characters).

    > Best
    > Tomas

Indeed. Further note that most arithmetic/math *fails* on character
vectors, so if a change would have to be made, it should rather be such
that cumsum() also rejects character input.

We would have consistency then, but potentially break user code, even
package code which has hitherto assumed cumsum() to coerce to numeric
first.

If a majority of commentators and R core thinks we should make such a
change, I'd agree to consider it.

Otherwise, we save (ourselves and others) a bit of time.

Martin
Re: [Rd] sum() vs cumsum() implicit type coercion
(If I may be so bold: although I think it's unlikely that a majority
would be in favour of this change, and I doubt anyone is actually
proposing it, I think quite a bit more than "a majority" should be
required before a change like this is allowed. Considering that the
coercion cumsum() performs is documented, that consistency of type
coercion between sum() and cumsum() has never been advertised, and that
a custom version of cumsum() addressing the inconsistency would be very
easy for users to create themselves, I'd struggle to think the change
could ever have merit. Even public unanimity would probably not be
enough.)

On Tue, 25 Aug 2020 at 20:25, Martin Maechler wrote:
>
> [... quoted report and Tomas's reply elided; see the previous messages ...]
>
> Indeed.
> Further note that most arithmetic/math *fails* on
> character vectors, so if a change would have to be made, it
> should rather be such that cumsum() also rejects character
> input.
>
> We would have consistency then, but potentially break user code,
> even package code which has hitherto assumed cumsum() to coerce
> to numeric first.
>
> If a majority of commentators and R core thinks we should make
> such a change, I'd agree to consider it.
>
> Otherwise, we save (ourselves and others) a bit of time.
> Martin
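[Editorial note: the "custom version of cumsum()" mentioned above, i.e.
the stricter variant Martin describes, is indeed easy to write as a
user-level wrapper. `cumsum_strict` is a made-up name used here for
illustration, not an actual or proposed R function.]

```r
# A cumsum() that, like sum(), refuses character input instead of
# silently coercing it to numeric.
cumsum_strict <- function(x) {
  if (is.character(x))
    stop("invalid 'type' (character) of argument")
  cumsum(x)
}

cumsum_strict(c(1, 2, 3))        # 1 3 6, as with cumsum()
try(cumsum_strict(c("1", "2")))  # errors, consistently with sum()
```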
Re: [Rd] R 4.0.2 64-bit Windows hangs
On 8/22/20 9:33 PM, Jeroen Ooms wrote:
> On Sat, Aug 22, 2020 at 9:10 PM Tomas Kalibera wrote:
>> On 8/22/20 8:26 PM, Tomas Kalibera wrote:
>>> On 8/22/20 7:58 PM, Jeroen Ooms wrote:
>>>> On Sat, Aug 22, 2020 at 8:39 AM Tomas Kalibera wrote:
>>>>> On 8/21/20 11:45 PM, m19tdn+9alxwj7d2bmk--- via R-devel wrote:
>>>>>> Ah yes, this is related. I reported v2010 below, but it looks like
>>>>>> I was updated to this Insider Build overnight without my knowledge,
>>>>>> and conflated it with the new installation of R v4 this morning.
>>>>>> I will continue to look into the issue with the methods Tomas
>>>>>> mentioned.
>>>>> It is interesting that a rare five-year-old problem would re-appear
>>>>> on current Insider builds. Which build of Windows are you running
>>>>> exactly? I've seen another report about a crash on 20190.1000. It'd
>>>>> be nice to know if it is present also in newer builds, i.e. in 20197.
>>>> I installed the latest 20197 build in a VM, and I can indeed reproduce
>>>> this problem. What seems to be happening is that R triggers an
>>>> infinite recursion in the Windows unwinding mechanism, and eventually
>>>> dies with a stack overflow. Attached is a backtrace of the initial 100
>>>> frames of the main thread (the pattern in the top ~30 frames continues
>>>> forever). The Microsoft blog doesn't mention that anything related to
>>>> exception handling has changed in recent versions:
>>>> https://docs.microsoft.com/en-us/windows-insider/at-home/active-dev-branch
>>> Thanks, unfortunately that does not ring any bells (except below);
>>> I can't guess from this what the underlying cause of the problem is.
>>> There may be something wrong in how we use setjmp/longjmp, or in how
>>> setjmp/longjmp works on Windows.
>>>
>>> It reminds me of a problem I've been debugging a few days ago, where
>>> the longjump implementation segfaults on Windows 10 (a recent but not
>>> Insider build), probably soon after unwinding the stack, but only with
>>> GCC 10 / MinGW 7, only in one of the no-segfault tests, and only with
>>> -O3 (not -O2, and not with -O3 -fno-split-loops). The problem was,
>>> interestingly, sensitive to these optimization options on the call
>>> site of the long jump (do_abs), even when it was not an immediate
>>> caller of the longjump. I've not tracked this down yet; it will
>>> require looking at the assembly level. I was suspecting a compiler
>>> error causing the compiler to generate code that messes with the stack
>>> or registers in a way that impacts the upcoming jump. But now that we
>>> have this other problem with setjmp/longjmp, the compiler may not be
>>> the top suspect anymore. I may not be able to work on this in the next
>>> few days or a week, so if anyone gets there first, please let me know
>>> what you find out.
>> Btw, could you please try out whether the UCRT build of R crashes as
>> well in the Insider Windows build?
> Yes, it hangs in exactly the same way, except that the backtrace shows
>
>   ucrtbase!.intrinsic_setjmpex () from C:\WINDOWS\System32\ucrtbase.dll
>
> instead of msvcrt!_setjmpex (as expected, of course).

Thanks. I found what is causing the problem I observed with GCC 10 /
stock Windows 10; I expect this is the same one as in the Insider build.
I will investigate further,

Tomas
[Rd] trace creates object in base namespace if called on function argument
Dear R-devel,

I don't think this is expected:

foo <- function() "hello"
trace2 <- function(fun) trace(fun, quote(print("!!!")))
base::fun
# Object with tracing code, class "functionWithTrace"
# Original definition:
# function() "hello"
#
# ## (to see the tracing code, look at body(object))

`untrace()` has the same behavior.

This is inconsistent with how debug() works:

foo <- function() "hello"
debug2 <- function(fun) debug(fun)
debug2(foo)
isdebugged(foo)
# [1] TRUE

This can be worked around by defining:

trace2 <- function(fun) eval.parent(substitute(trace(fun, quote(print("!!!")))))

but I believe the current behavior is undesired and it'd be better to
make it behave as `debug()` does, or to throw an error.

Best,

Antoine
Re: [Rd] trace creates object in base namespace if called on function argument
Apologies, there is one line missing in my last email; the code should be:

foo <- function() "hello"
trace2 <- function(fun) trace(fun, quote(print("!!!")))
trace2(foo) # <- THIS LINE WAS MISSING
base::fun

Best,

Antoine

On Tue, 25 Aug 2020 at 22:02, Antoine Fabri wrote:
> Dear R-devel,
>
> I don't think this is expected:
>
> foo <- function() "hello"
> trace2 <- function(fun) trace(fun, quote(print("!!!")))
> base::fun
> # Object with tracing code, class "functionWithTrace"
> # Original definition:
> # function() "hello"
> #
> # ## (to see the tracing code, look at body(object))
>
> `untrace()` has the same behavior.
>
> This is inconsistent with how debug() works:
>
> foo <- function() "hello"
> debug2 <- function(fun) debug(fun)
> debug2(foo)
> isdebugged(foo)
> # [1] TRUE
>
> This can be worked around by defining:
>
> trace2 <- function(fun) eval.parent(substitute(trace(fun, quote(print("!!!")))))
>
> but I believe the current behavior is undesired and it'd be better to
> make it behave as `debug()`, or to throw an error.
>
> Best,
>
> Antoine
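[Editorial note: a quick check, run at top level, that the
`eval.parent(substitute(...))` workaround from the original message
traces the caller's function under its real name. This is an
illustration of the reported workaround, not code from the thread.]

```r
library(methods)  # trace() relies on the methods package

foo <- function() "hello"

# substitute() recovers the caller's expression, so trace() sees the
# symbol `foo` rather than the argument name `fun`.
trace2 <- function(fun)
  eval.parent(substitute(trace(fun, quote(print("!!!")))))

trace2(foo)
inherits(foo, "functionWithTrace")                    # TRUE: foo is traced
exists("fun", asNamespace("base"), inherits = FALSE)  # FALSE: nothing
                                                      # leaked into base
untrace(foo)
```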
[Rd] NAs and rle
Hi All,

A twitter user, Mike fc (@coolbutuseless), mentioned today that he was
surprised that repeated NAs weren't treated as a run by the rle()
function.

Now I know why they are not: NAs represent values which could be the
same as or different from each other if they were known, so from a
purely conceptual standpoint there is no way to tell whether they are
the same and thus constitute a run or not.

This conceptual strictness isn't universally observed, though, because
we get the following:

> unique(c(1, 2, 3, NA, NA, NA))
[1]  1  2  3 NA

which means that rle(sort(x))$values is not guaranteed to be the same as
unique(x), which is a little strange (though likely of little practical
impact).

Personally, to me it also seems that, from a purely data-compression
standpoint, it would be valid to collapse those missing values into a
run of missings, as it reduces size in memory/on disk without losing any
information.

Now, none of this is to say that I suggest the default behavior be
changed (that would surely disrupt some non-trivial amount of existing
code), but what do people think of a group.nas argument, defaulting to
FALSE, that controls this behavior?

As a final point, there is some precedent here (though obviously not at
all binding), as Bioconductor's Rle functionality does group NAs.

Best,
~G
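[Editorial note: a sketch of the NA-grouping semantics being proposed.
`rle_na` is a made-up helper written for this illustration, not the
suggested `group.nas` interface itself: it simply counts two adjacent
NAs as equal when computing run boundaries.]

```r
# Run-length encoding where adjacent NAs form a run, unlike base rle().
rle_na <- function(x) {
  n <- length(x)
  if (n == 0L)
    return(structure(list(lengths = integer(0), values = x),
                     class = "rle"))
  a <- x[-1L]   # each element ...
  b <- x[-n]    # ... paired with its predecessor
  # a run continues where neighbours are equal, or where both are NA
  same <- (!is.na(a) & !is.na(b) & a == b) | (is.na(a) & is.na(b))
  ends <- c(which(!same), n)  # last index of each run
  structure(list(lengths = diff(c(0L, ends)), values = x[ends]),
            class = "rle")
}

rle_na(c(1, NA, NA, NA, 2, 2))
# runs: lengths 1 3 2, values 1 NA 2 (base rle() would split the NAs)
```

Because the result is a plain "rle" object, inverse.rle() still
reconstructs the original vector, so no information is lost by grouping
the NAs.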