Re: [Rd] Undefined behavior of head() and tail() with n = 0
Florent Angly, on Wed, 25 Jan 2017 16:31:45 +0100, writes:

> Hi all,
> The documentation for head() and tail() describes the behavior of
> these generic functions when n is strictly positive (n > 0) and
> strictly negative (n < 0). How these functions work when given a zero
> value is not defined.

> Both GNU command-line utilities head and tail behave differently with +0 and -0:
> http://man7.org/linux/man-pages/man1/head.1.html
> http://man7.org/linux/man-pages/man1/tail.1.html

> Since R supports signed zeros (1/+0 != 1/-0)

whoa, whoa, .. slow down -- The above is misleading!

Rather read in ?Arithmetic (*the* reference to consult for such issues),
where the 2nd part of the following section

 || Implementation limits:
 ||
 || [..]
 ||
 || Another potential issue is signed zeroes: on IEC 60559 platforms
 || there are two zeroes with internal representations differing by
 || sign. Where possible R treats them as the same, but for example
 || direct output from C code often does not do so and may output
 || ‘-0.0’ (and on Windows whether it does so or not depends on the
 || version of Windows). One place in R where the difference might be
 || seen is in division by zero: ‘1/x’ is ‘Inf’ or ‘-Inf’ depending on
 || the sign of zero ‘x’. Another place is ‘identical(0, -0, num.eq =
 || FALSE)’.

says the *contrary* ( __Where possible R treats them as the same__ ):
We do _not_ want to distinguish -0 and +0,
but there are cases where it is unavoidable.

And there are good reasons (mathematics !!) for this.

I'm pretty sure that it would be quite a mistake to start
differentiating it here... but of course we can continue
discussing here if you like.

Martin Maechler
ETH Zurich and R Core

> and the R head() and tail() functions are modeled after
> their GNU counterparts, I would expect the R functions to
> distinguish between +0 and -0

> > tail(1:5, n=0)
> integer(0)
> > tail(1:5, n=1)
> [1] 5
> > tail(1:5, n=2)
> [1] 4 5

> > tail(1:5, n=-2)
> [1] 3 4 5
> > tail(1:5, n=-1)
> [1] 2 3 4 5
> > tail(1:5, n=-0)
> integer(0) # expected 1:5

> > head(1:5, n=0)
> integer(0)
> > head(1:5, n=1)
> [1] 1
> > head(1:5, n=2)
> [1] 1 2

> > head(1:5, n=-2)
> [1] 1 2 3
> > head(1:5, n=-1)
> [1] 1 2 3 4
> > head(1:5, n=-0)
> integer(0) # expected 1:5

> For both head() and tail(), I expected 1:5 as output but got
> integer(0). I obtained similar results using a data.frame and a
> function as x argument.

> An easy fix would be to explicitly state in the documentation what n =
> 0 does, and that there is no practical difference between -0 and +0.
> However, in my eyes, the better approach would be to implement support
> for -0 and document it. What do you think?

> Best,

> Florent

> PS/ My sessionInfo() gives:
> R version 3.3.2 (2016-10-31)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1

> locale:
> [1] LC_COLLATE=German_Switzerland.1252
> LC_CTYPE=German_Switzerland.1252
> LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
> LC_TIME=German_Switzerland.1252

> attached base packages:
> [1] stats graphics grDevices utils datasets methods base

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
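The places where the sign of zero does and does not show, as listed in the ?Arithmetic excerpt above, can be checked at the prompt. The following is a minimal sketch using only base R; the variable names are illustrative and do not come from the thread.

```r
# Two floating-point zeros that differ only in sign
pos <- 0
neg <- -0

# Ordinary comparison treats them as the same value
same_value <- (pos == neg)                            # TRUE

# Division by zero is one of the few places where the sign shows
recip_pos <- 1 / pos                                  # Inf
recip_neg <- 1 / neg                                  # -Inf

# identical() only separates them when num.eq = FALSE
ident_default <- identical(pos, neg)                  # TRUE
ident_bitwise <- identical(pos, neg, num.eq = FALSE)  # FALSE

# head() and tail() treat n = -0 as plain zero, as reported in the thread
h0 <- head(1:5, n = neg)                              # integer(0)
t0 <- tail(1:5, n = neg)                              # integer(0)
```

Since `n < 0` is FALSE for both zeros, any code that branches on the sign of n through comparison cannot tell them apart.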
[Rd] RFC: tapply(*, ..., init.value = NA)
Last week, we talked here about "xtabs(), factors and NAs",
 -> https://stat.ethz.ch/pipermail/r-devel/2017-January/073621.html

In the meantime, I've spent several hours on the issue
and also committed changes to R-devel "in two iterations".

In the case where there is a *Left* hand side part in the xtabs() formula,
see the help page example using 'esoph',
it uses tapply(..., FUN = sum) and
I now think tapply() is missing a feature there, which
I am proposing to add.

Look at a small example:

> D2 <- data.frame(n = gl(3,4), L = gl(6,2, labels=LETTERS[1:6]), N=3)[-c(1,5), ]; xtabs(~., D2)
, , N = 3

   L
n   A B C D E F
  1 1 2 0 0 0 0
  2 0 0 1 2 0 0
  3 0 0 0 0 2 2

> DN <- D2; DN[1,"N"] <- NA; DN
   n L  N
2  1 A NA
3  1 B  3
4  1 B  3
6  2 C  3
7  2 D  3
8  2 D  3
9  3 E  3
10 3 E  3
11 3 F  3
12 3 F  3
> with(DN, tapply(N, list(n,L), FUN=sum))
   A  B  C  D  E  F
1 NA  6 NA NA NA NA
2 NA NA  3  6 NA NA
3 NA NA NA NA  6  6
>

and as you can see, the resulting matrix has NAs, all the same
NA_real_, but semantically of two different kinds:

1) at ["1", "A"], the NA comes from the NA in 'N'
2) all other NAs come from the fact that there is no such factor combination
   *and* from the fact that tapply() uses

   array(dim = .., dimnames = ...)

i.e., initializes the array with NAs (see definition of 'array').

My proposal is the following patch to tapply(), adding a new
option 'init.value':

-----------------------------------------------------------------------------

-tapply <- function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
+tapply <- function (X, INDEX, FUN = NULL, ..., init.value = NA, simplify = TRUE)
 {
     FUN <- if (!is.null(FUN)) match.fun(FUN)
     if (!is.list(INDEX)) INDEX <- list(INDEX)
@@ -44,7 +44,7 @@
     index <- as.logical(lengths(ans)) # equivalently, lengths(ans) > 0L
     ans <- lapply(X = ans[index], FUN = FUN, ...)
     if (simplify && all(lengths(ans) == 1L)) {
-        ansmat <- array(dim = extent, dimnames = namelist)
+        ansmat <- array(init.value, dim = extent, dimnames = namelist)
         ans <- unlist(ans, recursive = FALSE)
     } else {
         ansmat <- array(vector("list", prod(extent)),

-----------------------------------------------------------------------------

With that, I can set the initial value to '0' instead of array's
default of NA :

> with(DN, tapply(N, list(n,L), FUN=sum, init.value=0))
   A B C D E F
1 NA 6 0 0 0 0
2  0 0 3 6 0 0
3  0 0 0 0 6 6
>

which now has 0 counts and NA, as is desirable for use inside
xtabs().

All fine... and would not be worth a posting to R-devel,
except for this:

The change will not be 100% backward compatible -- by necessity: any new
argument for tapply() will make that argument name unavailable to be
specified (via '...') for 'FUN'. The new function would be

> str(tapply)
function (X, INDEX, FUN = NULL, ..., init.value = NA, simplify = TRUE)

where the '...' are passed to FUN(), and with the new signature,
'init.value' then won't be passed to FUN "anymore" (compared to
R <= 3.3.x).

For that reason, we could use 'INIT.VALUE' instead (possibly decreasing
the probability that the arg name is used in other functions).

Opinions?

Thank you in advance,
Martin
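The effect of the proposed option can be approximated without touching tapply() by filling the empty cells after the fact. The following is a rough sketch, not the patch itself: fill_empty() is a hypothetical helper, and it assumes that table() on the same grouping factors yields an array with the same layout as the tapply() result.

```r
# Data from the example above
D2 <- data.frame(n = gl(3, 4), L = gl(6, 2, labels = LETTERS[1:6]), N = 3)[-c(1, 5), ]
DN <- D2
DN[1, "N"] <- NA

# Hypothetical helper: run tapply(), then overwrite only those cells
# whose factor combination never occurs in the data
fill_empty <- function(X, INDEX, FUN, ..., init.value = 0) {
  res <- tapply(X, INDEX, FUN, ...)
  counts <- table(INDEX)            # observations per factor combination
  res[counts == 0L] <- init.value   # cells with data keep FUN's result
  res
}

res <- with(DN, fill_empty(N, list(n, L), sum))
```

The NA at ["1", "A"] survives because that cell has one observation, so only the second kind of NA is replaced.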
[Rd] strptime bug
Dear developer list,

I want to report the following problem, which looks like a bug related to
date-time parsing and has been confirmed by another user [1].

Here is a simple script:

# this works:
as.numeric(as.POSIXlt(strptime('2016-03-27 01:05:50', format='%Y-%m-%d %H:%M:%S')))
# this does not (it returns NA):
as.numeric(as.POSIXlt(strptime('2016-03-27 02:05:50', format='%Y-%m-%d %H:%M:%S')))
# this works again:
as.numeric(as.POSIXlt(strptime('2016-03-27 03:05:50', format='%Y-%m-%d %H:%M:%S')))

I ran several tests, and the problem seems to be tied to the combination of
"2016-03-27" as the date and "2" as the hour. It does not seem to be related
to the date-time format.
There is a similar bug on Bugzilla [2], but I cannot replicate it in my case.
My OS is Win 7 and my R version is 3.3.2.

Thank you
rob

[1] https://stat.ethz.ch/pipermail/r-help/2017-January/68.html
[2] https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16764
Re: [Rd] : strptime bug
Hi,

You don't give the time zone, but this is probably due to the clock jumping
by one hour when switching to summer time. In the UK this happens at 1am, and
on that day there is no such thing as 01:05, etc.; see e.g.
https://www.timeanddate.com/time/change/uk/london

In your time zone this probably happens at 2am.

Georgi Boshnakov

--

Message: 7
Date: Thu, 26 Jan 2017 11:02:06 +0100
From: rob vech
To: r-devel@r-project.org
Subject: [Rd] strptime bug

[...]

End of R-devel Digest, Vol 167, Issue 23
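The gap described above can be made visible by parsing the three clock times from the report in a zone that observes summer time. This is only a sketch: "Europe/Rome" is an assumed zone (the thread never states the poster's zone), and it presumes the system's time zone database knows that name.

```r
stamps <- c("2016-03-27 01:05:50", "2016-03-27 02:05:50", "2016-03-27 03:05:50")
fmt <- "%Y-%m-%d %H:%M:%S"

# Assumed zone: clocks there jumped from 02:00 to 03:00 on 2016-03-27,
# so 02:05:50 does not exist as a local time. What R returns for that
# element is platform dependent (the original report saw NA).
local_secs <- as.numeric(as.POSIXct(strptime(stamps, format = fmt, tz = "Europe/Rome")))

# Elapsed seconds between the two valid stamps; when the zone is known to
# the system this is one hour, although the wall clock shows two
elapsed <- local_secs[3] - local_secs[1]
```

If the zone name is not recognised, R falls back to UTC with a warning and the difference is two hours instead.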
Re: [Rd] RFC: tapply(*, ..., init.value = NA)
It would be cool if the default for tapply's init.value could be
FUN(X[0]), so it would be 0 for FUN=sum or FUN=length, TRUE for
FUN=all, -Inf for FUN=max, etc. But that would take time and would
break code for which FUN did not work on length-0 objects.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Jan 26, 2017 at 2:42 AM, Martin Maechler wrote:
> Last week, we've talked here about "xtabs(), factors and NAs",
> [...]
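The values that FUN(X[0]) would produce for the functions named in this reply are easy to confirm, as is the caveat about functions that do not accept empty input. A small sketch in base R; first_elem() is a made-up example of such a function, not something from the thread.

```r
x0 <- numeric(0)   # stands in for X[0]

empty_sum    <- sum(x0)                     # 0
empty_length <- length(x0)                  # 0L
empty_all    <- all(logical(0))             # TRUE
empty_max    <- suppressWarnings(max(x0))   # -Inf, normally with a warning

# A FUN that cannot handle empty input: evaluating FUN(X[0]) for the
# default would raise an error here
first_elem <- function(x) x[[1]]
fails_on_empty <- inherits(try(first_elem(x0), silent = TRUE), "try-error")
```

Note that max() on empty input already warns, so even the well-behaved cases are not entirely silent.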
Re: [Rd] Undefined behavior of head() and tail() with n = 0
In addition, signed zeroes only exist for floating point numbers - the
bit patterns for as.integer(0) and as.integer(-0) are identical.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Jan 26, 2017 at 1:53 AM, Martin Maechler wrote:
> [...]
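The point about integers can be checked the same way as the floating-point case. A brief sketch with illustrative variable names:

```r
int_pos <- as.integer(0)
int_neg <- as.integer(-0)

# Integers have a single zero, so the two results are the same object value
same_int <- identical(int_pos, int_neg)   # TRUE

# The sign is lost in the conversion: the reciprocal is +Inf, not -Inf
recip_int_neg <- 1 / int_neg              # Inf

# For comparison, the double -0 keeps its sign
recip_dbl_neg <- 1 / -0                   # -Inf
```

So any function that converts n to integer before using it could not honour a negative zero even if it wanted to.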
Re: [Rd] RFC: tapply(*, ..., init.value = NA)
On a related note, the storage mode should try to match that of ans[[1]]
(or of the unlisted ans) when allocating 'ansmat', to avoid coercion and
hence a full copy.

Henrik

On Jan 26, 2017 07:50, "William Dunlap via R-devel" wrote:
> It would be cool if the default for tapply's init.value could be
> FUN(X[0]), [...]
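The coercion mentioned in this reply can be observed with typeof(). The sketch below uses the dimensions and cell positions of the 3 x 6 example from the RFC; the variable names and the way the fill value is typed are illustrative assumptions, not code from tapply().

```r
extent <- c(3L, 6L)
vals <- c(6, 3, 6, 6, 6)          # double results, as FUN = sum gives in the example

# array()'s default data is a logical NA, so the array starts out logical
ansmat <- array(dim = extent)
type_before <- typeof(ansmat)     # "logical"

# Filling in double results converts the whole array
ansmat[c(4, 8, 11, 15, 18)] <- vals
type_after <- typeof(ansmat)      # "double"

# Giving the fill value the storage mode of the results up front
init <- NA
storage.mode(init) <- storage.mode(vals)
ansmat_typed <- array(init, dim = extent)
type_typed <- typeof(ansmat_typed)   # "double"
```

With the typed fill value, the later assignment no longer has to change the array's type.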
Re: [Rd] : strptime bug [no more!]
Thank you Georgi!

That definitely resolves the problem. Adding the correct time zone
(tz='GMT') returns a valid number, as my data are in solar time.

rob
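For reference, the fix amounts to passing tz explicitly when parsing. A minimal sketch with the three stamps from the original report; GMT has no summer-time transition, so every clock time exists.

```r
stamps <- c("2016-03-27 01:05:50", "2016-03-27 02:05:50", "2016-03-27 03:05:50")

# Parse in GMT rather than in the session's local time zone
gmt_secs <- as.numeric(as.POSIXct(strptime(stamps,
                                           format = "%Y-%m-%d %H:%M:%S",
                                           tz = "GMT")))

# Consecutive stamps are exactly one hour apart and none is NA
steps <- diff(gmt_secs)   # 3600 3600
```

Whether GMT is the right zone depends on how the data were recorded; here the poster states that the timestamps are in solar time.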