This is a simple fix. I extracted the part of cut.R that calculates the breaks from a number of intervals, converted the breaks to date-time, and passed them explicitly to cut again. I used lubridate's as_datetime because it is simpler; it can of course be replaced with as.POSIXct.
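As a rough sketch of that last point, the same idea can be written in base R alone. The function name cut_posixct_base, the tz argument and its "UTC" default are hypothetical choices for illustration, not part of cut.R, and the zero-range case is simply rejected here rather than handled:

```r
# Sketch only: base-R variant with no lubridate dependency.
# `cut_posixct_base` and the `tz` default are assumptions for illustration.
cut_posixct_base <- function(x, interval_count, tz = "UTC") {
  stopifnot(inherits(x, "POSIXct"), interval_count >= 2)
  rx <- range(as.numeric(x), na.rm = TRUE)
  dx <- diff(rx)
  stopifnot(dx > 0)  # assumption: x spans a non-zero range
  nb <- as.integer(interval_count) + 1L  # one more break than intervals
  # Equally spaced numeric breaks, widened slightly at both ends
  breaks <- seq(rx[1L], rx[2L], length.out = nb)
  breaks[c(1L, nb)] <- c(rx[1L] - dx / 1000, rx[2L] + dx / 1000)
  # Convert the numeric breaks back to date-time and pass them explicitly
  cut(x, breaks = as.POSIXct(breaks, origin = "1970-01-01", tz = tz))
}

# Usage with the simpler example from the original bug report
x <- seq(as.POSIXct("2000-01-01", tz = "UTC"), by = "days", length.out = 20)
res <- cut_posixct_base(x, 30)
nlevels(res)
```

Because the breaks are passed as date-time values, the labels come from the breaks themselves and not from the input data, so empty intervals still get a level.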
The breaks are always formatted in one way, but users can format them any way they want by calling divide directly. I find the return value of divide is often very useful on its own, so it is worth extracting as a separate function.

------------------------------------
library(lubridate)  # provides as_datetime

# Focused on one case: cut x into a given number of intervals.
# Divide x into interval_count intervals. Taken from
# https://github.com/wch/r-source/blob/trunk/src/library/base/R/cut.R
divide <- function(x, interval_count) {
  if (is.na(interval_count) || interval_count < 2L)
    stop("invalid number of intervals")
  nb <- as.integer(interval_count + 1) # one more than #{intervals}
  dx <- diff(rx <- range(x, na.rm = TRUE))
  if (dx == 0) {
    # all values identical: build the breaks around that single value
    dx <- abs(rx[1L])
    breaks <- seq.int(rx[1L] - dx/1000, rx[2L] + dx/1000, length.out = nb)
  } else {
    breaks <- seq.int(rx[1L], rx[2L], length.out = nb)
    # widen the outer breaks slightly so the extremes fall inside
    breaks[c(1L, nb)] <- c(rx[1L] - dx/1000, rx[2L] + dx/1000)
  }
  return(breaks)
}

# Cut date-time values using explicit date-time breaks
cut_date_time <- function(x, interval_count) {
  brks <- divide(as.numeric(x), interval_count)
  return(cut(x, as_datetime(brks)))
}

# Return the breaks themselves as date-time values
divide_date_time <- function(x, interval_count) {
  return(as_datetime(divide(as.numeric(x), interval_count)))
}
--------------------

Best,
Xianghui Dong

On Thu, Apr 6, 2017 at 3:37 PM, Xianghui Dong <xhd...@umd.edu> wrote:

> The exact error was reported before in Bug 14288
> <https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14288> - bug in
> cut.POSIXt(..., breaks = <numeric>) and cut.Date. But the fix in that
> bug report only covered the simplest case.
>
> This is the error I met
> -----------------------------
> > x <- structure(c(1057067700, 1057215720, 1060597800, 1061470800, 1061911680,
> 1062048000, 1062137880, 1064479440, 1064926380, 1064995140, 1066822800,
> 1068033720, 1070869740, 1070939820, 1071030540, 1074244560, 1077545880,
> 1078449720, 1084955460, 1129020000, 1130324280, 1130404800, 1131519420,
> 1132640100, 1133772000, 1137567960, 1138952640, 1141810380, 1147444200,
> 1161643440, 1164086160), class = c("POSIXct", "POSIXt"), tzone = "UTC")
>
> > cut(x, 20)
> Error in `levels<-.factor`(`*tmp*`, value = as.character(if
> (is.numeric(breaks)) x[!duplicated(res)] else breaks[-length(breaks)])) :
>   number of levels differs
> -----------------------------
>
> The cause of the bug is that the input has widely spread date-time values,
> so only 10 of the 20 intervals contain any values.
> -------------------
> > cut_n <- cut(as.numeric(x), 20)
> > unique(cut_n)
>  [1] (1.057e+09,1.062e+09] (1.062e+09,1.068e+09] (1.068e+09,1.073e+09] (1.073e+09,1.078e+09]
>  [5] (1.084e+09,1.089e+09] (1.127e+09,1.132e+09] (1.132e+09,1.137e+09] (1.137e+09,1.143e+09]
>  [9] (1.143e+09,1.148e+09] (1.159e+09,1.164e+09]
> 20 Levels: (1.057e+09,1.062e+09] (1.062e+09,1.068e+09] (1.068e+09,1.073e+09] ... (1.159e+09,1.164e+09]
> ------------------------
> To get 20 proper labels, one per interval, the breaks need to be converted
> from numbers to date-time strings. The current code doesn't really convert
> the breaks; it just uses the original date-time values from the input data.
> This will not work if the break values don't happen to equal values in the
> original input.
> For an even simpler example from the original bug report:
> -----------------------
> > x <- seq(as.POSIXct("2000-01-01"), by = "days", length = 20)
> > cut(x, breaks = 30)
> Error in `levels<-.factor`(`*tmp*`, value = as.character(if
> (is.numeric(breaks)) x[!duplicated(res)] else breaks[-length(breaks)])) :
>   number of levels differs
> ---------------------
>
> I think fixing the bug will require either
> - getting the actual numeric values of the breaks from "cut" (modifying
>   "cut" if needed), then converting the numeric values back to date-time
> - or using a regex to extract the break values, then converting them to
>   date-time
>
> Best,
> Xianghui Dong

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel