>>>>> Ivan Krylov >>>>> on Thu, 28 Sep 2023 00:59:57 +0300 writes:
> В Wed, 27 Sep 2023 13:49:58 -0700 Travers Ching > <trave...@gmail.com> пишет: >> Calling isoreg with an Inf value causes a segmentation >> fault, tested on R 4.3.1 and R 4.2. A reproducible >> example is: `isoreg(c(0,Inf))` > Indeed, the code in src/library/stats/src/isoreg.c > contains the following loop: > do { > slope = R_PosInf; > for (i = known + 1; i <= n; i++) { > tmp = (REAL(yc)[i] - REAL(yc)[known]) / (i - known); > // if `tmp` becomes +Inf or NaN... > // or both `tmp` and `slope` become -Inf... > if (tmp < slope) { // <-- then this is false > slope = tmp; > ip = i; // <-- so this assignment never happens > } > }/* tmp := max{i= kn+1,.., n} slope(p[kn] -> p[i]) and > * ip = argmax{...}... */ > INTEGER(iKnots)[n_ip++] = ip; // <-- heap overflow and crash // ... > } while ((known = ip) < n); // <-- this loop never terminates > I'm not quite sure how to fix this. Checking for tmp <= slope would > have been a one-character patch, but it changes the reference outputs > and doesn't handle isnan(tmp), so it's probably not correct. The > INTEGER(iKnots)[n_ip++] = ip; assignment should only be reached in case > of knots, but since the `ip` index never progresses past the > +/-infinity, the knot condition is triggered repeatedly. > Least squares methods don't handle infinities well anyway, so maybe > it's best to put the check in the R function instead: > --- src/library/stats/R/isoreg.R (revision 85226) > +++ src/library/stats/R/isoreg.R (working copy) > @@ -22,8 +22,8 @@ > { > xy <- xy.coords(x,y) > x <- xy$x > - if(anyNA(x) || anyNA(xy$y)) > - stop("missing values not allowed") > + if(!all(is.finite(x)) || !all(is.finite(xy$y))) > + stop("missing and infinite values not allowed") > isOrd <- ((!is.null(xy$xlab) && xy$xlab == "Index") > || !is.unsorted(x, strictly = TRUE)) > if(!isOrd) { > -- > Best regards, > Ivan The above would not even be sufficient: It's the sum(y) really, because internally yc <- cumsum(c(0,y)) and actually diff(yc) is used where you get to Inf - Inf ==> NaN > isoreg(c(5, 9, 1:2, 7e308, 5:8, 3, 8))) *** caught segfault *** address 0x7e48000, cause 'memory not mapped' /u/maechler/bin/R_arg: Zeile 160: 873336 Speicherzugriffsfehler (Speicherabzug geschrieben) $exe $@ Also, the C code still does not work for long vectors, so I want to change the C code anyway. In any case: Thank you, Travers, Ben, and Ivan, for reporting and addressing the issue! ------ There is an interesting point here though: For dealing with +/- Inf, we used to follow the following idea in R quite keenly (and sometimes extremely): If 'Inf' leads a computation to "fail" (NB: 1/Inf |--> 0 does *not* fail) try to see what the mathematical *or* computational limit x --> Inf would be. If that is easily defined, we use that. So, often as a first step, look at what happens if you replace Inf by 1e100 (and then also what happens if you are finite but *close* to Inf, i.e. the 7e308 above). Now here, at least in some cases, such a limit cases are clearly detectable, e.g., when you let y[2] ---> -Inf here > n <- length(y0 <- c(5, 9, 1:2, 5:8, 3, 8)) > y2s <- c(10:0, -10, -20, -1000, -1e4, -1e10, -1e100, -1e200, -1e300) > iSet <- vapply(y2s, function(y2) isoreg({y <- y0; y[2] <- y2; y})$yf, > numeric(n)) > t(iSet) # *does* change as function of y2 *but* predictably [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] 4.5000e+00 4.5000e+00 4.50 4.50 5 6 6 6 6 8 [2,] 4.2500e+00 4.2500e+00 4.25 4.25 5 6 6 6 6 8 [3,] 4.0000e+00 4.0000e+00 4.00 4.00 5 6 6 6 6 8 [4,] 3.7500e+00 3.7500e+00 3.75 3.75 5 6 6 6 6 8 [5,] 3.5000e+00 3.5000e+00 3.50 3.50 5 6 6 6 6 8 [6,] 3.2500e+00 3.2500e+00 3.25 3.25 5 6 6 6 6 8 [7,] 3.0000e+00 3.0000e+00 3.00 3.00 5 6 6 6 6 8 [8,] 2.7500e+00 2.7500e+00 2.75 2.75 5 6 6 6 6 8 [9,] 2.5000e+00 2.5000e+00 2.50 2.50 5 6 6 6 6 8 [10,] 2.2500e+00 2.2500e+00 2.25 2.25 5 6 6 6 6 8 [11,] 2.0000e+00 2.0000e+00 2.00 2.00 5 6 6 6 6 8 [12,] -2.5000e+00 -2.5000e+00 1.00 2.00 5 6 6 6 6 8 [13,] -7.5000e+00 -7.5000e+00 1.00 2.00 5 6 6 6 6 8 [14,] -4.9750e+02 -4.9750e+02 1.00 2.00 5 6 6 6 6 8 [15,] -4.9975e+03 -4.9975e+03 1.00 2.00 5 6 6 6 6 8 [16,] -5.0000e+09 -5.0000e+09 1.00 2.00 5 6 6 6 6 8 [17,] -5.0000e+99 -5.0000e+99 0.00 0.00 0 0 0 0 0 0 [18,] -5.0000e+199 -5.0000e+199 0.00 0.00 0 0 0 0 0 0 [19,] -5.0000e+299 -5.0000e+299 0.00 0.00 0 0 0 0 0 0 > so one could say that ideally, isoreg(c(5, -Inf, 1:2, 5:8, 3, 8)) should produce fitted values c(-Inf, -Inf, 0, 0, ..., 0) and if someone has a +/- elegant implementation we could again allow +/-Inf entries in isoreg(), at least when the Inf's have all the same sign. Martin ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel