Hi Charlotte, This may be a bit too late, but I just remembered your question. I have written some functions to extract various features of a time-series, uusing functional data analytic methods. These would be part of an R package that will be soon released. This package can analyze a large collection of time-series.
Here is how you can use that to solve your problem: source("features.txt") year <- c(1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009) piproute<-c(0.733333333,0.945945946,1.886363636,1.607843137,4.245614035,3.175675676,2.169014085,2,2.136363636,2.65625,2.080645161,2.114754098,2.090909091,3.012195122,2.935897436,2.592105263,1.075757576,1.210526316,1,1.1875,1.903614458,1.385542169,1.788990826,1.163793103,1.558558559,1.595238095,1.758333333,1.858267717,2.169117647,1.403225806,2.859375,3.236220472,2.054263566,3.854166667,1.812080537,2.708029197,2.75862069,2.625954198,4.540740741,3.686567164,2.8,2.968253968,3.517730496) ans <- features.mat(year, piproute, smoother="glk", plot.it=TRUE) ans ans$cptmat # critical points of the function (minima/maxima) # The answers depend on how you smooth the data. Here is a result showing smoothing using a pemalized spline smoother. ans <- features.mat(year, piproute, smoother="spm", plot.it=TRUE) ans ans$cptmat # critical points of the function (minima/maxima) Hope this is helpful, Ravi. ____________________________________________________________________ Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu ----- Original Message ----- From: Charlotte Chang <c.h.w.ch...@gmail.com> Date: Tuesday, April 27, 2010 2:25 am Subject: Re: [R] Identifying breakpoints/inflection points? To: Clint Bowman <cl...@ecy.wa.gov> Cc: r-help@r-project.org > Hi Clint, > > Thank you for your help with the code. The span recommendation really > improved the fit of my LOESS curve. I appreciate your thoughtful > assistance! > > My remaining question is how could I go about identifying the > inflection points for the LOESS curve? I was thinking about trying to > find the 2nd derivative and then using the uniroot function. > > My code is here (but it's buggy and doesn't work): > > birds.lo<-loess.smooth(x,y,span=0.45) > d2 <- function(x) { > predict(birds.lo, x, deriv=2)$y > } > x<-year > y<-piproute > > > d2(x) > Error in predict(birds.lo, x, deriv = 2)$y : > $ operator is invalid for atomic vectors > > #Desired next step: > uniroot(d2,c(7,10)) > > Any ideas about this would be profoundly appreciated! I'm hitting a > dead end. > > Yours, > > Charlotte > > On Mon, Apr 26, 2010 at 3:32 PM, Clint Bowman <cl...@ecy.wa.gov> wrote: > > Charlotte, > > > > Try: > > > > birds.lo <- loess(piproute~year,span=.25) > > # play with span to see your desired pattern > > birds.pr<-predict(birds.lo, data.frame(year = seq(1967, 2009, 1)), > se = > > FALSE) > > # > > plot($year,birds.pr$fit,ylim=c(0,5)) > > par(new=T) > > plot(year,birds.pr$fit,pch="+",col=2,ylim=c(0,5)) > > > > > > -- > > Clint Bowman INTERNET: cl...@ecy.wa.gov > > Air Quality Modeler INTERNET: cl...@math.utah.edu > > Department of Ecology VOICE: (360) 407-6815 > > PO Box 47600 FAX: (360) 407-7534 > > Olympia, WA 98504-7600 > > > > On Mon, 26 Apr 2010, Charlotte Chang wrote: > > > >> Hello! > >> I have a dataset with the following two vectors: > >> > >> > >> > year<-c(1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009) > >> > >> > >> > piproute<-c(0.733333333,0.945945946,1.886363636,1.607843137,4.245614035,3.175675676,2.169014085,2,2.136363636,2.65625,2.080645161,2.114754098,2.090909091,3.012195122,2.935897436,2.592105263,1.075757576,1.210526316,1,1.1875,1.903614458,1.385542169,1.788990826,1.163793103,1.558558559,1.595238095,1.758333333,1.858267717,2.169117647,1.403225806,2.859375,3.236220472,2.054263566,3.854166667,1.812080537,2.708029197,2.75862069,2.625954198,4.540740741,3.686567164,2.8,2.968253968,3.517730496) > >> > >> Pipits is the response variable (it is the number of birds counted > at > >> each survey site in each year) and year is the independent variable. > >> If you plot it in R (plot(year,piproute,pch=19)), you'll see that > the > >> relationship looks like a quintic polynomial. > >> > >> Initially I was trying to fit this curve using an iterative equation, > >> but it's not working. I suspect that the curve-fitting equation itself > >> is inappropriate (it's a modified version of the logistic growth > >> equation). Now what I'd like to do is identify the 3 break/inflection > >> points in the population trend. That way, I can make an argument that > >> the break points corresponded to shifts in government policy with > >> respect to land use management. I've been looking at the segmented > >> package, and initially I looked at change.pt test in the circ.stats > >> package (which is inappropriate b/c my data is not amenable to > >> circular statistical analysis). Any ideas on what I could do would > be > >> appreciated! > >> > >> Thank you! > >> > >> -Charlotte > >> > >> ______________________________________________ > >> R-help@r-project.org mailing list > >> > >> PLEASE do read the posting guide > >> > >> and provide commented, minimal, self-contained, reproducible code. > >> > > > > ______________________________________________ > R-help@r-project.org mailing list > > PLEASE do read the posting guide > and provide commented, minimal, self-contained, reproducible code.
### Functions library(SemiPar) library(lokern) ############################################################## myplot.spm <- function (spm.obj, x.out=NULL, ...) { object <- spm.obj if (is.null(x.out)) x.out <- c(object$info$pen$x) plot.params <- plotControl(...) drv <- plot.params$drv se <- plot.params$se shade <- plot.params$shade default.ylim <- FALSE if (is.null(plot.params$ylim)) default.ylim <- TRUE pen.present <- !is.null(object$info$pen) random.present <- !is.null(object$info$random) const.only <- FALSE block.inds <- object$aux$block.inds coefs <- c(object$fit$coef$fixed, object$fit$coef$random) if (se == TRUE) cov.mat <- object$aux$cov.mat if (random.present) { random.inds <- block.inds[[length(block.inds)]] coefs <- coefs[-random.inds] if (se == TRUE) cov.mat <- cov.mat[-random.inds, -random.inds] } basis <- object$info$pen$basis if (drv == 0) ave.vals <- 1 if (drv > 0) ave.vals <- 0 num.pen <- ncol(as.matrix(object$info$pen$x)) for (ipen in 1:num.pen) { deg.val <- object$info$pen$degree[ipen] if (basis == "trunc.poly") ncol.X.val <- deg.val if (basis == "tps") ncol.X.val <- (deg.val - 1)/2 ave.val <- mean(as.matrix(object$info$pen$x)[, ipen]) ave.vals <- c(ave.vals, ave.val) if (ncol.X.val > 1) for (ipow in 2:ncol.X.val) ave.vals <- c(ave.vals, ave.val^ipow) } if (pen.present) { num.pen <- ncol(as.matrix(object$info$pen$x)) for (ipen in 1:num.pen) { deg.val <- object$info$pen$degree[ipen] knots <- object$info$pen$knots[[ipen]] ave.val <- mean(as.matrix(object$info$pen$x)[, ipen]) if (basis == "trunc.poly") { ave.val.knots <- ave.val - knots ave.val.knots <- (ave.val.knots * (ave.val.knots > 0))^deg.val } if (basis == "tps") { ave.val.knots <- abs(ave.val - knots)^deg.val sqrt.Omega <- object$info$trans.mat[[ipen]] ave.val.knots <- t(solve(sqrt.Omega, ave.val.knots)) } ave.vals <- c(ave.vals, ave.val.knots) } } curve.type <- NULL if (pen.present) curve.type <- c(curve.type, rep("pen", ncol(as.matrix(object$info$pen$x)))) num.curves <- length(curve.type) plot.params <- set.plot.dflts(object, plot.params, num.curves) if ((!is.null(plot.params$xlim)) & (!is.list(plot.params$xlim))) { if (length(plot.params$xlim) != 2) stop("illegal xlim") plot.params$xlim <- list(lower = rep(plot.params$xlim[1], num.curves), upper = rep(plot.params$xlim[2], num.curves)) } if ((!is.null(plot.params$ylim)) & (!default.ylim) & (!is.list(plot.params$ylim))) { if (length(plot.params$ylim) != 2) stop("illegal ylim") plot.params$ylim <- list(lower = rep(plot.params$ylim[1], num.curves), upper = rep(plot.params$ylim[2], num.curves)) } if (pen.present) { x.vals <- NULL x.vals <- cbind(x.vals, object$info$pen$x) fc.stt.pos <- 2 pen.num <- 1 for (j in 1:num.curves) { # grid.size <- plot.params$grid.size[j] grid.size <- length(x.out) if (drv == 0) C.grid <- matrix(rep(ave.vals, grid.size), grid.size, length(ave.vals), byrow = TRUE) if (drv > 0) C.grid <- matrix(0, grid.size, length(ave.vals)) # x.grid <- seq(plot.params$xlim$lower[j], plot.params$xlim$upper[j], # length = plot.params$grid.size[j]) x.grid <- x.out if (curve.type[j] == "pen") deg.val <- object$info$pen$degree[pen.num] X.g.inst <- NULL if (curve.type[j] == "pen") { if (basis == "trunc.poly") ncol.X.val <- deg.val if (basis == "tps") ncol.X.val <- (deg.val - 1)/2 if (drv == 0) { for (pow in 1:ncol.X.val) { new.col.data <- x.vals[, j]^pow new.col <- x.grid^pow X.g.inst <- cbind(X.g.inst, new.col) } } if (drv > 0) { for (pow in 1:ncol.X.val) { new.col.data <- x.vals[, j]^pow pow.drv <- pow - drv if (pow.drv >= 0) new.col <- prod(pow:(pow.drv + 1)) * (x.grid^pow.drv) else new.col <- rep(0, length(x.grid)) X.g.inst <- cbind(X.g.inst, new.col) } } } C.g.inst <- X.g.inst if (curve.type[j] == "pen") { if (basis == "trunc.poly") ncol.X.val <- deg.val if (basis == "tps") ncol.X.val <- (deg.val - 1)/2 inst.col.inds <- fc.stt.pos:(fc.stt.pos + ncol.X.val - 1) fc.stt.pos <- fc.stt.pos + ncol.X.val knots <- object$info$pen$knots[[pen.num]] num.knots <- length(knots) Z.g.inst <- outer(x.grid, knots, "-") if (basis == "trunc.poly") { if (drv == 0) Z.g.inst <- (Z.g.inst * (Z.g.inst > 0))^deg.val else { if (deg.val >= drv) { mfac <- prod((deg.val - drv + 1):deg.val) Z.g.inst <- mfac * (Z.g.inst * (Z.g.inst > 0))^(deg.val - drv) } else Z.g.inst <- matrix(0, length(x.grid), length(knots)) } } if (basis == "tps") { if (drv == 0) { Z.g.inst <- abs(Z.g.inst)^deg.val sqrt.Omega <- object$info$trans.mat[[pen.num]] Z.g.inst <- t(solve(sqrt.Omega, t(Z.g.inst))) } else { if (deg.val >= drv) { mfac <- prod((deg.val - drv + 1):deg.val) Z.g.inst <- mfac * (Z.g.inst^(deg.val - drv - 1) * abs(Z.g.inst)) sqrt.Omega <- object$info$trans.mat[[pen.num]] Z.g.inst <- t(solve(sqrt.Omega, t(Z.g.inst))) } else Z.g.inst <- matrix(0, length(x.grid), length(knots)) } } C.g.inst <- cbind(C.g.inst, Z.g.inst) inst.col.inds <- c(inst.col.inds, block.inds[[j + 1]][-(1:ncol.X.val)]) pen.num <- pen.num + 1 } C.grid[, inst.col.inds] <- C.g.inst y.grid <- as.vector(C.grid %*% coefs) if (default.ylim) { plot.params$ylim$lower[j] <- min(y.grid) plot.params$ylim$upper[j] <- max(y.grid) } if (se == TRUE) { se.grid <- sqrt(diag(C.grid %*% cov.mat %*% t(C.grid))) lower <- y.grid - 2 * se.grid upper <- y.grid + 2 * se.grid if (default.ylim) { plot.params$ylim$lower[j] <- min(lower) plot.params$ylim$upper[j] <- max(upper) } ylim.val <- range(c(lower, upper)) } if (plot.params$plot.it[j] == TRUE) { plot(x.grid, y.grid, type = "n", bty = plot.params$bty[j], main = plot.params$main[j], xlab = plot.params$xlab[j], ylab = plot.params$ylab[j], xlim = c(plot.params$xlim$lower[j], plot.params$xlim$upper[j]), ylim = c(plot.params$ylim$lower[j], plot.params$ylim$upper[j])) if (se == TRUE) { if (shade == FALSE) { lines(x.grid, lower, lty = plot.params$se.lty[j], lwd = plot.params$se.lwd[j], col = plot.params$se.col[j]) lines(x.grid, upper, lty = plot.params$se.lty[j], lwd = plot.params$se.lwd[j], col = plot.params$se.col[j]) } if (shade == TRUE) { mat <- cbind(x.grid, lower, upper) mat <- mat[sort.list(mat[, 1]), ] x.grid <- mat[, 1] lower <- mat[, 2] upper <- mat[, 3] polygon(c(x.grid, rev(x.grid)), c(lower, rev(upper)), col = plot.params$shade.col[j], border = FALSE) } } if ((drv >= 1) & (plot.params$zero.line == TRUE)) abline(h = 0, err = -1) lines(x.grid, y.grid, lty = plot.params$lty[j], lwd = plot.params$lwd[j], col = plot.params$col[j]) if (plot.params$jitter.rug == TRUE) rug.x <- jitter(x.vals[, j]) if (plot.params$jitter.rug == FALSE) rug.x <- x.vals[, j] rug(rug.x, quiet = 1, col = plot.params$rug.col[j]) } } } if (se) return(list(x=x.grid,y=y.grid,se=se.grid)) if(!se) return(list(x=x.grid,y=y.grid)) invisible() } ########################################################### myspm <- function (form, random = NULL, group = NULL, family = "gaussian", spar.method = "REML", omit.missing = NULL) { require("nlme") spm.info <- spmFormRead(form, omit.missing) random.info <- NULL if (!is.null(random)) { random.info <- random.read(random, group) spm.info <- c(spm.info, list(random = random.info)) group <- as.numeric(factor(group)) } if (!is.null(unlist(spm.info$pen$spar))) { if (any(unlist(spm.info$pen$spar) == 0)) stop("zero smoothing parameters not supported in current version.") } design.info <- spmDesign(spm.info) X <- design.info$X Z <- design.info$Z Z.spline <- design.info$Z.spline y <- spm.info$y trans.mat <- design.info$trans.mat spm.info <- c(spm.info, list(trans.mat = trans.mat)) block.inds <- design.info$block.inds re.block.inds <- design.info$re.block.inds col.ones <- rep(1, nrow(X)) auto.spar.select <- FALSE if (!is.null(spm.info$pen)) { auto.spar <- 0 for (j in 1:length(spm.info$pen$name)) auto.spar <- (auto.spar + (is.null(spm.info$pen$spar[[j]]) & (spm.info$pen$adf[[j]] == "miss"))) if (auto.spar > 0) auto.spar.select <- TRUE } if (auto.spar.select == FALSE) { if (!is.null(spm.info$lin)) num.lin <- ncol(as.matrix(spm.info$lin$x)) if (is.null(spm.info$lin)) num.lin <- 0 compon.num <- 1 if (!is.null(spm.info$pen)) { basis.type <- spm.info$pen$basis for (j in 1:ncol(as.matrix(spm.info$pen$x))) { if (!is.null(spm.info$pen$adf[[j]]) & (spm.info$pen$adf[[j]] != "miss")) { deg.val <- spm.info$pen$degree[j] stt.ind <- block.inds[[compon.num + num.lin + 1]][1] ncol.Xj <- length(block.inds[[compon.num + num.lin + 1]]) ncol.Xj <- ncol.Xj - length(re.block.inds[[compon.num]]) Xj <- X[, stt.ind:(stt.ind + ncol.Xj - 1)] Xj <- cbind(col.ones, Xj) Zj <- Z[, re.block.inds[[compon.num]]] adf.val <- spm.info$pen$adf[[j]] spar.val <- df.to.spar(adf.val + 1, Xj, Zj) if (basis.type == "trunc.poly") spm.info$pen$spar[[j]] <- exp(log(spar.val)/(2 * deg.val)) else spm.info$pen$spar[[j]] <- exp(log(spar.val)/deg.val) compon.num <- compon.num + 1 } } } diag.G <- NULL if (!is.null(spm.info$pen)) for (j in 1:ncol(as.matrix(spm.info$pen$x))) { deg.val <- spm.info$pen$degree[j] spar.val <- spm.info$pen$spar[[j]] num <- length(spm.info$pen$knots[[j]]) if (basis.type == "trunc.poly") diag.G <- c(diag.G, rep(1/(exp((2 * deg.val) * log(spar.val))), num)) else diag.G <- c(diag.G, rep(1/(exp((deg.val) * log(spar.val))), num)) } } dummy.group.vec <- col.ones fdummy.group.vec <- factor(dummy.group.vec) assign("dummy.group.vec.Handan", dummy.group.vec, pos = 1) assign("fdummy.group.vec.Handan", fdummy.group.vec, pos = 1) assign("group.Handan", group, pos = 1) assign("X.Declan", X, pos = 1) if (!is.null(Z)) { if (is.null(random)) { assign("Z.Jaida", Z, pos = 1) data.fr <- groupedData(y ~ -1 + X.Declan | dummy.group.vec.Handan, data = data.frame(y, X.Declan, Z.Jaida, dummy.group.vec.Handan)) Z.block <- list() for (i in 1:length(re.block.inds)) Z.block[[i]] <- as.formula(paste("~Z.Jaida[,c(", paste(re.block.inds[[i]], collapse = ","), ")]-1")) if (length(re.block.inds) == 1) { lme.fit <- lme(y ~ -1 + X.Declan, random = pdIdent(~-1 + Z.Jaida), data = data.fr, method = spar.method) } if (length(re.block.inds) > 1) { lme.fit <- lme(y ~ -1 + X.Declan, random = list(dummy.group.vec.Handan = pdBlocked(Z.block, pdClass = rep("pdIdent", length(Z.block)))), data = data.fr, method = spar.method) } } if (!is.null(random)) { assign("Z.Jaida", Z, pos = 1) assign("Z.spline.Jaida", Z.spline, pos = 1) data.fr <- groupedData(y ~ -1 + X.Declan | dummy.group.vec.Handan, data = data.frame(y, X.Declan, Z.Jaida, dummy.group.vec.Handan)) Z.block <- list() for (i in 1:length(re.block.inds)) Z.block[[i]] <- as.formula(paste("~Z.Jaida[,c(", paste(re.block.inds[[i]], collapse = ","), ")]-1")) if (length(re.block.inds) == 1) { stop("implement later; use pigs to test") } if (length(re.block.inds) > 1) { Z.block <- list(list(fdummy.group.vec.Handan = pdIdent(~Z.spline.Jaida - 1)), list(group.Handan = pdIdent(~1))) Z.block <- unlist(Z.block, recursive = FALSE) data.fr <- groupedData(y ~ -1 + X.Declan | fdummy.group.vec.Handan, data = data.frame(y, X.Declan, Z.spline.Jaida, group)) lme.fit <- lme(y ~ -1 + X.Declan, data = data.fr, random = Z.block, method = spar.method) } } lme.fit <- c(lme.fit, list(sigma = summary(lme.fit)$sigma)) } if (is.null(Z)) { data.fr <- cbind(y, X.Declan, dummy.group.vec.Handan) G <- NULL lm.fit <- lm(y ~ -1 + X.Declan) lme.fit <- list(coef = list(fixed = lm.fit$coef), sigma = summary(lm.fit)$sigma) } if (!is.null(Z)) { lme.fit$coef$random <- unlist(lme.fit$coef$random) sig.u.hat <- lme.fit$sigma * exp(unlist(lme.fit$modelStruct)) diag.sqrt.G <- NULL for (ib in 1:length(re.block.inds)) diag.sqrt.G <- c(diag.sqrt.G, rep(sig.u.hat[ib], length(re.block.inds[[ib]]))) G <- diag(diag.sqrt.G^2) } resid.var <- lme.fit$sigma^2 if (auto.spar.select == FALSE) { G <- resid.var * diag(diag.G) qr.out <- lmeFitQr(y, X, Z, G, resid.var = resid.var) coef.ests <- qr.out$coefficients[1:(ncol(X) + ncol(Z))] lme.fit <- list() lme.fit$coef$fixed <- coef.ests[1:ncol(X)] lme.fit$coef$random <- coef.ests[(1 + ncol(X)):length(coef.ests)] } if (auto.spar.select == TRUE) { if ((!is.null(spm.info$pen)) | (!is.null(spm.info$krige))) { sigu2.hat <- rep(0, length(re.block.inds)) if (!is.null(Z)) for (ib in 1:length(re.block.inds)) sigu2.hat[ib] <- diag(G)[re.block.inds[[ib]][1]] if (is.null(spm.info$krige)) { basis.type <- spm.info$pen$basis for (ip in 1:ncol(as.matrix(spm.info$pen$x))) { deg.val <- spm.info$pen$degree[ip] if (basis.type == "trunc.poly") spm.info$pen$spar[[ip]] <- exp(log(resid.var/sigu2.hat[ip])/(2 * deg.val)) else spm.info$pen$spar[[ip]] <- exp(log(resid.var/sigu2.hat[ip])/deg.val) } } if ((!is.null(spm.info$pen)) & (!is.null(spm.info$krige))) { basis.type <- spm.info$pen$basis for (ip in 1:ncol(as.matrix(spm.info$pen$x))) { deg.val <- spm.info$pen$degree[ip] var.rats <- (resid.var/sigu2.hat[ip]) if (basis.type == "trunc.poly") spm.info$pen$spar[[ip]] <- var.rats^(1/(2 * deg.val)) else spm.info$pen$spar[[ip]] <- var.rats^(1/(deg.val)) } num.pen <- ncol(as.matrix(spm.info$pen$x)) var.rat <- (resid.var/sigu2.hat[num.pen + 1]) deg.val <- spm.info$krige$degree spm.info$krige$spar <- exp(log(var.rat)/deg.val) } } } aux.info <- lmeAux(X, Z, G, resid.var, block.inds) if (!is.null(Z)) { coef.ests <- c(lme.fit$coef$fixed, lme.fit$coef$random) C.mat <- cbind(X, Z) } if (is.null(Z)) { coef.ests <- lme.fit$coef$fixed C.mat <- X } mins <- NULL maxs <- NULL for (j in 1:length(block.inds)) { fitted.j <- as.matrix(C.mat[, block.inds[[j]]]) %*% as.matrix(coef.ests[block.inds[[j]]]) mins[j] <- min(fitted.j) maxs[j] <- max(fitted.j) } aux.info <- c(aux.info, list(mins = mins, maxs = maxs)) fitted <- as.vector(C.mat %*% coef.ests) resids <- y - fitted lme.fit$fitted <- fitted lme.fit$residuals <- resids names(lme.fit$coef$fixed) <- NULL names(lme.fit$coef$random) <- NULL spm.fit.obj <- list(fit = lme.fit, info = spm.info, aux = aux.info) class(spm.fit.obj) <- "spm" return(spm.fit.obj) } ########################################################### features.mat <- function(x, ymat, smoother=c("glkerns", "spm", "smooth.spline"), se=FALSE, npts=max(100, 2*length(x)), plot.it=FALSE, c.out=3, fits.return=FALSE, decim.out=2, ...) # Feature extraction # Author: Ravi Varadhan, June 23, 2009 # # Function arguments: # x = vector of independent variable, e.g. time # ymat = matrix of time-series to be smoothed # npts = number of points to use in computing features # smoother = technique to use for automatic smoothing; only two options are allowed # plot.it = a logical variable indicating whether the smoothed results should be plotted # se = a logical variable indicating whether standard error of fitted values should be calculated # c.out = a constant denoting number of standard deviations away from smooth fit for determining whether a point is an #outlier # # Extracted features: # frms = Root-mean-squared function value over the range of data # fmean = Mean function value over the range of data # fsd = square-root of the average variance of the function around its mean # noise = standard deviation of residuals after smoothing # snr = signal-to-noise ratio = fsd/noise # fwiggle = root-mean-squared second-derivative of function; a measure of "wiggliness" in the function # deriv.range = minimum and maximum of first derivative of smoothed function scaled by fsd # critical.pts = X-locations where the first derivative is zero # curvature = second derivative of smoothed function at critical points ( > 0 for minimum, and < 0 for maximum) # scaled by fsd # outliers = points that are potential outliers { trapezoid <- function(x,y) sum(diff(x)*(y[-1] + y[-length(y)]))/2 smoother <- match.arg(smoother, c("glkerns", "spm", "smooth.spline")) ipkgs <- installed.packages() if (smoother == "spm") { if ("SemiPar" %in% ipkgs[,1]) require(SemiPar, quietly=TRUE) else stop("Install package `SemiPar'", call.=FALSE) deriv1 <- function(z) myplot.spm(fit, drv=1, plot.it=FALSE, x.out=z, se=se)$y deriv2 <- function(z) myplot.spm(fit, drv=2, plot.it=FALSE, x.out=z, se=se)$y } if (smoother == "glkerns") { if ("lokern" %in% ipkgs[,1]) require(lokern, quietly=TRUE) else stop("Install package `lokern'", call.=FALSE) deriv1 <- function(z) glkerns(x, y, deriv=1, x.out=z, hetero=TRUE)$est deriv2 <- function(z) glkerns(x, y, deriv=2, x.out=z, hetero=TRUE)$est } if (smoother == "smooth.spline") { deriv1 <- function(z) predict(fit, deriv=1, x=z)$y deriv2 <- function(z) predict(fit, deriv=2, x=z)$y } x.out <- seq(min(x), max(x), length=npts) x <<- x if (is.null(dim(ymat))) ymat <- matrix(ymat, nrow=1, ncol=length(ymat)) nr <- nrow(ymat) nc <- ncol(ymat) fmat <- matrix(NA, nr, 10) cpmat <- matrix(NA, nr, max(ceiling(nc/10), 50)) cvmat <- matrix(NA, nr, max(ceiling(nc/10), 50)) olmat <- matrix(NA, nr, max(ceiling(nc/10), 50)) if (plot.it & nr > 1) par(ask=TRUE) if (fits.return) fits <- fits1 <- fits2 <- matrix(NA, nr, nc) for (k in 1:nr) { y <- as.numeric(ymat[k, ]) if (smoother == "spm") { y <<- y fit <- try(spm(y ~ f(x, basis="trunc.poly", degree=3), omit.missing=TRUE), silent=TRUE) if (class(fit) == "try-error") fit <- try(spm(y ~ f(x, basis="trunc.poly", degree=2), omit.missing=TRUE), silent=TRUE) if (class(fit) == "try-error") stop("Fitting error in spm") fit0 <- myplot.spm(fit, drv=0, plot.it=FALSE, x.out=x.out, se=se)$y fit1 <- myplot.spm(fit, drv=1, plot.it=FALSE, x.out=x.out, se=se)$y fit2 <- myplot.spm(fit, drv=2, plot.it=FALSE, x.out=x.out, se=se)$y resid <- y - fit$fit$fitted resid.scaled <- abs(scale(resid)) if (fits.return) { fits[k, ] <- fit$fit$fitted fits1[k, ] <- myplot.spm(fit, drv=1, plot.it=FALSE, x.out=x, se=se)$y fits2[k, ] <- myplot.spm(fit, drv=2, plot.it=FALSE, x.out=x, se=se)$y } } if (smoother == "glkerns") { fit <- glkerns(x, y, deriv=0, x.out=x, hetero=FALSE) fit0 <- glkerns(x, y, deriv=0, x.out=x.out, hetero=TRUE)$est fit1 <- glkerns(x, y, deriv=1, x.out=x.out, hetero=TRUE)$est fit2 <- glkerns(x, y, deriv=2, x.out=x.out, hetero=TRUE)$est resid <- y - fit$est resid.scaled <- abs(scale(resid)) if (fits.return) { fits[k, ] <- fit$est fits1[k, ] <- glkerns(x, y, deriv=1, x.out=x, hetero=TRUE)$est fits2[k, ] <- glkerns(x, y, deriv=2, x.out=x, hetero=TRUE)$est } } if (smoother == "smooth.spline") { fit <- smooth.spline(x, y, cv=FALSE) fit0 <- predict(fit, deriv=0, x=x.out)$y fit1 <- predict(fit, deriv=1, x=x.out)$y fit2 <- predict(fit, deriv=2, x=x.out)$y resid <- y - predict(fit)$y resid.scaled <- abs(scale(resid)) if (fits.return) { fits[k, ] <- predict(fit)$y fits1[k, ] <- predict(fit, deriv=1)$y fits2[k, ] <- predict(fit, deriv=2)$y } } fmean <- trapezoid(x.out, fit0) / diff(range(x.out)) fstar <- sqrt(trapezoid(x.out, fit0^2) / diff(range(x.out))) fsd <- sqrt(fstar^2 - fmean^2) noise <- attr(resid.scaled, "scaled:scale") snr <- fsd/noise fwiggle <- sqrt(trapezoid(x.out, fit2^2) / diff(range(x.out))) d0 <- (fit1[-1] * fit1[-npts]) < 0 ncrt <- sum(d0) if (ncrt == 0) crtpts <- curv <- NULL else { crtpts <- curv <- rep(NA, ncrt) ind0 <- (1:npts)[d0] for (i in 1:ncrt) { temp <- try(uniroot(interval=c(x.out[ind0[i]], x.out[1 + ind0[i]]), f=deriv1), silent=TRUE) if (class(temp) != "try-error") { crtpts[i] <- temp$root curv[i] <- deriv2(temp$root) } } } rm(fit) outl <- resid.scaled > c.out if (sum(outl) == 0) outliers <- NULL else outliers <- x[outl] if (plot.it) { plot(x, y, type="p", ...) lines(x.out, fit0, col=2) } if (!is.null(crtpts) ) { cpmat[k, 1:length(crtpts)] <- crtpts cvmat[k, 1:length(crtpts)] <- curv } if (!is.null(outliers) ) olmat[k, 1:length(outliers)] <- outliers fmat[k, ] <- as.numeric(c(fmean, as.numeric(range(fit0)), fsd, noise, snr, as.numeric(range(fit1)), fwiggle, length(crtpts))) } fmat <- as.data.frame(fmat) names(fmat) <- c("fmean", "fmin", "fmax", "fsd", "noise", "snr", "d1min", "d1max", "fwiggle", "ncpts") ncpmat <- max(apply(cpmat, 1, function(x) sum(!is.na(x)))) ncvmat <- max(apply(cvmat, 1, function(x) sum(!is.na(x)))) nolmat <- max(apply(olmat, 1, function(x) sum(!is.na(x)))) if (ncpmat==0) cpmat <- NULL else cpmat <- round(cpmat[, 1:ncpmat, drop=FALSE], decim.out) if (ncvmat==0) cvmat <- NULL else cvmat <- round(cvmat[, 1:ncvmat, drop=FALSE], decim.out) if (nolmat==0) olmat <- NULL else olmat <- olmat[, 1:nolmat, drop=FALSE] ret.obj <- list(fmat=round(fmat, decim.out), cptmat=cpmat, curvmat=cvmat, olmat=olmat) if (fits.return) attr(ret.obj, "fits") <- list(fits0=fits, fits1=fits1, fits2=fits2) return(ret.obj) } ##################################################
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.