Re: [R] Identifying breakpoints/inflection points?

Ravi Varadhan Sat, 19 Jun 2010 11:51:22 -0700

Hi Charlotte,

This may be a bit too late, but I just remembered your question.  I have 
written some functions to extract various features of a time-series, uusing 
functional data analytic methods.  These would be part of an R package that 
will be soon released.  This package can analyze a large collection of 
time-series.


Here is how you can use that to solve your problem:

source("features.txt")

year <- 
c(1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009)

piproute<-c(0.733333333,0.945945946,1.886363636,1.607843137,4.245614035,3.175675676,2.169014085,2,2.136363636,2.65625,2.080645161,2.114754098,2.090909091,3.012195122,2.935897436,2.592105263,1.075757576,1.210526316,1,1.1875,1.903614458,1.385542169,1.788990826,1.163793103,1.558558559,1.595238095,1.758333333,1.858267717,2.169117647,1.403225806,2.859375,3.236220472,2.054263566,3.854166667,1.812080537,2.708029197,2.75862069,2.625954198,4.540740741,3.686567164,2.8,2.968253968,3.517730496)


ans <- features.mat(year, piproute, smoother="glk", plot.it=TRUE)

ans 

ans$cptmat # critical points of the function (minima/maxima)


# The answers depend on how you smooth the data.  Here is a result showing 
smoothing using a pemalized spline smoother.

ans <- features.mat(year, piproute, smoother="spm", plot.it=TRUE)

ans 

ans$cptmat # critical points of the function (minima/maxima)


Hope this is helpful,
Ravi.
____________________________________________________________________

Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu


----- Original Message -----
From: Charlotte Chang <c.h.w.ch...@gmail.com>
Date: Tuesday, April 27, 2010 2:25 am
Subject: Re: [R] Identifying breakpoints/inflection points?
To: Clint Bowman <cl...@ecy.wa.gov>
Cc: r-help@r-project.org


> Hi Clint,
>  
>  Thank you for your help with the code. The span recommendation really
>  improved the fit of my LOESS curve. I appreciate your thoughtful
>  assistance!
>  
>  My remaining question is how could I go about identifying the
>  inflection points for the LOESS curve? I was thinking about trying to
>  find the 2nd derivative and then using the uniroot function.
>  
>  My code is here (but it's buggy and doesn't work):
>  
>  birds.lo<-loess.smooth(x,y,span=0.45)
>  d2 <- function(x) {
>       predict(birds.lo, x, deriv=2)$y
>  }
>  x<-year
>  y<-piproute
>  
>  > d2(x)
>  Error in predict(birds.lo, x, deriv = 2)$y :
>    $ operator is invalid for atomic vectors
>  
>  #Desired next step:
>  uniroot(d2,c(7,10))
>  
>  Any ideas about this would be profoundly appreciated! I'm hitting a 
> dead end.
>  
>  Yours,
>  
>  Charlotte
>  
>  On Mon, Apr 26, 2010 at 3:32 PM, Clint Bowman <cl...@ecy.wa.gov> wrote:
>  > Charlotte,
>  >
>  > Try:
>  >
>  > birds.lo <- loess(piproute~year,span=.25)
>  > # play with span to see your desired pattern
>  > birds.pr<-predict(birds.lo, data.frame(year = seq(1967, 2009, 1)), 
> se =
>  > FALSE)
>  > #
>  > plot($year,birds.pr$fit,ylim=c(0,5))
>  > par(new=T)
>  > plot(year,birds.pr$fit,pch="+",col=2,ylim=c(0,5))
>  >
>  >
>  > --
>  > Clint Bowman                    INTERNET:       cl...@ecy.wa.gov
>  > Air Quality Modeler             INTERNET:       cl...@math.utah.edu
>  > Department of Ecology           VOICE:          (360) 407-6815
>  > PO Box 47600                    FAX:            (360) 407-7534
>  > Olympia, WA 98504-7600
>  >
>  > On Mon, 26 Apr 2010, Charlotte Chang wrote:
>  >
>  >> Hello!
>  >> I have a dataset with the following two vectors:
>  >>
>  >>
>  >> 
> year<-c(1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009)
>  >>
>  >>
>  >> 
> piproute<-c(0.733333333,0.945945946,1.886363636,1.607843137,4.245614035,3.175675676,2.169014085,2,2.136363636,2.65625,2.080645161,2.114754098,2.090909091,3.012195122,2.935897436,2.592105263,1.075757576,1.210526316,1,1.1875,1.903614458,1.385542169,1.788990826,1.163793103,1.558558559,1.595238095,1.758333333,1.858267717,2.169117647,1.403225806,2.859375,3.236220472,2.054263566,3.854166667,1.812080537,2.708029197,2.75862069,2.625954198,4.540740741,3.686567164,2.8,2.968253968,3.517730496)
>  >>
>  >> Pipits is the response variable (it is the number of birds counted 
> at
>  >> each survey site in each year) and year is the independent variable.
>  >> If you plot it in R (plot(year,piproute,pch=19)), you'll see that 
> the
>  >> relationship looks like a quintic polynomial.
>  >>
>  >> Initially I was trying to fit this curve using an iterative equation,
>  >> but it's not working. I suspect that the curve-fitting equation itself
>  >> is inappropriate (it's a modified version of the logistic growth
>  >> equation). Now what I'd like to do is identify the 3 break/inflection
>  >> points in the population trend. That way, I can make an argument that
>  >> the break points corresponded to shifts in government policy with
>  >> respect to land use management. I've been looking at the segmented
>  >> package, and initially I looked at change.pt test in the circ.stats
>  >> package (which is inappropriate b/c my data is not amenable to
>  >> circular statistical analysis). Any ideas on what I could do would 
> be
>  >> appreciated!
>  >>
>  >> Thank you!
>  >>
>  >> -Charlotte
>  >>
>  >> ______________________________________________
>  >> R-help@r-project.org mailing list
>  >> 
>  >> PLEASE do read the posting guide
>  >> 
>  >> and provide commented, minimal, self-contained, reproducible code.
>  >>
>  >
>  
>  ______________________________________________
>  R-help@r-project.org mailing list
>  
>  PLEASE do read the posting guide 
>  and provide commented, minimal, self-contained, reproducible code.

### Functions

library(SemiPar)
library(lokern)
##############################################################
myplot.spm <- function (spm.obj, x.out=NULL, ...) 
{
    object <- spm.obj
        if (is.null(x.out)) x.out <- c(object$info$pen$x)
    plot.params <- plotControl(...)
    drv <- plot.params$drv
    se <- plot.params$se
    shade <- plot.params$shade
    default.ylim <- FALSE
    if (is.null(plot.params$ylim)) 
        default.ylim <- TRUE
    pen.present <- !is.null(object$info$pen)
    random.present <- !is.null(object$info$random)
    const.only <- FALSE    
     block.inds <- object$aux$block.inds

        coefs <- c(object$fit$coef$fixed, object$fit$coef$random)
    if (se == TRUE) 
        cov.mat <- object$aux$cov.mat
    if (random.present) {
        random.inds <- block.inds[[length(block.inds)]]
        coefs <- coefs[-random.inds]
        if (se == TRUE) 
            cov.mat <- cov.mat[-random.inds, -random.inds]
    }
    basis <- object$info$pen$basis
    if (drv == 0) 
        ave.vals <- 1
    if (drv > 0) 
        ave.vals <- 0

        num.pen <- ncol(as.matrix(object$info$pen$x))
        for (ipen in 1:num.pen) {
            deg.val <- object$info$pen$degree[ipen]
            if (basis == "trunc.poly") 
                ncol.X.val <- deg.val
            if (basis == "tps") 
                ncol.X.val <- (deg.val - 1)/2
            ave.val <- mean(as.matrix(object$info$pen$x)[, ipen])
            ave.vals <- c(ave.vals, ave.val)
            if (ncol.X.val > 1) 
                for (ipow in 2:ncol.X.val) ave.vals <- c(ave.vals, 
                  ave.val^ipow)
        }

     if (pen.present) {
        num.pen <- ncol(as.matrix(object$info$pen$x))
        for (ipen in 1:num.pen) {
            deg.val <- object$info$pen$degree[ipen]
            knots <- object$info$pen$knots[[ipen]]
            ave.val <- mean(as.matrix(object$info$pen$x)[, ipen])
            if (basis == "trunc.poly") {
                ave.val.knots <- ave.val - knots
                ave.val.knots <- (ave.val.knots * (ave.val.knots > 
                  0))^deg.val
            }
            if (basis == "tps") {
                ave.val.knots <- abs(ave.val - knots)^deg.val
                sqrt.Omega <- object$info$trans.mat[[ipen]]
                ave.val.knots <- t(solve(sqrt.Omega, ave.val.knots))
            }
            ave.vals <- c(ave.vals, ave.val.knots)
        }
    }

        curve.type <- NULL
        if (pen.present) 
            curve.type <- c(curve.type, rep("pen", 
ncol(as.matrix(object$info$pen$x))))
        num.curves <- length(curve.type)

    plot.params <- set.plot.dflts(object, plot.params, num.curves)
    if ((!is.null(plot.params$xlim)) & (!is.list(plot.params$xlim))) {
        if (length(plot.params$xlim) != 2) 
            stop("illegal xlim")
        plot.params$xlim <- list(lower = rep(plot.params$xlim[1], 
            num.curves), upper = rep(plot.params$xlim[2], num.curves))
    }

    if ((!is.null(plot.params$ylim)) & (!default.ylim) & 
(!is.list(plot.params$ylim))) {
        if (length(plot.params$ylim) != 2) 
            stop("illegal ylim")
        plot.params$ylim <- list(lower = rep(plot.params$ylim[1], 
            num.curves), upper = rep(plot.params$ylim[2], num.curves))
    }

    if (pen.present) {
        x.vals <- NULL
        x.vals <- cbind(x.vals, object$info$pen$x)
        fc.stt.pos <- 2
        pen.num <- 1
        for (j in 1:num.curves) {
#            grid.size <- plot.params$grid.size[j]
            grid.size <- length(x.out)
            if (drv == 0) 
                C.grid <- matrix(rep(ave.vals, grid.size), grid.size, 
                  length(ave.vals), byrow = TRUE)
            if (drv > 0) 
                C.grid <- matrix(0, grid.size, length(ave.vals))
#            x.grid <- seq(plot.params$xlim$lower[j], 
plot.params$xlim$upper[j], 
#                length = plot.params$grid.size[j])
                x.grid <- x.out
            if (curve.type[j] == "pen") 
                deg.val <- object$info$pen$degree[pen.num]
            X.g.inst <- NULL

            if (curve.type[j] == "pen") {
                if (basis == "trunc.poly") 
                  ncol.X.val <- deg.val
                if (basis == "tps") 
                  ncol.X.val <- (deg.val - 1)/2
                if (drv == 0) {
                  for (pow in 1:ncol.X.val) {
                    new.col.data <- x.vals[, j]^pow
                    new.col <- x.grid^pow
                    X.g.inst <- cbind(X.g.inst, new.col)
                  }
                }
                if (drv > 0) {
                  for (pow in 1:ncol.X.val) {
                    new.col.data <- x.vals[, j]^pow
                    pow.drv <- pow - drv
                    if (pow.drv >= 0) 
                      new.col <- prod(pow:(pow.drv + 1)) * (x.grid^pow.drv)
                    else new.col <- rep(0, length(x.grid))
                    X.g.inst <- cbind(X.g.inst, new.col)
                  }
                }
            }
            C.g.inst <- X.g.inst

            if (curve.type[j] == "pen") {
                if (basis == "trunc.poly") 
                  ncol.X.val <- deg.val
                if (basis == "tps") 
                  ncol.X.val <- (deg.val - 1)/2
                inst.col.inds <- fc.stt.pos:(fc.stt.pos + ncol.X.val - 
                  1)
                fc.stt.pos <- fc.stt.pos + ncol.X.val
                knots <- object$info$pen$knots[[pen.num]]
                num.knots <- length(knots)
                Z.g.inst <- outer(x.grid, knots, "-")
                if (basis == "trunc.poly") {
                  if (drv == 0) 
                    Z.g.inst <- (Z.g.inst * (Z.g.inst > 0))^deg.val
                  else {
                    if (deg.val >= drv) {
                      mfac <- prod((deg.val - drv + 1):deg.val)
                      Z.g.inst <- mfac * (Z.g.inst * (Z.g.inst > 
                        0))^(deg.val - drv)
                    }
                    else Z.g.inst <- matrix(0, length(x.grid), 
                      length(knots))
                  }
                }
                if (basis == "tps") {
                  if (drv == 0) {
                    Z.g.inst <- abs(Z.g.inst)^deg.val
                    sqrt.Omega <- object$info$trans.mat[[pen.num]]
                    Z.g.inst <- t(solve(sqrt.Omega, t(Z.g.inst)))
                  }
                  else {
                    if (deg.val >= drv) {
                      mfac <- prod((deg.val - drv + 1):deg.val)
                      Z.g.inst <- mfac * (Z.g.inst^(deg.val - 
                        drv - 1) * abs(Z.g.inst))
                      sqrt.Omega <- object$info$trans.mat[[pen.num]]
                      Z.g.inst <- t(solve(sqrt.Omega, t(Z.g.inst)))
                    }
                    else Z.g.inst <- matrix(0, length(x.grid), 
                      length(knots))
                  }
                }
                C.g.inst <- cbind(C.g.inst, Z.g.inst)
                inst.col.inds <- c(inst.col.inds, block.inds[[j + 
                  1]][-(1:ncol.X.val)])
                pen.num <- pen.num + 1
            }
            C.grid[, inst.col.inds] <- C.g.inst
            y.grid <- as.vector(C.grid %*% coefs)
            if (default.ylim) {
                plot.params$ylim$lower[j] <- min(y.grid)
                plot.params$ylim$upper[j] <- max(y.grid)
            }
            if (se == TRUE) {
                se.grid <- sqrt(diag(C.grid %*% cov.mat %*% t(C.grid)))
                lower <- y.grid - 2 * se.grid
                upper <- y.grid + 2 * se.grid
                if (default.ylim) {
                  plot.params$ylim$lower[j] <- min(lower)
                  plot.params$ylim$upper[j] <- max(upper)
                }
                ylim.val <- range(c(lower, upper))
            }
            if (plot.params$plot.it[j] == TRUE) {
                plot(x.grid, y.grid, type = "n", bty = plot.params$bty[j], 
                  main = plot.params$main[j], xlab = plot.params$xlab[j], 
                  ylab = plot.params$ylab[j], xlim = 
c(plot.params$xlim$lower[j], 
                    plot.params$xlim$upper[j]), ylim = 
c(plot.params$ylim$lower[j], 
                    plot.params$ylim$upper[j]))
                if (se == TRUE) {
                  if (shade == FALSE) {
                    lines(x.grid, lower, lty = plot.params$se.lty[j], 
                      lwd = plot.params$se.lwd[j], col = plot.params$se.col[j])
                    lines(x.grid, upper, lty = plot.params$se.lty[j], 
                      lwd = plot.params$se.lwd[j], col = plot.params$se.col[j])
                  }
                  if (shade == TRUE) {
                    mat <- cbind(x.grid, lower, upper)
                    mat <- mat[sort.list(mat[, 1]), ]
                    x.grid <- mat[, 1]
                    lower <- mat[, 2]
                    upper <- mat[, 3]
                    polygon(c(x.grid, rev(x.grid)), c(lower, 
                      rev(upper)), col = plot.params$shade.col[j], 
                      border = FALSE)
                  }
                }
                if ((drv >= 1) & (plot.params$zero.line == TRUE)) 
                  abline(h = 0, err = -1)
                lines(x.grid, y.grid, lty = plot.params$lty[j], 
                  lwd = plot.params$lwd[j], col = plot.params$col[j])
                if (plot.params$jitter.rug == TRUE) 
                  rug.x <- jitter(x.vals[, j])
                if (plot.params$jitter.rug == FALSE) 
                  rug.x <- x.vals[, j]
                rug(rug.x, quiet = 1, col = plot.params$rug.col[j])
            }
        }
    }

    if (se) return(list(x=x.grid,y=y.grid,se=se.grid))
    if(!se) return(list(x=x.grid,y=y.grid))

    invisible()
}
###########################################################

myspm <- function (form, random = NULL, group = NULL, family = "gaussian",
    spar.method = "REML", omit.missing = NULL) 
{
    require("nlme")
    spm.info <- spmFormRead(form, omit.missing)
    random.info <- NULL
    if (!is.null(random)) {
        random.info <- random.read(random, group)
        spm.info <- c(spm.info, list(random = random.info))
        group <- as.numeric(factor(group))
    }
    if (!is.null(unlist(spm.info$pen$spar))) {
        if (any(unlist(spm.info$pen$spar) == 0)) 
            stop("zero smoothing parameters not supported in current version.")
    }
    design.info <- spmDesign(spm.info)
    X <- design.info$X
    Z <- design.info$Z
    Z.spline <- design.info$Z.spline
    y <- spm.info$y
    trans.mat <- design.info$trans.mat
    spm.info <- c(spm.info, list(trans.mat = trans.mat))
    block.inds <- design.info$block.inds
    re.block.inds <- design.info$re.block.inds
    col.ones <- rep(1, nrow(X))
    auto.spar.select <- FALSE
    if (!is.null(spm.info$pen)) {
        auto.spar <- 0
        for (j in 1:length(spm.info$pen$name)) auto.spar <- (auto.spar + 
            (is.null(spm.info$pen$spar[[j]]) & (spm.info$pen$adf[[j]] == 
                "miss")))
        if (auto.spar > 0) 
            auto.spar.select <- TRUE
    }

    if (auto.spar.select == FALSE) {
        if (!is.null(spm.info$lin)) 
            num.lin <- ncol(as.matrix(spm.info$lin$x))
        if (is.null(spm.info$lin)) 
            num.lin <- 0
        compon.num <- 1
        if (!is.null(spm.info$pen)) {
            basis.type <- spm.info$pen$basis
            for (j in 1:ncol(as.matrix(spm.info$pen$x))) {
                if (!is.null(spm.info$pen$adf[[j]]) & (spm.info$pen$adf[[j]] != 
                  "miss")) {
                  deg.val <- spm.info$pen$degree[j]
                  stt.ind <- block.inds[[compon.num + num.lin + 
                    1]][1]
                  ncol.Xj <- length(block.inds[[compon.num + 
                    num.lin + 1]])
                  ncol.Xj <- ncol.Xj - length(re.block.inds[[compon.num]])
                  Xj <- X[, stt.ind:(stt.ind + ncol.Xj - 1)]
                  Xj <- cbind(col.ones, Xj)
                  Zj <- Z[, re.block.inds[[compon.num]]]
                  adf.val <- spm.info$pen$adf[[j]]
                    spar.val <- df.to.spar(adf.val + 1, Xj, Zj)
                  if (basis.type == "trunc.poly")
                    spm.info$pen$spar[[j]] <- exp(log(spar.val)/(2 * 
                      deg.val))
                  else spm.info$pen$spar[[j]] <- exp(log(spar.val)/deg.val)
                  compon.num <- compon.num + 1
                }
            }
        }

        diag.G <- NULL
        if (!is.null(spm.info$pen)) 
            for (j in 1:ncol(as.matrix(spm.info$pen$x))) {
                deg.val <- spm.info$pen$degree[j]
                spar.val <- spm.info$pen$spar[[j]]
                num <- length(spm.info$pen$knots[[j]])
                if (basis.type == "trunc.poly") 
                  diag.G <- c(diag.G, rep(1/(exp((2 * deg.val) * 
                    log(spar.val))), num))
                else diag.G <- c(diag.G, rep(1/(exp((deg.val) * 
                  log(spar.val))), num))
            }
    }

    dummy.group.vec <- col.ones
    fdummy.group.vec <- factor(dummy.group.vec)
    assign("dummy.group.vec.Handan", dummy.group.vec, pos = 1)
    assign("fdummy.group.vec.Handan", fdummy.group.vec, pos = 1)
    assign("group.Handan", group, pos = 1)
    assign("X.Declan", X, pos = 1)
    if (!is.null(Z)) {
        if (is.null(random)) {
            assign("Z.Jaida", Z, pos = 1)
            data.fr <- groupedData(y ~ -1 + X.Declan | dummy.group.vec.Handan, 
                data = data.frame(y, X.Declan, Z.Jaida, dummy.group.vec.Handan))
            Z.block <- list()
            for (i in 1:length(re.block.inds)) Z.block[[i]] <- 
as.formula(paste("~Z.Jaida[,c(", 
                paste(re.block.inds[[i]], collapse = ","), ")]-1"))
            if (length(re.block.inds) == 1) {
                  lme.fit <- lme(y ~ -1 + X.Declan, random = pdIdent(~-1 +
                    Z.Jaida), data = data.fr, method = spar.method)
            }
            if (length(re.block.inds) > 1) {
                  lme.fit <- lme(y ~ -1 + X.Declan, random = 
list(dummy.group.vec.Handan = pdBlocked(Z.block,
                    pdClass = rep("pdIdent", length(Z.block)))), 
                    data = data.fr, method = spar.method)
            }
        }
        if (!is.null(random)) {
            assign("Z.Jaida", Z, pos = 1)
            assign("Z.spline.Jaida", Z.spline, pos = 1)
            data.fr <- groupedData(y ~ -1 + X.Declan | dummy.group.vec.Handan, 
                data = data.frame(y, X.Declan, Z.Jaida, dummy.group.vec.Handan))
            Z.block <- list()
            for (i in 1:length(re.block.inds)) Z.block[[i]] <- 
as.formula(paste("~Z.Jaida[,c(", 
                paste(re.block.inds[[i]], collapse = ","), ")]-1"))
            if (length(re.block.inds) == 1) {
                  stop("implement later; use pigs to test")
            }

            if (length(re.block.inds) > 1) {
                  Z.block <- list(list(fdummy.group.vec.Handan = 
pdIdent(~Z.spline.Jaida -
                    1)), list(group.Handan = pdIdent(~1)))
                  Z.block <- unlist(Z.block, recursive = FALSE)
                  data.fr <- groupedData(y ~ -1 + X.Declan | 
                    fdummy.group.vec.Handan, data = data.frame(y, 
                    X.Declan, Z.spline.Jaida, group))
                  lme.fit <- lme(y ~ -1 + X.Declan, data = data.fr, 
                    random = Z.block, method = spar.method)
            }
        }
        lme.fit <- c(lme.fit, list(sigma = summary(lme.fit)$sigma))
    }
    if (is.null(Z)) {
        data.fr <- cbind(y, X.Declan, dummy.group.vec.Handan)
        G <- NULL
            lm.fit <- lm(y ~ -1 + X.Declan)
            lme.fit <- list(coef = list(fixed = lm.fit$coef), 
                sigma = summary(lm.fit)$sigma)

    }
    if (!is.null(Z)) {
        lme.fit$coef$random <- unlist(lme.fit$coef$random)
        sig.u.hat <- lme.fit$sigma * exp(unlist(lme.fit$modelStruct))
        diag.sqrt.G <- NULL
        for (ib in 1:length(re.block.inds)) diag.sqrt.G <- c(diag.sqrt.G, 
            rep(sig.u.hat[ib], length(re.block.inds[[ib]])))
        G <- diag(diag.sqrt.G^2)
    }
    resid.var <- lme.fit$sigma^2
    if (auto.spar.select == FALSE) {
            G <- resid.var * diag(diag.G)
            qr.out <- lmeFitQr(y, X, Z, G, resid.var = resid.var)
            coef.ests <- qr.out$coefficients[1:(ncol(X) + ncol(Z))]
            lme.fit <- list()
            lme.fit$coef$fixed <- coef.ests[1:ncol(X)]
            lme.fit$coef$random <- coef.ests[(1 + ncol(X)):length(coef.ests)]
    }

    if (auto.spar.select == TRUE) {
        if ((!is.null(spm.info$pen)) | (!is.null(spm.info$krige))) {
            sigu2.hat <- rep(0, length(re.block.inds))
            if (!is.null(Z)) 
                for (ib in 1:length(re.block.inds)) sigu2.hat[ib] <- 
diag(G)[re.block.inds[[ib]][1]]
            if (is.null(spm.info$krige)) {
                basis.type <- spm.info$pen$basis
                for (ip in 1:ncol(as.matrix(spm.info$pen$x))) {
                  deg.val <- spm.info$pen$degree[ip]
                  if (basis.type == "trunc.poly") 
                    spm.info$pen$spar[[ip]] <- 
exp(log(resid.var/sigu2.hat[ip])/(2 * 
                      deg.val))
                  else spm.info$pen$spar[[ip]] <- 
exp(log(resid.var/sigu2.hat[ip])/deg.val)
                }
            }

            if ((!is.null(spm.info$pen)) & (!is.null(spm.info$krige))) {
                basis.type <- spm.info$pen$basis
                for (ip in 1:ncol(as.matrix(spm.info$pen$x))) {
                  deg.val <- spm.info$pen$degree[ip]
                  var.rats <- (resid.var/sigu2.hat[ip])
                  if (basis.type == "trunc.poly") 
                    spm.info$pen$spar[[ip]] <- var.rats^(1/(2 * 
                      deg.val))
                  else spm.info$pen$spar[[ip]] <- var.rats^(1/(deg.val))
                }
                num.pen <- ncol(as.matrix(spm.info$pen$x))
                var.rat <- (resid.var/sigu2.hat[num.pen + 1])
                deg.val <- spm.info$krige$degree
                spm.info$krige$spar <- exp(log(var.rat)/deg.val)
            }
        }
    }

        aux.info <- lmeAux(X, Z, G, resid.var, block.inds)

    if (!is.null(Z)) {
        coef.ests <- c(lme.fit$coef$fixed, lme.fit$coef$random)
        C.mat <- cbind(X, Z)
    }
    if (is.null(Z)) {
        coef.ests <- lme.fit$coef$fixed
        C.mat <- X
    }
    mins <- NULL
    maxs <- NULL
    for (j in 1:length(block.inds)) {
        fitted.j <- as.matrix(C.mat[, block.inds[[j]]]) %*% 
as.matrix(coef.ests[block.inds[[j]]])
        mins[j] <- min(fitted.j)
        maxs[j] <- max(fitted.j)
    }

    aux.info <- c(aux.info, list(mins = mins, maxs = maxs))
        fitted <- as.vector(C.mat %*% coef.ests)
        resids <- y - fitted
        lme.fit$fitted <- fitted
        lme.fit$residuals <- resids

    names(lme.fit$coef$fixed) <- NULL
    names(lme.fit$coef$random) <- NULL
    spm.fit.obj <- list(fit = lme.fit, info = spm.info, aux = aux.info)
    class(spm.fit.obj) <- "spm"
    return(spm.fit.obj)
}
###########################################################

features.mat <- function(x, ymat, smoother=c("glkerns", "spm", 
"smooth.spline"), se=FALSE, npts=max(100, 2*length(x)), 
        plot.it=FALSE, c.out=3, fits.return=FALSE, decim.out=2, ...) 
# Feature extraction
# Author:  Ravi Varadhan, June 23, 2009
#
# Function arguments:
# x = vector of independent variable, e.g. time
# ymat = matrix of time-series to be smoothed
# npts = number of points to use in computing features
# smoother = technique to use for automatic smoothing; only two options are 
allowed
# plot.it = a logical variable indicating whether the smoothed results should 
be plotted
# se = a logical variable indicating whether standard error of fitted values 
should be calculated
# c.out = a constant denoting number of standard deviations away from smooth 
fit for determining whether a point is an #outlier
#
# Extracted features:
#       frms = Root-mean-squared function value over the range of data
#       fmean = Mean function value over the range of data
#       fsd = square-root of the average variance of the function around its 
mean
#       noise = standard deviation of residuals after smoothing
#       snr = signal-to-noise ratio = fsd/noise
#     fwiggle = root-mean-squared second-derivative of function; a measure of 
"wiggliness" in the function
#       deriv.range = minimum and maximum of first derivative of smoothed 
function scaled by fsd
#       critical.pts = X-locations where the first derivative is zero
#       curvature = second derivative of smoothed function at critical points ( 
> 0 for minimum, and < 0 for maximum)
#       scaled by fsd
#       outliers = points that are potential outliers
{
trapezoid <- function(x,y) sum(diff(x)*(y[-1] + y[-length(y)]))/2

smoother <- match.arg(smoother, c("glkerns", "spm", "smooth.spline"))
ipkgs <- installed.packages()

if (smoother == "spm") {
        if ("SemiPar" %in% ipkgs[,1]) require(SemiPar, quietly=TRUE) 
        else stop("Install package `SemiPar'", call.=FALSE)
        deriv1 <- function(z) myplot.spm(fit, drv=1, plot.it=FALSE, x.out=z, 
se=se)$y
        deriv2 <- function(z) myplot.spm(fit, drv=2, plot.it=FALSE, x.out=z, 
se=se)$y
} 
if (smoother == "glkerns") {
        if ("lokern" %in% ipkgs[,1]) require(lokern, quietly=TRUE) 
        else stop("Install package `lokern'", call.=FALSE)
        deriv1 <- function(z) glkerns(x, y, deriv=1, x.out=z, hetero=TRUE)$est
        deriv2 <- function(z) glkerns(x, y, deriv=2, x.out=z, hetero=TRUE)$est
}

if (smoother == "smooth.spline") {
        deriv1 <- function(z) predict(fit, deriv=1, x=z)$y
        deriv2 <- function(z) predict(fit, deriv=2, x=z)$y
}

x.out <- seq(min(x), max(x), length=npts)
x <<- x

if (is.null(dim(ymat))) ymat <- matrix(ymat, nrow=1, ncol=length(ymat))

nr <- nrow(ymat)
nc <- ncol(ymat)
fmat <- matrix(NA, nr, 10)
cpmat <- matrix(NA, nr, max(ceiling(nc/10), 50))
cvmat <- matrix(NA, nr, max(ceiling(nc/10), 50))
olmat <- matrix(NA, nr, max(ceiling(nc/10), 50))

        if (plot.it & nr > 1) par(ask=TRUE)
        if (fits.return) fits <- fits1 <- fits2 <- matrix(NA, nr, nc)

for (k in 1:nr) {
y <- as.numeric(ymat[k, ])

if (smoother == "spm") {
y <<- y

fit <- try(spm(y ~ f(x, basis="trunc.poly", degree=3), omit.missing=TRUE), 
silent=TRUE)
if (class(fit) == "try-error") fit <- try(spm(y ~ f(x, basis="trunc.poly", 
degree=2), omit.missing=TRUE), silent=TRUE)
if (class(fit) == "try-error") stop("Fitting error in spm")

fit0 <- myplot.spm(fit, drv=0, plot.it=FALSE, x.out=x.out, se=se)$y
fit1 <- myplot.spm(fit, drv=1, plot.it=FALSE, x.out=x.out, se=se)$y
fit2 <- myplot.spm(fit, drv=2, plot.it=FALSE, x.out=x.out, se=se)$y
resid <- y - fit$fit$fitted
resid.scaled <- abs(scale(resid))
if (fits.return) {
        fits[k, ] <- fit$fit$fitted
        fits1[k, ] <- myplot.spm(fit, drv=1, plot.it=FALSE, x.out=x, se=se)$y
        fits2[k, ] <- myplot.spm(fit, drv=2, plot.it=FALSE, x.out=x, se=se)$y
        }
}

if (smoother == "glkerns") {
fit <- glkerns(x, y, deriv=0, x.out=x, hetero=FALSE)
fit0 <- glkerns(x, y, deriv=0, x.out=x.out, hetero=TRUE)$est
fit1 <- glkerns(x, y, deriv=1, x.out=x.out, hetero=TRUE)$est
fit2 <- glkerns(x, y, deriv=2, x.out=x.out, hetero=TRUE)$est
resid <- y - fit$est
resid.scaled <- abs(scale(resid))
if (fits.return) {
        fits[k, ] <- fit$est
        fits1[k, ] <- glkerns(x, y, deriv=1, x.out=x, hetero=TRUE)$est
        fits2[k, ] <- glkerns(x, y, deriv=2, x.out=x, hetero=TRUE)$est
        }
}

if (smoother == "smooth.spline") {
fit <- smooth.spline(x, y, cv=FALSE)
fit0 <- predict(fit, deriv=0, x=x.out)$y
fit1 <- predict(fit, deriv=1, x=x.out)$y
fit2 <- predict(fit, deriv=2, x=x.out)$y
resid <- y - predict(fit)$y
resid.scaled <- abs(scale(resid))
if (fits.return) {
        fits[k, ] <- predict(fit)$y
        fits1[k, ] <- predict(fit, deriv=1)$y
        fits2[k, ] <- predict(fit, deriv=2)$y
        }
}

fmean <- trapezoid(x.out, fit0) / diff(range(x.out))
fstar <- sqrt(trapezoid(x.out, fit0^2) / diff(range(x.out)))
fsd <- sqrt(fstar^2 - fmean^2)
noise <- attr(resid.scaled, "scaled:scale")
snr <- fsd/noise
fwiggle <- sqrt(trapezoid(x.out, fit2^2) / diff(range(x.out)))

d0 <- (fit1[-1] * fit1[-npts]) < 0
ncrt <- sum(d0)
if (ncrt == 0) crtpts <- curv <- NULL  else {
        crtpts <- curv <- rep(NA, ncrt)
        ind0 <- (1:npts)[d0]
        for (i in 1:ncrt) {
        temp <- try(uniroot(interval=c(x.out[ind0[i]], x.out[1 + ind0[i]]), 
f=deriv1), silent=TRUE)
        if (class(temp) != "try-error") {
                crtpts[i] <- temp$root
                curv[i] <- deriv2(temp$root)
                }
        }
    }

rm(fit)
outl <- resid.scaled > c.out
if (sum(outl) == 0) outliers <- NULL else outliers <- x[outl] 

if (plot.it) {
        plot(x, y, type="p", ...)
        lines(x.out, fit0, col=2)
        }


if (!is.null(crtpts) ) {
        cpmat[k, 1:length(crtpts)] <- crtpts
        cvmat[k, 1:length(crtpts)] <- curv
        }

if (!is.null(outliers) ) olmat[k, 1:length(outliers)] <- outliers

fmat[k, ] <- as.numeric(c(fmean, as.numeric(range(fit0)), fsd, noise, snr, 
as.numeric(range(fit1)), fwiggle, length(crtpts)))
}

fmat <- as.data.frame(fmat)
names(fmat) <- c("fmean", "fmin", "fmax", "fsd", "noise", "snr", "d1min", 
"d1max", "fwiggle", "ncpts")

ncpmat <- max(apply(cpmat, 1, function(x) sum(!is.na(x))))
ncvmat <- max(apply(cvmat, 1, function(x) sum(!is.na(x))))
nolmat <- max(apply(olmat, 1, function(x) sum(!is.na(x))))
if (ncpmat==0) cpmat <- NULL else cpmat <- round(cpmat[, 1:ncpmat, drop=FALSE], 
decim.out)
if (ncvmat==0) cvmat <- NULL else cvmat <- round(cvmat[, 1:ncvmat, drop=FALSE], 
decim.out)
if (nolmat==0) olmat <- NULL else olmat <- olmat[, 1:nolmat, drop=FALSE]

ret.obj <- list(fmat=round(fmat, decim.out), cptmat=cpmat, curvmat=cvmat, 
olmat=olmat)
if (fits.return) attr(ret.obj, "fits") <- list(fits0=fits, fits1=fits1, 
fits2=fits2)

return(ret.obj)
}
##################################################

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Identifying breakpoints/inflection points?

Reply via email to