[R] NLS sensitivity to start= values or poles in data range

Christopher Battles Fri, 11 May 2012 14:24:38 -0700

Greetings R-help!  I'm fairly new to R and am trying to expand my knowledge 
beyond using R for simple summary statistics and basic tests.  To that end I am 
attempting to write an interactive R-script that will perform a general 
rational function fit to a given dataset based on the example given at 
http://www.itl.nist.gov/div898/handbook/pmd/section6/pmd64.htm.


The problem that I seem to have is that the fitted function is nearly identical 
to the NIST results when I use their initial values (these are representative 
values that are close to actual data, but not taken from the dataset) for 
generating the starting point for the nls() routine.  However if I try to use 
actual data points extracted from the dataset for the preliminary fit, the 
fitted function can often fail to approximate the data.

One culprit that I suspect may be that a pole (real root of the denominator) of 
the initial fit is laying in the range of the data, since the NIST starting 
values produce a preliminary fit that does not have any poles over the data 
range.  Unfortunately, I'm not familiar enough with the nls() routines to pass 
judgement.

The attached code reads in the data from the web, performs the preliminary fit 
using lm(), passes that to nls() and plots the data, starting values, 
preliminary fit and final fit. The code block in the middle defining indsubset 
and depsubset contain three different options for the starting values. I thank 
you all in advance for any help or insights.

Simplified code follows...  Actual code for the interactive version is 
available at http://www.schrodingersghost.com/RationalFit-0.2.R


  # Start of Code       
  # Read and plot the data
  
  InputData <- 
read.table("http://itl.nist.gov/div898/strd/nls/data/LINKS/DATA/Hahn1.dat";,
                                  header=0,
                                  row.names=NULL, 
                                  col.names=c("CTE","K"),
                                  skip=60)
  indvar <- 2
  depvar <- 1
  
  indvarname <- names(InputData)[indvar]
  depvarname <- names(InputData)[depvar]
  
  attach(InputData)
  matplot(InputData[indvar],
          InputData[depvar], 
          type='p', pch=4, 
          xlab = indvarname, 
          ylab = depvarname, 
          main = bquote(paste(.(depvarname), " vs ", .(indvarname)))
          )

  # Specify values for determining nls start
  
  ########################################
  # Actual Data equally spaced in data set
  
  #indsubset <- c(14.13, 72.77, 163.48, 273.13, 419.51, 549.53, 850.98)
  #depsubset <- c(0.08, 7.267, 13.974, 16.337, 17.767, 18.61, 21.085)

  # Actual Data closest values to NIST starting values
  
  #indsubset <- c(14.13,31.30,40.03,50.24,120.25,202.14,845.97)
  #depsubset <- c(0.08,1.089,2.241,3.697,12.005,15.190,20.830)
  
  # NIST Starting Values
  
  indsubset <- c(10,30,40,50,120,200,800)
  depsubset <- c(0,2,3,5,12,15,20)
  #
  #########################################

  points(indsubset,depsubset, type="p", pch=19, col="red")
  
  # Generate starting values for nls per NIST process
  
  initmodel <- (depsubset ~   I(indsubset^1) 
                                        + I(indsubset^2) 
                                        + I(indsubset^3) 
                                        + I(-1 * indsubset^1 * depsubset) 
                                        + I(-1 * indsubset^2 * depsubset) 
                                        + I(-1 * indsubset^3 * depsubset))

  InputDatalm <- lm(initmodel)
  summary(InputDatalm)
  fitted.values(InputDatalm)
  nlsstart <- coef(InputDatalm)  
  names(nlsstart) <- c("A0", "A1", "A2", "A3", "B1", "B2", "B3")
  nlsstart <- as.list(nlsstart)
  attach(nlsstart)
  curve((A0*x^0+A1*x^1+A2*x^2+A3*x^3)/(1+B1*x^1+B2*x^2+B3*x^3), 
                from = min(K), 
                to = max(K), 
                add=TRUE, 
                col="blue", 
                lty=5, lwd=2 )
  
  # Check the poles of the starting function
  
  poles <- polyroot(c(1,B1,B2,B3))
  cat((poles), '\n')
  
  # Run the nls using above values as starting points
  
  fullmodel <- {CTE ~ (A0 * K^0 + A1 * K^1 + A2 * K^2 + A3 * K^3)/
                            (1 + B1 * K^1 + B2 * K^2 + B3 * K^3)}
  rationalfit <- nls(fullmodel,start=nlsstart)
  summary(rationalfit)
  rationalcoef <- as.list(coef(rationalfit))
  attach(rationalcoef)
  curve((A0*x^0+A1*x^1+A2*x^2+A3*x^3)/(1+B1*x^1+B2*x^2+B3*x^3), 
                from = min(K), 
                to = max(K), 
                add=TRUE, lwd=2)
  legend(600,5,c("Starting Values","Initial Value Fit","Full Model Fit","Data"),
                lty=c(0,5,1,0),
                pch=c(19,NA,NA,4),
                col=c("red","blue","black","black"),
                plot=TRUE)

  # Check the poles of the final function
  
  poles <- polyroot(c(1,B1,B2,B3))
  cat((poles), '\n')
  
  # Cleanup
  detach(InputData)
  detach(rationalcoef)
  # End of Code




Thank you,

Christopher Battles

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] NLS sensitivity to start= values or poles in data range

Reply via email to