Hi again, I'm looking for some clarification on 2 things. 1. On that last note, I realize that s(x1,x2) would be the other obvious interaction to compare with - and I see that you recommend te(x1,x2) if they are not on the same scale. 2. If s(x1,by=x1) gives you a "parameter" value similar to a GLM when you plot s(x1):x1, why does my function above return the same yhat as predict(mdl,type='response') ? Shouldn't each of the terms need to be multiplied by the variable value before applying rowSums()+attr(sterms,'constant') ?? Thanks again Dominik
On Wed, May 11, 2016 at 10:11 AM, Dominik Schneider < dominik.schnei...@colorado.edu> wrote: > Hi Simon, Thanks for this explanation. > To make sure I understand, another way of explaining the y axis in my > original example is that it is the contribution to snowdepth relative to > the other variables (the example only had fsca, but my actual case has a > couple others). i.e. a negative s(fsca) of -0.5 simply means snowdepth 0.5 > units below the intercept+s(x_i), where s(x_i) could also be negative in > the case where total snowdepth is less than the intercept value. > > The use of by=fsca is really useful for interpreting the marginal impact > of the different variables. With my actual data, the term s(fsca):fsca is > never negative, which is much more intuitive. Is it appropriate to compare > magnitudes of e.g. s(x2):x2 / mean(x2) and s(x2):x2 / mean(x2) where > mean(x_i) are the mean of the actual data? > > Lastly, how would these two differ: s(x1,by=x2); or > s(x1,by=x1)*s(x2,by=x2) since interactions are surely present and i'm not > sure if a linear combination is enough. > > Thanks! > Dominik > > > On Wed, May 11, 2016 at 3:11 AM, Simon Wood <simon.w...@bath.edu> wrote: > >> The spline having a positive value is not the same as a glm coefficient >> having a positive value. When you plot a smooth, say s(x), that is >> equivalent to plotting the line 'beta * x' in a GLM. It is not equivalent >> to plotting 'beta'. The smooths in a gam are (usually) subject to >> `sum-to-zero' identifiability constraints to avoid confounding via the >> intercept, so they are bound to be negative over some part of the covariate >> range. For example, if I have a model y ~ s(x) + s(z), I can't estimate the >> mean level for s(x) and the mean level for s(z) as they are completely >> confounded, and confounded with the model intercept term. >> >> I suppose that if you want to interpret the smooths as glm parameters >> varying with the covariate they relate to then you can do, by setting the >> model up as a varying coefficient model, using the `by' argument to 's'... >> >> gam(snowdepth~s(fsca,by=fsca),data=dat) >> >> >> this model is `snowdepth_i = f(fsca_i) * fsca_i + e_i' . s(fsca,by=fsca) >> is not confounded with the intercept, so no constraint is needed or >> applied, and you can now interpret the smooth like a local GLM coefficient. >> >> best, >> Simon >> >> >> >> >> On 11/05/16 01:30, Dominik Schneider wrote: >> >>> Hi, >>> Just getting into using GAM using the mgcv package. I've generated some >>> models and extracted the splines for each of the variables and started >>> visualizing them. I'm noticing that one of my variables is physically >>> unrealistic. >>> >>> In the example below, my interpretation of the following plot is that the >>> y-axis is basically the equivalent of a "parameter" value of a GLM; in >>> GAM >>> this value can change as the functional relationship changes between x >>> and >>> y. In my case, I am predicting snowdepth based on the fractional snow >>> covered area. In no case will snowdepth realistically decrease for a unit >>> increase in fsca so my question is: *Is there a way to constrain the >>> spline >>> to positive values? * >>> >>> Thanks >>> Dominik >>> >>> library(mgcv) >>> library(dplyr) >>> library(ggplot2) >>> extract_splines=function(mdl){ >>> sterms=predict(mdl,type='terms') >>> datplot=cbind(sterms,mdl$model) %>% tbl_df >>> datplot$intercept=attr(sterms,'constant') >>> datplot$yhat=rowSums(sterms)+attr(sterms,'constant') >>> return(datplot) >>> } >>> dat=data_frame(snowdepth=runif(100,min = >>> 0.001,max=6.7),fsca=runif(100,0.01,.99)) >>> mdl=gam(snowdepth~s(fsca),data=dat) >>> termdF=extract_splines(mdl) >>> ggplot(termdF)+ >>> geom_line(aes(x=fsca,y=`s(fsca)`)) >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >> >> -- >> Simon Wood, School of Mathematics, University of Bristol BS8 1TW UK >> +44 (0)117 33 18273 http://www.maths.bris.ac.uk/~sw15190 >> >> > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.