Ariyo Kanno wrote:
> Sorry, let me fix one sentence:
>
> "What I mean by "overfitting" is that GCV was significantly SMALLER
> than the mean squared error of prediction on the validation data,
> which was randomly selected and not used for the regression."
>
>> Thank you for the valuable advice.

If your test sample includes fewer than 10,000 cases and your
signal-to-noise ratio is not large, your estimate of cross-validation
accuracy may be unreliable. Often 50 repeats of 10-fold cross-validation
are required, without setting aside a single "test" sample.

Frank
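For illustration, repeated 10-fold cross-validation of a fit like the one
discussed below could be set up roughly as follows. This is only a sketch:
the data frame `dat` with numeric columns y, x1 and x2, the seed and the
number of repeats are assumptions for the example, not taken from the thread.

## Sketch: 50 repeats of 10-fold cross-validation for gam(y ~ x1 + s(x2)).
## `dat` is an assumed data frame with numeric columns y, x1 and x2.
library(mgcv)

set.seed(1)
n       <- nrow(dat)
n_rep   <- 50    # repeats of the whole cross-validation
k_folds <- 10    # folds per repeat
cv_mse  <- numeric(n_rep)

for (r in seq_len(n_rep)) {
  fold   <- sample(rep(seq_len(k_folds), length.out = n))  # random fold labels
  sq_err <- numeric(n)
  for (f in seq_len(k_folds)) {
    test <- which(fold == f)
    fit  <- gam(y ~ x1 + s(x2), data = dat[-test, ])
    sq_err[test] <- (dat$y[test] - predict(fit, newdata = dat[test, ]))^2
  }
  cv_mse[r] <- mean(sq_err)
}

mean(cv_mse)  # cross-validated MSE, averaged over the 50 repeats

With only 10 to 30 observations the individual folds are tiny, which is
exactly why the estimate only settles down after averaging over many repeats.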
>> I'm sorry, Dr. Wood, that by mistake I first sent this reply to your
>> personal e-mail address.
>>
>> I will use the "min.sp" argument when the data size is very small. I'd
>> like to know whether there are any criteria for choosing "min.sp".
>>
>> I compared gamma = 1.0 and 1.4, and I could see the smoothing effect of
>> increasing gamma by comparing the edf and the smoothing parameter. But
>> it was not enough to suppress the overfitting when the data size was
>> small.
>>
>> What I mean by "overfitting" is that GCV was significantly larger than
>> the mean squared error of prediction on the validation data, which was
>> randomly selected and not used for the regression.
>>
>> Best Wishes,
>> Ariyo
>>
>> 2007/10/3, Simon Wood <[EMAIL PROTECTED]>:
>>> On Wednesday 03 October 2007 10:49, Ariyo Kanno wrote:
>>>> I appreciate your quick reply.
>>>> I am using a model with the following structure:
>>>>
>>>> fit <- gam(y ~ x1 + s(x2))
>>>>
>>>> where y, x1 and x2 are quantitative variables, so the response
>>>> distribution is assumed to be gaussian (the default).
>>>>
>>>> Now I understand that the data size was too small.
>>>
>>> -- Well, the 10-observation end is definitely too small, but you can
>>> get quite reasonable estimates of a single smoothing parameter from
>>> 30+ gaussian data.
>>> -- You can force smoother models by either setting the smoothing
>>> parameter yourself using the `sp' argument to `gam', or by using the
>>> `min.sp' argument to set a lower bound on the smoothing parameter.
>>> -- I'm surprised that `gamma' had no effect - how high did you try?
>>>
>>> best,
>>> Simon
>>>
>>>> Thank you.
>>>>
>>>> Best Wishes,
>>>>
>>>> Ariyo
>>>>
>>>> 2007/10/3, Simon Wood <[EMAIL PROTECTED]>:
>>>>> What sort of model structure are you using? In particular, what is
>>>>> the response distribution? For poisson and binomial, overfitting can
>>>>> be a sign of overdispersion, and quasipoisson or quasibinomial may
>>>>> be better. Also, I would not expect to get useful smoothing
>>>>> parameter estimates from 10 data points!
>>>>>
>>>>> best,
>>>>> Simon
>>>>>
>>>>> On Wednesday 03 October 2007 06:55, Ariyo Kanno wrote:
>>>>>> Dear listers,
>>>>>>
>>>>>> I'm using gam (from mgcv) for semi-parametric regression on small
>>>>>> and noisy datasets (10 to 200 observations), and I am facing a
>>>>>> problem of overfitting.
>>>>>>
>>>>>> According to the book (Simon N. Wood, Generalized Additive Models:
>>>>>> An Introduction with R), it is suggested that overfitting can be
>>>>>> avoided by inflating the effective degrees of freedom in the GCV
>>>>>> evaluation with an increased "gamma" value (e.g. 1.4). But in my
>>>>>> case it did not make a significant change in the results.
>>>>>>
>>>>>> The only way I've found to suppress overfitting is to set the basis
>>>>>> dimension "k" to very low values (3 to 5). However, I don't think
>>>>>> this is reasonable, because knot selection will then become an
>>>>>> important issue.
>>>>>>
>>>>>> Is there any other means of avoiding overfitting when analyzing
>>>>>> small datasets?
>>>>>> Thank you for your help in advance,
>>>>>> Ariyo Kanno
>>>>>>
>>>>>> --
>>>>>> Ariyo Kanno
>>>>>> First-year doctoral student,
>>>>>> Institute of Environmental Studies,
>>>>>> The University of Tokyo
>>>
>>> --
>>> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
>>> +44 1225 386603  www.maths.bath.ac.uk/~sw283

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
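As a footnote to Simon's suggestions above, the `sp', `min.sp' and `gamma'
arguments (and a reduced basis dimension k) could be compared against a
held-out validation set roughly as follows. This is only a sketch: the data
frame `dat`, the 30% validation split and all numeric values are illustrative
assumptions, not taken from the thread.

## Sketch: forcing smoother fits in mgcv::gam and checking the GCV score
## against held-out prediction error (the comparison used above to diagnose
## overfitting).  `dat` (columns y, x1, x2) and all values are assumed.
library(mgcv)

set.seed(1)
val   <- sample(nrow(dat), size = round(0.3 * nrow(dat)))  # hold out ~30%
train <- dat[-val, ]

fit0 <- gam(y ~ x1 + s(x2),        data = train)                # default GCV fit
fit1 <- gam(y ~ x1 + s(x2),        data = train, gamma = 1.4)   # heavier GCV penalty
fit2 <- gam(y ~ x1 + s(x2, k = 5), data = train)                # smaller basis dimension
fit3 <- gam(y ~ x1 + s(x2),        data = train, min.sp = 0.1)  # lower bound on smoothing parameter
fit4 <- gam(y ~ x1 + s(x2),        data = train, sp = 1)        # smoothing parameter fixed by hand

val_mse <- function(fit)
  mean((dat$y[val] - predict(fit, newdata = dat[val, ]))^2)

sapply(list(default = fit0, gamma.1.4 = fit1, k.5 = fit2,
            min.sp = fit3, sp.fixed = fit4),
       function(f) c(gcv = f$gcv.ubre, val.mse = val_mse(f)))

A GCV score well below the validation MSE for the default fit, but not for
the constrained ones, would point the same way as the overfitting pattern
described at the top of this message.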