Re: [Rd] application to mentor syrfr package development for Google Summer of Code 2010

Chidambaram Annamalai Sun, 07 Mar 2010 22:40:04 -0800

> If I understand your concern, you want to lay the foundation for
> derivatives so that you can implement the search strategies described
> in Schmidt and Lipson (2010) --
> http://www.springerlink.com/content/l79v2183725413w0/ -- is that
> right?



Yes. Traditional "naive" error estimators (fitness functions) fail badly
when SR is applied to implicit equations, because the search immediately
converges on trivial "best" fits such as f(x) = x - x. No amount of
regularization or complexity penalty helps here: x - x is simple by most
measures of complexity, and it has zero error. The paper describes these
problems with "direct" error estimators and instead infers the
"triviality" of a fit by probing its estimates at nearby points and
checking whether they follow the pattern dictated by the data points, hence
derivatives.

As a side benefit, this method also lets us perform regression on closed
loops and other implicit equations, since the fitness functions are based
only on derivatives. The specific form of the error is equation 1.2, which,
I believe, makes up the internals of the evaluation procedure used in
Eureqa.
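
To make the point concrete, here is a minimal Python sketch of the idea, not
the paper's exact equation 1.2. It assumes ordered samples from a circle,
estimates the data slope dy/dx by finite differences, and compares it with the
slope a candidate f(x, y) = 0 implies through -(df/dx)/(df/dy). All function
names, the log(1 + |difference|) form and the tolerances are my own
assumptions for illustration.

```python
import math

def circle_points(n=200, radius=2.0):
    """Ordered samples on a circle: a closed loop, so y is not a function of x."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def direct_error(f, pts):
    """Naive fitness: mean |f(x, y)|. A trivial f drives this to zero."""
    return sum(abs(f(x, y)) for x, y in pts) / len(pts)

def partials(f, x, y, h=1e-5):
    """Central-difference estimates of df/dx and df/dy."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return fx, fy

def derivative_error(f, pts, eps=1e-9):
    """Mean log(1 + |data slope - implied slope|); inf if f implies no slope."""
    n = len(pts)
    total, used = 0.0, 0
    for k in range(n):
        (x0, y0), (x1, y1), (x2, y2) = pts[k - 1], pts[k], pts[(k + 1) % n]
        dx, dy = x2 - x0, y2 - y0
        if abs(dx) < 1e-2:
            continue  # skip near-vertical tangents in the data
        fx, fy = partials(f, x1, y1)
        if abs(fx) < eps and abs(fy) < eps:
            return math.inf  # f is locally constant, i.e. a trivial fit
        if abs(fy) < eps:
            continue  # implied slope is undefined at this point
        total += math.log(1.0 + abs(dy / dx - (-fx / fy)))
        used += 1
    return total / used if used else math.inf

pts = circle_points()
trivial = lambda x, y: x - x               # zero everywhere
circle = lambda x, y: x * x + y * y - 4.0  # the true implicit relation

print(direct_error(trivial, pts), derivative_error(trivial, pts))
print(direct_error(circle, pts), derivative_error(circle, pts))
```

The direct error cannot separate the two candidates, since both are
essentially zero on the data, while the derivative-based score rejects x - x
outright and stays close to zero for the circle.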

You are correct that there is no reason not to work in parallel, since GAs
generally have a more or less fixed form (an evaluate-reproduce cycle) that
is quite easily parallelized. I have used OpenMP in the past, which makes it
fairly trivial to parallelize well-formed for loops.
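
As a rough illustration of that evaluate-reproduce cycle, here is a small
Python sketch in which the evaluation step is an independent map over the
population, which is the part an OpenMP parallel for would cover. The toy
fitness function, the mutation scheme and every name below are hypothetical.
A thread pool is used only to keep the sketch portable; for CPU-bound fitness
functions in CPython a process pool would be needed to see a real speedup.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def fitness(candidate):
    """Hypothetical fitness: squared distance from 3.0 (lower is better)."""
    return (candidate - 3.0) ** 2

def evaluate(population, workers=4):
    """Score each candidate independently; map preserves input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, population))

def next_generation(population, scores, rng):
    """Keep the better half and refill with mutated copies of the survivors."""
    ranked = [c for _, c in sorted(zip(scores, population))]
    survivors = ranked[: len(ranked) // 2]
    children = [c + rng.gauss(0.0, 0.1) for c in survivors]
    return survivors + children

rng = random.Random(0)
population = [rng.uniform(-10.0, 10.0) for _ in range(20)]
initial_best = min(evaluate(population))
for _ in range(50):
    scores = evaluate(population)                          # parallel step
    population = next_generation(population, scores, rng)  # serial step
final_best = min(evaluate(population))
print(initial_best, final_best)
```

Because the survivors are carried over unchanged, the best score can never get
worse from one generation to the next.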

Chillu

> It is not clear to me how well this generalized approach will
> work in practice, but there is no reason not to proceed in parallel to
> establish a framework under which you could implement the metrics
> proposed by Schmidt and Lipson in the contemplated syrfr package.
>
> I have expanded the test I proposed with two more questions -- at
> http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2010:syrfr
> -- specifically:
>
> 5. Critique http://sites.google.com/site/gptips4matlab/
>
> 6. Use anova to compare the goodness-of-fit of a SSfpl nls fit with a
> linear model of your choice. How can you characterize the
> degree-of-freedom-adjusted goodness of fit of nonlinear models?
>
> I believe pairwise anova.nls is the optimal comparison for nonlinear
> models, but there are several good choices for approximations,
> including the residual standard error, which I believe can be adjusted
> for degrees of freedom, as can the F statistic which TableCurve uses;
> see: http://en.wikipedia.org/wiki/F-test#Regression_problems
>
> Best regards,
> James Salsman
>
>
> On Sun, Mar 7, 2010 at 7:35 PM, Chidambaram Annamalai
> <quantumeli...@gmail.com> wrote:
> > It's been a while since I proposed syrfr and I have been constantly in
> > contact with the many people in the R community and I wasn't able to
> > find a mentor for the project. I later got interested in the Automatic
> > Differentiation proposal (adinr) and, on consulting with a few others
> > within the R community, I mailed John Nash (who proposed adinr in the
> > first place) if he'd be willing to take me up on the project. I got a
> > positive reply only a few hours ago and it was my mistake to have not
> > removed the syrfr proposal in time from the wiki, as being listed under
> > proposals looking for mentors.
> >
> > While I appreciate your interest in the syrfr proposal I am afraid my
> > allegiances have shifted towards the adinr proposal, as I got convinced
> > that it might interest a larger group of people and it has wider scope
> > in general.
> >
> > I apologize for having caused this trouble.
> >
> > Best Regards,
> > Chillu
> >
> > On Mon, Mar 8, 2010 at 6:41 AM, James Salsman <jsals...@talknicer.com>
> > wrote:
> >>
> >> Per http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2010
> >> -- and
> >> http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2010:syrfr
> >> -- I am applying to mentor the "Symbolic Regression for R" (syrfr)
> >> package for the Google Summer of Code 2010.
> >>
> >> I propose the following test which an applicant would have to pass in
> >> order to qualify for the topic:
> >>
> >> 1. Describe each of the following terms as they relate to statistical
> >> regression: categorical, periodic, modular, continuous, bimodal,
> >> log-normal, logistic, Gompertz, and nonlinear.
> >>
> >> 2. Explain which parts of http://bit.ly/tablecurve were adopted in
> >> SigmaPlot and which weren't.
> >>
> >> 3. Use the 'outliers' package to improve a regression fit maintaining
> >> the correct extrapolation confidence intervals as are between those
> >> with and without outlier exclusions in proportion to the confidence
> >> that the outliers were reasonably excluded.  (Show your R transcript.)
> >>
> >> 4. Explain the relationship between degrees of freedom and correlated
> >> independent variables.
> >>
> >> Best regards,
> >>
> >> James Salsman
> >> jsals...@talknicer.com
> >> http://talknicer.com
> >>
> >> ______________________________________________
> >> R-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> >
>


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
