Hello R folks,
Today I noticed that fitting a model with a poly() term using the subset
argument of lm() gives a different result than fitting the same model to data
that has already been subsetted. This was not at all intuitive to me. You can
see an example here:
https://stackoverflow.com/questions/70490599/why-does-lm-with-the-subset-argument-give-a-different-answer-than-subsetting-i
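For convenience, here is a minimal sketch of the kind of comparison I mean. It
uses simulated data rather than the example from the link; the data frame d,
the cutoff x < 5, and the names fit_arg and fit_pre are made up for
illustration.

```r
# Simulated data; every name and number below is illustrative only.
set.seed(1)
d <- data.frame(x = runif(100, 0, 10))
d$y <- 1 + 2 * d$x - 0.3 * d$x^2 + rnorm(100)

# Subset passed to lm(): the poly() term appears to be evaluated on
# all 100 rows before the rows with x >= 5 are dropped.
fit_arg <- lm(y ~ poly(x, 2), data = d, subset = x < 5)

# Subset taken in advance: poly() only ever sees the rows with x < 5.
fit_pre <- lm(y ~ poly(x, 2), data = d[d$x < 5, ])

# Same rows, same formula, but the coefficient estimates differ.
coef(fit_arg)
coef(fit_pre)
```

With poly(x, 2, raw = TRUE) in both calls the two sets of coefficients agree,
which is what makes me think the difference comes from where the orthogonal
basis is computed.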
If this is intended behavior that you don't think should be changed, could you
please document it and explain why it makes sense to compute the orthogonal
polynomials on the entire dataset? This feels like a serious leak of
information when evaluating training and test datasets in a statistical
learning framework.
Ray
Raymond R. Balise, PhD
Assistant Professor
Department of Public Health Sciences, Biostatistics
University of Miami, Miller School of Medicine
1120 N.W. 14th Street
Don Soffer Clinical Research Center - Room 1061
Miami, Florida 33136
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel