Thinking about this 'problem' last night, I found the 'solution'. Any nearest-neighbour (NN) algorithm has to keep all the data it is given, both X and Y, otherwise it could not find and report the nearest neighbour! When predicting (i.e. with predict.kknn), it finds the closest match (the nearest neighbour), which, for a point from the original dataset, /is that point/!
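A minimal sketch of this behaviour, written in plain Python rather than with kknn itself. The function name nearest_neighbour_predict and the toy data are hypothetical, and the leave-one-out step is only one assumed way of excluding a point from its own prediction, not necessarily what kknn does. It shows that a 1-nearest-neighbour lookup on a training point always returns that point's own label, while skipping the point itself can give a different answer.

```python
import math

def nearest_neighbour_predict(train_x, train_y, query, exclude_index=None):
    """Return the label of the training point closest to `query`.

    If `exclude_index` is given, that training point is skipped, which
    mimics a leave-one-out prediction for that point (an assumption for
    illustration, not kknn's documented internals).
    """
    best_label = None
    best_dist = math.inf
    for i, (x, y) in enumerate(zip(train_x, train_y)):
        if i == exclude_index:
            continue
        d = math.dist(x, query)  # Euclidean distance
        if d < best_dist:
            best_dist = d
            best_label = y
    return best_label

# Hypothetical toy data: three "a" points and one "b" point.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.9, 0.1)]
train_y = ["a", "a", "a", "b"]

# Predicting on the training data itself: every point matches itself.
resubstitution = [nearest_neighbour_predict(train_x, train_y, x)
                  for x in train_x]

# Leave-one-out: each point is predicted from all the other points.
leave_one_out = [nearest_neighbour_predict(train_x, train_y, x, exclude_index=i)
                 for i, x in enumerate(train_x)]

print(resubstitution)  # ['a', 'a', 'a', 'b']
print(leave_one_out)   # ['b', 'b', 'a', 'a']
```

Predicting on the training points reproduces the labels exactly, so it says nothing about accuracy; the leave-one-out predictions differ for three of the four points in this toy set.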
In contrast, the kknn$fitted.values are derived from some cross-validation approach: likely either finding the nearest point at non-zero distance, or building a model without that point and seeing where it falls. Otherwise, it would not be possible to report the accuracy of the model using only a single dataset.

I will retest the algorithm using a split training/test dataset to better understand how predict.kknn selects a model from the suite generated by train.kknn, which was my original question. I assume it chooses kknn$best.parameters, but I want to verify this.

Hopefully that clarifies the issue. I am posting here in case future users have a similar question. Thanks to anyone who took the time to think about this!

Jonathan