In thinking about this 'problem' last night, I found the 'solution'. Any NN
algorithm has to keep all of the data it is given, both X and Y; otherwise,
how could it find and report the nearest neighbour? When predicting (i.e.
predict.kknn), it finds the closest match (the nearest neighbour), which, for
a point from the original dataset, /is that point itself/!

In contrast, kknn$fitted.values must come from a cross-validation approach:
likely either finding the nearest point at a non-zero distance, or building a
model without that point and seeing where it falls (the kknn documentation
describes train.kknn as performing leave-one-out cross-validation, which is
the latter). Otherwise, it would not be possible to report the accuracy of the
model using only a single dataset.
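To illustrate the difference, here is a small, self-contained Python sketch (not the kknn package's code). It uses a plain unweighted majority-vote k-NN rather than kknn's kernel-weighted method, and the helper names and toy data are hypothetical. Predicting each training point from the full training set ("resubstitution") always returns that point's own label with k = 1, whereas leave-one-out prediction hides the point first, so its accuracy is an honest estimate.

```python
import math

def knn_predict(train_x, train_y, query, k=1):
    # Majority vote among the k training points closest to `query` (Euclidean).
    # Ties go to the label seen first, i.e. the one belonging to the nearer point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = {}
    for i in order[:k]:
        votes[train_y[i]] = votes.get(train_y[i], 0) + 1
    return max(votes, key=votes.get)

def loo_fitted(train_x, train_y, k=1):
    # Leave-one-out: predict each point from a model that never saw it.
    fitted = []
    for i in range(len(train_x)):
        rest_x = train_x[:i] + train_x[i + 1:]
        rest_y = train_y[:i] + train_y[i + 1:]
        fitted.append(knn_predict(rest_x, rest_y, train_x[i], k))
    return fitted

def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

# Hypothetical one-dimensional toy data.
x = [(0.0,), (1.0,), (2.2,), (3.5,), (5.0,)]
y = ["a", "a", "b", "a", "a"]

resub = [knn_predict(x, y, q, k=1) for q in x]  # each point finds itself at distance 0
loo = loo_fitted(x, y, k=1)

print(resub, accuracy(resub, y))  # ['a', 'a', 'b', 'a', 'a'] 1.0
print(loo, accuracy(loo, y))      # ['a', 'a', 'a', 'b', 'a'] 0.6
```

The resubstitution accuracy of 1.0 says nothing about how the model generalises; only the leave-one-out figure does.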

I will retest the algorithm using separate training and test datasets to
better understand how predict.kknn selects a model from the suite generated by
train.kknn, which was my original question. I assume it uses
kknn$best.parameters, but I want to verify this.
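A minimal Python sketch of the behaviour assumed above, under the assumption that prediction uses only the best parameters: choose k by leave-one-out accuracy on the training split, then predict the test split with that single k. This is not kknn's implementation; the helper names (knn_predict, loo_accuracy, select_k) and the toy data are hypothetical, and it searches only over k with an unweighted vote, whereas train.kknn also searches over kernels.

```python
import math

def knn_predict(train_x, train_y, query, k):
    # Unweighted majority vote among the k nearest training points (Euclidean).
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = {}
    for i in order[:k]:
        votes[train_y[i]] = votes.get(train_y[i], 0) + 1
    return max(votes, key=votes.get)

def loo_accuracy(x, y, k):
    # Fraction of training points correctly predicted when each is left out in turn.
    hits = 0
    for i in range(len(x)):
        pred = knn_predict(x[:i] + x[i + 1:], y[:i] + y[i + 1:], x[i], k)
        hits += pred == y[i]
    return hits / len(x)

def select_k(x, y, kmax):
    # Hypothetical stand-in for the parameter search: best LOO accuracy over
    # k = 1..kmax, ties going to the smaller k.
    scores = {k: loo_accuracy(x, y, k) for k in range(1, kmax + 1)}
    return max(scores, key=scores.get), scores

# Hypothetical training split: two clusters, each with one odd label.
train_x = [(0.0,), (0.4,), (1.1,), (1.9,), (2.6,),
           (5.0,), (5.7,), (6.3,), (7.2,), (7.8,)]
train_y = ["a", "a", "b", "a", "a", "b", "b", "a", "b", "b"]
test_x = [(0.3,), (6.2,)]

best_k, scores = select_k(train_x, train_y, kmax=3)
# Only the selected k is used for the test split.
test_pred = [knn_predict(train_x, train_y, q, best_k) for q in test_x]

print(best_k, scores)  # 3 {1: 0.7, 2: 0.7, 3: 0.8}
print(test_pred)       # ['a', 'b']
```

On this toy data, k = 1 would label the test point at 6.2 as "a" (its single nearest training point is the odd "a" at 6.3), while the selected k = 3 outvotes it, so checking test-set predictions against each candidate k is one way to see which parameters were actually used.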

Hopefully that clarifies the issue. I am posting this here in case future
users have a similar question.

Thanks to any who took the time to think about this!
Jonathan



--
View this message in context: 
http://r.789695.n4.nabble.com/kknn-predict-and-kknn-fitted-values-tp4711625p4711634.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
