Thank you very much for your prompt response. Now I see why the results have a random component: although all units with tied distances are included in the neighbourhood, ties in the votes have to be broken at random.
Thank you!
Itziar Irigoien

On Fri, 20 Nov 2015 at 16:40, David L Carlson wrote:
Changing your definition of cl to clase let me replicate the problem. If you set a random seed just before running knn(), the results are consistent, which indicates that the function is drawing a random number at some point.

You should probably contact the package maintainer, but your toy data set is trivially simple. You have 40 total observations, but X1 has only 3 different values and X2 has only 2 different values, so there are only 6 different combinations. The distance matrix on your training set has 435 distances, but only 5 different values! As a result there are many, many tied values, so the algorithm probably uses a random method of selecting which 3 to use.

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
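The behaviour described above can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the data are hypothetical (not the original poster's data set), built so that X1 has 3 values and X2 has 2, and the labels in cl are made up so that the votes among tied neighbours are split evenly between two classes. It assumes knn() is the one from the class package.

```r
library(class)  # assumed source of knn()

# Hypothetical toy data: 6 distinct (X1, X2) combinations, each repeated 5 times.
grid  <- expand.grid(X1 = 1:3, X2 = 1:2)
train <- grid[rep(1:6, each = 5), ]   # 30 training rows
test  <- grid                         # one test point per combination

# Made-up labels: within every group of 5 identical points, "a" and "b" each
# get 2 votes and "c" gets 1, so the majority vote is tied between "a" and "b".
cl <- factor(rep(c("a", "a", "b", "b", "c"), times = 6))

# Count pairwise distances in the training set and how many are distinct.
d <- dist(train)
length(d)                      # number of pairwise distances: 30 * 29 / 2
length(unique(as.vector(d)))   # number of distinct distance values

# Without a seed, repeated calls may give different predictions because
# tied votes are broken at random. Fixing the seed makes them repeatable.
set.seed(1)
p1 <- knn(train, test, cl, k = 3)
set.seed(1)
p2 <- knn(train, test, cl, k = 3)
identical(p1, p2)
```

Setting the seed immediately before each call is what makes the two runs comparable; it does not remove the ties, it only makes the random tie-breaking reproducible.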
______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.