Date: Fri, 19 Sep 2008 07:00:33 +0000 (UTC)
From: "Hans W. Borchers" <[EMAIL PROTECTED]>
Subject: Re: [R] How to do knn regression?
To: [EMAIL PROTECTED]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=us-ascii
Shengqiao Li <shli <at> stat.wvu.edu> writes:
Hello,
I want to do regression or missing-value imputation by knn. I searched
the r-help mailing list and found that this question was asked in 2005;
ksmooth and loess were recommended. But my case is different. I have many
predictors (p>20) and I really want to try knn with a given k. ksmooth and
loess use a bandwidth to define the neighborhood size. This contrasts with
knn, where fixing k gives a variable bandwidth. Are there any such
functions I can use in R packages?
The R package 'knnFinder' provides a nearest neighbor search based on
the approach through kd-tree data structures. Therefore, it is extremely
fast even for very large data sets. It returns as many neighbors as you
need and can also be used, e.g., for determining distance-based
outliers.
Thanks for your info. But there seem to be problems with using
knnFinder. knnFinder doesn't distinguish between test data and training
data; it searches all of the data, so new data with unknown Y's may appear
among the neighbors in X space. The mask argument does not seem to solve
this problem. In addition, I notice several other possible problems with
knnFinder:
(1) Ties are ignored.
(2) knnFinder is slower than class::knn when the number of
variables is relatively small, e.g. 70.
(3) Memory leakage.
(4) Maximum distance is small.
(5) One extra column is needed.
I rewrote the knnFinder code to solve the last three problems, for other
purposes in which self-matches are not allowed. But the self-match option
is not a function parameter. It is a macro, so it cannot be changed once
the library is compiled. For regression, ties should be used, so I have
to compile two versions. This is not neat.
Any other convenient ways?
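For reference, the behaviour asked for above (neighbors searched only in
the training data, a fixed k, and ties at the k-th distance optionally
kept) can be sketched in base R. This is only an illustration under
stated assumptions: knn_reg and its arguments are hypothetical names, not
functions from knnFinder or class; it uses Euclidean distance and a
brute-force search rather than a kd-tree, so it will be slow on large
data sets.

```r
# Sketch of k-nearest-neighbor regression with separate training and
# query sets. knn_reg is an illustrative name, not an existing package
# function. Distances are Euclidean; the search is brute force.
knn_reg <- function(train_x, train_y, test_x, k, use_ties = TRUE) {
  train_x <- as.matrix(train_x)
  test_x  <- as.matrix(test_x)
  stopifnot(nrow(train_x) == length(train_y),
            ncol(train_x) == ncol(test_x),
            k >= 1, k <= nrow(train_x))
  apply(test_x, 1, function(q) {
    # Squared distance from the query point to every training row;
    # t(train_x) has one training point per column, so q recycles correctly.
    d <- colSums((t(train_x) - q)^2)
    # Distance of the k-th nearest training point
    kth <- sort(d, partial = k)[k]
    # With use_ties = TRUE, all points tied with the k-th one are kept,
    # so more than k neighbors may be averaged.
    idx <- if (use_ties) which(d <= kth) else order(d)[seq_len(k)]
    mean(train_y[idx])
  })
}

# Usage: rows of test_x have unknown Y and are never used as neighbors.
set.seed(1)
train_x <- matrix(rnorm(200 * 25), ncol = 25)   # p = 25 predictors
train_y <- rowSums(train_x[, 1:3]) + rnorm(200, sd = 0.1)
test_x  <- matrix(rnorm(5 * 25), ncol = 25)
pred    <- knn_reg(train_x, train_y, test_x, k = 7)
```

Because the query rows are passed separately, the self-match question
does not arise unless the same rows are deliberately supplied as both
training and test data, and the tie handling is an ordinary function
argument rather than a compile-time setting.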
Hans Werner Borchers
ABB Corporate Research
Your help is highly appreciated.
Shengqiao Li
______________________________________________
R-help <at> r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.