Re: [MATH] MATH-1378: KMeansPlusPlusClusterer optimize seeding procedure.

2016-06-23 Thread Artem Barger
Thanks. Now that I've looked at it again, I think I can improve it further. Currently, at each iteration of the seeding, each point is sampled with a worst-case complexity of O(n), where n is the number of points. I think it's possible to reduce this to O(log(n)) while using O(n) of additional space. Best regard
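One common way to get O(log(n)) weighted sampling with O(n) extra space is a cumulative-sum array searched by bisection. The message does not say which technique is intended, so the sketch below is only an assumption; `PrefixSumSampler` and its methods are hypothetical names, not part of Commons Math.

```java
import java.util.Random;

/** Hypothetical helper, not Commons Math code: weighted index sampling via prefix sums. */
final class PrefixSumSampler {
    /** cumulative[i] = weights[0] + ... + weights[i]; this is the O(n) additional space. */
    private final double[] cumulative;

    /** Builds the cumulative sums of non-negative weights in O(n). */
    PrefixSumSampler(double[] weights) {
        cumulative = new double[weights.length];
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i];
            cumulative[i] = sum;
        }
    }

    /** Returns the first index whose cumulative weight exceeds r, by bisection in O(log(n)). */
    int indexFor(double r) {
        int lo = 0;
        int hi = cumulative.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] > r) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }

    /** Draws an index with probability proportional to its weight. */
    int sample(Random rng) {
        double total = cumulative[cumulative.length - 1];
        return indexFor(rng.nextDouble() * total);
    }
}
```

In kmeans++ seeding the weights would be the squared distances to the nearest chosen center. Those change after every new center, so with this plain array the rebuild is still O(n) per iteration; only the draw itself becomes O(log(n)).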

Re: [MATH] MATH-1378: KMeansPlusPlusClusterer optimize seeding procedure.

2016-06-23 Thread Eric Barnhill
I use kmeans a bit and I will look at it.
On Thu, Jun 23, 2016 at 2:10 PM, Artem Barger wrote:
> Hi all,
>
> While I understand that threads about project decisions are ongoing on the ML,
> I'd like to suggest and provide some improvements to the CM kmeans++
> implementation in the seeding procedur

[MATH] MATH-1378: KMeansPlusPlusClusterer optimize seeding procedure.

2016-06-23 Thread Artem Barger
Hi all, While I understand that threads about project decisions are ongoing on the ML, I'd like to suggest and provide some improvements to the CM kmeans++ implementation in the seeding procedure. Currently, the sum of squared distances is computed at each iteration during the initial seeding of centers, which is redu
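Assuming the point is that the sum need not be recomputed from scratch on every seeding iteration, one possible approach is to keep a running sum and adjust it only where a point's nearest-center distance shrinks. This is a sketch of that idea, not the actual MATH-1378 patch; `RunningSquaredDistances` and its methods are hypothetical names.

```java
/** Hypothetical sketch, not Commons Math code: keeps the squared-distance sum up to date. */
final class RunningSquaredDistances {
    /** Squared distance from each point to its nearest chosen center. */
    private final double[] minDistSq;
    /** Running sum of minDistSq, maintained incrementally. */
    private double sum;

    /** Initializes the distances and their sum against the first center. */
    RunningSquaredDistances(double[][] points, double[] firstCenter) {
        minDistSq = new double[points.length];
        for (int i = 0; i < points.length; i++) {
            minDistSq[i] = squaredDistance(points[i], firstCenter);
            sum += minDistSq[i];
        }
    }

    /** Accounts for a newly chosen center, changing the sum only where a distance shrinks. */
    void addCenter(double[][] points, double[] center) {
        for (int i = 0; i < points.length; i++) {
            double d = squaredDistance(points[i], center);
            if (d < minDistSq[i]) {
                sum -= minDistSq[i] - d;
                minDistSq[i] = d;
            }
        }
    }

    /** Current sum of squared distances to the nearest chosen centers. */
    double sum() {
        return sum;
    }

    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            s += diff * diff;
        }
        return s;
    }
}
```

The distance update still visits every point, but the separate summation pass disappears. Repeated subtraction can accumulate floating-point error, so a real implementation might want to recompute the sum occasionally.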