[scikit-learn] How to determine suitable cluster algo

2019-01-24 Thread lampahome
I want to build a customized clustering procedure for my datasets, because I don't want to try every algorithm and all of its hyperparameters by hand. I thought I would just define a default range for the important hyperparameters, e.g. the number of clusters in K-means. I want to iterate over some candidate clustering algorithms such as K-means, DBSCAN, and AP (Affinity Propagation).
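One way to read this request is as a loop over a few candidate algorithms, each with a small hyperparameter range, ranked by an internal validity score. The sketch below is only an illustration under stated assumptions: the toy data from make_blobs, the parameter ranges, and the choice of the silhouette score as the ranking criterion are not from the original message.

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN, AffinityPropagation
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Toy data standing in for the real dataset (assumption).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# Hypothetical candidates: (name, estimator class, list of parameter dicts).
candidates = [
    ("KMeans", KMeans,
     [{"n_clusters": k, "n_init": 10, "random_state": 0} for k in range(2, 7)]),
    ("DBSCAN", DBSCAN,
     [{"eps": eps, "min_samples": 5} for eps in (0.3, 0.5, 0.8)]),
    ("AffinityPropagation", AffinityPropagation,
     [{"damping": d} for d in (0.5, 0.9)]),
]

results = []
for name, cls, grid in candidates:
    for params in grid:
        labels = np.asarray(cls(**params).fit_predict(X))
        mask = labels != -1  # drop points labelled as noise (-1)
        n_clusters = len(set(labels[mask]))
        # The silhouette score needs between 2 and n_samples - 1 clusters.
        if n_clusters < 2 or n_clusters >= mask.sum():
            continue
        score = silhouette_score(X[mask], labels[mask])
        results.append((score, name, params))

# Rank by score only, so ties never compare the parameter dicts.
best_score, best_name, best_params = max(results, key=lambda r: r[0])
print(best_name, best_params, round(best_score, 3))
```

The silhouette score tends to favour compact, convex clusters, so it can be unfair to DBSCAN on non-convex data; a different criterion may suit other datasets better.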

Re: [scikit-learn] How to determine suitable cluster algo

2019-01-24 Thread Matti Viljamaa
GridSearchCV is meant for tuning the hyperparameters of a model over ranges of configurations and parameter values, as the documentation explains: https://scikit-learn.org/stable/modules/grid_search.html (it also has some examples). The (e.g. 10-fold) cross-validation as measure of accura
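As a rough illustration of how GridSearchCV could be applied to a clustering estimator, here is a minimal sketch. Since there are no labels, it assumes a custom scoring callable based on the silhouette score; the function name silhouette_scorer, the toy data, and the parameter range are hypothetical choices, not something stated in the thread.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.model_selection import GridSearchCV

# Toy data standing in for the real dataset (assumption).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)


def silhouette_scorer(estimator, X, y=None):
    """Hypothetical label-free scorer: silhouette of the held-out fold."""
    labels = estimator.predict(X)
    if len(set(labels)) < 2:
        return -1.0  # silhouette is undefined for a single cluster
    return silhouette_score(X, labels)


search = GridSearchCV(
    KMeans(n_init=10, random_state=0),
    param_grid={"n_clusters": [2, 3, 4, 5, 6]},
    scoring=silhouette_scorer,
    cv=5,  # each candidate is fitted on 4 folds and scored on the 5th
)
search.fit(X)  # no y is passed; the scorer ignores it

best_k = search.best_params_["n_clusters"]
print(best_k, round(search.best_score_, 3))
```

This only works for estimators that implement predict, such as KMeans; DBSCAN has no predict method, so it would need a plain loop instead.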

Re: [scikit-learn] Imblearn: SMOTENC

2019-01-24 Thread S Hamidizade
Thanks. Unfortunately, now the error is: ValueError: Some of the categorical indices are out of range. Indices should be between 0 and 160. Best regards, On Sun, Jan 20, 2019 at 8:31 PM S Hamidizade wrote: > Dear Scikit-learners > Hi. > > I would greatly appreciate if you could let me know how t

Re: [scikit-learn] Imblearn: SMOTENC

2019-01-24 Thread Guillaume Lemaître
You should open an issue on the imbalanced-learn GitHub issue tracker. It is easier to post a reproducible example there and for us to test it. From the error message, I understand that you have 161 features and requested a feature index above 160. On Thu, 24 Jan 2019 at 16:19, S Hamidizade wrote: > Th

Re: [scikit-learn] How to determine suitable cluster algo

2019-01-24 Thread lampahome
Maybe the suitable way is trial and error? My concern is that my datasets are very large, so I can't try every number of clusters from 1 to N when I have N samples; that would cost too much time. Maybe I should define the initial number of clusters based on execution time? Then analyze the next step
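One possible way to keep the cost down on a large dataset is to search a coarse grid of cluster counts on a random subsample instead of trying every value from 1 to N. The sketch below assumes this approach; the subsample size, the candidate grid, and the use of MiniBatchKMeans with the silhouette score are illustrative choices, not something proposed in the thread.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Toy data standing in for a large dataset (assumption).
X, _ = make_blobs(n_samples=20000, centers=8, random_state=0)

# Work on a random subsample so that each trial stays cheap.
rng = np.random.default_rng(0)
subset = X[rng.choice(len(X), size=2000, replace=False)]

# Coarse, geometrically spaced candidates instead of every k from 1 to N.
scores = {}
for k in (2, 4, 8, 16, 32):
    model = MiniBatchKMeans(n_clusters=k, batch_size=1024, n_init=3,
                            random_state=0)
    labels = model.fit_predict(subset)
    scores[k] = silhouette_score(subset, labels)

best_k = max(scores, key=scores.get)
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

Once a promising region is found, the grid can be refined around best_k and the final model refitted on the full data.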