On Wed, 5 May 2010, Ralf B wrote:

Are there R packages that allow for dynamic clustering, i.e. where the
number of clusters is not predefined?

Yes.

I have a list of numbers that
fall into either two clusters or just one. Here is an example of one that
should be clustered into two clusters:

two <- c(1,2,3,2,3,1,2,3,400,300,400)

and here one that only contains one cluster and would therefore not
need to be clustered at all.

one <- c(400,402,405, 401,410,415, 407,412)

Given a sufficiently large amount of data, a statistical test or an
effect size should be able to determine whether it makes sense to
divide a data set, i.e. whether there are two groups that differ enough. I am
not familiar with the underlying techniques in kmeans, but I know that
it blindly divides both data sets based on the predefined number of
clusters. Are there any more sophisticated methods that allow me to
determine the number of clusters in a data set based on statistical
tests or effect sizes?

There are loads of techniques, e.g., cluster indices or information criteria.
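
As a rough illustration of the information-criterion route, here is a minimal base-R sketch that compares the BIC of a single Gaussian with the BIC of a two-component Gaussian mixture for your two vectors. The function names (bic_one, bic_two, choose_k), the common-variance assumption and the crude starting split at the mean are my own choices for the sketch, not a standard API, and it assumes the data are not all identical. A dedicated package would do this more carefully.

```r
two <- c(1, 2, 3, 2, 3, 1, 2, 3, 400, 300, 400)
one <- c(400, 402, 405, 401, 410, 415, 407, 412)

# BIC of a single Gaussian (2 parameters: mean and variance).
bic_one <- function(x) {
  n <- length(x)
  s <- sqrt(mean((x - mean(x))^2))            # maximum-likelihood sd
  loglik <- sum(dnorm(x, mean(x), s, log = TRUE))
  -2 * loglik + 2 * log(n)
}

# BIC of a two-component Gaussian mixture with a common variance
# (4 parameters: two means, one variance, one mixing weight),
# fitted by a short EM loop. Assumption: start from a split at the mean.
bic_two <- function(x, iter = 200) {
  n <- length(x)
  z <- as.numeric(x > mean(x))                # initial weights for component 2
  for (i in seq_len(iter)) {
    # M-step: update weight, means and common sd from the current weights
    p  <- mean(z)
    m1 <- sum((1 - z) * x) / sum(1 - z)
    m2 <- sum(z * x) / sum(z)
    s  <- sqrt(sum((1 - z) * (x - m1)^2 + z * (x - m2)^2) / n)
    # E-step: update the weights from the current parameters
    d1 <- (1 - p) * dnorm(x, m1, s)
    d2 <- p * dnorm(x, m2, s)
    z  <- d2 / (d1 + d2)
  }
  loglik <- sum(log(d1 + d2))
  -2 * loglik + 4 * log(n)
}

# Pick the number of clusters (1 or 2) with the smaller BIC.
choose_k <- function(x) if (bic_two(x) < bic_one(x)) 2L else 1L

choose_k(two)
choose_k(one)
```

With samples this small the comparison can be fragile, so treat the answer for a borderline vector like "one" with some caution.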

Inference is more difficult but there are also certain tools available.
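
One simple (and by no means the only) way to get at inference here is a Monte Carlo test: compute how much of the total sum of squares remains after the best split into two groups, and compare that with what a single Gaussian would typically produce. The sketch below uses base R only. The names split_ratio and split_test, the single-Gaussian null and B = 999 replications are assumptions made for the illustration.

```r
two <- c(1, 2, 3, 2, 3, 1, 2, 3, 400, 300, 400)
one <- c(400, 402, 405, 401, 410, 415, 407, 412)

# Smallest within-group sum of squares over all two-group splits of the
# sorted data, relative to the total sum of squares. In one dimension this
# is the optimum of the k-means criterion with k = 2. Small values point
# towards two well-separated groups.
split_ratio <- function(x) {
  x <- sort(x)
  n <- length(x)
  ss <- function(v) sum((v - mean(v))^2)
  within <- sapply(seq_len(n - 1),
                   function(i) ss(x[1:i]) + ss(x[(i + 1):n]))
  min(within) / ss(x)
}

# Monte Carlo p-value: how often does a single Gaussian sample of the same
# size give a ratio at least as small as the observed one?
split_test <- function(x, B = 999) {
  obs <- split_ratio(x)
  sim <- replicate(B, split_ratio(rnorm(length(x), mean(x), sd(x))))
  (1 + sum(sim <= obs)) / (B + 1)
}

set.seed(1)
p_two <- split_test(two)
p_one <- split_test(one)
c(two = p_two, one = p_one)
```

A small p-value speaks against the single-group null. Note that the result depends on the Gaussian assumption, which is hard to check with 8 or 11 observations.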

In any case, there is a multitude of methods, and many of them are discussed in standard textbooks on clustering and/or multivariate analysis.

Is it possible that this is not a clustering problem but a
classification problem?

That depends on the terminology. "Clustering" is rather unambiguous while "classification" can have different meanings.

  - In statistical learning, for example, one often distinguishes between
    "supervised" learning (a response variable is modeled using certain
    explanatory variables) versus "unsupervised" learning (there is no
    response). In this terminology, clustering would be unsupervised
    learning (i.e., what you are trying to do). Supervised learning would
    encompass "regression" (numeric response) and "classification"
    (categorical response).

  - In other statistical communities, "classification" is used as a term
    that encompasses "clustering". For example, Gordon's textbook
    (see ?hclust) is called "Classification".

So in the latter terminology the answer to your question is: Yes, it is classification (= clustering).

In the former terminology the answer is: No, it's unsupervised learning
(= clustering), not supervised learning (= regression/classification).
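
To make the distinction concrete, here is a small base-R sketch on your first vector. The unsupervised part uses kmeans() without any labels. The supervised part assumes that group labels were already known. The labels "low"/"high", the cutoff of 100 used to create them and the nearest-mean rule in classify() are all made up for the illustration.

```r
two <- c(1, 2, 3, 2, 3, 1, 2, 3, 400, 300, 400)

# Unsupervised (clustering): no labels are given, kmeans() finds the groups.
set.seed(1)
groups <- kmeans(two, centers = 2, nstart = 10)$cluster

# Supervised (classification): labels are known in advance (hypothetical
# here) and are used to learn a rule for new observations.
labels <- factor(ifelse(two > 100, "high", "low"))
centroids <- tapply(two, labels, mean)          # one mean per known class

# Assign each new value to the class with the nearest mean.
classify <- function(new) {
  idx <- vapply(new,
                function(v) unname(which.min(abs(v - centroids))),
                integer(1))
  names(centroids)[idx]
}

classify(c(5, 350))
```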

Best,
Z

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
