Dear Laura,
I have R 2.6.0. I tried dist on a vector of length 200,000 and it told me
that it is too long. Theoretically, if you have 260,000 observations, the
length of the dist object should be 260,000*259,999/2, which is too large
for our computers, I guess. Which means that unfortunately cluster.stats
won't work for such a large data set, because it needs the full casewise
dissimilarity information.
I don't understand how you managed to produce a dist object of length
only 130,000 from your data, but it certainly doesn't give all
pairwise distance information for 260,000 points and therefore cannot be
used in cluster.stats with a clustering vector of length 260,000 or so.
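For reference, here is a rough sketch of the arithmetic behind this. The helper names dist_length and n_from_length are made up for illustration, and the memory figure assumes the usual 8 bytes per stored distance.

```r
# Number of entries in a dist object for n observations: n * (n - 1) / 2
dist_length <- function(n) n * (n - 1) / 2

n   <- 262144                 # 256 x 256 pixels, as reported by str(data)
len <- dist_length(n)         # roughly 3.4e10 pairwise distances
gib <- len * 8 / 2^30         # storage in GiB at 8 bytes per distance
gib                           # just under 256 GiB

# Inverse question: how many objects does a dist of a given length describe?
n_from_length <- function(len) (1 + sqrt(1 + 8 * len)) / 2
n_from_length(130816)         # 512 objects, not 262144
```

So a dist object of length 130816 would describe 512 objects, which is why it cannot be paired with a clustering vector of length 262144.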
Sorry,
Christian
On Sat, 14 Jun 2008, Laura Poggio wrote:
Thanks. See below.
Laura
2008/6/14 Christian Hennig <[EMAIL PROTECTED]>:
What does str(ddata) give?
Class 'dist' atomic [1:130816] 69.2 117.1 145.6 179.9 195.6 ...
dcent doesn't make sense as input for cluster.stats, because you need a
dissimilarity matrix between all objects.
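A small sketch of why this matters, on toy data. The data, the two-row stand-in for dist(kl$centers) and the helper sizes_match are hypothetical and not part of fpc. As far as I can tell, indexing a small matrix with a longer logical vector is what triggers the "logical subscript too long" error.

```r
# Toy data: 10 observations in 2 clusters (both hypothetical)
set.seed(1)
x <- matrix(rnorm(20), ncol = 2)
clustering <- rep(1:2, each = 5)

d_all  <- dist(x)          # all pairwise distances: Size 10
d_cent <- dist(x[1:2, ])   # stand-in for dist(kl$centers): Size 2

# Helper (not from fpc): does d cover exactly the clustered objects?
sizes_match <- function(d, clustering) attr(d, "Size") == length(clustering)

sizes_match(d_all, clustering)    # TRUE
sizes_match(d_cent, clustering)   # FALSE

# Indexing the 2 x 2 matrix with a logical vector of length 10 fails;
# the error message is captured instead of stopping the script
res <- tryCatch(as.matrix(d_cent)[clustering == 1, clustering == 1],
                error = function(e) conditionMessage(e))
res
```

Checking attr(d, "Size") against length(clustering) before calling cluster.stats would catch the mismatch early.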
Yes, I know ... I was simply trying to see whether anything changed with a
different structure of data
Christian
On Sat, 14 Jun 2008, Laura Poggio wrote:
I am sorry I did not provide enough information.
I am not using img later, but data, which is a data.frame.
I wrote that img is an "image" only to explain where the data comes from,
but the object I am using is data, and it is a data.frame (checked many
times).
I am not using as.dist, but dist, to calculate the distance matrix
for the data I have. The whole code I am using is:
data <- as(img, "data.frame")[1:1]  # img is an image of 256 x 256 px
kl <- kmeans(data, 5)
library(fpc)
ddata <- dist(data)
dcent <- dist(kl$centers)
cluster.stats(ddata, kl$cluster)
cluster.stats(dcent, kl$cluster)
In both cases I got the same error:
Error in as.dist(dmat[clustering == i, clustering == i]) : (subscript)
logical subscript too long
The structure of the different objects is detailed below:
data is "'data.frame': 262144 obs. of 1 variable"
kl$centers is "num [1:5, 1]"
kl$cluster is "Named int [1:262144]"
I hope this is more informative. I am sorry, but I did not find any
explanation for the error message I am getting.
Thank you very much in advance
Laura
2008/6/14 Christian Hennig <[EMAIL PROTECTED]>:
The given information is not enough to tell you what's going on. as.dist
doesn't appear in the given code and it's not clear to me what kind of
object img is ("a small image" doesn't tell me what R makes of it).
Also, try to read the help pages first and find out whether img is of the
format that is required by the functions. And check (using str, for
example) whether "data" is what you expect it to be.
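A minimal sketch of such a check, using a small made-up data.frame in place of the real image data (the column name value is an assumption):

```r
# Small stand-in for the real data: 16 "pixels", one variable
data <- data.frame(value = seq(0, 1, length.out = 16))

str(data)               # should report a data.frame with 16 obs. of 1 variable
ddata <- dist(data)
str(ddata)              # a 'dist' object holding 16 * 15 / 2 = 120 distances
class(ddata)            # "dist"
attr(ddata, "Size")     # number of objects the distances refer to
```

The "Size" attribute of the dist object should equal the number of rows of the data, and later the length of the clustering vector.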
Christian
On Sat, 14 Jun 2008, Laura Poggio wrote:
Thank you very much for your answer.
I tried to run the function on my data and now I am getting this error
message:
Error in as.dist(dmat[clustering == i, clustering == i]) : (subscript)
logical subscript too long
Below is the code I am using (R version 2.7.0 with all packages updated):
data <- as(img, "data.frame")[1:1]  # img is a small image, 256 px x 256 px
kl <- kmeans(data, 5)
library(fpc)
cluster.stats(data, kl$cluster)
Thank you for any hints on the reasons and meaning of the error!
Laura
2008/6/13 Christian Hennig <[EMAIL PROTECTED]>:
Dear Laura,
Dear list,
I just tried to use the function cluster.stats in the package fpc.
I just have a couple of questions about the syntax:
cluster.stats(d,clustering,alt.clustering=NULL,
silhouette=TRUE,G2=FALSE,G3=FALSE)
1) Is the distance object (d) an object obtained by applying the function
dist() to my own original matrix?
d is allowed to be an object of class dist or a dissimilarity matrix.
The answer to your question depends on what your "original matrix" is. If
it is something on which you can compute a distance by dist(), you're right,
at least if dist() delivers the distance you are interested in.
2) Is clustering the cluster vector resulting from one of the many
clustering methods?
The help page tells you what clustering can be. So it could be the
clustering/partition vector of a clustering method or it could be something
else. Note that cluster.stats doesn't depend on any particular clustering
method. It computes the statistics regardless of where the clustering vector
comes from.
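As an illustration, a minimal sketch on a small simulated data set. The data and the choice of kmeans with two centers are assumptions made only for this example; any other clustering vector of the same length would do.

```r
set.seed(42)
# Hypothetical data: two well separated groups of 50 points in 2 dimensions
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))

kl <- kmeans(x, centers = 2)   # one possible source of a clustering vector
d  <- dist(x)                  # Euclidean distances between all 100 points

# d and the clustering vector must refer to the same objects
stopifnot(attr(d, "Size") == length(kl$cluster))

# Only run the fpc part if the package is installed
if (requireNamespace("fpc", quietly = TRUE)) {
  cs <- fpc::cluster.stats(d, kl$cluster)
  print(cs$cluster.size)       # number of points per cluster
  print(cs$avg.silwidth)       # average silhouette width
}
```

With only 100 points the dist object has 100 * 99 / 2 = 4950 entries, so this runs without memory problems.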
Best regards,
Christian
Thank you very much in advance, and sorry for such a basic question, but I
did not manage to work it out myself.
Laura
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
[EMAIL PROTECTED],
www.homepages.ucl.ac.uk/~ucakche