I'll try to describe the data. Here [1] is a subdataset (255 rows, 6 columns, 'a' to 'f'). The last column contains the identification number (ID) of a particular species. The IDs in 'f' cover 20 different species, and this column should be my 'label':
16001 11012 25011 13011 11029 11027 10022 10024 20009 11016
20002 13001 11010 22037 15001 30016 21005 11028 15002 20008

The other variables (from 'a' to 'e') are:

depth
temperature
salinity
substrate-class
morphology-class

My target is to have 'groups of species' based on the similarity of their environmental parameters, and to build a dendrogram like [2].

The full dataset (1.5 MB) is available here [3].

[1] http://massimo-timecapsule.whoi.edu//data/img/subdataset.txt
[2] http://massimo-timecapsule.whoi.edu//data/img/manova_clust_matlab.png
[3] http://massimo-timecapsule.whoi.edu//data/img/x.txt

On Mar 9, 2012, at 7:18 PM, Peter Langfelder wrote:

> On Fri, Mar 9, 2012 at 1:50 PM, Massimo Di Stefano
> <massimodisa...@gmail.com> wrote:
>> Peter,
>>
>> really thanks for your answer.
>>
>>
>> install.packages("flashClust")
>> library(flashClust)
>> data <- read.csv('/Users/epifanio/Desktop/cluster/x.txt')
>> data <- na.omit(data)
>> data <- scale(data)
>>> mydata
>>              a             b            c          d          e
>> 1 -0.207709346 -6.618558e-01  0.481413046  0.7761133 0.96473124
>> 2 -0.207709346 -6.618558e-01  0.481413046  0.7761133 0.96473124
>> 3 -0.256330843 -6.618558e-01 -0.352285877  0.7761133 0.96473124
>> 4 -0.289039851 -6.618558e-01 -0.370032451 -0.2838308 0.96473124
>>
>>
>> my target is to group my observation by 'speciesID'
>> the speciesID is the last column : 'e'
>>
>>
>> Before to go ahead, i should understand how to tell R that the he has to
>> generate the groups using the column 'e' as variable,
>> so to have the groups by speciesID.
>>
>> using this instruction :
>>
>> d <- dist(data)
>> clust <- hclust(d)
>>
>> is not clear to me how R will understand to use the column 'e' as label.
>
> Well, you didn't say that column e was a label that you wanted to keep
> separate. Any other labels in the data? You may not want to use labels
> in the distance calculation.
>
> Do I understand right that you want to cluster each species separately?
>
> Peter

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
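
For the goal stated at the top of this message (groups of species based on the similarity of their environmental parameters), here is a minimal sketch of one possible approach, not a definitive answer. It keeps the species ID out of the distance calculation, averages the scaled environmental variables within each species, and clusters those per-species means. The data frame `obs` is synthetic and only stands in for the real table. The mapping of 'a' to 'e' onto depth, temperature, salinity, substrate-class and morphology-class is assumed from the order in which they are listed. Treating the two class columns as numeric and using average linkage are further assumptions.

```r
# Synthetic stand-in for the real table (hypothetical values):
# a..e are the environmental variables, f is the species ID used as label.
set.seed(1)
ids <- c(16001, 11012, 25011, 13011, 11029)
obs <- data.frame(
  a = runif(50, 10, 500),               # depth (assumed mapping)
  b = runif(50, 2, 15),                 # temperature
  c = runif(50, 30, 36),                # salinity
  d = sample(1:4, 50, replace = TRUE),  # substrate-class, treated as numeric
  e = sample(1:3, 50, replace = TRUE),  # morphology-class, treated as numeric
  f = rep(ids, each = 10)               # species ID (label only)
)

# Scale only the environmental variables; the label stays out of the distances.
env <- scale(obs[, c("a", "b", "c", "d", "e")])

# One row per species: mean of each scaled variable within that species.
centroids <- aggregate(as.data.frame(env),
                       by = list(species = obs$f), FUN = mean)
rownames(centroids) <- centroids$species
centroids$species <- NULL

# Hierarchical clustering of the species means; leaves are labelled by species ID.
d <- dist(centroids)
clust <- hclust(d, method = "average")
plot(clust, main = "Species grouped by environmental similarity")
```

With the real data, `obs` would be the table read from [1] or [3] after `na.omit()`. If the substrate and morphology classes are unordered categories, a distance designed for mixed data may be more appropriate than Euclidean distance on scaled codes.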