You need to clarify what you are trying to achieve and fix some errors in your code. First, thanks for giving us reproducible data.
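As a rough sketch only, the two code fixes discussed in this reply (dropping incomplete rows with na.omit() and keeping "idcode" out of the distance matrix) might look like the following. It uses the sample rows from the original message plus one hypothetical row with a missing depth_m value. Several choices here are assumptions rather than anything stated in the thread: dropping "count" as well, standardizing the columns with scale(), using base R dist() as a stand-in for vegan's vegdist(), and cutting the tree into k = 2 groups.

```r
# Sample rows copied from the original message, plus one HYPOTHETICAL row
# (idcode 16003) with a missing depth_m so that na.omit() has something to drop.
csv_text <- "idcode,count,temp,sal,depth_m,subs
16001,136,4.308,32.828,63.46,47
16001,109,4.31,32.829,63.09,49
16001,107,4.302,32.822,62.54,47
16001,87,4.318,32.834,62.54,48
16002,82,4.312,32.832,63.28,49
16002,77,4.325,32.828,65.65,46
16002,77,4.302,32.821,62.36,47
16002,71,4.299,32.832,65.84,37
16002,70,4.302,32.821,62.54,49
16003,65,4.300,32.830,NA,45"
mat <- read.csv(text = csv_text, header = TRUE)

# Keep only complete cases; this checks every column, including depth_m.
dd <- na.omit(mat)

# Leave the identifier (and, as an assumption, count) out of the distances.
env <- dd[, c("temp", "sal", "depth_m", "subs")]

# Assumption: standardize the columns so no single variable dominates.
env_scaled <- scale(env)

# Euclidean distance from base R, standing in for vegdist() in this sketch.
distmat <- dist(env_scaled)
clusa <- hclust(distmat, method = "average")

# Assumption: cut into 2 groups, then tabulate which idcodes fall in each.
groups <- cutree(clusa, k = 2)
print(table(idcode = dd$idcode, cluster = groups))
```

cutree() also accepts a height through its h argument instead of k, which is closer to the "cut-level" idea in the original question. Whether rows should first be summarized per species depends on what one row represents.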
Once you have read the file, you seem to be attempting to remove cases with missing values, but you check for missing values of "count" twice and you never check "depth_m". The whole line can be replaced with

    dd <- na.omit(mat)

Now you have data with complete cases only. In your next step you create a distance matrix that includes "idcode" as a variable! Although it is numeric, it is really a categorical variable. That suggests you need to read up on R and cluster analysis. It is very likely that you want to exclude this variable from the distance matrix, and possibly the "count" variable as well.

What does one row of data represent? You have 8036 complete cases representing data on 100 species. There are great differences in the number of rows for each species (idcode), ranging from 1 to 1066.

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of epi
Sent: Tuesday, April 30, 2013 8:06 PM
To: r-help@r-project.org
Subject: [R] help understanding hierarchical clustering

Hi All,

I have a problem understanding how to use R to generate a hierarchical clustering. My data are in a CSV file and look like this:

idcode,count,temp,sal,depth_m,subs
16001,136,4.308,32.828,63.46,47
16001,109,4.31,32.829,63.09,49
16001,107,4.302,32.822,62.54,47
16001,87,4.318,32.834,62.54,48
16002,82,4.312,32.832,63.28,49
16002,77,4.325,32.828,65.65,46
16002,77,4.302,32.821,62.36,47
16002,71,4.299,32.832,65.84,37
16002,70,4.302,32.821,62.54,49

where idcode is a species identification number and the other fields are environmental parameters.

library(vegan)
mat<-read.csv("http://epi.whoi.edu/ipython/results/mdistefano/pg_site1.csv", header=T)
dd <- mat[!is.na(mat$idcode) & !is.na(mat$temp) & !is.na(mat$sal) & !is.na(mat$count) & !is.na(mat$count) & !is.na(mat$subs),]
distmat<-vegdist(dd)
clusa<-hclust(distmat,"average")

print(clusa)

Call:
hclust(d = distmat, method = "average")

Cluster method   : average
Distance         : bray
Number of objects: 8036

print(dend1 <- as.dendrogram(clusa))
'dendrogram' with 2 branches and 8036 members total, at height 0.3194225

dend2 <- cut(dend1, h=0.07)

A complete run with plots is available here: http://nbviewer.ipython.org/5492912

I am trying to group together the species (idcodes) that share similar environmental parameters. Looking at the plots, I should be able to retrieve the list of idcodes for each branch at "cut-level" X. In the example, with X = 0.07:

branch1 : [idcodeA, .. .. ,idcodeJ]
.. ..
branch6 : [idcodeB, .. .. , idcodeK]

Many thanks for your precious help!!!
Massimo.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.