Hi David, thank you so much for helping me!

On 1 May 2013, at 10:16, David Carlson <dcarl...@tamu.edu> wrote:

> You need to clarify what you are trying to achieve and fix some errors in
> your code. First, thanks for giving us reproducible data.

I tried to fix the errors, thanks for your advice!

> Once you have read the file, you seem to be attempting to remove cases with
> missing values, but you check for missing values of "count" twice and you
> never check "depth." The whole line can be replaced with
>
> dd <- na.omit(mat)
>
> Now you have data with complete cases. In your next step you create a
> distance matrix that includes "idcode" as a variable! Although it is
> numeric, it is really a categorical variable. That suggests you need to read
> up on R and cluster analysis. It is very likely that you want to exclude
> this variable from the distance matrix and possibly the "count" variable as
> well.

I excluded idcode and count from the distance matrix.

> What does one row of data represent? You have 8036 complete cases
> representing data on 100 species. There are great differences in the number
> of rows for each species (idcode) ranging from 1 to 1066.

I'm trying to clean up the data: I removed all the records where the species "idcode" is found fewer than 100 times.

I uploaded a new link to the new data and code [1]. Is this correct?

Can I go further and try to understand which species are assigned to each branch of the dendrogram at a specified "cut-level"?

Thanks, all, for any further help!

Massimo.
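
To make the "which species fall in which branch" question concrete, here is a minimal sketch of one way it could be approached with `cutree()` on the `hclust` result. It is only an illustration under stated assumptions: the data frame is a small simulated stand-in for pg_site1.csv (same column names, made-up values), the record threshold of 10 and the cut level `X = 1.5` are hypothetical, and base R `dist()` on scaled variables is swapped in for `vegdist()` so the example runs without vegan.

```r
# Toy stand-in for pg_site1.csv (hypothetical values, same column names).
set.seed(42)
n <- 30
mat <- data.frame(
  idcode  = rep(c(16001, 16002, 16003), each = n),
  count   = rpois(3 * n, 80),
  temp    = rnorm(3 * n, mean = rep(c(4.3, 6.0, 8.0), each = n), sd = 0.1),
  sal     = rnorm(3 * n, mean = rep(c(32.8, 33.5, 34.2), each = n), sd = 0.05),
  depth_m = rnorm(3 * n, mean = rep(c(63, 120, 200), each = n), sd = 2),
  subs    = rnorm(3 * n, mean = rep(c(47, 30, 10), each = n), sd = 2)
)
mat$depth_m[c(5, 50)] <- NA            # a few incomplete rows

dd <- na.omit(mat)                     # complete cases only

# Drop rarely recorded species (the thread uses 100; 10 suits the toy data).
min_records <- 10
keep <- names(which(table(dd$idcode) >= min_records))
dd <- dd[dd$idcode %in% keep, ]

# Environmental variables only: idcode and count stay out of the distances.
# Assumption: Euclidean distance on scaled variables instead of vegdist().
env <- scale(dd[, c("temp", "sal", "depth_m", "subs")])
clusa <- hclust(dist(env), method = "average")

# One group label per row at the chosen cut level (X is hypothetical).
X <- 1.5
grp <- cutree(clusa, h = X)

# Rows = species, columns = branches, cells = number of records.
branch_table <- table(idcode = dd$idcode, branch = grp)
print(branch_table)

# Species present in each branch.
species_by_branch <- lapply(split(dd$idcode, grp), unique)
print(species_by_branch)
```

Because each row is one record rather than one species, a species can appear in more than one branch; the cross-table shows how its records are split. With the real data, `vegdist()` on the same environmental columns could replace `dist()` if Bray-Curtis distances are wanted.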

[1] http://nbviewer.ipython.org/5499800

> -------------------------------------
> David L Carlson
> Associate Professor of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of epi
> Sent: Tuesday, April 30, 2013 8:06 PM
> To: r-help@r-project.org
> Subject: [R] help understanding hierarchical clustering
>
> Hi All,
>
> I have a problem understanding how to work with R to generate a hierarchical
> clustering. My data are in a CSV file and look like this:
>
> idcode,count,temp,sal,depth_m,subs
> 16001,136,4.308,32.828,63.46,47
> 16001,109,4.31,32.829,63.09,49
> 16001,107,4.302,32.822,62.54,47
> 16001,87,4.318,32.834,62.54,48
> 16002,82,4.312,32.832,63.28,49
> 16002,77,4.325,32.828,65.65,46
> 16002,77,4.302,32.821,62.36,47
> 16002,71,4.299,32.832,65.84,37
> 16002,70,4.302,32.821,62.54,49
>
> where idcode is a species identification number and the other fields are
> environmental parameters.
>
> library(vegan)
> mat<-read.csv("http://epi.whoi.edu/ipython/results/mdistefano/pg_site1.csv",
> header=T)
> dd <- mat[!is.na(mat$idcode) &
> !is.na(mat$temp) &
> !is.na(mat$sal) &
> !is.na(mat$count) &
> !is.na(mat$count) &
> !is.na(mat$subs),]
> distmat<-vegdist(dd)
> clusa<-hclust(distmat,"average")
> print(clusa)
> Call:
> hclust(d = distmat, method = "average")
>
> Cluster method : average
> Distance : bray
> Number of objects: 8036
> print(dend1 <- as.dendrogram(clusa))
> 'dendrogram' with 2 branches and 8036 members total, at height
> 0.3194225
> dend2 <- cut(dend1, h=0.07)
>
> A complete run with plots is available here:
>
> http://nbviewer.ipython.org/5492912
>
> I'm trying to group together the species (idcodes) that share
> similar environmental parameters.
>
> Looking at the plots, I should be able to retrieve the list of idcodes
> for each branch at "cut-level" X.
>
> In the example:
>
> X = 0.07
>
> branch1 : [idcodeA, .. .. ,idcodeJ]
> ..
> ..
> branch6 : [idcodeB, .. .. , idcodeK]
>
> Many thanks for your precious help!!!
>
> Massimo.
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.