Hi David,

Thank you so much for helping me!


On 01 May 2013, at 10:16, David Carlson <dcarl...@tamu.edu> wrote:

> You need to clarify what you are trying to achieve and fix some errors in
> your code. First, thanks for giving us reproducible data. 
> 

I tried to fix the errors. Thanks for your advice!


> Once you have read the file, you seem to be attempting to remove cases with
> missing values, but you check for missing values of "count" twice and you
> never check "depth." The whole line can be replaced with
> 
> dd <- na.omit(mat)
> 
> Now you have data with complete cases. In your next step you create a
> distance matrix that includes "idcode" as a variable! Although it is
> numeric, it is really a categorical variable. That suggests you need to read
> up on R and cluster analysis. It is very likely that you want to exclude
> this variable from the distance matrix and possibly the "count" variable as
> well. 


I excluded idcode and count from the distance matrix.
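
For reference, here is a minimal sketch of what I mean by excluding those columns, using a few rows from the sample in my first message. The column names come from my CSV; standardizing with scale() and using Euclidean dist() is only an assumption for illustration and may not be the right dissimilarity for these data.

```r
# A few rows from the sample data shown in the original message
dd <- data.frame(
  idcode  = c(16001, 16001, 16002, 16002),
  count   = c(136, 109, 82, 77),
  temp    = c(4.308, 4.310, 4.312, 4.325),
  sal     = c(32.828, 32.829, 32.832, 32.828),
  depth_m = c(63.46, 63.09, 63.28, 65.65),
  subs    = c(47, 49, 49, 46)
)

# Keep only the environmental variables: idcode is a label, and count is dropped too
env <- dd[, c("temp", "sal", "depth_m", "subs")]

# Standardize so that no single variable dominates (assumption: Euclidean distance)
distmat <- dist(scale(env))

# Average-linkage clustering, as in the original code
clusa <- hclust(distmat, method = "average")
```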

> 
> What does one row of data represent? You have 8036 complete cases
> representing data on 100 species. There are great differences in the number
> of rows for each species (idcode) ranging from 1 to 1066. 


I'm trying to clean up the data: I removed all the records whose species 
"idcode" occurs fewer than 100 times.
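
Roughly, the filtering step looks like the sketch below. The toy data and the name min_records are made up for illustration; on the real data the threshold is 100.

```r
# Toy data: species 1 has 3 records, species 2 has 1, species 3 has 2
dd <- data.frame(
  idcode = c(1, 1, 1, 2, 3, 3),
  temp   = c(4.31, 4.30, 4.32, 4.29, 4.33, 4.31)
)

min_records <- 2  # hypothetical threshold for the toy data (100 on the real data)

# Count records per species and keep the codes that reach the threshold
n_per_species <- table(dd$idcode)
keep <- names(n_per_species)[n_per_species >= min_records]

# table() names are character strings, so compare idcode as character
dd_common <- dd[as.character(dd$idcode) %in% keep, ]
```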

I uploaded the new data and code; see the link at [1].


Is this correct?
Can I go further and try to understand which species are assigned to each 
branch of the dendrogram at a specified "cut-level"?
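
What I have in mind is something along these lines: cutree() from base R returns a group number for each row at a given height, and the idcodes can then be listed per group. This is only a sketch on toy data; the values and the cut height are invented for illustration.

```r
# Toy data: two well-separated environmental regimes (values are invented)
dd <- data.frame(
  idcode = c(16001, 16001, 16002, 16002, 16003, 16003),
  temp   = c(4.30, 4.31, 4.32, 10.10, 10.20, 10.15),
  sal    = c(32.8, 32.8, 32.9, 35.0, 35.1, 35.0)
)

clusa <- hclust(dist(dd[, c("temp", "sal")]), method = "average")

# Assign every row to a branch by cutting the tree at height h (hypothetical value)
groups <- cutree(clusa, h = 1)

# Unique species codes found in each branch
species_by_branch <- lapply(split(dd$idcode, groups), unique)

# Cross-tabulation: how many records of each species fall in each branch
table(dd$idcode, groups)
```

Note that a species can appear in more than one branch, because the clustering is done on individual records, not on species.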

Thanks to all for any further help!

Massimo.


[1] http://nbviewer.ipython.org/5499800

> 
> -------------------------------------
> David L Carlson
> Associate Professor of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
> 
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of epi
> Sent: Tuesday, April 30, 2013 8:06 PM
> To: r-help@r-project.org
> Subject: [R] help understanding hierarchical clustering
> 
> Hi All,
> 
> I'm having trouble understanding how to use R to generate a hierarchical
> clustering. My data are in a CSV file and look like this:
> 
> idcode,count,temp,sal,depth_m,subs
> 16001,136,4.308,32.828,63.46,47
> 16001,109,4.31,32.829,63.09,49
> 16001,107,4.302,32.822,62.54,47
> 16001,87,4.318,32.834,62.54,48
> 16002,82,4.312,32.832,63.28,49
> 16002,77,4.325,32.828,65.65,46
> 16002,77,4.302,32.821,62.36,47
> 16002,71,4.299,32.832,65.84,37
> 16002,70,4.302,32.821,62.54,49
> 
> where idcode is a species identification number and the other fields are
> environmental parameters.
> 
> library(vegan)
> mat<-read.csv("http://epi.whoi.edu/ipython/results/mdistefano/pg_site1.csv",
> header=T)
> dd <- mat[!is.na(mat$idcode) &
>              !is.na(mat$temp) &
>              !is.na(mat$sal) &
>              !is.na(mat$count) &
>              !is.na(mat$count) &
>              !is.na(mat$subs),]
> distmat<-vegdist(dd)
> clusa<-hclust(distmat,"average")
> print(clusa)
>       Call:
>       hclust(d = distmat, method = "average")
>       
>       Cluster method   : average 
>       Distance         : bray 
>       Number of objects: 8036
> print(dend1 <- as.dendrogram(clusa))
>       'dendrogram' with 2 branches and 8036 members total, at height
> 0.3194225
> dend2 <- cut(dend1, h=0.07)
> 
> 
> a complete run with plots is available here :  
> 
> http://nbviewer.ipython.org/5492912
> 
> I'm trying to group together the species (idcodes) that share
> similar environmental parameters.
> 
> For example (looking at the plots), I should be able to retrieve the list of
> idcodes for each branch at "cut-level" X.
> 
> in the example :  
> 
> 
> X = 0.07 
> 
> branch1 : [idcodeA, .. .. ,idcodeJ]
> ..
> ..
> branch6 : [idcodeB, .. .. , idcodeK]
> 
> 
> 
> Many thanks for your precious help!!!
> 
> Massimo.
> 
> 
> 
>       [[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 


