Hi David, thank you so much for helping me!

On 1 May 2013, at 10:16, David Carlson <dcarl...@tamu.edu> wrote:

> You need to clarify what you are trying to achieve and fix some errors in
> your code. First, thanks for giving us reproducible data.

I tried to fix the errors and uploaded a new link to the data and code [1].
Thanks for your advice!

I'll try to describe the dataset. The csv stores information recorded by an
underwater towed camera [imagename, temp, sal, depth_m] plus 3 fields added
later by an image analyst [idcode, count, subs], so each row in the data
consists of:

- idcode (unique identifier for the species)
- count (how many individuals of species 'J' are found in image 'X')
- temp (temperature)
- sal (salinity)
- depth_m (depth in meters)
- subs (substrate complexity, an integer describing the seafloor texture
  [hard <-> soft bottom])

The csv looks like:

idcode  count  temp   sal     depth_m  subs
16001   136    4.308  32.828  63.46    47
..
10010   1      4.342  32.865  83.58    35

> Once you have read the file, you seem to be attempting to remove cases with
> missing values, but you check for missing values of "count" twice and you
> never check "depth." The whole line can be replaced with
>
> dd <- na.omit(mat)

My mistake, sorry about that. Fixed in the code.

> Now you have data with complete cases. In your next step you create a
> distance matrix that includes "idcode" as a variable! Although it is
> numeric, it is really a categorical variable. That suggests you need to read
> up on R and cluster analysis. It is very likely that you want to exclude
> this variable from the distance matrix and possibly the "count" variable as
> well.

Big mistake here: idcode is my categorical value, the one I'm trying to group
into classes. Fixed in the code. I am now running the code both including
count [dd1] and without count [dd2]. The count should express the density of
each species under the associated environmental parameters (I thought it was
important, isn't it?).
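
To make the dd1 / dd2 comparison concrete, here is a minimal sketch of what I mean. It is only an illustration under assumptions: the rows are toy values in the layout of the csv (not the real data), I use base R dist() on scaled columns instead of vegan's vegdist() so that it runs without extra packages, and k = 2 is an arbitrary choice. idcode is left out of both distance matrices and is only used afterwards to see which species fall into which cluster.

```r
# Toy rows in the layout of the csv (illustrative values, not the real data)
dd <- data.frame(
  idcode  = c(16001, 16001, 16002, 16002, 10010, 10010),
  count   = c(136, 109, 82, 77, 1, 3),
  temp    = c(4.308, 4.310, 4.312, 4.325, 4.342, 4.340),
  sal     = c(32.828, 32.829, 32.832, 32.828, 32.865, 32.860),
  depth_m = c(63.46, 63.09, 63.28, 65.65, 83.58, 82.90),
  subs    = c(47, 49, 49, 46, 35, 36)
)

# Drop incomplete cases in one step, as David suggested
dd <- na.omit(dd)

env <- c("temp", "sal", "depth_m", "subs")

# dd1: count plus environment; dd2: environment only.
# idcode is a label, so it is excluded from both.
# scale() puts the columns on a comparable footing before computing distances.
dd1 <- scale(dd[, c("count", env)])
dd2 <- scale(dd[, env])

# Euclidean distance on scaled data (assumption: swapped in for vegdist)
clus1 <- hclust(dist(dd1), method = "average")
clus2 <- hclust(dist(dd2), method = "average")

# Cut each tree into k groups (k = 2 is arbitrary here)
groups1 <- cutree(clus1, k = 2)
groups2 <- cutree(clus2, k = 2)

# Cross-tabulate species against cluster membership
tab <- table(idcode = dd$idcode, cluster = groups2)
print(tab)
```

cutree() also accepts h = instead of k =, which would correspond to reading the groups off at a given "cut-level" of the dendrogram.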

> What does one row of data represent? You have 8036 complete cases
> representing data on 100 species. There are great differences in the number
> of rows for each species (idcode) ranging from 1 to 1066.

To clean up the dataset, should I remove the records for the idcodes that are
not well represented (idcodes with a low number of records), so as to have a
subset of representative species?

  idcodelist = [id_1, .. , id_N] with count(id_i) >= X

Note: each record in the data refers to a single species identified in an
image, which means that there are multiple records for the same image (one
record for each species identified in that image).

In the database I have a unique [imagename] and a position [lon lat] for each
image. Should I include this information in my csv, so that it looks like:

idcode  count  temp   sal     depth_m  subs  lon  lat  imagename
16001   136    4.308  32.828  63.46    47    x1   y1   image_year_day_h_m_ms_1
18005   15     4.308  32.828  63.46    47    x1   y1   image_year_day_h_m_ms_1
..
10010   5      4.342  31.925  82.18    35    xN   yN   image_year_day_h_m_ms_N
13010   1      4.342  31.925  82.18    35    xN   yN   image_year_day_h_m_ms_N

and then group my data by [imagename], adding a field for each representative
species in which to store its count? The example above would then look like:

count_id_1  count_id_2  count_id_5  count_id_9  count_id_N-1  count_id_N  temp   sal     depth_m  subs  lon  lat  imagename
136         0           15          0           0             0           4.308  32.828  63.46    47    x1   y1   image_year_day_h_m_ms_1
..
0           5           0           0           1             0           4.342  31.925  82.18    35    xN   yN   image_year_day_h_m_ms_N

where:

  count_id_1    is the count for the species with idcode 16001 in the image Xi
  count_id_5    // 16005 //
  count_id_2    // 10010 //
  count_id_N-1  // 13010 //

Thank you for any further advice,

Massimo.
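
The two steps I am asking about (keeping only well-represented idcodes, then one row per image) could be sketched roughly as below. This is only a sketch under assumptions: the records are toy values, lon/lat are omitted, min_records stands in for the threshold X, and the image names (img_1, ..) and the count_<idcode> column naming are placeholders I made up for the example.

```r
# Toy long-format records: one row per species per image (illustrative values)
long <- data.frame(
  imagename = c("img_1", "img_1", "img_2", "img_2", "img_3", "img_3"),
  idcode    = c(16001, 18005, 10010, 13010, 16001, 10010),
  count     = c(136, 15, 5, 1, 20, 2),
  temp      = c(4.308, 4.308, 4.342, 4.342, 4.320, 4.320),
  sal       = c(32.828, 32.828, 31.925, 31.925, 32.500, 32.500),
  depth_m   = c(63.46, 63.46, 82.18, 82.18, 70.00, 70.00),
  subs      = c(47, 47, 35, 35, 40, 40)
)

# Step 1: keep only idcodes with at least min_records records
# (min_records plays the role of X; 2 is an arbitrary choice)
min_records <- 2
n_records <- table(long$idcode)
keep <- as.numeric(names(n_records)[n_records >= min_records])
long_sub <- long[long$idcode %in% keep, ]

# Step 2: one row per image, one count column per retained species.
# xtabs() sums count per (imagename, idcode) and fills absences with 0.
counts <- xtabs(count ~ imagename + idcode, data = long_sub)
counts_df <- as.data.frame.matrix(counts)
names(counts_df) <- paste0("count_", names(counts_df))
counts_df$imagename <- rownames(counts_df)

# Environmental values are repeated on every record of an image,
# so unique() leaves one row per image
env <- unique(long_sub[, c("imagename", "temp", "sal", "depth_m", "subs")])

wide <- merge(env, counts_df, by = "imagename")
print(wide)
```

With this layout each row is an image (a site), which is the shape that site-by-species functions such as vegdist() usually expect for the count columns.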

[1] http://nbviewer.ipython.org/5497996

> -------------------------------------
> David L Carlson
> Associate Professor of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of epi
> Sent: Tuesday, April 30, 2013 8:06 PM
> To: r-help@r-project.org
> Subject: [R] help understanding hierarchical clustering
>
> Hi All,
>
> i've problem to understand how to work with R to generate a hierarchical
> clustering my data are in a csv and looks like :
>
> idcode,count,temp,sal,depth_m,subs
> 16001,136,4.308,32.828,63.46,47
> 16001,109,4.31,32.829,63.09,49
> 16001,107,4.302,32.822,62.54,47
> 16001,87,4.318,32.834,62.54,48
> 16002,82,4.312,32.832,63.28,49
> 16002,77,4.325,32.828,65.65,46
> 16002,77,4.302,32.821,62.36,47
> 16002,71,4.299,32.832,65.84,37
> 16002,70,4.302,32.821,62.54,49
>
> where idcode is a specie identification number and the other fields are
> environmental parameters.
>
> library(vegan)
> mat<-read.csv("http://epi.whoi.edu/ipython/results/mdistefano/pg_site1.csv",
> header=T)
> dd <- mat[!is.na(mat$idcode) &
> !is.na(mat$temp) &
> !is.na(mat$sal) &
> !is.na(mat$count) &
> !is.na(mat$count) &
> !is.na(mat$subs),]
> distmat<-vegdist(dd)
> clusa<-hclust(distmat,"average")
> print(clusa)
> Call:
> hclust(d = distmat, method = "average")
>
> Cluster method : average
> Distance : bray
> Number of objects: 8036
>
> print(dend1 <- as.dendrogram(clusa))
> 'dendrogram' with 2 branches and 8036 members total, at height 0.3194225
> dend2 <- cut(dend1, h=0.07)
>
> a complete run with plots is available here :
>
> http://nbviewer.ipython.org/5492912
>
> i'm trying try to group together the species (idcode's) that are sharing
> similar environmental parameters
>
> like (looking at the plots) i should be able to retrieve the list of idcode
> for each branch at "cut-level" X
>
> in the example :
>
> X = 0.07
>
> branches1 : [idcodeA, .. .. ,idcodeJ]
> ..
> ..
> branche6 : [idcodeB, .. .. , idcodeK]
>
> Many thanks for your precious help!!!
>
> Massimo.
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.