First, do not send HTML messages to the list. They are converted to plain text, and your table ends up a mess (see below).

It appears the variables are all numeric? If so, there are two standard approaches to handling multiple scales and magnitudes with cluster analysis:
1. Use z-scores. The scale() function converts each variable into a standard score with a mean of 0 and a standard deviation of 1. Then use Euclidean distance in the dist() function, which adjusts for your missing values by leaving them out of each pairwise comparison and rescaling the sum.

2. Use prcomp() with scale. = TRUE (that is, a PCA on the correlation matrix of the variables) to extract a set of principal components, and use the principal component scores in the cluster analysis. This may allow you to reduce the number of variables in the data set if the 29 variables are correlated with one another.

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

From: Elizabeth Beck [mailto:elizabethbe...@gmail.com]
Sent: Friday, May 10, 2013 1:20 PM
To: dcarl...@tamu.edu
Cc: r-help@r-project.org
Subject: Re: [R] NMDS with missing data?

Hi David,

You are right in that Bray-Curtis is not suitable for my dataset, and that my variables are very different. Given your suggestions, I am struggling with how to transform or standardize my data given that they vary so much. Additionally, looking at the dist() function I am not sure which distance measure would be most appropriate. Euclidean seems to be the most widely used, but I'm not sure if it is appropriate for my data (there is much more help for ecology data than toxicology). Given a sample of my data below (a total of 287 obs. of 29 variables) can you suggest a starting point?

SODIUM   K  CL HCO3 ANION   CA    P GLUCOSE CHOLEST GGT GLDH   CK AST PROTEIN ALBUMIN GLOBULIN  A_G  UA BA CORTICO   T3    T4  THYROID
   145 3.3 102   24    22  2.9 2.45     9.8     5.7   3    3  678   5      34      15       19 0.79 180  6   70.97 1.31 12.77 0.102376
   146 3.2 102   21    26 2.89 2.68    11.1    6.78   3    4 1290   9      36      18       18    1 170 13    79.1 3.51 18.78 0.186751
   147 2.5 103   22    25 2.96 2.59      10    5.78   3    6 1582  11      35      17       18 0.94 272 10   65.84 1.84  15.5 0.118602
   148 2.5 101   21    29 2.91 2.91    10.6    5.83   3    3 1479   8      35      17       18 0.94 317  8    74.9 2.59 20.68 0.125389

Thank you!
Elizabeth

On Thu, May 9, 2013 at 7:50 AM, David Carlson <dcarl...@tamu.edu> wrote:

Since you pass your entire data.frame to metaMDS(), your first error probably comes from the fact that you have included ID as one of the variables. You should look at the results of str(dat).

You can drop cases with missing values using

> dat2 <- na.omit(dat)
> metaMDS(dat2[,-1])

which would run the analysis on all but the first column (ID), using only the cases that contain complete data. But that assumes that sex and exposure are not factors.

Or you could use one of the distance measures in dist(), which adjust for missing values. However, dist() does not have an option to use Bray-Curtis (the default in metaMDS()). Bray-Curtis is designed for comparing species counts or proportions, so it is not clear that it is an appropriate dissimilarity measure for your data. Further, your data seem to contain a mixture of measurement scales and/or magnitudes, so some variable standardization or transformation is probably necessary before you can get any useful results from MDS.

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Elizabeth Beck
Sent: Wednesday, May 8, 2013 3:39 PM
To: r-help@r-project.org
Subject: [R] NMDS with missing data?

Hi,

I'm trying to run NMDS (non-metric multidimensional scaling) with R vegan (metaMDS) but I have a few NAs in my data set. I've tried to run it 2 ways. The first way with my entire data set which includes variables such as ID, sex, exposure, treatment, sodium, potassium, chloride....
mydata.mds<-metaMDS(dat)

I get the following error:

Error in if (any(autotransform, noshare > 0, wascores) && any(comm < 0)) { :
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In Ops.factor(left, right) : < not meaningful for factors
2: In Ops.factor(left, right) : < not meaningful for factors
3: In Ops.factor(left, right) : < not meaningful for factors
4: In Ops.factor(left, right) : < not meaningful for factors
5: In Ops.factor(left, right) : < not meaningful for factors

The second way with only those last biochemical variables (29 in total).

mydata.mds<-metaMDS(measurements)

I get this error:

Error in if (any(autotransform, noshare > 0, wascores) && any(comm < 0)) { :
  missing value where TRUE/FALSE needed

My go-to "na.rm=TRUE" does nothing. Any ideas on how to account for NAs and, if so, which of the above options I should be using?

Thanks!
Elizabeth

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
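
For readers following the thread, the two approaches suggested in the reply at the top (z-scores passed to dist(), and prcomp() scores) might look roughly like the sketch below. It is only an illustration under stated assumptions: the data frame is made-up toy data with three invented columns standing in for the 29 biochemical variables, the clustering method and the choice of two components are arbitrary, and only base R functions are used.

```r
# Toy stand-in for the biochemical variables; all values are made up.
set.seed(1)
measurements <- data.frame(
  SODIUM  = rnorm(20, mean = 146,  sd = 2),
  CK      = rnorm(20, mean = 1200, sd = 300),
  THYROID = rnorm(20, mean = 0.12, sd = 0.03)
)
measurements$CK[c(3, 11)] <- NA   # a few missing values, as in the thread
measurements$SODIUM[7]    <- NA

# Approach 1: z-scores, then Euclidean distances.
# scale() leaves NAs out when computing each column's mean and sd.
# dist() skips missing values pairwise and scales the sum up in
# proportion to the number of columns actually used.
z  <- scale(measurements)
d  <- dist(z, method = "euclidean")
hc <- hclust(d, method = "average")   # arbitrary choice of linkage

# Approach 2: principal components of the standardized variables.
# prcomp() on a matrix does not accept NAs, so this sketch keeps
# complete cases only (an assumption, not part of the original advice).
complete <- na.omit(measurements)
pca      <- prcomp(complete, center = TRUE, scale. = TRUE)
scores   <- pca$x[, 1:2]              # first two components, chosen arbitrarily
d_pca    <- dist(scores)
```

Either distance object can then go into hclust() or a similar routine. As far as I know, metaMDS() in vegan also accepts a dist object in place of the raw data, which would sidestep the Bray-Curtis default, but check ?metaMDS before relying on that.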