On Fri, 21 Oct 2011, David Winsemius wrote:
First you need to clarify whether "TDS" is the name of a column or a possible value in a column named "param". This whole painful multi-question process would be greatly accelerated if you offered str(chemdata).
Yes, I did on a different thread, but not on this one.

  str(chemdata)
  'data.frame':   47244 obs. of  6 variables:
   $ site    : Factor w/ 143 levels "BC-0.5","BC-1",..: 134 134 134 127 127
   $ sampdate: Date, format: "2006-12-06" "2006-12-06" ...
   $ param   : Factor w/ 66 levels "AGP","ANP","ANP/AGP",..: 58 66 12 24 59 66
   $ quant   : num  1.08e+04 7.95 1.80e-02 2.80e+02 1.90e+01 8.44 1.62e+03
   $ stream  : Factor w/ 24 levels "B","C",..: 4 4 4 21 21 21 4
   $ basin   : Factor w/ 2 levels "Basin1","Basin2": 1 1 1 1 1 1 1 1 1 2 ...

What I need to do is examine the relationships between the parameter "TDS" and other parameters associated with it; e.g., "Cond" and "SO4". I started by subsetting the main data frame (chemdata):

  tds.basin <- subset(chemdata, param == "TDS",
                      select = c(param, quant, basin),
                      na.rm = TRUE, drop = TRUE)
  cond.basin <- subset(chemdata, param == "Cond",
                       select = c(param, quant, basin),
                       na.rm = TRUE, drop = TRUE)

However, these left the NA rows in the new data frames. I can produce an xyplot() using tds.basin$quant and cond.basin$quant, but it's obvious there are many points where one or the other has an NA value. When I tried a linear regression it failed because the two data frames have unequal numbers of rows.

What I need to learn is:

  1) how to write the subset() call so that it removes the NA rows from each data frame, and

  2) how to perform linear regression (and further analyses) on these pairs of data frames.
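On the first question, a minimal sketch of one possible approach: subset() does not document an na.rm argument, and as far as I can tell any extra argument of that kind is silently ignored, so the NA test has to go into the row condition itself. The data frame toy and the result tds.toy below are hypothetical stand-ins for chemdata and tds.basin, with invented values; only the column names come from the str() output above.

```r
# Toy stand-in for chemdata; the values are invented for illustration only.
toy <- data.frame(
  site     = c("BC-1", "BC-1", "BC-2", "BC-2", "BC-3"),
  sampdate = as.Date(c("2006-12-06", "2006-12-06", "2007-01-10",
                       "2007-01-10", "2007-02-14")),
  param    = c("TDS", "Cond", "TDS", "Cond", "TDS"),
  quant    = c(10800, 280, NA, 3170, 530),
  basin    = c("Basin1", "Basin1", "Basin2", "Basin2", "Basin2")
)

# Put the NA test in the row condition: keep rows where param is "TDS"
# and quant is not missing. site and sampdate are kept in select so that
# the measurements can later be matched to other parameters.
tds.toy <- subset(toy, param == "TDS" & !is.na(quant),
                  select = c(site, sampdate, quant, basin))

nrow(tds.toy)
```

Keeping site and sampdate in the selection is a design choice rather than a requirement: without some identifying columns there is no way to tell which TDS value belongs with which Cond value.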
If you do not offer both the code and the verbatim copy of the error there will be very little that we can do to diagnose your problem.
  str(tds.basin)
  'data.frame':   2206 obs. of  3 variables:
   $ param: Factor w/ 66 levels "AGP","ANP","ANP/AGP",..: 58 58 58 58 58 58 58
   $ quant: num  10800 530 3838 3658 3756 ...
   $ basin: Factor w/ 2 levels "Basin1","Basin2": 1 2 2 2 2 2 2 2 2 2 ...

  str(cond.basin)
  'data.frame':   1191 obs. of  3 variables:
   $ param: Factor w/ 66 levels "AGP","ANP","ANP/AGP",..: 24 24 24 24 24 24 24
   $ quant: num  280 3170 4220 3420 3700 ...
   $ basin: Factor w/ 2 levels "Basin1","Basin2": 1 2 2 2 2 2 2 2 2 2 ...

then,

  m1 <- lm(tds.basin$quant ~ cond.basin$quant)
  Error in model.frame.default(formula = tds.basin$quant ~ cond.basin$quant,  :
    variable lengths differ (found for 'cond.basin$quant')

Rich
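The error follows from the str() output: tds.basin has 2206 rows and cond.basin has 1191, and lm() needs both variables to have the same length, with row i of one corresponding to row i of the other. A hedged sketch of one way to get there is to match the two subsets on the columns that identify a sample and fit the model on the merged result. This assumes that site and sampdate together identify a sample, which the thread does not confirm; if a site has several samples on one date, merge() would pair every combination. All object names and values below are hypothetical.

```r
# Toy long-format data; the values are invented for illustration only.
toy <- data.frame(
  site     = rep(c("S1", "S2", "S3", "S4"), each = 2),
  sampdate = rep(as.Date(c("2006-12-06", "2007-01-10",
                           "2007-02-14", "2007-03-20")), each = 2),
  param    = rep(c("TDS", "Cond"), times = 4),
  quant    = c(1000, 1500, 2000, 3100, 3000, NA, 4000, 6050)
)

# One subset per parameter, dropping missing measurements.
tds  <- subset(toy, param == "TDS"  & !is.na(quant),
               select = c(site, sampdate, quant))
cond <- subset(toy, param == "Cond" & !is.na(quant),
               select = c(site, sampdate, quant))

# Inner join: keep only samples that have both a TDS and a Cond value.
# The suffixes turn the two quant columns into quant.tds and quant.cond.
paired <- merge(tds, cond, by = c("site", "sampdate"),
                suffixes = c(".tds", ".cond"))

# Both variables now come from the same rows, so their lengths agree.
m1 <- lm(quant.tds ~ quant.cond, data = paired)
coef(m1)
```

Fitting with data = paired rather than with two separate data frames also makes later steps such as predict() or residual plots refer to a single, consistent set of rows.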