On Oct 21, 2011, at 3:02 PM, Rich Shepard wrote:

On Fri, 21 Oct 2011, David Winsemius wrote:

First you need to clarify whether "TDS" is the name of a column or a
possible value in a column named "param". This whole painful
multi-question process would be greatly accelerated if you offered
str(chemdata).

 Yes, I did on a different thread, but not on this one.

str(chemdata)
'data.frame':   47244 obs. of  6 variables:
$ site : Factor w/ 143 levels "BC-0.5","BC-1",..: 134 134 134 127 127
$ sampdate: Date, format: "2006-12-06" "2006-12-06" ...
$ param : Factor w/ 66 levels "AGP","ANP","ANP/AGP",..: 58 66 12 24 59 66
$ quant : num 1.08e+04 7.95 1.80e-02 2.80e+02 1.90e+01 8.44 1.62e+03
$ stream  : Factor w/ 24 levels "B","C",..: 4 4 4 21 21 21 4
$ basin : Factor w/ 2 levels "Basin1","Basin2": 1 1 1 1 1 1 1 1 1 2 ...

What I need to do is examine the relationships between the parameter "TDS"
and other parameters associated with it; e.g., "Cond" and "SO4".

How are we to determine which lines contain information about the "relationships" of param=="TDS" with whatever cases or variable has values of "Cond" and "SO4"? Are you really trying to compare two disjoint groups on some statistic like the means and std-dev of "quant"? (This would be a job for `aggregate`.)
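
For readers following along, a minimal sketch of the `aggregate` approach mentioned above. The data frame `chem` is a small invented stand-in for chemdata (the values are made up, not taken from the thread), and the result names `means` and `sds` are hypothetical:

```r
# Toy stand-in for chemdata; values are invented for illustration only.
chem <- data.frame(
  param = factor(c("TDS", "TDS", "TDS", "Cond", "Cond", "SO4", "SO4")),
  quant = c(100, 200, NA, 10, 30, 5, 15),
  basin = factor(c("Basin1", "Basin2", "Basin2", "Basin1",
                   "Basin1", "Basin2", "Basin2"))
)

# Keep only the parameters of interest and drop unused factor levels.
keep <- droplevels(subset(chem, param %in% c("TDS", "Cond", "SO4")))

# The formula method of aggregate() omits rows with NA in quant by default.
means <- aggregate(quant ~ param, data = keep, FUN = mean)
sds   <- aggregate(quant ~ param, data = keep, FUN = sd)
```

Using `quant ~ param + basin` instead would give one summary row per parameter within each basin.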

I started
by subsetting the main data frame (chemdata)

tds.basin <- subset(chemdata, param == "TDS", select = c(param, quant, \
basin), na.rm = TRUE, drop = TRUE)

cond.basin <- subset(chemdata, param == "Cond", select = c(param, quant, \
basin), na.rm = TRUE, drop = TRUE)

So now you have two disjoint subsets. Why should we think they can be analyzed with regression methods?


However, these left the NA rows in the new data frames.

Not for the "param" column I hope. And the na.rm= arguments should get ignored by subset.
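
A hedged sketch of one way to drop the NA rows: subset() on a data frame has no na.rm parameter, so the test for missing values has to go into the row condition itself. The data frame `chem` below is an invented stand-in for chemdata:

```r
# Toy stand-in for chemdata; values are invented for illustration only.
chem <- data.frame(
  param = factor(c("TDS", "TDS", "Cond", "TDS")),
  quant = c(10800, NA, 280, 530),
  basin = factor(c("Basin1", "Basin2", "Basin1", "Basin2"))
)

# Put the NA test in the row condition; na.rm = TRUE would simply be ignored.
tds <- subset(chem, param == "TDS" & !is.na(quant),
              select = c(param, quant, basin))
nrow(tds)
```

An alternative is to call na.omit() on the result of subset(), which drops rows with an NA in any column rather than just in quant.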


I can produce an xyplot() using tds.basin$quant and cond.basin$quant, but
it's obvious there are many points where one or the other has NA values.
When I tried a linear regression it failed because of an unequal number of
rows in both data frames.

What I need to learn are: 1) how to write the subset() to remove the NA
rows for each one and 2) how to perform linear regression (and further
analyses) on these pairs of data frames.

If you do not offer both the code and the verbatim copy of the error there
will be very little that we can do to diagnose your problem.

str(tds.basin)
'data.frame':   2206 obs. of  3 variables:
$ param: Factor w/ 66 levels "AGP","ANP","ANP/AGP",..: 58 58 58 58 58 58 58
$ quant: num  10800 530 3838 3658 3756 ...
$ basin: Factor w/ 2 levels "Basin1","Basin2": 1 2 2 2 2 2 2 2 2 2 ...

str(cond.basin)
'data.frame':   1191 obs. of  3 variables:
$ param: Factor w/ 66 levels "AGP","ANP","ANP/AGP",..: 24 24 24 24 24 24 24
$ quant: num  280 3170 4220 3420 3700 ...
$ basin: Factor w/ 2 levels "Basin1","Basin2": 1 2 2 2 2 2 2 2 2 2 ...

then,

m1 <- lm(tds.basin$quant ~ cond.basin$quant)
Error in model.frame.default(formula = tds.basin$quant ~ cond.basin$quant,  :
  variable lengths differ (found for 'cond.basin$quant')

In regression calls it is almost always better to construct them with a data argument:

m1 <- lm(tds.basin$quant ~ cond.basin$quant)
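
The call quoted above still passes two separate vectors, and those have 2206 and 1191 elements, so it fails the same way. A minimal sketch of one possible route to a call with a data argument follows. It assumes that a TDS value and a Cond value belong together when they share the same site and sampdate, which the thread does not confirm. The data frame `chem` is invented, and the names `paired`, `quant.tds` and `quant.cond` are hypothetical:

```r
# Toy stand-in for chemdata; values are invented for illustration only.
chem <- data.frame(
  site     = factor(rep(c("S1", "S2", "S3", "S4"), each = 2)),
  sampdate = as.Date("2006-12-06"),
  param    = factor(rep(c("TDS", "Cond"), times = 4)),
  quant    = c(640, 1000, 1310, 2000, 1950, 3000, NA, 4000)
)

# Keep the pairing keys (site, sampdate) in each subset and drop NA rows.
tds  <- subset(chem, param == "TDS"  & !is.na(quant),
               select = c(site, sampdate, quant))
cond <- subset(chem, param == "Cond" & !is.na(quant),
               select = c(site, sampdate, quant))

# Inner join: only samples that have both measurements survive.
paired <- merge(tds, cond, by = c("site", "sampdate"),
                suffixes = c(".tds", ".cond"))

# Both variables now come from one data frame, so their lengths always match.
m1 <- lm(quant.tds ~ quant.cond, data = paired)
```

If a site has several samples of the same parameter on one date, merge() will pair every TDS row with every Cond row for that key, so duplicates are worth checking before fitting.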


Rich

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
