On Oct 21, 2011, at 3:02 PM, Rich Shepard wrote:
On Fri, 21 Oct 2011, David Winsemius wrote:
First you need to clarify whether "TDS" is the name of a column or a
possible value in a column named "param". This whole painful
multi-question process would be greatly accelerated if you offered
str(chemdata).
Yes, I did on a different thread, but not on this one.
str(chemdata)
'data.frame': 47244 obs. of 6 variables:
$ site : Factor w/ 143 levels "BC-0.5","BC-1",..: 134 134 134 127 127
$ sampdate: Date, format: "2006-12-06" "2006-12-06" ...
$ param : Factor w/ 66 levels "AGP","ANP","ANP/AGP",..: 58 66 12 24 59 66
$ quant : num 1.08e+04 7.95 1.80e-02 2.80e+02 1.90e+01 8.44 1.62e+03
$ stream : Factor w/ 24 levels "B","C",..: 4 4 4 21 21 21 4
$ basin : Factor w/ 2 levels "Basin1","Basin2": 1 1 1 1 1 1 1 1 1 2 ...
What I need to do is examine the relationships between the parameter
"TDS" and other parameters associated with it; e.g., "Cond" and "SO4".
How are we to determine which lines contain information about the
"relationships" of param=="TDS" with whatever cases or variables have
values of "Cond" and "SO4"? Are you really trying to compare two
disjoint groups on some statistic like the means and std-dev of
"quant"? (This would be a job for `aggregate`.)
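If the goal really is a per-group summary, a minimal sketch of the `aggregate` approach might look like the following. The data frame `toy` and its values are invented stand-ins for chemdata, not the poster's data; only the column names param, quant and basin come from the str() output above.

```r
# Toy stand-in for chemdata; the values are invented for illustration only.
toy <- data.frame(
  param = c("TDS", "TDS", "TDS", "Cond", "Cond", "SO4"),
  quant = c(100, 200, NA, 10, 30, 5),
  basin = c("Basin1", "Basin1", "Basin2", "Basin1", "Basin2", "Basin2")
)

# Mean and standard deviation of quant for each parameter.
# The formula interface drops rows with NA in quant by default
# (na.action = na.omit), so no na.rm argument is needed here.
stats <- aggregate(quant ~ param, data = toy,
                   FUN = function(x) c(mean = mean(x), sd = sd(x)))
print(stats)
```

Because FUN returns two values, stats$quant is a matrix with columns "mean" and "sd". Adding basin to the right-hand side (quant ~ param + basin) would give the same summary per parameter within each basin.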
I started by subsetting the main data frame (chemdata):
tds.basin <- subset(chemdata, param == "TDS",
                    select = c(param, quant, basin), na.rm = TRUE, drop = TRUE)
cond.basin <- subset(chemdata, param == "Cond",
                     select = c(param, quant, basin), na.rm = TRUE, drop = TRUE)
So now you have two disjoint subsets. Why should we think they can be
analyzed with regression methods?
However, these left the NA rows in the new data frames.
Not for the "param" column I hope. And the na.rm= arguments should get
ignored by subset.
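Since subset() has no na.rm argument, one way to drop the missing values is to test for them in the subsetting condition itself, or to run na.omit() on the result. The sketch below uses an invented data frame `toy` with the same column names as chemdata; the object names tds.toy and tds.toy2 are hypothetical.

```r
# Toy stand-in for chemdata; the values are invented for illustration only.
toy <- data.frame(
  param = c("TDS", "TDS", "Cond", "TDS", "Cond"),
  quant = c(100, NA, 10, 300, NA),
  basin = c("Basin1", "Basin1", "Basin1", "Basin2", "Basin2")
)

# Option 1: exclude missing quant values in the row condition.
tds.toy <- subset(toy, param == "TDS" & !is.na(quant),
                  select = c(param, quant, basin))

# Option 2: subset first, then drop every row that contains any NA.
tds.toy2 <- na.omit(subset(toy, param == "TDS",
                           select = c(param, quant, basin)))

print(tds.toy)
```

Option 1 only removes rows where quant is missing, while option 2 removes rows with an NA in any selected column, so the two can differ when other columns also contain missing values.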
I can produce an xyplot() using tds.basin$quant and cond.basin$quant,
but it's obvious there are many points where one or the other has NA
values. When I tried a linear regression it failed because of an
unequal number of rows in both data frames.
What I need to learn are: 1) how to write the subset() to remove the NA
rows for each one and 2) how to perform linear regression (and further
analyses) on these pairs of data frames.
If you do not offer both the code and the verbatim copy of the error
there will be very little that we can do to diagnose your problem.
str(tds.basin)
'data.frame': 2206 obs. of 3 variables:
$ param: Factor w/ 66 levels "AGP","ANP","ANP/AGP",..: 58 58 58 58 58 58 58
$ quant: num 10800 530 3838 3658 3756 ...
$ basin: Factor w/ 2 levels "Basin1","Basin2": 1 2 2 2 2 2 2 2 2 2 ...
str(cond.basin)
'data.frame': 1191 obs. of 3 variables:
$ param: Factor w/ 66 levels "AGP","ANP","ANP/AGP",..: 24 24 24 24 24 24 24
$ quant: num 280 3170 4220 3420 3700 ...
$ basin: Factor w/ 2 levels "Basin1","Basin2": 1 2 2 2 2 2 2 2 2 2 ...
then,
m1 <- lm(tds.basin$quant ~ cond.basin$quant)
Error in model.frame.default(formula = tds.basin$quant ~ cond.basin$quant,  :
  variable lengths differ (found for 'cond.basin$quant')
In regression calls it is almost always better to construct them with a
data argument:
m1 <- lm(tds.basin$quant ~ cond.basin$quant)
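The "variable lengths differ" error arises because the two subsets have different numbers of rows (2206 and 1191), so their quant columns cannot be matched position by position. A minimal sketch of one way to pair them and use a data argument follows. It assumes that a TDS value and a Cond value belong together when they share the same site and sampdate; the thread does not confirm that pairing rule, and the data frame `toy`, its values, and the names tds, cond and paired are invented for illustration.

```r
# Toy long-format data; the values are invented for illustration only.
# Site "C" has a TDS measurement but no matching Cond measurement.
toy <- data.frame(
  site     = c("A", "A", "B", "B", "C", "D", "D", "E", "E"),
  sampdate = as.Date(rep("2006-12-06", 9)),
  param    = c("TDS", "Cond", "TDS", "Cond", "TDS",
               "TDS", "Cond", "TDS", "Cond"),
  quant    = c(640, 1000, 1310, 2000, 500, 1940, 3000, 2610, 4000)
)

# One data frame per parameter, keeping the keys needed to pair rows.
tds  <- subset(toy, param == "TDS",  select = c(site, sampdate, quant))
cond <- subset(toy, param == "Cond", select = c(site, sampdate, quant))
names(tds)[names(tds) == "quant"]   <- "TDS"
names(cond)[names(cond) == "quant"] <- "Cond"

# Inner join: only site/date combinations present in both are kept,
# so both variables end up with the same number of rows.
paired <- merge(tds, cond, by = c("site", "sampdate"))

# Variables are now columns of one data frame, supplied via data =.
m1 <- lm(TDS ~ Cond, data = paired)
print(coef(m1))
```

With the variables held in a single data frame, lm() also drops any remaining rows with missing values itself through its default na.action, and the same `paired` object can be passed on to plotting or further modelling functions.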
Rich
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT