Re: [R] Working With Variables Having Different Lengths

David Winsemius Fri, 21 Oct 2011 12:44:16 -0700


On Oct 21, 2011, at 3:02 PM, Rich Shepard wrote:

On Fri, 21 Oct 2011, David Winsemius wrote:
First you need to clarify whether "TDS" is the name of a column or a
possible value in a column named "param". This whole painful
multi-question process would be greatly accelerated if you offered
str(chemdata).
 Yes, I did on a different thread, but not on this one.

str(chemdata)
'data.frame':   47244 obs. of  6 variables:
$ site : Factor w/ 143 levels "BC-0.5","BC-1",..: 134 134 134 127 127
$ sampdate: Date, format: "2006-12-06" "2006-12-06" ...
$ param : Factor w/ 66 levels "AGP","ANP","ANP/AGP",..: 58 66 12 24 59 66
$ quant : num 1.08e+04 7.95 1.80e-02 2.80e+02 1.90e+01 8.44 1.62e+03
$ stream  : Factor w/ 24 levels "B","C",..: 4 4 4 21 21 21 4
$ basin : Factor w/ 2 levels "Basin1","Basin2": 1 1 1 1 1 1 1 1 1 2 ...
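Because chemdata itself is not available to the list, a small stand-in with the same columns lets others run code against it. The sketch below builds a hypothetical six-row data frame (the name `toy` and all values are invented) and uses `dput()` to print R code that recreates it exactly, one common way to meet the posting guide's request for reproducible code.

```r
# Hypothetical toy data in the same long layout as chemdata
# (one row per measurement). All values are invented for illustration.
toy <- data.frame(
  site     = factor(c("BC-1", "BC-1", "BC-1", "BC-2", "BC-2", "BC-2")),
  sampdate = as.Date(c("2006-12-06", "2006-12-06", "2006-12-06",
                       "2007-01-10", "2007-01-10", "2007-01-10")),
  param    = factor(c("TDS", "Cond", "SO4", "TDS", "Cond", "SO4")),
  quant    = c(500, 800, 200, 3800, 4200, NA),
  stream   = factor(c("B", "B", "B", "C", "C", "C")),
  basin    = factor(c("Basin1", "Basin1", "Basin1",
                      "Basin2", "Basin2", "Basin2"))
)

str(toy)   # compact structure summary, the same kind of output as str(chemdata)
dput(toy)  # prints R code that recreates `toy` exactly; paste it into a post
```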
What I need to do is examine the relationships between the parameter "TDS"
and other parameters associated with it; e.g., "Cond" and "SO4".

How are we to determine which lines contain information about the "relationships" of param=="TDS" with whatever cases or variable has values of "Cond" and "SO4"? Are you really trying to compare two disjoint groups on some statistic like the means and std-dev of "quant"? (This would be a job for `aggregate`.)
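If the question really is how TDS and Cond values differ as groups, a per-group summary is enough and no pairing of rows is needed. A minimal sketch with `aggregate()`, assuming a hypothetical data frame `toy` holding only the param, basin and quant columns (values invented):

```r
# Hypothetical long-format data; values are invented for illustration.
toy <- data.frame(
  param = factor(c("TDS", "TDS", "TDS", "TDS", "TDS",
                   "Cond", "Cond", "Cond", "Cond")),
  basin = factor(c("Basin1", "Basin1", "Basin2", "Basin2", "Basin2",
                   "Basin1", "Basin1", "Basin2", "Basin2")),
  quant = c(500, 540, 3800, 3600, NA,
            800, 860, 4200, 3400)
)

# Mean and SD of quant for every param/basin combination.
# The formula method of aggregate() uses na.action = na.omit by default,
# so the NA row is dropped before FUN is applied.
summ <- aggregate(quant ~ param + basin, data = toy,
                  FUN = function(x) c(mean = mean(x), sd = sd(x)))
summ
```

The result has one row per group, and `summ$quant` is a two-column matrix (mean, sd).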

I started by subsetting the main data frame (chemdata):
tds.basin <- subset(chemdata, param == "TDS", select = c(param, quant,
                    basin), na.rm = TRUE, drop = TRUE)
cond.basin <- subset(chemdata, param == "Cond", select = c(param, quant,
                     basin), na.rm = TRUE, drop = TRUE)

So now you have two disjoint subsets. Why should we think they can be analyzed with regression methods?
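Regression needs each TDS value paired with the Cond value measured in the same water sample. Assuming (an assumption, not something stated in the thread) that a sample is identified by site plus sampdate, a minimal sketch of that pairing with `merge()` on a hypothetical long-format frame `chem` (values invented):

```r
# Hypothetical long-format data; values are invented.
# Assumption: TDS and Cond belong together when site and sampdate match.
chem <- data.frame(
  site     = c("BC-1", "BC-1", "BC-2", "BC-2", "BC-3", "BC-4"),
  sampdate = as.Date(c("2006-12-06", "2006-12-06", "2007-01-10",
                       "2007-01-10", "2007-02-14", "2007-02-14")),
  param    = c("TDS", "Cond", "TDS", "Cond", "TDS", "Cond"),
  quant    = c(500, 800, 3800, 4200, 3600, 3400),
  basin    = c("Basin1", "Basin1", "Basin2", "Basin2", "Basin2", "Basin2")
)

# Keep the sampling keys in each subset so the rows can be matched later.
tds  <- subset(chem, param == "TDS",  select = c(site, sampdate, basin, quant))
cond <- subset(chem, param == "Cond", select = c(site, sampdate, basin, quant))

# Inner join on the keys: only site/date combinations that have BOTH a TDS
# and a Cond value survive, so quant.tds and quant.cond line up row by row.
paired <- merge(tds, cond, by = c("site", "sampdate", "basin"),
                suffixes = c(".tds", ".cond"))
paired
```

BC-3 (TDS only) and BC-4 (Cond only) drop out of the join, which is exactly why the two columns end up the same length.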


However, these left the NA rows in the new data frames.

Not for the "param" column I hope. And the na.rm= arguments should get ignored by subset.
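On the NA question: `subset.data.frame()` has no `na.rm` argument, so it lands in `...` and has no effect. Rows are kept only where the condition is TRUE (an NA condition counts as FALSE), so an NA in quant survives unless the condition tests for it. A minimal sketch that puts the NA test into the condition itself (the frame `chem` and its values are hypothetical):

```r
# Hypothetical data with some missing quant values.
chem <- data.frame(
  param = c("TDS", "TDS", "Cond", "TDS", "Cond"),
  quant = c(530, NA, 820, 3838, NA),
  basin = c("Basin1", "Basin2", "Basin1", "Basin2", "Basin2")
)

# Drop NA quant values as part of the row condition.
# (Wrapping the result in na.omit() would also work, but it drops rows
# with an NA in ANY selected column, not just quant.)
tds.basin <- subset(chem, param == "TDS" & !is.na(quant),
                    select = c(param, quant, basin))
tds.basin
```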

I can produce an xyplot() using tds.basin$quant and cond.basin$quant, but it's obvious there are many points where one or the other has NA values. When I tried a linear regression it failed because the two data frames have unequal numbers of rows.
What I need to learn is: 1) how to write the subset() call so that it removes the NA
rows for each one, and 2) how to perform linear regression (and further
analyses) on these pairs of data frames.
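Both questions can be handled on the long data directly by reshaping it so that each sampling event is one row and each parameter is one column. A minimal sketch with `stats::reshape()`, assuming (hypothetically) that site plus sampdate identify a sample and using an invented frame `chem`; `complete.cases()` then keeps only events where both TDS and Cond were measured:

```r
# Hypothetical long-format data; values are invented.
chem <- data.frame(
  site     = c("BC-1", "BC-1", "BC-1", "BC-2", "BC-2", "BC-3"),
  sampdate = as.Date(c("2006-12-06", "2006-12-06", "2006-12-06",
                       "2007-01-10", "2007-01-10", "2007-02-14")),
  param    = c("TDS", "Cond", "SO4", "TDS", "Cond", "TDS"),
  quant    = c(530, 820, 210, 3838, 4220, 3658)
)

# Long -> wide: one row per site/date, columns quant.TDS, quant.Cond, quant.SO4.
# Missing combinations become NA.
wide <- reshape(chem, idvar = c("site", "sampdate"), timevar = "param",
                direction = "wide")

# Keep only sampling events where both TDS and Cond are present.
wide <- wide[complete.cases(wide[, c("quant.TDS", "quant.Cond")]), ]
wide
```

After this step `lm(quant.TDS ~ quant.Cond, data = wide)` would use matched rows only, and other parameters such as SO4 are already available as columns for further analyses.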
If you do not offer both the code and the verbatim copy of the error there
will be very little that we can do to diagnose your problem.
str(tds.basin)
'data.frame':   2206 obs. of  3 variables:
$ param: Factor w/ 66 levels "AGP","ANP","ANP/AGP",..: 58 58 58 58 58 58 58
$ quant: num  10800 530 3838 3658 3756 ...
$ basin: Factor w/ 2 levels "Basin1","Basin2": 1 2 2 2 2 2 2 2 2 2 ...

str(cond.basin)
'data.frame':   1191 obs. of  3 variables:
$ param: Factor w/ 66 levels "AGP","ANP","ANP/AGP",..: 24 24 24 24 24 24 24
$ quant: num  280 3170 4220 3420 3700 ...
$ basin: Factor w/ 2 levels "Basin1","Basin2": 1 2 2 2 2 2 2 2 2 2 ...

then,

m1 <- lm(tds.basin$quant ~ cond.basin$quant)
Error in model.frame.default(formula = tds.basin$quant ~ cond.basin$quant, :
  variable lengths differ (found for 'cond.basin$quant')
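The error is raised by `model.frame()`, which `lm()` uses to assemble all variables into one table and which therefore requires equal lengths. Two separately subsetted vectors (here 4 and 3 invented values under hypothetical names) fail the same way; comparing `length()` first makes the mismatch obvious.

```r
tds_quant  <- c(10800, 530, 3838, 3658)  # invented: 4 TDS values
cond_quant <- c(280, 3170, 4220)         # invented: 3 Cond values

# Capture the error message instead of stopping, to show what lm() reports.
msg <- tryCatch(lm(tds_quant ~ cond_quant),
                error = function(e) conditionMessage(e))
msg

# A quick length check shows the problem before calling lm() at all.
c(tds = length(tds_quant), cond = length(cond_quant))
```

Even with equal lengths, positional pairing of two independent subsets would silently match unrelated samples, so the rows need to be matched on sampling keys rather than just trimmed.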

In regression calls it is almost always better to construct them with a data argument:

m1 <- lm(tds.basin$quant ~ cond.basin$quant)
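With a data argument the formula names columns of a single data frame, so the response and predictor are guaranteed to come from the same rows. A minimal sketch, assuming a hypothetical already-paired frame `paired` with TDS and Cond columns (values invented):

```r
# Hypothetical paired data: one row per sampling event, invented values.
paired <- data.frame(
  TDS  = c(530, 3838, 3658, 3756, 2900, NA),
  Cond = c(820, 4220, 3420, 3700, 3100, 2500)
)

# Formula terms refer to columns of `paired`; rows with an NA in either
# variable are dropped by the default na.action (na.omit).
m1 <- lm(TDS ~ Cond, data = paired)
summary(m1)
nobs(m1)  # number of rows actually used in the fit
```

The incomplete last row is removed before fitting, so `nobs(m1)` is 5 rather than 6.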


Rich

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

