I would appreciate any insight into the behavior of "by" in the following case. The Details section of the help page for "by" has a note documenting behavior since v2.7, but I don't entirely understand what it is saying. I'm using R 2.7.2 on Windows. I'd like to know whether the behavior below is a change or whether "by" has always worked this way. I searched with RSiteSearch and read through the version changes but found nothing.

Take a data frame as follows:
> samples
  Region.Label  Area Sample.Label Effort Label
1             1 10000            1    100    11
2             1 10000            2    100    12
3             1 10000            3    100    13
4             1 10000            4    100    14
5             1 10000            5    100    15
6             1 10000            6    100    16
7             1 10000            7    100    17
8             1 10000            8    100    18
9             1 10000            9    100    19
10            1 10000           10    100   110
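
For anyone who wants to try this, something like the following should rebuild the data frame shown above. It is only a sketch: I am assuming every column is a plain numeric vector, which the printout alone cannot confirm, so it may not reproduce the result below in every session or R version.

```r
# Rebuild the example data frame from the printout above.
# Assumption: all columns are plain numeric vectors.
samples <- data.frame(
  Region.Label = rep(1, 10),
  Area         = rep(10000, 10),
  Sample.Label = 1:10,
  Effort       = rep(100, 10),
  Label        = c(11:19, 110)
)

# Inspect the column types; the printed table does not show them.
str(samples)
```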

Use "by" to count the entries for each value of Region.Label (in this case Region.Label has only one value):

> by(samples$Effort,samples$Region.Label,length)
INDICES: 1
[1] 1

I expected to get 10 instead of 1. Stepping through by.data.frame in the debugger, I can see that it subsets with drop=FALSE, so length receives a one-column data frame and returns the number of columns, which is 1. But either of the following gives the 10 I expect:

> by(rep(1,10),samples$Region.Label,length)
samples$Region.Label: 1
[1] 10
> by(samples$Label,samples$Region.Label,length)
samples$Region.Label: 1
[1] 10
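
The difference between length on a vector and length on a one-column data frame can be checked in isolation. This is a minimal sketch with made-up names (effort, one_col, region); it only illustrates what length and nrow return for each kind of object and does not explain why "by" treats samples$Effort differently from the other vectors.

```r
# Hypothetical stand-ins for the Effort column and the grouping variable.
effort  <- rep(100, 10)
region  <- rep(1, 10)
one_col <- data.frame(Effort = effort)

# On a plain vector, length counts elements.
n_vec <- length(effort)

# On a data frame, length counts columns, even after row subsetting
# with drop = FALSE (the subset is still a data frame).
n_df <- length(one_col[seq_len(10), , drop = FALSE])

# nrow counts rows, so it is the safer function to hand to by()
# when the data argument is a data frame.
n_rows <- nrow(one_col[seq_len(10), , drop = FALSE])
res <- by(one_col, region, nrow)
```

Here n_vec is 10, n_df is 1, and both n_rows and the single entry of res are 10.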

Also, if I use "tapply" with samples$Effort instead of "by", I get the 10 I expect.

> tapply(samples$Effort,samples$Region.Label,length)
1
10
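
As a self-contained version of the tapply call, the sketch below uses stand-in vectors (effort and region are made-up names, assumed to be plain numeric vectors). It also shows table as another way to count entries per group, since only the group sizes are wanted here.

```r
# Hypothetical stand-ins for samples$Effort and samples$Region.Label.
effort <- rep(100, 10)
region <- rep(1, 10)

# tapply splits the vector itself, so length sees 10 elements per group.
counts <- tapply(effort, region, length)

# table counts occurrences of each grouping value directly.
tab <- table(region)
```

Both counts[["1"]] and tab[["1"]] are 10.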

I do not understand why I'm getting these differences, but I can see that I'm going to use tapply from now on.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.