Hi Petr, a couple of comments inserted below.
Petr PIKAL wrote:
Hi
r-help-boun...@r-project.org wrote on 04.02.2010 11:31:51:
Petr PIKAL wrote:
Hi
So do you think I should file a bug report? I would rather wait to see
whether there is some reaction from others. Maybe there is some reason
behind such behaviour. Those simple statistics tend to behave differently
when operating on data.frames, so median is not such a huge surprise. See
sd(df1), var(df1), mean(df1), max(df1), min(df1), range(df1)
The results produced are usually clearly documented; however, for a novice
it is rather mysterious why using those functions on a vector produces
easily understandable results, while using them on a data.frame (which is
the most common data structure) is far from consistent and intuitive.
But I agree with you that mean and median should, in the best case, give
results with a similar structure.
Regards
Petr
Well, I don't think that it's a bug since the documentation
for median() does not indicate that median should work for
dataframes, whereas for mean() it clearly says that a method
exists. methods('mean') and methods('median') as well as
mean.default(df1) are informative.
That depends on who finds it informative. Here is a snippet from the median help page:
This is a generic function for which methods can be written. However, the
default method makes use of sort and mean, both of which are generic, and
so the default method will work for most classes
^^^^^^^^^
I agree that this is at the very least misleading. The default method
certainly does not work for data.frames and I wouldn't consider those
to be an unusual class. Still, at this point, I think we're talking
more about a wishlist item than a bug.
I must admit, I've never run across this situation. Good of you
to spot it.
-Peter Ehlers
If you consider data.frame an unusual class I could accept your point, but
if the help page tells me that a function works for most classes I would not
expect that the data.frame class should be avoided, especially when the
workaround is so simple (for an experienced user). As I said, if I
encountered this in the real world I could easily make it work with *apply.
I tried to show my audience that a matrix differs from a data.frame with
respect to such simple statistical functions. But how do you explain that
using mean on a matrix produces one number, while using it on a data.frame
produces a mean separately for each column? I wanted to show that it is
similar for median but, being such a candid moron, I luckily tried it
before I presented it. :-)
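A minimal sketch of the contrast described above, using only base R. The sapply() calls are the *apply route mentioned earlier; what mean(df1) and median(df1) themselves return depends on the R version, so they are deliberately not called here. The variable names are illustrative.

```r
mat <- matrix(1:16, 4, 4)
df1 <- data.frame(mat)

# On a matrix, mean() and median() treat all 16 values as one vector.
mat_mean   <- mean(mat)     # 8.5
mat_median <- median(mat)   # 8.5

# Column-wise statistics for the data frame, computed explicitly.
col_means   <- sapply(df1, mean)    # X1 = 2.5, X2 = 6.5, X3 = 10.5, X4 = 14.5
col_medians <- sapply(df1, median)  # same values here: each column is 4 consecutive integers
```

Spelling out the column-wise computation this way avoids relying on whichever method a given statistic happens to have for data frames.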
It seems to me to be a simple fix so I wonder what I'm
missing. Paraphrasing mean.data.frame:
median.data.frame <- function(x, ...) sapply(x, median, ...)
I think that it would be desirable to have similar behaviour
for both functions or at least a warning if median.default
is incorrectly applied to a data.frame object.
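A sketch of how the proposed method would behave, assuming it is defined in the user's own workspace rather than in base R. The definition is the one paraphrased above; the data are the seven-column example from this thread.

```r
# Proposed method, as paraphrased from mean.data.frame in the thread.
median.data.frame <- function(x, ...) sapply(x, median, ...)

df1 <- data.frame(matrix(1:28, 4, 7))

# S3 dispatch now selects median.data.frame instead of median.default,
# so each column is handed to median() as an ordinary vector.
res <- median(df1)
# X1 = 2.5, X2 = 6.5, ..., X7 = 26.5: one median per column
```

With this in place the result has the same shape as the column-wise means shown for mean(df1) elsewhere in the thread.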
Agreed. For the benefit of novices I would vote for changing behaviour for
data.frames to get mean-like behaviour.
Regards
Petr
-Peter Ehlers
r-help-boun...@r-project.org wrote on 04.02.2010 10:28:16:
Well, I get the same as Petr with R version 2.10.0 (2009-10-26)
on Linux.
To me, this suggests that median is broken! Any user would,
a priori, expect that median() should operate in exactly
the same way as mean(). To extend Petr's example:
mat <- matrix(1:32, 4,8)
df1 <- data.frame(mat)
mean(df1)
# X1 X2 X3 X4 X5 X6 X7 X8
# 2.5 6.5 10.5 14.5 18.5 22.5 26.5 30.5
median(df1)
# [1] 14.5 18.5
so (as in Petr's original example, but more clearly) median()
returns the medians of the two "central" columns X4 and X5 of df1.
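One plausible reading of this result, offered as an assumption rather than a confirmed trace of median.default: a data frame is a list of columns, so length(df1) is the number of columns, and a routine that picks the "middle elements" of its argument ends up picking the middle columns. The sketch below reproduces the printed numbers by hand; the variable names are illustrative.

```r
df1 <- data.frame(matrix(1:32, 4, 8))

n    <- length(df1)        # 8: the number of columns, not the number of values
half <- (n + 1L) %/% 2L    # 4: index of the lower "middle" element

# The two "middle" elements of the list are columns X4 and X5.
middle <- df1[half + 0:1]

# Averaging each of them gives the values printed by median(df1) above.
middle_means <- sapply(middle, mean)   # X4 = 14.5, X5 = 18.5
```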
But that is with an even number of columns. Now look at what
happens with an odd number:
mat <- matrix(1:28, 4,7)
df1 <- data.frame(mat)
mean(df1)
# X1 X2 X3 X4 X5 X6 X7
# 2.5 6.5 10.5 14.5 18.5 22.5 26.5
median(df1)
# structure(c("13", "14", "15", "16"), class = "AsIs")
# 1 13
# 2 14
# 3 15
# 4 16
Wow!!!!!!!!!!
This does suggest a tie-in with Petr's observation about "As.Is",
and there is no doubt at all that the above result is rubbish.
It is certainly not what a user would expect, and in the context
of Petr's intention to present R lessons to a class, I could
foresee students turning their backs on R if they came up with
such a result in their early encounters!
Ted.
On 04-Feb-10 08:59:59, Mario Valle wrote:
Linux 2.9.0 gives:
median(df1)
[1] 34
Even stranger...
mario
Petr PIKAL wrote:
While experimenting in preparation for some R lessons, I encountered this
behaviour, which I cannot fully explain:
mat <- matrix(1:16, 4,4)
df1 <- data.frame(mat)
mean(df1)
X1 X2 X3 X4
2.5 6.5 10.5 14.5
Expected, documented
median(df1)
[1] 6.5 10.5
Rather weird. AFAIK there should not be an issue with data frames; at
least I did not find any mentioned in the help page. I tracked it down,
probably, to an As.Is operation on the object and subsequent sorting in
median.default.
I know other (*apply) ways to compute the median for data frames, so I
would just like to hear an opinion about this behaviour from more
experienced people.
Thank you
Best regards
Petr
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Ing. Mario Valle
Data Analysis and Visualization Group | http://www.cscs.ch/~mvalle
Swiss National Supercomputing Centre (CSCS) | Tel: +41 (91) 610.82.60
v. Cantonale Galleria 2, 6928 Manno, Switzerland | Fax: +41 (91) 610.82.82
--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 04-Feb-10 Time: 09:28:13
------------------------------ XFMail ------------------------------
--
Peter Ehlers
University of Calgary