Peter,
Thanks for the reply.
If that were the case, then shouldn't the following work for ordered
factors?
> median(factor(c("1", "2", "3"), ordered = TRUE))
Error in median.default(factor(c("1", "2", "3"), ordered = TRUE)) :
need numeric data
At least on the surface, if you can lexically order a character vector:
> median(c("red", "blue", "green"))
[1] "green"
you can also order a factor, or ordered factor, and if the number of elements
is odd, return a median value.
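
To make the idea concrete, here is a minimal sketch of the behaviour I have in mind. The function name median_ordered() is hypothetical and not part of base R, and signalling an error when the middle falls between two distinct levels is just one possible choice, not a proposal for how median.default() should be changed.

```r
# Hypothetical helper (not in base R): median of an ordered factor.
median_ordered <- function(x, na.rm = FALSE) {
    stopifnot(is.ordered(x))
    if (na.rm) {
        x <- x[!is.na(x)]
    } else if (anyNA(x)) {
        return(x[NA_integer_])        # length-1 NA that keeps the levels
    }
    n <- length(x)
    if (n == 0L) return(x[NA_integer_])
    s  <- sort(x)                     # sorts by level order, not alphabetically
    lo <- s[(n + 1L) %/% 2L]          # lower middle observation
    hi <- s[n %/% 2L + 1L]            # upper middle observation
    if (lo == hi) {
        lo                            # odd n, or both middles in the same level
    } else {
        stop("median falls between two levels: ",
             as.character(lo), " and ", as.character(hi))
    }
}

sizes <- factor(c("low", "high", "medium"),
                levels = c("low", "medium", "high"), ordered = TRUE)
median_ordered(sizes)
```

With an odd number of elements the two middle positions coincide, so the result is always one of the observed levels; the error branch is reached only for an even-length input whose two middle observations differ.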
Regards,
Marc
> On Jan 9, 2020, at 10:46 AM, peter dalgaard <[email protected]> wrote:
>
> I think median() behaves as designed: As long as the argument can be ordered,
> the "middle observation" makes sense, except when the middle falls between
> two categories, and you can't define an average of the two candidates for a
> median.
>
> The "sick man" would seem to be var(). Notice that it is also inconsistent
> with cov():
>
>> cov(c("1","2","3","4"),c("1","2","3","4") )
> Error in cov(c("1", "2", "3", "4"), c("1", "2", "3", "4")) :
> is.numeric(x) || is.logical(x) is not TRUE
>> var(c("1","2","3","4"),c("1","2","3","4") )
> [1] 1.666667
>
> -pd
>
>
>> On 9 Jan 2020, at 14:49 , Marc Schwartz via R-devel <[email protected]>
>> wrote:
>>
>> Jean-Luc,
>>
>> Please keep the communications on the list, for the benefit of others, now
>> and in the future, via the list archive. I am adding r-devel back here.
>>
>> I can't speak to the rationale in some of these cases. As I noted, it may be
>> (is likely) due to differing authors over time, and there may have been
>> relevant use cases at the time that the code was written, resulting in the
>> various checks. Presumably, the additional checks were not incorporated into
>> the other functions to enforce a level of consistency.
>>
>> We will need to wait for someone from R Core to comment.
>>
>> Regards,
>>
>> Marc
>>
>>> On Jan 9, 2020, at 8:34 AM, Lipatz Jean-Luc <[email protected]>
>>> wrote:
>>>
>>> OK, inconsistencies.
>>>
>>> The last test you wrote is a bit strange. I agree that it is useful to warn
>>> about a computation that makes no sense in the case of factors. But why
>>> test for data frames? If you go that way with arbitrary structures, you can
>>> also try:
>>>
>>>> median(list(1,2),list(3,4),list(4,5))
>>> Error in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x)))
>>> return(x[FALSE][NA]) :
>>> argument is not interpretable as logical
>>> In addition: Warning message:
>>> In if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x)))
>>> return(x[FALSE][NA]) :
>>> the condition has length > 1 and only the first element will be used
>>>
>>> giving a message which, despite its length, doesn't really explain the
>>> reason for the error.
>>>
>>> Why not a test on the arguments, like this?
>>>     if (!is.numeric(x))
>>>         stop("need numeric data")
>>>
>>>
>>> -----Original Message-----
>>> From: Marc Schwartz <[email protected]>
>>> Sent: Thursday, January 9, 2020 14:19
>>> To: Lipatz Jean-Luc <[email protected]>
>>> Cc: R-Devel <[email protected]>
>>> Subject: Re: [Rd] mean
>>>
>>>
>>>> On Jan 9, 2020, at 7:40 AM, Lipatz Jean-Luc <[email protected]>
>>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> Is there a reason for the following behaviour?
>>>>> mean(c("1","2","3"))
>>>> [1] NA
>>>> Warning message:
>>>> In mean.default(c("1", "2", "3")) :
>>>> argument is not numeric or logical: returning NA
>>>>
>>>> But:
>>>>> var(c("1","2","3"))
>>>> [1] 1
>>>>
>>>> And also:
>>>>> median(c("1","2","3"))
>>>> [1] "2"
>>>>
>>>> But:
>>>>> quantile(c("1","2","3"),p=.5)
>>>> Error in (1 - h) * qs[i] :
>>>> non-numeric argument to binary operator
>>>>
>>>> It looks like a lack of symmetry.
>>>> Best regards.
>>>>
>>>>
>>>> Jean-Luc LIPATZ
>>>> Insee - Direction générale
>>>> Head of coordination for R development and the implementation of
>>>> alternatives to SAS
>>>
>>>
>>> Hi,
>>>
>>> It would appear, whether by design or just inconsistent implementations,
>>> perhaps by different authors over time, that the checks for whether or not
>>> the input vector is numeric differ across the functions.
>>>
>>> A further inconsistency is for median(), where:
>>>
>>>> median(c("1", "2", "3", "4"))
>>> [1] NA
>>> Warning message:
>>> In mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]) :
>>> argument is not numeric or logical: returning NA
>>>
>>> as a result of there being 4 elements, rather than 3, and the internal
>>> checks in the code, where in the case of the input vector having an even
>>> number of elements, mean() is used:
>>>
>>>     if (n%%2L == 1L)
>>>         sort(x, partial = half)[half]
>>>     else mean(sort(x, partial = half + 0L:1L)[half + 0L:1L])
>>>
>>>
>>> Similarly:
>>>
>>>> median(factor(c("1", "2", "3")))
>>> Error in median.default(factor(c("1", "2", "3"))) : need numeric data
>>>
>>> because the input vector is a factor, rather than character, and the
>>> initial check has:
>>>
>>>     if (is.factor(x) || is.data.frame(x))
>>>         stop("need numeric data")
>>>
>>>
>>> Regards,
>>>
>>> Marc Schwartz
>>>
>>>
>>
>> ______________________________________________
>> [email protected] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: [email protected] Priv: [email protected]
>