Peter,

Thanks for the reply.

If that is the case, then shouldn't the following also work with 
ordered factors?

> median(factor(c("1", "2", "3"), ordered = TRUE))
Error in median.default(factor(c("1", "2", "3"), ordered = TRUE)) : 
  need numeric data

At least on the surface, if you can lexically order a character vector:

> median(c("red", "blue", "green"))
[1] "green"

you can also order a factor (or an ordered factor) and, if the number of 
elements is odd, return a median value.
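For illustration only, here is a minimal sketch of what such a method might look like. median_ordered() is a hypothetical helper and is not part of base R. It assumes that the median of an ordered factor is simply undefined, and therefore an error, when the number of non-missing elements is even:

```r
# Hypothetical helper, not part of base R: median of an ordered factor.
# Assumes the median is only defined when the number of (non-missing)
# elements is odd, since two adjacent categories cannot be averaged.
median_ordered <- function(x, na.rm = FALSE) {
  stopifnot(is.ordered(x))
  if (na.rm) {
    x <- x[!is.na(x)]
  } else if (anyNA(x)) {
    return(x[NA_integer_])          # NA, keeping the factor levels
  }
  n <- length(x)
  if (n == 0L) return(x[NA_integer_])
  if (n %% 2L == 0L)
    stop("median of an ordered factor is undefined for an even number of elements")
  sort(x)[(n + 1L) %/% 2L]          # sort() orders by level, not lexically
}

x <- factor(c("high", "low", "medium"),
            levels = c("low", "medium", "high"), ordered = TRUE)
median_ordered(x)                   # "medium"
```

Because sort() on an ordered factor follows the level order, the result is "medium" rather than "low", which is what a lexical sort of the labels would give.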

Regards,

Marc


> On Jan 9, 2020, at 10:46 AM, peter dalgaard <pda...@gmail.com> wrote:
> 
> I think median() behaves as designed: as long as the argument can be ordered, 
> the "middle observation" makes sense, except when the middle falls between 
> two categories and you can't define an average of the two candidates for a 
> median.
> 
> The "sick man" would seem to be var(). Notice that it is also inconsistent 
> with cov():
> 
>> cov(c("1","2","3","4"),c("1","2","3","4") )
> Error in cov(c("1", "2", "3", "4"), c("1", "2", "3", "4")) : 
>  is.numeric(x) || is.logical(x) is not TRUE
>> var(c("1","2","3","4"),c("1","2","3","4") )
> [1] 1.666667
> 
> -pd
> 
> 
>> On 9 Jan 2020, at 14:49 , Marc Schwartz via R-devel <r-devel@r-project.org> 
>> wrote:
>> 
>> Jean-Luc,
>> 
>> Please keep the communications on the list, for the benefit of others, now 
>> and in the future, via the list archive. I am adding r-devel back here.
>> 
>> I can't speak to the rationale in some of these cases. As I noted, it may be 
>> (is likely) due to differing authors over time, and there may have been 
>> relevant use cases at the time that the code was written, resulting in the 
>> various checks. Presumably, the additional checks were not incorporated into 
>> the other functions to enforce a level of consistency.
>> 
>> We will need to wait for someone from R Core to comment.
>> 
>> Regards,
>> 
>> Marc
>> 
>>> On Jan 9, 2020, at 8:34 AM, Lipatz Jean-Luc <jean-luc.lip...@insee.fr> 
>>> wrote:
>>> 
>>> OK, inconsistencies.
>>> 
>>> The last test you wrote is a bit strange. I agree that it is useful to warn 
>>> about a computation that makes no sense for factors. But why test for 
>>> data frames? If you go down that road with arbitrary structures, you can 
>>> also try:
>>> 
>>>> median(list(1,2),list(3,4),list(4,5))
>>> Error in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) 
>>> return(x[FALSE][NA]) : 
>>> argument is not interpretable as logical
>>> In addition: Warning message:
>>> In if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) 
>>> return(x[FALSE][NA]) :
>>> the condition has length > 1 and only the first element will be used
>>> 
>>> giving a message which, despite its length, doesn't really explain the 
>>> cause of the error.
>>> 
>>> Why not test the argument with something like:
>>> if (!is.numeric(x)) 
>>>        stop("need numeric data")
>>> 
>>> 
>>> -----Original Message-----
>>> From: Marc Schwartz <marc_schwa...@me.com> 
>>> Sent: Thursday, January 9, 2020 14:19
>>> To: Lipatz Jean-Luc <jean-luc.lip...@insee.fr>
>>> Cc: R-Devel <r-devel@r-project.org>
>>> Subject: Re: [Rd] mean
>>> 
>>> 
>>>> On Jan 9, 2020, at 7:40 AM, Lipatz Jean-Luc <jean-luc.lip...@insee.fr> 
>>>> wrote:
>>>> 
>>>> Hello,
>>>> 
>>>> Is there a reason for the following behaviour?
>>>>> mean(c("1","2","3"))
>>>> [1] NA
>>>> Warning message:
>>>> In mean.default(c("1", "2", "3")) :
>>>> argument is not numeric or logical: returning NA
>>>> 
>>>> But:
>>>>> var(c("1","2","3"))
>>>> [1] 1
>>>> 
>>>> And also:
>>>>> median(c("1","2","3"))
>>>> [1] "2"
>>>> 
>>>> But:
>>>>> quantile(c("1","2","3"),p=.5)
>>>> Error in (1 - h) * qs[i] : 
>>>> non-numeric argument to binary operator
>>>> 
>>>> It looks like a lack of symmetry. 
>>>> Best regards.
>>>> 
>>>> 
>>>> Jean-Luc LIPATZ
>>>> Insee - Directorate General
>>>> Coordinator for R development and for the rollout of alternatives 
>>>> to SAS
>>> 
>>> 
>>> Hi,
>>> 
>>> It would appear, whether by design or just inconsistent implementations, 
>>> perhaps by different authors over time, that the checks for whether or not 
>>> the input vector is numeric differ across the functions.
>>> 
>>> A further inconsistency is for median(), where:
>>> 
>>>> median(c("1", "2", "3", "4"))
>>> [1] NA
>>> Warning message:
>>> In mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]) :
>>> argument is not numeric or logical: returning NA
>>> 
>>> This happens because there are 4 elements rather than 3: when the input 
>>> vector has an even number of elements, the internal code calls mean() to 
>>> average the two middle values:
>>> 
>>>  if (n%%2L == 1L) 
>>>      sort(x, partial = half)[half]
>>>  else mean(sort(x, partial = half + 0L:1L)[half + 0L:1L])
>>> 
>>> 
>>> Similarly:
>>> 
>>>> median(factor(c("1", "2", "3")))
>>> Error in median.default(factor(c("1", "2", "3"))) : need numeric data
>>> 
>>> because the input vector is a factor, rather than character, and the 
>>> initial check has:
>>> 
>>> if (is.factor(x) || is.data.frame(x)) 
>>>        stop("need numeric data")
>>> 
>>> 
>>> Regards,
>>> 
>>> Marc Schwartz
>>> 
>>> 
>> 
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> -- 
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd....@cbs.dk  Priv: pda...@gmail.com