I submit a couple of options for addressing bug 16719 (kruskal.test
documentation for formula):
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16719

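disallow-character.diff keeps rejecting character group vectors. It
changes the documentation to say that the group must be a numeric
vector or a factor, and changes the error message for a character
group to suggest converting it to a factor.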

allow-character.diff changes kruskal.test.default to accept character
group vectors and convert them to factors; the documentation is updated
accordingly.
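Both patches hinge on how is.finite() treats each kind of grouping
vector. A quick check in any R session (no patch needed) shows why a
character group trips the existing "all group levels must be finite"
test while a factor does not. The vectors below are made-up examples.

```r
# is.finite() is FALSE for every element of a character vector,
# so !all(is.finite(g)) is TRUE for any character grouping.
g_chr <- c("a", "a", "b", "b")
is.finite(g_chr)                # FALSE FALSE FALSE FALSE

# A factor stores integer codes, so its elements count as finite.
g_fac <- factor(g_chr)
is.finite(g_fac)                # TRUE TRUE TRUE TRUE

# Numeric groups pass unless they contain Inf (or -Inf).
g_num <- c(1, 1, 2, 2)
is.finite(g_num)                # TRUE TRUE TRUE TRUE
is.finite(c(8, 8, Inf, Inf))    # TRUE TRUE FALSE FALSE
```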

I tested the updated functions with the examples in example.R, which is
based on the examples in the bug report.

If there is interest in applying either patch, especially the latter,
I would first like to test the change on many existing programs that
call kruskal.test, to check for regressions. Is there a good place to
look for programs that use particular R functions?

I am having trouble building R, so I have so far tested these changes
only by patching revision 74631 (SVN head) and sourcing the resulting
kruskal.test.R in R 3.4.1 on OpenBSD 6.2. I thus have not tested the
R documentation files.
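The Rd changes can probably be checked without a full build:
tools::checkRd() parses an Rd file and reports problems, and
tools::Rd2txt() renders it as plain-text help for proofreading. A
minimal sketch; check_rd() is a hypothetical helper, and the path
assumes the working directory is the top of a patched R source checkout.

```r
library(tools)

# Hypothetical helper: parse an Rd file, report any checkRd() messages,
# and print the rendered plain-text help so the wording can be proofread.
check_rd <- function(path) {
  msgs <- checkRd(path)
  if (length(msgs)) print(msgs) else cat("checkRd: no problems found\n")
  Rd2txt(path)
  invisible(msgs)
}

# Assumed location of the patched file within the R sources.
rd <- "src/library/stats/man/kruskal.test.Rd"
if (file.exists(rd)) check_rd(rd)
```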
Index: src/library/stats/R/kruskal.test.R
===================================================================
--- src/library/stats/R/kruskal.test.R  (revision 74631)
+++ src/library/stats/R/kruskal.test.R  (working copy)
@@ -46,7 +46,10 @@
         x <- x[OK]
         g <- g[OK]
         if (!all(is.finite(g)))
-            stop("all group levels must be finite")
+            if (is.character(g))
+                stop("all group levels must be finite; convert group to a factor")
+            else
+                stop("all group levels must be finite")
         g <- factor(g)
         k <- nlevels(g)
         if (k < 2L)
Index: src/library/stats/man/kruskal.test.Rd
===================================================================
--- src/library/stats/man/kruskal.test.Rd       (revision 74631)
+++ src/library/stats/man/kruskal.test.Rd       (working copy)
@@ -22,11 +22,12 @@
   \item{x}{a numeric vector of data values, or a list of numeric data
     vectors.  Non-numeric elements of a list will be coerced, with a
     warning.}
-  \item{g}{a vector or factor object giving the group for the
+  \item{g}{a numeric vector or factor object giving the group for the
     corresponding elements of \code{x}.  Ignored with a warning if
     \code{x} is a list.}
   \item{formula}{a formula of the form \code{response ~ group} where
-    \code{response} gives the data values and \code{group} a vector or
+    \code{response} gives the data values and \code{group}
+    a numeric vector or
     factor of the corresponding groups.} 
   \item{data}{an optional matrix or data frame (or similar: see
     \code{\link{model.frame}}) containing the variables in the
@@ -52,7 +53,8 @@
   list, use \code{kruskal.test(list(x, ...))}.
 
   Otherwise, \code{x} must be a numeric data vector, and \code{g} must
-  be a vector or factor object of the same length as \code{x} giving
+  be a numeric vector or factor object of the same length as \code{x}
+  giving
   the group for the corresponding elements of \code{x}.
 }
 \value{
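The group check in disallow-character.diff can be tried in isolation.
A minimal sketch that copies the patched lines into a hypothetical
helper, check_group_disallow() (not a function in stats), and captures
the error each kind of group produces:

```r
# Hypothetical helper mirroring the patched check in disallow-character.diff.
check_group_disallow <- function(g) {
  if (!all(is.finite(g))) {
    if (is.character(g))
      stop("all group levels must be finite; convert group to a factor")
    else
      stop("all group levels must be finite")
  }
  factor(g)
}

# Hypothetical wrapper: return the error message, or "ok" if the check passes.
try_group <- function(g)
  tryCatch({ check_group_disallow(g); "ok" },
           error = function(e) conditionMessage(e))

try_group(c("a", "b"))          # message suggests converting to a factor
try_group(factor(c("a", "b")))  # "ok"
try_group(c(8, Inf))            # plain "all group levels must be finite"
```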
Index: src/library/stats/R/kruskal.test.R
===================================================================
--- src/library/stats/R/kruskal.test.R  (revision 74631)
+++ src/library/stats/R/kruskal.test.R  (working copy)
@@ -45,7 +45,7 @@
         OK <- complete.cases(x, g)
         x <- x[OK]
         g <- g[OK]
-        if (!all(is.finite(g)))
+        if (!is.character(g) && !all(is.finite(g)))
             stop("all group levels must be finite")
         g <- factor(g)
         k <- nlevels(g)
Index: src/library/stats/man/kruskal.test.Rd
===================================================================
--- src/library/stats/man/kruskal.test.Rd       (revision 74631)
+++ src/library/stats/man/kruskal.test.Rd       (working copy)
@@ -22,11 +22,13 @@
   \item{x}{a numeric vector of data values, or a list of numeric data
     vectors.  Non-numeric elements of a list will be coerced, with a
     warning.}
-  \item{g}{a vector or factor object giving the group for the
+  \item{g}{a character vector, numeric vector, or factor
+    giving the group for the
     corresponding elements of \code{x}.  Ignored with a warning if
     \code{x} is a list.}
   \item{formula}{a formula of the form \code{response ~ group} where
-    \code{response} gives the data values and \code{group} a vector or
+    \code{response} gives the data values and \code{group} a
+    character vector, numeric vector, or
     factor of the corresponding groups.} 
   \item{data}{an optional matrix or data frame (or similar: see
     \code{\link{model.frame}}) containing the variables in the
@@ -52,7 +54,8 @@
   list, use \code{kruskal.test(list(x, ...))}.
 
   Otherwise, \code{x} must be a numeric data vector, and \code{g} must
-  be a vector or factor object of the same length as \code{x} giving
+  be a numeric vector, character vector, or factor of the same length
+  as \code{x} giving
   the group for the corresponding elements of \code{x}.
 }
 \value{
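The allow-character.diff check can likewise be tried on its own. A
minimal sketch using a hypothetical helper, check_group_allow(); it
uses &&, which short-circuits and gives the same result as & here
because both operands are single logical values.

```r
# Hypothetical helper mirroring the patched check in allow-character.diff:
# character groups skip the finiteness test and become factor levels.
check_group_allow <- function(g) {
  if (!is.character(g) && !all(is.finite(g)))
    stop("all group levels must be finite")
  factor(g)
}

levels(check_group_allow(c("b", "a", "b")))   # "a" "b"
levels(check_group_allow(c(2, 1, 2)))         # "1" "2"

# Infinite numeric groups are still rejected.
tryCatch(check_group_allow(c(8, Inf)),
         error = function(e) conditionMessage(e))
```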
source('kruskal.test.R')

help(kruskal.test)

data(mtcars)
mtcars$type <- rep(letters[1:2], c(16, 16))
is.vector(mtcars$type) ## TRUE

with(mtcars, kruskal.test(mpg, type))
## Error in kruskal.test.default(c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3,  : 
##   all group levels must be finite

kruskal.test(mpg ~ type, mtcars)
## Error in kruskal.test.default(c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3,  : 
##   all group levels must be finite

mtcars$type <- rep(1:2, c(16, 16))
kruskal.test(mpg ~ type, mtcars) # works

mtcars$type <- factor(mtcars$type)
kruskal.test(mpg ~ type, mtcars) # works

mtcars$type <- rep(c(8, Inf), c(16, 16))
kruskal.test(mpg ~ type, mtcars) # should fail
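Beyond checking that these calls run, their results can be compared:
recoding the groups should not change the test, since it depends only on
which observations share a group. A minimal sketch comparing an
integer-coded and a factor version of type; the names type_int,
type_fac, r_int and r_fac are illustrative. After sourcing the patched
kruskal.test.R, the character version rep(letters[1:2], c(16, 16)) could
presumably be added to the same comparison.

```r
data(mtcars)
type_int <- rep(1:2, c(16, 16))
type_fac <- factor(rep(letters[1:2], c(16, 16)))

r_int <- kruskal.test(mtcars$mpg, type_int)
r_fac <- kruskal.test(mtcars$mpg, type_fac)

# Same partition of the data, so statistic, df and p-value should match.
all.equal(unname(r_int$statistic), unname(r_fac$statistic))
all.equal(unname(r_int$parameter), unname(r_fac$parameter))
all.equal(r_int$p.value, r_fac$p.value)
```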
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
