On Wed, 13 Dec 2023 09:04:18 +0100 Hilmar Berger via R-devel <r-devel@r-project.org> wrote:
> Still, I feel that default partial matching cripples the functionality > of data.frame for larger tables. Changing the default now would require a long deprecation cycle to give everyone who uses `[.data.frame` and relies on partial matching (whether they know it or not) enough time to adjust. Still, adding an argument feels like a small change: edit https://svn.r-project.org/R/trunk/src/library/base/R/dataframe.R and add a condition before calling pmatch(). Adjust the warning() for named arguments. Don't forget to document the new argument in the man page at https://svn.r-project.org/R/trunk/src/library/base/man/Extract.data.frame.Rd Index: src/library/base/R/dataframe.R =================================================================== --- src/library/base/R/dataframe.R (revision 85664) +++ src/library/base/R/dataframe.R (working copy) @@ -591,14 +591,14 @@ ### These are a little less general than S `[.data.frame` <- - function(x, i, j, drop = if(missing(i)) TRUE else length(cols) == 1) + function(x, i, j, drop = if(missing(i)) TRUE else length(cols) == 1, pmatch.rows = TRUE) { mdrop <- missing(drop) Narg <- nargs() - !mdrop # number of arg from x,i,j that were specified has.j <- !missing(j) - if(!all(names(sys.call()) %in% c("", "drop")) + if(!all(names(sys.call()) %in% c("", "drop", "pmatch.rows")) && !isS4(x)) # at least don't warn for callNextMethod! - warning("named arguments other than 'drop' are discouraged") + warning("named arguments other than 'drop', 'pmatch.rows' are discouraged") if(Narg < 3L) { # list-like indexing or matrix indexing if(!mdrop) warning("'drop' argument will be ignored") @@ -679,7 +679,11 @@ ## for consistency with [, <length-1>] if(is.character(i)) { rows <- attr(xx, "row.names") - i <- pmatch(i, rows, duplicates.ok = TRUE) + i <- if (pmatch.rows) { + pmatch(i, rows, duplicates.ok = TRUE) + } else { + match(i, rows) + } } ## need to figure which col was selected: ## cannot use .subset2 directly as that may @@ -699,7 +703,11 @@ # as this can be expensive. if(is.character(i)) { rows <- attr(xx, "row.names") - i <- pmatch(i, rows, duplicates.ok = TRUE) + i <- if (pmatch.rows) { + pmatch(i, rows, duplicates.ok = TRUE) + } else { + match(i, rows) + } } for(j in seq_along(x)) { xj <- xx[[ sxx[j] ]] Index: src/library/base/man/Extract.data.frame.Rd =================================================================== --- src/library/base/man/Extract.data.frame.Rd (revision 85664) +++ src/library/base/man/Extract.data.frame.Rd (working copy) @@ -15,7 +15,7 @@ Extract or replace subsets of data frames. } \usage{ -\method{[}{data.frame}(x, i, j, drop = ) +\method{[}{data.frame}(x, i, j, drop =, pmatch.rows = TRUE) \method{[}{data.frame}(x, i, j) <- value \method{[[}{data.frame}(x, ..., exact = TRUE) \method{[[}{data.frame}(x, i, j) <- value @@ -45,6 +45,9 @@ column is selected.} \item{exact}{logical: see \code{\link{[}}, and applies to column names.} + + \item{pmatch.rows}{logical: whether to perform partial matching on + row names in case \code{i} is a character vector.} } \details{ Data frames can be indexed in several modes. When \code{[} and system.time({r <- d1[q2,, drop=FALSE, pmatch.rows = FALSE]}) # user system elapsed # 0.478 0.004 0.482 Unfortunately, that would be only the beginning. The prose in the whole ?`[.data.frame` would have to be updated; the new behaviour would have to be tested in tests/**.R. There may be very good reasons why named arguments to `[` other than drop= are discouraged for data.frames. I'm afraid I lack the whole-project view to consider whether such an addition would be safe. -- Best regards, Ivan ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel