Changing the default behavior of regmatches would break its use with gregexpr, where the number of matches per input element varies, so a zero-length character vector makes more sense than NA_character_.

> x <- c("John Doe", "e e cummings", "Juan de la Madrid")
> m <- gregexpr("[A-Z]", x)
> regmatches(x,m)
[[1]]
[1] "J" "D"

[[2]]
character(0)

[[3]]
[1] "J" "M"

> vapply(.Last.value, function(x)paste(paste0(x, "."),collapse=""), "")
[1] "J.D." "."    "J.M."

(We don't want e e cummings initials mapped to "NA.")

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Aug 15, 2019 at 12:15 AM Cyclic Group Z_1 via R-devel <r-devel@r-project.org> wrote:

> A very common use case for regmatches is to extract regex matches into a
> new column in a data.frame (or data.table, etc.) or otherwise use the
> extracted strings alongside the input. However, the default behavior is to
> drop empty matches, which results in mismatches in column length if
> reassignment is done without subsetting.
>
> For consistency with other R functions and compatibility with this use
> case, it would be nice if regmatches did not automatically drop empty
> matches and would instead insert an NA_character_ value (similar to
> stringr::str_extract). This alternative regmatches could be implemented
> through an optional drop argument, a new function, or mentioned in the
> documentation (a la resample in ?sample).
>
> Alternatively, at the moment, there is a non-exported function strextract
> in utils which is very similar to stringr::str_extract. It would be great
> if this function, once exported, were to include a drop argument to prevent
> dropping positions with no matches.
>
> An example solution (last option):
>
> strextract <- function(pattern, x, perl = FALSE, useBytes = FALSE, drop = T) {
>   m <- regexec(pattern, x, perl=perl, useBytes=useBytes)
>   result <- regmatches(x, m)
>
>   if(isTRUE(drop)){
>     unlist(result)
>   } else if(isFALSE(drop)) {
>     unlist({result[lengths(result)==0] <- NA_character_; result})
>   } else {
>     stop("Invalid argument for `drop`")
>   }
> }
>
> Based on Ricardo Saporta's response to How to prevent regmatches drop non
> matches?
>
> --CG
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
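
For the one-match-per-element use case described in the quoted message, the existing regexpr/regmatches pair can probably be combined to keep the result aligned with the input without changing any defaults. A minimal sketch, assuming only the first match per element is wanted; the name first_match is hypothetical and is not a function in base R or utils:

```r
# Hypothetical helper (not part of base R or utils): first match per element,
# with NA_character_ wherever the pattern does not match.
first_match <- function(pattern, x, ...) {
  m <- regexpr(pattern, x, ...)          # one position per element, -1 if no match
  hit <- !is.na(m) & m != -1L            # elements that actually matched
  out <- rep(NA_character_, length(x))   # start from NA for every input element
  out[hit] <- regmatches(x, m)           # regmatches() returns only the matched ones
  out
}

x <- c("John Doe", "e e cummings", "Juan de la Madrid")
first_match("[A-Z]", x)   # "J", NA, "J": same length as x
```

Because the NA filling happens outside regmatches, the gregexpr case above keeps returning character(0) for elements without matches.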