If you want "to extract regex matches into a new column in a data.frame", there are package functions that do exactly that. Three examples are namedCapture::df_match_variable, rematch2::bind_re_match, and tidyr::extract. For a more detailed discussion, see my R Journal submission (under review) about regular expression packages: https://raw.githubusercontent.com/tdhock/namedCapture-article/master/RJwrapper.pdf

Comments/suggestions welcome.
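
For comparison, the NA-preserving behaviour requested in the quoted message can also be sketched in base R alone. This is only a minimal illustration, not the implementation used by any of the packages named above; the data frame, its column names, and the pattern are made up for the example.

```r
# Hypothetical example data; the column names are illustrative only.
df <- data.frame(txt = c("a1", "b", "c22"), stringsAsFactors = FALSE)

# regexpr() returns -1 for elements that contain no match.
m <- regexpr("[0-9]+", df$txt)

# Start from an all-NA column, then fill only the rows that matched,
# so the new column always has one entry per input row.
df$digits <- NA_character_
df$digits[m != -1L] <- regmatches(df$txt, m)

print(df)
```

Because the result is pre-filled with NA_character_, the shorter vector returned by regmatches is assigned only to the matching rows, which avoids the length mismatch described in the quoted message.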
On Thu, Aug 15, 2019 at 12:15 AM Cyclic Group Z_1 via R-devel <r-devel@r-project.org> wrote:

> A very common use case for regmatches is to extract regex matches into a
> new column in a data.frame (or data.table, etc.) or otherwise use the
> extracted strings alongside the input. However, the default behavior is to
> drop empty matches, which results in mismatches in column length if
> reassignment is done without subsetting.
>
> For consistency with other R functions and compatibility with this use
> case, it would be nice if regmatches did not automatically drop empty
> matches and would instead insert an NA_character_ value (similar to
> stringr::str_extract). This alternative regmatches could be implemented
> through an optional drop argument, a new function, or mentioned in the
> documentation (a la resample in ?sample).
>
> Alternatively, at the moment, there is a non-exported function strextract
> in utils which is very similar to stringr::str_extract. It would be great
> if this function, once exported, were to include a drop argument to prevent
> dropping positions with no matches.
>
> An example solution (last option):
>
> strextract <- function(pattern, x, perl = FALSE, useBytes = FALSE, drop = T) {
>   m <- regexec(pattern, x, perl=perl, useBytes=useBytes)
>   result <- regmatches(x, m)
>
>   if(isTRUE(drop)){
>     unlist(result)
>   } else if(isFALSE(drop)) {
>     unlist({result[lengths(result)==0] <- NA_character_; result})
>   } else {
>     stop("Invalid argument for `drop`")
>   }
> }
>
> Based on Ricardo Saporta's response to How to prevent regmatches drop non
> matches?
>
> --CG
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel