That sounds great! Thank you for your consideration.
Best,
CG
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
After some discussion within R core, we decided that a "nomatch"
argument on regmatches() may be a good initial step. We might add a
new function later that combines the regexpr() and regmatches() steps.
The gregexpr() and regexec() inputs are both lists so it's not clear
whether a "nomatch" value
I think that's a good reason for not including this in regmatches; you're
right, its name is somewhat suggestive of yielding matches. Also, that sounds
like a great design for strcapture with an atomic prototype.
Best,
CG
Just started thinking about this. The name of regmatches() suggests
that it will only extract the matches but not return anything for the
non-matches. We might need another function that returns a value for
non-matches. Perhaps the value should be the empty string for
non-matches and NA for matches
Thank you! I greatly appreciate your consideration, though of course it is up
to you. I think many people switch to stringr/stringi simply because functions
in those packages have some consistent design choices, for example, they do not
drop empty/missing matches, which facilitates array-based p
I'd be happy to entertain patches or at least more specific
suggestions to improve strextract() and strcapture(). I hadn't
exported strextract(), because I wasn't quite sure how it should
behave. This feedback should be helpful.
Thanks,
Michael
On Thu, Aug 29, 2019 at 2:20 PM Cyclic Group Z_1 via
Thank you, I am aware that there are packages that can accomplish this. I
mentioned stringr::str_extract as a function that does not drop empty matches.
I think that the behavior of regmatches(..., regexpr(...)) in base R should
permit an option to prevent dropping of empty matches both for sake
If you want "to extract regex matches into a new column in a data.frame",
then there are some package functions that do exactly that. Three examples
are namedCapture::df_match_variable, rematch2::bind_re_match, and
tidyr::extract. For a more detailed discussion see my R Journal submission
(under re
Using strcapture seems like a great workaround for use cases of this kind, at
least in base R. I agree as well that filling with NA for regmatches(...,
gregexpr(...)) makes less sense, given the outputs are lists and thus are
retained in the list. Also, I suppose in the meantime the stringr pac
Using a non-capturing group, "(?:...)" instead of "(...)", simplifies my
example a bit
> x <- c("Groucho ", "", "Harpo")
> strcapture("([[:alpha:]]+)?(?: *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?", x,
proto=data.frame(Name=character(), Address=character(),
stringsAsFactors=FALSE))
Name Ad
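As a rough, self-contained illustration of why strcapture() suits this use case: it returns one row per input string and fills rows for non-matching inputs with NA. The input strings, the pattern, and the column names `id` and `name` are made up for this sketch and do not come from the thread.

```r
# Hypothetical input: the second element does not match the pattern.
x <- c("id=12 name=ann", "no fields here", "id=7 name=bob")

# The prototype fixes the column names and types of the result.
proto <- data.frame(id = integer(), name = character(),
                    stringsAsFactors = FALSE)

# One capture group per prototype column.
res <- utils::strcapture("id=([0-9]+) name=([a-z]+)", x, proto)

nrow(res)   # same as length(x); the non-matching row is all NA
res
```

Because the result always has length(x) rows, it can be bound to the original data with cbind() without any length mismatch.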
I don't care much for regmatches and haven't tried strextract, but I think
replacing the character(0) by NA_character_ is almost always inappropriate
if the match information comes from gregexpr.
I think strcapture() does a pretty good job of what I think you are trying
to do. Perhaps adding an a
I do think keeping the default behavior is desirable for backwards
compatibility; my suggestion is not to change default behavior but to add an
optional argument that allows a different behavior. Although this can be
implemented in a user-defined function, retaining empty matches facilitates
pr
Changing the default behavior of regmatches would break its use with
gregexpr, where the number of matches per input element varies, so a
zero-length character vector makes more sense than NA_character_.
> x <- c("John Doe", "e e cummings", "Juan de la Madrid")
> m <- gregexpr("[A-Z]", x)
> regmat
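A minimal sketch of the contrast described here, reusing the same input; the variable names `all_caps` and `first_cap` are illustrative only. With gregexpr() the result is a list with one element per input, so a non-matching element stays in place as character(0). With regexpr() the result is a flat character vector and the non-matching element is dropped.

```r
x <- c("John Doe", "e e cummings", "Juan de la Madrid")

# gregexpr(): all matches per element; the result is a list of length(x),
# and the element without capitals is kept as character(0).
all_caps <- regmatches(x, gregexpr("[A-Z]", x))
lengths(all_caps)    # 2 0 2

# regexpr(): first match per element; the result is a plain character
# vector, and the element without capitals is dropped entirely.
first_cap <- regmatches(x, regexpr("[A-Z]", x))
length(first_cap)    # 2, shorter than length(x)
```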
A very common use case for regmatches is to extract regex matches into a new
column in a data.frame (or data.table, etc.) or otherwise use the extracted
strings alongside the input. However, the default behavior is to drop empty
matches, which results in mismatches in column length if reassignme
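The length mismatch can be sketched as follows, together with one possible user-level workaround. The helper `extract_first()` and its `nomatch` argument are hypothetical names for this sketch, not part of base R, and the data frame is invented for illustration.

```r
# Hypothetical helper: like regmatches(x, regexpr(pattern, x)), but it
# returns one value per input element, using `nomatch` where nothing matched.
extract_first <- function(x, pattern, nomatch = NA_character_, ...) {
  m <- regexpr(pattern, x, ...)
  hit <- !is.na(m) & m > -1L         # elements that actually matched
  out <- rep(nomatch, length(x))
  out[hit] <- regmatches(x, m)       # regmatches() returns only the hits
  out
}

df <- data.frame(raw = c("apple 12", "banana", "cherry 7"),
                 stringsAsFactors = FALSE)

# Default behavior: only two values for three rows, so direct assignment
# to a new column would fail.
dropped <- regmatches(df$raw, regexpr("[0-9]+", df$raw))
length(dropped)      # 2

# With the helper, the result lines up with the rows of df.
df$digits <- extract_first(df$raw, "[0-9]+")
df
```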