Re: [Rd] Feature request: non-dropping regmatches/strextract

2019-09-02 Thread Cyclic Group Z_1 via R-devel
That sounds great! Thank you for your consideration. Best, CG __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Feature request: non-dropping regmatches/strextract

2019-09-02 Thread Michael Lawrence via R-devel
After some discussion within R core, we decided that a "nomatch" argument on regmatches() may be a good initial step. We might add a new function later that combines the regexpr() and regmatches() steps. The gregexpr() and regexec() inputs are both lists so it's not clear whether a "nomatch" value
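A minimal sketch of what a "nomatch"-style fill could look like at user level, for the regexpr() case only. The helper name `regmatches_keep` and its `nomatch` parameter are hypothetical illustrations, not the signature R core adopted or proposed:

```r
# Hypothetical helper (not part of base R): return one value per input
# element, filling non-matches with `nomatch` instead of dropping them.
regmatches_keep <- function(x, m, nomatch = NA_character_) {
  # Only regexpr()-style match data is handled; gregexpr()/regexec()
  # return lists, where a single fill value is less clearly defined.
  stopifnot(!is.list(m))
  out <- rep(nomatch, length(x))
  hit <- !is.na(m) & m > -1L        # regexpr() reports -1 for "no match"
  out[hit] <- regmatches(x, m)      # regmatches() returns only the hits
  out
}

x <- c("apple 12", "banana", "cherry 7")
m <- regexpr("[0-9]+", x)
kept_na    <- regmatches_keep(x, m)                # "12" NA   "7"
kept_empty <- regmatches_keep(x, m, nomatch = "")  # "12" ""   "7"
```

The result always has the same length as `x`, so it can be assigned directly to a column alongside the input.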

Re: [Rd] Feature request: non-dropping regmatches/strextract

2019-09-02 Thread Cyclic Group Z_1 via R-devel
I think that's a good reason for not including this in regmatches; you're right, its name is somewhat suggestive of yielding matches. Also, that sounds like a great design for strcapture with an atomic prototype. Best, CG

Re: [Rd] Feature request: non-dropping regmatches/strextract

2019-08-29 Thread Michael Lawrence via R-devel
Just started thinking about this. The name of regmatches() suggests that it will only extract the matches but not return anything for the non-matches. We might need another function that returns a value for non-matches. Perhaps the value should be the empty string for non-matches and NA for matches

Re: [Rd] Feature request: non-dropping regmatches/strextract

2019-08-29 Thread Cyclic Group Z_1 via R-devel
Thank you! I greatly appreciate your consideration, though of course it is up to you. I think many people switch to stringr/stringi simply because functions in those packages have some consistent design choices, for example, they do not drop empty/missing matches, which facilitates array-based p

Re: [Rd] Feature request: non-dropping regmatches/strextract

2019-08-29 Thread Michael Lawrence via R-devel
I'd be happy to entertain patches or at least more specific suggestions to improve strextract() and strcapture(). I hadn't exported strextract(), because I wasn't quite sure how it should behave. This feedback should be helpful. Thanks, Michael On Thu, Aug 29, 2019 at 2:20 PM Cyclic Group Z_1 via

Re: [Rd] Feature request: non-dropping regmatches/strextract

2019-08-29 Thread Cyclic Group Z_1 via R-devel
Thank you, I am aware that there are packages that can accomplish this. I mentioned stringr::str_extract as a function that does not drop empty matches. I think that the behavior of regmatches(..., regexpr(...)) in base R should permit an option to prevent dropping of empty matches both for sake

Re: [Rd] Feature request: non-dropping regmatches/strextract

2019-08-29 Thread Toby Hocking
If you want "to extract regex matches into a new column in a data.frame" then there are some package functions which do exactly that. Three examples are namedCapture::df_match_variable, rematch2::bind_re_match, and tidyr::extract. For a more detailed discussion see my R Journal submission (under re

Re: [Rd] Feature request: non-dropping regmatches/strextract

2019-08-16 Thread Cyclic Group Z_1 via R-devel
Using strcapture seems like a great workaround for use cases of this kind, at least in base R. I agree as well that filling with NA for regmatches(..., gregexpr(...)) makes less sense, given the outputs are lists and thus are retained in the list. Also, I suppose in the meantime the stringr pac

Re: [Rd] Feature request: non-dropping regmatches/strextract

2019-08-15 Thread William Dunlap via R-devel
Using a non-capturing group, "(?:...)" instead of "(...)", simplifies my example a bit:
> x <- c("Groucho ", "", "Harpo")
> strcapture("([[:alpha:]]+)?(?: *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?", x, proto=data.frame(Name=character(), Address=character(), stringsAsFactors=FALSE))
Name Ad

Re: [Rd] Feature request: non-dropping regmatches/strextract

2019-08-15 Thread William Dunlap via R-devel
I don't care much for regmatches and haven't tried strextract, but I think replacing the character(0) by NA_character_ is almost always inappropriate if the match information comes from gregexpr. I think strcapture() does a pretty good job of what I think you are trying to do. Perhaps adding an a
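A small sketch of the strcapture() approach mentioned above, using made-up input data. It assumes only the documented utils::strcapture(pattern, x, proto) interface; elements that do not match the pattern come back as NA rows, so the result has one row per input element:

```r
# Illustrative data (not from the thread).
x <- c("apple 12", "banana", "cherry 7")

# One capture group, one column in the prototype; the column type of
# `proto` determines how the captured text is converted.
res <- utils::strcapture("([0-9]+)", x, proto = data.frame(num = integer()))

nrow(res)   # same as length(x)
res$num     # NA marks the element with no match
```

Because the row count always equals length(x), the result can be bound to the original data with cbind().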

Re: [Rd] Feature request: non-dropping regmatches/strextract

2019-08-15 Thread Cyclic Group Z_1 via R-devel
I do think keeping the default behavior is desirable for backwards compatibility; my suggestion is not to change default behavior but to add an optional argument that allows a different behavior. Although this can be implemented in a user-defined function, retaining empty matches facilitates pr

Re: [Rd] Feature request: non-dropping regmatches/strextract

2019-08-15 Thread William Dunlap via R-devel
Changing the default behavior of regmatches would break its use with gregexpr, where the number of matches per input element varies, so a zero-length character vector makes more sense than NA_character_.
> x <- c("John Doe", "e e cummings", "Juan de la Madrid")
> m <- gregexpr("[A-Z]", x)
> regmat
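To illustrate the gregexpr() point with separate, made-up data: the result of regmatches() is a list with one entry per input element, and an element with no matches simply gets a zero-length character vector, so nothing is dropped from the list itself:

```r
# Illustrative data (not the example from the message above).
x <- c("a1b22", "none", "c3")
m <- gregexpr("[0-9]+", x)
res <- regmatches(x, m)

length(res)    # one list entry per element of x
lengths(res)   # number of matches per element; 0 for "none"
```

Here the list keeps its alignment with `x`, which is why an NA fill is less natural than in the regexpr() case.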

[Rd] Feature request: non-dropping regmatches/strextract

2019-08-15 Thread Cyclic Group Z_1 via R-devel
A very common use case for regmatches is to extract regex matches into a new column in a data.frame (or data.table, etc.) or otherwise use the extracted strings alongside the input. However, the default behavior is to drop empty matches, which results in mismatches in column length if reassignme
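A short sketch of the problem described above, using made-up data: regmatches() with regexpr() returns only the matched elements, so the result is shorter than the input and cannot be assigned as a new column.

```r
# Illustrative data (not from the original message).
df <- data.frame(txt = c("apple 12", "banana", "cherry 7"),
                 stringsAsFactors = FALSE)

m <- regexpr("[0-9]+", df$txt)
hits <- regmatches(df$txt, m)   # "banana" has no match and is dropped

n_hits <- length(hits)          # 2, although df has 3 rows

# Assigning the shorter vector as a column fails with a length error.
res <- tryCatch({
  df$num <- hits
  "assigned"
}, error = function(e) "length mismatch")
```

Note that when the number of rows happens to be a multiple of the number of matches, the assignment recycles silently instead of failing, which can misalign values without any warning.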