Hi Bill,

This is a bug in regexec() and I will commit a fix.

Thanks for the report,
Michael

On Tue, Oct 4, 2016 at 1:40 PM, William Dunlap <wdun...@tibco.com> wrote:
> I noticed a problem in the strcapture from R-devel (2016-09-27 r71386), when
> the text contains a missing value and perl=TRUE.
>
> {
>    # NA in text input should map to row of NA's in output, without warning
>    r9p <- strcapture(perl = TRUE, "(.).* ([[:digit:]]+)",
>                      c("One 1", NA, "Fifty 50"),
>                      data.frame(Initial=factor(), Number=numeric()))
>    e9p <- structure(list(Initial = structure(c(2L, NA, 1L),
>                                              .Label = c("F", "O"),
>                                              class = "factor"),
>                          Number = c(1, NA, 50)),
>                     row.names = c(NA, -3L),
>                     class = "data.frame")
>    all.equal(e9p, r9p)
> }
> #Error in if (any(ind)) { : missing value where TRUE/FALSE needed
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Wed, Sep 21, 2016 at 2:32 PM, Michael Lawrence
> <lawrence.mich...@gene.com> wrote:
>>
>> The new behavior is that it yields NAs when the pattern does not match
>> (like strptime) and for empty captures in a matching pattern it yields
>> the empty string, which is consistent with regmatches().
>>
>> Michael
>>
>> On Wed, Sep 21, 2016 at 2:21 PM, William Dunlap <wdun...@tibco.com> wrote:
>> > If there are any matches then strcapture can see if the pattern has the
>> > same number of capture expressions as the prototype has columns and give
>> > an error if not. That seems appropriate.
>> >
>> > If there are no matches, then there is no easy way to see if the
>> > prototype is compatible with the pattern, so should strcapture just
>> > assume the best and fill in the prototype with NA's?
>> >
>> > Should there be warnings? This is kind of like strptime(), which
>> > silently gives NA's when the format does not match the text input.
>> >
>> >
>> > Bill Dunlap
>> > TIBCO Software
>> > wdunlap tibco.com
>> >
>> > On Wed, Sep 21, 2016 at 2:10 PM, Michael Lawrence
>> > <lawrence.mich...@gene.com> wrote:
>> >>
>> >> Hi Bill,
>> >>
>> >> Thanks, another good suggestion.
>> >> strcapture() now returns NAs for non-matches. It's nice to have
>> >> someone kicking the tires on that function.
>> >>
>> >> Michael
>> >>
>> >> On Wed, Sep 21, 2016 at 12:11 PM, William Dunlap via R-devel
>> >> <r-devel@r-project.org> wrote:
>> >> > Michael, thanks for looking at my first issue with utils::strcapture.
>> >> >
>> >> > Another issue is how it deals with lines that don't match the pattern.
>> >> > Currently it gives an error
>> >> >
>> >> >> strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"), proto=list(Name="", Number=0))
>> >> > Error in strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"), :
>> >> >   number of matches does not always match ncol(proto)
>> >> >
>> >> > First, isn't the 'number of matches' the number of parenthesized
>> >> > subpatterns in the regular expression? I thought that if the entire
>> >> > pattern matches then the subpatterns without matches would be
>> >> > shown as matches at position 0 with length 0. Hence either the
>> >> > pattern is compatible with the prototype or it isn't, it does not
>> >> > depend on the text input. E.g.,
>> >> >
>> >> >> regexec("^(([[:alpha:]]+)|([[:digit:]]+))$", c("Twelve", "12", "Z280"))
>> >> > [[1]]
>> >> > [1] 1 1 1 0
>> >> > attr(,"match.length")
>> >> > [1] 6 6 6 0
>> >> > attr(,"useBytes")
>> >> > [1] TRUE
>> >> >
>> >> > [[2]]
>> >> > [1] 1 1 0 1
>> >> > attr(,"match.length")
>> >> > [1] 2 2 0 2
>> >> > attr(,"useBytes")
>> >> > [1] TRUE
>> >> >
>> >> > [[3]]
>> >> > [1] -1
>> >> > attr(,"match.length")
>> >> > [1] -1
>> >> > attr(,"useBytes")
>> >> > [1] TRUE
>> >> >
>> >> > Second, an error message like 'some lines were bad' is not very
>> >> > helpful. Should it put NA's in all the columns of the current output
>> >> > row if the input line didn't match the pattern and perhaps warn the
>> >> > user that there were problems?
>> >> > The user could then look for rows of NA's to see where the
>> >> > problems were.
>> >> >
>> >> > Bill Dunlap
>> >> > TIBCO Software
>> >> > wdunlap tibco.com
>> >> >
>> >> > ______________________________________________
>> >> > R-devel@r-project.org mailing list
>> >> > https://stat.ethz.ch/mailman/listinfo/r-devel
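
For readers of the archive, here is a rough sketch of the behaviour discussed in this thread: rows of NA for input lines that do not match the pattern, and for NA inputs. It is not the actual utils::strcapture code. The helper name capture_rows and its col_names argument are made up for illustration, and it assumes only base R's regexec() and regmatches(). NA inputs are filtered out before matching so that the sketch does not depend on how regexec() treats missing values.

```r
# Hypothetical helper, NOT the utils::strcapture implementation.
# Returns one row per element of x, one column per capture group;
# rows stay NA when the element is NA or does not match the pattern.
capture_rows <- function(pattern, x, col_names) {
  out <- matrix(NA_character_, nrow = length(x), ncol = length(col_names),
                dimnames = list(NULL, col_names))
  ok <- which(!is.na(x))            # only run the regex on non-missing input
  m <- regmatches(x[ok], regexec(pattern, x[ok]))
  for (i in seq_along(m)) {
    # m[[i]] is character(0) for a non-match; for a match it holds the
    # whole match followed by one entry per parenthesized subpattern.
    if (length(m[[i]]) == length(col_names) + 1L) {
      out[ok[i], ] <- m[[i]][-1L]
    }
  }
  as.data.frame(out, stringsAsFactors = FALSE)
}

res <- capture_rows("(.+) (.+)",
                    c("One 1", "noSpaceInLine", NA, "Three 3"),
                    c("Name", "Number"))
print(res)
```

The second and third rows of res come back as all NA, so a caller can find the problem lines by looking for NA rows, as suggested above. Converting the columns to the prototype's types is left out of this sketch.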