I noticed a problem with strcapture() in R-devel (2016-09-27 r71386) when the text contains a missing value and perl=TRUE:

{
   # NA in text input should map to row of NA's in output, without warning
   r9p <- strcapture(perl = TRUE, "(.).* ([[:digit:]]+)", c("One 1", NA, "Fifty 50"), data.frame(Initial=factor(), Number=numeric()))
   e9p <- structure(list(Initial = structure(c(2L, NA, 1L), .Label = c("F", "O"), class = "factor"), Number = c(1, NA, 50)), row.names = c(NA, -3L), class = "data.frame")
   all.equal(e9p, r9p)
}
#Error in if (any(ind)) { : missing value where TRUE/FALSE needed

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Sep 21, 2016 at 2:32 PM, Michael Lawrence <lawrence.mich...@gene.com> wrote:

> The new behavior is that it yields NAs when the pattern does not match
> (like strptime) and for empty captures in a matching pattern it yields
> the empty string, which is consistent with regmatches().
>
> Michael
>
> On Wed, Sep 21, 2016 at 2:21 PM, William Dunlap <wdun...@tibco.com> wrote:
> > If there are any matches then strcapture can see if the pattern has the
> > same number of capture expressions as the prototype has columns and give
> > an error if not. That seems appropriate.
> >
> > If there are no matches, then there is no easy way to see if the
> > prototype is compatible with the pattern, so should strcapture just
> > assume the best and fill in the prototype with NA's?
> >
> > Should there be warnings? This is kind of like strptime(), which
> > silently gives NA's when the format does not match the text input.
> >
> >
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com
> >
> > On Wed, Sep 21, 2016 at 2:10 PM, Michael Lawrence
> > <lawrence.mich...@gene.com> wrote:
> >>
> >> Hi Bill,
> >>
> >> Thanks, another good suggestion. strcapture() now returns NAs for
> >> non-matches. It's nice to have someone kicking the tires on that
> >> function.
> >>
> >> Michael
> >>
> >> On Wed, Sep 21, 2016 at 12:11 PM, William Dunlap via R-devel
> >> <r-devel@r-project.org> wrote:
> >> > Michael, thanks for looking at my first issue with utils::strcapture.
> >> >
> >> > Another issue is how it deals with lines that don't match the pattern.
> >> > Currently it gives an error
> >> >
> >> > > strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"), proto=list(Name="", Number=0))
> >> > Error in strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"),  :
> >> >   number of matches does not always match ncol(proto)
> >> >
> >> > First, isn't the 'number of matches' the number of parenthesized
> >> > subpatterns in the regular expression? I thought that if the entire
> >> > pattern matches then the subpatterns without matches would be
> >> > shown as matches at position 0 with length 0. Hence either the
> >> > pattern is compatible with the prototype or it isn't; it does not
> >> > depend on the text input. E.g.,
> >> >
> >> > > regexec("^(([[:alpha:]]+)|([[:digit:]]+))$", c("Twelve", "12", "Z280"))
> >> > [[1]]
> >> > [1] 1 1 1 0
> >> > attr(,"match.length")
> >> > [1] 6 6 6 0
> >> > attr(,"useBytes")
> >> > [1] TRUE
> >> >
> >> > [[2]]
> >> > [1] 1 1 0 1
> >> > attr(,"match.length")
> >> > [1] 2 2 0 2
> >> > attr(,"useBytes")
> >> > [1] TRUE
> >> >
> >> > [[3]]
> >> > [1] -1
> >> > attr(,"match.length")
> >> > [1] -1
> >> > attr(,"useBytes")
> >> > [1] TRUE
> >> >
> >> > Second, an error message like 'some lines were bad' is not very
> >> > helpful. Should it put NA's in all the columns of the current output
> >> > row if the input line didn't match the pattern and perhaps warn the
> >> > user that there were problems? The user could then look for rows of
> >> > NA's to see where the problems were.
> >> >
> >> > Bill Dunlap
> >> > TIBCO Software
> >> > wdunlap tibco.com
> >> >
> >> >         [[alternative HTML version deleted]]
> >> >
> >> > ______________________________________________
> >> > R-devel@r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-devel
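
For anyone following the thread, here is a minimal sketch of the row-of-NA behavior discussed above, written directly on top of regexec(). It is not the utils::strcapture implementation: the helper name capture_matrix and its ncapture argument are made up for illustration, and it returns a plain character matrix rather than filling in a prototype. NA inputs are set aside before matching, so a missing value never reaches a logical test, and non-matching inputs keep their row of NA's.

```r
# Hypothetical helper, NOT the utils::strcapture code: returns one row per
# element of x and one column per capture group, with a row of NA's for
# NA inputs and for inputs that do not match the pattern.
capture_matrix <- function(pattern, x, ncapture, perl = FALSE) {
  out <- matrix(NA_character_, nrow = length(x), ncol = ncapture)
  ok <- which(!is.na(x))                 # match only the non-missing inputs
  m <- regexec(pattern, x[ok], perl = perl)
  for (k in seq_along(ok)) {
    start <- m[[k]]
    if (start[1L] == -1L) next           # no match: leave the row as NA
    len <- attr(start, "match.length")
    # The pattern must have exactly ncapture parenthesized subpatterns.
    stopifnot(length(start) == ncapture + 1L)
    # Drop the whole-match entry; a capture with length 0 gives "".
    out[ok[k], ] <- substring(x[ok[k]], start[-1L], start[-1L] + len[-1L] - 1L)
  }
  out
}

r <- capture_matrix("(.).* ([[:digit:]]+)", c("One 1", NA, "Fifty 50"),
                    ncapture = 2L, perl = TRUE)
print(r)
```

Checking the number of capture groups inside the loop means a pattern/prototype mismatch is only detected when at least one input matches, which is the limitation raised earlier in the thread.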