The new behavior is that strcapture() yields NAs when the pattern does not match (like strptime()), and yields the empty string for empty captures in a matching pattern, which is consistent with regmatches().
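A minimal sketch of that behavior, assuming an R version whose utils::strcapture() includes this change. The prototype column names (Name, Number, whole, alpha, digit) are only illustrative; the inputs are the ones used earlier in this thread.

```r
# Prototype with one character and one numeric column (illustrative names).
proto <- data.frame(Name = character(), Number = numeric())

# The second element contains no space, so the pattern does not match it.
x <- c("One 1", "noSpaceInLine", "Three 3")
res <- strcapture("(.+) (.+)", x, proto)
print(res)   # the row for "noSpaceInLine" should hold NA in every column

# A matching pattern in which one alternative captures nothing:
# the unused capture group should come back as "" rather than NA.
alt <- strcapture("^(([[:alpha:]]+)|([[:digit:]]+))$",
                  c("Twelve", "12", "Z280"),
                  data.frame(whole = character(),
                             alpha = character(),
                             digit = character()))
print(alt)   # "Z280" matches neither alternative, so its row should be all NA
```

Rows that failed to match can then be located with something like which(is.na(res$Name)).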

Michael

On Wed, Sep 21, 2016 at 2:21 PM, William Dunlap <wdun...@tibco.com> wrote:
> If there are any matches then strcapture can see if the pattern has the same
> number of capture expressions as the prototype has columns and give an
> error if not. That seems appropriate.
>
> If there are no matches, then there is no easy way to see if the prototype
> is compatible with the pattern, so should strcapture just assume the best
> and fill in the prototype with NA's?
>
> Should there be warnings? This is kind of like strptime(), which silently
> gives NA's when the format does not match the text input.
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Wed, Sep 21, 2016 at 2:10 PM, Michael Lawrence
> <lawrence.mich...@gene.com> wrote:
>>
>> Hi Bill,
>>
>> Thanks, another good suggestion. strcapture() now returns NAs for
>> non-matches. It's nice to have someone kicking the tires on that
>> function.
>>
>> Michael
>>
>> On Wed, Sep 21, 2016 at 12:11 PM, William Dunlap via R-devel
>> <r-devel@r-project.org> wrote:
>> > Michael, thanks for looking at my first issue with utils::strcapture.
>> >
>> > Another issue is how it deals with lines that don't match the pattern.
>> > Currently it gives an error
>> >
>> >> strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"),
>> > proto=list(Name="", Number=0))
>> > Error in strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"),
>> > :
>> > number of matches does not always match ncol(proto)
>> >
>> > First, isn't the 'number of matches' the number of parenthesized
>> > subpatterns in the regular expression? I thought that if the entire
>> > pattern matches then the subpatterns without matches would be
>> > shown as matches at position 0 with length 0. Hence either the
>> > pattern is compatible with the prototype or it isn't, it does not depend
>> > on the text input. E.g.,
>> >
>> >> regexec("^(([[:alpha:]]+)|([[:digit:]]+))$", c("Twelve", "12", "Z280"))
>> > [[1]]
>> > [1] 1 1 1 0
>> > attr(,"match.length")
>> > [1] 6 6 6 0
>> > attr(,"useBytes")
>> > [1] TRUE
>> >
>> > [[2]]
>> > [1] 1 1 0 1
>> > attr(,"match.length")
>> > [1] 2 2 0 2
>> > attr(,"useBytes")
>> > [1] TRUE
>> >
>> > [[3]]
>> > [1] -1
>> > attr(,"match.length")
>> > [1] -1
>> > attr(,"useBytes")
>> > [1] TRUE
>> >
>> > Second, an error message like 'some lines were bad' is not very helpful.
>> > Should it put NA's in all the columns of the current output row if the
>> > input line didn't match the pattern and perhaps warn the user that there
>> > were problems? The user could then look for rows of NA's to see where the
>> > problems were.
>> >
>> > Bill Dunlap
>> > TIBCO Software
>> > wdunlap tibco.com
>> >
>> > ______________________________________________
>> > R-devel@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel