Hi Bill,

Thanks, another good suggestion. strcapture() now returns NAs for non-matches. It's nice to have someone kicking the tires on that function.
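As a rough illustration of the NA behaviour described above, here is a minimal sketch using the example input from the quoted report. It assumes an R version whose utils::strcapture() includes this change (R 3.4.0 or later), and it passes a data.frame prototype rather than the list used in the report; the variable names are only for illustration.

```r
# Sketch only: assumes utils::strcapture() with NA rows for non-matches
# (R >= 3.4.0). `x`, `proto`, `res` and `bad` are illustrative names.
x <- c("One 1", "noSpaceInLine", "Three 3")

# Prototype giving the name and type of each captured column.
proto <- data.frame(Name = character(), Number = numeric(),
                    stringsAsFactors = FALSE)

res <- strcapture("(.+) (.+)", x, proto = proto)

# A non-matching input produces a row of NAs, so an NA in the
# character column identifies the inputs that failed to match.
bad <- which(is.na(res$Name))
x[bad]
```

Looking for NA rows this way is the workflow suggested in the quoted message: the call no longer stops with an error, and the problem lines can be found afterwards.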
Michael

On Wed, Sep 21, 2016 at 12:11 PM, William Dunlap via R-devel <r-devel@r-project.org> wrote:
> Michael, thanks for looking at my first issue with utils::strcapture.
>
> Another issue is how it deals with lines that don't match the pattern.
> Currently it gives an error
>
>> strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"),
> proto=list(Name="", Number=0))
> Error in strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"), :
> number of matches does not always match ncol(proto)
>
> First, isn't the 'number of matches' the number of parenthesized
> subpatterns in the regular expression? I thought that if the entire
> pattern matches then the subpatterns without matches would be
> shown as matches at position 0 with length 0. Hence either the
> pattern is compatible with the prototype or it isn't, it does not depend
> on the text input. E.g.,
>
>> regexec("^(([[:alpha:]]+)|([[:digit:]]+))$", c("Twelve", "12", "Z280"))
> [[1]]
> [1] 1 1 1 0
> attr(,"match.length")
> [1] 6 6 6 0
> attr(,"useBytes")
> [1] TRUE
>
> [[2]]
> [1] 1 1 0 1
> attr(,"match.length")
> [1] 2 2 0 2
> attr(,"useBytes")
> [1] TRUE
>
> [[3]]
> [1] -1
> attr(,"match.length")
> [1] -1
> attr(,"useBytes")
> [1] TRUE
>
> Second, an error message like 'some lines were bad' is not very helpful.
> Should it put NA's in all the columns of the current output row if the
> input line didn't match the pattern and perhaps warn the user that there
> were problems? The user could then look for rows of NA's to see where the
> problems were.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel