Michael, thanks for looking at my first issue with utils::strcapture. Another issue is how it deals with lines that don't match the pattern. Currently it gives an error:
> strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"), proto=list(Name="", Number=0))
Error in strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"), :
  number of matches does not always match ncol(proto)

First, isn't the 'number of matches' the number of parenthesized subpatterns in the regular expression? I thought that if the entire pattern matches then the subpatterns without matches would be shown as matches at position 0 with length 0. Hence either the pattern is compatible with the prototype or it isn't; that does not depend on the text input. E.g.,

> regexec("^(([[:alpha:]]+)|([[:digit:]]+))$", c("Twelve", "12", "Z280"))
[[1]]
[1] 1 1 1 0
attr(,"match.length")
[1] 6 6 6 0
attr(,"useBytes")
[1] TRUE

[[2]]
[1] 1 1 0 1
attr(,"match.length")
[1] 2 2 0 2
attr(,"useBytes")
[1] TRUE

[[3]]
[1] -1
attr(,"match.length")
[1] -1
attr(,"useBytes")
[1] TRUE

Second, an error message like 'some lines were bad' is not very helpful. Should it put NA's in all the columns of the current output row if the input line didn't match the pattern, and perhaps warn the user that there were problems? The user could then look for rows of NA's to see where the problems were.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
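For illustration, the NA-row behavior suggested above could be sketched roughly as follows. This is only a sketch under stated assumptions: the name strcapture_na is hypothetical and not part of utils, the type conversion via methods::as() is an assumed stand-in for whatever strcapture does internally, and only base R regexec() and regmatches() are relied on.

```r
# Hypothetical helper (not part of utils): behaves like strcapture(), but an
# input line that does not match the pattern gives a row of NAs and a
# warning instead of an error.
strcapture_na <- function(pattern, x, proto) {
  m <- regexec(pattern, x)
  # A line matched if the start of its full match is neither -1 nor NA
  matched <- vapply(m, function(mi) !is.na(mi[1L]) && mi[1L] != -1L,
                    logical(1))
  parts <- regmatches(x, m)   # character(0) for non-matching lines
  ncapture <- length(proto)

  # One row per input line; rows for unmatched lines stay NA
  mat <- matrix(NA_character_, nrow = length(x), ncol = ncapture)
  for (i in which(matched)) {
    caps <- parts[[i]][-1L]   # drop the full match, keep the subpatterns
    if (length(caps) != ncapture)
      stop("pattern has ", length(caps), " subpatterns but proto has ",
           ncapture, " columns")
    mat[i, ] <- caps
  }

  if (!all(matched))
    warning(sum(!matched),
            " input line(s) did not match the pattern; rows filled with NA")

  # Coerce each column to the class of the corresponding proto entry
  cols <- lapply(seq_len(ncapture), function(j)
    methods::as(mat[, j], class(proto[[j]])[1L]))
  names(cols) <- names(proto)
  as.data.frame(cols, stringsAsFactors = FALSE)
}

res <- suppressWarnings(
  strcapture_na("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"),
                proto = list(Name = "", Number = 0))
)
```

With this sketch, the mismatch between the number of subpatterns and length(proto) is still an error, since that is a property of the pattern, while a non-matching line only produces a warning. The problem lines can then be located with something like which(is.na(res$Name)).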