On 4/4/06, Bill Dunlap <[EMAIL PROTECTED]> wrote:
> On Tue, 4 Apr 2006, Gabor Grothendieck wrote:
>
> > gsubfn in package gsubfn can do this.  See the examples
> > in ?gsubfn
>
> Thanks.  gsubfn looks useful, but may be overkill
> for this, and it isn't vectorized.  To do what

gsubfn is vectorized.  It's just that you are not using the
output of gsubfn in this case.

> strsplit(keep=T) would do I think you need to do something like:
>
>    > findMatches<-function(strings, pattern){
>         lapply(strings, function(string){
>            v <- character()
>            gsubfn(number.pattern, function(x,...)v<<-c(v,x), string)
>            v})
>      }
>    > number.pattern <-
>      "[-+]?(([0-9]+(\\.[0-9]*)?)|(\\.[0-9]+))([eE][+-]?[0-9]+)?"
>    > findMatches(c("12;34:56,89,,12", "1.2, .4, 1., 1e3"), number.pattern)
>    [[1]]
>    [1] "12" "34" "56" "89" "12"
>
>    [[2]]
>    [1] "1.2" ".4" "1." "1e3"
>
> Is this worth encapsulating in a standard R function?

I will likely add a wrapper to the gsubfn package for this.

> If so, is doing via an extra argument to strsplit()
> a reasonable way to do it?

My current thought was to create a strapply function to do that.

>
>    > strsplit(c("12;34:56,89,,12", "1.2, .4, 1., 1e3"), number.pattern, keep=T)
>    [[1]]:
>    [1] "12" "34" "56" "89" "12"
>
>    [[2]]:
>    [1] "1.2" ".4" "1." "1e3"
>
>
> > On 4/4/06, Bill Dunlap <[EMAIL PROTECTED]> wrote:
> > > strsplit() is a convenient way to get a
> > > list of items from a string when you
> > > have a regular expression for what is not
> > > an item.  E.g.,
> > >
> > >    > strsplit("1.2, 34, 1.7e-2", split="[ ,] *")
> > >    [[1]]:
> > >    [1] "1.2" "34" "1.7e-2"
> > >
> > > However, sometimes it is more convenient to
> > > give a pattern for the items you do want.
> > > E.g., suppose you want to pull all the numbers
> > > out of a string which contains a mix of numbers
> > > and words.  Making a pattern for a number
> > > is simpler than making a pattern
> > > for what may come between the numbers.
> > >    > number.pattern <-
> > >      "[-+]?(([0-9]+(\\.[0-9]*)?)|(\\.[0-9]+))([eE][+-]?[0-9]+)?"
> > >
> > > I propose adding a keep=FALSE argument to
> > > strsplit() to do this.
> > > If keep is FALSE,
> > > then the split argument matches the stuff to
> > > omit from the output; if keep is TRUE then
> > > split matches the stuff to put into the
> > > output.  Then we could do the following to
> > > get a list of all the numbers in a string
> > > (done in a version of strsplit() I'm working on
> > > for S-PLUS):
> > >
> > >    > strsplit("1.2, 34, 1.7e-2", split=number.pattern,keep=TRUE)
> > >    [[1]]:
> > >    [1] "1.2" "34" "1.7e-2"
> > >
> > >    > strsplit("Ibuprofin 200mg", split=number.pattern,keep=TRUE)
> > >    [[1]]:
> > >    [1] "200"
> > >
> > > Is this a reasonable thing to want strsplit to do?
> > > Is this a reasonable parameterization of it?
>
> ----------------------------------------------------------------------------
> Bill Dunlap
> Insightful Corporation
> bill at insightful dot com
> 360-428-8146
>
>  "All statements in this message represent the opinions of the author and do
>  not necessarily reflect Insightful Corporation policy or position."
>

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
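
For readers who want to try the "keep the matches" behaviour discussed in this thread, here is a minimal sketch in base R. It is not the keep= argument proposed for strsplit(), nor the gsubfn wrapper mentioned above. It assumes a version of R that provides regmatches(), which is newer than this thread, and the helper name extractMatches is made up purely for illustration.

```r
# Hypothetical helper (not from the thread): for each input string, return
# every substring that matches `pattern`, which is roughly what the proposed
# strsplit(..., keep=TRUE) was meant to do.
extractMatches <- function(strings, pattern) {
  # gregexpr() locates all matches in each string;
  # regmatches() extracts the matched substrings as a list,
  # one character vector per input string.
  regmatches(strings, gregexpr(pattern, strings))
}

# Same pattern as in the thread.
number.pattern <-
  "[-+]?(([0-9]+(\\.[0-9]*)?)|(\\.[0-9]+))([eE][+-]?[0-9]+)?"

# Vectorized over the input, like the findMatches() example above.
extractMatches(c("12;34:56,89,,12", "1.2, .4, 1., 1e3"), number.pattern)
extractMatches("Ibuprofin 200mg", number.pattern)
```

Unlike the findMatches() example quoted above, this sketch uses its own pattern argument rather than the global number.pattern, so it should work with any regular expression passed in.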