> On 27 Aug 2017, at 18:18, Omar André Gonzáles Díaz <oma.gonza...@gmail.com> wrote:
>
> 3.- If I make the 2 first letter optional with:
>
> ecommerce$sku <- gsub("(.*)([a-zA-Z]?{2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)", "\\2", ecommerce$producto)
>
> "49MU6300" is capture, but again only "32S5970" from B (missing "LE").
Regular expression quantifiers are greedy by default and are matched from left to right, i.e. the first (.*) will consume as many characters as possible, including the first two letters, because they're optional in the following subexpression. If you make the first group non-greedy with (.*?), this works for me:

    ecommerce$sku <- gsub("(.*?)([a-zA-Z]{0,2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)", "\\2", ecommerce$producto)

But as others have pointed out, you might want to explore more robust approaches (take a look at \\b to match a word boundary, for instance).

Best,
Stefan
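
P.S. In case it helps to see the difference side by side, here is a minimal sketch. The two product strings are made up for illustration (they are not your real data), and the names `producto`, `sku`, `greedy`, `lazy` and `bounded` are just placeholders. I have also added perl = TRUE, which the call above does not use, so that the matching follows PCRE's backtracking rules; treat that as an assumption on my side rather than a requirement.

```r
# Hypothetical product descriptions, invented for this example.
producto <- c("TV SAMSUNG 49MU6300 UHD", "TV HISENSE LE32S5970 HD")

# The SKU sub-pattern: up to two optional letters, two digits,
# one or two letters, then two to four digits.
sku <- "([a-zA-Z]{0,2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})"

# Greedy first group: (.*) grabs as much as it can, so the optional
# leading letters end up in group 1 instead of group 2.
greedy <- gsub(paste0("(.*)", sku, "(.*)"), "\\2", producto, perl = TRUE)
# "49MU6300" "32S5970"

# Non-greedy first group: (.*?) stops as early as possible, so the
# optional letters are left for group 2.
lazy <- gsub(paste0("(.*?)", sku, "(.*)"), "\\2", producto, perl = TRUE)
# "49MU6300" "LE32S5970"

# Word-boundary variant: extract the match directly instead of
# replacing the whole string with a backreference.
bounded <- regmatches(
  producto,
  regexpr(paste0("\\b", sku, "\\b"), producto, perl = TRUE)
)
# "49MU6300" "LE32S5970"
```

Note that regmatches() silently drops elements that have no match, so the gsub() form is the safer one if you need the result to stay aligned with the rows of a data frame.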