> On 27 Aug 2017, at 18:18, Omar André Gonzáles Díaz <oma.gonza...@gmail.com> wrote:
> 
> 3.- If I make the first 2 letters optional with:
> 
> ecommerce$sku <- gsub("(.*)([a-zA-Z]?{2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)",
>                       "\\2", ecommerce$producto)
> 
> "49MU6300" is captured, but again only "32S5970" from B (missing "LE").

Quantifiers such as * are greedy and the pattern is matched from left to 
right, i.e. the first (.*) will consume as many characters as possible, 
including the first two letters, because they are optional in the following 
subexpression.
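A small sketch of the effect, assuming two hypothetical product strings built around the SKUs mentioned in this thread (the real ecommerce$producto column was not posted) and writing the optional letters as [a-zA-Z]{0,2}:

```r
# Hypothetical product descriptions (the real ecommerce$producto data was not posted)
producto <- c("Televisor LE32S5970 Full HD", "TV LG 49MU6300 UHD")

# Greedy first group: (.*) keeps as much text as it can, so the optional
# letters [a-zA-Z]{0,2} end up inside group 1 instead of group 2.
greedy <- gsub("(.*)([a-zA-Z]{0,2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)",
               "\\2", producto)
greedy
# c("32S5970", "49MU6300"): "LE" was swallowed by the first (.*)
```

The second string is unaffected only because nothing letter-like sits directly in front of "49MU6300".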

If you make the first group non-greedy (.*?) and write the optional letters 
as [a-zA-Z]{0,2}, this works for me:

        ecommerce$sku <- gsub("(.*?)([a-zA-Z]{0,2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)",
                              "\\2", ecommerce$producto)
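Applied to the same kind of hypothetical strings (assumed for illustration, not the original data), the non-greedy version keeps the leading letters in the captured SKU:

```r
# Hypothetical product descriptions (the real ecommerce$producto data was not posted)
producto <- c("Televisor LE32S5970 Full HD", "TV LG 49MU6300 UHD")

# Non-greedy first group: (.*?) stops as early as possible, so the
# optional letters [a-zA-Z]{0,2} are matched by group 2 instead.
lazy <- gsub("(.*?)([a-zA-Z]{0,2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)",
             "\\2", producto)
lazy
# c("LE32S5970", "49MU6300")
```

Note that gsub() returns an element unchanged when the pattern does not match it, so a description without any code passes through as the whole string.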

But as others have pointed out, you might want to explore more robust 
approaches (take a look at \\b to match a word boundary, for instance).
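One possible word-boundary approach, sketched with base R's regexpr() and regmatches() on hypothetical strings: anchoring the code with \\b removes the need for the surrounding (.*) groups, and descriptions without a code give NA instead of being returned whole. The vector names and sample data are assumptions for illustration.

```r
# Hypothetical product descriptions, one of them without any SKU
producto <- c("Televisor LE32S5970 Full HD",
              "TV LG 49MU6300 UHD",
              "Cable HDMI 2 metros")

# \\b anchors the code at word edges, so no leading or trailing (.*)
# groups are needed and nothing has to be made non-greedy.
pattern <- "\\b[a-zA-Z]{0,2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4}\\b"

m <- regexpr(pattern, producto)          # -1 where there is no match
sku <- rep(NA_character_, length(producto))
sku[m != -1] <- regmatches(producto, m)  # regmatches() returns only the matches
sku
# c("LE32S5970", "49MU6300", NA)
```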

Best,
Stefan

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
