Omar: I don't think this can work as stated. For example, SKUs of the form 4), 5), and 6) would all be matched by pattern 6), because "1 letter, 3 numbers" also occurs inside "LH5000" and "B8500".
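A small sketch of that overlap, for illustration only. The three strings are the examples from patterns 4), 5) and 6) in the quoted list; the variable names and the use of \b as a stand-in delimiter are assumptions, not something taken from the attached data.

```r
# Example SKUs for patterns 4), 5) and 6) from the quoted list
skus <- c("LH5000", "B8500", "E310")

# Pattern 6 as described: 1 letter followed by 3 numbers, no delimiters
p6 <- "[A-Za-z][0-9]{3}"

# Without delimiters, pattern 6 finds a match inside all three strings
inside <- regmatches(skus, regexpr(p6, skus))
inside
# [1] "H500" "B850" "E310"

# With a word boundary required on both sides, only the real pattern-6 SKU is left
bounded <- grepl(paste0("\\b", p6, "\\b"), skus, perl = TRUE)
bounded
# [1] FALSE FALSE  TRUE
```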
As Jeff indicated, you need to provide the delimiters -- what characters
come before and after the SKU patterns -- to be able to recognize them.
In a quick look at the text file you attached, the delimiters appeared
to be either "-" or " " (blank) and perhaps <end of character string>.
If that is correct, or if you can tell us how to make it correct, then
it's straightforward to proceed. Otherwise, I am unable to help. Maybe
someone else can.

Cheers,
Bert

On Sun, Aug 27, 2017 at 11:47 AM, Omar André Gonzáles Díaz
<oma.gonza...@gmail.com> wrote:
> Hi Jeff, Bert, thank you for your input.
>
> I'm attaching a sample of the data, feel free to explore it.
>
> As I said, I need to extract the SKUs of the products (a key that
> identifies every product). Not every product (row) has a SKU; in this
> case "no SKU" should be the output.
>
> I've identified these patterns so far:
>
> 1.- 75Q8C: 2 numbers, 1 letter, 1 number, 1 letter.
> 2.- OLED65E7P: 4 letters, 2 numbers, 1 letter, 1 number, 1 letter.
> 3.- MT48AF: 2 letters, 2 numbers, 2 letters.
> 4.- LH5000: 2 letters, 4 numbers.
> 5.- B8500: 1 letter, 4 numbers.
> 6.- E310: 1 letter, 3 numbers.
> 7.- X541UJ: 1 letter, 3 numbers, 2 letters.
>
> I think those cover the majority of SKUs, so I would appreciate
> guidance on how to extract all those different patterns.
>
> Related, but not the question asked: the idea is that after extracting
> the SKUs, some SKUs should be repeated across the different e-commerce
> sites. Those SKUs would let us compare the products and their prices.
>
> Thank you in advance.
>
> 2017-08-27 12:10 GMT-05:00 Bert Gunter <bgunter.4...@gmail.com>:
>> You may have to provide us more detail on **exactly** the sorts of
>> patterns you wish to "capture" -- including exactly what you mean by
>> "capture" (what value do you wish to return?)
>> -- as the "obvious" answer is probably not sufficient:
>>
>> ## using your example -- thank you
>>
>> > gsub(".*(49MU6300|LE32S5970).*","\\1",ecommerce[[2]])
>> [1] "49MU6300" "LE32S5970"
>>
>> Cheers,
>> Bert
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>> On Sun, Aug 27, 2017 at 9:18 AM, Omar André Gonzáles Díaz
>> <oma.gonza...@gmail.com> wrote:
>>> Hello, I need some help with regex.
>>>
>>> I have these two sentences. I need to extract both "49MU6300" and
>>> "LE32S5970" and put them in a new column "SKU".
>>>
>>> A) SMART TV UHD 49'' CURVO 49MU6300
>>> B) SMART TV HD 32'' LE32S5970
>>>
>>> DataFrame for testing:
>>>
>>> ecommerce <- data.frame(a = c(1,2),
>>>                         producto = c("SMART TV UHD 49'' CURVO 49MU6300",
>>>                                      "SMART TV HD 32'' LE32S5970"))
>>>
>>> I'm using gsub like this:
>>>
>>> 1.- This captures A as intended but only "32S5970" from B (missing
>>> "LE").
>>>
>>> ecommerce$sku <- gsub("(.*)([0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)", "\\2",
>>>                       ecommerce$producto)
>>>
>>> 2.- This captures "LE32S5970" but not "49MU6300".
>>>
>>> ecommerce$sku <-
>>>   gsub("(.*)([a-zA-Z]{2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)", "\\2",
>>>        ecommerce$producto)
>>>
>>> 3.- If I make the first 2 letters optional with:
>>>
>>> ecommerce$sku <-
>>>   gsub("(.*)([a-zA-Z]?{2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)", "\\2",
>>>        ecommerce$producto)
>>>
>>> "49MU6300" is captured, but again only "32S5970" from B (missing "LE").
>>>
>>> What should I do? How would you approach it?
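A possible explanation for attempt 3, sketched under some assumptions: the optional letters are written here as {0,2} instead of ?{2}, perl = TRUE is used so the matching rules are the PCRE ones, and the data is a plain character vector rather than the data frame. The leading "(.*)" is greedy, so when the two letters are optional it keeps "LE" for itself. Requiring the SKU to start and end at a word boundary is one way around that.

```r
producto <- c("SMART TV UHD 49'' CURVO 49MU6300",
              "SMART TV HD 32'' LE32S5970")

# Optional leading letters after a greedy "(.*)": the prefix swallows "LE",
# because the second group still matches without those letters.
greedy <- gsub("(.*)([a-zA-Z]{0,2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)",
               "\\2", producto, perl = TRUE)
greedy
# [1] "49MU6300" "32S5970"

# Same group, but it must begin and end at a word boundary, so the greedy
# prefix can no longer cut into the token.
bounded <- gsub(".*\\b([a-zA-Z]{0,2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})\\b.*",
                "\\1", producto, perl = TRUE)
bounded
# [1] "49MU6300" "LE32S5970"
```

Note that gsub() returns the input unchanged when nothing matches, so this form does not give "no SKU" for rows without one.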
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
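If the delimiters really are blank, "-", or the start/end of the string, as guessed above, the seven listed patterns could be combined along the following lines. This is only a sketch: extract_sku, sku_regex and the sample product descriptions are made up for illustration (only the SKU strings come from the list in the thread), letters are assumed to be upper case, and only the first SKU in each string is returned.

```r
# The seven patterns from the list, written as regular expressions
patterns <- c(
  "[0-9]{2}[A-Z][0-9][A-Z]",          # 1) 75Q8C
  "[A-Z]{4}[0-9]{2}[A-Z][0-9][A-Z]",  # 2) OLED65E7P
  "[A-Z]{2}[0-9]{2}[A-Z]{2}",         # 3) MT48AF
  "[A-Z]{2}[0-9]{4}",                 # 4) LH5000
  "[A-Z][0-9]{4}",                    # 5) B8500
  "[A-Z][0-9]{3}",                    # 6) E310
  "[A-Z][0-9]{3}[A-Z]{2}"             # 7) X541UJ
)

# Assumed delimiters: blank, "-", or start/end of string.
# (?<![^ -]) = not preceded by a non-delimiter; (?![^ -]) = not followed by one.
sku_regex <- paste0("(?<![^ -])(?:", paste(patterns, collapse = "|"), ")(?![^ -])")

# Hypothetical helper: first SKU found in each string, or "no SKU"
extract_sku <- function(x) {
  m <- regexpr(sku_regex, x, perl = TRUE)
  found <- as.integer(m) != -1L
  out <- rep("no SKU", length(x))
  out[found] <- regmatches(x, m)
  out
}

# Made-up product descriptions built around SKUs from the list
x <- c("TV QLED 75Q8C 4K", "TV OLED65E7P 65 INCH", "TV LED LH5000",
       "LAPTOP X541UJ-GO", "MONITOR E310", "CABLE HDMI 2M")
skus <- extract_sku(x)
skus
# [1] "75Q8C"     "OLED65E7P" "LH5000"    "X541UJ"    "E310"      "no SKU"
```

Because every alternative has to be followed by a delimiter, pattern 6) no longer fires inside "X541UJ" or "LH5000", which is the overlap raised at the top of this message.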