Re: [R] regex - optional part isn't considered in replacement with gsub

2017-08-29 Thread Stefan Evert
> On 27 Aug 2017, at 18:18, Omar André Gonzáles Díaz wrote:
>
> 3.- If I make the 2 first letter optional with:
>
> ecommerce$sku <-
> gsub("(.*)([a-zA-Z]?{2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)", "\\2",
> ecommerce$producto)
>
> "49MU6300" is capture, but again only "32S5970" from B (missi
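One plausible reading of the behaviour quoted above is that the leading `(.*)` is greedy: it absorbs as many characters as it can, so the optional letters end up matching nothing. The following is a minimal sketch, not the poster's or the replier's code. It assumes the two example strings from the original question, uses `{0,2}` in place of `?{2}`, and relies on `perl = TRUE` so that the lazy quantifier `.*?` is available.

```r
# Example strings taken from the original question in this thread.
producto <- c("SMART TV UHD 49'' CURVO 49MU6300",
              "SMART TV HD 32'' LE32S5970")

# SKU body; {0,2} is used here instead of the original "?{2}" (an assumption).
sku <- "([a-zA-Z]{0,2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})"

# Greedy prefix: ".*" takes the optional leading letters for itself.
greedy <- sub(paste0("^.*", sku, ".*$"), "\\1", producto, perl = TRUE)

# Lazy prefix: ".*?" stops as early as possible, so the letters stay in the group.
lazy <- sub(paste0("^.*?", sku, ".*$"), "\\1", producto, perl = TRUE)

print(greedy)
print(lazy)
```

With these two strings, `greedy` loses the `LE` prefix of the second SKU while `lazy` keeps it.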

Re: [R] regex - optional part isn't considered in replacement with gsub

2017-08-27 Thread Jeff Newmiller
Omar, please remember that this is R-help, not R-do-my-work-for-me... you have already been given several hints as to how you can refine your patterns yourself. These skills are key to real-world data science, so you need to work at being able to take hints and expand on them if you are to be s

Re: [R] regex - optional part isn't considered in replacement with gsub

2017-08-27 Thread Bert Gunter
> Please, consider that some SKUs have "-" in the middle, for example: "PG-9021".

Then you need to include these in the list of patterns you gave. Try it again -- this time with a **complete** list. -- Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and

Re: [R] regex - optional part isn't considered in replacement with gsub

2017-08-27 Thread Bert Gunter
Omar: I don't think this can work. For example, number-letter patterns 4), 5), and 6) would all be matched by pattern 6). As Jeff indicated, you need to provide the delimiters -- what characters come before and after the SKU patterns -- to be able to recognize them. In a quick look at the text fil

Re: [R] regex - optional part isn't considered in replacement with gsub

2017-08-27 Thread Bert Gunter
You may have to provide us more detail on **exactly** the sorts of patterns you wish to "capture" -- including exactly what you mean by "capture" (what value do you wish to return?) -- as the "obvious" answer is probably not sufficient: ## using your example -- thank you > gsub(".*(49MU6300|LE32S59

Re: [R] regex - optional part isn't considered in replacement with gsub

2017-08-27 Thread Jeff Newmiller
Clearly you are being too specific about the structure of the sku. In the absence of better information about the sku you need to focus on identifying the delimiters and position of the sku... one way might be: ecommerce$sku <- sub( "^(.*)[ \n]+([^ \n]+)$", "\\2", ecommerce$producto ) Please l
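The delimiter-based suggestion above can be tried on the two example sentences from the original question. This is a minimal sketch: the data frame is rebuilt from those two sentences (the construction shown in the thread is truncated, so the exact form here is an assumption), and the `sub()` call is the one quoted in the message.

```r
# Data frame rebuilt from the two example sentences in the original question;
# the thread's own construction is truncated, so this form is an assumption.
ecommerce <- data.frame(
  a = c(1, 2),
  producto = c("SMART TV UHD 49'' CURVO 49MU6300",
               "SMART TV HD 32'' LE32S5970"),
  stringsAsFactors = FALSE
)

# Keep only the last run of non-blank characters: everything after the final
# space or newline is treated as the SKU.
ecommerce$sku <- sub("^(.*)[ \n]+([^ \n]+)$", "\\2", ecommerce$producto)

print(ecommerce$sku)
```

This only works when the SKU is the last token of the description. If a string ends in a space or contains no delimiter at all, the pattern does not match and `sub()` returns the input unchanged.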

[R] regex - optional part isn't considered in replacement with gsub

2017-08-27 Thread Omar André Gonzáles Díaz
Hello, I need some help with regex. I have these two sentences. I need to extract both "49MU6300" and "LE32S5970" and put them in a new column "SKU". A) SMART TV UHD 49'' CURVO 49MU6300 B) SMART TV HD 32'' LE32S5970 DataFrame for testing: ecommerce <- data.frame(a = c(1,2), producto = c("SMART TV
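For reference, one way to pull a matching token into a new `SKU` column without rewriting the whole string is `regexpr()` together with `regmatches()`. This is only a sketch under stated assumptions: the data frame is rebuilt from sentences A) and B), and the pattern (up to two letters, two digits, one or two letters, two to four digits, bounded by `\\b`) is a guess fitted to these two SKUs, not a general rule.

```r
# Data frame rebuilt from sentences A) and B); an assumption, since the
# original construction is truncated in the archive.
ecommerce <- data.frame(
  a = c(1, 2),
  producto = c("SMART TV UHD 49'' CURVO 49MU6300",
               "SMART TV HD 32'' LE32S5970"),
  stringsAsFactors = FALSE
)

# Hypothetical SKU pattern, fitted only to the two examples above.
pattern <- "\\b[A-Za-z]{0,2}[0-9]{2}[A-Za-z]{1,2}[0-9]{2,4}\\b"

# regexpr() returns the start of the first match per element, or -1 if none.
hits <- regexpr(pattern, ecommerce$producto, perl = TRUE)

# Rows without a match stay NA instead of keeping the full description.
ecommerce$SKU <- NA_character_
ecommerce$SKU[hits > 0] <- regmatches(ecommerce$producto, hits)

print(ecommerce$SKU)
```

Unlike a `gsub()` replacement, this leaves rows with no recognisable SKU as `NA`, which makes pattern gaps easier to spot.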