Re: [R] Regex Split?

2023-05-05 Thread Leonard Mada via R-help
Dear Bert, Thank you for the suggestion. Indeed, there are various solutions and workarounds. However, there is still a bug in strsplit. 2.) gsub I would try to avoid gsub on a Wikipedia-sized corpus: using strsplit directly should be far more efficient. 3.) Punctuation marks Abbreviations and

Re: [R] Regex Split?

2023-05-05 Thread Bert Gunter
Primarily for my own amusement, here is a way to do what I think you wanted without look-aheads/behinds: strsplit(gsub("([[:punct:]])"," \\1 ","a bc,def, adef,x; ,,gh"), " +") [[1]] [1] "a" "bc" "," "def" "," "adef" "," "x" ";" [10] "," "," "gh" I certainly would *not* cla
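Bert's pad-then-split idea can be wrapped as a small reusable helper. This is a sketch: the name `split_keep_punct` is hypothetical, and the `nzchar()` filter is an added step that drops the empty first element strsplit() returns when the input begins with punctuation or a space.

```r
# Hypothetical helper: surround every punctuation mark with spaces,
# then split on runs of one or more spaces (no look-arounds needed).
split_keep_punct <- function(x) {
  padded <- gsub("([[:punct:]])", " \\1 ", x)
  pieces <- strsplit(padded, " +")
  # A leading separator yields an empty first token; drop empties.
  lapply(pieces, function(tok) tok[nzchar(tok)])
}

split_keep_punct("a bc,def, adef,x; ,,gh")[[1]]
# "a" "bc" "," "def" "," "adef" "," "x" ";" "," "," "gh"
```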

Re: [R] Regex Split?

2023-05-05 Thread Leonard Mada via R-help
Dear Avi, Punctuation marks are used in various NLP language models. Preserving the "," is therefore useful in such scenarios and Regex are useful to accomplish this (especially if you have sufficient experience with such expressions). I observed only an odd behaviour using strsplit: the exa

Re: [R] Regex Split?

2023-05-05 Thread avi.e.gross
Leonard, It can be helpful to spell out your intent in English, or some of us have to go back to the documentation to remember what some of the operators do. Your text being searched seems to be an example of items between commas with an optional space after some commas and in one case, nothing b

Re: [R] Regex Split?

2023-05-05 Thread Leonard Mada via R-help
Dear Bill, Indeed, there are other cases as well - as documented. Various Regex sites give the warning to avoid the legacy syntax "[[:<:]]", so this is the alternative syntax: strsplit(split="\\b(?=\\w)", "One, two; three!", perl=TRUE) # "O"  "n"  "e"  ", " "t"  "w"  "o"  "; " "t"  "h"  "r"  "

Re: [R] Regex Split?

2023-05-05 Thread Martin Maechler
> Bill Dunlap on Fri, 5 May 2023 08:19:21 -0700 writes: https://bugs.r-project.org/show_bug.cgi?id=16745 (from 2016, still labelled 'UNCONFIRMED") contains some other examples of strsplit misbehaving when using 0-length perl look-behinds. E.g., Thank you, Bill -- yes, uhmm, ...

Re: [R] Regex Split?

2023-05-05 Thread Bill Dunlap
https://bugs.r-project.org/show_bug.cgi?id=16745 (from 2016, still labelled 'UNCONFIRMED") contains some other examples of strsplit misbehaving when using 0-length perl look-behinds. E.g., > strsplit(split="[[:<:]]", "One, two; three!", perl=TRUE)[[1]] [1] "O" "n" "e" ", " "t" "w" "o" "; "
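A different technique that avoids zero-length split patterns altogether is to match the tokens rather than the gaps between them, using gregexpr() with regmatches(). This is a swapped-in approach, not a fix for strsplit(); the helper name `tokenize` is hypothetical, and unlike the outputs above it discards whitespace instead of returning pieces such as ", ".

```r
# Hypothetical helper: return every run of letters/digits and every
# single punctuation mark, in order; whitespace is not returned.
tokenize <- function(x) {
  regmatches(x, gregexpr("[[:alnum:]]+|[[:punct:]]", x))
}

tokenize(c("a bc,def, adef ,,gh", "One, two; three!"))
# [[1]] "a" "bc" "," "def" "," "adef" "," "," "gh"
# [[2]] "One" "," "two" ";" "three" "!"
```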

Re: [R] Regex Split?

2023-05-05 Thread Howard, Tim G (DEC) via R-help
If you only want the character strings, this seems a little simpler: > strsplit("a bc,def, adef ,,gh", "[ ,]+", perl=T) [[1]] [1] "a" "bc" "def" "adef" "gh" If you need delimiters (the commas) you could then add them back in again afterwards. Tim -- Messag

Re: [R] Regex Split?

2023-05-05 Thread Ivan Krylov
On Thu, 4 May 2023 23:59:33 +0300 Leonard Mada via R-help wrote: > strsplit("a bc,def, adef ,,gh", " |(?=,)|(?<=,)(?![ ])", perl=T) > # "a"    "bc"   ","    "def"  ","    "" "adef" ","    "," "gh" > > strsplit("a bc,def, adef ,,gh", " |(? # "a"    "bc"   ","    "def"  ","    "" "adef" ",

Re: [R] regex

2019-09-18 Thread Richard O'Keefe
A little note on quoting in regular expressions. I find writing \\. when I want a quoted . somewhat confusing, so I would use the pattern "_w_.*[.]csv$". Better still, if you want to match file names, there is a function glob2rx that converts shell ("glob") patterns into regular expression pattern
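A small sketch of the glob2rx() route Richard describes. The file names are invented for illustration; the resulting pattern can be passed directly to `list.files(pattern = ...)`.

```r
# Invented file names standing in for list.files() output
files <- c("a_w_1.csv", "b_x_2.csv", "c_w_3.txt", "notes_w_.csv")

# Shell-style glob converted to a regular expression
pat <- glob2rx("*_w_*.csv")
pat                      # "^.*_w_.*\\.csv$"

files[grepl(pat, files)] # "a_w_1.csv" "notes_w_.csv"
```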

Re: [R] regex

2019-09-17 Thread Bert Gunter
?regexp ## Search the text on "backreference" .(or websearch it: "regular expression backreference") -- Bert On Tue, Sep 17, 2019 at 7:52 AM Ivan Calandra wrote: > Thank you Bert. > That's more like what I was looking for. > > Could you please tell me where I can find information on the "\\1

Re: [R] regex

2019-09-17 Thread Ivan Calandra
Thank you Bert. That's more like what I was looking for. Could you please tell me where I can find information on the "\\1"? This is the part I still don't get. Ivan -- Dr. Ivan Calandra TraCEr, laboratory for Traceology and Controlled Experiments MONREPOS Archaeological Research Centre and Mus

Re: [R] regex

2019-09-17 Thread Ivan Calandra
Thanks Jeff! It does indeed make sense that there is no "AND" corresponding to the "|". Ivan

Re: [R] regex

2019-09-17 Thread Bert Gunter
(For the units) Why not simply: sub(".*\\[(.+)\\]","\\1", headers) Cheers, Bert On Tue, Sep 17, 2019 at 6:40 AM Ivan Calandra wrote: > Thank you Ivan for your help! > > Your solution for the first problem is so simple I didn't even think > about it! > What I find weird is that "_w_|\\.csv$"
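A minimal illustration of the backreference in Bert's one-liner: `(.+)` is capture group 1, and `\\1` in the replacement refers back to whatever it matched. The header strings are invented. Because sub() only rewrites the matched part, a header with no brackets comes back unchanged, and any text after the closing bracket would be kept.

```r
# Invented column headers with units in square brackets
headers <- c("Length [mm]", "Mass [g]", "Time")

# Replace everything up to and including "]" by the bracket contents
units <- sub(".*\\[(.+)\\]", "\\1", headers)
units  # "mm" "g" "Time"
```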

Re: [R] regex

2019-09-17 Thread Jeff Newmiller
https://stackoverflow.com/questions/3041320/regex-and-operator/37692545 On September 17, 2019 6:39:13 AM PDT, Ivan Calandra wrote: >Thank you Ivan for your help! > >Your solution for the first problem is so simple I didn't even think >about it! >What I find weird is that "_w_|\\.csv$" works as e

Re: [R] regex

2019-09-17 Thread Ivan Calandra
Thank you Ivan for your help! Your solution for the first problem is so simple I didn't even think about it! What I find weird is that "_w_|\\.csv$" works as expected ("OR"), but is there no way to combine two patterns with an "AND"? Your solution to the second problem is actually unfortunate
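There is no AND operator, but two common workarounds are sketched here with invented file names: combine two grepl() results with `&`, or put one condition in a PCRE look-ahead. list.files() has no `perl` argument, so the look-ahead form would be applied to its output rather than passed as `pattern`.

```r
files <- c("x_w_1.csv", "x_w_1.txt", "y_1.csv", "z.csv_w_")

# (1) AND by combining two logical vectors
both <- grepl("_w_", files) & grepl("\\.csv$", files)

# (2) AND inside one pattern: the look-ahead requires "_w_" somewhere,
#     the rest of the pattern requires a ".csv" ending
both2 <- grepl("^(?=.*_w_).*\\.csv$", files, perl = TRUE)

both   # TRUE FALSE FALSE FALSE
```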

Re: [R] regex

2019-09-17 Thread Ivan Krylov
On Tue, 17 Sep 2019 10:14:24 +0300 Ivan Krylov wrote: > '\\[.*\\]' Sorry, I forgot to take it into account that you don't want the [] in your units, either. That's still doable, but requires so-called look-around assertions in the regular expression: '(?<=\\[).*(?=\\])' This should match any c
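The look-around version can be paired with regexpr() and regmatches() to pull out just the text between the brackets. A sketch with invented headers; be aware that regmatches() silently drops elements with no match, so the result can be shorter than the input.

```r
headers <- c("Length [mm]", "Mass [g]", "Time")

# Zero-width look-behind/look-ahead keep the brackets out of the match
m <- regexpr("(?<=\\[).*(?=\\])", headers, perl = TRUE)
inside <- regmatches(headers, m)
inside  # "mm" "g"   ("Time" has no match and is dropped)
```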

Re: [R] regex

2019-09-17 Thread Ivan Krylov
On Tue, 17 Sep 2019 08:48:43 +0200 Ivan Calandra wrote: > CSVs <- list.files(path=..., pattern="\\.csv$") > w.files <- CSVs[grep(pattern="_w_", CSVs)] > > Of course, what I would like to do is list only the interesting files > from the beginning, rather than subsetting the whole list of files.

Re: [R] regex for "[2440810] / www.tinyurl.com/hgaco4fha3"

2018-02-20 Thread Bert Gunter
These are always kind of fun, not least because of the variety of different replies that "work" at least somewhat. Here's mine: > stringa <- "[2440810] / www.tinyurl.com/hgaco4fha3" > sub("^(.+)www\\.(.+)\\.com.+","\\1\\2",stringa) [1] "[2440810] / tinyurl" Note the use of doubled backslashes to

Re: [R] regex for "[2440810] / www.tinyurl.com/hgaco4fha3"

2018-02-20 Thread Ulrik Stervbo
Hi Omar, you are almost there, but! Your first substitution looks for 'www' at the start of the line followed by anything (which then does nothing), so your second substitution removes everything from the first '.' it finds (which is the one after www). What you want to do is x <- "[2440810] / ww

Re: [R] regex - optional part isn't considered in replacement with gsub

2017-08-29 Thread Stefan Evert
> On 27 Aug 2017, at 18:18, Omar André Gonzáles Díaz > wrote: > > 3.- If I make the 2 first letter optional with: > > ecommerce$sku <- > gsub("(.*)([a-zA-Z]?{2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)", "\\2", > ecommerce$producto) > > "49MU6300" is capture, but again only "32S5970" from B (missi

Re: [R] regex - optional part isn't considered in replacement with gsub

2017-08-27 Thread Jeff Newmiller
Omar, please remember that this is R-help, not R-do-my-work-for-me... you have already been given several hints as to how you can refine your patterns yourself. These skills are key to real world data science, so you need to work at being able to take hints and expand on them if you are to be s

Re: [R] regex - optional part isn't considered in replacement with gsub

2017-08-27 Thread Bert Gunter
"Please, consider that some SKUs have "-" in the middle, for example: "PG-9021". Then you need to include these in the list of patterns you gave. Try it again -- this time with a **complete** list. -- Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and

Re: [R] regex - optional part isn't considered in replacement with gsub

2017-08-27 Thread Bert Gunter
Omar: I don't think this can work. For example number-letter patterns 4), 5), and 6) would all be matched by pattern 6). As Jeff indicated, you need to provide the delimiters -- what characters come before and after the SKU patterns -- to be able to recognize them. In a quick look at the text fil

Re: [R] regex - optional part isn't considered in replacement with gsub

2017-08-27 Thread Bert Gunter
You may have to provide us more detail on **exactly** the sorts of patterns you wish to "capture" -- including exactly what you mean by "capture" (what value do you wish to return?) -- as the "obvious" answer is probably not sufficient: ## using your example -- thank you > gsub(".*(49MU6300|LE32S59

Re: [R] regex - optional part isn't considered in replacement with gsub

2017-08-27 Thread Jeff Newmiller
Clearly you are being too specific about the structure of the sku. In the absence of better information about the sku you need to focus on identifying the delimiters and position of the sku... one way might be: ecommerce$sku <- sub( "^(.*)[ \n]+([^ \n]+)$", "\\2", ecommerce$producto ) Please l
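A sketch of Jeff's last-token idea on invented product strings (only the SKU codes come from the thread): the greedy `(.*)` backtracks to the final run of spaces or newlines, and `\\2` keeps whatever follows it. Strings containing no space are returned unchanged.

```r
# Invented product descriptions ending in a SKU
producto <- c("TV LED 49\" 4K UHD 49MU6300", "SMART TV 32\" LE32S5970")

# Keep only the last whitespace-delimited token
sku <- sub("^(.*)[ \n]+([^ \n]+)$", "\\2", producto)
sku  # "49MU6300" "LE32S5970"
```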

Re: [R] regex [:digit:] gives diffrent result

2017-02-06 Thread William Dunlap via R-help
Shouldn't your "[:digit:]" be "[[:digit:]]"? Bill Dunlap TIBCO Software wdunlap tibco.com On Mon, Feb 6, 2017 at 10:36 AM, Tilmann Faul wrote: > Using R is a grate advantage, thanks for your work. > > Using regex under R 3.1.1, Debian 8.6 jessy works fine. > > str_detect("16-03-08", "[:digit:]{2
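For completeness, the bracketed form in base R. Character classes like `[:digit:]` are only meaningful inside a bracket expression, hence the double brackets; the dates are invented.

```r
x <- c("16-03-08", "2016-03-08", "ab-cd-ef")

# Exactly two digits, dash, two digits, dash, two digits
ok <- grepl("^[[:digit:]]{2}-[[:digit:]]{2}-[[:digit:]]{2}$", x)
ok  # TRUE FALSE FALSE
```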

Re: [R] Regex to stop at first capital letter after sequence

2016-12-19 Thread Bert Gunter
You don't need a regex. ?strsplit Something like: > y <-c("PPA 06 - Promo Vasito", "PPA 05 - Cuentos") > sapply(strsplit(y, "-"),"[",2) [1] " Promo Vasito" " Cuentos" You may have to add spaces around your "-" , as you failed to supply data so I cannot be sure what you have. -- Bert Bert G

Re: [R] Regex to stop at first capital letter after sequence

2016-12-19 Thread Sarah Goslee
Hi, If your actual data are of the same form as your sample data, why not just: x <- c("PPA 06 - Promo Vasito", "PPA 05 - Cuentos", "PPA 04 - Promo vasito", "PPA 03 - Promoción escolar", "PPA - Saluda a tu pediatra", "PPL - Dia del Pediatra") sub("^.* - ", "", x) [1] "Promo Vasito" "Cuent
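A quick check of Sarah's sub() on strings from the question, plus one invented title containing a second " - ". The greedy `.*` strips everything up to the last separator; a lazy `.*?` (used here with `perl = TRUE`) strips only up to the first.

```r
x <- c("PPA 06 - Promo Vasito", "PPA 05 - Cuentos",
       "PPA - Saluda a tu pediatra",
       "PPA 07 - Promo - Verano")   # last entry is invented

greedy <- sub("^.* - ", "", x)                 # up to the LAST " - "
lazy   <- sub("^.*? - ", "", x, perl = TRUE)   # up to the FIRST " - "

greedy[4]  # "Verano"
lazy[4]    # "Promo - Verano"
```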

Re: [R] Regex to stop at first capital letter after sequence

2016-12-19 Thread David Winsemius
> On Dec 19, 2016, at 1:25 PM, Omar André Gonzáles Díaz > wrote: > > I have the following strings: > > [1] "PPA 06 - Promo Vasito" [2] "PPA 05 - Cuentos" > [3] "PPA 04 - Promo vasito" [4] "PPA 03 - Promoción escolar" > [5] "PPA - Saluda a tu pediatra" [6] "PPL - Dia del Pediatra" >

Re: [R] regex - extracting src url

2016-03-22 Thread Martin Morgan
On 03/22/2016 12:44 AM, Omar André Gonzáles Díaz wrote: Hi,I have a DF with a column with "html", like this: https://ad.doubleclick.net/ddm/trackimp/N344006.1960500FACEBOOKAD/B9589414.130145906;dc_trk_aid=303019819;dc_trk_cid=69763238;ord=[timestamp];dc_lat=;dc_rdid=;tag_for_child_directed_tre

Re: [R] regex - extracting src url

2016-03-21 Thread Bert Gunter
?strsplit #I think My "solution" assumes a fixed format for the URL's as shown in your example. If that is not the case, it doesn't work. > y <- ' SRC="https://ad.doubleclick.net/ddm/trackimp/N344006.1960500FACEBOOKAD/B9589414.130145906;dc_trk_aid=303019819;dc_trk_cid=69763238;ord=[timestamp];dc_
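A minimal regex sketch for pulling out a src attribute value, on an invented HTML fragment (the real URL in the question is truncated above). With `ignore.case = TRUE` both SRC and src match; if an element held several src attributes, the leading greedy `.*` would select the last one, and for real HTML a proper parser is usually more robust.

```r
# Invented fragment; the URL is a placeholder
html <- '<img SRC="https://example.com/pixel.gif?ord=1" width="1">'

# Capture everything between the quotes that follow src=
src <- sub('.*src="([^"]+)".*', "\\1", html, ignore.case = TRUE)
src  # "https://example.com/pixel.gif?ord=1"
```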

Re: [R] regex not working for some entries in for loop

2015-11-07 Thread Omar André Gonzáles Díaz
Thanks S. Ellison. Finally, Ihad some time to test it. Thanks for your clarification. Just one more question: You say: Your regexes are on multiple lines and include whitespace and linefeeds. For example you are not testing for " .*forum.*|.*buy.*"; you are testing for " .*forum.*|

Re: [R] regex not working for some entries in for loop

2015-10-26 Thread S Ellison
> From: Omar André Gonzáles Díaz > Subject: [R] regex not working for some entries in for loop > > I'm using some regex in a for loop to check for some values in column > "source", > and put a result in column "fuente". Your regexes are on multiple lines and include whitespace and linefeeds. F
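Building the alternation programmatically avoids the stray whitespace and line feeds Steve describes. A sketch with invented terms and inputs: `pat_bad` mimics a pattern typed across two lines, so its second alternative demands a newline and two spaces before "buy".

```r
terms <- c("forum", "buy")

# One clean pattern, no embedded whitespace
pat <- paste(terms, collapse = "|")   # "forum|buy"

# What a pattern split over two lines effectively contains
pat_bad <- "forum|\n  buy"

src <- c("forum.example.com", "buynow", "news")
grepl(pat, src)            # TRUE TRUE FALSE
grepl(pat_bad, "buynow")   # FALSE
```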

Re: [R] Regex: Combining sub/grepl with ifelse

2015-10-11 Thread David Winsemius
On Oct 10, 2015, at 10:57 PM, Karim Mezhoud wrote: > My code is not correct. > The idea is to use apply instead of a loop. more efficiency. There is no increased efficiency in using apply. Both `apply` and a `for` loop will perform with equal efficiency. The only advantage is the mental clarity

Re: [R] Regex: Combining sub/grepl with ifelse

2015-10-11 Thread Boris Steipe
You are the domain expert, but it would seem to me that "-NEGRO" is a part of the ID because it uniquely specifies the product. From the perspective of expressing your business logic in code, dropping this part of the string should have a separate line in the code, and a comment. Dropping th

Re: [R] Regex: Combining sub/grepl with ifelse

2015-10-10 Thread Karim Mezhoud
My code is not correct. The idea is to use apply instead of a loop. more efficiency. Karim On Sun, Oct 11, 2015 at 6:42 AM, Omar André Gonzáles Díaz < oma.gonza...@gmail.com> wrote: > Thanks Karim. linio.tv is in the email. In the last part. > El oct 11, 2015 12:39 AM, "Karim Mezhoud" escribió:

Re: [R] Regex: Combining sub/grepl with ifelse

2015-10-10 Thread Omar André Gonzáles Díaz
Thanks Karim. linio.tv is in the email. In the last part. El oct 11, 2015 12:39 AM, "Karim Mezhoud" escribió: > Hi, > omit unlist and test. otherwise you can use apply function. > > draft: > > df1 <- apply(linio.tv, 1, function(x) strsplit(x[,idproductio], > "[^A-Z0-9-]+")) > > fct <- function(l

Re: [R] Regex: Combining sub/grepl with ifelse

2015-10-10 Thread Karim Mezhoud
Hi, omit unlist and test. otherwise you can use apply function. draft: df1 <- apply(linio.tv, 1, function(x) strsplit(x[,idproductio], "[^A-Z0-9-]+")) fct <- function(linio.tv){ if(any(grep("[A-Z][0-9]", linio.tv[,idx_productio]))) { linio.tv[,idx(id)] <- linio.tv[,idx

Re: [R] Regex: Combining sub/grepl with ifelse

2015-10-10 Thread Omar André Gonzáles Díaz
Hi Boris, I've modified a little the for loop to catch the IDs (if there is any) otherwise to put NAs. This is for another data set. for (i in 1:nrow(linio.tv)) { v <- unlist(strsplit(linio.tv$producto[i], "[^A-Z0-9-]+")) # isolate tokens if(any(grep("[A-Z][0-9]", v))) {
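A vectorised sketch of Boris's tokenise-then-test idea from the loop above; `extract_id` and the sample strings are hypothetical. Each string is split on runs of anything other than capital letters, digits and hyphens, and the first token containing a capital letter directly followed by a digit is kept (NA if none). IDs shaped like "PG-9021" would not pass that test, so the token rule would need widening for them.

```r
# Hypothetical helper: first token that looks like an ID, else NA
extract_id <- function(producto) {
  tokens <- strsplit(producto, "[^A-Z0-9-]+")   # isolate tokens
  vapply(tokens, function(tok) {
    hit <- grep("[A-Z][0-9]", tok, value = TRUE)  # letter then digit
    if (length(hit) > 0) hit[1] else NA_character_
  }, character(1))
}

ids <- extract_id(c("TV LED 49\" 49MU6300 SMART",
                    "SMART TV LED 32\" LE32S5970",
                    "Cable HDMI"))
ids  # "49MU6300" "LE32S5970" NA
```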

Re: [R] Regex: Combining sub/grepl with ifelse

2015-10-10 Thread Omar André Gonzáles Díaz
Thank you very much to both of you. This information is very enlightening to me. Cheers. 2015-10-10 1:11 GMT-05:00 Boris Steipe : > David answered most of this. Just a two short notes inline. > > > > > On Oct 10, 2015, at 12:38 AM, Omar André Gonzáles Díaz < > oma.gonza...@gmail.com> wrote: > >

Re: [R] Regex: Combining sub/grepl with ifelse

2015-10-09 Thread Boris Steipe
David answered most of this. Just two short notes inline. On Oct 10, 2015, at 12:38 AM, Omar André Gonzáles Díaz wrote: > David, Boris, so thankfull for your help. Both approaches are very good. I > got this solve with David's help. > > I find very insteresting Bori's for loop. And I ne

Re: [R] Regex: Combining sub/grepl with ifelse

2015-10-09 Thread David Winsemius
On Oct 9, 2015, at 9:38 PM, Omar André Gonzáles Díaz wrote: > David, Boris, so thankfull for your help. Both approaches are very good. I > got this solve with David's help. > > I find very insteresting Bori's for loop. And I need a little help > understanding the regex part on it. > > - The

Re: [R] Regex: Combining sub/grepl with ifelse

2015-10-09 Thread Omar André Gonzáles Díaz
David, Boris, so thankful for your help. Both approaches are very good. I got this solved with David's help. I find Boris's for loop very interesting, and I need a little help understanding the regex part of it. - The strsplit function: strsplit(ripley.tv$producto[i], "[^A-Z0-9-]+") I understand

Re: [R] Regex: Combining sub/grepl with ifelse

2015-10-09 Thread David Winsemius
On Oct 9, 2015, at 4:21 PM, Boris Steipe wrote: > I think you are going into the wrong direction here and this is a classical > example of what we mean by "technical debt" of code. Rather than tell to your > regular expression what you are looking for, you are handling special cases > with red

Re: [R] Regex: Combining sub/grepl with ifelse

2015-10-09 Thread Boris Steipe
I think you are going into the wrong direction here and this is a classical example of what we mean by "technical debt" of code. Rather than tell to your regular expression what you are looking for, you are handling special cases with redundant code. This is ugly, brittle and impossible to maint

Re: [R] Regex: Combining sub/grepl with ifelse

2015-10-09 Thread David Winsemius
On Oct 9, 2015, at 2:48 PM, Omar André Gonzáles Díaz wrote: > Thank you, David. You put me in the right direction. > > At the end, I've used a lot of lines, to my taste, for this task. > > Is there a more elegant way, of doing this? There are conditional capture-classes in regex in addition t

Re: [R] Regex: Combining sub/grepl with ifelse

2015-10-09 Thread Omar André Gonzáles Díaz
Thank you, David. You put me in the right direction. At the end, I've used a lot of lines, to my taste, for this task. Is there a more elegant way, of doing this? ripley.tv$id <- sub("(.*)( [0-9]{2}[a-z]{1}[0-9]{4})(.*)", "\\2", ripley.tv$producto, ignore

Re: [R] Regex: Combining sub/grepl with ifelse

2015-10-09 Thread David Winsemius
On Oct 9, 2015, at 1:50 PM, Omar André Gonzáles Díaz wrote: > David, > > this is a working case. I know that all cases for ID are not covered with my > current code. > > The question is: > > ID stars as NAs. > > 1.- How to extract 1 type of ID, and keep the rest of entries as they are. >

Re: [R] Regex: Combining sub/grepl with ifelse

2015-10-09 Thread Omar André Gonzáles Díaz
David, this is a working case. I know that all cases for ID are not covered with my current code. The question is: ID starts as NAs. 1.- How to extract 1 type of ID, and keep the rest of entries as they are. 2.- Then keep the first extraction, and search for second type of ID. 3.- And so on wit

Re: [R] Regex: Combining sub/grepl with ifelse

2015-10-09 Thread David Winsemius
On Oct 9, 2015, at 12:59 PM, Omar André Gonzáles Díaz wrote: > I need to extract an ID from the product column of my df. > > I was able to extract the ids for some scenearios, but when applying my > code for the next type of ids (there are some other combinations), the > results of my first line

Re: [R] regex - extracting 2 numbers and " from strings

2015-10-09 Thread Omar André Gonzáles Díaz
Yes, you are right. Thank you. 2015-10-08 20:07 GMT-05:00 David Winsemius : > > On Oct 8, 2015, at 4:50 PM, Omar André Gonzáles Díaz wrote: > > > David, it does work but not in all cases: > > It should work if you change the "+" to "*" in the last capture class. It > makes trailing non-digit cha

Re: [R] regex - extracting 2 numbers and " from strings

2015-10-08 Thread David Winsemius
On Oct 8, 2015, at 4:50 PM, Omar André Gonzáles Díaz wrote: > David, it does work but not in all cases: It should work if you change the "+" to "*" in the last capture class. It makes trailing non-digit characters entirely optional. > sub("(^.+ )(\\d+)([\"]|[']{2})(.*$)", "\\2\\3", b) [1] "4
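David's pattern with his suggested `*` in the last group, run on the two strings from the question. `\\d` is accepted by R's default (TRE) engine as an extension; strings that do not match are returned unchanged by sub().

```r
a <- c("SMART TV LCD FHD 70\" LC70LE660", "LED FULL HD 58'' LE58D3140")

# Group 2 = the digits, group 3 = either " or '' ; the trailing
# group uses * so nothing is required after the inch mark
b <- sub("(^.+ )(\\d+)([\"]|[']{2})(.*$)", "\\2\\3", a)
b  # "70\"" "58''"
```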

Re: [R] regex - extracting 2 numbers and " from strings

2015-10-08 Thread David Winsemius
On Oct 8, 2015, at 3:45 PM, Omar André Gonzáles Díaz wrote: > Hi I have a vector of 100 elements like these: > > a <- c("SMART TV LCD FHD 70\" LC70LE660", "LED FULL HD 58'' LE58D3140") > > I want to put just the (70\") and (58'') in a vector b. > sub("(^.+ )(\\d+)([\"]|[']{2})(.+$)", "\\2

Re: [R] regex - extracting 2 numbers and " from strings

2015-10-08 Thread David Wolfskill
On Thu, Oct 08, 2015 at 05:45:13PM -0500, Omar André Gonzáles Díaz wrote: > Hi I have a vector of 100 elements like these: > > a <- c("SMART TV LCD FHD 70\" LC70LE660", "LED FULL HD 58'' LE58D3140") > > I want to put just the (70\") and (58'') in a vector b. > > This is my try, but is not w

Re: [R] regex sub with specified number of characters

2015-10-06 Thread David Winsemius
On Oct 6, 2015, at 7:38 AM, Johannes Radinger wrote: > Hi > > I'd like to remove a leading "3" if my number is 7 digits long, if it is > only 6 I don't want to anything. > I think this should be possible with a 1-liner using sub() but I am not > sure how to define the number of characters follow

Re: [R] regex sub with specified number of characters

2015-10-06 Thread Marc Schwartz
> On Oct 6, 2015, at 9:38 AM, Johannes Radinger > wrote: > > Hi > > I'd like to remove a leading "3" if my number is 7 digits long, if it is > only 6 I don't want to anything. > I think this should be possible with a 1-liner using sub() but I am not > sure how to define the number of character

Re: [R] regex sub with specified number of characters

2015-10-06 Thread Ivan Calandra
Hi Johannes, Not sure if this can be done with sub() only, but combining it with ifelse() apparently does what you want: ifelse(nchar(a)==7, sub("^3","",a), a) HTH, Ivan -- Ivan Calandra, PhD University of Reims Champagne-Ardenne GEGENAA - EA 3795 CREA - 2 esplanade Roland Garros 51100 Reims,
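A sketch of the single sub() the question asks for: anchoring both ends and requiring exactly six digits after the 3 means only seven-character values are touched. It assumes the values are stored as character, and the sample values are invented; on these inputs it agrees with Ivan's ifelse() version.

```r
a <- c("3123456", "312345", "1234567")

# Drop a leading 3 only when exactly six digits follow it
out <- sub("^3([0-9]{6})$", "\\1", a)
out  # "123456" "312345" "1234567"

# Ivan's ifelse() version for comparison
ifelse(nchar(a) == 7, sub("^3", "", a), a)
```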

Re: [R] regex find anything which is not a number

2015-03-12 Thread Adrian Dușa
On Thu, Mar 12, 2015 at 9:52 PM, John McKown wrote: > [...] > One problem is that Adrian wanted, for some reason, to exclude numbers > such as "2." but accept "2.0" . That is, no unnecessary trailing > decimal point. as.numeric() will not fail on "2." since that is a > number. The example grep() s

Re: [R] regex find anything which is not a number

2015-03-12 Thread John McKown
On Thu, Mar 12, 2015 at 2:43 PM, Steve Taylor wrote: > How about letting a standard function decide which are numbers: > > which(!is.na(suppressWarnings(as.numeric(myvector)))) > > Also works with numbers in scientific notation and (presumably) different > decimal characters, e.g. comma if that's

Re: [R] regex find anything which is not a number

2015-03-12 Thread Steve Taylor
How about letting a standard function decide which are numbers: which(!is.na(suppressWarnings(as.numeric(myvector)))) Also works with numbers in scientific notation and (presumably) different decimal characters, e.g. comma if that's what the locale uses.

Re: [R] regex find anything which is not a number

2015-03-11 Thread Adrian Dușa
Perfect, perfect, perfect. Thanks very much, John. Adrian On Wed, Mar 11, 2015 at 10:00 PM, John McKown wrote: > See if the following will work for you: > > grep('^-?[0-9]+([.]?[0-9]+)?$',myvector,perl=TRUE,invert=TRUE) > >> myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.") >> grep('^-?[0-9]+(

Re: [R] regex find anything which is not a number

2015-03-11 Thread John McKown
See if the following will work for you: grep('^-?[0-9]+([.]?[0-9]+)?$',myvector,perl=TRUE,invert=TRUE) > myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.") > grep('^-?[0-9]+([.][0-9]+)?$',myvector,perl=TRUE,invert=TRUE) [1] 1 2 5 6 > The key is to match a number, and then invert the TRUE / FAL
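Putting the two suggestions side by side on Adrian's vector shows the difference John points out: the regex rejects "2." while as.numeric() accepts it as a number.

```r
myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")

# Positions that are NOT an optionally signed integer or decimal
by_regex <- grep("^-?[0-9]+([.][0-9]+)?$", myvector, invert = TRUE)
by_regex    # 1 2 5 6

# Positions that as.numeric() cannot convert
by_numeric <- which(is.na(suppressWarnings(as.numeric(myvector))))
by_numeric  # 1 2 5   ("2." converts to 2)
```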

Re: [R] regex pattern assistance

2014-08-15 Thread arun
Hi Tom, You could try: library(stringr) str_extract(x, perl("(?<=[A-Za-z]{4}/).*(?=/[0-9])")) #[1] "S01-012" A.K. On Friday, August 15, 2014 12:20 PM, Tom Wright wrote: Hi, Can anyone please assist. given the string > x<-"/mnt/AO/AO Data/S01-012/120824/" I would like to extract "S01-012"
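The same extraction in base R, without extra packages. The `perl()` wrapper in this answer comes from older stringr releases (later superseded by `regex()`); base R's regexpr() with `perl = TRUE` handles the same look-behind/look-ahead idea, here with `[^/]+` in place of `.*`. The sub() line follows the anchored approach in the other replies.

```r
x <- "/mnt/AO/AO Data/S01-012/120824/"

# Anchored capture of the directory right after "AO Data/"
sub("^/mnt/AO/AO Data/([^/]+)/.*$", "\\1", x)          # "S01-012"

# Look-around version: four letters and "/" before, "/digit" after
m <- regexpr("(?<=[A-Za-z]{4}/)[^/]+(?=/[0-9])", x, perl = TRUE)
regmatches(x, m)                                       # "S01-012"
```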

Re: [R] regex pattern assistance

2014-08-15 Thread Marc Schwartz
On Aug 15, 2014, at 11:56 AM, Tom Wright wrote: > WOW!!! > > What can I say 4 answers in less than 4 minutes. Thank you everyone. If > I can't make it work now I don't deserve to. > > btw. the strsplit approach wouldn't work for me as: > a) I wanted to play with regex and > b) the location i

Re: [R] regex pattern assistance

2014-08-15 Thread Jeff Newmiller
Must be another lucky streak. :-)

Re: [R] regex pattern assistance

2014-08-15 Thread Rui Barradas
Hello, I don't believe you need an extra package for that. Try sub("\\/mnt\\/AO\\/AO Data\\/([-[:alnum:]]*)\\/.+", "\\1", x) or, with package stringr, str_match(x,"\\/mnt\\/AO\\/AO Data\\/(.+)\\/.+") Hope this helps, Rui Barradas Em 15-08-2014 17:18, Tom Wright escreveu: Hi, Can anyone p

Re: [R] regex pattern assistance

2014-08-15 Thread Tom Wright
WOW!!! What can I say 4 answers in less than 4 minutes. Thank you everyone. If I can't make it work now I don't deserve to. btw. the strsplit approach wouldn't work for me as: a) I wanted to play with regex and b) the location isn't consistent. Nice to see email support still works, not everyt

Re: [R] regex pattern assistance

2014-08-15 Thread Marc Schwartz
On Aug 15, 2014, at 11:18 AM, Tom Wright wrote: > Hi, > Can anyone please assist. > > given the string > >> x<-"/mnt/AO/AO Data/S01-012/120824/" > > I would like to extract "S01-012" > > require(stringr) >> str_match(x,"\\/mnt\\/AO\\/AO Data\\/(.+)\\/+") >> str_match(x,"\\/mnt\\/AO\\/AO Data

Re: [R] regex pattern assistance

2014-08-15 Thread S Ellison
> -Original Message- > > x<-"/mnt/AO/AO Data/S01-012/120824/" > > I would like to extract "S01-012" > gsub("/mnt/AO/AO Data/(.+)/.+", "\\1", x) #does it, as does > gsub("/mnt/AO/AO Data/([\\w-]+)/.+", "\\1", x, perl=TRUE)# \w is perl RE; > the default is POSIX, which would be. >

Re: [R] Regex - subsetting parts of a file name.

2014-07-31 Thread arnaud gaboury
> > R> as.vector(sapply(my.cache.list, function(x)strsplit(x, "\\.")[[1]][2])) > [1] "subject_test" "subject_train" "y_test" "y_train" > > > R> gsub("df\\.(.*)\\.RData", "\\1", my.cache.list) > [1] "subject_test" "subject_train" "y_test" "y_train" > > > Note that "." will match any

Re: [R] Regex - subsetting parts of a file name.

2014-07-31 Thread Sarah Goslee
Hi, Here are two possibilities: R> as.vector(sapply(my.cache.list, function(x)strsplit(x, "\\.")[[1]][2])) [1] "subject_test" "subject_train" "y_test" "y_train" R> gsub("df\\.(.*)\\.RData", "\\1", my.cache.list) [1] "subject_test" "subject_train" "y_test" "y_train" Note that "

Re: [R] Regex - subsetting parts of a file name.

2014-07-31 Thread arun
Try: gsub(".*\\.(.*)\\..*","\\1", my.cache.list) [1] "subject_test"  "subject_train" "y_test"    "y_train" #or library(stringr) str_extract(my.cache.list, perl('(?<=\\.).*(?=\\.)')) [1] "subject_test"  "subject_train" "y_test"    "y_train"  A.K. On Thursday, July 31, 2014 11:05 AM,

Re: [R] Regex - subsetting parts of a file name.

2014-07-31 Thread S Ellison
> I want to keep only the part inside the two points. After lots of headache > using grep() when trying something like this: > > grep('.(.*?).','df.subject_test.RData',value=T) > > > Does anyone have any suggestion ? gsub("df\\.(.+)\\.RData", "\\1", 'df.subject_test.RData') Steve E ***
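Two base R sketches for the file-name question, assuming `my.cache.list` holds the four names implied by the outputs above. The original grep() attempt returned whole elements because grep(value = TRUE) returns the matching elements, not the matched text, and each unescaped `.` matches any character.

```r
# Assumed contents, reconstructed from the printed results above
my.cache.list <- c("df.subject_test.RData", "df.subject_train.RData",
                   "df.y_test.RData", "df.y_train.RData")

# Capture the part between the literal "df." and ".RData"
core1 <- sub("^df\\.(.*)\\.RData$", "\\1", my.cache.list)

# Or strip the prefix, then let tools:: remove the extension
core2 <- tools::file_path_sans_ext(sub("^df\\.", "", my.cache.list))

core1  # "subject_test" "subject_train" "y_test" "y_train"
```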

Re: [R] Regex with criteria from multiple lines

2014-02-14 Thread Jeff Newmiller
You need to use the JSON library or equivalent to solve this problem. I don't understand why you think that having the data in the clipboard prevents you from doing this since that is just another file (but I usually avoid using the clipboard for reproducible analysis anyway). --

Re: [R] regex challenge

2013-08-17 Thread Frank Harrell
Bill I found a workaround: f <- ff(formula, lab) f <- as.formula(gsub("`", "", as.character(deparse(f Thanks for your elegant solution. Frank -- Thanks Bill. The problem is one of the results of convertName might be 'Heading("Age in Years")*age' (this is fo

Re: [R] regex challenge

2013-08-16 Thread Frank Harrell
re, TIBCO Software wdunlap tibco.com > -Original Message----- > From: [hidden email] [mailto:[hidden email]] On Behalf > Of Frank Harrell > Sent: Thursday, August 15, 2013 7:47 PM > To: RHELP > Subject: Re: [R] regex challenge > > Bill that is very impresive. The only

Re: [R] regex challenge

2013-08-16 Thread William Dunlap
r[[i]], convertName = convertName) } } else if (is.name(expr)) { expr <- as.name(convertName(expr)) } expr } Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > -Original Message- > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project

Re: [R] regex challenge

2013-08-15 Thread Frank Harrell
"Female") * SBPz) * Heading() * Gz + (AGEz + SBPz) * Heading() * TRIOz ~ Heading() * COUNTRYz * Heading() * SEXz Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > -Original Message----- > From: [hidden email] [mailto:[hidden email]] On Behalf > Of William Dunlap

Re: [R] regex challenge

2013-08-15 Thread William Dunlap
"Female") * SBPz) * Heading() * Gz + (AGEz + SBPz) * Heading() * TRIOz ~ Heading() * COUNTRYz * Heading() * SEXz Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > -Original Message- > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] O

Re: [R] regex challenge

2013-08-15 Thread William Dunlap
-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On > Behalf > Of Frank Harrell > Sent: Thursday, August 15, 2013 4:45 PM > To: RHELP > Subject: Re: [R] regex challenge > > I really appreciate the excellent ideas from Bill Dunlap and Greg Snow. > Both sugg

Re: [R] regex challenge

2013-08-15 Thread Frank Harrell
I really appreciate the excellent ideas from Bill Dunlap and Greg Snow. Both suggestions almost work perfectly. Greg's recognizes expressions such as sex=='female' but not ones such as age > 21, age < 21, a - b > 0, and possibly other legal R expressions. Bill's idea is similar to what Dunca

Re: [R] regex challenge

2013-08-15 Thread arun
- From: Greg Snow <538...@gmail.com> To: Frank Harrell Cc: RHELP Sent: Thursday, August 15, 2013 5:07 PM Subject: Re: [R] regex challenge Here is a first stab: library(gsubfn) test <- "y1 + y2 ~ a*(b + c) + d + f * (h == 3) + (sex == 'male')*i" gsubfn(

Re: [R] regex challenge

2013-08-15 Thread Greg Snow
Here is a first stab: library(gsubfn) test <- "y1 + y2 ~ a*(b + c) + d + f * (h == 3) + (sex == 'male')*i" gsubfn( "([a-zA-Z][a-zA-Z0-9]*)((?=\\s*[-+~)*])|\\s*$)", function(x,...) paste0(toupper(x),'z'), test, perl=TRUE ) On Wed, Aug 14, 2013 at 9:13 PM, Frank Harrell wrote: > I would like t

Re: [R] regex challenge

2013-08-15 Thread William Dunlap
I think substitute() or bquote() will do a better job here than gsub() be they work on the parsed formula rather than on the raw string. The terms() function will interpret the formula-specific operators like "+" and ":" to come up with a list of the 'variables' (or 'terms') in the formula E.g.,
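A sketch in the spirit of Bill Dunlap's parse-tree suggestion: walk the unevaluated expression, rename symbols, and leave function names alone. The `rename_vars` function and its `skip` argument are hypothetical; `skip` leaves comparison sub-expressions such as `(h == 3)` untouched, which is one reading of the "f changed, h not" question in this thread, not a confirmed requirement. Missing arguments (as in `x[, 1]`) are not handled.

```r
# Hypothetical recursive renamer for plain formula-like expressions
rename_vars <- function(expr, convertName,
                        skip = c("==", "!=", "<", ">", "<=", ">=")) {
  if (is.call(expr)) {
    fn <- expr[[1]]
    # Leave comparison calls exactly as written
    if (is.name(fn) && as.character(fn) %in% skip) return(expr)
    # Never rename the function itself; recurse into its arguments
    for (i in seq_along(expr)[-1]) {
      expr[[i]] <- rename_vars(expr[[i]], convertName, skip)
    }
  } else if (is.name(expr)) {
    expr <- as.name(convertName(as.character(expr)))
  }
  expr  # numbers and strings pass through unchanged
}

up  <- function(n) paste0(toupper(n), "z")
f   <- quote(y1 + y2 ~ a*(b + c) + d + f * (h == 3) + (sex == 'male')*i)
res <- rename_vars(f, up)
all.vars(res)
# "Y1z" "Y2z" "Az" "Bz" "Cz" "Dz" "Fz" "h" "sex" "Iz"
```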

Re: [R] regex challenge

2013-08-14 Thread Guanrao Chen
This might be hard. How to tell f is to be changed while h is NOT ... Thanks, Guanrao From: Frank Harrell To: RHELP Sent: Wednesday, August 14, 2013 11:13 PM Subject: [R] regex challenge I

Re: [R] Regex for ^ (the caret symbol)?

2013-01-22 Thread S Ellison
> -Original Message- > > So what is the special behavior of the ^ symbol when not at >> the beginning of the string that occurs when it is not escaped? > > I think it retains its meaning as an assertion that it occurs > at the beginning of the line, and so a pattern like "a^b" > coul

Re: [R] Regex for ^ (the caret symbol)?

2013-01-21 Thread Rui Barradas
Hello, Em 21-01-2013 20:52, Duncan Murdoch escreveu: On 13-01-21 3:20 PM, Jeff Newmiller wrote: Apparently Extended RegExp syntax eliminated the "^-is-an-ordinary-character-except-for-two-uses" meaning that I am familiar with from the Basic RegExp usage, since GNU grep with the -e option also r

Re: [R] Regex for ^ (the caret symbol)?

2013-01-21 Thread Duncan Murdoch
On 13-01-21 3:20 PM, Jeff Newmiller wrote: Apparently Extended RegExp syntax eliminated the "^-is-an-ordinary-character-except-for-two-uses" meaning that I am familiar with from the Basic RegExp usage, since GNU grep with the -e option also refuses to match the carat unless it is escaped. The

Re: [R] Regex for ^ (the caret symbol)?

2013-01-21 Thread Jeff Newmiller
Apparently Extended RegExp syntax eliminated the "^-is-an-ordinary-character-except-for-two-uses" meaning that I am familiar with from the Basic RegExp usage, since GNU grep with the -e option also refuses to match the caret unless it is escaped. The TRE library treats BRE as obsolete, so we on

Re: [R] Regex for ^ (the caret symbol)?

2013-01-21 Thread Duncan Murdoch
On 13-01-21 1:05 PM, Jeff Newmiller wrote: So what is the special behavior of the ^ symbol when not at the beginning of the string that occurs when it is not escaped? I think it retains its meaning as an assertion that it occurs at the beginning of the line, and so a pattern like "a^b" could
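A small check of that point, as a sketch on a made-up string x. In both the default (TRE, extended) engine and PCRE, an unescaped ^ after other characters is still the start-of-string assertion, so "a^b" can never match:

```r
x <- "a^b"

# Unescaped: ^ asserts "start of string", which cannot hold after "a".
grepl("a^b", x)               # FALSE (default TRE engine)
grepl("a^b", x, perl = TRUE)  # FALSE (PCRE)

# Escaped: ^ is an ordinary character.
grepl("a\\^b", x)             # TRUE
```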

Re: [R] Regex for ^ (the caret symbol)?

2013-01-21 Thread David Winsemius
On Jan 21, 2013, at 10:05 AM, Jeff Newmiller wrote: So what is the special behavior of the ^ symbol when not at the beginning of the string that occurs when it is not escaped? Isn't there a distinction between what _is_ "special" and what should be "special". You are saying that "^" after

Re: [R] Regex for ^ (the caret symbol)?

2013-01-21 Thread Jeff Newmiller
So what is the special behavior of the ^ symbol when not at the beginning of the string that occurs when it is not escaped?

Re: [R] Regex for ^ (the caret symbol)?

2013-01-21 Thread Duncan Murdoch
On 13-01-21 11:48 AM, Jeff Newmiller wrote: I am not sure I understand what worked perfectly, since it is my understanding that ^ is only special at the beginning of the regex (to anchor the pattern at the beginning of the target string) or as the first character of a character set (to indicat

Re: [R] Regex for ^ (the caret symbol)?

2013-01-21 Thread Rich Shepard
On Mon, 21 Jan 2013, mtb...@gmail.com wrote: I am trying to search for string that includes the caret symbol, using the following code: grepl("latitude^2",temp) Many regex implementations require us to escape a metacharacter such as '^' by preceding it with a backslash. This indicates the n

Re: [R] Regex for ^ (the caret symbol)?

2013-01-21 Thread Jeff Newmiller
I am not sure I understand what worked perfectly, since it is my understanding that ^ is only special at the beginning of the regex (to anchor the pattern at the beginning of the target string) or as the first character of a character set (to indicate exclusion of the listed characters). In any

Re: [R] Regex for ^ (the caret symbol)?

2013-01-21 Thread mtb954
Hi Tsjerk, many thanks...that worked perfectly! Mark Na On Mon, Jan 21, 2013 at 9:36 AM, Tsjerk Wassenaar wrote: > Oh, I'm jetlagged. ^ is a control character for 'start of string'. In the > context of a character set it means negation: [^a-z]. > > Ciao, > > Tsjerk > > > On Mon, Jan 21, 2013

Re: [R] Regex for ^ (the caret symbol)?

2013-01-21 Thread Tsjerk Wassenaar
Oh, I'm jetlagged. ^ is a control character for 'start of string'. In the context of a character set it means negation: [^a-z]. Ciao, Tsjerk On Mon, Jan 21, 2013 at 4:33 PM, Tsjerk Wassenaar wrote: > Hi Mark Na, > > Try: > > grepl("latitude\\^2",temp) > > ^ is a control character for negation,
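The two roles of ^ described above can be seen side by side in this short sketch (the input strings are made up for illustration):

```r
terms_chr <- c("latitude^2", "x_lat")

# At the start of a pattern, ^ anchors the match to the start of the string.
grepl("^lat", terms_chr)          # TRUE FALSE

# As the first character inside [...], ^ negates the set: every character
# that is not a lower-case letter is removed here.
gsub("[^a-z]", "", terms_chr)     # "latitude" "xlat"
```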

Re: [R] Regex for ^ (the caret symbol)?

2013-01-21 Thread Tsjerk Wassenaar
Hi Mark Na, Try: grepl("latitude\\^2",temp) ^ is a control character for negation, so you have to escape it. Cheers, Tsjerk On Mon, Jan 21, 2013 at 4:26 PM, wrote: > Hello R-helpers, > > I am trying to search for string that includes the caret symbol, using the > following code: > > grepl(
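A sketch comparing the original call, the escaped version, and fixed = TRUE, which skips regular-expression parsing altogether. The temp vector below is a hypothetical stand-in for the poster's data:

```r
# Hypothetical stand-in for the poster's 'temp' vector.
temp <- c("latitude^2", "latitude", "longitude^2")

grepl("latitude^2", temp)                # FALSE FALSE FALSE: ^ acts as an anchor
grepl("latitude\\^2", temp)              # TRUE FALSE FALSE: escaped caret
grepl("latitude^2", temp, fixed = TRUE)  # TRUE FALSE FALSE: literal match
```

fixed = TRUE is convenient when the search string contains several metacharacters and no regex features are needed.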

Re: [R] Regex Question: return digits after particular letters

2011-06-02 Thread David Winsemius
On Jun 2, 2011, at 4:21 PM, Ben Ganzfried wrote: > Thank you very much for your help. It saved me a lot of time and it > worked perfectly. I have a quick follow-up as I'm not sure I > understand yet why the code works and where it comes from. > > For example, in: Tstg <- sub(".*T(\\d)N.", "
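The capture-group idea behind that sub() call can be sketched as follows. The staging strings are hypothetical and the pattern is a simplified variant, not the exact code from the thread:

```r
# Hypothetical staging strings, for illustration only.
stage <- c("T2N0M0", "pT3N1M0")

# ".*" is greedy, so the match uses the last "T" that is followed by a
# digit; (\\d) captures that digit and "\\1" replaces the whole string
# with just the captured digit.
t_digit <- sub(".*T(\\d).*", "\\1", stage)
t_digit

# Alternative: a PCRE lookbehind returns only the digit after the first "T".
t_digit2 <- regmatches(stage, regexpr("(?<=T)\\d", stage, perl = TRUE))
t_digit2
```

Both give "2" and "3" here. One difference: if a string contains no match, sub() returns it unchanged, whereas regmatches() silently drops it.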
