Re: [R] Regular expressions and 2 dots

2019-06-28 Thread Rui Barradas
Hello, Please always cc the list. To learn more about the regular expressions used by R, read help("regex"). The one I used is not very complicated. \\. matches a dot; it is a meta-character, so it needs to be escaped. {2,} means repeated at least 2 times, with no upper limit. .* a
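The pieces of that pattern can be tried one at a time. This is only a small sketch using the column names from the original question; the variable names has_run, trimmed and single are made up for illustration.

```r
# Column names taken from the original question in this thread.
s <- c("colone..xx.", "coltwo.ft..rr.", "colthree.gh..az.", "colfour.DG..lm.")

# "\\.{2,}" : a literal dot repeated two or more times.
has_run <- grepl("\\.{2,}", s)

# "\\.{2,}.*$" : the run of dots plus everything up to the end of the string.
trimmed <- sub("\\.{2,}.*$", "", s)

# For comparison, "\\..*$" cuts at the FIRST single dot, which is too much here.
single <- sub("\\..*$", "", s)

print(trimmed)
print(single)
```

The comparison with the single-dot pattern shows why the {2,} quantifier matters: without it, "coltwo.ft" would be cut back to "coltwo".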

Re: [R] Regular expressions and 2 dots

2019-06-28 Thread Rui Barradas
Hello, Try s <- c( "colone..xx.","coltwo.ft..rr.","colthree.gh..az.","colfour.DG..lm.") sub("\\.{2,}.*$", "", s) #[1] "colone" "coltwo.ft" "colthree.gh" "colfour.DG" At 09:00 on 28/06/19, lionel sicot via R-help wrote: c( "colone..xx.","coltwo.ft..rr.","colthree.gh..az.","colfour.DG

[R] Regular expressions and 2 dots

2019-06-28 Thread lionel sicot via R-help
Hello, I have files from a piece of equipment with column names that include dots. I would like to simplify these names, but all my attempts with sub and regular expressions have been unsuccessful. I have c( "colone..xx.","coltwo.ft..rr.","colthree.gh..az.","colfour.DG..lm.") and I would like to have c( "colone","c

Re: [R] Regular expressions, genbank

2014-02-06 Thread arun
HI, May be this helps: lines1 <- readLines(textConnection('text to be ignored... CDS 687..3158 /gene="AXL2" /note="plasma membrane glycoprotein" other text to be ignored... CDS complement(3300..4037)

Re: [R] Regular expressions, genbank

2014-02-06 Thread arun
You could also try: library(gsubfn) strapply(gsub("\\d+<|>\\d+","",vec1),"([0-9]+)",as.numeric,simplify=c) A.K. On Thursday, February 6, 2014 1:55 PM, arun wrote: Hi, One way would be: vec1 <- c("CDS 3300..4037",  "CDS complement(3300..4037)", "CDS 3300

Re: [R] Regular expressions, genbank

2014-02-06 Thread arun
Hi, One way would be: vec1 <- c("CDS 3300..4037",  "CDS complement(3300..4037)", "CDS 3300<..4037", "CDS join(21467..26641,27577..28890)",  "CDS complement(join(30708..31700,31931..31984))",  "CDS 3300<..>4037") library(s

Re: [R] Regular expressions on filenames

2014-01-15 Thread Wojtek Poppe
Try sub("\\.[^.]+$", "", basename(FILELIST)) Thanks, Wojtek On Wed, Jan 15, 2014 at 4:37 PM, Fisher Dennis wrote: > R 3.0.2 > OS X > > Colleagues > > I am writing code to read a large number of files in a particular folder. > In some situations, there may be two versions of the file with di

Re: [R] Regular expressions on filenames

2014-01-15 Thread David Winsemius
On Jan 15, 2014, at 4:37 PM, Fisher Dennis wrote: > R 3.0.2 > OS X > > Colleagues > > I am writing code to read a large number of files in a particular folder. In > some situations, there may be two versions of the file with different > extensions, e.g.: > FILE.csv > FILE.xls > I

Re: [R] Regular expressions on filenames

2014-01-15 Thread Jeff Newmiller
You want to match a period and anything that follows to the end of the string, as long as what follows has no period in it. "\\.[^.]*$"
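A short sketch of the difference between the greedy pattern from the original question and the "no period in what follows" pattern. The file names are the ones mentioned in the thread plus a made-up extensionless "README"; the variable names are illustrative only.

```r
files <- c("FILE.csv", "FILE.XXX.csv", "FILE.YYY.xls", "README")

# "\\..*$" starts at the FIRST dot, so every inner part is lost.
greedy <- sub("\\..*$", "", files)

# "\\.[^.]*$" only matches the LAST dot, because [^.] refuses to cross a dot.
last_only <- sub("\\.[^.]*$", "", files)

# The tools package ships a helper with the same intent.
helper <- tools::file_path_sans_ext(files)

print(greedy)
print(last_only)
print(helper)
```

A name without any dot is returned unchanged by all three calls, since sub() leaves non-matching strings alone.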

Re: [R] Regular expressions on filenames

2014-01-15 Thread arun
Hi, Try:  FILELIST <- list.files() FILELIST #[1] "FILE.csv" "FILE.XXX.csv" "FILE.YYY.xls"   sub("(.*)\\..*$", "\\1", basename(FILELIST)) #[1] "FILE" "FILE.XXX" "FILE.YYY" A.K. On Wednesday, January 15, 2014 7:35 PM, Fisher Dennis wrote: R 3.0.2 OS X Colleagues I am writing code to

Re: [R] Regular expressions on filenames

2014-01-15 Thread jim holtman
try this: > x <- c( "FILE.XXX.csv" + , "FILE.YYY.xls") > sub("\\.[^.]*$", "", x) [1] "FILE.XXX" "FILE.YYY" > the '[^.]*' says to match anything BUT a period. Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to

[R] Regular expressions on filenames

2014-01-15 Thread Fisher Dennis
R 3.0.2 OS X Colleagues I am writing code to read a large number of files in a particular folder. In some situations, there may be two versions of the file with different extensions, e.g.: FILE.csv FILE.xls I extracted the portion before the extension with: sub("\\..*$"

Re: [R] R Regular Expressions - Metacharacters

2013-02-05 Thread David Winsemius
On Feb 5, 2013, at 9:49 AM, Seth Dickey wrote: > I thought that I can use metacharacters such as \w to match word characters > with one backslash. But for some reason, I need to include two backslashes. > >> grepl(pattern='\w', x="what") > Error: '\w' is an unrecognized escape in character stri

Re: [R] R Regular Expressions - Metacharacters

2013-02-05 Thread Duncan Murdoch
On 05/02/2013 12:49 PM, Seth Dickey wrote: I thought that I can use metacharacters such as \w to match word characters with one backslash. But for some reason, I need to include two backslashes. > grepl(pattern='\w', x="what") Error: '\w' is an unrecognized escape in character string starting "

[R] R Regular Expressions - Metacharacters

2013-02-05 Thread Seth Dickey
I thought that I can use metacharacters such as \w to match word characters with one backslash. But for some reason, I need to include two backslashes. > grepl(pattern='\w', x="what") Error: '\w' is an unrecognized escape in character string starting "\w" > grepl(pattern='\\w', x="what") [1] TRU
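The two backslashes belong to two different layers: R's string parser consumes one, and the regex engine sees what is left. A minimal sketch that makes this visible (the variable names are illustrative):

```r
# In R source, "\\w" is a two-character string: a backslash followed by w.
pattern <- "\\w"
n_chars <- nchar(pattern)

# cat() shows the string as the regex engine receives it: \w
cat(pattern, "\n")

# The regex engine then reads \w as "word character".
is_word <- grepl(pattern, "what")

# The same applies to escaping a metacharacter such as the dot.
dots <- grepl("\\.", c("a.b", "ab"))
print(dots)
```

R 4.0.0 and later also accept raw strings such as r"(\w)", which avoid the doubling.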

Re: [R] Regular expressions: stuck again...

2012-08-24 Thread Noia Raindrops
Hello, try this: x <- c("SELECT [public_tblFiche].[Fichenr], [public_tblArtnr].[Artnr]", "SELECT public_tblFiche.Fichenr, public_tblArtnr.Artnr") # > The square brackets [ and ] should be removed x <- gsub("[][]", "", x) # > and xxx_xxx.xxx should become \"xxx\".\"xxx\"\".\"xxx\" x <- gsub("([[:al

[R] Regular expressions: stuck again...

2012-08-23 Thread Bart Joosen
Hi, I'm currently reworking a report, originating from a MS Access database, but should be implemented in R. Now I'm facing the task to convert a lot of queries to postgreSQL. What I want to do is make a function which takes the MS Access query as an argument and returns the pgSQL version. So: SE

Re: [R] Regular Expressions in grep - Solution and function to determine significant figures of a number

2012-08-23 Thread Dr. Holger van Lishaut
On 22.08.2012 at 21:46, Dr. Holger van Lishaut wrote: SignifStellen<-function(x){ strx=as.character(x) nchar(regmatches(strx, regexpr("[1-9][0-9]*\\.[0-9]*[1-9]",strx)))-1 } returns the significant figures of a number. Perhaps this can help someone. Sorry, to work, it must

Re: [R] Regular Expressions in grep - Solution and function to determine significant figures of a number

2012-08-22 Thread Bert Gunter
... On Wed, Aug 22, 2012 at 12:46 PM, Dr. Holger van Lishaut wrote: > Dear all, > > regmatches works. > > And, since this has been asked here before: > > SignifStellen<-function(x){ > strx=as.character(x) > nchar(regmatches(strx, regexpr("[1-9][0-9]*\\.[0-9]*[1-9]",strx)))-1 > } > > retur

Re: [R] Regular Expressions in grep - Solution and function to determine significant figures of a number

2012-08-22 Thread Dr. Holger van Lishaut
Dear all, regmatches works. And, since this has been asked here before: SignifStellen<-function(x){ strx=as.character(x) nchar(regmatches(strx, regexpr("[1-9][0-9]*\\.[0-9]*[1-9]",strx)))-1 } returns the significant figures of a number. Perhaps this can help someone. Thanks & best reg

Re: [R] Regular Expressions in grep

2012-08-21 Thread arun
HI, Try this: gsub("^-\\d(\\d{4}.).*","\\1",a) #[1] "1020." gsub("^.*(.\\d{5}).","\\1",a) #[1] ".90920" A.K. - Original Message - From: Dr. Holger van Lishaut To: "r-help@r-project.org" Cc: Sent: Tuesday, August 2

Re: [R] Regular Expressions in grep

2012-08-21 Thread R. Michael Weylandt
You're misreading the docs: from grep, value: if ‘FALSE’, a vector containing the (‘integer’) indices of the matches determined by ‘grep’ is returned, and if ‘TRUE’, a vector containing the matching elements themselves is returned. Since there's a match somewhere

Re: [R] Regular Expressions in grep

2012-08-21 Thread Noia Raindrops
'grep' does not change strings. Use 'gsub' or 'regmatches': # gsub Front <- gsub("^.*?([1-9][0-9]*\\.).*?$", "\\1", a) End <- gsub("^.*?(\\.[0-9]*[1-9]).*?$", "\\1", a) # regexpr and regmatches (R >= 2.14.0) Front <- regmatches(a, regexpr("[1-9][0-9]*\\.", a)) End <- regmatches(a, regexpr("\\.[0-9
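To see the difference between selecting elements and extracting substrings, here is a small sketch on the string from the original question. The patterns are the ones quoted in the replies; the variable names are illustrative.

```r
a <- "-01020.909200"

# grep(value = TRUE) returns the whole matching ELEMENT, not the matched part.
whole <- grep("[1-9][0-9]*\\.", a, value = TRUE)

# regexpr() locates the first match; regmatches() cuts it out of the string.
front <- regmatches(a, regexpr("[1-9][0-9]*\\.", a))
end   <- regmatches(a, regexpr("\\.[0-9]*[1-9]", a))

print(whole)
print(front)
print(end)
```

In the second pattern, [0-9]* is greedy but has to give back the trailing zeros so that the final [1-9] can match, which is what trims the result.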

Re: [R] Regular Expressions in grep

2012-08-21 Thread Bert Gunter
grep() returns the matches. You want regexpr() and regmatches() -- Bert On Tue, Aug 21, 2012 at 12:24 PM, Dr. Holger van Lishaut wrote: > Dear r-help members, > > I have a number in the form of a string, say: > > a<-"-01020.909200" > > I'd like to extract "1020." as well as ".9092" > > Front<-gr

[R] Regular Expressions in grep

2012-08-21 Thread Dr. Holger van Lishaut
Dear r-help members, I have a number in the form of a string, say: a<-"-01020.909200" I'd like to extract "1020." as well as ".9092" Front<-grep(pattern="[1-9]+[0-9]*\\.", value=TRUE, x=a, fixed=FALSE) End<-grep(pattern="\\.[0-9]*[1-9]+", value=TRUE, x=a, fixed=FALSE) However, both strings gi

Re: [R] Regular Expressions + Matrices

2012-08-10 Thread Fred G
6 Chicago Blacksox 1701 made up > 7 7 Chicago Cubs 1702 made up > 8 8 Chicago Whitesox 1703 made up > > Bill Dunlap > Spotfire, TIBCO Software > wdunlap tibco.com > > > > -Original Message----- > > From: r-help-boun...@r-project.

Re: [R] Regular Expressions + Matrices

2012-08-10 Thread William Dunlap
p-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On > Behalf > Of Rui Barradas > Sent: Friday, August 10, 2012 11:18 AM > To: Fred G > Cc: r-help > Subject: Re: [R] Regular Expressions + Matrices > > Hello, > > Try the following. > > > d

Re: [R] Regular Expressions + Matrices

2012-08-10 Thread Fred G
x,1918,ESPN >>> 4,Washington Nationals,2010,ESPN >>> 5,Detroit Tigers,1990,ESPN >>> ",sep=",",header=TRUE,stringsAsFactors=FALSE) >>> >>> index<-grep("New York.*",dat1$NAME) >>> dat1[i

Re: [R] Regular Expressions + Matrices

2012-08-10 Thread Rui Barradas
ESPN 4,Washington Nationals,2010,ESPN 5,Detroit Tigers,1990,ESPN ",sep=",",header=TRUE,stringsAsFactors=FALSE) index<-grep("New York.*",dat1$NAME) dat1[index,] # ID NAME YEAR SOURCE #1 1New York Mets 1900ESPN #2

Re: [R] Regular Expressions + Matrices

2012-08-10 Thread arun
help@r-project.org Cc: Sent: Friday, August 10, 2012 1:41 PM Subject: [R] Regular Expressions + Matrices Hi all, My code looks like the following: inname = read.csv("ID_error_checker.csv", as.is=TRUE) outname = read.csv("output.csv", as.is=TRUE) #My algorithm is the following: #for

Re: [R] Regular Expressions + Matrices

2012-08-10 Thread Rui Barradas
Hello, Try the following. d <- read.table(textConnection(" ID NAME YEAR SOURCE 1 'New York Mets' 1900 ESPN 2 'New York Yankees' 1920 Cooperstown 3 'Boston Redsox' 1918 ESPN 4 'Washington Nationals' 2010

Re: [R] Regular Expressions + Matrices

2012-08-10 Thread Fred G
LSE) > > index<-grep("New York.*",dat1$NAME) > dat1[index,] > # ID NAME YEAR SOURCE > #1 1New York Mets 1900ESPN > #2 2 New York Yankees 1920 Cooperstown > > A.K. > > > > - Original Message - > From: Fred G

[R] Regular Expressions + Matrices

2012-08-10 Thread Fred G
Hi all, My code looks like the following: inname = read.csv("ID_error_checker.csv", as.is=TRUE) outname = read.csv("output.csv", as.is=TRUE) #My algorithm is the following: #for line in inname #if first string up to whitespace in row in inname$name = first string up to whitespace in row + 1 in in

Re: [R] regular expressions in R

2011-12-21 Thread jim holtman
To be correct for the regular expression, it should be: dir(pattern = "\\.(txt|doc)$") The form dir(pattern="*.txt") will match 'txt' appearing anywhere in the name; this looks like the argument you would have used to "Sys.glob" which is a UNIX style file name match and not a regular expression
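The distinction between a regular expression and a shell-style glob can be checked without touching the file system. This sketch applies the patterns to a made-up vector of file names (all names below are hypothetical) instead of calling dir().

```r
files <- c("a.txt", "b.doc", "c.docx", "txt_notes.csv", "atxt")

# Anchored regex: a literal dot, then txt or doc, at the very end of the name.
regex_hits <- grepl("\\.(txt|doc)$", files)

# An unescaped dot matches ANY character, so "atxt" slips through.
loose_hits <- grepl(".txt", files)

# glob2rx() from utils translates a glob into an equivalent anchored regex.
rx <- utils::glob2rx("*.txt")
glob_hits <- grepl(rx, files)

print(files[regex_hits])
print(files[loose_hits])
print(files[glob_hits])
```

The same anchored pattern can be passed as the pattern argument of dir() or list.files().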

Re: [R] regular expressions in R

2011-12-21 Thread R. Michael Weylandt
Do you wish to include .docx files as well or just .doc? Michael On Wed, Dec 21, 2011 at 10:04 AM, Alaios wrote: > Dear all > I would like to ask from dir function in R (?dir) > to give me only the files that end with .txt or .doc. > > The dir functions supports the use of patterns (is not that

Re: [R] regular expressions in R

2011-12-21 Thread Sarah Goslee
From the help for dir: File naming conventions are platform dependent. The pattern matching works with the case of file names as returned by the OS On my linux system, this works: > dir(pattern="*.txt") [1] "a.txt" "b.txt" > > dir(pattern="*.doc") [1] "c.doc" > > dir(pattern="*.doc|*

[R] regular expressions in R

2011-12-21 Thread Alaios
Dear all I would like to ask from dir function in R (?dir) to give me only the files that end with .txt or .doc. The dir functions supports the use of patterns (is not that regular expressions) for doing that.   print(dir(i,full.names=TRUE,pattern=.)) Could you please help me compose such a

Re: [R] Regular expressions in R

2011-11-16 Thread Michael Griffiths
Thanks to everyone who contributed to my questions. As ever, I am extremely grateful to all those on the R-list who make it what it is. Regards Mike Griffiths On Tue, Nov 15, 2011 at 5:47 PM, Joshua Wiley wrote: > Hi Michael, > > Your strings were long so I made a bit smaller example. Sarah ma

Re: [R] Regular expressions in R

2011-11-15 Thread Joshua Wiley
Hi Michael, Your strings were long so I made a bit smaller example. Sarah made one good point, you want to be using gsub() not sub(), but when I use your code, I do not think it even works precisely for one instance. Try this on for size, you were 99% there: ## simplified cases form1 <- c('produ

Re: [R] Regular expressions in R

2011-11-15 Thread Sarah Goslee
Hi Michael, You need to take another look at the examples you were given, and at the help for ?sub(): The two ‘*sub’ functions differ only in that ‘sub’ replaces only the first occurrence of a ‘pattern’ whereas ‘gsub’ replaces all occurrences. If ‘replacement’ contains backreferen
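The quoted help text can be illustrated on a very short formula-like string. This is a sketch with a made-up input, not the strings from the original post.

```r
form <- "a+b+c"

# sub() replaces only the first occurrence of the pattern.
first_only <- sub("+", " + ", form, fixed = TRUE)

# gsub() replaces every occurrence.
all_of_them <- gsub("+", " + ", form, fixed = TRUE)

# A backreference (\\1) re-inserts whatever the parenthesised group matched.
wrapped <- gsub("([[:alnum:]]+)", "<\\1>", form)

print(first_only)
print(all_of_them)
print(wrapped)
```

fixed = TRUE is used for the first two calls so that "+" is taken literally rather than as a quantifier.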

[R] Regular expressions in R

2011-11-15 Thread Michael Griffiths
Good afternoon list, I have the following character strings; one with spaces between the maths operators and variable names, and one without said spaces. form<-c('~ Sentence + LEGAL + Intro + Intro / Intro1 + Intro * LEGAL + benefit + benefit / benefit1 + product + action * mean + CTA + help + me

Re: [R] Regular Expressions for "Large" Data Set

2011-06-07 Thread Marc Schwartz
On Jun 7, 2011, at 3:55 PM, Abraham Mathew wrote: > I'm running R 2.13 on Ubuntu 10.10 > > I have a data set which is comprised of character strings. > > site = readLines('http://www.census.gov/tiger/tms/gazetteer/zips.txt') > > dat <- c("01, 35004, AL, ACMAR, 86.51557, 33.584132, 6055, 0.00149

[R] Regular Expressions for "Large" Data Set

2011-06-07 Thread Abraham Mathew
I'm running R 2.13 on Ubuntu 10.10 I have a data set which is comprised of character strings. site = readLines('http://www.census.gov/tiger/tms/gazetteer/zips.txt') dat <- c("01, 35004, AL, ACMAR, 86.51557, 33.584132, 6055, 0.001499") dat I want to loop through the data and construct a data fra

Re: [R] Regular Expressions in Column Headings

2011-03-09 Thread Gabor Grothendieck
On Wed, Mar 9, 2011 at 8:52 AM, Matthew DeAngelis wrote: > Hi all, > > I am hoping that someone can help me with a problem I am having with column > headings.  I have read a table into R using read.table: the rows are > documents, and the columns are counts of regular expression matches (so that >

[R] Regular Expressions in Column Headings

2011-03-09 Thread Matthew DeAngelis
Hi all, I am hoping that someone can help me with a problem I am having with column headings. I have read a table into R using read.table: the rows are documents, and the columns are counts of regular expression matches (so that the column heading is the given regular expression). My problem is

Re: [R] Regular Expressions

2010-11-05 Thread Gabor Grothendieck
2010/11/5 Brian Diggs : > Is there a standard, built in way to get both (all) backreferences at the > same time with just one call to sub (or the appropriate function)? I can > cobble something together specifically for 2 backreferences (not extensively > tested): > > both_backrefs <- function(patt

Re: [R] Regular Expressions

2010-11-05 Thread Brian Diggs
On 11/5/2010 12:09 AM, Prof Brian Ripley wrote: On Thu, 4 Nov 2010, Noah Silverman wrote: Hi, I'm trying to figure out how to use capturing parenthesis in regular expressions in R. (Doing this in Perl, Java, etc. is fairly trivial, but I can't seem to find the functionality in R.) For example

Re: [R] Regular Expressions

2010-11-05 Thread Noah Silverman
That's perfect! Don't know how I missed that. I want to start playing with some modeling of financial data and the only format I can download is rather ugly. So my plan is to use a series of Regex to extract what I want. Noticed that you are a Prof. in applied stats. I'm at UCLA working on an

Re: [R] Regular Expressions

2010-11-05 Thread Prof Brian Ripley
On Thu, 4 Nov 2010, Noah Silverman wrote: Hi, I'm trying to figure out how to use capturing parenthesis in regular expressions in R. (Doing this in Perl, Java, etc. is fairly trivial, but I can't seem to find the functionality in R.) For example, given the string:"10 Nov 13.00 (PFE1020

[R] Regular Expressions

2010-11-04 Thread Noah Silverman
Hi, I'm trying to figure out how to use capturing parentheses in regular expressions in R. (Doing this in Perl, Java, etc. is fairly trivial, but I can't seem to find the functionality in R.) For example, given the string: "10 Nov 13.00 (PFE1020K13)" I want to capture the first two digits
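One way to pull out capture groups in R is sketched below, using the example string from the question. The pattern is an assumption about which fields are wanted (the question is cut off), and regexec() requires a reasonably recent R (it was not available when this thread was written).

```r
x <- "10 Nov 13.00 (PFE1020K13)"

# A single group via a backreference in sub(): keep only what \\1 matched.
day <- sub("^([0-9]+) .*$", "\\1", x)

# All groups at once: regexec() records the positions of every group, and
# regmatches() returns the full match followed by each captured piece.
m <- regmatches(x, regexec("^([0-9]+) ([[:alpha:]]+) ([0-9.]+)", x))
parts <- m[[1]]

print(day)
print(parts)
```

The first element of parts is the whole match; the groups start at the second element.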

Re: [R] Regular expressions: offsets of groups

2010-09-30 Thread Titus von der Malsburg
Ok, we decided to have a shot at modifying gregexpr. Let's see how it works out. If anybody is interested in discussing this please contact me. R-help doesn't seem like the right place for further discussion. Is there a default place for discussing things like that? Thanks everybody for your re

Re: [R] Regular expressions: offsets of groups

2010-09-29 Thread Titus von der Malsburg
On Wed, Sep 29, 2010 at 1:58 PM, Michael Bedward wrote: > How is your C coding ? Bill ? Anyone else ?  I could have a go at > writing some prototype code to test in the next few days, though if > someone else with decent C skills is itching to do it please speak up. We have a skilled C- and R-pr

Re: [R] Regular expressions: offsets of groups

2010-09-29 Thread Michael Bedward
I'd definitely be a customer for it Titus. And it does seem like an obvious hole in regex processing in R that cries out to be filled. Um, ggregexpr isn't the sexiest of function names :) Perhaps we can think of something a little easier ? How is your C coding ? Bill ? Anyone else ? I could hav

Re: [R] Regular expressions: offsets of groups

2010-09-29 Thread Titus von der Malsburg
Bill, Michael, good to see I'm not the only one who sees potential for improvements in the regexpr domain. Adding a subpattern argument is certainly a step in the right direction and would make my life much easier. However, in my application I need to know not only the position of one group but a

Re: [R] Regular expressions: offsets of groups

2010-09-28 Thread Michael Bedward
Ah, that's interesting - thanks Bill. That's certainly on the right track for me (Titus, you too ?) especially if the subpattern argument accepted a vector of multiple group indices. As you say, this is straightforward in C. I'd be happy to (try to) make a patch for the R sources if there was some

Re: [R] Regular expressions: offsets of groups

2010-09-28 Thread William Dunlap
> -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On Behalf Of Michael Bedward > Sent: Tuesday, September 28, 2010 12:46 AM > To: Titus von der Malsburg > Cc: r-help@r-project.org > Subject: Re: [R] Regular expressio

Re: [R] Regular expressions: offsets of groups

2010-09-28 Thread Gabor Grothendieck
On Tue, Sep 28, 2010 at 6:52 AM, Titus von der Malsburg wrote: > On Tue, Sep 28, 2010 at 9:46 AM, Michael Bedward > wrote: >> What Titus wants to do is akin to retrieving capturing groups from a >> Matcher object in Java. > > Precisely.  Here's the description: > >  http://download.oracle.com/jav

Re: [R] Regular expressions: offsets of groups

2010-09-28 Thread Titus von der Malsburg
On Tue, Sep 28, 2010 at 9:46 AM, Michael Bedward wrote: > What Titus wants to do is akin to retrieving capturing groups from a > Matcher object in Java. Precisely. Here's the description: http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Matcher.html#start(int) Gabor's lookbe

Re: [R] Regular expressions: offsets of groups

2010-09-28 Thread Michael Bedward
What Titus wants to do is akin to retrieving capturing groups from a Matcher object in Java. I also thought there must be an existing, elegant solution to this some time ago and searched for it, including looking at the sources (albeit with not much expertise) but came up blank. I also looked at t

Re: [R] Regular expressions: offsets of groups

2010-09-27 Thread Gabor Grothendieck
On Mon, Sep 27, 2010 at 1:34 PM, Titus von der Malsburg wrote: > On Mon, Sep 27, 2010 at 7:29 PM, Gabor Grothendieck > wrote: >> Try this zero width negative look behind expression: >> >>> gregexpr("(?!a+)(b+)", "abcdaabbc", perl = TRUE) >> [[1]] >> [1] 2 7 >> attr(,"match.length") >> [1] 1 2 > >

Re: [R] Regular expressions: offsets of groups

2010-09-27 Thread Henrique Dallazuanna
You've tried: gregexpr("b+", "abcdaabbc") On Mon, Sep 27, 2010 at 12:48 PM, Titus von der Malsburg wrote: > Dear list! > > > gregexpr("a+(b+)", "abcdaabbc") > [[1]] > [1] 1 5 > attr(,"match.length") > [1] 2 4 > > What I want is the offsets of the matches for the group (b+), i.e. 2 > and 7, not

Re: [R] Regular expressions: offsets of groups

2010-09-27 Thread Henrique Dallazuanna
You could do this: gregexpr("ab+", "abcdaabbcbb")[[1]] + 1 On Mon, Sep 27, 2010 at 2:25 PM, Titus von der Malsburg wrote: > On Mon, Sep 27, 2010 at 7:16 PM, Henrique Dallazuanna > wrote: > > You've tried: > > > > gregexpr("b+", "abcdaabbc") > > But this would match the third occurrence of b+ in

Re: [R] Regular expressions: offsets of groups

2010-09-27 Thread Titus von der Malsburg
On Mon, Sep 27, 2010 at 7:29 PM, Gabor Grothendieck wrote: > Try this zero width negative look behind expression: > >> gregexpr("(?!a+)(b+)", "abcdaabbc", perl = TRUE) > [[1]] > [1] 2 7 > attr(,"match.length") > [1] 1 2 Thanks Gabor, but this gives me the same result as gregexpr("b+", "abcdaab

Re: [R] Regular expressions: offsets of groups

2010-09-27 Thread Gabor Grothendieck
On Mon, Sep 27, 2010 at 11:48 AM, Titus von der Malsburg wrote: > Dear list! > >> gregexpr("a+(b+)", "abcdaabbc") > [[1]] > [1] 1 5 > attr(,"match.length") > [1] 2 4 > > What I want is the offsets of the matches for the group (b+), i.e. 2 > and 7, not the offsets of the complete matches.  Is there

Re: [R] Regular expressions: offsets of groups

2010-09-27 Thread Titus von der Malsburg
On Mon, Sep 27, 2010 at 7:16 PM, Henrique Dallazuanna wrote: > You've tried: > > gregexpr("b+", "abcdaabbc") But this would match the third occurrence of b+ in "abcdaabbcbb". But in this example I'm only interested in b+ if it's preceded by a+. Titus _

Re: [R] Regular expressions: offsets of groups

2010-09-27 Thread Titus von der Malsburg
Thank you Jim, but just as the solution that I discussed, your proposal involves deconstructing the pattern and searching several times. I'm looking for a general and efficient solution. Internally, the regexpr engine has all necessary information after one pass through the string. What I need i

Re: [R] Regular expressions: offsets of groups

2010-09-27 Thread jim holtman
try this: > x <- gregexpr("a+(b+)", "abcdaabbcaaacaaab") > justA <- gregexpr("a+", "abcdaabbcaaacaaab") > # find matches in 'x' for 'justA' > indx <- which(justA[[1]] %in% x[[1]]) > # now determine where 'b' starts > justA[[1]][indx] + attr(justA[[1]], 'match.length')[indx] [1] 2 7 17 > On M

[R] Regular expressions: offsets of groups

2010-09-27 Thread Titus von der Malsburg
Dear list! > gregexpr("a+(b+)", "abcdaabbc") [[1]] [1] 1 5 attr(,"match.length") [1] 2 4 What I want is the offsets of the matches for the group (b+), i.e. 2 and 7, not the offsets of the complete matches. Is there a way in R to get that? I know about gsubgn and strapply, but they only give me
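In current versions of R, the match objects returned by regexpr() and gregexpr() carry capture-group positions when perl = TRUE is used. The sketch below assumes such a version; it was not an option when this thread was written.

```r
# With perl = TRUE the match object gains "capture.start" and
# "capture.length" attributes, one column per parenthesised group.
m <- gregexpr("a+(b+)", "abcdaabbc", perl = TRUE)[[1]]

# Offsets of the complete matches (what the question already had).
match_start <- as.vector(m)

# Offsets and lengths of the group (b+) inside each match.
group_start  <- as.vector(attr(m, "capture.start"))
group_length <- as.vector(attr(m, "capture.length"))

print(match_start)
print(group_start)
print(group_length)
```

With several groups in the pattern, the attributes are matrices with one row per match and one column per group, so no second pass over the string is needed.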

Re: [R] regular expressions

2009-10-26 Thread baptiste auguie
Perfect, thanks! baptiste 2009/10/26 Gabor Grothendieck : > Assuming only START fields match pat: > >> ## this one has more fields: how do I generalize the regular expression? >> st2 = c("START text1 1 text2 2.3 text3 5", "whatever intermediate text", > + "START text1 23.4 text2 3.1415 text3 6")

Re: [R] regular expressions

2009-10-26 Thread Gabor Grothendieck
Assuming only START fields match pat: > ## this one has more fields: how do I generalize the regular expression? > st2 = c("START text1 1 text2 2.3 text3 5", "whatever intermediate text", + "START text1 23.4 text2 3.1415 text3 6") > > pat <- "[[:alnum:]]+ +([0-9.]+)" > s <- strapply(st2, pat, c, s

[R] regular expressions

2009-10-26 Thread baptiste auguie
Dear list, I have the following text to parse (originating from readLines as some lines have unequal size), st = c("START text1 1 text2 2.3", "whatever intermediate text", "START text1 23.4 text2 3.1415") from which I'd like to extract the lines starting with "START", and group the subsequent fi

Re: [R] Regular expressions: bug or misunderstanding?

2008-07-06 Thread Duncan Murdoch
On 06/07/2008 7:37 PM, Gabor Grothendieck wrote: Look at the discussion of zero width lookahead assertions in ?regex . Use perl = TRUE as previously indicated. Thanks, this seems to work: gsub( "(? On Sun, Jul 6, 2008 at 7:29 PM, Duncan Murdoch <[EMAIL PROTECTED]> wrote: On 06/07/2008 5:37 P

Re: [R] Regular expressions: bug or misunderstanding?

2008-07-06 Thread Gabor Grothendieck
Look at the discussion of zero width lookahead assertions in ?regex . Use perl = TRUE as previously indicated. On Sun, Jul 6, 2008 at 7:29 PM, Duncan Murdoch <[EMAIL PROTECTED]> wrote: > On 06/07/2008 5:37 PM, (Ted Harding) wrote: >> >> On 06-Jul-08 21:17:04, Duncan Murdoch wrote: >>> >>> I'm tryi

Re: [R] Regular expressions: bug or misunderstanding?

2008-07-06 Thread Duncan Murdoch
On 06/07/2008 5:37 PM, (Ted Harding) wrote: On 06-Jul-08 21:17:04, Duncan Murdoch wrote: I'm trying to write a gsub() call that takes a string and escapes all the unescaped quote marks in it. So the string \" would be left unchanged, but \\" would be changed to \\\" because the double ba

Re: [R] Regular expressions: bug or misunderstanding?

2008-07-06 Thread Ted Harding
On 06-Jul-08 21:17:04, Duncan Murdoch wrote: > I'm trying to write a gsub() call that takes a string and escapes all > the unescaped quote marks in it. So the string > > \" > > would be left unchanged, but > > \\" > > would be changed to > > \\\" > > because the double backslash doesn't act

Re: [R] Regular expressions: bug or misunderstanding?

2008-07-06 Thread Gabor Grothendieck
Try adding perl = TRUE On Sun, Jul 6, 2008 at 5:17 PM, Duncan Murdoch <[EMAIL PROTECTED]> wrote: > I'm trying to write a gsub() call that takes a string and escapes all the > unescaped quote marks in it. So the string > > \" > > would be left unchanged, but > > \\" > > would be changed to > > \\\

[R] Regular expressions: bug or misunderstanding?

2008-07-06 Thread Duncan Murdoch
I'm trying to write a gsub() call that takes a string and escapes all the unescaped quote marks in it. So the string \" would be left unchanged, but \\" would be changed to \\\" because the double backslash doesn't act as an escape for the quote, the first just escapes the second. I have
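One possible approach, sketched here under the assumption that a quote counts as escaped exactly when it is preceded by an odd number of backslashes. This is not the solution from the thread (which is cut off above); the pattern and variable names are illustrative.

```r
# Test strings as the regex engine sees them:  \"   \\"   say "hi"
x <- c('\\"', '\\\\"', 'say "hi"')

# (?<!\\)        the run of backslashes must not itself follow a backslash
# ((?:\\\\)*)    zero or more PAIRS of backslashes, captured as group 1
# "              the quote, which is therefore unescaped
pattern <- '(?<!\\\\)((?:\\\\\\\\)*)"'

# Put the pairs back, then add one backslash in front of the quote.
escaped <- gsub(pattern, '\\1\\\\"', x, perl = TRUE)

writeLines(escaped)
```

The lookbehind is why perl = TRUE is required, as suggested in the replies.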

Re: [R] Regular Expressions

2008-05-13 Thread Gabor Grothendieck
On Tue, May 13, 2008 at 5:02 AM, Shubha Vishwanath Karanth <[EMAIL PROTECTED]> wrote: > Suppose, > > S=c("World_is_beautiful", "one_two_three_four","My_book") > > I need to extract the last but one element of the strings. So, my output > should look like: > > Ans=c("is","three","My") > > gsub() ca

Re: [R] Regular Expressions

2008-05-13 Thread Richard . Cotton
> S=c("World_is_beautiful", "one_two_three_four","My_book") > I need to extract the last but one element of the strings. So, my > output should look like: > Ans=c("is","three","My") > gsub() can do this...but wondering how do I give the regular expression sapply(strsplit(S, "_"), functio
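Both routes can be sketched on the vector from the question: splitting on the underscore and indexing, or a single sub() call. The anonymous function and the regex below are illustrative completions, since the reply is cut off.

```r
S <- c("World_is_beautiful", "one_two_three_four", "My_book")

# Split on "_" and take the element before the last one.
by_split <- sapply(strsplit(S, "_"), function(p) p[length(p) - 1])

# Regex route: an optional prefix ending in "_", then the wanted piece
# (group 2), then "_" and a final piece that contains no underscore.
by_regex <- sub("^(.*_)?([^_]+)_[^_]+$", "\\2", S, perl = TRUE)

print(by_split)
print(by_regex)
```

The split version fails on strings with no underscore (the index becomes 0), while the regex version returns such strings unchanged.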

Re: [R] Regular Expressions

2008-05-13 Thread Dimitris Rizopoulos
hubha Vishwanath Karanth" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Tuesday, May 13, 2008 11:02 AM Subject: [R] Regular Expressions Hi R, Again struck with regular expressions... Suppose, S=c("World_is_beautiful", "one_two_three_four","My_

[R] Regular Expressions

2008-05-13 Thread Shubha Vishwanath Karanth
Hi R, Again stuck with regular expressions... Suppose, S=c("World_is_beautiful", "one_two_three_four","My_book") I need to extract the last but one element of the strings. So, my output should look like: Ans=c("is","three","My") gsub() can do this...but wondering how do I giv

Re: [R] Regular Expressions Help

2008-04-19 Thread Hans-Jörg Bibiko
On 19.04.2008, at 06:46, maud wrote: > I am having some trouble learning regular expressions. Let me describe > the general problem I am dealing with. Consider the following setup: > > Joe<- c(1,2,3) > Bob<- c(2,4,6) > Alice <- c(9,8,7) > > Matrix <- cbind(Joe, Bob, Alice) > St <- c("Bob", "Alice"

[R] Regular Expressions Help

2008-04-19 Thread maud
I am having some trouble learning regular expressions. Let me describe the general problem I am dealing with. Consider the following setup: Joe<- c(1,2,3) Bob<- c(2,4,6) Alice <- c(9,8,7) Matrix <- cbind(Joe, Bob, Alice) St <- c("Bob", "Alice", "Alice:Bob") Now I want to make a new matrix having

Re: [R] regular expressions

2008-03-12 Thread Christos Hatzis
David > Sent: Wednesday, March 12, 2008 12:15 PM > To: [EMAIL PROTECTED] > Subject: [R] regular expressions > > Hello all, > > Still fighting with regular expressions and such, I am again stuck: > > Suppose I have a vector of character chains. In this vector, >

[R] regular expressions

2008-03-12 Thread GOUACHE David
Hello all, Still fighting with regular expressions and such, I am again stuck: Suppose I have a vector of character chains. In this vector, I wish to identify which character chains start with a given pattern, and then replace everything that comes after said pattern. Here is a quick exampl