Dear Bert,
Thank you for the suggestion. Indeed, there are various solutions and
workarounds. However, there is still a bug in strsplit.
2.) gsub
I would try to avoid gsub on a Wikipedia-sized corpus: using strsplit
directly should be far more efficient.
3.) Punctuation marks
Abbreviations and
Primarily for my own amusement, here is a way to do what I think you wanted
without look-aheads/behinds
strsplit(gsub("([[:punct:]])"," \\1 ","a bc,def, adef,x; ,,gh"), " +")
[[1]]
 [1] "a"    "bc"   ","    "def"  ","    "adef" ","    "x"    ";"
[10] ","    ","    "gh"
I certainly would *not* cla
Dear Avi,
Punctuation marks are used in various NLP language models. Preserving
the "," is therefore useful in such scenarios and Regex are useful to
accomplish this (especially if you have sufficient experience with such
expressions).
I observed only an odd behaviour using strsplit: the exa
Leonard,
It can be helpful to spell out your intent in English or some of us have to go
back to the documentation to remember what some of the operators do.
Your text being searched seems to be an example of items between commas with an
optional space after some commas and in one case, nothing b
Dear Bill,
Indeed, there are other cases as well - as documented.
Various Regex sites give the warning to avoid the legacy syntax
"[[:<:]]", so this is the alternative syntax:
strsplit(split="\\b(?=\\w)", "One, two; three!", perl=TRUE)
# "O" "n" "e" ", " "t" "w" "o" "; " "t" "h" "r" "
> Bill Dunlap on Fri, 5 May 2023 08:19:21 -0700 writes:
https://bugs.r-project.org/show_bug.cgi?id=16745 (from 2016, still labelled
'UNCONFIRMED') contains some other examples of strsplit misbehaving when
using 0-length perl look-behinds. E.g.,
Thank you, Bill -- yes, uhmm, ...
https://bugs.r-project.org/show_bug.cgi?id=16745 (from 2016, still labelled
'UNCONFIRMED') contains some other examples of strsplit misbehaving when
using 0-length perl look-behinds. E.g.,
> strsplit(split="[[:<:]]", "One, two; three!", perl=TRUE)[[1]]
[1] "O" "n" "e" ", " "t" "w" "o" "; "
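One possible way around the zero-length look-around cases shown above is to extract the tokens instead of splitting on the boundaries between them. This is only a sketch with base R's gregexpr() and regmatches(); the variable names are illustrative, and it assumes that runs of word characters and runs of non-word characters are the pieces wanted.

```r
x <- "One, two; three!"

# Match either a run of word characters or a run of non-word characters,
# then pull the matched pieces out of the string.
tokens <- regmatches(x, gregexpr("\\w+|\\W+", x, perl = TRUE))[[1]]
tokens
# [1] "One"   ", "    "two"   "; "    "three" "!"
```

Because every match here has non-zero length, the zero-length split behaviour of strsplit() does not come into play.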
If you only want the character strings, this seems a little simpler:
> strsplit("a bc,def, adef ,,gh", "[ ,]+", perl=T)
[[1]]
[1] "a"    "bc"   "def"  "adef" "gh"
If you need delimiters (the commas) you could then add them back in again
afterwards.
Tim
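If the commas should be kept as tokens of their own, one alternative to adding them back afterwards is to match the tokens directly. A minimal sketch, assuming a token is either a single comma or a run of characters that are neither space nor comma:

```r
x <- "a bc,def, adef ,,gh"

# A token is a run of non-space, non-comma characters, or one comma.
# Spaces are never matched, so they simply drop out.
tokens <- regmatches(x, gregexpr("[^ ,]+|,", x))[[1]]
tokens
# [1] "a"    "bc"   ","    "def"  ","    "adef" ","    ","    "gh"
```

This avoids look-arounds entirely, at the cost of describing what to keep rather than where to split.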
--
Messag
On Thu, 4 May 2023 23:59:33 +0300
Leonard Mada via R-help wrote:
> strsplit("a bc,def, adef ,,gh", " |(?=,)|(?<=,)(?![ ])", perl=T)
> # "a" "bc" "," "def" "," "" "adef" "," "," "gh"
>
> strsplit("a bc,def, adef ,,gh", " |(? # "a" "bc" "," "def" "," "" "adef" ",
A little note on quoting in regular expressions.
I find writing \\. when I want a quoted . somewhat confusing,
so I would use the pattern "_w_.*[.]csv$".
Better still, if you want to match file names,
there is a function glob2rx that converts shell ("glob")
patterns into regular expression pattern
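As an illustration of the glob2rx() suggestion, here is a small sketch. The glob "*_w_*.csv" and the file names are made up for the example; glob2rx() itself is in the utils package.

```r
# Convert a shell-style wildcard pattern into a regular expression
rx <- glob2rx("*_w_*.csv")
rx
# [1] "^.*_w_.*\\.csv$"

# Hypothetical file names, only for demonstration
files <- c("run1_w_a.csv", "run1_x_a.csv", "run2_w_b.csv.bak")
grepl(rx, files)
# [1]  TRUE FALSE FALSE
```

The resulting pattern can be passed to list.files(pattern = rx) in the same way.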
?regexp ## Search the text for "backreference" (or web-search "regular
expression backreference")
-- Bert
On Tue, Sep 17, 2019 at 7:52 AM Ivan Calandra wrote:
> Thank you Bert.
> That's more like what I was looking for.
>
> Could you please tell me where I can find information on the "\\1
Thank you Bert.
That's more like what I was looking for.
Could you please tell me where I can find information on the "\\1"? This
is the part I still don't get.
Ivan
--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Mus
Thanks Jeff!
It does indeed make sense that there is no "AND" corresponding to the "|".
Ivan
--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, German
(For the units)
Why not simply:
sub(".*\\[(.+)\\]","\\1", headers)
Cheers,
Bert
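To make the role of "\\1" concrete: it is a backreference to whatever the first parenthesised group in the pattern matched. A short sketch with made-up column headers (the vector below is an assumption, not the original data):

```r
headers <- c("Length [mm]", "Mass [g]", "Name")  # hypothetical headers

# ".*\\[" consumes everything up to the opening bracket, "(.+)" captures
# the unit, and "\\1" in the replacement puts back only the captured part.
units <- sub(".*\\[(.+)\\]", "\\1", headers)
units
# [1] "mm"   "g"    "Name"
```

Note that an element without brackets does not match and is returned unchanged, so it may need separate handling.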
On Tue, Sep 17, 2019 at 6:40 AM Ivan Calandra wrote:
> Thank you Ivan for your help!
>
> Your solution for the first problem is so simple I didn't even think
> about it!
> What I find weird is that "_w_|\\.csv$"
https://stackoverflow.com/questions/3041320/regex-and-operator/37692545
On September 17, 2019 6:39:13 AM PDT, Ivan Calandra wrote:
>Thank you Ivan for your help!
>
>Your solution for the first problem is so simple I didn't even think
>about it!
>What I find weird is that "_w_|\\.csv$" works as e
Thank you Ivan for your help!
Your solution for the first problem is so simple I didn't even think
about it!
What I find weird is that "_w_|\\.csv$" works as expected ("OR"), but is
there no way to combine two patterns with an "AND"?
Your solution to the second problem is actually unfortunate
On Tue, 17 Sep 2019 10:14:24 +0300
Ivan Krylov wrote:
> '\\[.*\\]'
Sorry, I forgot to take it into account that you don't want the [] in
your units, either. That's still doable, but requires so-called
look-around assertions in the regular expression:
'(?<=\\[).*(?=\\])'
This should match any c
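A minimal sketch of how the look-around pattern above can be used in base R to extract (rather than replace) the text between the brackets. The headers vector is hypothetical; perl = TRUE is needed because look-arounds are a PCRE feature.

```r
headers <- c("Length [mm]", "Mass [g]")  # hypothetical headers

# (?<=\\[) requires a "[" just before the match, (?=\\]) a "]" just after;
# neither bracket becomes part of the match itself.
m <- regexpr("(?<=\\[).*(?=\\])", headers, perl = TRUE)
units <- regmatches(headers, m)
units
# [1] "mm" "g"
```

regmatches() silently drops elements with no match, so the result can be shorter than the input.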
On Tue, 17 Sep 2019 08:48:43 +0200
Ivan Calandra wrote:
> CSVs <- list.files(path=..., pattern="\\.csv$")
> w.files <- CSVs[grep(pattern="_w_", CSVs)]
>
> Of course, what I would like to do is list only the interesting files
> from the beginning, rather than subsetting the whole list of files.
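One way to list only the interesting files from the beginning is to fold both conditions into a single pattern, relying on the fact that "_w_" has to appear before the ".csv" extension. The sketch below creates a few empty files in a temporary directory purely for demonstration; the file names are invented.

```r
# Set up a throwaway directory with hypothetical file names
demo_dir <- tempfile("csv_demo_")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir,
                                c("a_w_1.csv", "a_x_1.csv", "b_w_2.txt"))))

# "_w_" somewhere in the name, followed later by ".csv" at the very end
w.files <- list.files(path = demo_dir, pattern = "_w_.*\\.csv$")
w.files
# [1] "a_w_1.csv"

unlink(demo_dir, recursive = TRUE)
```

This only works because the order of the two parts is known; for conditions in arbitrary order, combining two grepl() calls with & is the simpler route.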
These are always kind of fun, not least because of the variety of different
replies that "work" at least somewhat. Here's mine:
> stringa <- "[2440810] / www.tinyurl.com/hgaco4fha3"
> sub("^(.+)www\\.(.+)\\.com.+","\\1\\2",stringa)
[1] "[2440810] / tinyurl"
Note the use of doubled backslashes to
Hi Omar,
you are almost there, but your first substitution looks for 'www' at the
start of the line followed by anything (so it does nothing), and your
second substitution removes everything from the first '.' it finds
(which is the one after www).
What you want to do is
x <- "[2440810] / ww
> On 27 Aug 2017, at 18:18, Omar André Gonzáles Díaz
> wrote:
>
> 3.- If I make the 2 first letter optional with:
>
> ecommerce$sku <-
> gsub("(.*)([a-zA-Z]?{2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)", "\\2",
> ecommerce$producto)
>
> "49MU6300" is capture, but again only "32S5970" from B (missi
Omar, please remember that this is R-help, not R-do-my-work-for-me... you have
already been given several hints as to how you can refine your patterns
yourself. These skills are key to real world data science, so you need to work
at being able to take hints and expand on them if you are to be s
"Please, consider that some SKUs have "-"
in the middle, for example: "PG-9021".
Then you need to include these in the list of patterns you gave. Try it
again -- this time with a **complete** list.
-- Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along and
Omar:
I don't think this can work. For example number-letter patterns 4),
5), and 6) would all be matched by pattern 6).
As Jeff indicated, you need to provide the delimiters -- what
characters come before and after the SKU patterns -- to be able to
recognize them. In a quick look at the text fil
You may have to provide us more detail on **exactly** the sorts of
patterns you wish to "capture" -- including exactly what you mean by
"capture" (what value do you wish to return?) -- as the "obvious"
answer is probably not sufficient:
## using your example -- thank you
> gsub(".*(49MU6300|LE32S59
Clearly you are being too specific about the structure of the sku. In the
absence of better information about the sku you need to focus on identifying
the delimiters and position of the sku... one way might be:
ecommerce$sku <- sub( "^(.*)[ \n]+([^ \n]+)$", "\\2", ecommerce$producto )
Please l
Shouldn't your "[:digit:]" be "[[:digit:]]"?
Bill Dunlap
TIBCO Software
wdunlap tibco.com
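For reference, ?regex documents that the brackets in a class name such as [:digit:] are part of the name, so in base R the class has to sit inside a further pair of brackets. A small base R sketch; the date-like pattern is an assumption, since the original call is truncated:

```r
x <- c("16-03-08", "no digits")

# [[:digit:]] = bracket expression containing the POSIX digit class
is_date_like <- grepl("^[[:digit:]]{2}-[[:digit:]]{2}-[[:digit:]]{2}$", x)
is_date_like
# [1]  TRUE FALSE
```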
On Mon, Feb 6, 2017 at 10:36 AM, Tilmann Faul wrote:
> Using R is a great advantage, thanks for your work.
>
> Using regex under R 3.1.1, Debian 8.6 jessie works fine.
>
> str_detect("16-03-08", "[:digit:]{2
You don't need a regex.
?strsplit
Something like:
> y <-c("PPA 06 - Promo Vasito", "PPA 05 - Cuentos")
> sapply(strsplit(y, "-"),"[",2)
[1] " Promo Vasito" " Cuentos"
You may have to add spaces around your "-" , as you failed to supply
data so I cannot be sure what you have.
-- Bert
Bert G
Hi,
If your actual data are of the same form as your sample data, why not just:
x <- c("PPA 06 - Promo Vasito", "PPA 05 - Cuentos",
"PPA 04 - Promo vasito", "PPA 03 - Promoción escolar",
"PPA - Saluda a tu pediatra", "PPL - Dia del Pediatra")
sub("^.* - ", "", x)
[1] "Promo Vasito" "Cuent
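Both suggestions (strsplit() and sub()) can be compared side by side. The sketch below uses a subset of the strings from the thread and assumes the separator is always " - " with a space on each side.

```r
x <- c("PPA 06 - Promo Vasito", "PPA 05 - Cuentos",
       "PPA - Saluda a tu pediatra")

# Split on the literal separator and take the second piece of each element
parts <- strsplit(x, " - ", fixed = TRUE)
via_split <- vapply(parts, function(p) p[2], character(1))

# Or delete everything up to and including the separator
via_sub <- sub("^.* - ", "", x)

via_sub
# [1] "Promo Vasito"         "Cuentos"              "Saluda a tu pediatra"
```

The two approaches differ if a string contains the separator more than once: the greedy ".*" in sub() keeps only the text after the last one.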
> On Dec 19, 2016, at 1:25 PM, Omar André Gonzáles Díaz
> wrote:
>
> I have the following strings:
>
> [1] "PPA 06 - Promo Vasito" [2] "PPA 05 - Cuentos"
> [3] "PPA 04 - Promo vasito" [4] "PPA 03 - Promoción escolar"
> [5] "PPA - Saluda a tu pediatra" [6] "PPL - Dia del Pediatra"
>
On 03/22/2016 12:44 AM, Omar André Gonzáles Díaz wrote:
Hi,I have a DF with a column with "html", like this:
https://ad.doubleclick.net/ddm/trackimp/N344006.1960500FACEBOOKAD/B9589414.130145906;dc_trk_aid=303019819;dc_trk_cid=69763238;ord=[timestamp];dc_lat=;dc_rdid=;tag_for_child_directed_tre
?strsplit #I think
My "solution" assumes a fixed format for the URL's as shown in your
example. If that is not the case, it doesn't work.
> y <- ' SRC="https://ad.doubleclick.net/ddm/trackimp/N344006.1960500FACEBOOKAD/B9589414.130145906;dc_trk_aid=303019819;dc_trk_cid=69763238;ord=[timestamp];dc_
Thanks S. Ellison.
Finally, I had some time to test it. Thanks for your clarification.
Just one more question:
You say:
Your regexes are on multiple lines and include whitespace and linefeeds.
For example you are not testing for
" .*forum.*|.*buy.*"; you are testing for
" .*forum.*|
> From: Omar André Gonzáles Díaz
> Subject: [R] regex not working for some entries in for loop
>
> I'm using some regex in a for loop to check for some values in column
> "source",
> and put a result in column "fuente".
Your regexes are on multiple lines and include whitespace and linefeeds. F
On Oct 10, 2015, at 10:57 PM, Karim Mezhoud wrote:
> My code is not correct.
> The idea is to use apply instead of a loop. more efficiency.
There is no increased efficiency in using apply. Both `apply` and a `for` loop
will perform with equal efficiency. The only advantage is the mental clarity
You are the domain expert, but it would seem to me that "-NEGRO" is a part of
the ID because it uniquely specifies the product.
From the perspective of expressing your business logic in code, dropping this
part of the string should have a separate line in the code, and a comment.
Dropping th
My code is not correct.
The idea is to use apply instead of a loop. more efficiency.
Karim
On Sun, Oct 11, 2015 at 6:42 AM, Omar André Gonzáles Díaz <
oma.gonza...@gmail.com> wrote:
> Thanks Karim. linio.tv is in the email. In the last part.
> El oct 11, 2015 12:39 AM, "Karim Mezhoud" escribió:
Thanks Karim. linio.tv is in the email. In the last part.
El oct 11, 2015 12:39 AM, "Karim Mezhoud" escribió:
> Hi,
> omit unlist and test. otherwise you can use apply function.
>
> draft:
>
> df1 <- apply(linio.tv, 1, function(x) strsplit(x[,idproductio],
> "[^A-Z0-9-]+"))
>
> fct <- function(l
Hi,
omit unlist and test. otherwise you can use apply function.
draft:
df1 <- apply(linio.tv, 1, function(x) strsplit(x[,idproductio],
"[^A-Z0-9-]+"))
fct <- function(linio.tv){
if(any(grep("[A-Z][0-9]", linio.tv[,idx_productio]))) {
linio.tv[,idx(id)] <- linio.tv[,idx
Hi Boris,
I've modified a little the for loop to catch the IDs (if there is any)
otherwise to put NAs. This is for another data set.
for (i in 1:nrow(linio.tv)) {
v <- unlist(strsplit(linio.tv$producto[i], "[^A-Z0-9-]+")) #
isolate tokens
if(any(grep("[A-Z][0-9]", v))) {
Thank you very much to both of you. This information is very enlightening
to me.
Cheers.
2015-10-10 1:11 GMT-05:00 Boris Steipe :
> David answered most of this. Just two short notes inline.
>
>
>
>
> On Oct 10, 2015, at 12:38 AM, Omar André Gonzáles Díaz <
> oma.gonza...@gmail.com> wrote:
>
>
David answered most of this. Just two short notes inline.
On Oct 10, 2015, at 12:38 AM, Omar André Gonzáles Díaz
wrote:
> David, Boris, so thankful for your help. Both approaches are very good. I
> got this solved with David's help.
>
> I find Boris's for loop very interesting. And I ne
On Oct 9, 2015, at 9:38 PM, Omar André Gonzáles Díaz wrote:
> David, Boris, so thankful for your help. Both approaches are very good. I
> got this solved with David's help.
>
> I find Boris's for loop very interesting. And I need a little help
> understanding the regex part of it.
>
> - The
David, Boris, so thankful for your help. Both approaches are very good. I
got this solved with David's help.
I find Boris's for loop very interesting. And I need a little help
understanding the regex part of it.
- The strsplit function: strsplit(ripley.tv$producto[i], "[^A-Z0-9-]+")
I understand
On Oct 9, 2015, at 4:21 PM, Boris Steipe wrote:
> I think you are going in the wrong direction here and this is a classic
> example of what we mean by "technical debt" of code. Rather than telling your
> regular expression what you are looking for, you are handling special cases
> with red
I think you are going in the wrong direction here and this is a classic
example of what we mean by "technical debt" of code. Rather than telling your
regular expression what you are looking for, you are handling special cases
with redundant code. This is ugly, brittle and impossible to maint
On Oct 9, 2015, at 2:48 PM, Omar André Gonzáles Díaz wrote:
> Thank you, David. You put me in the right direction.
>
> At the end, I've used a lot of lines, to my taste, for this task.
>
> Is there a more elegant way, of doing this?
There are conditional capture-classes in regex in addition t
Thank you, David. You put me in the right direction.
At the end, I've used a lot of lines, to my taste, for this task.
Is there a more elegant way, of doing this?
ripley.tv$id <- sub("(.*)( [0-9]{2}[a-z]{1}[0-9]{4})(.*)", "\\2",
ripley.tv$producto,
ignore
On Oct 9, 2015, at 1:50 PM, Omar André Gonzáles Díaz wrote:
> David,
>
> this is a working case. I know that all cases for ID are not covered with my
> current code.
>
> The question is:
>
> ID starts as NAs.
>
> 1.- How to extract 1 type of ID, and keep the rest of entries as they are.
>
David,
this is a working case. I know that all cases for ID are not covered with
my current code.
The question is:
ID starts as NAs.
1.- How to extract 1 type of ID, and keep the rest of entries as they are.
2.- Then keep the first extraction, and search for second type of ID.
3.- And so on wit
On Oct 9, 2015, at 12:59 PM, Omar André Gonzáles Díaz wrote:
> I need to extract an ID from the product column of my df.
>
> I was able to extract the ids for some scenarios, but when applying my
> code for the next type of ids (there are some other combinations), the
> results of my first line
Yes, you are right. Thank you.
2015-10-08 20:07 GMT-05:00 David Winsemius :
>
> On Oct 8, 2015, at 4:50 PM, Omar André Gonzáles Díaz wrote:
>
> > David, it does work but not in all cases:
>
> It should work if you change the "+" to "*" in the last capture class. It
> makes trailing non-digit cha
On Oct 8, 2015, at 4:50 PM, Omar André Gonzáles Díaz wrote:
> David, it does work but not in all cases:
It should work if you change the "+" to "*" in the last capture class. It
makes trailing non-digit characters entirely optional.
> sub("(^.+ )(\\d+)([\"]|[']{2})(.*$)", "\\2\\3", b)
[1] "4
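The effect of changing "+" to "*" in the last group can be seen on a string where the inch mark is the final character. In this sketch the first two strings are from the thread and the third is an invented edge case.

```r
a <- c("SMART TV LCD FHD 70\" LC70LE660",
       "LED FULL HD 58'' LE58D3140",
       "LED HD 40\"")            # hypothetical: nothing after the inch mark

# Last group ".+" demands at least one trailing character
strict <- sub("(^.+ )(\\d+)([\"]|[']{2})(.+$)", "\\2\\3", a)

# Last group ".*" makes the trailing characters optional
relaxed <- sub("(^.+ )(\\d+)([\"]|[']{2})(.*$)", "\\2\\3", a)

relaxed
# [1] "70\""  "58''"  "40\""
```

With the strict version the third element does not match at all, so sub() returns it unchanged.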
On Oct 8, 2015, at 3:45 PM, Omar André Gonzáles Díaz wrote:
> Hi, I have a vector of 100 elements like these:
>
> a <- c("SMART TV LCD FHD 70\" LC70LE660", "LED FULL HD 58'' LE58D3140")
>
> I want to put just the (70\") and (58'') in a vector b.
> sub("(^.+ )(\\d+)([\"]|[']{2})(.+$)", "\\2
On Thu, Oct 08, 2015 at 05:45:13PM -0500, Omar André Gonzáles Díaz wrote:
> Hi, I have a vector of 100 elements like these:
>
> a <- c("SMART TV LCD FHD 70\" LC70LE660", "LED FULL HD 58'' LE58D3140")
>
> I want to put just the (70\") and (58'') in a vector b.
>
> This is my try, but is not w
On Oct 6, 2015, at 7:38 AM, Johannes Radinger wrote:
> Hi
>
> I'd like to remove a leading "3" if my number is 7 digits long, if it is
> only 6 I don't want to do anything.
> I think this should be possible with a 1-liner using sub() but I am not
> sure how to define the number of characters follow
> On Oct 6, 2015, at 9:38 AM, Johannes Radinger
> wrote:
>
> Hi
>
> I'd like to remove a leading "3" if my number is 7 digits long, if it is
> only 6 I don't want to do anything.
> I think this should be possible with a 1-liner using sub() but I am not
> sure how to define the number of character
Hi Johannes,
Not sure if this can be done with sub() only, but combining it with
ifelse() apparently does what you want:
ifelse(nchar(a)==7, sub("^3","",a), a)
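For what it is worth, the length condition can also be folded into the pattern itself, so that sub() alone is enough. A sketch, assuming the values are digit-only strings (the vector a below is made up):

```r
a <- c("3123456", "123456", "3456789", "345678")  # hypothetical values

# ifelse() approach: strip a leading 3 only from 7-character strings
via_ifelse <- ifelse(nchar(a) == 7, sub("^3", "", a), a)

# sub()-only approach: a leading 3 followed by exactly six more digits
via_sub <- sub("^3([0-9]{6})$", "\\1", a)

via_sub
# [1] "123456" "123456" "456789" "345678"
```

The anchors "^" and "$" are what tie the match to the total length; without them a 6-digit number starting with 3 would not be protected in longer strings.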
HTH,
Ivan
--
Ivan Calandra, PhD
University of Reims Champagne-Ardenne
GEGENAA - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims,
On Thu, Mar 12, 2015 at 9:52 PM, John McKown
wrote:
> [...]
> One problem is that Adrian wanted, for some reason, to exclude numbers
> such as "2." but accept "2.0" . That is, no unnecessary trailing
> decimal point. as.numeric() will not fail on "2." since that is a
> number. The example grep() s
On Thu, Mar 12, 2015 at 2:43 PM, Steve Taylor wrote:
> How about letting a standard function decide which are numbers:
>
> which(!is.na(suppressWarnings(as.numeric(myvector))))
>
> Also works with numbers in scientific notation and (presumably) different
> decimal characters, e.g. comma if that's
How about letting a standard function decide which are numbers:
which(!is.na(suppressWarnings(as.numeric(myvector))))
Also works with numbers in scientific notation and (presumably) different
decimal characters, e.g. comma if that's what the locale uses.
-Original Message-
From: R-help
Perfect, perfect, perfect.
Thanks very much, John.
Adrian
On Wed, Mar 11, 2015 at 10:00 PM, John McKown
wrote:
> See if the following will work for you:
>
> grep('^-?[0-9]+([.]?[0-9]+)?$',myvector,perl=TRUE,invert=TRUE)
>
>> myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")
>> grep('^-?[0-9]+(
See if the following will work for you:
grep('^-?[0-9]+([.]?[0-9]+)?$',myvector,perl=TRUE,invert=TRUE)
> myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")
> grep('^-?[0-9]+([.][0-9]+)?$',myvector,perl=TRUE,invert=TRUE)
[1] 1 2 5 6
>
The key is to match a number, and then invert the TRUE / FAL
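The regex approach and the as.numeric() approach discussed in this thread disagree on one of the sample values, which a short comparison makes visible. This is a sketch using the vector from the thread; the helper variable names are illustrative.

```r
myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")

# Regex: optional minus, digits, optionally a dot followed by more digits
is_num_rx <- grepl("^-?[0-9]+([.][0-9]+)?$", myvector)
which(!is_num_rx)
# [1] 1 2 5 6

# Conversion: anything as.numeric() can parse counts as a number
is_num_conv <- !is.na(suppressWarnings(as.numeric(myvector)))
which(!is_num_conv)
# [1] 1 2 5
```

The difference is "2.", which as.numeric() accepts but the pattern rejects because of the trailing decimal point.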
Hi Tom,
You could try:
library(stringr)
str_extract(x, perl("(?<=[A-Za-z]{4}/).*(?=/[0-9])"))
#[1] "S01-012"
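The same look-around pattern can be used without stringr, which may help if the perl() wrapper is not available in the installed stringr version. A base R sketch; the second variant with a capture group assumes the fixed "/mnt/AO/AO Data/" prefix from the example.

```r
x <- "/mnt/AO/AO Data/S01-012/120824/"

# Look-behind for four letters and a slash, look-ahead for a slash and a digit
via_lookaround <- regmatches(
  x, regexpr("(?<=[A-Za-z]{4}/).*(?=/[0-9])", x, perl = TRUE))

# Capture group: everything up to the next slash after the known prefix
via_group <- sub("^/mnt/AO/AO Data/([^/]+)/.*$", "\\1", x)

via_lookaround
# [1] "S01-012"
```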
A.K.
On Friday, August 15, 2014 12:20 PM, Tom Wright wrote:
Hi,
Can anyone please assist.
given the string
> x<-"/mnt/AO/AO Data/S01-012/120824/"
I would like to extract "S01-012"
On Aug 15, 2014, at 11:56 AM, Tom Wright wrote:
> WOW!!!
>
> What can I say 4 answers in less than 4 minutes. Thank you everyone. If
> I can't make it work now I don't deserve to.
>
> btw. the strsplit approach wouldn't work for me as:
> a) I wanted to play with regex and
> b) the location i
Must be another lucky streak. :-)
---
Jeff Newmiller
Hello,
I don't believe you need an extra package for that. Try
sub("\\/mnt\\/AO\\/AO Data\\/([-[:alnum:]]*)\\/.+", "\\1", x)
or, with package stringr,
str_match(x,"\\/mnt\\/AO\\/AO Data\\/(.+)\\/.+")
Hope this helps,
Rui Barradas
Em 15-08-2014 17:18, Tom Wright escreveu:
Hi,
Can anyone p
WOW!!!
What can I say 4 answers in less than 4 minutes. Thank you everyone. If
I can't make it work now I don't deserve to.
btw. the strsplit approach wouldn't work for me as:
a) I wanted to play with regex and
b) the location isn't consistent.
Nice to see email support still works, not everyt
On Aug 15, 2014, at 11:18 AM, Tom Wright wrote:
> Hi,
> Can anyone please assist.
>
> given the string
>
>> x<-"/mnt/AO/AO Data/S01-012/120824/"
>
> I would like to extract "S01-012"
>
> require(stringr)
>> str_match(x,"\\/mnt\\/AO\\/AO Data\\/(.+)\\/+")
>> str_match(x,"\\/mnt\\/AO\\/AO Data
> -Original Message-
> > x<-"/mnt/AO/AO Data/S01-012/120824/"
>
> I would like to extract "S01-012"
> gsub("/mnt/AO/AO Data/(.+)/.+", "\\1", x)
#does it, as does
> gsub("/mnt/AO/AO Data/([\\w-]+)/.+", "\\1", x, perl=TRUE)# \w is perl RE;
> the default is POSIX, which would be.
>
>
> R> as.vector(sapply(my.cache.list, function(x)strsplit(x, "\\.")[[1]][2]))
> [1] "subject_test" "subject_train" "y_test" "y_train"
>
>
> R> gsub("df\\.(.*)\\.RData", "\\1", my.cache.list)
> [1] "subject_test" "subject_train" "y_test" "y_train"
>
>
> Note that "." will match any
Hi,
Here are two possibilities:
R> as.vector(sapply(my.cache.list, function(x)strsplit(x, "\\.")[[1]][2]))
[1] "subject_test" "subject_train" "y_test" "y_train"
R> gsub("df\\.(.*)\\.RData", "\\1", my.cache.list)
[1] "subject_test" "subject_train" "y_test" "y_train"
Note that "
Try:
gsub(".*\\.(.*)\\..*","\\1", my.cache.list)
[1] "subject_test" "subject_train" "y_test" "y_train"
#or
library(stringr)
str_extract(my.cache.list, perl('(?<=\\.).*(?=\\.)'))
[1] "subject_test" "subject_train" "y_test" "y_train"
A.K.
On Thursday, July 31, 2014 11:05 AM,
> I want to keep only the part inside the two points. After lots of headache
> using grep() when trying something like this:
>
> grep('.(.*?).','df.subject_test.RData',value=T)
>
>
> Does anyone have any suggestion ?
gsub("df\\.(.+)\\.RData", "\\1", 'df.subject_test.RData')
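Two things appear to go wrong in the grep() attempt quoted above: the dots are unescaped, so they match any character, and grep(value = TRUE) returns whole matching elements rather than the captured part. A sketch contrasting the two; the file names are assumed from the outputs shown elsewhere in this thread.

```r
# Hypothetical file names, reconstructed from the outputs in the thread
my.cache.list <- c("df.subject_test.RData", "df.subject_train.RData",
                   "df.y_test.RData", "df.y_train.RData")

# grep() only selects elements; it never extracts a sub-match
selected <- grep(".(.*?).", my.cache.list, value = TRUE)

# sub() with escaped dots and a backreference extracts the middle part
middle <- sub("^df\\.(.+)\\.RData$", "\\1", my.cache.list)
middle
# [1] "subject_test"  "subject_train" "y_test"        "y_train"
```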
Steve E
***
You need to use the JSON library or equivalent to solve this problem. I don't
understand why you think that having the data in the clipboard prevents you
from doing this since that is just another file (but I usually avoid using the
clipboard for reproducible analysis anyway).
--
Bill I found a workaround:
f <- ff(formula, lab)
f <- as.formula(gsub("`", "", as.character(deparse(f))))
Thanks for your elegant solution.
Frank
--
Thanks Bill. The problem is one of the results of convertName might be
'Heading("Age in Years")*age' (this is fo
re, TIBCO Software
wdunlap tibco.com
> -Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf
> Of Frank Harrell
> Sent: Thursday, August 15, 2013 7:47 PM
> To: RHELP
> Subject: Re: [R] regex challenge
>
> Bill that is very impressive. The only
r[[i]], convertName = convertName)
}
} else if (is.name(expr)) {
expr <- as.name(convertName(expr))
}
expr
}
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project
"Female") * SBPz) *
Heading() * Gz + (AGEz + SBPz) * Heading() * TRIOz ~ Heading() *
COUNTRYz * Heading() * SEXz
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf
> Of William Dunlap
>
"Female") * SBPz) *
Heading() * Gz + (AGEz + SBPz) * Heading() * TRIOz ~ Heading() *
COUNTRYz * Heading() * SEXz
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] O
-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf
> Of Frank Harrell
> Sent: Thursday, August 15, 2013 4:45 PM
> To: RHELP
> Subject: Re: [R] regex challenge
>
> I really appreciate the excellent ideas from Bill Dunlap and Greg Snow.
> Both sugg
I really appreciate the excellent ideas from Bill Dunlap and Greg Snow.
Both suggestions almost work perfectly. Greg's recognizes expressions
such as sex=='female' but not ones such as age > 21, age < 21, a - b >
0, and possibly other legal R expressions. Bill's idea is similar to
what Dunca
-
From: Greg Snow <538...@gmail.com>
To: Frank Harrell
Cc: RHELP
Sent: Thursday, August 15, 2013 5:07 PM
Subject: Re: [R] regex challenge
Here is a first stab:
library(gsubfn)
test <- "y1 + y2 ~ a*(b + c) + d + f * (h == 3) + (sex == 'male')*i"
gsubfn( "
Here is a first stab:
library(gsubfn)
test <- "y1 + y2 ~ a*(b + c) + d + f * (h == 3) + (sex == 'male')*i"
gsubfn( "([a-zA-Z][a-zA-Z0-9]*)((?=\\s*[-+~)*])|\\s*$)",
function(x,...) paste0(toupper(x),'z'), test, perl=TRUE )
On Wed, Aug 14, 2013 at 9:13 PM, Frank Harrell wrote:
> I would like t
I think substitute() or bquote() will do a better job here than gsub() because
they work on the parsed formula rather than on the raw string. The
terms() function will interpret the formula-specific operators like "+"
and ":" to come up with a list of the 'variables' (or 'terms') in the formula
E.g.,
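The following is not the example from the original message (which is cut off here) but a minimal sketch of the general idea: all.vars() lists the variables of a parsed formula, and substitute() can then rename them on the parse tree. The renaming rule (upper case plus "z") is borrowed from Greg Snow's gsubfn() example, and the shortened formula is an assumption.

```r
f <- y1 + y2 ~ a * (b + c) + d

# Variables found in the parsed formula, operators excluded
vars <- all.vars(f)
vars
# [1] "y1" "y2" "a"  "b"  "c"  "d"

# Build a named list mapping each variable to its replacement symbol
repl <- lapply(paste0(toupper(vars), "z"), as.name)
names(repl) <- vars

# do.call() passes the formula object itself to substitute()
f2 <- as.formula(do.call("substitute", list(f, repl)))
all.vars(f2)
# [1] "Y1z" "Y2z" "Az"  "Bz"  "Cz"  "Dz"
```

This renames every variable; deciding which ones to leave alone (as asked in the thread) would need additional logic on top.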
This might be hard.
How to tell f is to be changed while h is NOT ...
Thanks,
Guanrao
http://www.myfav5.com
where fun and easy friend-making happens
From: Frank Harrell
To: RHELP
Sent: Wednesday, August 14, 2013 11:13 PM
Subject: [R] regex challenge
I
> -Original Message-
> > So what is the special behavior of the ^ symbol when not at
>> the beginning of the string that occurs when it is not escaped?
>
> I think it retains its meaning as an assertion that it occurs
> at the beginning of the line, and so a pattern like "a^b"
> coul
Hello,
Em 21-01-2013 20:52, Duncan Murdoch escreveu:
On 13-01-21 3:20 PM, Jeff Newmiller wrote:
Apparently Extended RegExp syntax eliminated the
"^-is-an-ordinary-character-except-for-two-uses" meaning that I am
familiar with from the Basic RegExp usage, since GNU grep with the -e
option also r
On 13-01-21 3:20 PM, Jeff Newmiller wrote:
Apparently Extended RegExp syntax eliminated the
"^-is-an-ordinary-character-except-for-two-uses" meaning that I am familiar with from the
Basic RegExp usage, since GNU grep with the -e option also refuses to match the caret unless it is
escaped. The
Apparently Extended RegExp syntax eliminated the
"^-is-an-ordinary-character-except-for-two-uses" meaning that I am familiar
with from the Basic RegExp usage, since GNU grep with the -e option also
refuses to match the caret unless it is escaped. The TRE library treats BRE as
obsolete, so we on
On 13-01-21 1:05 PM, Jeff Newmiller wrote:
So what is the special behavior of the ^ symbol when not at the beginning of
the string that occurs when it is not escaped?
I think it retains its meaning as an assertion that it occurs at the
beginning of the line, and so a pattern like "a^b" could
On Jan 21, 2013, at 10:05 AM, Jeff Newmiller wrote:
So what is the special behavior of the ^ symbol when not at the
beginning of the string that occurs when it is not escaped?
Isn't there a distinction between what _is_ "special" and what should
be "special". You are saying that "^" after
So what is the special behavior of the ^ symbol when not at the beginning of
the string that occurs when it is not escaped?
---
Jeff Newmiller
On 13-01-21 11:48 AM, Jeff Newmiller wrote:
I am not sure I understand what worked perfectly, since it is my understanding
that ^ is only special at the beginning of the regex (to anchor the pattern at
the beginning of the target string) or as the first character of a character
set (to indicat
On Mon, 21 Jan 2013, mtb...@gmail.com wrote:
I am trying to search for a string that includes the caret symbol, using the
following code:
grepl("latitude^2",temp)
Many regex implementations require us to escape a metacharacter such as
'^' by preceding it with a backslash. This indicates the n
I am not sure I understand what worked perfectly, since it is my understanding
that ^ is only special at the beginning of the regex (to anchor the pattern at
the beginning of the target string) or as the first character of a character
set (to indicate exclusion of the listed characters). In any
Hi Tsjerk, many thanks...that worked perfectly!
Mark Na
On Mon, Jan 21, 2013 at 9:36 AM, Tsjerk Wassenaar wrote:
> Oh, I'm jetlagged. ^ is a control character for 'start of string'. In the
> context of a character set it means negation: [^a-z].
>
> Ciao,
>
> Tsjerk
>
>
> On Mon, Jan 21, 2013
Oh, I'm jetlagged. ^ is a control character for 'start of string'. In the
context of a character set it means negation: [^a-z].
Ciao,
Tsjerk
On Mon, Jan 21, 2013 at 4:33 PM, Tsjerk Wassenaar wrote:
> Hi Mark Na,
>
> Try:
>
> grepl("latitude\\^2",temp)
>
> ^ is a control character for negation,
Hi Mark Na,
Try:
grepl("latitude\\^2",temp)
^ is a control character for negation, so you have to escape it.
Cheers,
Tsjerk
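A short sketch of two ways to search for a literal caret, assuming temp is a character vector (the values below are invented). Escaping the caret is what the thread suggests; fixed = TRUE is an alternative that switches off regular expression interpretation altogether.

```r
temp <- c("latitude^2", "latitude2")  # hypothetical values

# Escape the caret so it is treated as a literal character
escaped <- grepl("latitude\\^2", temp)

# Or treat the whole pattern as a fixed string, not a regular expression
fixed <- grepl("latitude^2", temp, fixed = TRUE)

escaped
# [1]  TRUE FALSE
```

The fixed = TRUE form is handy when the search text comes from data and may contain other metacharacters as well.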
On Mon, Jan 21, 2013 at 4:26 PM, wrote:
> Hello R-helpers,
>
> I am trying to search for a string that includes the caret symbol, using the
> following code:
>
> grepl(
On Jun 2, 2011, at 4:21 PM, Ben Ganzfried wrote:
> Thank you very much for your help. It saved me a lot of time and it
> worked perfectly. I have a quick follow-up as I'm not sure I
> understand yet why the code works and where it comes from.
>
> For example, in: Tstg <- sub(".*T(\\d)N.", "