Re: [R] data frame manipulation and regex

David Winsemius Wed, 28 Apr 2010 05:42:07 -0700


On Apr 28, 2010, at 8:30 AM, arnaud Gaboury wrote:

Thank you so much, David. We are getting close, but I need to keep "USD" in my
object name (i.e. "STANDARD LEAD USD").

> sub("USD+.*.(.../\\d{2})", "USD", avprix$DESCRIPTION)

[1] "CORN Jul/10"                "CORN May/10"                "ROBUSTA COFFEE (10) Jul/10"
[4] "SOYBEANS Jul/10"            "SPCL HIGH GRADE ZINC USD"   "STANDARD LEAD USD"

I had been attempting (unsuccessfully) to get the portion within the parens to be the replaced string. This also works, and has the side effect of keeping the \n that I had not intended to remove from the 5th item:


> sub("(USD+.*).../\\d{2}", "\\1", avprix$DESCRIPTION)

[1] "CORN Jul/10"                "CORN May/10"                "ROBUSTA COFFEE (10) Jul/10"
[4] "SOYBEANS Jul/10"            "SPCL HIGH GRADE ZINC USD\n" "STANDARD LEAD USD "

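A minimal sketch of the same capture-group idea, written back into the data frame. It assumes a cleaned-up copy of the posted data with no embedded newlines; the name avprix2 and the POSIX-class pattern are illustrative choices, not something from the thread:

```r
# Cleaned-up copy of the posted data (assumption: no embedded newlines)
avprix2 <- data.frame(
  DESCRIPTION = c("CORN Jul/10", "CORN May/10", "ROBUSTA COFFEE (10) Jul/10",
                  "SOYBEANS Jul/10", "SPCL HIGH GRADE ZINC USD Jul/10",
                  "STANDARD LEAD USD Jul/10"),
  prix = c(-1.5, -1082, 11084, 1983.5, -2464, -118),
  quantity = c(0, -3, 8, 2, -1, 0),
  stringsAsFactors = FALSE
)

# Group 1 captures "USD"; the whitespace and the Mon/YY date that follow it
# at the end of the string are matched but not put back.
avprix2$DESCRIPTION <- sub("(USD)[[:space:]]+[[:alpha:]]{3}/[[:digit:]]{2}$",
                           "\\1", avprix2$DESCRIPTION)
print(avprix2$DESCRIPTION)
```

Elements without "USD" do not match and are returned unchanged. Matching the whitespace outside the group also avoids the trailing space seen in "STANDARD LEAD USD " above.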

--
David



***************************
Arnaud Gaboury
Mobile: +41 79 392 79 56
BBM: 255B488F
***************************

-----Original Message-----
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Wednesday, April 28, 2010 2:25 PM
To: arnaud Gaboury
Cc: r-help@r-project.org
Subject: Re: [R] data frame manipulation and regex


On Apr 28, 2010, at 5:14 AM, arnaud Gaboury wrote:

Dear group,

Here is my data.frame :

avprix <-
structure(list(DESCRIPTION = c("CORN Jul/10", "CORN May/10",
"ROBUSTA COFFEE (10) Jul/10", "SOYBEANS Jul/10", "SPCL HIGH GRADE
ZINC USD
Jul/10",
"STANDARD LEAD USD Jul/10"), prix = c(-1.5, -1082, 11084, 1983.5,
-2464, -118), quantity = c(0, -3, 8, 2, -1, 0)), .Names =
c("DESCRIPTION",
"prix", "quantity"), row.names = c(NA, -6L), class = "data.frame")

avprix

                      DESCRIPTION    prix quantity
1                     CORN Jul/10    -1.5        0
2                     CORN May/10 -1082.0       -3
3      ROBUSTA COFFEE (10) Jul/10 11084.0        8
4                 SOYBEANS Jul/10  1983.5        2
5 SPCL HIGH GRADE ZINC USD Jul/10 -2464.0       -1
6        STANDARD LEAD USD Jul/10  -118.0        0

I need to remove the date (i.e. Jul/10 in this example) for each element of
the DESCRIPTION column that contains the USD symbol. I am trying to do this
using regular expressions, but must admit I am going nowhere.
My elements in the DESCRIPTION column and the dates can change every day.


This searches for the pattern USD followed by any three characters, a
forward slash, and any two characters, and replaces the whole match
(including USD) with an empty string:

sub("USD+.*(.../..)", "", avprix$DESCRIPTION)

[1] "CORN Jul/10"                "CORN May/10"                "ROBUSTA COFFEE (10) Jul/10"
[4] "SOYBEANS Jul/10"            "SPCL HIGH GRADE ZINC "      "STANDARD LEAD "

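To see why "USD" itself disappears from the result, it can help to look at what the pattern actually matches. A small sketch using base R's regexpr() and regmatches(); the variable names x, m and matched are only for illustration:

```r
# One element from the DESCRIPTION column
x <- "STANDARD LEAD USD Jul/10"

# regexpr() locates the first match; regmatches() extracts that substring
m <- regexpr("USD+.*(.../..)", x)
matched <- regmatches(x, m)
print(matched)

# sub() with "" as the replacement removes that whole matched substring
stripped <- sub("USD+.*(.../..)", "", x)
print(stripped)
```

The match begins at "USD", so replacing it with "" drops "USD" along with the date and leaves "STANDARD LEAD " with its trailing space.
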
This tightens up the matching by requiring that the characters
after the slash be digits:

sub("USD+.*(.../\\d{2})", "", avprix$DESCRIPTION)

[1] "CORN Jul/10"                "CORN May/10"                "ROBUSTA COFFEE (10) Jul/10"
[4] "SOYBEANS Jul/10"            "SPCL HIGH GRADE ZINC "      "STANDARD LEAD "

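On the posted data the two patterns give the same result, so here is a hedged illustration of where they would differ. The second string below is made up (it is not in the posted data) and only serves to show a non-date "xxx/yy" after USD:

```r
x <- c("STANDARD LEAD USD Jul/10",   # from the posted data
       "PRICE IN USD per/kg")        # hypothetical, not from the data

# Loose pattern: any two characters after the slash
loose <- sub("USD+.*(.../..)", "", x)

# Tight pattern: the two characters after the slash must be digits
tight <- sub("USD+.*(.../\\d{2})", "", x)

print(loose)
print(tight)
```

The loose pattern also strips "USD per/kg" from the made-up string, while the tight one leaves that string untouched because "kg" is not two digits.
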
-- David.



TY for any help.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT



