On May 6, 2010, at 2:14 PM, Greg Snow wrote:

This can be further simplified by combining the 2 subs into a single gsub('[$,]','',as.character(y)).

Note that this will also convert a malformed string such as "$123$35,24,,$1$$2,,3.4" into a number, when you may have wanted something like that to give a warning and/or an NA value.
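
To illustrate that caveat, here is a minimal sketch of a stricter conversion that returns NA (with a warning) for malformed input instead of silently gluing the digits together. The helper name parse_dollar_strict and the pattern it accepts are assumptions made for illustration; they do not come from this thread.

```r
# Hypothetical helper (not from the thread): accept only well-formed
# amounts and return NA, with a warning, for anything else.
parse_dollar_strict <- function(x) {
  x <- trimws(as.character(x))
  # optional minus, optional $, then plain digits or comma-grouped
  # thousands, then an optional decimal part
  ok <- grepl("^-?\\$?([0-9]{1,3}(,[0-9]{3})+|[0-9]+)(\\.[0-9]+)?$", x)
  out <- rep(NA_real_, length(x))
  out[ok] <- as.numeric(gsub("[$,]", "", x[ok]))
  if (any(!ok & !is.na(x))) {
    warning("malformed dollar values were set to NA")
  }
  out
}

bad <- "$123$35,24,,$1$$2,,3.4"
loose  <- as.numeric(gsub("[$,]", "", bad))  # a number, with no complaint
strict <- parse_dollar_strict(bad)           # NA, with a warning
```

Whether the stricter pattern is worth the extra code depends on how much you trust the source files.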

The g in gsub stands for global (meaning replace every '$' and ',' not just the first one) rather than greedy (which has a different meaning in regular expressions).
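
A short sketch of the difference between sub and gsub, reusing the values from Kevin's example further down the thread (the variable names first_only, all_gone and cost are only illustrative):

```r
x <- c("$135,359.00", "$135359.00", "$1,135,359.00")

# sub() replaces only the first match in each string, so the commas
# survive in any element that starts with a dollar sign
first_only <- sub("[$,]", "", x)

# gsub() replaces every match, so a single call with a character
# class strips all dollar signs and all commas
all_gone <- gsub("[$,]", "", x)
cost <- as.numeric(all_gone)
```

Inside the brackets the '$' is a literal character, so no backslashes are needed.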

This discussion brings up a related issue that I have thought about for a while. In the help for read.table in the section on colClasses it says that you can specify other conversions from character as long as there is a method for as corresponding to what you put in.

This suggests to me the approach of writing a conversion function called something like "as.dollar" then setting colClasses=c('numeric','dollar','dollar','factor') or something like that and having the middle 2 columns run through the function. However my first quick attempt failed (the doc says the method needs to be in the methods package and my quick attempt with setMethod created a local copy). There is also the possible problem that this would create a column with class dollar when I want a simple numeric.

So this brings up 2 questions:

1. Has anyone found a way to create a method for as in the methods package such that my idea above would work? (Preferably without much more work than the post-processing already suggested.)


I do get a warning, but it does seem to "work" as intended. I followed, as best I could, a suggestion made a couple of months ago by Gabor Grothendieck. Here is a link to an early post, followed by a colClasses method that strips "$" and "," characters:

http://finzi.psych.upenn.edu/Rhelp10/2010-February/229550.html

> Input <- "$245,000,000\n 3,000.000\n $$$34"

> setAs("character", "num.with.commas.dolsign",
+     function(from) as.numeric(gsub(",|\\$", "", from)))
Warning message:
In matchSignature(signature, fdef, where) :
in the method signature for function "coerce" no definition for class: “num.with.commas.dolsign”
> DF <- read.table(textConnection(Input), header = FALSE,
+     colClasses = c("num.with.commas.dolsign"))
> DF
        V1
1 2.45e+08
2 3.00e+03
3 3.40e+01

> sprintf("%12.2f", DF$V1)
[1] "245000000.00" "     3000.00" "       34.00"

Any help with cleaning up the S4 incantations would be welcome.
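
One possible cleanup, offered as a sketch rather than a definitive answer: declaring the class name with setClass() before calling setAs() appears to avoid the warning about the missing class definition. Because the conversion function returns a plain numeric vector, the resulting column should come out as numeric rather than carrying the placeholder class. The class name is the one used above; the rest is an assumption about how the methods package behaves.

```r
library(methods)

# Declare a placeholder (virtual) class so that setAs() has a class
# definition to refer to
setClass("num.with.commas.dolsign")

# The coercion itself returns an ordinary numeric vector
setAs("character", "num.with.commas.dolsign",
      function(from) as.numeric(gsub("[$,]", "", from)))

Input <- "$245,000,000\n 3,000.000\n $$$34"
DF <- read.table(text = Input, header = FALSE,
                 colClasses = "num.with.commas.dolsign")

class(DF$V1)  # check what class the column ended up with
```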

--
David.


2. If the answer to 1 above is no, are others interested in this type of functionality, and should we move the discussion to r-devel as a feature request?

Even nicer would be a simple way to go from a single character vector to multiple columns in the data frame. I remember working with a file once where the first 3 columns were comma separated (no spaces), but everything after that was whitespace separated. I read it in as whitespace separated, then had to post-process the first column into 3. But getting all the semantics of 1 to multiple could be tricky. That particular case could also have been easier if the sep argument to read.table could be a regular expression, but that would probably slow things down for the simple cases.
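
For the mixed-separator case described above, two workarounds can be sketched with base R. The sample lines are made up for illustration, and the second approach assumes that no field contains a comma that should be kept.

```r
# Made-up sample: first three fields comma separated, the rest
# whitespace separated
lines <- c("a,b,c 1.5 2", "d,e,f 3.5 4")

# Approach 1: read as whitespace separated, then split column 1 into 3
DF <- read.table(text = lines, header = FALSE, stringsAsFactors = FALSE)
parts <- do.call(rbind, strsplit(DF$V1, ",", fixed = TRUE))
DF1 <- data.frame(parts, DF[-1], stringsAsFactors = FALSE)
names(DF1) <- paste0("V", seq_along(DF1))

# Approach 2: normalise the separators before parsing, which gives
# much the same effect as a regular-expression sep
DF2 <- read.table(text = gsub(",", " ", lines, fixed = TRUE),
                  header = FALSE, stringsAsFactors = FALSE)
```

Approach 1 needs every row to split into the same number of pieces; approach 2 lets read.table do the type conversion of all columns in one pass.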



--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius
Sent: Thursday, May 06, 2010 4:47 AM
To: Wang, Kevin (SYD)
Cc: r-help@r-project.org; Phil Spector
Subject: Re: [R] Converting dollar value (factors) to numeric


On May 5, 2010, at 11:31 PM, Wang, Kevin (SYD) wrote:

Hi Phil and all those who replied,

Thanks heaps!  Yes, it worked to a certain extent.  However, if I have
the following case:
x <- c("$135,359.00", "$135359.00", "$1,135,359.00")
y <- sub('\\$','',as.character(x))
cost <- as.numeric(sub('\\,','',as.character(y)))

Try gsub, it seems to be more "greedy" :

cost <- as.numeric(gsub('\\,','',as.character(y)))

--
David
Warning message:
NAs introduced by coercion
cost
[1] 135359 135359     NA

Then the third value becomes NA, though I suspect that probably has
more to do with the regular expression (which I'm not sure how to
fix) than with R?

Thanks again for the help!

Cheers
Kev

-----Original Message-----
From: Phil Spector [mailto:spec...@stat.berkeley.edu]
Sent: Wednesday, 5 May 2010 6:14 PM
To: Wang, Kevin (SYD)
Cc: r-help@r-project.org
Subject: Re: [R] Converting dollar value (factors) to numeric

Kev-
 The most reliable way to do the conversion is as follows:

x = factor(c('$112.11','$119.15','$121.32'))
as.numeric(sub('\\$','',as.character(x)))
[1] 112.11 119.15 121.32

This way negative quantities and numbers without dollar signs are
handled correctly.  There's certainly no need to create a new input
file.

It may be easier to understand as

as.numeric(sub('$','',as.character(x),fixed=TRUE))

which gives the same result.
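
A small sketch of why the as.character step and the escaping both matter. The sample values, including the negative amount and the one without a dollar sign, are made up for illustration.

```r
x <- factor(c("$112.11", "-$119.15", "121.32"))

# as.numeric() applied directly to a factor returns the internal
# integer codes of the levels, not the amounts
codes <- as.numeric(x)

# Escaped regular expression and fixed = TRUE give the same values
vals_regex <- as.numeric(sub("\\$", "", as.character(x)))
vals_fixed <- as.numeric(sub("$", "", as.character(x), fixed = TRUE))

# An unescaped "$" in a regular expression is the end-of-string
# anchor, so this call leaves the dollar sign in place
unchanged <- sub("$", "", "$112.11")
```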
                                        - Phil Spector
                                         Statistical Computing Facility
                                         Department of Statistics
                                         UC Berkeley
                                         spec...@stat.berkeley.edu


On Wed, 5 May 2010, Wang, Kevin (SYD) wrote:

Hi,

I'm trying to read in a bunch of CSV files into R where many columns
are coded like $111.11.  When reading them in they are treated as
factors.

I'm wondering if there is an easy way to convert them into numeric in
R (as I don't want to modify the source data)?  I've done some
searches and can't seem to find an easy way to do this.

I apologise if this is a trivial question, I haven't been using R for
a while.

Many thanks in advance!

Cheers

Kev

Kevin Wang
Senior Advisor, Health and Human Services Practice Government
Advisory Services

KPMG
10 Shelley Street
Sydney  NSW  2000  Australia

Tel     +61 2 9335 8282
Fax     +61 2 9335 7001

kevinw...@kpmg.com.au

Protect the environment: think before you print





______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



David Winsemius, MD
West Hartford, CT
