On May 6, 2010, at 2:14 PM, Greg Snow wrote:
This can be further simplified by combining the 2 subs into a single
gsub('[$,]','',as.character(y)).
This will then convert "$123$35,24,,$1$$2,,3.4" into a number when
you may have wanted something like that to give a warning and/or NA
value.
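A stricter conversion along the lines hinted at above could validate each string before stripping anything. This is only a sketch: the function name as_dollar_strict and the accepted format (optional minus sign, optional single leading "$", digits either ungrouped or grouped in threes by commas, optional decimal part) are assumptions for illustration, not something from the thread.

```r
# Hypothetical helper: returns NA (with a warning) for strings that do not
# look like a single well-formed dollar amount.
as_dollar_strict <- function(x) {
  x <- trimws(as.character(x))
  # Assumed format: -$1,234.56, $1234.56, 1234, and similar.
  pattern <- "^-?\\$?([0-9]{1,3}(,[0-9]{3})*|[0-9]+)(\\.[0-9]+)?$"
  ok <- grepl(pattern, x)
  out <- rep(NA_real_, length(x))
  out[ok] <- as.numeric(gsub("[$,]", "", x[ok]))
  bad <- !ok & !is.na(x)
  if (any(bad)) {
    warning(sum(bad), " value(s) did not look like a dollar amount; NA returned")
  }
  out
}

# The loose version happily turns the malformed string into a number;
# the strict version flags it instead.
loose  <- as.numeric(gsub("[$,]", "", "$123$35,24,,$1$$2,,3.4"))
strict <- suppressWarnings(
  as_dollar_strict(c("$1,135,359.00", "$135359.00", "$123$35,24,,$1$$2,,3.4"))
)
```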
The g in gsub stands for global (meaning replace every '$' and ','
not just the first one) rather than greedy (which has a different
meaning in regular expressions).
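To make the distinction concrete, here is a small illustration using the values from Kevin's message quoted further down. The second half uses a made-up string to show what "greedy" refers to in regular expressions, which is unrelated to the sub/gsub difference.

```r
x <- c("$135,359.00", "$135359.00", "$1,135,359.00")

# sub() replaces only the first match in each string, so one comma
# survives in the third element.
first_only <- sub(",", "", x)

# gsub() replaces every match; the character class handles '$' and ','
# in a single pass.
cost <- as.numeric(gsub("[$,]", "", x))

# "Greedy" is about how much a quantifier consumes, not how many matches
# are replaced. '.*' takes as much as it can; '.*?' (with perl = TRUE)
# takes as little as it can.
greedy <- sub("<.*>",  "", "<a>text<b>")
lazy   <- sub("<.*?>", "", "<a>text<b>", perl = TRUE)
```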
This discussion brings up a related issue that I have thought about
for a while. In the help for read.table in the section on
colClasses it says that you can specify other conversions from
character as long as there is a method for as corresponding to what
you put in.
This suggests to me the approach of writing a conversion function
called something like "as.dollar" then setting
colClasses=c('numeric','dollar','dollar','factor') or something like
that and having the middle 2 columns run through the function.
However my first quick attempt failed (the doc says the method needs
to be in the methods package and my quick attempt with setMethod
created a local copy). There is also the possible problem that this
would create a column with class dollar when I want a simple numeric.
So this brings up 2 questions:
1. has anyone found a way to create a method for as in the methods
package such that my idea above would work? (preferably without much
more work than the post-processing already suggested).
I do get a warning, but it does seem to "work" as intended. I
followed, as best I could, a suggestion made a couple of months ago
by Gabor Grothendieck. Here is a link to an earlier post, followed by
a colClasses method that strips "$" and "," characters:
http://finzi.psych.upenn.edu/Rhelp10/2010-February/229550.html
> Input <- "$245,000,000\n 3,000.000\n $$$34"
> setAs("character", "num.with.commas.dolsign",
+ function(from) as.numeric(gsub(",|\\$", "", from)))
Warning message:
In matchSignature(signature, fdef, where) :
in the method signature for function "coerce" no definition for
class: “num.with.commas.dolsign”
> DF <- read.table(textConnection(Input), header = FALSE,
+ colClasses = c("num.with.commas.dolsign"))
> DF
        V1
1 2.45e+08
2 3.00e+03
3 3.40e+01
> sprintf("%12.2f", DF$V1)
[1] "245000000.00" "     3000.00" "       34.00"
Any help with cleaning up the S4 incantations would be welcome.
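One possible cleanup, offered as a sketch rather than a definitive answer: the matchSignature warning appears to come from the target class having no definition when setAs() is called, so declaring the class name with setClass() first should avoid it. The example below reuses the class name from the transcript above and swaps textConnection() for the text argument of read.table, which assumes a reasonably recent version of R.

```r
library(methods)

# Declare the target class so setAs() can find a definition for it.
# With no representation given, this creates a virtual class that acts
# only as a label for the coercion.
setClass("num.with.commas.dolsign")

setAs("character", "num.with.commas.dolsign",
      function(from) as.numeric(gsub("[$,]", "", from)))

Input <- "$245,000,000\n 3,000.000\n $$$34"
DF <- read.table(text = Input, header = FALSE,
                 colClasses = "num.with.commas.dolsign")

# The coercion function returns a plain numeric vector, so the column
# should come back as numeric rather than as an object of the new class.
str(DF)
```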
--
David.
2. If the answer to 1 above is no, are others interested in this
type of functionality, and should we move the discussion to r-devel
as a feature request?
Even nicer would be a simple way to go from a single character
vector to multiple columns in the data frame. I remember working
with a file once where the 1st 3 columns were comma separated (no
spaces), but everything after that was white space separated. I
read it in as whitespace separated, then had to post-process the 1st
column into 3. But getting all the semantics of 1 to multiple could
be tricky. That particular case could also have been easier if the
sep argument to read.table could be a regular expression, but that
would probably slow things down for the simple cases.
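The post-processing described above might look something like the following. The sample lines and the column names a, b, c are invented for illustration; the idea is to read the file as whitespace separated and then run read.table a second time on the first column alone.

```r
# Made-up lines: first three fields comma separated, the rest
# whitespace separated.
raw <- c("1,2,3 10.5 x",
         "4,5,6 20.25 y")

# Step 1: whitespace-separated read; V1 holds "1,2,3" etc. as character.
df <- read.table(text = raw, header = FALSE, stringsAsFactors = FALSE)

# Step 2: re-parse the first column with a comma separator.
parts <- read.table(text = df$V1, sep = ",", header = FALSE,
                    col.names = c("a", "b", "c"))

# Step 3: put the pieces back together.
result <- cbind(parts, df[-1])

# When no later column can contain a comma, replacing commas with spaces
# before reading gives the same columns in one pass.
one_pass <- read.table(text = gsub(",", " ", raw), header = FALSE,
                       stringsAsFactors = FALSE)
```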
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
project.org] On Behalf Of David Winsemius
Sent: Thursday, May 06, 2010 4:47 AM
To: Wang, Kevin (SYD)
Cc: r-help@r-project.org; Phil Spector
Subject: Re: [R] Converting dollar value (factors) to numeric
On May 5, 2010, at 11:31 PM, Wang, Kevin (SYD) wrote:
Hi Phil and all those who replied,
Thanks heaps! Yes, it worked to a certain extent. However, if I have
the following case:
x <- c("$135,359.00", "$135359.00", "$1,135,359.00")
y <- sub('\\$','',as.character(x))
cost <- as.numeric(sub('\\,','',as.character(y)))
Try gsub, it seems to be more "greedy" :
cost <- as.numeric(gsub('\\,','',as.character(y)))
--
David
Warning message:
NAs introduced by coercion
cost
[1] 135359 135359 NA
Then the third value becomes NA, though I suspect it probably has
more to do with the regular expression (which I'm not sure how to
fix) than with R?
Thanks again for the help!
Cheers
Kev
-----Original Message-----
From: Phil Spector [mailto:spec...@stat.berkeley.edu]
Sent: Wednesday, 5 May 2010 6:14 PM
To: Wang, Kevin (SYD)
Cc: r-help@r-project.org
Subject: Re: [R] Converting dollar value (factors) to numeric
Kev-
The most reliable way to do the conversion is as follows:
x = factor(c('$112.11','$119.15','$121.32'))
as.numeric(sub('\\$','',as.character(x)))
[1] 112.11 119.15 121.32
This way negative quantities and numbers without dollar signs are
handled correctly. There's certainly no need to create a new input
file.
It may be easier to understand as
as.numeric(sub('$','',as.character(x),fixed=TRUE))
which gives the same result.
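A short illustration of why the as.character() step and the escaping (or fixed=TRUE) both matter. The two extra values, a negative amount and a number without a dollar sign, are added here as assumptions to exercise the cases mentioned above.

```r
x <- factor(c("$112.11", "$119.15", "$121.32", "-$5.00", "42"))

# Calling as.numeric() directly on a factor returns the internal integer
# codes, not the values shown in the labels.
codes <- as.numeric(x)

# Escaped regex and fixed = TRUE are equivalent ways to match a literal '$'.
by_regex <- as.numeric(sub("\\$", "", as.character(x)))
by_fixed <- as.numeric(sub("$", "", as.character(x), fixed = TRUE))

# An unescaped '$' in a regular expression is the end-of-string anchor,
# so nothing is removed and the strings that still contain '$' become NA.
unescaped <- suppressWarnings(as.numeric(sub("$", "", as.character(x))))
```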
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spec...@stat.berkeley.edu
On Wed, 5 May 2010, Wang, Kevin (SYD) wrote:
Hi,
I'm trying to read in a bunch of CSV files into R where many columns
are coded like $111.11. When reading them in they are treated as
factors.
I'm wondering if there is an easy way to convert them into numeric in
R (as I don't want to modify the source data)? I've done some
searches and can't seem to find an easy way to do this.
I apologise if this is a trivial question, I haven't been using R for
a while.
Many thanks in advance!
Cheers
Kev
Kevin Wang
Senior Advisor, Health and Human Services Practice Government
Advisory Services
KPMG
10 Shelley Street
Sydney NSW 2000 Australia
Tel +61 2 9335 8282
Fax +61 2 9335 7001
kevinw...@kpmg.com.au
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT