On Oct 20, 2014, at 2:34 PM, Dan Murphy wrote:

> Good ideas, David.
> 
> 1) By "confirm that MS Excel honors that OutDec" I mean that, in a
> location (France? others?) where options("OutDec") is a comma, does MS
> Excel format numbers that way when displaying currencies with decimal
> places? I have no way of knowing if that is true in all OutDec = ","
> locales.
> 
> 2) I wish it were as simple as just removing unwanted "adornments."
> The issue is that such "adornments" must be in their proper places for
> the character string to represent a currency value, or a numeric value
> for that matter. If I add one more comma to your first element in the
> wrong place, it should not translate to a valid numeric, but it does
> with your gsub, which would be a bug if that were in pasteFromExcel:
>> gsub(rmchar, "", c("$1,0,00", "1,200", "800"))
> [1] "1000" "1200" "800"

If you wanted to restrict the substitutions to only the commas that were 
succeeded by three digits then this succeeds:

gsub("(\\,)(\\d{3,3})", "\\2", c("1,000,000,000.00") )
[1] "1000000000.00"

You should also take a look at formatC which has provisions for output using 
commas.

- 
david.


> 
> When I originally looked into this I believed I couldn't be the first
> one asking that question .. and I wasn't. There are many hits for
> regular expressions that purport to successfully identify well-formed
> *US dollar* currency strings. The expression in pasteFromExcel is
> based on 
> http://stackoverflow.com/questions/354044/what-is-the-best-u-s-currency-regex.
> 
> I'm curious if anyone has come across -- and tested -- a similar
> regular expression in other places that might have use for
> pasteFromExcel.
> 
> This is how pasteFromExcel uses its currency regular expression (the
> first ugly assignment is what I'm looking for in other locales around
> the world -- maybe there's a Regular Expression mailing list out
> there):
> 
> currencypattern <-
> "^\\$?\\-?([1-9]{1}[0-9]{0,2}(\\,\\d{3})*(\\.\\d{0,2})?|[1-9]{1}\\d{0,}(\\.\\d{0,2})?|0(\\.\\d{0,2})?|(\\.\\d{1,2}))$|^\\-?\\$?([1-9]{1}\\d{0,2}(\\,\\d{3})*(\\.\\d{0,2})?|[1-9]{1}\\d{0,}(\\.\\d{0,2})?|0(\\.\\d{0,2})?|(\\.\\d{1,2}))$|^\\$?\\(([1-9]{1}\\d{0,2}(\\,\\d{3})*(\\.\\d{0,2})?|[1-9]{1}\\d{0,}(\\.\\d{0,2})?|0(\\.\\d{0,2})?|(\\.\\d{1,2}))\\)$"
> 
> # Here's a test vector
> x <- c("1,234.00", "12,34.00", "$1,000", "(124)", "$(123)", "($123)",
> "  1,000   ", "NA")
> 
> # grep will tell you whether elements of x, trimmed of
> beginning/ending whitespace, match the currencypattern
> grep(currencypattern, trim(x))
> [1] 1 3 4 5 7  # correct answer
> 
> *Now* one may remove unwanted characters from the well-formed strings.
> And deal with the "negatives" of course .. and NAs. See how that's
> done in excelRio.r in the excelRio package on github:
> https://github.com/trinostics/excelRio
> 
> Thanks for your interest.
> 
> 
> On Mon, Oct 20, 2014 at 10:56 AM, David Winsemius
> <dwinsem...@comcast.net> wrote:
>> 
>> On Oct 20, 2014, at 10:29 AM, Dan Murphy wrote:
>> 
>>> Nice.
>>> So if someone were to offer a currency regular expression that works
>>> in their locale, I should also ask them to give me the results of
>>> Sys.getlocale("LC_MONETARY")
>>> and
>>> options("OutDec")
>>> and confirm that MS Excel honors that OutDec.
>> 
>> I'm not sure we can know what you mean by "confirm that MS Excel honors that 
>> OutDec." The result of options("OutDec") was intended for you to determine 
>> what character not to remove from a monetary value in an R workspace. If the 
>> assumption is that all values will be in the same unit and that the user is 
>> not doing any currency conversions then:
>> 
>>> decsep <- options("OutDec")
>>> rmchar <- paste0( "[$£€", c(".", ",")[!c(".", ",") %in% decsep], "]" )
>>> gsub(rmchar, "", c("$1,000", "1,200", "800"))
>> [1] "1000" "1200" "800"
>> 
>> 
>>> Thank you, David.
>>> -Dan
>>> 
>>> On Mon, Oct 20, 2014 at 10:04 AM, David Winsemius
>>> <dwinsem...@comcast.net> wrote:
>>>> 
>>>> On Oct 19, 2014, at 11:18 PM, Dan Murphy wrote:
>>>> 
>>>>> To Users of Excel:
>>>>> 
>>>>> Following advice from Brian and Markus, I created an RMarkdown "vignette"
>>>>> that shows an example of how the pasteFromExcel function in the excelRio
>>>>> package on github could be used by an actuary to transfer a triangle from
>>>>> Excel to R. See today's post at http://trinostics.blogspot.com/
>>>>> 
>>>>> Unfortunately, if you are located outside the US, the demonstrated
>>>>> functionality will not work for you because the currency regex implemented
>>>>> assumes the dollar sign ($) and comma/decimal punctuation of the form
>>>>> 999,999.00.
>>>>> 
>>>>> If anyone is interested in contributing currency regex expressions that
>>>>> work in your locale, I would be happy to try to incorporate them in the
>>>>> package. If anyone knows how best to determine the user's locale (might
>>>>> "timezone" suffice?), I'd appreciate that help too.
>>>>> 
>>>> 
>>>> ?Sys.getlocale   # perhaps "LC_MONETARY"
>>>> 
>>>> ?options   # look for OutDec
>>>> 
>>>> 
>>>>>     [[alternative HTML version deleted]]
>>>> 
>>>> 
>>>> 
>>>> David Winsemius
>>>> Alameda, CA, USA
>>>> 
>> 
>> David Winsemius
>> Alameda, CA, USA
>> 

David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to