On Oct 20, 2014, at 2:34 PM, Dan Murphy wrote: > Good ideas, David. > > 1) By "confirm that MS Excel honors that OutDec" I mean that, in a > location (France? others?) where options("OutDec") is a comma, does MS > Excel format numbers that way when displaying currencies with decimal > places? I have no way of knowing if that is true in all OutDec = "," > locales. > > 2) I wish it were as simple as just removing unwanted "adornments." > The issue is that such "adornments" must be in their proper places for > the character string to represent a currency value, or a numeric value > for that matter. If I add one more comma to your first element in the > wrong place, it should not translate to a valid numeric, but it does > with your gsub, which would be a bug if that were in pasteFromExcel: >> gsub(rmchar, "", c("$1,0,00", "1,200", "800")) > [1] "1000" "1200" "800"
If you wanted to restrict the substitutions to only the commas that were succeeded by three digits then this succeeds: gsub("(\\,)(\\d{3,3})", "\\2", c("1,000,000,000.00") ) [1] "1000000000.00" You should also take a look at formatC which has provisions for output using commas. - david. > > When I originally looked into this I believed I couldn't be the first > one asking that question .. and I wasn't. There are many hits for > regular expressions that purport to successfully identify well-formed > *US dollar* currency strings. The expression in pasteFromExcel is > based on > http://stackoverflow.com/questions/354044/what-is-the-best-u-s-currency-regex. > > I'm curious if anyone has come across -- and tested -- a similar > regular expression in other places that might have use for > pasteFromExcel. > > This is how pasteFromExcel uses its currency regular expression (the > first ugly assignment is what I'm looking for in other locales around > the world -- maybe there's a Regular Expression mailing list out > there): > > currencypattern <- > "^\\$?\\-?([1-9]{1}[0-9]{0,2}(\\,\\d{3})*(\\.\\d{0,2})?|[1-9]{1}\\d{0,}(\\.\\d{0,2})?|0(\\.\\d{0,2})?|(\\.\\d{1,2}))$|^\\-?\\$?([1-9]{1}\\d{0,2}(\\,\\d{3})*(\\.\\d{0,2})?|[1-9]{1}\\d{0,}(\\.\\d{0,2})?|0(\\.\\d{0,2})?|(\\.\\d{1,2}))$|^\\$?\\(([1-9]{1}\\d{0,2}(\\,\\d{3})*(\\.\\d{0,2})?|[1-9]{1}\\d{0,}(\\.\\d{0,2})?|0(\\.\\d{0,2})?|(\\.\\d{1,2}))\\)$" > > # Here's a test vector > x <- c("1,234.00", "12,34.00", "$1,000", "(124)", "$(123)", "($123)", > " 1,000 ", "NA") > > # grep will tell you whether elements of x, trimmed of > beginning/ending whitespace, match the currencypattern > grep(currencypattern, trim(x)) > [1] 1 3 4 5 7 # correct answer > > *Now* one may remove unwanted characters from the well-formed strings. > And deal with the "negatives" of course .. and NAs. See how that's > done in excelRio.r in the excelRio package on github: > https://github.com/trinostics/excelRio > > Thanks for your interest. > > > On Mon, Oct 20, 2014 at 10:56 AM, David Winsemius > <dwinsem...@comcast.net> wrote: >> >> On Oct 20, 2014, at 10:29 AM, Dan Murphy wrote: >> >>> Nice. >>> So if someone were to offer a currency regular expression that works >>> in their locale, I should also ask them to give me the results of >>> Sys.getlocale("LC_MONETARY") >>> and >>> options("OutDec") >>> and confirm that MS Excel honors that OutDec. >> >> I'm not sure we can know what you mean by "confirm that MS Excel honors that >> OutDec." The result of options("OutDec") was intended for you to determine >> what character not to remove from a monetary value in an R workspace. If the >> assumption is that all values will be in the same unit and that the user is >> not doing any currency conversions then: >> >>> decsep <- options("OutDec") >>> rmchar <- paste0( "[$£€", c(".", ",")[!c(".", ",") %in% decsep], "]" ) >>> gsub(rmchar, "", c("$1,000", "1,200", "800")) >> [1] "1000" "1200" "800" >> >> >>> Thank you, David. >>> -Dan >>> >>> On Mon, Oct 20, 2014 at 10:04 AM, David Winsemius >>> <dwinsem...@comcast.net> wrote: >>>> >>>> On Oct 19, 2014, at 11:18 PM, Dan Murphy wrote: >>>> >>>>> To Users of Excel: >>>>> >>>>> Following advice from Brian and Markus, I created an RMarkdown "vignette" >>>>> that shows an example of how the pasteFromExcel function in the excelRio >>>>> package on github could be used by an actuary to transfer a triangle from >>>>> Excel to R. See today's post at http://trinostics.blogspot.com/ >>>>> >>>>> Unfortunately, if you are located outside the US, the demonstrated >>>>> functionality will not work for you because the currency regex implemented >>>>> assumes the dollar sign ($) and comma/decimal punctuation of the form >>>>> 999,999.00. >>>>> >>>>> If anyone is interested in contributing currency regex expressions that >>>>> work in your locale, I would be happy to try to incorporate them in the >>>>> package. If anyone knows how best to determine the user's locale (might >>>>> "timezone" suffice?), I'd appreciate that help too. >>>>> >>>> >>>> ?Sys.getlocale # perhaps "LC_MONETARY" >>>> >>>> ?options # look for OutDec >>>> >>>> >>>>> [[alternative HTML version deleted]] >>>> >>>> >>>> >>>> David Winsemius >>>> Alameda, CA, USA >>>> >> >> David Winsemius >> Alameda, CA, USA >> David Winsemius Alameda, CA, USA ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.