Hi Sarah,

this is a neat solution. Thanks very much for your help, and your patience with my poorly posed questions. I've learned a lot from your approach.

best regards,
Aidan

On Wed, Dec 7, 2011 at 1:40 PM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
> Hi,
>
> If you really wanted precision (significant figures) rather than decimal places,
> it would be easy: format() handles that, I believe.
>
> Your original email said you'd been reading about regular expressions; continuing
> that reading will lead you to the meaning of the cryptic ^ and all the \.
>
> As for the final ., you're right: I didn't think about having nothing following the
> decimal place. It's much easier to do in two steps:
>
>> testdata <- data.frame(values=c("10,000.0", "5.321", "1.1"), digits=c(0, 1, 2))
>> intermediate <- apply(testdata, 1, function(x)sub(paste("(^.*\\.\\d{", x[2], "})(\\d*)", sep=""), "\\1", x[1]))
>> intermediate
> [1] "10,000." "5.3" "1.1"
>> sub("\\.$", "", intermediate)
> [1] "10,000" "5.3" "1.1"
>
> Sarah
>
> On Wed, Dec 7, 2011 at 8:20 AM, Aidan Corcoran
> <aidan.corcora...@gmail.com> wrote:
>> Hi Sarah,
>>
>> apologies for the excess.
>> A smaller example:
>>
>> f<-structure(list(c("GDP per capita (LCU)", "Ratio to EZ GDP Per Cap"
>> ), `2005` = c(32128, 0.1), `2009` = c(52163, 0.1), `2010` = c(63100,
>> 0.1), `2011` = c(72461, 0.1), `2012` = c(81313, 0.1)), .Names = c("",
>> "2005", "2009", "2010", "2011", "2012"), row.names = 3:4, class =
>> c("cast_df", "data.frame"))
>>
>> nam2<-
>> structure(list(var1 = c("GDP per capita (LCU)", "Ratio to EZ GDP Per Cap"
>> ), digi = c(0, 1)), .Names = c("var1", "digi"), row.names = c("98",
>> "110"), class = "data.frame")
>>
>> I'm trying to place a thousand separator in the numbers in the table f:
>>
>>> f
>>                              2005    2009    2010    2011    2012
>> 3    GDP per capita (LCU) 32128.0 52163.0 63100.0 72461.0 81313.0
>> 4 Ratio to EZ GDP Per Cap     0.1     0.1     0.1     0.1     0.1
>>
>> and also have precision given by variable digi:
>>
>>> nam2
>>                        var1 digi
>> 98     GDP per capita (LCU)    0
>> 110 Ratio to EZ GDP Per Cap    1
>>
>> format
>> hi<-format(f,big.mark=",",scientific=F)
>> gives me the comma, but now I'm not sure how to get the precision.
>>
>> Your answer seems to be doing what I want, although when I changed the
>> testdata slightly
>>> testdata[1,1]<-10000
>>> hi<-format(testdata,big.mark=",",scientific=F)
>>> hi
>>     values digits
>> 1 10,000.0      0
>> 2      5.3      1
>> 3      1.1      2
>>> apply(hi, 1, function(x)sub(paste("(^.*\\.\\d{", x[2], "})(\\d*)", sep=""), "\\1", x[1]))
>>         1      2      3
>> "10,000." " 5.3" " 1.1"
>> The decimal appears to be left behind in 10,000.
>>
>> Unfortunately your approach is a bit too advanced for me, so I can't
>> adapt it. Perhaps you could recommend somewhere where I could read up
>> on what the caret and other symbols mean in your paste call?
>>
>> thanks for your help!
>>
>> Aidan
>>
>> On Wed, Dec 7, 2011 at 12:05 PM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
>>> Hi,
>>>
>>> Example data is crucial, but small simple example data is even better.
>>> I'm too lazy to figure out which bits I need from your data, so here's
>>> a simple example of one way to approach your question. You could
>>> use gsub() in very much the same manner if you need more complex
>>> output.
>>>
>>>> testdata <- data.frame(values=c(2.0, 5.3, 1.1), digits=c(0, 1, 2))
>>>> testdata
>>>   values digits
>>> 1    2.0      0
>>> 2    5.3      1
>>> 3    1.1      2
>>> # a nice way that works on numbers
>>>> apply(testdata, 1, function(x)sprintf(paste("%0.", x[2], "f", sep=""), x[1]))
>>> [1] "2" "5.3" "1.10"
>>>
>>> # a messy way that works on strings
>>>> apply(testdata, 1, function(x)sub(paste("(^.*\\.\\d{", x[2], "})(\\d*)", sep=""), "\\1", x[1]))
>>> [1] "2" "5.3" "1.1"
>>>
>>> Also note that the second method will not add zeros to pad out the
>>> end. If you need that, I'd consider rearranging the order of your
>>> steps so that you can use sprintf().
>>>
>>> Someone else might have a more flexible way too; I'd be interested to see it.
>>> Unfortunately I don't think sprintf() has a way to insert a thousands separator,
>>> or that would be a one-step solution.
>>>
>>> Sarah
>>>
>>> On Wed, Dec 7, 2011 at 6:05 AM, Aidan Corcoran
>>> <aidan.corcora...@gmail.com> wrote:
>>>> Dear all,
>>>>
>>>> I'm trying to remove some text after the period (a decimal point) in
>>>> the data frame 'hi', below. This is one step in formatting a table. So
>>>> I would like e.g.
>>>> "2.0" to become "2"
>>>> and "5.3" to be "5.3",
>>>> where the variable digordered contains the number of digits after the
>>>> decimal that I would like to display, in the same order in which the
>>>> variables appear in hi. If it makes it easier to use, this info is
>>>> also contained in the dataframe nam2. The reason the numbers are
>>>> recorded as characters is because I used format to get a thousand
>>>> separator, which I also need.
>>>>
>>>> The string manipulation functions in R generally don't seem to work
>>>> with matrices or data frames, so e.g.
>>>> regexpr("\\.", hi[1,2]) works
>>>> but not regexpr("\\.", hi). Finding the location of the period and
>>>> then using substring was the approach I was thinking of taking, but
>>>> this would seem to need for loops here. I was wondering if anyone
>>>> knows any easier ways.
>>>>
>>>> Thanks very much for any help!
>>>>
>>>> Aidan
>>>>
>>>>
>>>> digordered<- c(0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1)
>>>> f<-structure(list(c("GDP (LCU,bn)", "GDP ($, bn)", "GDP per capita (LCU)",
>>>> "Ratio to EZ GDP Per Cap", "Share of World GDP (Intl $, %)",
>>>> "Real GDP Growth (%)", "Population (mn)", "Unemployment Rate (%)",
>>>> "Ratio of Employed/Unemployed", "PPP Exchange Rate", "Nominal Exchange Rate (LCU per $)",
>>>> "Inflation (%)", "Main Lending Rate to Private Sector (%)", "Claims on Central Gov",
>>>> "Claims on Private Sector", "Bank Assets", "Regulator Capital to RWA",
>>>> "Tier 1 Capital to RWA", "Return on Equity", "Liquid Assets to ST Liabilities"
>>>> ), `2005` = c(35662, 809, 32128, 0.1, 4.3, 9, 1110, 3.5, NA,
>>>> 14.7, 44.1, 4, 10.8, 7, 15, 22835, NA, NA, NA, NA), `2009` = c(61240,
>>>> 1265, 52163, 0.1, 5.2, 6.8, 1174, NA, NA, 16.8, 48.4, 10.9, 12.2,
>>>> 14, 31, 47180, 13.6, 9, 10.8, 42.8), `2010` = c(75122, 1632,
>>>> 63100, 0.1, 5.5, 10.1, 1191, NA, NA, 18.5, 45.7, 12, NA, 15,
>>>> 39, 56787, 14.7, 9.9, 10.5, 41.1), `2011` = c(87455, 1843, 72461,
>>>> 0.1, 5.7, 7.8, 1207, NA, NA, 19.6, NA, 10.6, NA, NA, NA, NA,
>>>> 13.5, 9.3, 14.3, 35.8), `2012` = c(99459, 2013, 81313, 0.1, 5.9,
>>>> 7.5, 1223, NA, NA, 20.5, NA, 8.6, NA, NA, NA, NA, NA, NA, NA,
>>>> NA)), .Names = c("", "2005", "2009", "2010", "2011", "2012"), row.names = c(NA,
>>>> 20L), class = c("cast_df", "data.frame"))
>>>>
>>>> hi<-format(f,big.mark=",",scientific=F)
>>>> regexpr("\\.", hi) #don't know how to get location of "."
>>>> # in a dataframe of chars
>>>>
>>>>
>>>> nam2<- structure(list(var1 = c("GDP (LCU,bn)", "GDP ($, bn)", "GDP per capita (LCU)",
>>>> "Ratio to EZ GDP Per Cap", "GDP per capita (Intl $)", "EU GDP per capita (Intl $)",
>>>> "Share of World GDP (Intl $, %)", "Real GDP Growth (%)", "Population (mn)",
>>>> "Unemployment Rate (%)", "Ratio of Employed/Unemployed", "Employment (1000s)",
>>>> "Unemployment (1000s)", "PPP Exchange Rate", "Nominal Exchange Rate (LCU per $)",
>>>> "Inflation (%)", "Main Lending Rate to Private Sector (%)", "Claims on Central Gov",
>>>> "Claims on Private Sector", "Bank Assets", "Regulator Capital to RWA",
>>>> "Tier 1 Capital to RWA", "Return on Equity", "Liquid Assets to ST Liabilities",
>>>> "Reserves"), digi = c(0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0,
>>>> 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0)), .Names = c("var1", "digi"
>>>> ), row.names = c("96", "97", "98", "110", "99", "100", "101",
>>>> "102", "103", "111", "112", "104", "105", "106", "107", "108",
>>>> "109", "114", "115", "113", "119", "120", "121", "122", "116"
>>>> ), class = "data.frame")
>>>>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
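
A possible one-step alternative to the two-step sub() approach discussed in this thread, offered only as a sketch: formatC() takes both a digits argument and a big.mark argument, so mapply() can give each value its own number of decimal places along with the thousands separator. The data frame below is a made-up stand-in for the thread's testdata (with numeric values rather than strings), and fmt_one is a hypothetical helper name that does not appear in the original posts.

```r
# Hypothetical stand-in data: numeric values plus the decimals wanted for each
testdata <- data.frame(values = c(10000, 5.321, 1.1), digits = c(0, 1, 2))

# fmt_one is an illustrative helper (not from the thread): fixed-point
# formatting with a thousands separator and d digits after the decimal point
fmt_one <- function(x, d) formatC(x, format = "f", digits = d, big.mark = ",")

# Apply the helper pairwise over the values and their digit counts
out <- mapply(fmt_one, testdata$values, testdata$digits, USE.NAMES = FALSE)

# out is c("10,000", "5.3", "1.10")
```

Unlike the sub() route, this pads with trailing zeros (1.1 with two digits becomes "1.10"), and it needs the numeric values rather than the already formatted strings, so it would take the place of the earlier format() call instead of following it.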