Hi,

Try using trim=TRUE (see ?format).

options(digits=4)
df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], trim=TRUE, scientific = FALSE))
df2$id2[99990:100010]
# [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
# [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
#[17] "100006" "100007" "100008" "100009" "100010"
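As a small illustration of what trim does (a sketch on a made-up three-element vector `ids`, not the thread's data): per ?format, numeric values are right-justified to a common width unless trim = TRUE, in which case the leading blanks are suppressed.

```r
# Made-up stand-in for a few values of the id column (an assumption).
ids <- c(99994L, 99995L, 100000L)

# Default (trim = FALSE): values are right-justified to a common width.
padded <- format(ids, scientific = FALSE)

# trim = TRUE: the leading blanks used for justification are suppressed.
trimmed <- format(ids, scientific = FALSE, trim = TRUE)

print(padded)
print(trimmed)

# Stripping the blanks afterwards gives the same strings.
stopifnot(identical(trimws(padded), trimmed))
```
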
id2 <- format(1:110000, scientific = FALSE, trim=TRUE)
id2[99990:100010]
# [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
# [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
#[17] "100006" "100007" "100008" "100009" "100010"

A.K.


----- Original Message -----
From: Mathieu Basille <basille....@ase-research.org>
To: David Winsemius <dwinsem...@comcast.net>
Cc: r-help@r-project.org
Sent: Tuesday, July 30, 2013 2:07 PM
Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'

Thanks David for your interest. I have to admit that your answer puzzles me
even more than before. It seems that the underlying problem is way beyond my
R skills...

The generation of id2 is indeed quite demanding, especially compared to a
simple 'as.character' call.

Anyway, since it seems to be system specific, here is the sessionInfo() that
I forgot to attach to my first message:

R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

In brief: last stable R available under Debian Testing... Hopefully this can
help tracking down the problem.

Mathieu.


On 07/30/2013 01:58 PM, David Winsemius wrote:
>
> On Jul 30, 2013, at 9:01 AM, Mathieu Basille wrote:
>
>> Dear list,
>>
>> Here is a simple example in which the behaviour of 'format' does not make
>> sense to me. I have read the documentation and searched the archives, but
>> nothing pointed me in the right direction to understand this behaviour.
>> Let's start with a simple data frame:
>>
>> df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
>>
>> Let's now create a new variable 'id2' which is the character representation
>> of 'id'. Note that I use 'scientific = FALSE' to ensure that long numbers
>> such as 100,000 are not formatted using their scientific representation (in
>> this case 1e+05):
>>
>> df1$id2 <- apply(df1, 1, function(dfi) format(dfi["id"], scientific = FALSE))
>>
>> Let's have a look at part of the result:
>>
>> df1$id2[99990:100010]
>>  [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"
>>  [8] "99997"  "99998"  "99999"  "100000" "100001" "100002" "100003"
>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
>
> Some formatting processes are carried out by system functions. In this case I
> am unable to reproduce with the same code on a Mac OS 10.7.5/R 3.0.1 Patched
>
>> df1$id2[99990:100010]
>  [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
>  [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
> [17] "100006" "100007" "100008" "100009" "100010"
>
> (I did notice that generation of the id2 variable seemed to take an
> inordinately long time.)
>
> -- David.
>>
>> So far, so good. Let's now play with the 'digits' option:
>>
>> options(digits = 4)
>> df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
>> df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], scientific = FALSE))
>> df2$id2[99990:100010]
>>  [1] "99990"  "99991"  "99992"  "99993"  "99994"  " 99995" " 99996"
>>  [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
>>
>> Notice the extra leading space from 99995 to 99999?
>> To make sure it only happened there:
>>
>> df2$id2[which(df1$id2 != df2$id2)]
>> [1] " 99995" " 99996" " 99997" " 99998" " 99999"
>>
>> And just to make sure it only occurs in an 'apply' call, here is the same
>> directly on a numeric vector:
>>
>> id2 <- format(1:110000, scientific = FALSE)
>> id2[99990:100010]
>>  [1] " 99990" " 99991" " 99992" " 99993" " 99994" " 99995" " 99996"
>>  [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
>>
>> Here the leading spaces are for every number, which makes sense to me. Is
>> there anything I'm misinterpreting in the behaviour of 'format'?
>>
>> Thanks in advance for any hint,
>> Mathieu.
>>
>>
>> PS: Some background for this question. It all comes from a Rmd document,
>> that knitr consistently failed to process, while the R code was fine using
>> batch or interactive R. knitr uses 'options(digits = 4)' as opposed to
>> 'options(digits = 7)' by default in R, which made one of my functions throw
>> an error with knitr, but not with batch or interactive R. I managed to solve
>> the problem using 'trim = TRUE' in 'format', but I still do not understand
>> what's going on...
>>
>> If you're interested, see here for more details on the original problem:
>> http://stackoverflow.com/questions/17866230/knitr-vs-interactive-r-behaviour/17872176
>>
>>
>> --
>>
>> ~$ whoami
>> Mathieu Basille, PhD
>>
>> ~$ locate --details
>> University of Florida \\
>> Fort Lauderdale Research and Education Center
>> (+1) 954-577-6314
>> http://ase-research.org/basille
>>
>> ~$ fortune
>> « Le tout est de tout dire, et je manque de mots
>> Et je manque de temps, et je manque d'audace.
>> »
>> -- Paul Éluard
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
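For anyone landing on this thread later, here is a minimal sketch of the two things that seem to interact. It assumes a toy three-row data frame `df` in place of df2 (my own stand-in, not from the posts). apply() converts a data frame with as.matrix() before looping, so the integer id column reaches format() as a double, one value at a time. My reading, which nobody in the thread confirms, is that with digits = 4 a value such as 99995 is sized as if it were rounded to 4 significant digits (100000, six characters) but printed in full, hence the single blank. sprintf() and formatC() are shown only as alternatives that do not consult 'digits'.

```r
# Toy data frame standing in for df2 from the thread (values are assumptions).
df <- data.frame(x = c(0.5, 1.5, 2.5), id = 99994:99996)

# apply() converts a data frame with as.matrix() first; double and integer
# columns together give a double matrix, so each row is a double vector.
m <- as.matrix(df)
stopifnot(is.integer(df$id), is.double(m))

old <- options(digits = 4)   # the knitr setting mentioned in the thread

# Each row is formatted on its own, so the width is decided per value.
# The thread reports a leading blank for 99995 to 99999 in this situation.
row_wise <- apply(df, 1, function(r) format(r["id"], scientific = FALSE))

# trim = TRUE drops any padding, as in the fix suggested above.
row_trim <- apply(df, 1, function(r) format(r["id"], scientific = FALSE, trim = TRUE))

options(old)                 # restore the previous setting

# Vectorised alternatives that skip apply() and do not consult 'digits'.
ids_sprintf <- sprintf("%d", df$id)
ids_formatC <- formatC(df$id, format = "d")

print(row_wise)
print(row_trim)
print(ids_sprintf)
```

The vectorised calls also avoid the row-by-row loop, which may be part of why generating id2 with apply() felt slow on 110000 rows.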