Dear list,

Here is a simple example in which the behaviour of 'format' does not make sense to me. I have read the documentation and searched the archives, but nothing pointed me in the right direction to understand this behaviour. Let's start with a simple data frame:

df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)

Let's now create a new variable 'id2' which is the character representation of 'id'. Note that I use 'scientific = FALSE' to ensure that long numbers such as 100,000 are not formatted using their scientific representation (in this case 1e+05):

df1$id2 <- apply(df1, 1, function(dfi) format(dfi["id"], scientific = FALSE))

Let's have a look at part of the result:

df1$id2[99990:100010]
 [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"
 [8] "99997"  "99998"  "99999"  "100000" "100001" "100002" "100003"
[15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"

So far, so good. Let's now play with the 'digits' option:

options(digits = 4)
df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], scientific = FALSE))
df2$id2[99990:100010]
 [1] "99990"  "99991"  "99992"  "99993"  "99994"  " 99995" " 99996"
 [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
[15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"

Notice the extra leading space from 99995 to 99999? To make sure it only happened there:

df2$id2[which(df1$id2 != df2$id2)]
[1] " 99995" " 99996" " 99997" " 99998" " 99999"

And just to make sure it only occurs in a 'apply' call, here is the same directly on a numeric vector:

id2 <- format(1:110000, scientific = FALSE)
id2[99990:100010]
 [1] " 99990" " 99991" " 99992" " 99993" " 99994" " 99995" " 99996"
 [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
[15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"

Here the leading spaces are for every number, which makes sense to me. Is there anything I'm misinterpreting in the behaviour of 'format'?
Thanks in advance for any hint,
Mathieu.


PS: Some background for this question. It all comes from a Rmd document, that knitr consistently failed to process, while the R code was fine using batch or interactive R. knitr uses 'options(digits = 4)' as opposed to 'options(digits = 7)' by default in R, which made one of my function throw an error with knitr, but not with batch or interactive R. I managed to solve the problem using 'trim = TRUE' in 'format', but I still do not understand what's going on... If you're interested, see here for more details on the original problem: http://stackoverflow.com/questions/17866230/knitr-vs-interactive-r-behaviour/17872176


--

~$ whoami
Mathieu Basille, PhD

~$ locate --details
University of Florida \\
Fort Lauderdale Research and Education Center
(+1) 954-577-6314
http://ase-research.org/basille

~$ fortune
« Le tout est de tout dire, et je manque de mots
Et je manque de temps, et je manque d'audace. »
 -- Paul Éluard

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to