Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'

Mathieu Basille Thu, 01 Aug 2013 10:43:05 -0700

Nicely spotted, Bill! You went much farther than I could have. We can basically summarize the problem with the following simple example:


> format(9994, digits = 3)
[1] "9994"
> format(9995, digits = 3)
[1] " 9995"

I'm still not sure why this is happening, though: the 'digits' parameter is used to guess the number of characters of the output, but not to format the actual number (i.e. all digits are still there anyway)? Is this case a bug or a feature? And if the latter, is it documented anywhere? I couldn't see any hint of it in ?format or ?options... The use of 'trim = TRUE' to fix the problem seems to me like a workaround, not a real solution...
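
A minimal sketch of the symptom and of the 'trim = TRUE' workaround discussed above, using only base R. The leading space was reported in this thread for R 3.0.1; whether it appears may depend on the R version, so no particular padding is assumed here. The variable names are illustrative.

```r
# Two values on either side of the point where rounding to 3 significant
# digits would reach the next power of 10 (9995 -> 1e+04).
x <- c(9994, 9995)

# Format each value on its own, as happens inside apply() or vapply().
padded  <- vapply(x, function(v) format(v, digits = 3), "")
trimmed <- vapply(x, function(v) format(v, digits = 3, trim = TRUE), "")

# Compare string widths: any difference is padding added by format().
data.frame(x = x, padded = nchar(padded), trimmed = nchar(trimmed))
```

If the two width columns differ for a row, the extra characters are padding only; the digits themselves are the same in both versions.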


Lastly, should I report this somewhere else?

Thanks for your comment,
Mathieu.


On 08/01/2013 12:36 PM, William Dunlap wrote:

I see the problem on both Linux and Windows, R-3.0.1.
   > vapply(as.numeric(9994:9995), function(x)format(x, scientific=FALSE, digits=3), "")
   [1] "9994"  " 9995"
   > vapply(as.numeric(99994:99995), function(x)format(x, scientific=FALSE, digits=4), "")
   [1] "99994"  " 99995"
   > vapply(as.numeric(999994:999995), function(x)format(x, scientific=FALSE, digits=5), "")
   [1] "999994"  " 999995"

The ones with the initial space are the ones that would round up to the next
power of 10 when rounded to the requested number of significant digits:
   > x <- as.numeric(1:5e5)
   > z <- vapply(x, function(x)format(x, scientific=FALSE, digits=3), "")
   > i <- grep(" ", z)
   > z[i]
    [1] " 9995"  " 9996"  " 9997"  " 9998"  " 9999"  " 99950" " 99951" " 99952"
    [9] " 99953" " 99954" " 99955" " 99956" " 99957" " 99958" " 99959" " 99960"
   [17] " 99961" " 99962" " 99963" " 99964" " 99965" " 99966" " 99967" " 99968"
   [25] " 99969" " 99970" " 99971" " 99972" " 99973" " 99974" " 99975" " 99976"
   [33] " 99977" " 99978" " 99979" " 99980" " 99981" " 99982" " 99983" " 99984"
   [41] " 99985" " 99986" " 99987" " 99988" " 99989" " 99990" " 99991" " 99992"
   [49] " 99993" " 99994" " 99995" " 99996" " 99997" " 99998" " 99999"
   > print(x[i], digits=3)
    [1] 1e+04 1e+04 1e+04 1e+04 1e+04 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
   [13] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
   [25] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
   [37] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
   [49] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
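
The rule described above can be checked without calling format() at all. The sketch below only illustrates that rule and is not R's internal algorithm: it uses signif() to round to the requested number of significant digits and flags the values whose rounded form needs more characters than the original. The helper name widens_when_rounded is hypothetical.

```r
# Flag values whose rounding to `digits` significant digits reaches the
# next power of 10, i.e. the rounded number is one character longer.
widens_when_rounded <- function(x, digits) {
  rounded <- signif(x, digits)
  nchar(sprintf("%.0f", rounded)) > nchar(sprintf("%.0f", x))
}

x <- c(9994, 9995, 99949, 99950)
flagged <- widens_when_rounded(x, digits = 3)
x[flagged]
```

For digits = 3 this picks out 9995 and 99950, which are the first entries of the two runs listed in z[i] above.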

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Mathieu Basille
Sent: Thursday, August 01, 2013 8:31 AM
To: R help
Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'

This problem does not seem to be widespread, but it affects at least two
users (both on Linux, maybe a hint here?). To me, it looks like a bug
(whether it is an R bug or an OS-related bug, I don't know). Should I forward
it to R-devel, or some other place where R gurus may have a chance to look at it?

Mathieu.

On 07/30/2013 02:34 PM, arun wrote:

Hi Mathieu,
Yes, the original problem occurs on my system too. I am using R 3.0.1 on Linux
Mint 15. I guess the default case would be trim=FALSE, but it still looks very
strange, especially in apply(), as it starts from " 99995" onwards.


sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
   [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
   [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
   [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
   [7] LC_PAPER=C                 LC_NAME=C
   [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] stringr_0.6.2  reshape2_1.2.2

loaded via a namespace (and not attached):
[1] plyr_1.8    tools_3.0.1








----- Original Message -----
From: Mathieu Basille <[email protected]>
To: arun <[email protected]>
Cc: R help <[email protected]>
Sent: Tuesday, July 30, 2013 2:29 PM
Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'

Thanks Arun for your answer. 'trim = TRUE' does indeed solve the symptoms
of the problem, and this is the solution I'm currently using. However, it
does not help to understand what the problem is, and what is the cause of it.

Can you confirm that the original problem also occurs on your computer (and
what is your OS)? It would be interesting since David is not able to
reproduce the problem with Mac OS X.
Mathieu.


On 07/30/2013 02:15 PM, arun wrote:

Hi,
Try using trim=TRUE in format():
options(digits=4)

df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], trim=TRUE, scientific = FALSE))
df2$id2[99990:100010]
# [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
# [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
#[17] "100006" "100007" "100008" "100009" "100010"


id2 <- format(1:110000, scientific = FALSE, trim=TRUE)
id2[99990:100010]
# [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
# [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
#[17] "100006" "100007" "100008" "100009" "100010"
A.K.


----- Original Message -----
From: Mathieu Basille <[email protected]>
To: David Winsemius <[email protected]>
Cc: [email protected]
Sent: Tuesday, July 30, 2013 2:07 PM
Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'

Thanks David for your interest. I have to admit that your answer puzzles me
even more than before. It seems that the underlying problem is way beyond
my R skills...

The generation of id2 is indeed quite demanding, especially compared to a
simple 'as.character' call. Anyway, since it seems to be system specific,
here is the sessionInfo() that I forgot to attach to my first message:

R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
      [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
      [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
      [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
      [7] LC_PAPER=C                 LC_NAME=C
      [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

In brief: the latest stable R available under Debian Testing... Hopefully this
can help track down the problem.
Mathieu.


On 07/30/2013 01:58 PM, David Winsemius wrote:


On Jul 30, 2013, at 9:01 AM, Mathieu Basille wrote:

Dear list,

Here is a simple example in which the behaviour of 'format' does not make sense
to me. I have read the documentation and searched the archives, but nothing
pointed me in the right direction to understand this behaviour. Let's start
with a simple data frame:


df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)

Let's now create a new variable 'id2' which is the character representation of
'id'. Note that I use 'scientific = FALSE' to ensure that long numbers such as
100,000 are not formatted using their scientific representation (in this case
1e+05):


df1$id2 <- apply(df1, 1, function(dfi) format(dfi["id"], scientific = FALSE))

Let's have a look at part of the result:

df1$id2[99990:100010]
[1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"
[8] "99997"  "99998"  "99999"  "100000" "100001" "100002" "100003"
[15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"


Some formatting processes are carried out by system functions. In this case I am
unable to reproduce with the same code on Mac OS 10.7.5/R 3.0.1 Patched:

df1$id2[99990:100010]

 [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
 [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
[17] "100006" "100007" "100008" "100009" "100010"

(I did notice that generation of the id2 variable seemed to take an
inordinately long time.)


-- David.


So far, so good. Let's now play with the 'digits' option:

options(digits = 4)
df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], scientific = FALSE))
df2$id2[99990:100010]
[1] "99990"  "99991"  "99992"  "99993"  "99994"  " 99995" " 99996"
[8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
[15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"

Notice the extra leading space from 99995 to 99999? To make sure it only
happened there:


df2$id2[which(df1$id2 != df2$id2)]
[1] " 99995" " 99996" " 99997" " 99998" " 99999"

And just to make sure it only occurs in an 'apply' call, here is the same
directly on a numeric vector:


id2 <- format(1:110000, scientific = FALSE)
id2[99990:100010]
[1] " 99990" " 99991" " 99992" " 99993" " 99994" " 99995" " 99996"
[8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
[15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"

Here the leading spaces are for every number, which makes sense to me. Is there
anything I'm misinterpreting in the behaviour of 'format'?
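
One way to see the difference between the two calls: format() pads all elements of a vector to a common width, whereas inside apply() each row is passed separately, so each value is formatted on its own. apply() also coerces the data frame to a matrix first, so the integer 'id' column arrives as a double. A small sketch with illustrative values, run under the default 'digits' setting:

```r
v <- c(1, 10, 100)

# Whole vector at once: every element is padded to the widest one.
whole <- format(v)

# One element at a time, as inside apply(): no common width to pad to.
one_by_one <- vapply(v, function(e) format(e), "")

nchar(whole)
nchar(one_by_one)

# A data frame with a double and an integer column becomes a double matrix.
m <- as.matrix(data.frame(x = 0.5, id = 1L))
typeof(m)
```

Because each call inside apply() sees a single double, any padding there comes from how that one value is formatted, not from its neighbours.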

Thanks in advance for any hint,
Mathieu.


PS: Some background for this question. It all comes from an Rmd document that
knitr consistently failed to process, while the R code was fine using batch or
interactive R. knitr uses 'options(digits = 4)' as opposed to the R default of
'options(digits = 7)', which made one of my functions throw an error with
knitr, but not with batch or interactive R. I managed to solve the problem
using 'trim = TRUE' in 'format', but I still do not understand what's going
on...
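
For identifiers that are whole numbers, a conversion that does not consult 'options(digits)' sidesteps the discrepancy between knitr and interactive sessions. A minimal sketch, assuming the ids are integers; id_to_string is a hypothetical helper, while formatC() and sprintf() are base R functions:

```r
# Hypothetical helper: integer ids to strings, independent of the
# 'digits' option and without padding.
id_to_string <- function(id) formatC(as.integer(id), format = "d")

ids <- 99995:100000

old <- options(digits = 4)           # the setting knitr uses, per the PS
with_digits_4 <- id_to_string(ids)
options(digits = 7)                  # the R default
with_digits_7 <- id_to_string(ids)
options(old)                         # restore the previous setting

identical(with_digits_4, with_digits_7)

# sprintf() with "%d" is an equivalent alternative for integer input.
sprintf("%d", ids)
```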

If you're interested, see here for more details on the original problem:

http://stackoverflow.com/questions/17866230/knitr-vs-interactive-r-behaviour/17872176



--

~$ whoami
Mathieu Basille, PhD

~$ locate --details
University of Florida \\
Fort Lauderdale Research and Education Center
(+1) 954-577-6314
http://ase-research.org/basille

~$ fortune
« Le tout est de tout dire, et je manque de mots
Et je manque de temps, et je manque d'audace. »
-- Paul Éluard

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius
Alameda, CA, USA



