format(c(1:2, NA)) gives the last value as "NA" rather than preserving it as NA, even if na.encode = FALSE (which does the 'expected' thing for character vectors, but not numeric vectors).

This was already brought up in 2008 in https://bugs.r-project.org/show_bug.cgi?id=12318 where Gregor Gorjanc pointed out the issue. Documentation was added and the bug closed as invalid. GG ended with:

> IMHO it would be better that na.encode argument would also have an effect for numeric like vectors. Nearly any function in R returns NA values and I expected the same for format, at least when na.encode=FALSE.

  I agree!

I encountered this in the context of printing a data frame with na.print = "", which works as expected when printing the individual columns but not when printing the whole data frame (because print.data.frame calls format.data.frame, which calls format.default ...). Example below.

It's also different from what you would get if you converted to character before formatting and printing:

print(format(as.character(c(1:2, NA)), na.encode=FALSE), na.print ="")
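The contrast between the two cases can be checked directly with is.na(). This is only a minimal sketch; the names num_fmt and chr_fmt are illustrative and not part of any API:

```r
x <- c(1:2, NA)

# Numeric input: the NA is encoded as the string "NA",
# even when na.encode = FALSE is requested
num_fmt <- format(x, na.encode = FALSE)
is.na(num_fmt)

# Character input: with na.encode = FALSE the NA stays a real NA
chr_fmt <- format(as.character(x), na.encode = FALSE)
is.na(chr_fmt)
```

The first is.na() call reports no missing values at all, while the second flags only the last element, which is why na.print can act on the character result but not on the numeric one.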

Everything about this is documented (if you look carefully enough), but IMO it violates the principle of least surprise (https://en.wikipedia.org/wiki/Principle_of_least_astonishment), so I would call it at least an 'infelicity' (sensu Bill Venables).

  Is there any chance that this design decision could be revisited?

  cheers
   Ben Bolker


---

  Consider

dd <- data.frame(f = factor(1:2), c = as.character(1:2), n = as.numeric(1:2), i = 1:2)
dd[3,] <- rep(NA, 4)
print(dd, na.print = "")


print(dd, na.print = "")
  f c  n  i
1 1 1  1  1
2 2 2  2  2
3     NA NA

This is in fact as documented (see below), but seems suboptimal given that printing the columns separately with na.print = "" would successfully print the NA entries as blank even in the numeric columns:

invisible(lapply(dd, print, na.print = ""))
[1] 1 2
Levels: 1 2
[1] "1" "2"
[1] 1 2
[1] 1 2

* ?print.data.frame documents that it calls format() for each column before printing
* the code of print.data.frame() shows that it calls format.data.frame() with na.encode = FALSE
* ?format.data.frame specifically notes that na.encode "only applies to elements of character vectors, not to numerical, complex nor logical ‘NA’s, which are always encoded as ‘"NA"’."
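The chain described in these points can be reproduced by calling format() on the data frame by hand. This is a rough sketch that mimics the formatting step rather than the exact internals of print.data.frame(); fd and na_kept are illustrative names:

```r
dd <- data.frame(f = factor(1:2), c = as.character(1:2),
                 n = as.numeric(1:2), i = 1:2)
dd[3, ] <- rep(NA, 4)

# Format the data frame the way print.data.frame() asks for it
fd <- format(dd, na.encode = FALSE)

# TRUE where the formatted entry in row 3 is still a real NA
na_kept <- vapply(fd, function(col) is.na(col[3]), logical(1))
na_kept
```

Only the factor and character columns keep a real NA in row 3; the numeric and integer columns already hold the string "NA" by the time the matrix is printed.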

So the NA values in the numeric columns become "NA" rather than remaining as NA values, and are thus printed rather than being affected by the na.print argument.
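Until this changes, one possible workaround (an assumption on my part, following the as.character() comparison above, not something the documentation recommends) is to convert every column to character before printing. The name dd_chr is illustrative:

```r
dd <- data.frame(f = factor(1:2), c = as.character(1:2),
                 n = as.numeric(1:2), i = 1:2)
dd[3, ] <- rep(NA, 4)

# Convert every column to character so that na.encode = FALSE
# leaves the missing values as real NAs for na.print to handle
dd_chr <- dd
dd_chr[] <- lapply(dd, as.character)

print(dd_chr, na.print = "")
```

The cost is that numeric formatting options such as digits no longer apply, since the columns are plain strings by the time they are printed.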

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
