>>>>> Ben Bolker >>>>> on Sat, 3 Jun 2023 13:06:41 -0400 writes:
> format(c(1:2, NA)) gives the last value as "NA" rather than
> preserving it as NA, even if na.encode = FALSE (which does the
> 'expected' thing for character vectors, but not numeric vectors).
>
> This was already brought up in 2008 in
> https://bugs.r-project.org/show_bug.cgi?id=12318 where Gregor Gorjanc
> pointed out the issue. Documentation was added and the bug closed as
> invalid. GG ended with:
>
>> IMHO it would be better that na.encode argument would also have an
>> effect for numeric like vectors. Nearly any function in R returns NA
>> values and I expected the same for format, at least when na.encode=FALSE.
>
> I agree!

I do too, at least "in principle", keeping in mind that
backward compatibility is also an important principle ...

Not sure if the 'na.encode' argument should matter or possibly a
new optional argument, but "in principle" I think that

    format(c(1:2, NA, 4))

should preserve is.na(.) even by default.

> I encountered this in the context of printing a data frame with
> na.print = "", which works as expected when printing the individual
> columns but not when printing the whole data frame (because
> print.data.frame calls format.data.frame, which calls format.default
> ...). Example below.
>
> It's also different from what you would get if you converted to
> character before formatting and printing:
>
> print(format(as.character(c(1:2, NA)), na.encode=FALSE), na.print ="")
>
> Everything about this is documented (if you look carefully enough),
> but IMO it violates the principle of least surprise
> https://en.wikipedia.org/wiki/Principle_of_least_astonishment , so I
> would call it at least an 'infelicity' (sensu Bill Venables)
>
> Is there any chance that this design decision could be revisited?

We'd have to hear other opinions / gut feelings.

Also, someone (not me) would ideally volunteer to run
'R CMD check <pkg>' for a few 1000 (not necessarily all) CRAN &
BioC packages with an accordingly patched version of R-devel
(I might volunteer to create such a branch, e.g., a bit before the
R Sprint 2023 end of August).

> cheers
> Ben Bolker
>
> ---

The following issue you are raising may really be a *different*
one, as it involves format() and print() methods for "data.frame",
i.e.,

    format.data.frame() vs
    print.data.frame()

which is quite a bit related, of course, to how 'numeric' columns
are formatted -- as you note yourself below; I vaguely recall that
the data.frame method could be an even "harder problem" .. but I
don't remember the details.

It may also be that there are no changes necessary to the
*.data.frame() methods, and only the documentation (you mention)
should be updated ...

Martin

> Consider
>
> dd <- data.frame(f = factor(1:2), c = as.character(1:2),
>                  n = as.numeric(1:2), i = 1:2)
> dd[3,] <- rep(NA, 4)
> print(dd, na.print = "")
>
> print(dd, na.print = "")
>   f c  n  i
> 1 1 1  1  1
> 2 2 2  2  2
> 3     NA NA
>
> This is in fact as documented (see below), but seems suboptimal given
> that printing the columns separately with na.print = "" would
> successfully print the NA entries as blank even in the numeric columns:
>
> invisible(lapply(dd, print, na.print = ""))
> [1] 1 2
> Levels: 1 2
> [1] "1" "2"
> [1] 1 2
> [1] 1 2
>
> * ?print.data.frame documents that it calls format() for each column
>   before printing
> * the code of print.data.frame() shows that it calls format.data.frame()
>   with na.encode = FALSE
> * ?format.data.frame specifically notes that na.encode "only applies to
>   elements of character vectors, not to numerical, complex nor logical
>   ‘NA’s, which are always encoded as ‘"NA"’.
>
> So the NA values in the numeric columns become "NA" rather than
> remaining as NA values, and are thus printed rather than being affected
> by the na.print argument.
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel