>>>>> Martyn Plummer <plumm...@iarc.fr>
>>>>>     on Thu, 1 Mar 2018 17:23:04 +0000 writes:

    > On Thu, 2018-03-01 at 09:36 -0500, Ron wrote:
    >> Hello,
    >>
    >> I'd like to report what I think is a bug: using as.data.frame() we can
    >> create duplicate row names in a data frame. R version 3.4.3 (current stable
    >> release).
    >>
    >> Rather than paste code in an email, please see the example formatted code
    >> here:
    >> https://stackoverflow.com/questions/49031523/duplicate-row-names-in-r-using-as-data-frame
    >>
    >> I posted to StackOverflow, and consensus was that we should proceed with
    >> this as a bug report.

    > Yes that is definitely a bug.

    > The end of the as.data.frame.matrix method has:

    >     attr(value, "row.names") <- row.names
    >     class(value) <- "data.frame"
    >     value

    > Changing this to:

    >     class(value) <- "data.frame"
    >     row.names(value) <- row.names
    >     value

    > ensures that the row.names<-.data.frame method is called with its
    > built-in check for duplicate names.

    > There are quite a few as.data.frame methods so this could be a
    > recurring problem. I will check.

and Martyn found other cases and proposed a more principled
approach to conceptually all such situations.

From that, I have addressed at least the current bug (and its
immediate surroundings). I now have committed the following to
'R-devel' (= the R sources development "trunk") :

------------------------------------------------------------------------
r74373 | maechler | 2018-03-08 17:49:32 +0100 (Thu, 08. Mar 2018)

M doc/NEWS.Rd
M src/library/base/R/dataframe.R
M src/library/base/man/as.data.frame.Rd
M src/library/base/man/row.names.Rd
M tests/eval-etc.Rout.save
M tests/reg-tests-1c.R
M tests/reg-tests-1d.R
M tests/reg-tests-2.Rout.save

duplicated rownames in as.data.frame.matrix() are handled now (gracefully by default)
------------------------------------------------------------------------

The NEWS entry is

  • Some as.data.frame() methods, notably the matrix one, are now more
    careful in not accepting duplicated or NA row names, and by default
    produce unique non-NA row names.
    This is based on   row.names(x, make.names = *) <- rNms   where
    make.names is a new logical, with back compatible default.

and the not-quite-back-compatible API change is that the
`row.names<-` S3 generic function now has a new optional
'make.names' argument -- with back compatible default FALSE
(meaning that invalid rownames by default continue to lead to an error).

It may happen that this or the other changes have some negative
impact on the CRAN package check results (I do expect *some*
check problems), e.g. producing new warnings if packages use the
current R <= 3.4.x signature of `row.names<-`.

But I think the new feature of being able to indicate how
invalid row names should be treated --- notably, being able to use
make.names(*, unique=TRUE) to get valid row names --- is
attractive and leads to Martyn's proposed behavior, which entails
that as.data.frame.*(x) (and similar coercions to data frames)
should typically _handle_ invalid row names rather than signal
errors.

Feedback is welcome !
((though I will be slow in replying, going basically off work for
  my early-starting weekend in the Alps))

Martin Maechler,
ETH Zurich
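
For anyone who wants to try out the distinction discussed above, here
is a minimal sketch (it is not the code from the commit). It assumes
nothing beyond functions that exist in released R, namely `attr<-`,
`row.names<-` and make.names(); the data frame 'df' and the vector 'rn'
are made up for illustration. The new 'make.names' argument is
deliberately not used, since it is specific to the R-devel state
described in this message.

```r
## Hypothetical example data: three rows, one duplicated row name
rn <- c("a", "a", "b")
df <- data.frame(x = 1:3)

## 1) Setting the attribute directly bypasses the row.names<- method,
##    so the duplicated names are stored without any check
df_attr <- df
attr(df_attr, "row.names") <- rn
has_dups <- anyDuplicated(attr(df_attr, "row.names")) > 0   # TRUE

## 2) Going through row.names<- runs the built-in duplicate check,
##    which signals an error (any accompanying warning is suppressed here)
df_checked <- df
failed <- tryCatch(
    suppressWarnings({ row.names(df_checked) <- rn; FALSE }),
    error = function(e) TRUE)                               # TRUE

## 3) Repairing the names first, via make.names(*, unique = TRUE),
##    gives row names that pass the check
df_fixed <- df
row.names(df_fixed) <- make.names(rn, unique = TRUE)
fixed_names <- row.names(df_fixed)                          # "a" "a.1" "b"
```

Step 1 corresponds to the old end of as.data.frame.matrix(), step 2 to
Martyn's suggested replacement, and step 3 to doing by hand what
"handling" invalid row names amounts to.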