Early on I had been wondering if deprecating I() and the AsIs class would be a way to get the problem to go away. I imagine (based on no data at all!) that they are rarely used. If I were writing the same code today, I would use options(stringsAsFactors=FALSE) instead of sprinkling I() here and there throughout my scripts.
But I see from the discussions that there’s something deeper going on. Thanks for continuing to cc me; I find it interesting.

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 9/9/14, 9:35 AM, "Martin Maechler" <maech...@stat.math.ethz.ch> wrote:

>>>>>> peter dalgaard <pda...@gmail.com>
>>>>>>     on Tue, 9 Sep 2014 16:36:19 +0200 writes:
>
> > It's actually a little more complicated. I wrote a note, but it
> > seems to be stuck in the outbox on my home machine (I probably
> > forgot to click Send...).
> >
> > One important aspect is that
> >
> > > "x" < "\265g"
> > [1] NA
> >
> > which makes me wonder if the bug really is in the case that
> > "works". It seems that it is possible to rank() character vectors
> > that contain incomparable elements.
> >
> > -pd
>
>yes you are right that it is even more complicated.
>In both cases, our Scollate() is involved,
>(Scollate: the one where we had a discussion about making it part of the C
> level R API, which would help package authors ..)
>
>After
>
>  ch <- c('x','\265g')
>  foo <- I(ch)
>
>Of the four expressions,
>
>  order(ch)
>  order(foo)
>  ch [1] < ch [2]
>  foo[1] < foo[2]
>
>only the first one "works", the others give NA or an error because of NA
>and the first one is the only of the 4 that does not use
>do_relop_dflt()
>
>It's not even clear what we'd want (as I think pd also alluded to):
>Ideally all of these should work consistently, which because of
> "<(.,.)" returning NA in both cases,
>would mean that order(ch) also should give an error as order(foo)
> {{ an error we should improve the message in any case!!}.
>Big Q: Can we afford order(ch) giving an error in such cases.
>Pretty high chance that this will "break" much user (and probably
>even package) code out there.
>
>Still, the other solution, namely order(foo) behaving as
>order(ch) now does would correspond to the ">" giving FALSE
>instead of NA, so this solution is not ok in my view.
>
>Martin
>
> > On 09 Sep 2014, at 16:19 , Martin Maechler <maech...@stat.math.ethz.ch> wrote:
> >
> >>>>>>> MacQueen, Don <macque...@llnl.gov>
> >>>>>>>     on Mon, 8 Sep 2014 16:06:21 +0000 writes:
> >>
> >>> I have found that order() fails in a rather arcane circumstance, as in
> >>> this example:
> >>
> >>>> foo <- I( c('x','\265g') )
> >>>> order(foo)
> >>> Error in if (xi > xj) 1L else -1L : missing value where TRUE/FALSE needed
> >>
> >>>> foo <-c('x','\265g')
> >>>> order(foo)
> >>> [1] 1 2
> >>
> >> yes, this is not desirable.
> >> order() in such cases calls xtfrm() {as documented}
> >> and that ends up calling rank() and then the internal .gt()
> >> where the bug happens because
> >>
> >> > I("x") > I("\xb5g")
> >> [1] NA
> >>
> >> but really I think the change should happen in xtfrm.AsIs(.)
> >> which I think should drop the class also in this case.
> >>
> >> More on this, once we have fixed it.
> >>
> >> Thank you, Don, very much!
> >>
> >> Martin Maechler,
> >> ETH Zurich
> >>
> >>>> sessionInfo()
> >>> R version 3.1.1 (2014-07-10)
> >>> Platform: x86_64-apple-darwin13.1.0 (64-bit)
> >>
> >>> locale:
> >>> [1] C
> >>
> >>> attached base packages:
> >>> [1] stats graphics grDevices utils datasets methods base
> >>
> >>> Thanks
> >>> -Don
> >>
> >>> p.s.
> >>> Just a little background, irrelevant unless one wonders why I'm using I()
> >>> and \265:
> >>
> >>> If I were writing new code I wouldn't be using I(), since there are better
> >>> ways now to achieve the same end (preventing the creation of factors in
> >>> data frames), but the scripts that use it are quite old, originally
> >>> developed in 2001.
> >>
> >>> In at least some but perhaps limited contexts, '\265' produces the greek
> >>> letter mu, and that's why I'm using it. And if I remember correctly, 2001
> >>> is prior to the current R support for locales and extended character sets.
> >>> Using \265 is what I could find at that time to get a mu into my output.
> >>
> >>> I came across this while checking some things; it's not actually breaking
> >>> my scripts, so I doubt it's due to any recent change.
> >>
> >>> --
> >>> Don MacQueen
> >>
> >>> Lawrence Livermore National Laboratory
> >>> 7000 East Ave., L-627
> >>> Livermore, CA 94550
> >>> 925-423-1062
> >>
> >>> ______________________________________________
> >>> R-devel@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> > --
> > Peter Dalgaard, Professor,
> > Center for Statistics, Copenhagen Business School
> > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> > Phone: (+45)38153501
> > Email: pd....@cbs.dk Priv: pda...@gmail.com