[ Hmm, is every one of those interested in changes inside R "sleeping", uninterested, ... ? ]
>>>>> "MM" == Martin Maechler <[EMAIL PROTECTED]>
>>>>>     on Fri, 1 Jul 2005 18:36:54 +0200 writes:

>>>>> "PD" == Peter Dalgaard <[EMAIL PROTECTED]>
>>>>>     on 28 Jun 2005 14:57:42 +0200 writes:

    PD> "Liaw, Andy" <[EMAIL PROTECTED]> writes:
    >>> The issue is not with boxplot, but with split.  boxplot.formula()
    >>> calls boxplot(split(mf[[response]], mf[-response]), ...),
    >>> but look at what split() returns when there are empty levels in
    >>> the factor:
    >>>
    >>> > f <- factor(gl(3, 6), levels=1:5)
    >>> > y <- rnorm(f)
    >>> > split(y, f)
    >>> $"1"
    >>> [1] 0.4832124 1.1924811 0.3657797 1.7400198 0.5577356 0.9889520
    >>>
    >>> $"2"
    >>> [1] -1.1296642 -0.4808355 -0.2789933 0.1220718 0.1287742 -0.7573801
    >>>
    >>> $"3"
    >>> [1] 1.2320902 0.5090700 -1.5508074 2.1373780 1.1681297 -0.7151561
    >>>
    >>> The "culprit" is the following in split.default():
    >>>
    >>> f <- factor(f)
    >>>
    >>> which drops empty levels in f, if there are any.  BTW, ?split doesn't
    >>> mention what it does in such a situation.  Perhaps it should?
    >>>
    >>> If this is to be "fixed", I suppose an additional argument, e.g.,
    >>> drop=TRUE, can be added, and the corresponding line mentioned
    >>> above changed to something like:
    >>>
    >>> if (drop || !is.factor(f)) f <- factor(f)
    >>>
    >>> Then this additional argument can be passed on from boxplot.formula() to
    >>> split().

    PD> Alternatively, I suspect that the intention was as.factor() rather
    PD> than factor().

    MM> at first I thought Peter was right; but the real source of
    MM> split.default contains a comment (!) and that line is

    MM> f <- factor(f) # drop extraneous levels

    MM> so it seems, this was done there very much on purpose.
    MM> OTOH, S(-plus) has implemented it quite a bit differently, and actually
    MM> does keep the empty levels in the example

    MM> f <- factor(rep(1:3, each=6), levels=1:5); y <- rnorm(f); split(y, f)

    PD> It does require a bit of care to fix it that way,
    PD> though.  There could be problems with empty levels popping up in
    PD> unexpected places.

    MM> Indeed!

    MM> Given the new facts, I think we want to go in Andy's direction
    MM> with a new argument, 'drop'

    MM> As Peter mentioned, the real question is about its default.
    MM> "drop = TRUE" would be fully compatible with previous versions of R.
    MM> "drop = FALSE" would be compatible with S and S-plus.

    MM> I'm going to implement it, and try to see if 'drop = FALSE'
    MM> gives changes for R and its standard packages; if 'yes', that
    MM> would be an indication that such an R-back-compatibility breaking
    MM> change was not a good idea.  If 'no', I could commit it and see
    MM> if it has an effect on the CRAN packages....

    MM> Of course, since split() and split()<- are S3 generics, and
    MM> since there's also unsplit(), this entails a whole slew of
    MM> changes {adding a "drop = FALSE" argument everywhere!}
    MM> and I presume will break the code of everyone who has written
    MM> their own split.foobar methods....

    MM> great...

    MM> Martin

The change doesn't seem to affect the "standard" packages at all,
which is good.
On CRAN, it seems there are only two packages that have split() or
split()<- methods, namely 'spatstat' and 'compositions'.

If we introduced the extra argument 'drop', these and all other user
code defining split methods would have to be updated to be compatible
with the changed (S3) generic having an extra argument 'drop'.

With this in mind, after more thought, I think that Peter's initial
proposal ---just replacing 'factor()' by 'as.factor()' inside split---
is nicer than introducing 'drop' and *changing* the default
behavior to 'drop = FALSE', for the following reasons:

1) people who rely on the current behavior would have to change
   their calls to split() anyway;

2) instead of calling split(x, f, drop=TRUE) they can just as well
   go for split(x, factor(f)), which has identical effect but
   does not introduce an extra argument 'drop'.

3) advantage of slightly higher compatibility with S

---

I intend to change this in R-devel {with appropriate notes in NEWS !}
during this week, unless someone finds good reasons for a different
(or no) change.

Martin

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
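
For readers following the thread, here is a minimal R sketch of the distinction being discussed. It reuses the factor from the example quoted in the message; the seed and the variable name `s` are arbitrary illustration choices, not taken from the original posts. It deliberately does not show the result of a bare split(y, f), since that is exactly what depends on which behaviour a given R version implements.

```r
# Sketch only: data modelled on the example quoted in the thread.
set.seed(1)                                    # arbitrary seed, for reproducibility
f <- factor(rep(1:3, each = 6), levels = 1:5)  # levels "4" and "5" are empty
y <- rnorm(length(f))

# factor() rebuilds the factor from the values present,
# so the unused levels disappear ...
levels(factor(f))      # "1" "2" "3"

# ... whereas as.factor() returns an existing factor unchanged,
# so the empty levels survive.
levels(as.factor(f))   # "1" "2" "3" "4" "5"

# Dropping empty groups explicitly, without any 'drop' argument:
# wrapping f in factor() leaves only the three non-empty groups,
# whichever of the two behaviours split() itself has.
s <- split(y, factor(f))
names(s)               # "1" "2" "3"
sapply(s, length)      # 6 observations in each group
```

The last call is the idiom suggested in point 2) of the message: callers who want empty levels removed can ask for it at the call site, so no method signature has to change.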