Duncan Murdoch wrote:
> On 15/12/2007 5:17 PM, Martin Maechler wrote:
>>>>>>> "TP" == Tony Plate <[EMAIL PROTECTED]>
>>>>>>> on Fri, 14 Dec 2007 13:58:30 -0700 writes:
>> TP> Duncan Murdoch wrote:
>> >> On 12/13/2007 1:59 PM, Tony Plate wrote:
>> >>> Duncan Murdoch wrote:
>> >>>> On 12/11/2007 6:20 AM, [EMAIL PROTECTED] wrote:
>> >>>>> Full_Name: Petr Simecek
>> >>>>> Version: 2.5.1, 2.6.1
>> >>>>> OS: Windows XP
>> >>>>> Submission from: (NULL) (195.113.231.2)
>> >>>>>
>> >>>>>
>> >>>>> Several times I have experienced that the length of a POSIXt vector
>> >>>>> has not been computed correctly.
>> >>>>>
>> >>>>> Example:
>> >>>>>
>> >>>>> tv<-structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
>> >>>>> ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
>> >>>>> ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, 12L),
>> >>>>> mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, 13L), mon
>> >>>>> = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L,
>> >>>>> 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday =
>> >>>>> c(1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L,
>> >>>>> 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L,
>> >>>>> 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec", "min",
>> >>>>> "hour", "mday", "mon", "year", "wday", "yday", "isdst"
>> >>>>> ), class = c("POSIXt", "POSIXlt"))
>> >>>>>
>> >>>>> print(tv)
>> >>>>> # prints 11 time points (right)
>> >>>>>
>> >>>>> length(tv)
>> >>>>> # returns 9 (wrong)
>> >>>>
>> >>>> tv is a list of length 9. The answer is right, your expectation is
>> >>>> wrong.
>> >>>>> I have tried that on several computers with/without switching to
>> >>>>> English locales, i.e. Sys.setlocale("LC_TIME", "en"). I have searched
>> >>>>> the help pages but I cannot imagine how that could be OK.
>> >>>>
>> >>>> See this in ?POSIXt:
>> >>>>
>> >>>> Class '"POSIXlt"' is a named list of vectors...
>> >>>>
>> >>>> You could define your own length measurement as
>> >>>>
>> >>>> length.POSIXlt <- function(x) length(x$sec)
>> >>>>
>> >>>> and you'll get the answer you expect, but be aware that length.XXX
>> >>>> methods are quite rare, and you may surprise some of your users.
>> >>>>
>> >>>
>> >>> On the other hand, isn't the fact that length() currently always
>> >>> returns 9 for POSIXlt objects likely to be a surprise to many users
>> >>> of POSIXlt?
>> >>>
>> >>> The back of "The New S Language" says "Easy-to-use facilities allow
>> >>> you to organize, store and retrieve all sorts of data. ... S
>> >>> functions and data organization make applications easy to write."
>> >>>
>> >>> Now, POSIXlt has methods for c() and vector subsetting "[" (and many
>> >>> other vector-manipulation methods - see methods(class="POSIXlt")).
>> >>> Hence, from the point of view of intending to supply "easy-to-use
>> >>> facilities ... [for] all sorts of data", isn't it a little
>> >>> incongruous that length() is not also provided -- as these 3
>> >>> functions (any others?) comprise a core set of vector-manipulation
>> >>> functions?
>> >>>
>> >>> Would it make sense to have an informal prescription (e.g., in
>> >>> R-exts) that a class that implements a vector-like object and
>> >>> provides at least one of the functions 'c', '[' and 'length' should
>> >>> provide all three?
>> >>> It would also be easy to describe a test suite
>> >>> that should be included in the 'tests' directory of a package
>> >>> implementing such a class, that had some tests of the basic
>> >>> vector-manipulation functionality, such as:
>> >>>
>> >>> > # at this point, x0, x1, x3, & x10 should exist, as vectors of the
>> >>> > # class being tested, of length 0, 1, 3, and 10, and they should
>> >>> > # contain no duplicate elements
>> >>> > length(x0)
>> >>> [1] 0
>> >>> > length(c(x0, x1))
>> >>> [1] 1
>> >>> > length(c(x1,x10))
>> >>> [1] 11
>> >>> > all(x3 == x3[seq(len=length(x3))])
>> >>> [1] TRUE
>> >>> > all(x3 == c(x3[1], x3[2], x3[3]))
>> >>> [1] TRUE
>> >>> > length(c(x3[2], x10[5:7]))
>> >>> [1] 4
>> >>> >
>> >>>
>> >>> It would also be possible to describe a larger set of vector
>> >>> manipulation functions that should be implemented together, including
>> >>> e.g., 'rep', 'unique', 'duplicated', '==', 'sort', '[<-', 'is.na',
>> >>> head, tail ... (many of which are provided for POSIXlt).
>> >>>
>> >>> Or is there some good reason that length() cannot be provided (while
>> >>> 'c' and '[' can) for some vector-like classes such as "POSIXlt"?
>> >>
>> >> What you say sounds good in general, but the devil is in the details.
>> >> Changing the meaning of length(x) for some objects has fairly
>> >> widespread effects. Are they all positive? I don't know.
>> >>
>> >> Adding a prescription like the one you suggest would be good if it's
>> >> easy to implement, but bad if it's already widely violated. How many
>> >> base or CRAN or Bioconductor packages violate it currently? Do the
>> >> ones that provide all 3 methods do so in a consistent way, i.e. does
>> >> "length(x)" mean the same thing in all of them?
>> TP> I'm not sure doing something like this would be so bad even if it is
>> TP> already widely violated.
>> TP> R has evolved significantly over time, and
>> TP> many rough edges have been cleaned up, sometimes in ways that were not
>> TP> backward compatible. This is a great thing & my thanks go to the people
>> TP> working on R.
>>
>> TP> If some base or CRAN or Bioconductor packages currently don't implement
>> TP> vector operations consistently, wouldn't it be good to know that?
>> TP> Wouldn't it be useful to have an automatic way of determining whether a
>> TP> particular vector-like class is consistent with a generally agreed set of
>> TP> principles for how basic vector operations should work -- things like
>> TP> length(x)+length(y)==length(c(x,y))? This could help developers check,
>> TP> document & improve their code, and it could help users understand how to
>> TP> use a class, and to evaluate the software quality of a class
>> TP> implementation and whether or not it provides the functionality they need.
>>
>> >> I agree that the current state is less than perfect, but making it
>> >> better would really be a lot of work. I suspect there are better ways
>> >> to spend my time, so I'm not going to volunteer to do it. I'm not
>> >> even going to invite someone else to do it, or offer to review your
>> >> work if you volunteer. I think this falls into the class of "next
>> >> time we write a language, let's handle this better" problems.
>>
>> TP> Thanks very much for the thoughtful (and honest) feedback! I suspect
>> TP> that the current state could be improved with just a little work, and
>> TP> without forcing anyone to do any work they don't want to do. I'll think
>> TP> about this more and try to come back with a better & more concrete
>> TP> suggestion.
>>
>> Good. From "the outside" (i.e. superficial gut feeling :-)
>> I've sympathized with your suggestion, Tony, quite a bit.
>> Further, my own taste would probably also have led me to define
>> length.POSIXlt differently ...
>> OTOH, I agree with Duncan that it may be too late to change it
>> and even more to enforce the consistency rules you propose.
>> If with a small bit of code (and some patience) we could check
>> all of CRAN and hopefully Bioconductor packages and find only a
>> very few where it was violated, the whole endeavor may be worth it
>> ... for the sake of making R more consistent, easier to teach, etc.
>>
>> Unfortunately I don't remember now what happened many months ago
>> when I indeed did experiment with having something like
>>
>> length.POSIXlt <- function(x) length(x$sec)
>>
>> Martin Maechler
>
> One reason I don't want to work on this is because the appropriate
> action depends on what "length(x)" is intended to mean. Currently for
> POSIXlt objects, it gives the physical length of the underlying basic
> type (the list). This is the same behaviour as we have for matrices,
> data frames and every other object without a specific length method, so
> it's not outrageous.
>
> The proposed change is to have it return the logical length of the
> object, which also seems quite reasonable. I don't think matrices and
> data frames have a "logical length", so there would be no contradiction
> in those examples. The thing that worries me is that there are probably
> objects in packages where both logical length and physical length make
> sense but are different. I don't have any expectation that length(x) on
> those currently is consistent in which type of value it returns.
>
> If we were to decide that "length(x)" *always* meant logical length,
> then we would have a problem: matrices and data frames don't have a
> logical length, so we shouldn't be getting an answer there. Changing
> length(x) for those is not acceptable.
>
> On the other hand, if we decide that "length(x)" *always* means physical
> length, we don't need to do anything to the POSIXlt or matrices or data
> frames, but there may well be other kinds of objects out there that
> violate this rule.
>
> We could leave the meaning of length(x) ambiguous. If you want to know
> what it does for a POSIXlt object, you need to read the documentation or
> look at the source code. As a policy, this isn't particularly
> appealing, but I could probably live with it if someone else did the
> research and showed that current usage is ambiguous.

Leaving the meaning of length(x) ambiguous seems reasonable to me (as are the meanings of 'c' and '['). I was thinking more in terms of consistency: either supplying all or none of the tightly related group of functions 'c', '[', and 'length'. It seems diabolically confusing that 'c' and '[' exist for POSIXlt and do the expected things in terms of the vector-of-dates interpretation, but length does something completely different. (And this is not mentioned in ?POSIXlt.) Coding & documentation guidelines & tools could help R move towards more consistency with regard to this kind of behavior.

-- Tony Plate

>
> Duncan Murdoch
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
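The distinction Duncan draws between physical and logical length, and a few of the consistency checks Tony proposes, can be sketched in R as below. This is a minimal illustration under stated assumptions, not a proposed change to R: physical_length(), logical_length() and check_vector_ops() are hypothetical helper names, and reading the logical length from the 'sec' component follows the length.POSIXlt definition suggested in the thread. The helpers deliberately avoid calling length() on a POSIXlt object directly, so the result does not depend on whether the running R version defines its own length method for POSIXlt.

```r
# Hypothetical helpers for illustration only; they are not part of base R.

# "Physical" length: the number of components in the underlying list.
# For a POSIXlt object this does not depend on how many date-times it holds.
physical_length <- function(x) length(unclass(x))

# "Logical" length: the number of date-times, read from the 'sec'
# component (the idea behind the length.POSIXlt method in the thread).
logical_length <- function(x) length(unclass(x)$sec)

# A few of the consistency checks proposed in the thread: lengths add up
# under c(), and subsetting keeps exactly the selected elements.
# Assumes y holds at least 3 elements.
check_vector_ops <- function(x, y, len = logical_length) {
  stopifnot(
    len(c(x, y)) == len(x) + len(y),
    len(x[seq_len(len(x))]) == len(x),
    len(c(x[1], y[2:3])) == 3
  )
  invisible(TRUE)
}

# Usage: two POSIXlt vectors of 5 and 10 date-times, one minute apart.
start <- as.POSIXct("2005-06-13 12:00:00", tz = "UTC")
x5  <- as.POSIXlt(seq(start, by = "min", length.out = 5))
x10 <- as.POSIXlt(seq(start, by = "min", length.out = 10))

logical_length(x5)                            # 5
physical_length(x5) == physical_length(x10)   # TRUE
check_vector_ops(x5, x10)                     # passes silently
```

Passing a different function as the len argument lets the same checks be run against another class's notion of length, which is one way to probe the ambiguity Duncan describes.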