Re: [Rd] [R] "[.data.frame" and lapply
Wacek Kusnierczyk wrote:
redirected to r-devel, because there are implementational details of [.data.frame discussed here. spoiler: at the bottom there is a fairly interesting performance result.

Romain Francois wrote:
Hi,

This is a bug I think. [.data.frame treats its arguments differently depending on the number of arguments.

you might want to hesitate a bit before you say that something in r is a bug, if only because it drives certain people mad. r is a carefully tested software, and [.data.frame is such a basic function that if what you talk about were a bug, it wouldn't have persisted until now.

I did hesitate, and would be prepared to look the other way if someone shows me proper evidence that this makes sense.

> d <- data.frame( x = 1:10, y = 1:10, z = 1:10 )
> d[ j=1 ]
    x  y  z
1   1  1  1
2   2  2  2
3   3  3  3
4   4  4  4
5   5  5  5
6   6  6  6
7   7  7  7
8   8  8  8
9   9  9  9
10 10 10 10

"If a single index is supplied, it is interpreted as indexing the list of columns". Clearly this does not happen here, and this is because NextMethod gets confused.

I have not looked at your implementation in detail, but it misses array indexing, as in:

> d <- data.frame( x = 1:10, y = 1:10, z = 1:10 )
> m <- cbind( 5:7, 1:3 )
> m
     [,1] [,2]
[1,]    5    1
[2,]    6    2
[3,]    7    3
> d[m]
[1] 5 6 7
> subdf( d, m )
Error in subdf(d, m) : undefined columns selected

"Matrix indexing using '[' is not recommended, and barely supported. For extraction, 'x' is first coerced to a matrix. For replacement a logical matrix (only) can be used to select the elements to be replaced in the same way as for a matrix."

You might also want to look at `[<-.data.frame`.
> d[j=2] <- 1:10
Error in `[<-.data.frame`(`*tmp*`, j = 2, value = 1:10) :
  element 1 is empty;
   the part of the args list of 'is.logical' being evaluated was:
   (i)
> d[2] <- 10:1
> d
    x  y  z
1   1 10  1
2   2  9  2
3   3  8  3
4   4  7  4
5   5  6  5
6   6  5  6
7   7  4  7
8   8  3  8
9   9  2  9
10 10  1 10

This is probably less of an issue, because there is very little chance that people will use this construct. The first one, however, even if not used directly, still has a good chance of being used within some fooapply call, as in the original post. Although it might have been preferable to use subset as the applied function.

Romain

treating the arguments differently depending on their number is actually (if clearly...) documented: if there is one index (the 'i'), it selects columns. if there are two, 'i' selects rows.

however, not all seems fine, there might be a design flaw:

# dummy data frame
d = structure(names=paste('col', 1:3, sep='.'), data.frame(row.names=paste('row', 1:3, sep='.'), matrix(1:9, 3, 3)))

d[1:2]
# correctly selects the first two columns
# 1:2 passed to [.data.frame as i, no j given

d[,1:2]
# correctly selects the first two columns
# 1:2 passed to [.data.frame as j, i given the missing argument value (note the comma)

d[,i=1:2]
# correctly selects the first two rows
# 1:2 passed to [.data.frame as i, j given the missing argument value (note the comma)

d[j=1:2,]
# correctly selects the first two columns
# 1:2 passed to [.data.frame as j, i given the missing argument value (note the comma)

d[i=1:2]
# correctly (arguably) selects the first two columns
# 1:2 passed to [.data.frame as i, no j given

d[j=1:2]
# wrong: returns the whole data frame
# does not recognize the index as i because it is explicitly named 'j'
# does not recognize the index as j because there is only one index

i say this *might* be a design flaw because it's hard to judge what the design really is. the r language definition (!) [1, sec. 3.4.3 p. 18] says:

" The most important example of a class method for [ is that used for data frames.
It is not be described in detail here (see the help page for [.data.frame, but in broad terms, if two indices are supplied (even if one is empty) it creates matrix-like indexing for a structure that is basically a list of vectors of the same length. If a single index is supplied, it is interpreted as indexing the list of columns—in that case the drop argument is ignored, with a warning."

it does not say what happens when only one *named* index argument is given. from the above, it would indeed seem that there is a *bug* here: in the last example above only one index is given, and yet columns are not selected, even though the *language definition* says they should be. (so it's not a documented feature, it's a contra-definitional misfeature -- a bug?)

somewhat on the side, the 'matrix-like indexing' above is fairly misleading; just try the same patterns of indexing -- one index, two indices, named indices -- on a data frame and a matrix of the same shape:

m = matrix(1:9, 3, 3)
md = data.frame(m)

md[1]
# the first column
m[1]
# the first element (i.e., m[
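Wacek's message is cut off at this point in the archive. As an illustration only, and not his original continuation, the sketch below runs the one-index versus two-index comparison he proposes on a small matrix and the data frame built from it. The comments describe standard R behavior for these calls. The last line shows one way to ask for columns explicitly, using the two-index form with drop = FALSE, instead of relying on a lone index named j, which the thread reports returns the whole data frame.

```r
# same shape: a 3 x 3 integer matrix and the data frame built from it
m  <- matrix(1:9, 3, 3)
md <- data.frame(m)      # columns are named X1, X2, X3

# one index: the data frame is treated as a list of columns,
# the matrix as a plain vector of its elements (column-major order)
md[1]                    # data frame containing column X1
m[1]                     # 1, the first element

# two indices: both use row/column indexing
md[, 1]                  # 1 2 3 (a single column is dropped to a vector)
m[, 1]                   # 1 2 3
md[1, ]                  # one-row data frame
m[1, ]                   # 1 4 7

# explicit column selection that does not depend on argument names
md[, 1:2, drop = FALSE]  # data frame with columns X1 and X2
```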
Re: [Rd] [R] "[.data.frame" and lapply
Romain Francois wrote:
> Wacek Kusnierczyk wrote:
>> redirected to r-devel, because there are implementational details of
>> [.data.frame discussed here. spoiler: at the bottom there is a fairly
>> interesting performance result.
>>
>> Romain Francois wrote:
>>
>>> Hi,
>>>
>>> This is a bug I think. [.data.frame treats its arguments differently
>>> depending on the number of arguments.
>>>
>>
>> you might want to hesitate a bit before you say that something in r is a
>> bug, if only because it drives certain people mad. r is a carefully
>> tested software, and [.data.frame is such a basic function that if what
>> you talk about were a bug, it wouldn't have persisted until now.
>>
> I did hesitate, and would be prepared to look the other way if someone
> shows me proper evidence that this makes sense.
>
> > d <- data.frame( x = 1:10, y = 1:10, z = 1:10 )
> > d[ j=1 ]
>     x  y  z
> 1   1  1  1
> 2   2  2  2
> 3   3  3  3
> 4   4  4  4
> 5   5  5  5
> 6   6  6  6
> 7   7  7  7
> 8   8  8  8
> 9   9  9  9
> 10 10 10 10
>
> "If a single index is supplied, it is interpreted as indexing the list
> of columns". Clearly this does not happen here, and this is because
> NextMethod gets confused.

obviously. it seems that there is a bug here, and that it results from the lack of a clear design specification.

>
> I have not looked at your implementation in detail, but it misses array
> indexing, as in:

yes; i didn't take it into consideration, but (still without detailed analysis) i guess it should not be difficult to extend the code to handle this.

>
> > d <- data.frame( x = 1:10, y = 1:10, z = 1:10 )
> > m <- cbind( 5:7, 1:3 )
> > m
>      [,1] [,2]
> [1,]    5    1
> [2,]    6    2
> [3,]    7    3
> > d[m]
> [1] 5 6 7
> > subdf( d, m )
> Error in subdf(d, m) : undefined columns selected

this should be easy to handle by checking if i is a matrix and then indexing by its first column as i and the second as j.

>
> "Matrix indexing using '[' is not recommended, and barely
> supported. For extraction, 'x' is first coerced to a matrix.
> For replacement a logical matrix (only) can be used to select the
> elements to be replaced in the same way as for a matrix."

yes, here's how it's done (original comment):

if(is.matrix(i))
    return(as.matrix(x)[i])  # desperate measures

and i can easily add this to my code, at virtually no additional expense. it's probably not a good idea to convert x to a matrix, though: x would often be much more data than the index matrix m, so it's presumably much more efficient, on average, to fiddle with i instead.

there are some potentially confusing issues here:

m = cbind(8:10, 1:3)
d[m]      # 3-element vector, as you could expect
d[t(m)]   # 6-element vector

t(m) has dimensionality inappropriate for matrix indexing (it has 3 columns), so it gets flattened into a vector; however, it does not work like the case of a single vector index, where columns would be selected:

d[as.vector(t(m))]
# error: undefined columns selected

i think it would be more appropriate to raise an error in a case like d[t(m)].

furthermore, if a matrix is used in a two-index form, the matrix is flattened again and is used to select rows (not elements, as in d[t(m)]). note also that the help page says that "for extraction, 'x' is first coerced to a matrix". it fails to explain that if *two* indices are used of which at least one is a matrix, no coercion is done.
that is, the matrix is again flattened into a vector, but here [.data.frame forgets that it was a matrix (unlike in d[t(m)]):

is(d[m])      # a character vector, matrix indexing
is(d[t(m)])   # a character vector, vector indexing of elements, not columns
is(d[m,])     # a data frame, row indexing

and finally, the fact that d[m] actually converts x (i.e., d) to a matrix before the indexing means that the values in some columns of d may get coerced to another type:

d[,2] = as.character(d[,2])
is(d[,1])              # integer vector
is(d[,2])              # character vector
is(d[1:2, 1])          # integer vector
is(d[cbind(1:2, 1)])   # character vector

for what it's worth, i think matrix indexing of data frames should be dropped:

d[m]
# error: ...

and if one needs it, it's as simple as

as.matrix(d)[m]

where the conversion of d to a matrix is explicit.

on the side, [.data.frame is able to index matrices:

'[.data.frame'(as.matrix(d), m)
# same as as.matrix(d)[m]

which is, so to speak, nonsense, since '[.data.frame' is designed specifically to handle data frames; i'd expect an error to be raised here (or a warning, at the very least).

to summarize, the fact that subdf does not handle matrix indices is not an issue.

anyway, thanks for the comment!

best,
vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
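Wacek suggests handling a matrix index by fiddling with i rather than coercing the whole of x with as.matrix(). A minimal sketch of that idea follows; df_matrix_index is a hypothetical name, not part of R or of Wacek's subdf. Passing the first column of m as i and the second as j in a single call would select a whole sub-block (every listed row crossed with every listed column), so the sketch instead walks the (row, column) pairs one at a time.

```r
# Hypothetical helper (not part of R or of subdf): element-wise matrix
# indexing of a data frame that works on the (row, column) pairs in 'm'
# instead of coercing the whole data frame with as.matrix().
df_matrix_index <- function(x, m) {
  stopifnot(is.data.frame(x), is.matrix(m), ncol(m) == 2L)
  # one element per row of m: column m[k, 2], row m[k, 1]
  vals <- lapply(seq_len(nrow(m)),
                 function(k) x[[m[k, 2]]][m[k, 1]])
  # unlist() only reconciles the types of the selected elements,
  # not of every column in x
  unlist(vals)
}

d <- data.frame(x = 1:10, y = 1:10, z = 1:10)
m <- cbind(5:7, 1:3)
df_matrix_index(d, m)                # same values as d[m]: 5 6 7

# with a character column present, d2[...] goes through as.matrix(d2)
# and returns character; the helper keeps integer values when only
# integer columns are selected
d2 <- data.frame(a = 1:3, b = c("u", "v", "w"), stringsAsFactors = FALSE)
d2[cbind(1:2, 1)]                    # character
df_matrix_index(d2, cbind(1:2, 1))   # integer 1 2
```

The per-pair loop is the simplest correct form; for large index matrices one would presumably group the pairs by column first, but that refinement is left out here.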
[Rd] Recent setClass fails where previous succeeded
These lines of code

setClass("A", representation(x="numeric"))
setMethod(initialize, "A", function(.Object, ...) stop("oops"))
setClass("B", representation("A"))

result in

> setClass("B", representation("A"))
Error in initialize(value, ...) : oops

in

R version 2.9.0 alpha (2009-03-28 r48239)
R version 2.10.0 Under development (unstable) (2009-03-28 r48239)

but not in r48182.

In addition, in package code, the error above does NOT lead to removal of the partially installed package, or of the lock on the package directory, corrupting the user installation.

For more context, the actual code adds arguments to initialize and expects them to be provided by calls to 'new'; 'new' is not exposed directly to the user but via a constructor that always provides appropriate arguments. A specific example occurs when trying to install the package Biostrings v 2.11.44 from the Bioconductor devel repository.

Martin
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793
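Martin's real code adds arguments to initialize that only the user-facing constructor supplies, so any call to initialize that bypasses the constructor fails, as the stop("oops") example shows. One defensive pattern, sketched here under the assumption that a default value for the extra argument is acceptable, is to give that argument a default so that a bare new("A") still succeeds. The argument name scale and the constructor A() are hypothetical, and contains = "A" is used in place of representation("A") to declare the superclass. This is a package-side workaround, not a fix for the setClass change reported above.

```r
library(methods)

setClass("A", representation(x = "numeric"))

# 'scale' is a hypothetical extra initialize argument, normally supplied
# by the constructor A() below. Giving it a default (instead of failing
# when it is absent) lets a bare new("A") succeed, e.g. when methods code
# creates an object without going through the constructor.
setMethod("initialize", "A", function(.Object, ..., scale = 1) {
  .Object <- callNextMethod(.Object, ...)
  .Object@x <- .Object@x * scale
  .Object
})

# hypothetical user-facing constructor: always passes 'scale' explicitly
A <- function(x) new("A", x = x, scale = 2)

# defining a subclass does not depend on initialize being called with
# constructor-only arguments
setClass("B", contains = "A")

A(3)@x            # 6
new("B", x = 1)   # inherits A's initialize method; x stays 1
```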