On 30 Jun 2010, at 22:55, Allan Engelhardt wrote:

>> > a$z=z
> You are (kind of) assigning *two* columns from the data frame "z" to
> the name 'z' in "a" which is probably not going to work as you
> expect. R tries to be clever which may or may not be a Good Thing.
> Try
>
> a$z1 <- z[,1]
> a$z2 <- z[,2]
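Allan's per-column assignment does not have to hard-code the number of columns of z. Below is a minimal sketch that assumes z is a data.frame with the same number of rows as a. The names z1, z2, ... are only an illustration, and paste0() requires a reasonably recent R.

```r
# Example data from the original post
a <- data.frame(x = 1:10, y = 1:10)
z <- data.frame(1:10, 11:20)

# Generalisation of  a$z1 <- z[,1]; a$z2 <- z[,2]  to any ncol(z):
# one ordinary column per column of z, named z1, z2, ... (illustrative names)
a[paste0("z", seq_len(ncol(z)))] <- z

# x, y, z1 and z2 are now all plain integer columns, with no nested data.frame
str(a)
```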
Yes, the problem is that I wanted my code to work on data where the number of columns is variable. Of course even that can be handled, just in a much uglier way than simply assigning the result of the computation to part of the data.frame. I was mainly asking how I could have avoided the bug in the first place; once I found it, it was easy to solve.

I tried to track the problem down further. As I said before, the problem appears if one does

a=data.frame(1:10,1:10)
a$z=a
rbind(a,a)

In this case str(a) gives:
---
> str(a)
'data.frame': 10 obs. of 3 variables:
 $ X1.10  : int 1 2 3 4 5 6 7 8 9 10
 $ X1.10.1: int 1 2 3 4 5 6 7 8 9 10
 $ z      :'data.frame': 10 obs. of 2 variables:
  ..$ X1.10  : int 1 2 3 4 5 6 7 8 9 10
  ..$ X1.10.1: int 1 2 3 4 5 6 7 8 9 10
---

The problem does not occur if one does

a=data.frame(1:10,1:10)
a=data.frame(1:10,1:10,z=a)
rbind(a,a)

Here everything works fine. In this case str(a) gives:
---
> str(a)
'data.frame': 10 obs. of 4 variables:
 $ X1.10    : int 1 2 3 4 5 6 7 8 9 10
 $ X1.10.1  : int 1 2 3 4 5 6 7 8 9 10
 $ z.X1.10  : int 1 2 3 4 5 6 7 8 9 10
 $ z.X1.10.1: int 1 2 3 4 5 6 7 8 9 10
--

The problem also does not occur when I do:

a=data.frame(1:10,1:10)
a$z=as.matrix(a)
rbind(a,a)

In this case, str(a) gives:
--
> str(a)
'data.frame': 10 obs. of 3 variables:
 $ X1.10  : int 1 2 3 4 5 6 7 8 9 10
 $ X1.10.1: int 1 2 3 4 5 6 7 8 9 10
 $ z      : int [1:10, 1:2] 1 2 3 4 5 6 7 8 9 10 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr "X1.10" "X1.10.1"
--

Now, looking at the code of rbind.data.frame, the error comes from these lines:
--
xij <- xi[[j]]
if (has.dim[jj]) {
    value[[jj]][ri, ] <- xij
    rownames(value[[jj]])[ri] <- rownames(xij)   # <-- problem is here
}
--
If the rownames() line is dropped, everything works. What this line tries to do is join the rownames of the nested elements of the data.frames being combined. So in my case the result should have a column z whose rownames are the rownames of the original z columns.
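The three cases above differ in what ends up stored under z. The following is a minimal sketch that reuses the data from the original post: `$<-` keeps z as one nested data.frame column, while data.frame()/cbind() splice its columns in as ordinary, "z."-prefixed columns, whatever ncol(z) is. The helper has_nested() is hypothetical and is meant only as a cheap guard against accidental nesting, in the spirit of Allan's "always check using str" advice.

```r
a <- data.frame(x = 1:10, y = 1:10)
z <- data.frame(1:10, 11:20)

# `$<-` stores the whole data frame z as a single, nested column
nested <- a
nested$z <- z

# cbind() on data frames goes through data.frame(), which splices z's
# columns in as ordinary columns prefixed "z.", however many there are
flat <- cbind(a, z = z)

# Hypothetical helper: does any column of df hold a data frame?
has_nested <- function(df) any(vapply(df, is.data.frame, logical(1)))

has_nested(nested)   # TRUE
has_nested(flat)     # FALSE
names(flat)          # "x" "y" "z.X1.10" "z.X11.20"

# The flat version stacks without involving any nested row names
dim(rbind(flat, flat))   # 20 4
```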
It isn't entirely clear to me why this is needed. When would a data.frame have different rownames on the inside than on the outside? Notice also that rbind takes into account whether the rownames of the data.frames being joined are simply 1:n or something else. If they are 1:n (and 1:m), the result will have rownames 1:(n+m). If not, the original rownames may be kept.

I think it would be more consistent to replace the lines above with something like:
--
if (has.dim[jj]) {
    value[[jj]][ri, ] <- xij
    rnj = rownames(value[[jj]])
    rnj[ri] = rownames(xij)
    rnj = make.unique(as.character(unlist(rnj)), sep = "")
    rownames(value[[jj]]) <- rnj
}
--
In this case the rownames of the nested elements will also be joined, but where they overlap they will be made unique, just as they are for the overall result of rbind. A side effect would be that the rownames of matrices would also be made unique. That has not happened until now, and it also does not happen when one rbinds matrices that have rownames. So it would be better for the code above to test whether it is dealing with a matrix or a data.frame. But most people don't have different rownames inside and outside. Maybe it would be best to add a flag saying whether you care about the rownames of nested data.frames...

Testing a bit further, I created a data.frame that contains a data.frame which itself contains another data.frame:
--
> a
   X1.10 X1.10 zz.X1.10 zz.X1.10
1      1     1        1        1
2      2     2        2        2
3      3     3        3        3
4      4     4        4        4
5      5     5        5        5
6      6     6        6        6
7      7     7        7        7
8      8     8        8        8
9      9     9        9        9
10    10    10       10       10
> str(a)
'data.frame': 10 obs. of 3 variables:
 $ X1.10: int 1 2 3 4 5 6 7 8 9 10
 $ z    :'data.frame': 10 obs. of 1 variable:
  ..$ X1.10: int 1 2 3 4 5 6 7 8 9 10
 $ zz   :'data.frame': 10 obs. of 2 variables:
  ..$ X1.10: int 1 2 3 4 5 6 7 8 9 10
  ..$ z    :'data.frame': 10 obs. of 1 variable:
  .. ..$ X1.10: int 1 2 3 4 5 6 7 8 9 10
--
(b is similar.) One can carefully change rownames(a$z) and rownames(a$zz), and rownames(b$z) and rownames(b$zz), so that rbind(a,b) works.
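The de-duplication rule in the make.unique() patch proposed above, and the matrix side effect it would introduce, can be checked in isolation. This is a sketch only: the patch is a proposal, not part of R. The helper join_block_rownames() is hypothetical and shows one way to apply the rule to nested data.frames while leaving matrices as rbind() already treats them.

```r
# Row names of two nested blocks that rbind(a, a) would have to join
rn_old <- as.character(1:3)
rn_new <- as.character(1:3)

# Proposed rule: concatenate, then de-duplicate with make.unique(sep = "")
make.unique(c(rn_old, rn_new), sep = "")
# "1" "2" "3" "11" "21" "31"

# For comparison, rbind() on matrices keeps duplicated row names as they are
m <- matrix(1:4, nrow = 2, dimnames = list(c("a", "b"), NULL))
rownames(rbind(m, m))
# "a" "b" "a" "b"

# Hypothetical helper: de-duplicate only when the nested block is a data.frame
join_block_rownames <- function(block, rn_old, rn_new) {
  rnj <- c(rn_old, rn_new)
  if (is.data.frame(block)) make.unique(as.character(rnj), sep = "") else rnj
}
```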
The result seems quite nonsensical, though.

Another possible solution would be for

a=data.frame(...,z=X)

and

a=data.frame(...)
a$z=X

to behave in the same way...

Michael

> On 30/06/10 20:46, Michael Lachmann wrote:
>> It took me some time to find this bug in my code. Is this a feature
>> of R? Am I doing something wrong?
>>
>> > a=data.frame(x=1:10,y=1:10)
>> > b=data.frame(x=11:20,y=11:20)
>> > z=data.frame(1:10,11:20)
>>
>
> or equivalent to keep the names straight.
>
> As you have it, a$z is a data.frame, not a column, so you'd need
> a$z[,1] to get the 1:10 back from the original assignment of z.
>
> The default printing of a does not help: always check using str:
>
> > str(a)
> 'data.frame': 10 obs. of 3 variables:
>  $ x: int 1 2 3 4 5 6 7 8 9 10
>  $ y: int 1 2 3 4 5 6 7 8 9 10
>  $ z:'data.frame': 10 obs. of 2 variables:
>   ..$ X1.10 : int 1 2 3 4 5 6 7 8 9 10
>   ..$ X11.20: int 11 12 13 14 15 16 17 18 19 20
>
>
> Hope this helps a little.
>
> Allan

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.