Re: [R] problem with rbind on data.frames that contain data.frames

Michael Lachmann Wed, 30 Jun 2010 16:23:54 -0700

On 30 Jun 2010, at 22:55, Allan Engelhardt wrote:
>> > a$z=z
> You are (kind of) assigning *two* columns from the data frame "z" to
> the name 'z' in "a", which is probably not going to work as you
> expect. R tries to be clever, which may or may not be a Good Thing.
> Try
>
> a$z1 <- z[,1]
> a$z2 <- z[,2]


Yes, the problem is that I wanted my code to work on data where the
number of columns varies. Of course even that can be handled, but it
is much uglier than simply assigning the result of the computation to
part of the data.frame. I was mainly asking how I could have avoided
the bug in the first place; once I found it, it was easy to fix.
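
One way to keep a single assignment that still yields ordinary columns, whatever ncol(z) happens to be, is cbind() with a named data.frame argument: the columns of z are spliced in under a "z." prefix. A minimal sketch, assuming the a and z from the original post and assuming prefixed names like z.X1.10 are acceptable:

```r
a <- data.frame(x = 1:10, y = 1:10)
z <- data.frame(1:10, 11:20)   # column names become X1.10 and X11.20

# cbind() on data frames hands its arguments to data.frame(check.names = FALSE),
# which expands the named data.frame argument 'z' into one column per column of z
a <- cbind(a, z = z)
str(a)   # four plain integer columns: x, y, z.X1.10, z.X11.20

# every column is now an atomic vector, so rbind() has no inner rownames to merge
ab <- rbind(a, a)
```

This works for any number of columns in z, at the cost of no longer having a single a$z to refer to.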

I tried to track the problem down further.

As I said before, the problem appears if one does:
a=data.frame(1:10,1:10)
a$z=a
rbind(a,a)
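
Because the default print of a hides the nesting, a quick programmatic check for data.frame-valued columns can be run before calling rbind(). A minimal sketch using the same a; the variable name nested is only for illustration:

```r
a <- data.frame(1:10, 1:10)
a$z <- a                       # stores the whole data.frame as a single column

# TRUE for every column that is itself a data.frame
nested <- vapply(a, is.data.frame, logical(1))
print(nested)                  # only z is TRUE
ncol(a)                        # 3: z counts as one column
```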

In this case, str(a) gives:
---
 > str(a)
'data.frame':   10 obs. of  3 variables:
  $ X1.10  : int  1 2 3 4 5 6 7 8 9 10
  $ X1.10.1: int  1 2 3 4 5 6 7 8 9 10
  $ z      :'data.frame':        10 obs. of  2 variables:
   ..$ X1.10  : int  1 2 3 4 5 6 7 8 9 10
   ..$ X1.10.1: int  1 2 3 4 5 6 7 8 9 10
---
The problem does not occur if one does:
a=data.frame(1:10,1:10)
a=data.frame(1:10,1:10,z=a)
rbind(a,a)

Here everything works fine, and str(a) gives:
---
 > str(a)
'data.frame':   10 obs. of  4 variables:
  $ X1.10    : int  1 2 3 4 5 6 7 8 9 10
  $ X1.10.1  : int  1 2 3 4 5 6 7 8 9 10
  $ z.X1.10  : int  1 2 3 4 5 6 7 8 9 10
  $ z.X1.10.1: int  1 2 3 4 5 6 7 8 9 10
--
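
The flat layout above is what data.frame() builds when it is handed a data.frame argument, so an already-nested a can be flattened by passing its columns back through data.frame() with do.call(). A sketch, assuming only one level of nesting (deeper nesting needs a recursive approach):

```r
a <- data.frame(1:10, 1:10)
a$z <- a                          # nested: z is a data.frame column

# do.call() passes X1.10, X1.10.1 and z as named arguments to data.frame(),
# which expands the data.frame argument z into z.X1.10 and z.X1.10.1
flat <- do.call(data.frame, a)
names(flat)   # "X1.10" "X1.10.1" "z.X1.10" "z.X1.10.1"
ab <- rbind(flat, flat)           # 20 rows, no nested rownames involved
```
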
The problem also does not occur when I do:
a=data.frame(1:10,1:10)
a$z=as.matrix(a)
rbind(a,a)

In this case, str(a) gives:
--
 > str(a)
'data.frame':   10 obs. of  3 variables:
  $ X1.10  : int  1 2 3 4 5 6 7 8 9 10
  $ X1.10.1: int  1 2 3 4 5 6 7 8 9 10
  $ z      : int [1:10, 1:2] 1 2 3 4 5 6 7 8 9 10 ...
   ..- attr(*, "dimnames")=List of 2
   .. ..$ : NULL
   .. ..$ : chr  "X1.10" "X1.10.1"
--
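
Storing z as a matrix keeps a single column named z (convenient when the number of columns varies) while giving rbind() something it already knows how to stack. A sketch of that variant, with the same a as above:

```r
a <- data.frame(1:10, 1:10)
a$z <- as.matrix(a)             # a 10 x 2 integer matrix stored as one column

ab <- rbind(a, a)
dim(ab$z)                       # the matrix rows were stacked: 20 rows, 2 columns
```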

Now, looking at the code of rbind.data.frame, the error comes from the  
lines:
--
             xij <- xi[[j]]
             if (has.dim[jj]) {
                 value[[jj]][ri, ] <- xij
                 rownames(value[[jj]])[ri] <- rownames(xij)   # <-- problem is here
             }
--
If the rownames() line is dropped, everything works. This line tries
to combine the rownames of the inner elements of the data.frames
being joined, so in my case the result should have a column z whose
rownames are the rownames of the original z columns. It isn't
entirely clear to me why this is needed: when would a data.frame have
different rownames on the inside than on the outside?
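
One possible answer: $<- stores whatever data.frame it is given, rownames included, so the inner and outer rownames can disagree. And since a data.frame may not have duplicated rownames, copying the inner "1".."10" a second time into the combined inner column fails. A small sketch of both points; the objects a, inner and res are only illustrative:

```r
a <- data.frame(x = 1:3)
a$z <- data.frame(v = 1:3, row.names = c("p", "q", "r"))
rownames(a)       # "1" "2" "3"
rownames(a$z)     # "p" "q" "r": the inner rownames were kept as given

# roughly what the quoted line attempts on the second pass for the inner
# data.frame: giving rows 4:6 rownames that rows 1:3 already use
inner <- data.frame(v = 1:6)
res <- tryCatch(suppressWarnings({
  rownames(inner) <- c("1", "2", "3", "1", "2", "3")
  "ok"
}), error = function(e) conditionMessage(e))
res               # an error message about duplicate 'row.names'
```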

Notice also that rbind checks whether the rownames of the data.frames
to be joined are simply 1:n or something else. If they are 1:n, the
result has rownames 1:(n+m). If not, the original rownames are
generally kept, with any duplicates made unique.
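
The two cases can be seen with small frames; a sketch in which a, b, a2 and b2 are made-up examples:

```r
a <- data.frame(x = 1:3)                 # automatic rownames 1:3
b <- data.frame(x = 4:5)                 # automatic rownames 1:2
rownames(rbind(a, b))                    # "1" "2" "3" "4" "5"

a2 <- data.frame(x = 1:3, row.names = c("p", "q", "r"))
b2 <- data.frame(x = 4:5, row.names = c("p", "s"))
rn <- rownames(rbind(a2, b2))            # the clashing "p" is de-duplicated
rn
```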

I think it would be more consistent to replace the lines above with
something like:
             if (has.dim[jj]) {
                 value[[jj]][ri, ] <- xij
                 rnj = rownames(value[[jj]])
                 rnj[ri] = rownames(xij)
                 rnj = make.unique(as.character(unlist(rnj)), sep = "")
                 rownames(value[[jj]]) <- rnj
             }
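
The proposed replacement can be tried outside rbind.data.frame as a standalone function. In the sketch below, merge_inner_rownames() is a hypothetical name; it only reproduces the three rnj lines of the patch:

```r
# Hypothetical helper reproducing the proposed patch: overwrite positions ri
# of the current inner rownames and de-duplicate the result with make.unique()
merge_inner_rownames <- function(current, ri, incoming) {
  rnj <- current
  rnj[ri] <- incoming
  make.unique(as.character(unlist(rnj)), sep = "")
}

# inner data.frame column after stacking two copies of a 3-row piece
rn <- merge_inner_rownames(as.character(1:6), 4:6, as.character(1:3))
rn   # "1" "2" "3" "11" "21" "31"

inner <- data.frame(v = 1:6)
rownames(inner) <- rn      # accepted, since the names are now unique
```

With sep = "" a de-duplicated "11" looks like an ordinary row 11; that is the same convention rbind already uses for the outer rownames.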

This way the rownames of the inner elements are still joined, but
where they overlap they are made unique, just as they are for the
overall result of rbind. A side effect would be that the rownames of
matrix columns are also made unique, which did not happen until now
and which also doesn't happen when one rbinds matrices that have
rownames. So it would be better for the code above to test whether it
is dealing with a matrix or a data.frame.
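
That test could look roughly like this. fill_inner_rownames() is a hypothetical stand-alone function, not part of R; it forces uniqueness only for data.frame columns and leaves matrix columns with the plain matrix behaviour:

```r
# Hypothetical variant of the patch: merge the rownames at positions ri,
# but only force uniqueness when the column is a data.frame, so matrix
# columns keep the usual rbind() behaviour of allowing repeated rownames
fill_inner_rownames <- function(col, ri, incoming) {
  rn <- rownames(col)
  if (is.null(rn) || is.null(incoming)) return(col)   # nothing to copy
  rn[ri] <- incoming
  if (is.data.frame(col)) rn <- make.unique(as.character(rn), sep = "")
  rownames(col) <- rn
  col
}

m <- matrix(1:8, nrow = 4, dimnames = list(c("a", "b", "x", "y"), NULL))
rownames(fill_inner_rownames(m, 3:4, c("a", "b")))   # "a" "b" "a" "b"

d <- data.frame(v = 1:4)
rownames(fill_inner_rownames(d, 3:4, c("1", "2")))   # "1" "2" "11" "21"
```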

But most people don't have different rownames inside and outside.
Maybe it would be best to add a flag saying whether or not the
rownames of internal data.frames should be preserved.

Testing a bit further, I created a data.frame that contains a
data.frame which in turn contains another data.frame:
--
 > a
    X1.10 X1.10 zz.X1.10 zz.X1.10
1      1     1        1        1
2      2     2        2        2
3      3     3        3        3
4      4     4        4        4
5      5     5        5        5
6      6     6        6        6
7      7     7        7        7
8      8     8        8        8
9      9     9        9        9
10    10    10       10       10
 > str(a)
'data.frame':   10 obs. of  3 variables:
  $ X1.10: int  1 2 3 4 5 6 7 8 9 10
  $ z    :'data.frame':  10 obs. of  1 variable:
   ..$ X1.10: int  1 2 3 4 5 6 7 8 9 10
  $ zz   :'data.frame':  10 obs. of  2 variables:
   ..$ X1.10: int  1 2 3 4 5 6 7 8 9 10
   ..$ z    :'data.frame':       10 obs. of  1 variable:
   .. ..$ X1.10: int  1 2 3 4 5 6 7 8 9 10
--
(b is similar.) By carefully changing rownames(a$z), rownames(a$zz),
rownames(b$z) and rownames(b$zz), one can make rbind(a,b) work. The
result seems quite nonsensical, though.
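
Instead of adjusting the inner rownames by hand, a nested data.frame like this can be flattened recursively before calling rbind(). A sketch, assuming dotted names such as zz.z.r are acceptable; flatten_columns() is a hypothetical helper, and the small a below only mimics the structure shown above:

```r
# Hypothetical helper: walk the columns and return a flat named list,
# descending into any column that is itself a data.frame
flatten_columns <- function(df, prefix = "") {
  out <- list()
  for (nm in names(df)) {
    col <- df[[nm]]
    full <- if (nzchar(prefix)) paste(prefix, nm, sep = ".") else nm
    if (is.data.frame(col)) {
      out <- c(out, flatten_columns(col, full))   # recurse into nested frames
    } else {
      out[[full]] <- col
    }
  }
  out
}

# same shape as above: a$z and a$zz are data.frames, and a$zz$z is one too
zz <- data.frame(q = 1:3)
zz$z <- data.frame(r = 1:3)
a <- data.frame(x = 1:3)
a$z <- data.frame(p = 1:3)
a$zz <- zz

flat <- data.frame(flatten_columns(a), check.names = FALSE)
names(flat)          # "x" "z.p" "zz.q" "zz.z.r"
ab <- rbind(flat, flat)
```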

Another possible solution would be to make
a=data.frame(...,z=X)
and
a=data.frame(...)
a$z=X
behave the same way.
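
The difference between the two forms is easy to see side by side. A minimal sketch with a small, made-up X:

```r
X <- data.frame(p = 1:3, q = 4:6)

a1 <- data.frame(x = 1:3, z = X)   # data.frame() splices X into z.p and z.q
a2 <- data.frame(x = 1:3)
a2$z <- X                          # $<- stores X as one data.frame column

names(a1)                          # "x" "z.p" "z.q"
names(a2)                          # "x" "z"
is.data.frame(a2$z)                # TRUE
```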

Michael

> On 30/06/10 20:46, Michael Lachmann wrote:
>> It took me some time to find this bug in my code. Is this a feature  
>> of R? Am I doing something wrong?
>>
>> > a=data.frame(x=1:10,y=1:10)
>> > b=data.frame(x=11:20,y=11:20)
>> > z=data.frame(1:10,11:20)
>>
>>
>
> or equivalent to keep the names straight.
>
> As you have it, a$z is a data.frame, not a column, so you'd need
> a$z[,1] to get the 1:10 back from the original assignment of z.
>
> The default printing of a does not help: always check using str:
>
> > str(a)
> 'data.frame':    10 obs. of  3 variables:
> $ x: int  1 2 3 4 5 6 7 8 9 10
> $ y: int  1 2 3 4 5 6 7 8 9 10
> $ z:'data.frame':    10 obs. of  2 variables:
>  ..$ X1.10 : int  1 2 3 4 5 6 7 8 9 10
>  ..$ X11.20: int  11 12 13 14 15 16 17 18 19 20
>
>
> Hope this helps a little.
>
> Allan
>



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.