On Tue, Oct 26, 2010 at 8:37 PM, Matt Curcio <matt.curcio...@gmail.com> wrote:
> Hi All,
> I am learning R and having a little trouble with the usage and proper
> definitions of data.frames vs. matrix vs. vectors. I have read many R
> tutorials, and looked over umpteen 'cheat' sheets and have found that
> no one has articulated a really good definition of the differences
> between 'data.frames', 'matrix', and 'arrays' and even 'factors'. I
> realize that I might have missed someone's R tutorial, and actually
> would like to receive 'your' most concise or most useful tutorial.
> Any help would be appreciated.
>
> My particular favorite explanation and helpful hint is from the
> 'R-Inferno'. Don't get me wrong... I think this pdf is great and
> some tables are excellent. Overall it is a very good primer but this
> one section leaves me puzzled. This quote belies the lack of hard and
> fast rules for what and when to use 'data.frames', 'matrix', and
> 'arrays'. It discusses ways in which to simplify your work.
>
> Here are a few possibilities for simplifying:
> • Don’t use a list when an atomic vector will do.
> • Don’t use a data frame when a matrix will do.
> • Don’t try to use an atomic vector when a list is needed.
> • Don’t try to use a matrix when a data frame is needed.
>
> Cheers,
> Matt C

Look at their internal representations and it will become clearer.

v, a vector, has length 6. m, a matrix, is the same as the vector v
except that it has dimensions too. Since m is just a vector with
dimensions, m has length 6 as well. L is a list and has length 2
because it is a vector each of whose components is itself a vector.
DF is a data frame and is the same as L except that its 2 components
must each have the same length and it must have row and column names.
If you don't assign the row and column names, they are automatically
generated, as the output below shows. Note that row.names = c(NA, -3L)
is a short form for row names of 1:3, and .Names internally refers to
column names.

> v <- 1:6 # vector
> dput(v)
1:6
>
> m <- v; dim(m) <- 2:3 # m is a matrix since we added dimensions
> dput(m)
structure(1:6, .Dim = 2:3)
>
> L <- list(1:3, 4:6)
> dput(L)
list(1:3, 4:6)
>
> DF <- data.frame(1:3, 4:6)
> dput(DF)
structure(list(X1.3 = 1:3, X4.6 = 4:6), .Names = c("X1.3", "X4.6"
), row.names = c(NA, -3L), class = "data.frame")
>

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
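
P.S. A short sketch for checking the points made in the reply at the console. It rebuilds the same v, m, L and DF. The names lens, m2 and DF2 are made up for this illustration, and stringsAsFactors = FALSE is set explicitly only so the result does not depend on the R version. Treat it as one way to poke at the objects, not a complete account of the differences.

```r
# Rebuild the four objects discussed in the reply
v <- 1:6
m <- v; dim(m) <- 2:3
L <- list(1:3, 4:6)
DF <- data.frame(1:3, 4:6)

# A matrix counts its elements; a list or data frame counts its components
lens <- c(v = length(v), m = length(m), L = length(L), DF = length(DF))
print(lens)

# The dim attribute is the only thing separating m from v
print(attributes(v))               # NULL
print(attributes(m))               # $dim is 2 3
print(identical(as.vector(m), v))  # TRUE

# A data frame is a list underneath
print(is.list(DF))                 # TRUE
print(typeof(DF))                  # "list"

# Mixed types: a matrix coerces everything to one type,
# while a data frame keeps each column's own type
m2 <- cbind(1:3, c("a", "b", "c"))
print(typeof(m2))                  # "character"

DF2 <- data.frame(n = 1:3, s = c("a", "b", "c"), stringsAsFactors = FALSE)
print(sapply(DF2, class))          # n is "integer", s is "character"
```

The last two checks are one reading of the R-Inferno hints quoted in the question: when every column has the same type a matrix will do, and when columns need different types a data frame is needed.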