My comments are in the text.

On 10/27/2010 12:11, Gabor Grothendieck wrote:
On Wed, Oct 27, 2010 at 4:03 AM, Ivan Calandra
<ivan.calan...@uni-hamburg.de>  wrote:
Hi,

Gabor gave you a great answer already. But I would add a few clarifications.
Someone please correct me if I'm wrong.

Arrays are matrices with more than 2 dimensions. Put the other way: matrices
are arrays with only 2 dimensions.
Arrays can have any number of dimensions including 1, 2, 3, etc.

        >  # a 2d array is a matrix. It's composed of a vector plus two dimensions.
        >  m<- array(1:4, c(2, 2))
        >  dput(m)
        structure(1:4, .Dim = c(2L, 2L))
        >  class(m)
        [1] "matrix"
        >  is.array(m)
        [1] TRUE

        >  # a 1d array is a vector plus a single dimension
        >  a1<- array(1:4, 4)
        >  dput(a1)
        structure(1:4, .Dim = 4L)
        >  dim(a1)
        [1] 4
        >  class(a1)
        [1] "array"
        >  is.array(a1)
        [1] TRUE

        >  # if we remove the dimension part it's no longer an array but just a vector
        >  nota<- a1
        >  dim(nota)<- NULL
        >  dput(nota)
        1:4
        >  is.array(nota)
        [1] FALSE
        >  is.vector(nota)
        [1] TRUE
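
The same pattern extends beyond two dimensions. A minimal sketch, where the name a3 and the shape 2 x 3 x 4 are chosen purely for illustration:

```r
# A 3-d array: the same kind of vector, now carrying three dimensions.
a3 <- array(1:24, dim = c(2, 3, 4))
dim(a3)          # 2 3 4
length(a3)       # 24, the length of the underlying vector
is.array(a3)     # TRUE
is.matrix(a3)    # FALSE, since a matrix has exactly two dimensions
a3[2, 3, 4]      # 24, the last element (storage is column-major)
```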
What I don't understand is why vectors (with more than one value) don't have dimensions. They look like they do have 1 dimension. For me no dimension would be a scalar. Like in geometry: a point has no dimension, a line has 1, a square has 2, a cube 3 and so on. Is it because of some internal process? The intuitive geometry way of thinking is not programmatically relevant?
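
One way to look at this question: in R, "dimension" refers to the dim attribute rather than to geometry, and a plain vector has a length but no dim attribute at all. What other languages call a scalar is just a vector of length one. A small sketch (the names v, s and a1 are only for illustration):

```r
v <- 1:4
dim(v)                        # NULL: a plain vector carries no dim attribute
length(v)                     # 4

s <- 5                        # what other languages would call a scalar
length(s)                     # 1: in R this is a vector of length one
is.vector(s)                  # TRUE

a1 <- array(v, dim = length(v))
dim(a1)                       # 4: same data, now with a dim attribute
identical(as.vector(a1), v)   # TRUE: dropping dim gives back the vector
```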


I would also add these:
- the components of a vector have to be of the same mode (character,
numeric, integer...)
however, a list with no attributes is a vector too so this is a vector:

    >   vl<- list(sin, 3, "a")
    >   is.vector(vl)
    [1] TRUE

A vector may not have attributes so arrays and factors are not vectors
although they are composed from vectors.
That's also completely unexpected for me! What is then a vector?! And then the difference between a vector and a list?! I mean, in practice, it's not so important, my understanding is probably enough for what I'm doing in R, but I'd like to understand how it works.

Also you wrote that a vector may not have attributes. I might be wrong (and certainly am), but aren't names attributes? So why is a named list still a vector:
my.list <- list(num=1:3, let=LETTERS[1:2])
names(my.list)
[1] "num" "let"
is.vector(my.list)
[1] TRUE
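
As far as is.vector() is concerned, names are the one exception: it returns TRUE for objects with no attributes other than names. A short sketch of that behaviour (the "unit" attribute below is made up for the example):

```r
x <- c(a = 1, b = 2, c = 3)
is.vector(x)               # TRUE: names are the one attribute is.vector() tolerates

y <- x
attr(y, "unit") <- "cm"    # "unit" is an arbitrary, made-up attribute
is.vector(y)               # FALSE: an attribute other than names is present

fac <- factor(c("b", "a", "b"))
is.vector(fac)             # FALSE: factors carry levels and class attributes
```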
- which implies that the components of matrices and arrays have to be also
of the same mode (which might lead to some coercion of your data if you
don't pay attention to it).
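
A quick illustration of that coercion, with made-up values. Mixing types in c() or cbind() silently converts everything to the most general type involved:

```r
mixed <- c(1, "a", TRUE)
mixed                      # "1" "a" "TRUE"
typeof(mixed)              # "character"

m <- cbind(1:2, c("a", "b"))
typeof(m)                  # "character": the integer column was converted too
m[1, 1]                    # "1", a string and no longer a number
```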

Factors are character data, but coded as numeric mode. Each number is
associated with a given string, the so-called levels. Here is an example:
my.fac<- factor(c("something", "other", "more", "something", "other",
"more"))
A factor is composed of an integer vector plus a levels attribute
(called .Label internally) as in this code:

    >  fac<- factor(c("b", "a", "b"))
    >  dput(fac)
    structure(c(2L, 1L, 2L), .Label = c("a", "b"), class = "factor")
    >  levels(fac)
    [1] "a" "b"
I like this explanation for a factor, I couldn't find these exact words!

Thanks for the clarifications anyway!
Ivan

my.fac
  [1] something other     more      something other     more
  Levels: more other something
mode(my.fac)
  [1] "numeric"    ## coded as numeric even though you gave character strings!
class(my.fac)
  [1] "factor"
levels(my.fac)
  [1] "more"      "other"     "something"
as.numeric(my.fac)
  [1] 3 2 1 3 2 1                  ## internal representation
as.character(my.fac)
[1] "something" "other"     "more"      "something" "other"     "more"    ## what you think it is!
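
The internal coding matters most when the labels look like numbers. A sketch of the usual pitfall, with example values chosen only for illustration:

```r
f <- factor(c("10", "20", "10"))
as.numeric(f)                  # 1 2 1: the internal integer codes
as.numeric(as.character(f))    # 10 20 10: the values the labels spell out
as.numeric(levels(f))[f]       # 10 20 10: same result, going through the levels
typeof(f)                      # "integer"
```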

I found that the book "Data Manipulation with R" from Phil Spector (2008)
  was quite well done to explain all these object modes and classes, even
though I wouldn't have understood completely by reading only this book (not
that I have yet completely mastered this topic...)

HTH,
Ivan



On 10/27/2010 02:49, Gabor Grothendieck wrote:
On Tue, Oct 26, 2010 at 8:37 PM, Matt Curcio<matt.curcio...@gmail.com>
  wrote:
Hi All,
I am learning R and having a little trouble with the usage and proper
definitions of data.frames vs. matrix vs vectors. I have read many R
tutorials, and looked over umpteen 'cheat' sheets and have found that
no one has articulated a really good definition of the differences
between 'data.frames', 'matrix', and 'arrays' and even 'factors'.  I
realize that I might have missed someone's R tutorial, and actually
would like to receive 'your' most concise or most useful tutorial.
Any help would be appreciated.

My particular favorite explanation and helpful hint is from the
'R-Inferno'.  Don't get me wrong...  I think this pdf is great and
some tables are excellent. Overall it is a very good primer but this
one section leaves me puzzled.  This quote belies the lack of hard and
fast rules for what and when to use 'data.frames', 'matrix', and
'arrays'.  It discusses ways in which to simplify your work.

Here are a few possibilities for simplifying:
• Don’t use a list when an atomic vector will do.
• Don’t use a data frame when a matrix will do.
• Don’t try to use an atomic vector when a list is needed.
• Don’t try to use a matrix when a data frame is needed.

Cheers,
Matt C
Look at their internal representations and it will become clearer.  v,
a vector, has length 6.  m, a matrix, is actually the same as the
vector v except it has dimensions too. Since m is just a vector with
dimensions, m has length 6 as well.  L is a list and has length 2
because it's a vector each of whose components is itself a vector.  DF
is a data frame and is the same as L except its 2 components must each
have the same length and it must have row and column names.  If you
don't assign the row and column names they are automatically generated
as we can see.  Note that row.names = c(NA, -3L) is a short form for
row names of 1:3 and .Names internally refers to column names.

v<- 1:6 # vector
dput(v)
1:6
m<- v; dim(m)<- 2:3 # m is a matrix since we added dimensions
dput(m)
structure(1:6, .Dim = 2:3)
L<- list(1:3, 4:6)
dput(L)
list(1:3, 4:6)
DF<- data.frame(1:3, 4:6)
dput(DF)
structure(list(X1.3 = 1:3, X4.6 = 4:6), .Names = c("X1.3", "X4.6"
), row.names = c(NA, -3L), class = "data.frame")
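
The lengths described above can be checked directly. A small sketch that reuses the same four objects:

```r
v <- 1:6
m <- v; dim(m) <- 2:3
L <- list(1:3, 4:6)
DF <- data.frame(1:3, 4:6)

length(v); length(m)             # 6 and 6: m is v plus a dim attribute
length(L); length(DF)            # 2 and 2: one element per component/column
is.list(DF)                      # TRUE: a data frame is a list underneath
names(DF)                        # "X1.3" "X4.6", generated automatically
rownames(DF)                     # "1" "2" "3", generated automatically
identical(DF[[1]], L[[1]])       # TRUE: the first column is the same vector
```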

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



