Thank you for your response. Note that with R 3.4.3, I get the same result with simplify=TRUE or simplify=FALSE.

My problem was that the behaviour differed depending on whether I defined my columns as character or as numeric, but a few minutes ago I discovered that the data.frame function also has a stringsAsFactors option. So yes, it was a stupid question and I apologize for it.
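
For anyone reading the archive later, here is a minimal sketch of what stringsAsFactors does. The names chr, df_fac and df_chr are only for illustration. In R 3.4.3 the default is stringsAsFactors=TRUE (it changed to FALSE in R 4.0.0), which is why my character columns behaved like factors:

```r
# Illustrative data: the B column from the original post, as character strings
chr <- c("1", "0", "1", "0", "0", "0", "1", "0")

# With stringsAsFactors = TRUE the character vector is converted to a factor
df_fac <- data.frame(B = chr, stringsAsFactors = TRUE)

# With stringsAsFactors = FALSE it stays a plain character vector
df_chr <- data.frame(B = chr, stringsAsFactors = FALSE)

class(df_fac$B)   # "factor"
class(df_chr$B)   # "character"
levels(df_fac$B)  # "0" "1"
```

Setting the option explicitly, as above, avoids depending on the default of the R version in use.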


On 06/02/2018 18:07, William Dunlap wrote:
Don't use aggregate's simplify=TRUE when FUN() produces return
values of various dimensions.  In your case, the shape of table(subset)'s
return value depends on the number of levels in the factor 'subset'.
If you make B a factor before splitting it by C, each split will have the
same number of levels (2).  If you split it and then let table convert
each split to a factor, one split will have 1 level and the other 2.  To see
the details of the output, use str() instead of print().
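
The point above can be checked with a short sketch, assuming the numeric data frame from the original post. The names per_split, per_split_f and res are only for illustration:

```r
# Numeric data from the original post
df <- data.frame(A = c(1, 1, 1, 1, 0, 0, 0, 0),
                 B = c(1, 0, 1, 0, 0, 0, 1, 0),
                 C = c(1, 0, 1, 0, 0, 1, 1, 1))

# Split first: table() converts each piece to a factor on its own,
# so a piece only gets the levels that actually occur in it
per_split <- lapply(split(df$B, df$C), table)
lengths(per_split)    # group "0": 1 cell, group "1": 2 cells

# Make B a factor first: every piece keeps both levels
per_split_f <- lapply(split(factor(df$B), df$C), table)
lengths(per_split_f)  # 2 cells in each group

# With B as a factor, aggregate() can simplify to a matrix column
df$B <- factor(df$B)
res <- aggregate(df["B"], list(df$C), table)
str(res)              # shows the structure that print() hides
```

Because the per-group tables have equal length only in the second case, only then can simplify=TRUE bind them into a matrix column.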


Bill Dunlap
TIBCO Software
wdunlap tibco.com <http://tibco.com>

On Tue, Feb 6, 2018 at 12:20 AM, Alain Guillet <alain.guil...@uclouvain.be> wrote:

    Dear R users,

    When I use aggregate with table as FUN, I get what I would call
    strange behaviour if it involves numerical vectors and one "level"
    of a vector is not present for every "level" of the "by" variable:

    ---------------------------

    > df <-
    data.frame(A=c(1,1,1,1,0,0,0,0),B=c(1,0,1,0,0,0,1,0),C=c(1,0,1,0,0,1,1,1))
    > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
      Group.1 A.0 A.1    B
    1       0   1   2    3
    2       1   3   2 2, 3

    > table(df$C,df$B)

        0 1
      0 3 0
      1 2 3

    ---------------

    As you can see, a comma appears in the column for the variable B
    in the aggregate output, whereas when I call table directly I
    obtain the same result as if B were defined as a factor (I suppose
    this comes from the fact that "non-factor arguments a are coerced
    via factor", according to the details of the table help). I find
    this completely normal when I remember that aggregate first splits
    the data into subsets and then computes the table on each subset.
    But then I don't understand why it works differently with
    character vectors. Indeed, if I use character vectors, I get the
    same result as with factors:

    ------------------------

    > df <-
    data.frame(A=factor(c("1","1","1","1","0","0","0","0")),B=factor(c("1","0","1","0","0","0","1","0")),C=factor(c("1","0","1","0","0","1","1","1")))
    > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
      Group.1 A.0 A.1 B.0 B.1
    1       0   1   2   3   0
    2       1   3   2   2   3

    > df <-
    data.frame(A=factor(c(1,1,1,1,0,0,0,0)),B=factor(c(1,0,1,0,0,0,1,0)),C=factor(c(1,0,1,0,0,1,1,1)))
    > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
      Group.1 A.0 A.1 B.0 B.1
    1       0   1   2   3   0
    2       1   3   2   2   3

    ---------------------

    Would it be possible to clarify this behaviour in the aggregate
    help, since the result is not fully consistent with what the table
    help leads one to expect? Or would it be possible to get the same
    results regardless of the vector type? This post was rejected on
    the R-devel mailing list, so I am asking my question here as
    suggested.


    Best regards,
    Alain Guillet

--

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
