Peter,
Thanks for the response. I have no wish to prolong this and have no axe to 
grind. I’m sure you were delighted to see another stringsAsFactors issue.

Perhaps we are talking about the conflation of two steps: the first is the 
language-‘pure’ conversion of the table to a data.frame with the cross-tab 
factor, followed by an optional second step, with programmatic utility for a 
specific application, that converts that factor to a character column.

As my toy example shows, the as.data.frame.table() function accepts the 
inline stringsAsFactors argument and, with stringsAsFactors=FALSE, returns a 
data.frame in which the cross-tab factor column is coerced to a character 
column, so both steps can be accomplished in a single function call.

If you intend the function to only meet the first step, then I would suggest 
you remove stringsAsFactors as an argument to this function and amend the 
documentation.  
Following this, if an application needed a coercion to a character, then it 
should be accomplished in a second step. 
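
To make the two routes concrete, here is a minimal sketch using the same toy 
vector as in my original report. The names d_one and d_two are just 
illustrative; nothing here depends on the session option, since the argument 
is either passed explicitly or left at the method's own default.

```r
x <- c("a", "b", "c", "a", "b")
tab <- table(x)

# One step: ask the table method for a character column directly.
d_one <- as.data.frame(tab, stringsAsFactors = FALSE)

# Two steps: take the method's default (factor) column, then coerce it.
d_two <- as.data.frame(tab)
d_two$x <- as.character(d_two$x)

# Both routes should end with a character column holding the same labels.
class(d_one$x)
class(d_two$x)
identical(d_one$x, d_two$x)
```

If the argument were removed, only the second route would remain available.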

If you are implying that the core team intended options(stringsAsFactors) to be 
a ‘selective’ global option, then I guess I am confused, as I have not seen 
documentation about a limited scope of the session-wide options(). 

?options
  ‘stringsAsFactors’: The default setting for arguments of
          ‘data.frame’ and ‘read.table’.

As a practical programming matter this inconsistency created a bug in our code 
that was very insidious and cost hours of debugging and a lot of head 
scratching. Chars and factors are always prime candidates, but we never even 
considered that the session option would not have been respected by a low level 
core function in which the function call in the documentation explicitly 
included the inline argument.

?as.data.frame.table() 

From the Usage section of as.data.frame.table()

     ## S3 method for class 'table'
     as.data.frame(x, row.names = NULL, ...,
                   responseName = "Freq", stringsAsFactors = TRUE,
                   sep = "", base = list(LETTERS))
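
For what it is worth, the default can also be checked programmatically, and 
the workaround we have settled on is to pass the argument explicitly at every 
call site. The sketch below assumes the signature shown in the Usage above; 
table_to_df() and its strings_as_factors parameter are hypothetical names of 
our own, not part of base R.

```r
# The default lives in the function's formals. Per the Usage above it is the
# literal TRUE, not a lookup of the session option.
table_default <- formals(as.data.frame.table)$stringsAsFactors
print(table_default)

# Hypothetical wrapper (our own helper, not base R): the caller states the
# conversion explicitly instead of relying on options().
table_to_df <- function(tab, strings_as_factors = FALSE, ...) {
  stopifnot(is.table(tab))
  as.data.frame(tab, stringsAsFactors = strings_as_factors, ...)
}

x <- c("a", "b", "c", "a", "b")
d <- table_to_df(table(x))
class(d$x)
```

This does not resolve the question of what the option should govern, but it 
removes the dependence on session state from our own code.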


Thanks,  Joe.


> On Mar 14, 2019, at 11:18 AM, peter dalgaard <pda...@gmail.com> wrote:
> 
> I have no recollection of the original rationale for as.data.frame.table, but 
> I actually think it is fine as it is: 
> 
> The classifying _factors_ of a crosstable should be factors unless very 
> specifically directed otherwise and that should not depend on the setting of 
> an option that controls the conversion of character data. 
> 
> For as.data.frame.matrix, in contrast, it is the _content_ of the matrix that 
> is being converted, and it seems much more reasonable to follow the same path 
> as for other character data.
> 
> -pd
> 
>> On 12 Mar 2019, at 21:39 , Mychaleckyj, Josyf C (jcm6t) <jc...@virginia.edu> 
>> wrote:
>> 
>> Reporting a possible inconsistency or bug in handling stringsAsFactors in 
>> as.data.frame.table()
>> 
>> Here is a simple test
>> 
>>> options()$stringsAsFactors
>> [1] TRUE
>>> x<-c("a","b","c","a","b")
>>> d<-as.data.frame(table(x))
>>> d
>> x Freq
>> 1 a    2
>> 2 b    2
>> 3 c    1
>>> class(d$x)
>> [1] "factor"
>>> d2<-as.data.frame(table(x),stringsAsFactors=F)
>>> class(d2$x)
>> [1] "character"
>>> options(stringsAsFactors=F)
>>> options()$stringsAsFactors
>> [1] FALSE
>>> d3<-as.data.frame(table(x))
>>> d3
>> x Freq
>> 1 a    2
>> 2 b    2
>> 3 c    1
>>> class(d3$x)
>> [1] "factor"
>>> d4<-as.data.frame(table(x),stringsAsFactors=F)
>>> class(d4$x)
>> [1] "character"
>> 
>> 
>> # Display the code showing the different  stringsAsFactors handling in table 
>> and matrix:
>> 
>>> as.data.frame.table
>> function (x, row.names = NULL, ..., responseName = "Freq",
>>   stringsAsFactors = TRUE, sep = "", base = list(LETTERS))
>> {
>>   ex <- quote(data.frame(do.call("expand.grid", c(dimnames(provideDimnames(x,
>>       sep = sep, base = base)), KEEP.OUT.ATTRS = FALSE,
>>       stringsAsFactors = stringsAsFactors)),
>>       Freq = c(x), row.names = row.names))
>>   names(ex)[3L] <- responseName
>>   eval(ex)
>> }
>> <bytecode: 0x28769f8>
>> <environment: namespace:base>
>> 
>>> as.data.frame.matrix
>> function (x, row.names = NULL, optional = FALSE, make.names = TRUE,
>>   ..., stringsAsFactors = default.stringsAsFactors())
>> {
>>   d <- dim(x)
>>   nrows <- d[[1L]]
>>   ncols <- d[[2L]]
>>   ic <- seq_len(ncols)
>>   dn <- dimnames(x)
>>   if (is.null(row.names))
>>       row.names <- dn[[1L]]
>>   collabs <- dn[[2L]]
>>   if (any(empty <- !nzchar(collabs)))
>>       collabs[empty] <- paste0("V", ic)[empty]
>>   value <- vector("list", ncols)
>>   if (mode(x) == "character" && stringsAsFactors) {
>>       for (i in ic) value[[i]] <- as.factor(x[, i])
>>   }
>>   else {
>>       for (i in ic) value[[i]] <- as.vector(x[, i])
>>   }
>>   autoRN <- (is.null(row.names) || length(row.names) != nrows)
>>   if (length(collabs) == ncols)
>>       names(value) <- collabs
>>   else if (!optional)
>>       names(value) <- paste0("V", ic)
>>   class(value) <- "data.frame"
>>   if (autoRN)
>>       attr(value, "row.names") <- .set_row_names(nrows)
>>   else .rowNamesDF(value, make.names = make.names) <- row.names
>>   value
>> }
>> <bytecode: 0x29995c0>
>> <environment: namespace:base>
>> 
>> 
>>> sessionInfo()
>> R version 3.5.2 (2018-12-20)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: CentOS Linux 7 (Core)
>> 
>> Matrix products: default
>> BLAS: /usr/lib64/libblas.so.3.4.2
>> LAPACK: /usr/lib64/liblapack.so.3.4.2
>> 
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>> [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>> 
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>> 
>> loaded via a namespace (and not attached):
>> [1] compiler_3.5.2 tools_3.5.2
>> 
>> Thanks,
>> Joe
>> 
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> -- 
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd....@cbs.dk  Priv: pda...@gmail.com
> 
