On 08/05/2018 4:50 PM, Steven Nydick wrote:
It also fails in the same way if the factor is not at the first level of
the list. This seems to be because islistfactor is recursive, whereas
once a list is treated as a list-factor, the first-level lists are
coerced into character strings.
> x <- list(list(factor(LETTERS[1])))
> unlist(x)
Error in as.character.factor(x) : malformed factor
However, if one of the factors is at the top level, and one is nested,
then the result is:
> x <- list(list(factor(LETTERS[1])), factor(LETTERS[2]))
> unlist(x)
[1] <NA> B
Levels: B
... which does not seem to me to be the desired behavior.
The patch I suggested doesn't help with either of these. I'd suggest
collecting examples and posting a bug report to bugs.r-project.org.
Duncan Murdoch
On Tue, May 8, 2018 at 2:22 PM Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
On 08/05/2018 2:58 PM, Duncan Murdoch wrote:
> On 08/05/2018 1:48 PM, Steven Nydick wrote:
>> Reproducible example:
>>
>> x <- list(list(list(), list()))
>> unlist(x)
>>
>> Error in as.character.factor(x) : malformed factor
>
> The error comes from the line
>
> structure(res, levels = lv, names = nm, class = "factor")
>
> which is called because unlist() thinks that some entry is a factor,
> with NULL levels and NULL names. It's not legal for a factor to have
> NULL levels. Probably it should never get here; the earlier test
>
> if (.Internal(islistfactor(x, recursive))) {
>
> should have been false, and then the result would have been
>
> .Internal(unlist(x, recursive, use.names))
>
> (with both recursive and use.names being TRUE), which returns NULL.
And the problem is in the islistfactor function in src/main/apply.c,
which looks like this:
static Rboolean islistfactor(SEXP X)
{
    int i, n = length(X);
    switch(TYPEOF(X)) {
    case VECSXP:
    case EXPRSXP:
        if(n == 0) return NA_LOGICAL;
        for(i = 0; i < LENGTH(X); i++)
            if(!islistfactor(VECTOR_ELT(X, i))) return FALSE;
        return TRUE;
        break;
    }
    return isFactor(X);
}
The innermost lists have length 0, so at the lowest level islistfactor
returns NA_LOGICAL. But the caller then does C-style logical testing
(!) on that result. To C, NA_LOGICAL (which R defines as INT_MIN) is
nonzero and so counts as true, so at the next level up we get the wrong
answer: TRUE rather than NA.
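A minimal standalone sketch of that failure mode, not R's internals: it assumes only that NA_LOGICAL has R's value INT_MIN (R_NaInt), and the names SKETCH_NA, SKETCH_TRUE, SKETCH_FALSE and parent_result() are hypothetical. The function copies the shape of the VECSXP/EXPRSXP branch above, taking the children's results as an array.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins for R's logical values.
   Assumption: R defines NA_LOGICAL as R_NaInt, which equals INT_MIN. */
#define SKETCH_FALSE 0
#define SKETCH_TRUE  1
#define SKETCH_NA    INT_MIN

/* Same shape as the original list branch: an empty list gives NA,
   otherwise return FALSE on the first child that tests false with `!`,
   else TRUE. Because SKETCH_NA is nonzero, `!SKETCH_NA` is 0, so an NA
   child never stops the loop. */
static int parent_result(const int *child, int n)
{
    assert(n == 0 || child != NULL);
    if (n == 0) return SKETCH_NA;
    for (int i = 0; i < n; i++)
        if (!child[i]) return SKETCH_FALSE;
    return SKETCH_TRUE;
}
```

For example, with `int kids[] = {SKETCH_NA, SKETCH_NA};` (the results for two empty lists), `parent_result(kids, 2)` returns SKETCH_TRUE, the same wrong answer the middle level of list(list(list(), list())) produces.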
A fix would be to rewrite it like this:
static Rboolean islistfactor(SEXP X)
{
    int i, n = length(X);
    Rboolean result = NA_LOGICAL, childresult;
    switch(TYPEOF(X)) {
    case VECSXP:
    case EXPRSXP:
        for(i = 0; i < LENGTH(X); i++) {
            childresult = islistfactor(VECTOR_ELT(X, i));
            if(childresult == FALSE) return FALSE;
            else if(childresult == TRUE) result = TRUE;
        }
        return result;
        break;
    }
    return isFactor(X);
}
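To compare the two versions outside R, here is a sketch on a hypothetical toy tree type. Node, NodeKind and the L_* constants are illustrative only, not R's SEXP API, and L_NA again assumes R's INT_MIN value for NA_LOGICAL. islistfactor_old mirrors the original branching; islistfactor_new mirrors the proposed rewrite, where a FALSE child wins, a TRUE child upgrades the result, and NA children leave it unchanged.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical toy model of an R object tree; not R's SEXP API. */
typedef enum { NODE_LIST, NODE_FACTOR, NODE_OTHER } NodeKind;

typedef struct Node {
    NodeKind kind;
    int n;                    /* number of children (lists only) */
    const struct Node **kids; /* children (lists only) */
} Node;

/* Assumption: L_NA mirrors R's NA_LOGICAL, i.e. INT_MIN. */
enum { L_FALSE = 0, L_TRUE = 1 };
#define L_NA INT_MIN

/* Original logic: an empty list yields NA, but the parent's `!` test
   treats that nonzero NA as true. */
static int islistfactor_old(const Node *x)
{
    assert(x != NULL);
    if (x->kind == NODE_LIST) {
        if (x->n == 0) return L_NA;
        for (int i = 0; i < x->n; i++)
            if (!islistfactor_old(x->kids[i])) return L_FALSE;
        return L_TRUE;
    }
    return x->kind == NODE_FACTOR;
}

/* Proposed logic: any FALSE child gives FALSE, any TRUE child upgrades
   the result to TRUE, and NA children leave it unchanged. */
static int islistfactor_new(const Node *x)
{
    assert(x != NULL);
    if (x->kind == NODE_LIST) {
        int result = L_NA;
        for (int i = 0; i < x->n; i++) {
            int child = islistfactor_new(x->kids[i]);
            if (child == L_FALSE) return L_FALSE;
            else if (child == L_TRUE) result = L_TRUE;
        }
        return result;
    }
    return x->kind == NODE_FACTOR;
}
```

Modelling x <- list(list(list(), list())) as nested NODE_LIST nodes, islistfactor_old returns L_TRUE while islistfactor_new returns L_NA. Since NA can still reach the top level under the rewrite, a caller that tests the result with `!` would again treat it as true; such callers need the same explicit comparison against TRUE.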
--
Steven Nydick
PhD, Quantitative Psychology
M.A., Psychology
M.S., Statistics
--
"Beware of the man who works hard to learn something, learns it, and
finds himself no wiser than before, Bokonon tells us. He is full of
murderous resentment of people who are ignorant without having come by
their ignorance the hard way."
-Kurt Vonnegut
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel