On Wed, 8 Aug 2018 at 19:23, Gabe Becker (<becker.g...@gene.com>) wrote:
>
> Actually, I sent that too quickly, I should have let it stew a bit more.
> I've changed my mind about the resolution argument I was trying to make.
> There is more information, technically speaking, in the factor with empty
> levels. I'm still not convinced that it's the right behavior, personally. It
> may just be me though, since Martin seems on board. Mostly I'm just very
> wary of taking away the thing about factors that makes them fundamentally
> not characters, and removing the effectiveness of the level restriction, in
> practice, does that.

For what it's worth, I always thought about factors as fundamentally
characters, but with restrictions: a subspace of all possible strings.
And I'd say that a non-negligible number of R users may think about
them in a similar way.

In fact, if you search for "concatenation factors", you'll see that back
in 2008 somebody asked on R-help [1] because he wanted to do exactly
what Hadley is describing (i.e., concatenation as character, with the
union of the levels as the resulting levels), and he was surprised
because... well, the behaviour of c.factor is quite surprising if you
don't read the manual. BTW, the solution proposed was
unlist(list(fct1, fct2)).

[1] https://www.mail-archive.com/r-help@r-project.org/msg38360.html

Iñaki

>
> Best,
> ~G
>
> On Wed, Aug 8, 2018 at 8:54 AM, Martin Maechler <maech...@stat.math.ethz.ch>
> wrote:
>
> > >>>>> Hadley Wickham
> > >>>>>     on Wed, 8 Aug 2018 09:34:42 -0500 writes:
> >
> > >>>> Method dispatch for `vec_c()` is quite simple because
> > >>>> associativity and commutativity mean that we can
> > >>>> determine the output type only by considering a pair of
> > >>>> inputs at a time. To this end, vctrs provides
> > >>>> `vec_type2()` which takes two inputs and returns their
> > >>>> common type (represented as zero length vector):
> > >>>>
> > >>>> str(vec_type2(integer(), double()))
> > >>>> #> num(0)
> > >>>>
> > >>>> str(vec_type2(factor("a"), factor("b")))
> > >>>> #> Factor w/ 2 levels "a","b":
> > >>>
> > >>> What is the reasoning behind taking the union of the
> > >>> levels here? I'm not sure that is actually the behavior
> > >>> I would want if I have a vector of factors and I try to
> > >>> append some new data to it. I might want/expect to
> > >>> retain the existing levels and get either NAs or an
> > >>> error if the new data has (present) levels not in the
> > >>> first data. The behavior as above doesn't seem in line
> > >>> with what I understand the purpose of factors to be
> > >>> (explicit restriction of possible values).
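To make the two behaviours in this exchange concrete, here is a minimal base R sketch. It is only an illustration, not code from vctrs or from the 2008 thread: it contrasts the union-of-levels result of the unlist(list(fct1, fct2)) workaround with the stricter alternative of keeping the first factor's levels, where values outside those levels become NA. The variable names and the sample data are assumptions chosen for the example.

```r
fct1 <- factor(c("a", "b"))
fct2 <- factor(c("b", "c"))

# Union of levels: unlist() treats a list of factors specially and
# returns a factor whose levels are the union of the input levels.
combined <- unlist(list(fct1, fct2))
levels(combined)
as.character(combined)

# Strict alternative (illustrative): keep only the levels of the first
# factor, so that appended values outside that set become NA.
strict <- factor(c(as.character(fct1), as.character(fct2)),
                 levels = levels(fct1))
levels(strict)
is.na(strict)
```

The first form treats the level set as something that grows with the data; the second treats it as a fixed restriction, which is the reading of factors that the quoted question argues for.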
> > >>
> > >> Originally (like a week ago), we threw an error if the
> > >> factors didn't have the same levels, and provided an
> > >> optional coercion to character. I decided that while
> > >> correct (the factor levels are a parameter of the type,
> > >> and hence factors with different levels aren't
> > >> comparable), this fights too much against how people
> > >> actually use factors in practice. It also seems like base
> > >> R is moving more in this direction, i.e. in 3.4
> > >> factor("a") == factor("b") is an error, whereas in R 3.5
> > >> it returns FALSE.
> >
> > > I now have a better argument, I think:
> >
> > > If you squint your brain a little, I think you can see
> > > that each set of automatic coercions is about increasing
> > > resolution. Integers are low resolution versions of
> > > doubles, and dates are low resolution versions of
> > > date-times. Logicals are low resolution versions of
> > > integers because there's a strong convention that `TRUE`
> > > and `FALSE` can be used interchangeably with `1` and `0`.
> >
> > > But what is the resolution of a factor? We must take a
> > > somewhat pragmatic approach because base R often converts
> > > character vectors to factors, and we don't want to be
> > > burdensome to users. So we say that a factor `x` has finer
> > > resolution than factor `y` if the levels of `y` are
> > > contained in the levels of `x`. So to find the common type
> > > of two factors, we take the union of the levels of each
> > > factor, giving a factor that has finer resolution than
> > > both. Finally, you can think of a character vector as a
> > > factor with every possible level, so factors and character
> > > vectors are coercible.
> >
> > > (extracted from the in-progress vignette explaining how to
> > > extend vctrs to work with your own vctrs, now that vctrs
> > > has been rewritten to use double dispatch)
> >
> > I like this argumentation, and find it very nice indeed!
> > It confirms my own gut feeling which had led me to agreeing
> > with you, Hadley, that taking the union of all factor levels
> > should be done here.
> >
> > As Gabe mentioned (and you've explained about) the term "type"
> > is really confusing here. As you know, the R internals are all
> > about SEXPs, TYPEOF(), etc, and that's what the R level
> > typeof(.) also returns. As you want to use something slightly
> > different, it should be different naming, ideally something not
> > existing yet in the R / S world, maybe 'kind'?
> >
> > Martin
> >
> > > Hadley
> >
> > > --
> > > http://hadley.nz
> >
> > > ______________________________________________
> > > R-devel@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> --
> Gabriel Becker, Ph.D
> Scientist
> Bioinformatics and Computational Biology
> Genentech Research
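
The "resolution" rule in the quoted vignette text can be sketched in a few lines of base R. This is only an illustration under stated assumptions: `finer_than()` and `common_factor_type()` are hypothetical helpers invented for this example, not vctrs functions, and the actual `vec_type2()` may differ in its details.

```r
# Hypothetical helper: TRUE if factor x has finer (or equal) resolution
# than factor y, i.e. every level of y is also a level of x.
finer_than <- function(x, y) all(levels(y) %in% levels(x))

# Hypothetical helper: the common type of two factors, represented as a
# zero-length factor whose levels are the union of both level sets.
common_factor_type <- function(x, y) {
  factor(character(), levels = union(levels(x), levels(y)))
}

x <- factor("a")
y <- factor("b")
common <- common_factor_type(x, y)

levels(common)          # "a" "b"
length(common)          # 0
finer_than(common, x)   # TRUE: the common type covers the levels of x
finer_than(common, y)   # TRUE: and the levels of y
finer_than(x, y)        # FALSE: x lacks the level "b"
```

Under this reading, neither `x` nor `y` is finer than the other, and the union of levels is the smallest level set that is finer than both.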