Hi Gabe,

> [...]
> Well, sure but that is because it happens to be a list with each element
> having length one. In which case, it really should not have been a list at
> all, and the fact that it was seems a deeper problem that should likely be
> resolved instead of treating the symptom, in my opinion.
>

I wouldn't mind it failing explicitly on the grounds that you don't join a list column on a character column, and I wouldn't mind it succeeding either, because that would be consistent with `c("a", "b") == list("a", "b")` and `c("a", "b") %in% list("a", "b")` both returning `c(TRUE, TRUE)`. But I feel strongly that it shouldn't behave differently depending on which data frame is passed first to the function, and I do think that if we make it an error, it is worth making the message understandable.

> > > df1 <- data.frame(a=1)
> > > df2 <- data.frame(b=2)
> > > df1$id <- "ID"
> > > df2$id <- list(c("ID", "ID2"))
> > > merge(df1, df2)
> > [1] id a b
> > <0 rows> (or 0-length row.names)
> >
>
> Thats probably not what you wanted it to do, right? Or maybe it is, it
> depends, right?. And therein lies the rub.
>
>
> I have to be honest, as a developer, I really wish this, even in your
> example case, threw an error. Anything else just looks to me like a
> debugging nightmare looming in the wings waiting to strike.
>
>

To provide context, what I did wrong in my real case was compute `df2$id <- lapply(x, fun)`. That was a mistake, but it looked fine when printed. `vapply` solved the issue; `sapply` would still have been problematic, because `df2$id` would be an empty list for an `x` of length 0. After correcting my mistake, I tried to isolate the error and had trouble reproducing it with my simple case because I had swapped the two data frame arguments. This is how the inconsistency plus the cryptic message caused me more trouble than I think it should have.
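A minimal sketch of the `lapply` versus `vapply` difference described above. The `x` and `fun` here are hypothetical stand-ins, not the real code:

```r
# Hypothetical stand-ins for the real inputs; not the original code.
x <- c("id1", "id2")
fun <- function(s) toupper(s)

# lapply() always returns a list, here with one length-1 character element per input.
id_lapply <- lapply(x, fun)

# vapply() with a character(1) template returns a plain character vector.
id_vapply <- vapply(x, fun, character(1), USE.NAMES = FALSE)

is.list(id_lapply)
is.character(id_vapply)

# With zero-length input, sapply() has nothing to simplify and returns a list,
# while vapply() still returns a (zero-length) character vector.
empty <- character(0)
id_sapply_empty <- sapply(empty, fun)
id_vapply_empty <- vapply(empty, fun, character(1), USE.NAMES = FALSE)

is.list(id_sapply_empty)
is.character(id_vapply_empty)
```

Assigning `id_lapply` to a data frame column gives a list column, while `id_vapply` gives an ordinary character column.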

Imagine production code that has worked for years with `merge(df1, df2)`, maybe not written by me. I change it to `merge(df2, df1)` for some reason, and everything breaks with `Error in sort.list(bx[m$xi]): 'x' must be atomic for 'sort.list', method "shell" and "quick"`. If I'm not familiar with list columns, and with the fact that they can print just like character columns, I might have a rough day.

Here's another oddity that I think is worth fixing:

df1 <- data.frame(a=1, id = "ID")
df3 <- data.frame(c=character(), id = list())
merge(df3, df1)
#> [1] x[FALSE, ] a id
#> <0 rows> (or 0-length row.names)
merge(df1, df3)
#> [1] a id y[FALSE, ]
#> <0 rows> (or 0-length row.names)

[...]

>
> There's no reason (in principle) you wouldn't be able to join by a list
> column, they should just both have to be list columns, in my ideal (but
> admittedly unlikely) world. Id rather the atomic-vector/list mismatch case
> throw an error, myself.
>

The doc does say that "This is intended to work with data frames with vector-like columns" in a note at the bottom, so anything we do is consistent with the doc, and it's fine by me if it fails (that's how {dplyr} joins work), but the order of the data frames should not matter. A warning is another option.

> Now I kind of doubt we can change the behavior that works now, but as Avi
> points out, I think this is something that is complicated and case specific
> enough that it really ought to be your job as the coder to take care of
> what should happen when you try to merge on columns that are fundamentally
> different types.
>

Well yes, one can always say it's the developer's fault, but we all appreciate software that guides us toward the light. List columns are not rare at all anymore, and using an `lapply` call instead of `sapply` or `vapply` is probably not a rare mistake. And again, the inconsistency is wrong in any case.

I'll read the other answers when I get the digest.

Thanks,

Antoine

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel