>>>>> Hervé Pagès >>>>> on Sun, 24 May 2020 14:22:37 -0700 writes:
> On 5/24/20 00:26, Gabriel Becker wrote:
>>
>>
>> On Sat, May 23, 2020 at 9:59 PM Hervé Pagès <hpa...@fredhutch.org
>> <mailto:hpa...@fredhutch.org>> wrote:
>>
>>     On 5/23/20 17:45, Gabriel Becker wrote:
>>      > Maybe my intuition is just
>>      > different but when I collapse multiple character vectors together, I
>>      > expect all the characters from each of those vectors to be in the
>>      > resulting collapsed one.
>>
>>     Yes I'd expect that too. But the **collapse** operation in paste() has
>>     never been about collapsing **multiple** character vectors together.
>>     What it does is collapse the **single** character vector that comes out
>>     of the 'sep' operation.
>>
>>
>> I understand what it does, I broke it down the same way in my post
>> earlier in the thread. The fact remains that it is a single function,
>> which significantly muddies the waters. So you can say
>>
>> paste0(x,y, collapse=",", recycle0=TRUE)
>>
>> is not a collapse operation on multiple vectors, and of course there's a
>> sense in which you're not wrong (again I understand what these functions
>> do), but it sure looks like one in the invocation, doesn't it?
>>
>> Honestly the thing that this whole discussion has shown me most clearly
>> is that, imho, collapse (accepting ONLY one data vector) and
>> paste (accepting multiple) should never have been a single function to
>> begin with. But that ship sailed long long ago.

> Yes :-(

>>
>>     So
>>
>>         paste(x, y, z, sep="", collapse=",")
>>
>>     is analogous to
>>
>>         sum(x + y + z)
>>
>>
>> Honestly, I'd be significantly more comfortable if
>>
>> 1:10 + integer(0) + 5
>>
>> were an error too.

> This is actually the recycling scheme used by mapply():
>
>    > mapply(function(x, y, z) c(x, y, z), 1:10, integer(0), 5)
>    Error in mapply(FUN = FUN, ...) :
>      zero-length inputs cannot be mixed with those of non-zero length
>
> AFAIK base R uses 3 different recycling schemes for n-ary operations:
>
>    (1) The recycling scheme used by arithmetic and comparison operations
>        (Arith, Compare, Logic group generics).
>
>    (2) The recycling scheme used by classic paste().
>
>    (3) The recycling scheme used by mapply().
>
> Having a core mechanism like recycling be inconsistent across
> base R is sad. It makes it really hard to predict how a given n-ary
> function will recycle its arguments unless you spend some time trying it
> yourself with several combinations of vector lengths. It is of course
> the source of numerous latent bugs. I wish there was only one but that's
> just a dream.
>
> None of these 3 recycling schemes is perfect. IMO (2) is by far the
> worst. (3) is too restrictive and would need to be refined if we wanted
> to make it a good universal recycling scheme.
>
> Anyway I don't think it makes sense to introduce a 4th recycling scheme
> at this point even though it would be a nice item to put on the wish
> list for R 7.0.0 with the ultimate goal that it will be universally adopted
> in R 11.0.0 ;-)
>
> So if we have to make do with what we have, IMO (1) is the scheme that makes
> most sense although I agree that it can do some surprising things for
> some unusual combinations of vector lengths. It's the scheme I adhere to
> in my own binary operations e.g. in S4Vectors::pcompare().
>
> The modest proposal of the 'recycle0' argument is only to let the user
> switch from recycling scheme (2) to (1) if they're not happy with scheme
> (2) (I'm one of them).

Yes, indeed. This was the purpose of introducing 'recycle0'.

Now, with  collapse = <string>  {in R, "string" := character vector of
length 1}, we clearly see different interpretations of what is desirable
for recycle0 = TRUE:

All of you (Suharto, Bill, Hervé, Gabe) assert that the behavior should be
different than now, and should either error (possibly, by Gabe), or
return a single string (possibly with a warning), i.e.,
collapse = <string> behavior should not be influenced by (or possibly
conflict with) recycle0=TRUE.

Within R core, some believe the current recycle0=TRUE behavior to be
the correct one. Personally, I see reasons for both.

What about remaining back-compatible, not only to R 3.y.z with default
recycle0=FALSE, but also to R 4.0.0 with recycle0=TRUE,
*and* add a new option for the Suharto-Bill-Hervé-Gabe behavior,
e.g., recycle0="sep.only" or just recycle0="sep"?

As (for back-compatibility reasons) you have to specify 'recycle0 = ..'
anyway, you would get what makes most sense to you by using
such a third option.

(WDYT?)

Martin

> Switching to scheme (3) or to a new custom scheme
> would be a completely different proposal.

>>
>> At least I'm consistent right?

> Yes :-)
>
> Anyway discussing recycling schemes is interesting but not directly
> related to what the OP brought up (behavior of the 'collapse' operation).
>
> Cheers,
> H.

>>
>> ~G

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
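
For readers following the thread, here is a minimal R sketch of how the three recycling schemes named above treat a zero-length input, plus the "sep, then collapse" reading of paste(). The variable names (x, e, a, b, arith, classic, and so on) are illustrative and not from the thread. The recycle0 call is guarded because the argument only exists in newer R versions, and the disputed combination of collapse = <string> with recycle0 = TRUE is deliberately left out.

```r
# Zero-length inputs under the three base R recycling schemes from the thread.
x <- 1:10
e <- integer(0)

# (1) Arithmetic: a zero-length operand yields a zero-length result.
arith <- x + e + 5
length(arith)

# (2) Classic paste(): a zero-length argument is treated as "".
classic <- paste0("a", character(0))
classic

# (3) mapply(): mixing zero-length and non-zero-length inputs is an error.
#     tryCatch() captures the error message instead of stopping the script.
mapply_msg <- tryCatch(
  mapply(function(x, y, z) c(x, y, z), x, e, 5),
  error = function(err) conditionMessage(err)
)
mapply_msg

# paste(..., sep =, collapse =) is the 'sep' step followed by collapsing
# the single character vector that the 'sep' step produced.
a <- c("a", "b")
b <- c("x", "y")
one_call  <- paste(a, b, sep = "", collapse = ",")
two_steps <- paste(paste(a, b, sep = ""), collapse = ",")
identical(one_call, two_steps)

# recycle0 is not available in older R versions, so check before using it.
# Without 'collapse', recycle0 = TRUE switches paste0() to scheme (1).
if ("recycle0" %in% names(formals(paste0))) {
  with_recycle0 <- paste0("a", character(0), recycle0 = TRUE)
  print(length(with_recycle0))
}
```

Splitting the call into two_steps makes the 'sep' stage and the 'collapse' stage separately visible, which is the distinction the proposed recycle0="sep" option would rely on.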