I don't think a custom type alone would work, because users would expect to use such a string anywhere a regular string can be used, and that's where the problems start: the evaluation would have to happen at a point where it is not expected, since today we can assume that CHAR() doesn't evaluate. If it's just a construct that needs some function call to turn it into a real string, then that's (from the user's perspective) no different from glue(), so I don't think users would see the benefit (admittedly, you could do a lot more with such an internal type, but I'm not sure the complexity is worth it).
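To make the second case concrete, here is a minimal sketch in plain R of a construct that only becomes a string after an explicit call. The names interp() and render() are hypothetical, made up for this illustration; this is not how glue or the proposed parser-level type would work internally, and it assumes every {expr} evaluates to a single value.

```r
# Hypothetical sketch: interp() and render() are illustrative names only.
interp <- function(template, env = parent.frame()) {
  force(env)  # capture the caller's environment now; nothing is evaluated yet
  structure(list(template = template, env = env), class = "interp")
}

# The explicit step that turns the construct into a real string.
# Each {expr} is parsed and evaluated here, in the captured environment,
# and is assumed to yield a single value.
render <- function(x) {
  stopifnot(inherits(x, "interp"))
  out <- x$template
  m <- gregexpr("\\{[^{}]*\\}", out)
  pieces <- regmatches(out, m)[[1]]
  if (length(pieces) == 0L) return(out)
  values <- vapply(pieces, function(p) {
    expr <- parse(text = substr(p, 2, nchar(p) - 1))[[1]]
    as.character(eval(expr, x$env))
  }, character(1), USE.NAMES = FALSE)
  regmatches(out, m) <- list(values)
  out
}

name <- "world"
s <- interp("hello {name}")
is.character(s)  # FALSE: code expecting a string sees a classed list
render(s)        # "hello world"
```

The object s cannot be handed to code that expects a character vector until render() is called, which is the same user-visible step as calling glue().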
Cheers,
Simon

> On Dec 8, 2021, at 12:56 AM, Taras Zakharko <taras.zakha...@uzh.ch> wrote:
>
> I fully agree! General string interpolation opens a gaping security hole and
> is accompanied by all kinds of problems and decisions. What I envision
> instead is something like this:
>
> f"hello {name}"
>
> Which gets parsed by R to this:
>
> (STRINTERPSXP (CHARSXP (PROMISE nil)))
>
> Basically, a new type of R language construct that still can be processed by
> packages (for customized interpolation like in cli etc.), with a default eval
> which is basically paste0(). The benefit here would be that this is eagerly
> parsed and syntactically checked, and that the promise code could carry a
> srcref. And of course, that you could pass an interpolated string expression
> lazily between frames without losing the environment etc. For more advanced
> applications, a low level string interpolation expression constructor could
> be provided (that could either parse a general string, at the user’s risk,
> or build it directly from expressions).
>
> -- Taras
>
>
>> On 7 Dec 2021, at 12:06, Simon Urbanek <simon.urba...@r-project.org> wrote:
>>
>>
>>
>>> On Dec 7, 2021, at 22:09, Taras Zakharko <taras.zakha...@uzh.ch> wrote:
>>>
>>> Great summary, Avi.
>>>
>>> String concatenation could be trivially added to R, but it probably should
>>> not be. You will notice that modern languages tend not to use "+" to do
>>> string concatenation (they either have a custom operator or a special kind
>>> of pattern to do it) due to practical issues such an approach brings
>>> (implicit type casting, lack of commutativity, performance etc.). These
>>> issues will be felt even more so in R with its weak typing, idiosyncratic
>>> casting behavior and NAs.
>>>
>>> As others have pointed out, any kind of behavior one wants from string
>>> concatenation can be implemented by custom operators as needed. This is not
>>> something that needs to be in base R. I would rather see the efforts
>>> directed at improving string formatting (such as glue-style built-in
>>> string interpolation).
>>>
>>
>> This is getting OT, but there is a very good reason why string interpolation
>> is not in core R. As I recall it has been considered some time ago, but it
>> is very dangerous as it implies evaluation on constants which opens a huge
>> security hole and has questionable semantics (where you evaluate etc). Hence
>> it's much easier to ban a package than to hack it out of R ;).
>>
>> Cheers,
>> Simon
>>
>>
>>> -- Taras
>>>
>>>
>>>> On 7 Dec 2021, at 02:27, Avi Gross via R-devel <r-devel@r-project.org>
>>>> wrote:
>>>>
>>>> After seeing what others are saying, it is clear that you need to carefully
>>>> think things out before designing any implementation of a more native
>>>> concatenation operator whether it is called "+" or anything else. There may
>>>> not be any ONE right solution but unlike a function version like paste()
>>>> there is nowhere to place any options that specify what you mean.
>>>>
>>>> You can obviously expand paste() to accept arguments like replace.NA="" or
>>>> replace.NA="<NA>" and similar arguments on what to do if you see a NaN, an
>>>> Inf or -Inf, a NULL or even an NA_character_ and so on. Heck, you might
>>>> tell it to make other substitutions as in substitute=list(100=99, D=F) or
>>>> any other nonsense you can come up with.
>>>>
>>>> But you have nowhere to put options when saying:
>>>>
>>>> c <- a + b
>>>>
>>>> Sure, you could set various global options before the addition and maybe
>>>> reset them after, but that is not a way I like to go for something this
>>>> basic.
>>>>
>>>> And enough such tinkering makes me wonder if it is easier to ask a user to
>>>> use a slightly different function like this:
>>>>
>>>> paste.no.na <- function(...)
>>>>   do.call(paste, Filter(Negate(is.na), list(...)))
>>>>
>>>> The above one-line function removes any NA from the argument list (it
>>>> assumes each argument has length one) to make a potentially shorter list
>>>> before calling the real paste() using it.
>>>>
>>>> Variations can, of course, be made that allow functionality as above.
>>>>
>>>> If R was a true object-oriented language in the same sense as others like
>>>> Python, operator overloading of "+" might be doable in more complex ways
>>>> but we can only work with what we have. I tend to agree with others that
>>>> in some places R is so lenient that all kinds of errors can happen because
>>>> it makes a guess on how to correct it. Generally, if you really want to mix
>>>> numeric and character, many languages require you to transform any
>>>> arguments to make them all of compatible types. The paste() function is
>>>> clearly stated to coerce all arguments to be of type character for you.
>>>> Whereas a+b makes no such promises and also is not properly defined even
>>>> if a and b are both of type character. Sure, we can expand the language
>>>> but it may still do things some find not to be quite what they wanted as in
>>>> "2"+"3" becoming "23" rather than 5. Right now, I can use
>>>> as.numeric("2")+as.numeric("3") and get the intended result after making
>>>> very clear to anyone reading the code that I wanted strings converted to
>>>> floating point before the addition.
>>>>
>>>> As has been pointed out, the plus operator if used to concatenate does not
>>>> have a cognate for other operations like -*/ and R has used most other
>>>> special symbols for other purposes. So, sure, we can use something like
>>>> .... (4 periods) if it is not already being used for something but using +
>>>> here is a tad confusing. Having said that, the makers of Python did make
>>>> that choice.
>>>>
>>>> -----Original Message-----
>>>> From: R-devel <r-devel-boun...@r-project.org> On Behalf Of Gabriel Becker
>>>> Sent: Monday, December 6, 2021 7:21 PM
>>>> To: Bill Dunlap <williamwdun...@gmail.com>
>>>> Cc: Radford Neal <radf...@cs.toronto.edu>; r-devel <r-devel@r-project.org>
>>>> Subject: Re: [Rd] string concatenation operator (revisited)
>>>>
>>>> As I recall, there was a large discussion related to that which resulted in
>>>> the recycle0 argument being added (but defaulting to FALSE) for
>>>> paste/paste0.
>>>>
>>>> I think a lot of these things ultimately mean that if there were to be a
>>>> string concatenation operator, it probably shouldn't have behavior
>>>> identical to paste0. Was that what you were getting at as well, Bill?
>>>>
>>>> ~G
>>>>
>>>> On Mon, Dec 6, 2021 at 4:11 PM Bill Dunlap <williamwdun...@gmail.com>
>>>> wrote:
>>>>
>>>>> Should paste0(character(0), c("a","b")) give character(0)?
>>>>> There is a fair bit of code that assumes that paste("X",NULL) gives "X"
>>>>> but c(1,2)+NULL gives numeric(0).
>>>>>
>>>>> -Bill
>>>>>
>>>>> On Mon, Dec 6, 2021 at 1:32 PM Duncan Murdoch
>>>>> <murdoch.dun...@gmail.com> wrote:
>>>>>
>>>>>> On 06/12/2021 4:21 p.m., Avraham Adler wrote:
>>>>>>> Gabe, I agree that missingness is important to factor in. To
>>>>>>> somewhat abuse the terminology, NA is often used to represent
>>>>>>> missingness. Perhaps concatenating character something with
>>>>>>> character something missing should result in the original character?
>>>>>>
>>>>>> I think that's a bad idea. If you wanted to represent an empty
>>>>>> string, you should use "" or NULL, not NA.
>>>>>>
>>>>>> I'd agree with Gabe, paste0("abc", NA) shouldn't give "abcNA", it
>>>>>> should give NA.
>>>>>>
>>>>>> Duncan Murdoch
>>>>>>
>>>>>>>
>>>>>>> Avi
>>>>>>>
>>>>>>> On Mon, Dec 6, 2021 at 3:35 PM Gabriel Becker
>>>>>>> <gabembec...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> Seeing this and the other thread (and admittedly not having
>>>>>>>> clicked through to the linked r-help thread), I wonder about NAs.
>>>>>>>>
>>>>>>>> Should NA <concat> "hi there" not result in NA_character_? This
>>>>>>>> is not what any of the paste functions do, but in my opinion, NA +
>>>>>>>> <non_na_value> seems like it should be NA (not "NA"), particularly
>>>>>>>> if we are talking about `+` overloading, but potentially even in
>>>>>>>> the case of a distinct concatenation operator?
>>>>>>>>
>>>>>>>> I guess what I'm saying is that in my head missingness propagation
>>>>>>>> rules should take priority in such an operator (ie NA + <anything>
>>>>>>>> should *always* be NA).
>>>>>>>>
>>>>>>>> Is that something others disagree with, or has it just not come up
>>>>>>>> yet in (the parts I have read) of this discussion?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> ~G
>>>>>>>>
>>>>>>>> On Mon, Dec 6, 2021 at 10:03 AM Radford Neal
>>>>>>>> <radf...@cs.toronto.edu> wrote:
>>>>>>>>
>>>>>>>>>>> In pqR (see pqR-project.org), I have implemented ! and !! as
>>>>>>>>>>> binary string concatenation operators, equivalent to paste0 and
>>>>>>>>>>> paste, respectively.
>>>>>>>>>>>
>>>>>>>>>>> For instance,
>>>>>>>>>>>
>>>>>>>>>>> > "hello" ! "world"
>>>>>>>>>>> [1] "helloworld"
>>>>>>>>>>> > "hello" !! "world"
>>>>>>>>>>> [1] "hello world"
>>>>>>>>>>> > "hello" !! 1:4
>>>>>>>>>>> [1] "hello 1" "hello 2" "hello 3" "hello 4"
>>>>>>>>>>
>>>>>>>>>> I'm curious about the details:
>>>>>>>>>>
>>>>>>>>>> Would `1 ! 2` convert both to strings?
>>>>>>>>>
>>>>>>>>> They're equivalent to paste0 and paste, so 1 ! 2 produces "12",
>>>>>>>>> just like paste0(1,2) does. Of course, they wouldn't have to be
>>>>>>>>> exactly equivalent to paste0 and paste - one could impose
>>>>>>>>> stricter requirements if that seemed better for error detection.
>>>>>>>>> Off hand, though, I think automatically converting is more in
>>>>>>>>> keeping with the rest of R. Explicitly converting with
>>>>>>>>> as.character could be tedious.
>>>>>>>>>
>>>>>>>>> I suppose disallowing logical arguments might make sense to guard
>>>>>>>>> against typos where ! was meant to be the unary-not operator but
>>>>>>>>> ended up being a binary operator. I doubt that this would be a
>>>>>>>>> common error, though.
>>>>>>>>>
>>>>>>>>> (Note that there's no ambiguity when there are no typos, except
>>>>>>>>> that when negation is involved a space may be needed - so, for
>>>>>>>>> example, "x" ! !TRUE is "xFALSE", but "x"!!TRUE is "x TRUE".
>>>>>>>>> Existing uses of double negation are still fine - eg, a <- !!TRUE
>>>>>>>>> still sets a to TRUE. Parsing of operators is greedy, so
>>>>>>>>> "x"!!!TRUE is "x FALSE", not "xTRUE".)
>>>>>>>>>
>>>>>>>>>> Where does the binary ! fit in the operator priority? E.g. how
>>>>>>>>>> is
>>>>>>>>>>
>>>>>>>>>> a ! b > c
>>>>>>>>>>
>>>>>>>>>> parsed?
>>>>>>>>>
>>>>>>>>> As (a ! b) > c.
>>>>>>>>>
>>>>>>>>> Their precedence is between that of + and - and that of < and >.
>>>>>>>>> So "x" ! 1+2 evaluates to "x3" and "x" ! 1+2 < "x4" is TRUE.
>>>>>>>>>
>>>>>>>>> (Actually, pqR also has a .. operator that fixes the problems
>>>>>>>>> with generating sequences with the : operator, and it has
>>>>>>>>> precedence lower than + and - and higher than ! and !!, but
>>>>>>>>> that's not relevant if you don't have the .. operator.)
>>>>>>>>>
>>>>>>>>> Radford Neal
>>>>>>>>>
>>>>>>>>> ______________________________________________
>>>>>>>>> R-devel@r-project.org mailing list
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel