I don't think a custom type alone would work, because users would expect to use 
such a string anywhere a regular string can be used, and that's where the 
problems start: the evaluation would have to happen at a point where it is not 
expected, since we can assume today that CHAR() doesn't evaluate. If it's just 
a construct that needs some function call to turn it into a real string, then 
that's (from the user's perspective) no different from glue(), so I don't think 
users would see the benefit (admittedly, you could do a lot more with such an 
internal type, but I'm not sure the complexity is worth it).
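
For what it's worth, the function-call flavor is easy to sketch in plain R.
The following is only an illustration under my own assumptions: interp() is a
made-up helper (it is not how glue() is implemented), and it handles a single
template string with simple {expr} placeholders. It also shows where the
evaluation concern comes from, since whatever sits between the braces is
parsed and evaluated:

```r
# interp() is a hypothetical helper for illustration only; it is not how
# glue() is implemented and nothing like it exists in base R.
interp <- function(template, env = parent.frame()) {
  # Locate every {...} placeholder in a single template string.
  m <- gregexpr("\\{[^{}]*\\}", template)
  pieces <- regmatches(template, m)[[1]]
  if (length(pieces) == 0) return(template)
  values <- vapply(pieces, function(p) {
    code <- substr(p, 2, nchar(p) - 1)   # strip the surrounding braces
    # The risky step: whatever text the string contains gets evaluated.
    paste(eval(parse(text = code), env), collapse = "")
  }, character(1), USE.NAMES = FALSE)
  # Put the evaluated values back in place of the placeholders.
  regmatches(template, m) <- list(values)
  template
}

name <- "world"
greeting <- interp("hello {name}")
sum_text <- interp("2 + 2 = {2 + 2}")
```

Here greeting is "hello world" and sum_text is "2 + 2 = 4"; any other
expression inside the braces, including one with side effects, would be run
just the same.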

Cheers,
Simon



> On Dec 8, 2021, at 12:56 AM, Taras Zakharko <taras.zakha...@uzh.ch> wrote:
> 
> I fully agree! General string interpolation opens a gaping security hole and 
> is accompanied by all kinds of problems and decisions. What I envision 
> instead is something like this:
> 
>   f"hello {name}" 
> 
> Which gets parsed by R to this:
> 
>   (STRINTERPSXP (CHARSXP (PROMISE nil)))
> 
> Basically, a new type of R language construct that still can be processed by 
> packages (for customized interpolation like in cli etc.), with a default eval 
> which is basically paste0(). The benefit here would be that this is eagerly 
> parsed and syntactically checked, and that the promise code could carry a 
> srcref. And of course, that you could pass an interpolated string expression 
> lazily between frames without losing the environment etc… For more advanced 
> applications, a low level string interpolation expression constructor could 
> be provided (that could either parse a general string — at the user’s risk, 
> or build it directly from expressions). 
> 
> — Taras
> 
> 
>> On 7 Dec 2021, at 12:06, Simon Urbanek <simon.urba...@r-project.org> wrote:
>> 
>> 
>> 
>>> On Dec 7, 2021, at 22:09, Taras Zakharko <taras.zakha...@uzh.ch> wrote:
>>> 
>>> Great summary, Avi. 
>>> 
>>> String concatenation could be trivially added to R, but it probably should 
>>> not be. You will notice that modern languages tend not to use “+” to do 
>>> string concatenation (they either have a custom operator or a special kind 
>>> of pattern to do it) due to the practical issues such an approach brings 
>>> (implicit type casting, lack of commutativity, performance etc.). These 
>>> issues will be felt even more so in R with its weak typing, idiosyncratic 
>>> casting behavior and NAs. 
>>> 
>>> As others have pointed out, any kind of behavior one wants from string 
>>> concatenation can be implemented by custom operators as needed. This is not 
>>> something that needs to be in base R. I would rather see the efforts 
>>> directed at improving string formatting (such as glue-style built-in 
>>> string interpolation).
>>> 
>> 
>> This is getting OT, but there is a very good reason why string interpolation 
>> is not in core R. As I recall it has been considered some time ago, but it 
>> is very dangerous as it implies evaluation on constants which opens a huge 
>> security hole and has questionable semantics (where you evaluate etc). Hence 
>> it's much easier to ban a package than to hack it out of R ;).
>> 
>> Cheers,
>> Simon
>> 
>> 
>>> — Taras
>>> 
>>> 
>>>> On 7 Dec 2021, at 02:27, Avi Gross via R-devel <r-devel@r-project.org> 
>>>> wrote:
>>>> 
>>>> After seeing what others are saying, it is clear that you need to carefully
>>>> think things out before designing any implementation of a more native
>>>> concatenation operator whether it is called "+" or anything else. There may
>>>> not be any ONE right solution but unlike a function version like paste()
>>>> there is nowhere to place any options that specify what you mean.
>>>> 
>>>> You can obviously expand paste() to accept arguments like replace.NA="" or
>>>> replace.NA="<NA>" and similar arguments on what to do if you see a NaN, an
>>>> Inf or -Inf, a NULL or even an NA_character_ and so on. Heck, you might
>>>> tell it to make other substitutions as in substitute=list(100=99, D=F) or
>>>> any other nonsense you can come up with.
>>>> 
>>>> But you have nowhere to put options when saying:
>>>> 
>>>> c <- a + b
>>>> 
>>>> Sure, you could set various global options before the addition and maybe
>>>> reset them after, but that is not a way I like to go for something this
>>>> basic.
>>>> 
>>>> And enough such tinkering makes me wonder if it is easier to ask a user to
>>>> use a slightly different function like this:
>>>> 
>>>> paste.no.na <- function(...) do.call(paste, Filter(Negate(is.na),
>>>> list(...)))
>>>> 
>>>> The above one-line function removes any NA from the argument list to make a
>>>> potentially shorter list before calling the real paste() using it.
>>>> 
>>>> Variations can, of course, be made that allow functionality as above. 
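
To make the one-liner above concrete, here is a small usage sketch. It only
exercises length-one arguments; a vector argument with a mix of NA and non-NA
elements would need an element-wise variant, which is not attempted here.

```r
# The one-liner quoted above, unchanged.
paste.no.na <- function(...) do.call(paste, Filter(Negate(is.na), list(...)))

# Plain paste() keeps the NA and renders it as the text "NA".
with_na <- paste("a", NA, "b")

# The NA argument is filtered out before paste() is called.
without_na <- paste.no.na("a", NA, "b")

# Named options such as sep are not NA, so they pass through to paste().
no_sep <- paste.no.na("a", NA, "b", sep = "")
```

The three results are "a NA b", "a b" and "ab" respectively.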
>>>> 
>>>> If R was a true object-oriented language in the same sense as others like
>>>> Python, operator overloading of "+" might be doable in more complex ways
>>>> but we can only work with what we have. I tend to agree with others that
>>>> in some places R is so lenient that all kinds of errors can happen because
>>>> it makes a guess on how to correct it. Generally, if you really want to mix
>>>> numeric and character, many languages require you to transform any
>>>> arguments to make them all of compatible types. The paste() function is
>>>> clearly stated to coerce all arguments to be of type character for you,
>>>> whereas a+b makes no such promises and also is not properly defined even if
>>>> a and b are both of type character. Sure, we can expand the language but it
>>>> may still do things some find not to be quite what they wanted, as in
>>>> "2"+"3" becoming "23" rather than 5. Right now, I can use
>>>> as.numeric("2")+as.numeric("3") and get the intended result after making
>>>> very clear to anyone reading the code that I wanted strings converted to
>>>> floating point before the addition.
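
A short sketch of the points above, under stated assumptions: the class name
"str" and the constructor str_obj() are hypothetical and exist only for this
example. It shows that "+" on plain strings is an error, that explicit
conversion gives the arithmetic result, and that an S3 method can give "+" a
concatenation meaning for objects carrying a class attribute (plain character
vectors are not affected).

```r
# "+" is not defined for plain character vectors, so this raises an error;
# the error message is captured instead of stopping the script.
plain <- tryCatch("2" + "3", error = function(e) conditionMessage(e))

# Explicit conversion, as described in the message above.
total <- as.numeric("2") + as.numeric("3")

# "str" is a hypothetical S3 class used only for this sketch.
str_obj <- function(x) structure(as.character(x), class = "str")

# S3 method for "+": dispatched when at least one operand has class "str".
`+.str` <- function(e1, e2) str_obj(paste0(unclass(e1), unclass(e2)))

joined <- unclass(str_obj("2") + "3")
```

Here total is 5 while joined is "23", which is exactly the ambiguity being
discussed: the same symbol ends up meaning two different things depending on
the class of its operands.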
>>>> 
>>>> As has been pointed out, the plus operator if used to concatenate does not
>>>> have a cognate for other operations like -*/ and R has used most other
>>>> special symbols for other purposes. So, sure, we can use something like
>>>> .... (4 periods) if it is not already being used for something but using +
>>>> here is a tad confusing. Having said that, the makers of Python did make
>>>> that choice.
>>>> 
>>>> -----Original Message-----
>>>> From: R-devel <r-devel-boun...@r-project.org> On Behalf Of Gabriel Becker
>>>> Sent: Monday, December 6, 2021 7:21 PM
>>>> To: Bill Dunlap <williamwdun...@gmail.com>
>>>> Cc: Radford Neal <radf...@cs.toronto.edu>; r-devel <r-devel@r-project.org>
>>>> Subject: Re: [Rd] string concatenation operator (revisited)
>>>> 
>>>> As I recall, there was a large discussion related to that which resulted in
>>>> the recycle0 argument being added (but defaulting to FALSE) for
>>>> paste/paste0.
>>>> 
>>>> I think a lot of these things ultimately mean that if there were to be a
>>>> string concatenation operator, it probably shouldn't have behavior
>>>> identical to paste0. Was that what you were getting at as well, Bill?
>>>> 
>>>> ~G
>>>> 
>>>> On Mon, Dec 6, 2021 at 4:11 PM Bill Dunlap <williamwdun...@gmail.com> 
>>>> wrote:
>>>> 
>>>>> Should paste0(character(0), c("a","b")) give character(0)?
>>>>> There is a fair bit of code that assumes that paste("X",NULL) gives "X"
>>>>> but c(1,2)+NULL gives numeric(0).
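
The zero-length cases in question can be checked directly. This sketch assumes
an R version recent enough to have the recycle0 argument mentioned elsewhere
in this thread (it was added in the 4.0.x series):

```r
# By default a zero-length argument is treated like "" and recycled away.
both <- paste0(character(0), c("a", "b"))

# With recycle0 = TRUE a zero-length argument makes the result zero-length.
strict <- paste0(character(0), c("a", "b"), recycle0 = TRUE)

# NULL is treated as zero-length and effectively ignored by paste().
with_null <- paste("X", NULL)

# Arithmetic follows the other convention: zero-length in, zero-length out.
arith <- c(1, 2) + NULL
```

So both is c("a", "b"), strict is character(0), with_null is "X", and arith is
numeric(0), which is the mismatch between paste-style and arithmetic-style
recycling that an operator would have to pick a side on.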
>>>>> 
>>>>> -Bill
>>>>> 
>>>>> On Mon, Dec 6, 2021 at 1:32 PM Duncan Murdoch 
>>>>> <murdoch.dun...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> On 06/12/2021 4:21 p.m., Avraham Adler wrote:
>>>>>>> Gabe, I agree that missingness is important to factor in. To somewhat
>>>>>>> abuse the terminology, NA is often used to represent missingness.
>>>>>>> Perhaps concatenating character something with character something
>>>>>>> missing should result in the original character?
>>>>>> 
>>>>>> I think that's a bad idea.  If you wanted to represent an empty 
>>>>>> string, you should use "" or NULL, not NA.
>>>>>> 
>>>>>> I'd agree with Gabe, paste0("abc", NA) shouldn't give "abcNA", it 
>>>>>> should give NA.
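
A minimal sketch of the NA-propagating behavior being argued for, next to
what paste0() does today. The operator name %cat% is hypothetical and chosen
only for this example; it is not an existing or proposed R operator.

```r
# %cat% is a hypothetical operator name used only for this sketch.
`%cat%` <- function(a, b) {
  out <- paste0(a, b)
  # Wherever either input is NA, make the result a real missing value
  # instead of the text "NA".
  out[is.na(a) | is.na(b)] <- NA_character_
  out
}

current  <- paste0("abc", NA)            # today's behavior
proposed <- "abc" %cat% NA               # missingness propagates
vec      <- c("a", "b") %cat% c("x", NA) # element-wise on vectors
```

Here current is "abcNA", proposed is NA_character_, and vec is c("ax", NA).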
>>>>>> 
>>>>>> Duncan Murdoch
>>>>>> 
>>>>>>> 
>>>>>>> Avi
>>>>>>> 
>>>>>>> On Mon, Dec 6, 2021 at 3:35 PM Gabriel Becker 
>>>>>>> <gabembec...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Hi All,
>>>>>>>> 
>>>>>>>> Seeing this and the other thread (and admittedly not having clicked
>>>>>>>> through to the linked r-help thread), I wonder about NAs.
>>>>>>>> 
>>>>>>>> Should NA <concat> "hi there" not result in NA_character_? This is
>>>>>>>> not what any of the paste functions do, but in my opinion, NA +
>>>>>>>> <non_na_value> seems like it should be NA (not "NA"), particularly if
>>>>>>>> we are talking about `+` overloading, but potentially even in the case
>>>>>>>> of a distinct concatenation operator?
>>>>>>>> 
>>>>>>>> I guess what I'm saying is that in my head missingness propagation
>>>>>>>> rules should take priority in such an operator (ie NA + <anything>
>>>>>>>> should *always* be NA).
>>>>>>>> 
>>>>>>>> Is that something others disagree with, or has it just not come up yet
>>>>>>>> in (the parts I have read) of this discussion?
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> ~G
>>>>>>>> 
>>>>>>>> On Mon, Dec 6, 2021 at 10:03 AM Radford Neal 
>>>>>>>> <radf...@cs.toronto.edu>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>>>> In pqR (see pqR-project.org), I have implemented ! and !! as 
>>>>>>>>>>> binary string concatenation operators, equivalent to paste0 and 
>>>>>>>>>>> paste, respectively.
>>>>>>>>>>> 
>>>>>>>>>>> For instance,
>>>>>>>>>>> 
>>>>>>>>>>>> "hello" ! "world"
>>>>>>>>>>>   [1] "helloworld"
>>>>>>>>>>>> "hello" !! "world"
>>>>>>>>>>>   [1] "hello world"
>>>>>>>>>>>> "hello" !! 1:4
>>>>>>>>>>>   [1] "hello 1" "hello 2" "hello 3" "hello 4"
>>>>>>>>>> 
>>>>>>>>>> I'm curious about the details:
>>>>>>>>>> 
>>>>>>>>>> Would `1 ! 2` convert both to strings?
>>>>>>>>> 
>>>>>>>>> They're equivalent to paste0 and paste, so 1 ! 2 produces "12", 
>>>>>>>>> just like paste0(1,2) does.  Of course, they wouldn't have to be 
>>>>>>>>> exactly equivalent to paste0 and paste - one could impose 
>>>>>>>>> stricter requirements if that seemed better for error detection.  
>>>>>>>>> Off hand, though, I think automatically converting is more in 
>>>>>>>>> keeping with the rest of R.  Explicitly converting with
>>>>>>>>> as.character could be tedious.
>>>>>>>>> 
>>>>>>>>> I suppose disallowing logical arguments might make sense to guard 
>>>>>>>>> against typos where ! was meant to be the unary-not operator, but 
>>>>>>>>> ended up being a binary operator, after some sort of typo.  I 
>>>>>>>>> doubt that this would be a common error, though.
>>>>>>>>> 
>>>>>>>>> (Note that there's no ambiguity when there are no typos, except 
>>>>>>>>> that when negation is involved a space may be needed - so, for 
>>>>>>>>> example, "x" !  !TRUE is "xFALSE", but "x"!!TRUE is "x TRUE".  
>>>>>>>>> Existing uses of double negation are still fine - eg, a <- !!TRUE
>>>>>>>>> still sets a to TRUE.  Parsing of operators is greedy, so
>>>>>>>>> "x"!!!TRUE is "x FALSE", not "xTRUE".)
>>>>>>>>> 
>>>>>>>>>> Where does the binary ! fit in the operator priority?  E.g. how is
>>>>>>>>>> 
>>>>>>>>>> a ! b > c
>>>>>>>>>> 
>>>>>>>>>> parsed?
>>>>>>>>> 
>>>>>>>>> As (a ! b) > c.
>>>>>>>>> 
>>>>>>>>> Their precedence is between that of + and - and that of < and >.
>>>>>>>>> So "x" ! 1+2 evaluates to "x3" and "x" ! 1+2 < "x4" is TRUE.
>>>>>>>>> 
>>>>>>>>> (Actually, pqR also has a .. operator that fixes the problems 
>>>>>>>>> with generating sequences with the : operator, and it has 
>>>>>>>>> precedence lower than + and - and higher than ! and !!, but 
>>>>>>>>> that's not relevant if you don't have the .. operator.)
>>>>>>>>> 
>>>>>>>>> Radford Neal
>>>>>>>>> 
>>>>>>>>> ______________________________________________
>>>>>>>>> R-devel@r-project.org mailing list 
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel