OK, let's take a step back.  We have identified at least three degrees of freedom that have been sources of friction with existing string literals:

 - Sometimes we don't want traditional escaping (\n, etc.);
 - Sometimes we don't want Unicode escaping (\unnnn);
 - Sometimes we want to represent multiple lines of text as a single String.
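
For concreteness, here is roughly how each of these three shows up with today's double-quoted literals.  This is only a small sketch; the class and constant names are made up for illustration and come from nowhere in particular.

```java
import java.util.regex.Pattern;

// Illustrative only: the class and constant names are invented for this sketch.
final class LiteralFriction {
    private LiteralFriction() {}

    // Axis 1: the regex we mean is \d+\.\d+ but every backslash has to be doubled.
    static final Pattern DECIMAL = Pattern.compile("\\d+\\.\\d+");

    // Axis 2: we want the six characters backslash, 'u', '0', '0', '4', '1'.
    // The backslash must be doubled, otherwise the compiler translates the
    // sequence into the letter 'A' before the literal is even tokenized.
    static final String LITERAL_ESCAPE = "\\u0041";

    // Axis 3: one logical two-line text, spelled as concatenated pieces
    // with an explicit line terminator on each.
    static final String TWO_LINES =
            "line one\n" +
            "line two\n";
}
```

None of these is hard to write, but each one makes the literal in the source look different from the text it denotes.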

Traditional strings could be described as (false, false, false) on these axes; the proposed raw strings are (true, true, true).  As a first evaluation (if these really are the axes), this is encouraging; if you're going to pick 2 of 2^n prepackaged options, it's often best to pick the ones with the biggest Hamming distance.

I have a hard time imagining that people really need, for example, traditional escaping but not unicode escaping, with any frequency.  So offering all 2^n combinations is not likely to carry its weight.

I think what you are suggesting is that it's fine to lump the first two together, but it might have been a premature move to lump them with the third.  (A second question is: are these the only axes we should be concerned with right now?)  So, let's examine that.

We explored allowing double-quoted strings to span lines too; this gives you a different stacking: { escaping multi-line, raw multi-line }.  But I think the part that's still unexplored is: do we need to explicitly surface how source lines are combined into strings?

The assumption we've been working off of is: \n has won (this wasn't true when Java got started).  Is this wishful thinking?  And if not, can the library approach serve this purpose here too:

    `a long
     string`.toPlatformLineEnding()

(which, as has been observed, can be optimized either by compile-time evaluation or by link-time evaluation using LDC and ConstantDynamic, so I think we can ignore the "but then I'm doing work at runtime" aspect of this.)
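
To make the library approach concrete, here is a minimal sketch using a hypothetical static helper in place of the toPlatformLineEnding() method above.  The class name and the two-argument variant are assumptions made for illustration, and the sketch assumes the literal arrives with \n as its only line terminator.

```java
// Hypothetical helper standing in for a toPlatformLineEnding() method on
// String; the names here are illustrative and not part of any proposal.
final class PlatformLines {
    private PlatformLines() {}

    // Rewrites every "\n" to the running platform's line separator.
    // Assumes the input uses "\n" only; an input that already contains
    // "\r\n" would end up with a doubled carriage return.
    static String toPlatformLineEnding(String s) {
        return toLineEnding(s, System.lineSeparator());
    }

    // Same rewrite with an explicit target, which is easier to test.
    static String toLineEnding(String s, String separator) {
        return s.replace("\n", separator);
    }
}
```

With this, PlatformLines.toLineEnding("a long\n string", "\r\n") gives "a long\r\n string", and on a platform whose separator is already \n the text comes back unchanged.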



On 2/5/2018 1:39 PM, Guy Steele wrote:
On Feb 5, 2018, at 1:39 PM, Brian Goetz <[email protected]> wrote:


However, I also note that the broad problem may have two or three distinct symptoms, 
and:
(1) A solution that addresses one symptom may not address the others, and
(2) On the other hand, it may (or may not) be perfectly reasonable to address the 
most painful symptoms in different ways, rather than insisting that a single 
solution cover them all.
Indeed so.  This is one reason why we resisted the call to do string interpolation (which 
many developers conflate with multi-line strings, as many languages with one also have 
the other) at the same time.  Another way to ask this question is: are we yet 
sufficiently minimal?  We boiled it down quite a lot already, but are we at 
"minimal" yet?  Or, did we take a wrong turn in boiling it down, and find 
ourselves in only a local minimum?

In particular, I happen to think that the problem of distinguishing snippet 
indentation from encoding-program indentation may require a rather different 
kind of solution from the problem of escape characters in embedded snippets.  
The reason is that in both these cases the painful symptom is visual in nature 
rather than logical.  That’s why I can understand what drove Tagir to pursue 
the pipe-character approach (even though I think it may not be the best 
solution to the problem).  We may want to use ```…``` to enclose regexes but 
also want to use some other approach to solve the multi-line / indentation 
problems.
OK, so what you're saying here is that it might be a clever self-deception to count 
newline handling as "just another aspect of raw-ness"?
Bingo.

Back in the day (I’m talking 1960s) it was ugly and wasteful but predictable: 
if there were line breaks at all (as opposed to record-oriented I/O), they were 
represented by two characters, CR and then LF, held over from the mechanical 
abilities/requirements of Teletype machines.

Then in the mid-1960s an ISO standard allowed plain LF (eventually semi-renamed 
Newline) as an alternative, and Multics and then Unix spread this idea 
(eventually to Apple as well).

But another branch of the world, notably the CP/M to MS-DOS to Windows line, 
continued to use CR/LF.  Worse yet, some software came to use CR alone (perhaps 
a natural enough theory when you consider that the “Return” key on keyboards 
usually generates the CR character rather than the LF character).

It is simply impossible to be compatible with everyone on this issue, and we 
are fooling ourselves if we think that raw string representations can solve 
this problem in all contexts.  Much better, I think, in the absence of 
consensus, to have explicit software gatekeepers at the points where data 
transitions among these disparate worlds.
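
One way to picture such a gatekeeper is sketched below, under the assumption that the program uses \n internally and converts only at its boundaries.  The class and method names are hypothetical.

```java
// Hypothetical boundary helper: normalize on the way in, convert on the way out.
final class LineEndingGate {
    private LineEndingGate() {}

    // Inbound: accept CR LF, lone CR, or lone LF and produce LF only.
    // CR LF pairs are collapsed first so they do not turn into two breaks.
    static String normalizeToLf(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    // Outbound: the caller states which convention the destination expects.
    // Assumes the input has already been normalized to LF.
    static String export(String lfText, String targetSeparator) {
        return lfText.replace("\n", targetSeparator);
    }
}
```

The point of the design is that the choice of convention is made explicitly at each transition, rather than being baked into how a literal happens to be written in the source file.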

