Wrapping up the first two courses

Brian Goetz Mon, 22 Apr 2019 06:16:10 -0700

> The main thing Brian is waiting for, though, is not lots of new ideas,
> but rather a consensus that (a) we can treat leading whitespace outside
> of a given rectangle as syntax-not-payload (thus stripped), and (b) that
> we should provide a way for programmers to opt out of the stripping
> (making all space into syntax-and-payload).  It feels to me like we
> have arrived there and are driving around the parking lot, checking
> out all the parking spots, worrying that we will miss the best one.


Glad to hear it :)

So, I posit, we have consensus on the following things:

 - Multi-line strings are a useful feature on their own
 - Using “fat” delimiters for multi-line strings is practical and intuitive
 - Multi-line string literals share the same escape language as single-line 
string literals
 - Newlines in MLSLs should be normalized to \n
 - There exists a reasonable alignment algorithm, which users can learn easily 
enough, and can be captured as a library method on String (some finer points to 
be hammered out)
 - To the extent the language performs alignment, it should be consistent with 
what the library-based version does, so that users can opt out and opt back in 
again
 - In the common case, a MLSL will be a combination of some intended and some 
incidental indentation, and it is reasonable for the default to be that the 
language attempts to normalize away the incidental indentation
 - There needs to be an opt-out for the cases where the default alignment is 
not what the user wants
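To make the consensus points about newline normalization and library-based alignment concrete, here is a minimal sketch of what such a String-level method might do. It assumes the simplest possible rule: normalize \r\n and lone \r to \n, then remove the smallest leading-whitespace run shared by all non-blank lines (counting a space or tab as one column each). The class name AlignSketch and its methods are hypothetical, and the finer points still to be hammered out (closing delimiter influence, trailing whitespace, and so on) are deliberately left out.

```java
// Hypothetical sketch of a library-level alignment method. It assumes
// alignment means: normalize line terminators to \n, then strip the
// smallest indentation shared by all non-blank lines.
final class AlignSketch {

    // Normalize CR LF and lone CR to LF.
    static String normalizeNewlines(String s) {
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    // Count leading spaces and tabs, treating each as one column.
    static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return i;
    }

    // A line is blank if it consists only of spaces and tabs (or nothing).
    static boolean isBlank(String line) {
        return leadingWhitespace(line) == line.length();
    }

    // Strip the common (incidental) indentation; any extra (intended)
    // indentation beyond it is kept.
    static String align(String s) {
        String[] lines = normalizeNewlines(s).split("\n", -1);
        int common = Integer.MAX_VALUE;
        for (String line : lines) {
            if (!isBlank(line)) {
                common = Math.min(common, leadingWhitespace(line));
            }
        }
        if (common == Integer.MAX_VALUE) {
            common = 0; // every line is blank: nothing to strip
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            // Blank lines may be shorter than the common prefix, so they are emptied.
            out.append(isBlank(line) ? "" : line.substring(common));
            if (i < lines.length - 1) {
                out.append('\n');
            }
        }
        return out.toString();
    }
}
```

For example, align("    a\n      b\n    c") yields "a\n  b\nc": the four shared spaces are treated as incidental and removed, while the two extra spaces on the middle line are kept as intended indentation.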

(A useful way to frame our discussion of linguistic alignment is to ask 
whether a string literal is “one dimensional” or “two dimensional.”  The 1D 
interpretation says a string literal is just a sequence of characters between 
two delimiters; the 2D interpretation says that it has an inherent line 
structure that could be manipulated directly.)

What I like about this proposal, much more than with the previous round, is 
that the two flavors of string literal (thin and fat) are clearly projections 
of the same feature, and they differ only in the one respect that is essential 
to them: multi-line-ness.

I will leave it to Jim to summarize the current state of the alignment 
algorithm, and any open questions (e.g., closing delimiter influence, treatment 
of single-line strings, etc.) that may still be lingering, but these are not 
blockers to placing our order for the first two courses.  

I am still having a hard time getting comfortable with Guy’s proposal to use 
more “envelope” here — I think others have expressed similar discomfort.  If I 
had to put my finger on it, it is that being able to cut and paste in and out 
is such a big part of what is currently missing, and there is insufficient 
trust that there would be ubiquitous IDE support in all the various ways that 
people edit Java code.  But given that this is framed as “let’s carve out some 
extra envelope space”, we can keep discussing this even as we move forward.  

We still need to make some decisions on syntax; the one most relevant right 
now is the opt-out. (For any syntax issues, please create another thread.)
Jim hinted at this earlier: use an escape sequence that is stripped out of the 
string but means “no alignment.”  Something like:

     String s = """\-
         Leave me just the way
         you found me""";

Obviously there is room to argue over the specific escape sequence, so let’s 
put this in the “open questions” bucket.
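The opt-out can be sketched the same way. Assuming, purely for illustration, that the marker is the two characters \- at the very start of the literal's content (the actual escape is an open question), a compiler might strip the marker and skip alignment, and otherwise apply the same alignment the library method performs. OptOutSketch, NO_ALIGN and process are hypothetical names, and align here is a compact stand-in for the library method, not its specification.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical sketch: how a compiler might honor an opt-out escape in a
// fat (multi-line) string literal. Names and the marker are assumptions.
final class OptOutSketch {
    // Assumed opt-out marker: the two characters '\' and '-'.
    // The real escape sequence is still an open question.
    static final String NO_ALIGN = "\\-";

    // Compact stand-in for the library alignment method: strip the smallest
    // leading-whitespace run shared by all non-blank lines.
    // Assumes line terminators were already normalized to \n.
    static String align(String s) {
        String[] lines = s.split("\n", -1);
        int common = Arrays.stream(lines)
                .filter(line -> !line.isBlank())
                .mapToInt(line -> line.length() - line.stripLeading().length())
                .min()
                .orElse(0);
        return Arrays.stream(lines)
                .map(line -> line.isBlank() ? "" : line.substring(common))
                .collect(Collectors.joining("\n"));
    }

    // 'content' is the text between the delimiters; handling of the newline
    // right after the opening delimiter is deliberately out of scope here.
    static String process(String content) {
        if (content.startsWith(NO_ALIGN)) {
            // Opt out: drop the marker and keep every space as payload.
            return content.substring(NO_ALIGN.length());
        }
        // Default: strip incidental indentation, keep intended indentation.
        return align(content);
    }
}
```

Because the default path calls the same align method a user could call by hand, opting out and then applying align explicitly gives the same result as the default, which is the opt-out/opt-back-in consistency listed among the consensus points.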
       
There was another proposal, which was to use a prefix character:

    String s = a"…"; // opt into alignment
    String s = r"…"; // raw string

I’d like to put this one to bed quickly, because I see it as having a number of 
issues.  

Having a set of prefix characters is one of those features that starts off weak 
and scales badly from there :). With only two prefixes, as suggested above, it 
has a feel of overgeneralization, but with a large number of candidate 
prefixes, it gets worse, because invariably as such a feature gets more 
complicated, there are interactions.  One need look only at a Perl regex that 
uses multiple modifiers:

    /foo*/egimosx

to realize that what started as a simple feature (I think initially just `g`) 
has grown out of control.

More importantly, one of the two suggested prefixes doesn’t really make sense. 
While the notion of a “raw” string is attractive, one of the things that 
tripped us up the first time around is the belief that “raw” is a binary 
thing.  In reality, raw-ness comes in degrees: how hard you have to work to 
break out of the “string of uninterpreted characters” mode.  (Note: please 
let’s not start a discussion on raw strings; we’re wrapping up our orders for 
the first courses now.  I raise this only to put to bed a syntax choice 
predicated on the assumption that raw-ness is a binary characteristic.)

If we’re pursuing align-by-default, we should consider a different name for the 
align() method; the name was originally chosen as a compromise when there was 
no align-by-default, and most of the other names were too long to ask people to 
type routinely.  If alignment is the default, the explicit name can be more 
descriptive.  


So, next steps:

 - Jim to write up current details of alignment algorithm, with current open 
issues;
 - Remaining bike sheds on opt-out and naming of align()

Once 1/1a are in the pipe, we can consider whether we want to move ahead to raw 
strings.
