> The main thing Brian is waiting for, though, is not lots of new ideas,
> but rather a consensus that (a) we can treat leading whitespace outside
> of a given rectangle as syntax-not-payload (thus stripped), and (b) that
> we should provide a way for programmers to opt out of the stripping
> (making all space into syntax-and-payload). It feels to me like we
> have arrived there and are driving around the parking lot, checking
> out all the parking spots, worrying that we will miss the best one.
Glad to hear it :)
So, I posit, we have consensus over the following things:
- Multi-line strings are a useful feature on their own
- Using “fat” delimiters for multi-line strings is practical and intuitive
- Multi-line string literals share the same escape language as single-line
string literals
- Newlines in MLSLs should be normalized to \n
- There exists a reasonable alignment algorithm, which users can learn easily
enough, and can be captured as a library method on String (some finer points to
be hammered out)
- To the extent the language performs alignment, it should be consistent with
what the library-based version does, so that users can opt out and opt back in
again
- In the common case, a MLSL will be a combination of some intended and some
incidental indentation, and it is reasonable for the default to be that the
language attempts to normalize away the incidental indentation
- There needs to be an opt-out, for the cases where alignment is not the
default the user wants
(A useful way to frame the discussion we had regarding linguistic alignment is:
whether a string literal is “one dimensional” or “two dimensional.” The 1D
interpretation says a string literal is just a sequence of characters between
two delimiters; the 2D interpretation says that it has an inherent line
structure that could be manipulated directly.)
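To make the "library method on String" point concrete, here is a rough sketch of the general shape such a method could take. It is not the algorithm under discussion: the class and method names are made up for illustration, tabs and spaces are counted as one column each, and the closing delimiter plays no role. All it does is normalize line terminators to \n and strip the indentation common to every non-blank line.

```java
import java.util.Arrays;

/** Illustrative sketch only; names and rules here are assumptions, not the proposal. */
final class AlignSketch {

    /** Normalizes \r\n and bare \r line terminators to \n. */
    static String normalizeNewlines(String s) {
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    /** Counts leading spaces and tabs, each as one column (a simplification). */
    static int indentOf(String line) {
        int i = 0;
        while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return i;
    }

    /** Strips the indentation shared by all non-blank lines, treating it as incidental. */
    static String align(String s) {
        String[] lines = normalizeNewlines(s).split("\n", -1);
        // The smallest indent among non-blank lines is taken to be the incidental part.
        int common = Arrays.stream(lines)
                .filter(line -> !line.trim().isEmpty())
                .mapToInt(AlignSketch::indentOf)
                .min()
                .orElse(0);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                out.append('\n');
            }
            // Blank lines may be shorter than the common indent; emit them as empty.
            if (!lines[i].trim().isEmpty()) {
                out.append(lines[i].substring(common));
            }
        }
        return out.toString();
    }
}
```

In this sketch any indentation beyond the common prefix survives as intended indentation, which is the 2D reading: the literal is treated as a block of lines rather than a flat run of characters.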
What I like about this proposal — much more than with the previous round — is
that the two flavors of string literal (thin and fat) are clearly projections
of the same feature, and their differences pertain solely to their essential
difference — multi-line-ness.
I will leave it to Jim to summarize the current state of the alignment
algorithm, and any open questions (e.g., closing delimiter influence, treatment
of single-line strings, etc.) that may still be lingering, but these are not
blockers to placing our order for the first two courses.
I am still having a hard time getting comfortable with Guy’s proposal to use
more “envelope” here — I think others have expressed similar discomfort. If I
had to put my finger on it, it is that being able to cut and paste in and out
is such a big part of what is currently missing, and there is insufficient
trust that there would be ubiquitous IDE support in all the various ways that
people edit Java code. But given that this is framed as “let’s carve out some
extra envelope space”, we can keep discussing this even as we move forward.
We still need to make some decisions on syntax; the main one that is currently
relevant being opt-out. (For any syntax issues, please create another thread.)
Jim hinted at this earlier: use an escape sequence that is stripped out of the
string but means “no alignment.” Something like:
String s = """\-
Leave me just the way
you found me"""
Obviously there is room to argue over the specific escape sequence, so let’s
put this in the “open questions” bucket.
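For illustration only, here is roughly how the "stripped out of the string but means no alignment" behavior could be modeled. Everything here is hypothetical: the `\-` spelling is just the placeholder from the example above, the class and method names are invented, and the aligner is passed in as a parameter so the sketch takes no position on the alignment algorithm itself. Whether the line terminator right after the marker is payload is also left open; this sketch keeps it.

```java
import java.util.function.UnaryOperator;

/** Illustrative sketch only; the marker spelling and all names are assumptions. */
final class OptOutSketch {

    /** Placeholder opt-out marker: a backslash followed by a hyphen. */
    static final String OPT_OUT = "\\-";

    /**
     * Interprets the text between the fat delimiters.
     * If the body starts with the marker, the marker is removed (it is syntax,
     * not payload) and the rest is returned untouched. Otherwise the supplied
     * aligner is applied, modeling align-by-default.
     */
    static String interpret(String body, UnaryOperator<String> aligner) {
        if (body.startsWith(OPT_OUT)) {
            return body.substring(OPT_OUT.length());
        }
        return aligner.apply(body);
    }
}
```

Since the aligner is an ordinary function, a user who opted out could still apply the same library method by hand afterwards, which is the "opt out and opt back in again" property from the consensus list.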
There was another proposal, which was to use a prefix character:
String s = a"…" // opt into alignment
String s = r"…" // raw string
I’d like to put this one to bed quickly, because I see it as having a number of
issues.
Having a set of prefix characters is one of those features that starts off weak
and scales badly from there :). With only two prefixes, as suggested above, it
has a feel of overgeneralization, but with a large number of candidate
prefixes, it gets worse, because invariably as such a feature gets more
complicated, there are interactions. One need look only at a Perl regex that
uses multiple modifiers:
/foo*/egimosx
to realize that what started as a simple feature (I think initially just `g`)
has grown out of control.
More importantly, of the two prefixes suggested, one doesn’t really make sense.
And that is: while the notion of “raw” string is attractive, one of the things
that tripped us up the first time around is the belief that “raw” is a binary
thing. In reality, raw-ness comes in degrees — how hard you have to work to
break out of the “string of uninterpreted characters” mode. (Note: please
let’s not start a discussion on raw strings; we’re wrapping up our orders for
the first courses now. I raise this only to put to bed a syntax choice
predicated on the assumption that raw-ness is a binary characteristic.)
If we’re pursuing align-by-default, we should consider a different name for the
align() method; the name was originally chosen as a compromise when there was
no align-by-default, and most of the other names were too long to ask people to
type routinely. If alignment is the default, the explicit name can be more
descriptive.
So, next steps:
- Jim to write up current details of alignment algorithm, with current open
issues;
- Remaining bike sheds on opt-out and naming of align()
Once 1/1a are in the pipe, we can consider whether we want to move ahead to raw
strings.