Re: RFR: Multi-line String Literal (Preview) JEP [EG Draft]

Dan Smith Tue, 14 May 2019 16:19:08 -0700

> On May 13, 2019, at 8:05 AM, Jim Laskey <[email protected]> wrote:
> 
> After some significant tweaks, reopening the JEP for review. 
> https://bugs.openjdk.java.net/browse/JDK-8222530

Something really clicks for me in calling these "text blocks". The delimiter 
syntax and conventions for line breaks/whitespace, which seemed somewhat 
arbitrary before, feel right. Nice psychological trick.


Let me weigh in with some design feedback, in a refined form of some comments I 
made in a previous thread:

Finding the right indentation trimming algorithm has been a struggle. We've 
come up with something, but it sure seems complex, and I'll bet most 
programmers will never fully internalize it. The struggle arises primarily 
because the feature has an ambitious goal of getting it "right" for a wide 
variety of indentation conventions, and also because the feature is constrained 
to be a post-processing step, independent of program context. I suggest 
rethinking both of those requirements.

Instead, the language should be strongly opinionated about how text blocks 
should be indented, and should take the enclosing context into account. 
Specifically, the opening """ delimiter should mark the left margin of the text 
block, and it should be a compiler error to put content to the left of that 
margin. This results in a really simple, readable approach to indenting: the 
delimiter marks the rectangle.

Detailed rules:
- The *prefix* of a text block is the program text after the immediately 
preceding \n or \r, up to the opening """, with every non-whitespace character 
replaced with a space (\u0020).
- The form of a text block is """ <whitespace>* ( <newline> <prefix> <content>* 
)+ """ (that is, opening delimiter, ignored whitespace, then one or more lines 
of content, each prefixed by a newline and the *prefix*; all prefixes must be 
identical).
- The string denoted by a text block is its <content>* strings after escape 
processing, concatenated together with '\n'.
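
As a rough sketch of how the three rules above could be checked mechanically (this is not javac code and not part of the JEP; the class name MarginTextBlock and the methods prefixOf and denote are made up for illustration, escape processing is skipped, and the first """ found after the margin is assumed to be the closing delimiter):

```java
import java.util.ArrayList;
import java.util.List;

class MarginTextBlock {
    static final String DELIM = "\"\"\"";

    // Rule 1: the prefix is the text before the opening delimiter,
    // with every non-whitespace character replaced by a space.
    static String prefixOf(String openingLine) {
        int delim = openingLine.indexOf(DELIM);
        if (delim < 0) {
            throw new IllegalArgumentException("no opening delimiter");
        }
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < delim; i++) {
            char c = openingLine.charAt(i);
            prefix.append(Character.isWhitespace(c) ? c : ' ');
        }
        return prefix.toString();
    }

    // Rules 2 and 3: every following line must start with the prefix;
    // the contents are joined with '\n'. lines.get(0) is the source line
    // holding the opening delimiter.
    static String denote(List<String> lines) {
        String first = lines.get(0);
        String prefix = prefixOf(first);
        String rest = first.substring(first.indexOf(DELIM) + DELIM.length());
        if (!rest.chars().allMatch(Character::isWhitespace)) {
            throw new IllegalArgumentException("content after opening delimiter");
        }
        List<String> contents = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i);
            if (!line.startsWith(prefix)) {
                throw new IllegalArgumentException(
                        "line " + i + ": content to the left of the margin");
            }
            String content = line.substring(prefix.length());
            int close = content.indexOf(DELIM);
            if (close >= 0) {
                // Closing delimiter: keep what precedes it and stop.
                contents.add(content.substring(0, close));
                return String.join("\n", contents);
            }
            contents.add(content);
        }
        throw new IllegalArgumentException("unterminated text block");
    }

    public static void main(String[] args) {
        String margin = " ".repeat(15); // width of `    String s = `
        System.out.print(denote(List.of(
                "    String s = \"\"\"",
                margin + "Hello",
                margin + "  world",
                margin + "\"\"\";")));
    }
}
```

Under these assumptions, a closing delimiter on its own line contributes an empty final content, so the denoted string ends with a '\n'.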

Most of the examples in the JEP follow these rules as a convention already. The 
concatenation examples would benefit from following it.

Discussion:

What if I want to shift my content left? Just put a line break before the 
opening delimiter, and align it wherever you want to set your left margin. (If 
you don't want to strip anything, put the opening delimiter in column 0.) 
Your n-line text block now takes n+1 lines, nbd.

What if I want to shift my content right, beyond the delimiters? Don't do that. 
That's not how text blocks work. (I mean, you can do it, but your extra 
whitespace will be included in the denoted string.)

What about tabs? Tabs that come before the opening delimiter are recognized, 
and all prefixes must use the same pattern of tabs/spaces/[other exotic 
whitespace]. What if you want to have program text on the same line as the 
opening delimiter, but then want to use tabs underneath?:

\t  \t  System.out.println("""
\t  \t  \t  \t  \t  \t  \t  Hello world!
\t  \t  \t  \t  \t  \t  \t  """);

Well, then you're doing tabs wrong—different tab widths will make "Hello 
world!" appear to the left or right of the delimiters. So this is an error. 
Either use spaces after the first two tabs, or put the opening delimiter on a 
new line.
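
To make the tab case concrete, here is a small sketch of the prefix comparison (the class TabMarginCheck and its methods are hypothetical names, and the character-for-character match is my reading of "all prefixes must use the same pattern"): the prefix derived from the opening line is two tabs followed by 19 spaces, so a line indented with seven tabs does not match it, while two tabs plus 19 spaces does.

```java
class TabMarginCheck {
    // Text before the opening """ with each non-whitespace character
    // replaced by a space; tabs and other whitespace are kept as-is.
    static String prefixOf(String openingLine) {
        int delim = openingLine.indexOf("\"\"\"");
        if (delim < 0) {
            throw new IllegalArgumentException("no opening delimiter");
        }
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < delim; i++) {
            char c = openingLine.charAt(i);
            prefix.append(Character.isWhitespace(c) ? c : ' ');
        }
        return prefix.toString();
    }

    // A content line is accepted only if it begins with exactly that prefix.
    static boolean startsAtMargin(String openingLine, String contentLine) {
        return contentLine.startsWith(prefixOf(openingLine));
    }

    public static void main(String[] args) {
        String opening = "\t\tSystem.out.println(\"\"\"";
        String allTabs = "\t".repeat(7) + "Hello world!";
        String tabsThenSpaces = "\t\t" + " ".repeat(19) + "Hello world!";
        System.out.println(startsAtMargin(opening, allTabs));
        System.out.println(startsAtMargin(opening, tabsThenSpaces));
    }
}
```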

What about variable-width fonts? If you expect your code to be read in a 
variable-width font, by convention you should start all text blocks on a 
(possibly-indented) blank line.

What about Unicode escapes? It's an orthogonal question, but I think it's fine 
to continue pre-processing all Unicode escapes. If you want to obfuscate prefixes 
and line breaks using \u0020 and \u000a, go for it.
