Hi, Alex!

At 2021-10-17T21:33:24+0200, Alejandro Colomar wrote:
> Break URIs before a single slash, not after.
>
> I found no GNU-specific (or any other at all) source that recommends
> breaking long URIs after a slash.  So follow Chicago Style and
> break them before single slashes.

As far as I'm aware there is no such source.  Thus does it fall to me to
blaze a trail.

I admit that it had not occurred to me until recently why breaking after
slashes is better than breaking before them.

1.  A slash is not confusable as end-of-sentence punctuation as a dot
    is.  In fact, it signals sentence continuation even if the URI
    context is missed or forgotten.

2.  URIs can validly, and in fact commonly do, end with single slashes.

2a. Corollary: Inserting a break before slashes therefore invites the
    formatter to break a URI such that a single slash is set on the next
    line, or, if you don't have widow/orphan control, on a subsequent
    column or page.

2b. Corollary: Multiple trailing slashes at the end of a URI, even where
    valid (which is rare), are vanishingly uncommon.  Therefore,
    breaking before slashes buys you at most one character cell of room
    on a line that must be broken (modulo any trailing punctuation, but
    that is under user control in the source document).  Moreover, in
    that very case, the lone trailing slash on the next output line is
    at risk of creating confusion or being mistaken as an error.  But in
    fact, trailing slashes on URIs are semantically significant[1], and
    a reader who is confident that they didn't overlook the trailing
    slash on the next (line, column, page) when they copy-and-paste such
    a URI is at risk of retrieving the wrong resource.

3.  One might concede the above and still say that it's worth meeting
    Chicago (more than halfway) by applying their breaking rule to every
    slash in a URI _except_ the last.  But having a different breaking
    rule for a trailing slash (or group of slashes) in a URI is more
    tedious to remember and possibly to implement.  The sed expressions
    you crafted are pretty simple, and are made no more complex by
    shifting the location of the break point; that's an advantage worth
    preserving.

I've written the following new material for the groff_man_style(7) page.

[[
    URIs can be lengthy; rendering them can result in jarring
    adjustment or variations in line length, or troff warnings when a
    hyperlink is longer than an output line.  The application of
    non-printing break point escape sequences \: after each slash (or
    series thereof), and before each dot (or series thereof) is
    recommended.  The former practice avoids forcing a trailing slash
    in a URI onto a separate output line, and the latter helps the
    reader to avoid mistakenly interpreting dot(s) at the end of a line
    as periods or ellipses.  Thus,

        .UR http://\:example\:.com/\:fb8afcfbaebc74e\:.cc

    has several potential break points in the URI shown.  The \: escape
    sequences are ignored when supplied to device control commands for
    embedding in hyperlink-aware output drivers.
]]

Before I land it, I need to do some homework regarding the portability
of the \: escape, so that I can make honest disclosures in the requisite
addition to the "Portability" subsection of this page.

I guess I have another pin for my Russell Harper voodoo doll now.[2][3]

Please let me know if you find any inconsistencies in our URI breaking
practices in the groff man pages.  I inferred that you said the existing
style was consistent, but I'm not sure and it could have been wishful
reading on my part.  :)

Regards,
Branden

[1] https://stackoverflow.com/questions/5948659/when-should-i-use-a-trailing-slash-in-my-url/
[2] https://www.linkedin.com/in/russell-harper-70394718
[3] https://web.archive.org/web/20171107164742/http://www.heracliteanriver.com/?p=324
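
P.S.  For anyone who wants to experiment with the break point rule in
the proposed groff_man_style(7) text above, here is a minimal sketch in
Python.  It is not the sed expressions discussed in this thread; the
function name add_break_points is hypothetical.  Two choices are my own
assumptions rather than anything the proposed text requires: no break
point is added after a slash (or run of slashes) that ends the URI,
since there is nothing left to break before, and no second break point
is added before a dot that directly follows a slash.

```python
import re

BREAK = r"\:"  # groff's non-printing break point escape sequence

# One pass over the URI, with two alternatives:
#   1. a run of slashes followed by some non-slash character
#      (assumption: a run that ends the URI gets no break point);
#   2. a run of dots preceded by a character that is neither a dot
#      nor a slash (a dot right after a slash is already covered by 1).
_PATTERN = re.compile(r"(/+)(?=[^/])|(?<=[^./])(\.+)")


def add_break_points(uri: str) -> str:
    """Insert \\: after slash runs and before dot runs in a URI."""

    def repl(match: re.Match) -> str:
        if match.group(1) is not None:
            return match.group(1) + BREAK  # break after the slashes
        return BREAK + match.group(2)      # break before the dots

    return _PATTERN.sub(repl, uri)


print(add_break_points("http://example.com/fb8afcfbaebc74e.cc"))
```

Run on the URI from the example above, this prints
http://\:example\:.com/\:fb8afcfbaebc74e\:.cc, which matches the .UR
line in the proposed text.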