Re: [PATCH] *.man: Break URIs at points specified by the Chicago Style

G. Branden Robinson Sun, 17 Oct 2021 22:35:04 -0700

Hi, Alex!

At 2021-10-17T21:33:24+0200, Alejandro Colomar wrote:
> Break URIs before a single slash, not after.
> 
> I found no GNU-specific (or any other at all) source that recommends
> breaking long URIs after a slash.  So follow Chicago Style and
> break them before single slashes.


As far as I'm aware, there is no such source.

Thus does it fall to me to blaze a trail.

I admit that it had not occurred to me until recently why breaking after
slashes is better than breaking before them.

1. Unlike a dot, a slash cannot be mistaken for end-of-sentence
   punctuation.  In fact, it signals sentence continuation even if the
   URI context is missed or forgotten.

2. URIs can validly, and in fact commonly do, end with single slashes.

2a. Corollary: Inserting a break point before slashes therefore invites
    the formatter to break a URI such that a lone trailing slash is set
    on the next line, or, if you don't have widow/orphan control, on a
    subsequent column or page.

2b. Corollary: Multiple trailing slashes at the end of a URI are seldom
    valid and, in practice, vanishingly uncommon.  Therefore, breaking
    before slashes buys you at most one character cell of room on a
    line that must be broken (modulo any trailing punctuation, but that
    is under user control in the source document).  Moreover, in that
    very case, the lone trailing slash on the next output line risks
    creating confusion or being mistaken for an error.  But trailing
    slashes on URIs are in fact semantically significant[1], so a reader
    who copies and pastes such a URI, confident that none of it spilled
    onto the next line (or column, or page), may miss the trailing slash
    and retrieve the wrong resource.

3. One might concede the above and still say that it's worth meeting
   Chicago (more than halfway) by applying their breaking rule to every
   slash in a URI _except_ the last.  But having a different breaking
   rule for a trailing slash (or group of slashes) in a URI is more
   tedious to remember, and possibly to implement.  The sed expressions
   you crafted are pretty simple, and are made no more complex by
   shifting the location of the break point; that's an advantage worth
   preserving.
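To make point 2a concrete, here is a minimal Python sketch (not part of groff or of your patch; the function names and the sample URI are hypothetical) that lists the chunks a formatter could set on separate lines under each rule.  It models slashes only and ignores dots and every other character.

```python
import re


def split_at(s: str, positions: list[int]) -> list[str]:
    """Split s at the given (ascending) offsets, dropping empty pieces."""
    bounds = [0, *positions, len(s)]
    return [s[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


def pieces(uri: str, before: bool) -> list[str]:
    """Return the chunks a formatter could set on separate lines.

    Only slashes are considered here.
    before=True:  break point before each single slash ("//" left alone).
    before=False: break point after each run of one or more slashes.
    """
    if before:
        # A slash with no slash immediately on either side of it.
        positions = [m.start() for m in re.finditer(r"(?<!/)/(?!/)", uri)]
    else:
        # The offset just past each maximal run of slashes.
        positions = [m.end() for m in re.finditer(r"/+", uri)]
    return split_at(uri, positions)


uri = "https://example.com/docs/"
print(pieces(uri, before=True))   # ['https://example.com', '/docs', '/']
print(pieces(uri, before=False))  # ['https://', 'example.com/', 'docs/']
```

Under the break-before rule the last chunk is a lone "/", exactly the piece point 2a warns can end up stranded on the next line; under the break-after rule every slash stays attached to the text before it.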

I've written the following new material for the groff_man_style(7) page.

[[
       URIs can be lengthy; rendering them can result in jarring
       adjustment or variations in line length, or troff warnings when a
       hyperlink is longer than an output line.  The application of
       non-printing break point escape sequences \: after each slash (or
       series thereof) and before each dot (or series thereof) is
       recommended.  The former practice avoids forcing a trailing slash
       in a URI onto a separate output line, and the latter helps the
       reader to avoid mistakenly interpreting dot(s) at the end of a
       line as periods or ellipses.  Thus,
              .UR http://\:example\:.com/\:fb8afcfbaebc74e\:.cc
       has several potential break points in the URI shown.  The \:
       escape sequences are ignored when supplied to device control
       commands for embedding in hyperlink-aware output drivers.
]]
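For illustration only, here is a minimal Python sketch of the recommended rule: insert \: after each run of slashes and before each run of dots.  Your patch does this with sed expressions, which I haven't reproduced here; the function name below is hypothetical.

```python
import re


def add_break_points(uri: str) -> str:
    """Insert the roff break point escape \\: into a URI after each run
    of slashes and before each run of dots."""
    # The inserted "\:" contains no slash or dot, so the two passes
    # cannot interfere with each other.
    uri = re.sub(r"/+", lambda m: m.group(0) + "\\:", uri)
    uri = re.sub(r"\.+", lambda m: "\\:" + m.group(0), uri)
    return uri


print(".UR " + add_break_points("http://example.com/fb8afcfbaebc74e.cc"))
# .UR http://\:example\:.com/\:fb8afcfbaebc74e\:.cc
```

Because break points are only ever placed after a maximal run of slashes or before a dot, none lands immediately before a slash, so a trailing slash in a URI such as "http://example.com/" always stays on the same line as the text preceding it.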

Before I land it, I need to do some homework regarding the portability
of the \: escape, so that I can make honest disclosures in the requisite
addition to the "Portability" subsection of this page.

I guess I have another pin for my Russell Harper voodoo doll now.[2][3]

Please let me know if you find any inconsistencies in our URI breaking
practices in the groff man pages.  I took you to be saying that the
existing style was consistent, but I'm not sure, and that could have
been wishful reading on my part.  :)

Regards,
Branden

[1] https://stackoverflow.com/questions/5948659/when-should-i-use-a-trailing-slash-in-my-url/
[2] https://www.linkedin.com/in/russell-harper-70394718
[3] https://web.archive.org/web/20171107164742/http://www.heracliteanriver.com/?p=324
