[PATCH] repair recent, ill-conceived man page changes

2023-10-11 Thread G. Branden Robinson
Hi Chet,

Please consider reverting the following recent changes to the bash man
page.  Bjarni should have run them by the groff list first, because some
of them are ill-considered.

Bjarni,

I regret that I must repeat myself.[1]

Please do not offer yourself as an authority on correct man page
composition when you don't know what you're talking about and have filed
bug reports complaining of that fact.

https://savannah.gnu.org/bugs/?64238

You got some things right here, and some wrong; please stick to those
you got right.

diff --git a/doc/bash.1 b/doc/bash.1
index 0f23d480..24170efe 100644
--- a/doc/bash.1
+++ b/doc/bash.1
@@ -5,14 +5,19 @@
 .\"Case Western Reserve University
 .\"chet.ra...@case.edu
 .\"
-.\"Last Change: Wed Sep 13 15:39:24 EDT 2023
+.\"Last Change: Fri Oct  6 16:41:20 EDT 2023
 .\"
+.\" suggested by Bjarni Ingi Gislason 
+.if n \{\
+.kern 0
+.ss 12 0
+.\}

The above change is half pointless and half intrusive.

A) No formatter for terminal output devices ("nroff mode", which is
   tested by "if n") performs kerning.  So that's a no-op.

B) The amount of intersentence spacing, for man pages, is a matter of the
   _reader's_ taste and should be left to them.  mandoc(1) ignores this
   request and I'm glad it does.  So that, too, is a no-op with that
   formatter.

@@ -511,8 +516,13 @@ .SH "RESERVED WORDS"
 .if t .RS
 .PP
 .B
-.if n ! case  coproc  do done elif else esac fi for function if in select then 
until while { } time [[ ]]
-.if t !casecoprocdodoneelifelseesacfifor   
 functionifinselectthenuntilwhile{}time
[[]]
+.if n ! case  coproc  do done elif else esac fi for function if in select \
+then until while { } time [[ ]]
+.if t \{\
+.lg 0
+ !casecoprocdodoneelifelseesacfifor
functionifinselectthenuntilwhile{}time
[[]]
+.lg 1
+.\}

This change is pointless because no ligatures are defined for any of the
letter pairs in the text in any known formatter (the ligature for "ct",
like that for "st" [not seen here] is archaic in English typography and
seldom seen in digital fonts).  Moreover, this request sequence clobbers
the user's selected ligature mode, which might have been 2 prior to the
".lg 0" control line.  This hunk should be reverted.

@@ -11629,7 +11638,7 @@ .SH "SHELL COMPATIBILITY MODE"
 .BR compat41 ,
 and so on).
 There is only one current
-compatibility level -- each option is mutually exclusive.
+compatibility level \(en each option is mutually exclusive.
 The compatibility level is intended to allow users to select behavior
 from previous versions that is incompatible with newer versions
 while they migrate scripts to use current features and

An en-dash is not the correct glyph: an em-dash is.  As it happens, the
"en" special character identifier is less portable than "em" to boot.
See section "History" of groff_char(7).

Authorities differ on whether space should surround em dashes; from what
I have seen, a majority favor omitting them, and that is what I do in
the groff man pages, but I cannot say it is more than a matter of taste.

(A man page's "Name" section is a special case, and the "-", "--",
"\(en", or "\(em" that separates the page's topic(s) from the summary
description _must_ be bracketed by spaces for makewhatis(8) and mandb(8)
to reliably interpret them.)

@@ -11940,7 +11983,7 @@ .SH "SEE ALSO"
 \fIPortable Operating System Interface (POSIX) Part 2: Shell and Utilities\fP, IEEE --
 http://pubs.opengroup.org/onlinepubs/9699919799/
 .TP
-http://tiswww.case.edu/\(tichet/bash/POSIX -- a description of posix mode
+http://tiswww.case.edu/\(tichet/bash/POSIX \(en a description of posix mode
 .TP
 \fIsh\fP(1), \fIksh\fP(1), \fIcsh\fP(1)
 .TP

As with the previous \(en -> \(em remark.  Also we can see that another
double-hyphen in the context was carelessly left inconsistent.
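
Leftovers like the one noted above can be hunted with a quick search of the page source. The following is only a heuristic sketch: the helper name find_dashes is made up for illustration, and it assumes that a free-standing "--" or "\(en" (a blank or a line boundary on each side) is being used as a dash. Option names written as "\-\-" are not matched.

```shell
# Heuristic sketch: list lines where "--" or "\(en" stands alone as a word,
# which is how both were used as dashes in the hunks above.
# find_dashes is a hypothetical helper name, not part of any tool.
find_dashes() {
    grep -n -E '(^| )(--|\\\(en)( |$)' "$@"
}

# Usage against the page source:  find_dashes doc/bash.1
# Self-contained demonstration on sample lines:
sample=$(printf '%s\n' \
    'compatibility level \(en each option is mutually exclusive.' \
    'Utilities\fP, IEEE --' \
    '.B \-\-posix' \
    'http://tiswww.case.edu/\(tichet/bash/POSIX -- a description of posix mode')
printf '%s\n' "$sample" | find_dashes
```

Each reported line is a candidate for \(em; whether to put spaces around it remains a matter of taste, as discussed above.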

I find nothing amiss with the other changes.

Chet, I'm happy to prepare a patch reflecting the above recommendations
(against the "devel" branch at git.sv.gnu.org:/srv/git/bash.git).  Let
me know.

Regards,
Branden

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1036826#33




Re: "here document" causing failures. Implementation is inconsistent and not documented.

2023-10-11 Thread Chet Ramey

On 10/10/23 8:56 PM, Jim McD wrote:

Bug:
Trailing white space after the delimiting tag causes the here document to
fail with an error like
/./: line : warning: here-document at line delimited by end-of-file (wanted `msg_end')


This is not a bug.


Trailing white space after the start of the here statement is ignored.


That's how shell tokenization works; see below.

This doesn't appear to be documented anywhere. None of the material I have
seen so far on here documents mentions the requirement that the tag at the
end must be free of any trailing white space.


If this text

"This  type  of  redirection  instructs the shell to read input from the
 current source until a line containing only delimiter (with no trailing
 blanks)  is  seen."

in the manual page doesn't convince you then maybe POSIX

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_07_04

will.

_Inconsistency_: If white space after the start of the here statement is
permitted and ignored then it should be the same with a tag terminating the 
here statement.


Not at all. The delimiter (the word after the `<<' or `<<-') is a shell
word; words are delimited by metacharacters; white space is a
metacharacter. The white space is not part of the word.

The end of the here-document is likewise well-defined.
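
The asymmetry described above can be checked with a small sketch. It assumes bash is on PATH; the file names good.sh and bad.sh and the temporary directory are invented for illustration. The two scripts are written with printf so that the blanks are visible in the format strings.

```shell
#!/bin/sh
# Sketch: blanks after the opening tag are harmless, blanks after the
# closing tag keep the here-document open.  File names are illustrative.
tmp=$(mktemp -d) || exit 1

# Opening line "cat <<msg_end  " has trailing blanks; the closing line is
# exactly "msg_end", so the here-document ends there.
printf 'cat <<msg_end  \nhello\nmsg_end\necho after\n' > "$tmp/good.sh"

# Closing line is "msg_end " (one trailing blank): it is not the
# delimiter, so the body runs to end-of-file and swallows "echo after".
printf 'cat <<msg_end\nhello\nmsg_end \necho after\n' > "$tmp/bad.sh"

# Discard stderr so the "delimited by end-of-file" warning is hidden.
good_out=$(bash "$tmp/good.sh" 2>/dev/null)
bad_out=$(bash "$tmp/bad.sh" 2>/dev/null)
rm -rf "$tmp"

printf 'good.sh printed:\n%s\n\nbad.sh printed:\n%s\n' "$good_out" "$bad_out"
```

In the second script the line "msg_end " and the text "echo after" come out as here-document content rather than being acted on, which is the failure the original report ran into.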

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/




Re: [PATCH] repair recent, ill-conceived man page changes

2023-10-11 Thread Chet Ramey

On 10/11/23 5:08 AM, G. Branden Robinson wrote:

Hi Chet,

Please consider reverting the following recent changes to the bash man
page.  Bjarni should have run them by the groff list first, because some
of them are ill-considered.


OK. I'm trying to understand them myself; please take my comments in that
spirit.



+.\" suggested by Bjarni Ingi Gislason 
+.if n \{\
+.kern 0
+.ss 12 0
+.\}

The above change is half pointless and half intrusive.

A) No formatter for terminal output devices ("nroff mode", which is
tested by "if n") performs kerning.  So that's a no-op.

B) The amount of intersentence spacing, for man pages, is a matter of the
_reader's_ taste and should be left to them.  mandoc(1) ignores this
request and I'm glad it does.  So that, too, is a no-op with that
formatter.


Is his intent here to force French spacing instead of English spacing?
How does groff deal with input where the number of spaces after
a period varies? My personal writing style has changed from two spaces to
one over a number of years, and the man page reflects that.


+.if t \{\
+.lg 0
+ !casecoprocdodoneelifelseesacfifor
functionifinselectthenuntilwhile{}time
[[]]
+.lg 1
+.\}

This change is pointless because no ligatures are defined for any of the
letter pairs in the text in any known formatter (the ligature for "ct",
like that for "st" [not seen here] is archaic in English typography and
seldom seen in digital fonts). 


I assume he was interested in what formatters do with the `fi'. I couldn't
see any discernable difference myself.



@@ -11629,7 +11638,7 @@ .SH "SHELL COMPATIBILITY MODE"
  .BR compat41 ,
  and so on).
  There is only one current
-compatibility level -- each option is mutually exclusive.
+compatibility level \(en each option is mutually exclusive.
  The compatibility level is intended to allow users to select behavior
  from previous versions that is incompatible with newer versions
  while they migrate scripts to use current features and

An en-dash is not the correct glyph: an em-dash is.  As it happens, the
"en" special character identifier is less portable than "em" to boot.
See section "History" of groff_char(7).


Thanks for the clarification.



Authorities differ on whether space should surround em dashes; from what
I have seen, a majority favor omitting them, and that is what I do in
the groff man pages, but I cannot say it is more than a matter of taste.


I think it's cleaner with spaces, but it's clearly personal taste.



Chet, I'm happy to prepare a patch reflecting the above recommendations
(against the "devel" branch at git.sv.gnu.org:/srv/git/bash.git).  Let
me know.


Thanks, I'll take care of it.

Chet





Re: [PATCH] repair recent, ill-conceived man page changes

2023-10-11 Thread G. Branden Robinson
Hi Chet,

At 2023-10-11T10:22:44-0400, Chet Ramey wrote:
> On 10/11/23 5:08 AM, G. Branden Robinson wrote:
> > Please consider reverting the following recent changes to the bash
> > man page.  Bjarni should have run them by the groff list first,
> > because some of them are ill-considered.
> 
> OK. I'm trying to understand them myself; please take my comments in
> that spirit.

No worries.  My concern with some of the changes is that they risk
mystifying people who encounter them ("`kern`?  `ss`?  `lg`?  What are
those?") without delivering concrete benefit, typographical or
otherwise.  groff_man_style(7), as of groff 1.23.0, attempts to document
all of the *roff syntax a man page author is ever likely to need, and
strives _not_ to introduce any other *roff features or typesetting
concepts.[0]

> > +.\" suggested by Bjarni Ingi Gislason 
> > +.if n \{\
> > +.kern 0
> > +.ss 12 0
> > +.\}
> > 
> > The above change is half pointless and half intrusive.
> > 
> > A) No formatter for terminal output devices ("nroff mode", which is
> > tested by "if n") performs kerning.  So that's a no-op.
> > 
> > B) The amount of intersentence spacing, for man pages, is a matter of
> >    the _reader's_ taste and should be left to them.  mandoc(1)
> >    ignores this request and I'm glad it does.  So that, too, is a
> >    no-op with that formatter.
> 
> Is his intent here to force French spacing instead of English spacing?

Yes, if you understand "French spacing" to mean "the space between
sentences is the same as the space between words".  Frustratingly,
"French spacing" has multiple incompatible meanings.[1]

> How does groff deal with input where the number of spaces after a
> period varies?

roff(7) and the groff Texinfo manual cover this--clearly, I hope.  If
not, blame me because the language is mine, and I'll try to improve it.

(groff 1.23.0; UTF-8 follows)

   A roff formatter attempts to detect boundaries between sentences,
   and supplies additional inter‐sentence space between them.  It
   flags certain characters (normally “!”, “?”, and “.”) as
   potentially ending a sentence.  When the formatter encounters one
   of these end‐of‐sentence characters at the end of an input line,
   or one of them is followed by two (unescaped) spaces on the same
   input line, it appends an inter‐word space followed by an inter‐
   sentence space in the output.  The dummy character escape
   sequence \& can be used after an end‐of‐sentence character to
   defeat end‐of‐sentence detection on a per‐instance basis.
   Normally, the occurrence of a visible non‐end‐of‐sentence
   character (as opposed to a space or tab) immediately after an
   end‐of‐sentence character cancels detection of the end of a
   sentence.  However, several characters are treated transparently
   after the occurrence of an end‐of‐sentence character.  That is, a
   roff does not cancel end‐of‐sentence detection when it processes
   them.  This is because such characters are often used as footnote
   markers or to close quotations and parentheticals.  The default
   set is ", ', ), ], *, \[dg], \[dd], \[rq], and \[cq].  The last
   four are examples of special characters, escape sequences whose
   purpose is to obtain glyphs that are not easily typed at the
   keyboard, or which have special meaning to the formatter (like
   \).

That reads a bit better with font style changes, so "man 7 roff" might
be preferable.

> My personal writing style has changed from two spaces to one over a
> number of years, and the man page reflects that.

For _input_, it's a good idea to either break lines at the ends of
sentences, or put two spaces after them.  This is so that the formatter
knows where the ends of the sentences are.  Like TeX, *roff is not smart
enough to know where the sentence boundary/ies are in "C. A. R. Hoare next
came to the U.S. Linux kernel developers have yet to absorb his lessons."
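
The input rule can be approximated outside the formatter. The sketch below is not groff; it is a rough grep-based imitation of the rule quoted above, in which ".", "!", or "?" counts as a sentence end only at the end of an input line or when followed by two spaces. The function name count_sentence_ends is made up, and the transparent characters and the \& escape are deliberately ignored.

```shell
# Rough approximation of the quoted rule, not a roff implementation:
# an end-of-sentence character counts only at end of line or when it is
# followed by two spaces.  count_sentence_ends is a hypothetical name.
count_sentence_ends() {
    grep -o -E '[.!?](  |$)' | wc -l | tr -d ' '
}

one_space='C. A. R. Hoare next came to the U.S. Linux kernel developers have yet to absorb his lessons.'
two_space='C. A. R. Hoare next came to the U.S.  Linux kernel developers have yet to absorb his lessons.'

printf '%s\n' "$one_space" | count_sentence_ends   # prints 1 (only the final period)
printf '%s\n' "$two_space" | count_sentence_ends   # prints 2 (after "U.S." and at the end)
```

With single spacing, only the period at the end of the line is recognized; the boundary after "U.S." is found only once two spaces (or a line break) follow it.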

For output, the amount of inter-sentence space is configurable; that
is what the `ss` request does.[2]  For man pages, I strongly urge all
authors to leave the issue alone so as to respect readers' preferences.
Since authors' preferences will differ, this is the only way to achieve
consistency.[3]

People can get pretty passionate about this, and complain of their
eyeballs being violated when the "wrong" amount of inter-sentence space
is employed in a document they're reading.  Some people bring this
passion even to man page _source_ documents, and the only recourse in
that event is to break input lines at the ends of sentences.  This has
also been Brian Kernighan's advice to troff users since the 1970s.[4]
Linux man-pages maintainer Alejandro Colomar calls this practice
"semantic newlines".  My opinion is that it is a Solomonic solution,
satisfying neither partisan camp, but also has a benefit of reducing the
amount of churn in diffs.  Incremental changes to documentation often
find boundaries at sent

Re: [PATCH] WIP: quote_string_for_globbing: unquoted backslash

2023-10-11 Thread Grisha Levit
On Sat, Oct 7, 2023 at 10:42 AM Chet Ramey wrote:
>
> On 9/26/23 2:50 AM, Grisha Levit wrote:
> > I'm not confident in what the right behavior is here, and maybe there is
> > no obvious one, but I _think_ this is not desirable:
> >
> > If an unquoted backslash is followed by a quoted globbing character,
> > quote_string_for_globbing will store the unquoted backslash and then also
> > another one to quote the character -- resulting in the originally quoted
> > character becoming unquoted:
> >
> > $ bash -cx '[[ \\FOO == $1"*" ]]' _ '\'; echo $?
> > + [[ \FOO == \\* ]]
> > 0
>
> It's hard to say. There's only one way to quote a character for globbing:
> using a backslash. The expansion of $1 isn't quoted, so there's no reason
> to quote that backslash, but the `*' is, so you have to quote it somehow,
> and the only way to do that is with a backslash.
>
> This behavior is compatible with ksh93, at least, so you can conclude
> that both implementations see it the same way.

I originally thought bash was _not_ ksh93 compatible because:

$ mkdir '\X'
$ B=\\ bash  -c 'echo $B\*'
\X
$ B=\\ ksh93 -c 'echo $B\*'
\*

But I suspect that's a ksh93 bug, because its pattern matching in `case'
or `[[' commands works the same as in bash.

In any case, after some more testing I see that there's quite a lot of
variability between shells here so no need to change anything in bash.

        glob        case
        $B*   $B\*  $B*   $B\*
bash    N     Y     N     Y
dash    N     Y     N     Y
yash    N     Y     N     Y
ksh93   N     N     N     Y
mksh    Y     N     Y     N
oksh    Y     N     Y     N
posh    Y     N     Y     N
zsh     Y     N     Y     N
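
A comparison of this kind can be scripted. The sketch below is an illustration only: the function name probe is made up, it covers just the $B\* form from the commands shown earlier, and it assumes that "Y" means the pattern matched the name \X. No expected results are given because they depend on the shell and its version.

```shell
#!/bin/sh
# Sketch: probe how a shell treats $B\* when B holds one backslash.
# Prints two letters, glob then case: Y if the pattern matched the
# name \X, N otherwise.  (This reading of Y/N is an assumption.)
probe() {
    shell=$1
    dir=$(mktemp -d) || return 1
    mkdir "$dir/\\X"    # a directory literally named \X
    glob=$(cd "$dir" && B=\\ "$shell" -c \
        'set -- $B\*; if [ "$1" = "\\X" ]; then echo Y; else echo N; fi')
    case_=$(B=\\ "$shell" -c \
        'case "\\X" in $B\*) echo Y;; *) echo N;; esac')
    rm -rf "$dir"
    echo "$glob $case_"
}

probe bash
```

Calling probe with another shell name (for example "probe dash" or "probe mksh", if installed) gives the corresponding pair for that shell.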



Re: [PATCH] repair recent, ill-conceived man page changes

2023-10-11 Thread Chet Ramey

On 10/11/23 12:54 PM, G. Branden Robinson wrote:


[5] https://www.dourish.com/goodies/see-figure-1.html


A true classic.





Re: bash: bash.1: Some remarks and editorial fixes for the man page

2023-10-11 Thread Bjarni Ingi Gislason
On Fri, Oct 06, 2023 at 04:44:08PM -0400, Chet Ramey wrote:
> On 9/24/23 8:32 AM, Bjarni Ingi Gislason wrote:
> > Package: bash
> > Version: 5.2.15-2+b5
> > Severity: minor
> > Tags: patch
> > 
> > Dear Maintainer,
> > 
> > > here are some notes and editorial changes for the manual.
> 
> Thanks for taking the time to do this.
> 
> I am not sure where you got this version of the manual page (Debian?); I
> don't use nroff to produce the ASCII output, and you've sent patches for
> several blocks of text that don't appear in bash.1 as distributed.
> 
  The man page is from the Debian testing version as shown in the
header and trailing part of the report (thus using "reportbug" script
from Debian).  This is the same as from
"git.savannah.gnu.org/git/bash.git/doc/bash.1" except Debian has added
some changes in the text.

From my file "git/bash/.git/config"

[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
[remote "origin"]
	url = https://git.savannah.gnu.org/git/bash.git
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
	remote = origin
	merge = refs/heads/master