Re: Tilde (~) in bash(1) is typeset incorrectly as Unicode character

2023-07-29 Thread Steffen Nurpmeso
Steffen Nurpmeso wrote in
 <20230729002703.lasps%stef...@sdaoden.eu>:
 |Chet Ramey wrote in
 | <2fd2ed52-3272-3433-6179-164bc5122...@case.edu>:
 |  ...
 ||> At 2023-07-26T10:47:05+0200, Thomas ten Cate wrote:
 ||>> In the bash manual page (`man bash`), the ASCII tilde character '~'
 ||>> (0x7e) is replaced by the Unicode character '˜' (U+02DC SMALL TILDE):
 ||>>
 ||>>  $ man bash | grep 'additional binary operator'
 ||>>    An additional binary operator, =˜, is available,
 ||>>
 ||>> The same happens for the use of ~ as a shorthand for the home
 ||>> directory. This makes the manual page incorrect, and difficult to
 ||>> search.
 | ...
 ||>> I don't know the first thing about groff, but `man groff_char`
 ||>> suggests that ~ is indeed rendered as "modifier tilde", and that one
 ||>> should write \(ti to obtain an actual tilde character.
 ...
 |If i grep the manuals in the BSD git repo then they would benefit
 |from that decision; whereas ~ in paths is not often used, \(ti is
 |never, unless i have overlooked it.  \(ti / \[ti] for ASCII tilde in
 |UNIX manuals, code blocks, formulas etc is just sick.  And then

Having said that it seems that Linux man-pages and man-pages-posix
actively switched towards \[ti] in 2020 according to its
Changes.old, saying

  Various pages
  Michael Kerrisk  [Geoff Clare]
  Use "\(ti" instead of "~"
  A naked tilde ("~") renders poorly in PDF. Instead use "\(ti",
  which renders better in a PDF, and produces the same glyph
  when rendering on a terminal.

resulting in 53, and the latter uses \(ti, but lots of that is awk
operator stuff etc.  One thread member is credited for that
release btw.

Well i personally continue to find this sad given that UNIX
manuals started being written in the 70s differently, making it
over 45 years, something around that.  And i find the reasoning
mysterious even given that you can always create a different
mapping in case you really want it, no?, so why not simply do that
for PDF instead (.ie 'pdf'\*[.T]')?  No.

Thank you all for your patience, and a nice weekend i wish.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



Re: Tilde (~) in bash(1) is typeset incorrectly as Unicode character

2023-07-29 Thread Steffen Nurpmeso
Chet Ramey wrote in
 <2fd2ed52-3272-3433-6179-164bc5122...@case.edu>:
  ...
 |> At 2023-07-26T10:47:05+0200, Thomas ten Cate wrote:
 |>> In the bash manual page (`man bash`), the ASCII tilde character '~'
 |>> (0x7e) is replaced by the Unicode character '˜' (U+02DC SMALL TILDE):
 |>>
 |>>  $ man bash | grep 'additional binary operator'
 |>>    An additional binary operator, =˜, is available,
 |>>
 |>> The same happens for the use of ~ as a shorthand for the home
 |>> directory. This makes the manual page incorrect, and difficult to
 |>> search.
 ...
 |>> I don't know the first thing about groff, but `man groff_char`
 |>> suggests that ~ is indeed rendered as "modifier tilde", and that one
 |>> should write \(ti to obtain an actual tilde character.
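
As a stopgap for searching, the rendered page can be piped through a
filter that maps U+02DC back to ASCII tilde before grepping.  This is
only a minimal sketch: the function name ascii_tilde is made up for
illustration, and it assumes the script file and sed pass the UTF-8
bytes through unchanged.

```shell
# Hypothetical helper: replace U+02DC SMALL TILDE with ASCII '~' (0x7e)
# in whatever arrives on stdin.
ascii_tilde() {
    sed 's/˜/~/g'
}

# The line from the report, as rendered, becomes searchable with a plain '~'.
fixed=$(printf '%s\n' 'An additional binary operator, =˜, is available,' | ascii_tilde)
printf '%s\n' "$fixed"
```

In practice this would be used as `man bash | ascii_tilde | grep '=~'`.
It works around the rendering and does not fix the page source.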

Because i always have to give some remarks, this design decision
of James Clark of groff (~ is for accent) i personally always
found terrible.  In the past i suggested to at least change the
mdoc(7) manual macros so that during arguments for .Pa (path) (and
similar, like code blocks etc) a tilde is indeed ASCII tilde, and
nothing else.  Unfortunately that was not followed.

If i grep the manuals in the BSD git repo then they would benefit
from that decision; whereas ~ in paths is not often used, \(ti is
never, unless i have overlooked it.  \(ti / \[ti] for ASCII tilde in
UNIX manuals, code blocks, formulas etc is just sick.  And then
the world moved to UTF-8 long ago; i personally have never made
use of such crux in either TeX or roff; if all else fails you
can map something for a specific document.

Quite honestly, in NetBSD, only mdocml and groff use \(ti/\[ti].
In FreeBSD, only (external, new thing) bc(1) / dc(1), as well as
nvi and mandoc (mdocml), and less for its command line option.
On OpenBSD, mandoc plus

  origin/master:lib/libc/gen/ispunct.3:.Dl !\(dq#$%&\(aq()*+,\-./:;<=>?@[\e]\(ha_\(ga{|}\(ti
  origin/master:lib/libcrypto/man/ASN1_BIT_STRING_set.3:.D1 Po Fa bitstr No & Pf \(ti Fa goodbits Pc == 0

Plain tilde is the dead king, long live the king.
Thank you,

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



Re: Tilde (~) in bash(1) is typeset incorrectly as Unicode character

2023-07-29 Thread Chet Ramey

On 7/28/23 3:28 PM, G. Branden Robinson wrote:
> Hi Chet,
>
> At 2023-07-28T15:15:48-0400, Chet Ramey wrote:
> > Applying the patch without any other changes to bash.1 results in
> >
> > $ groff -Tascii -P -c -I/usr/local/src/bash/bash-20230728/doc -man
> > /usr/local/src/bash/bash-20230728/doc/bash.1 > bash.0
> > troff: /usr/local/src/bash/bash-20230728/doc/bash.1:26: warning: numeric
> > expression expected (got a special character)
> >
> > Where line 26 is the
> >
> > .if \n(.g:(\(.f=0) \{\
> >
> > test. This is macOS running groff-1.22.4 from MacPorts.
>
> Sorry about that.  I fat-fingered it.
>
> An 'n' is needed after the second backslash, because we're interpolating
> a register value.


Thanks. I probably could have figured it out, but I figured I'd go to the
expert.

That eliminates the warning, but unfortunately produces output that looks
like this

   ~.nr need_eo_h 1 NAME
   bash - GNU Bourne-Again SHell

   ~.nr need_eo_h 1 SYNOPSIS
   bash [options] [command_string | file]

So something weird is happening with .SH. It doesn't matter whether I set
it to 0 or 1 in bash.1. I'll just hold off on these definitions for now.

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/




Re: Tilde (~) in bash(1) is typeset incorrectly as Unicode character

2023-07-29 Thread G. Branden Robinson
Hi Chet,

At 2023-07-29T13:16:55-0400, Chet Ramey wrote:
> On 7/28/23 3:28 PM, G. Branden Robinson wrote:
> > Sorry about that.  I fat-fingered it.
> > 
> > An 'n' is needed after the second backslash, because we're interpolating
> > a register value.
> 
> Thanks. I probably could have figured it out, but I figured I'd go to
> the expert.

The expert has let you down twice now.  :(

> That eliminates the warning, but unfortunately produces output that
> looks like this
> 
>    ~.nr need_eo_h 1 NAME
>    bash - GNU Bourne-Again SHell
> 
>    ~.nr need_eo_h 1 SYNOPSIS
>    bash [options] [command_string | file]
> 
> So something weird is happening with .SH. It doesn't matter whether I
> set it to 0 or 1 in bash.1. I'll just hold off on these definitions
> for now.

That _is_ weird.  I'll set aside some time to do this properly with a
dump of the page before and after to ensure no undesired changes.

This _looked_ like it should be trivial.  I should have known better.

Regards,
Branden




Re: comments inside command subst are handled inconsistently

2023-07-29 Thread Chet Ramey

On 7/27/23 4:31 AM, Denys Vlasenko wrote:
> Try these two commands:
>
> $ echo "Date: `date #comment`"
> Date: Thu Jul 27 10:28:13 CEST 2023
>
> $ echo "Date: $(date #comment)"
> )"
> Date: Thu Jul 27 10:27:58 CEST 2023
>
> As you see, #comment is handled differently in `` and $().


Yes. There's a hint in the POSIX spec as to why.

POSIX says you can parse `` command substitution lexically, by scanning for
the ending ` by skipping over other constructs and appropriately backslash-
quoting nested command substitutions. It's possible to do this because the
backquote doesn't have any other semantic meaning to the parser. Of course,
everything else the lexer may encounter while finding the closing backquote
is unspecified and implementation defined.

The $(...) form, on the other hand, is required to accept "any valid shell
script," and since the parens have other semantic meaning (and may not need
to be balanced), as a practical matter this means you need to recursively
invoke the parser in order to locate the closing right paren. So what you
do is read the "$(", run the parser recursively to parse a command, and
make sure the next token you read is a ")".

You can see where this goes. Once you commit to running the parser to find
the closing right paren, all normal lexing rules apply. Comments discard
characters until a newline.

Everyone does it this way, including pre-bash-5.2, which used ad-hoc
parsing to locate the closing right paren.
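
A small illustration of the consequence, offered as a sketch and not as
anything normative: inside $(...), a # comment swallows everything up
to the newline, including a would-be closing paren, so the paren that
ends the substitution has to sit on a later line.  The variable name
out is arbitrary.

```shell
# The comment runs to end of line, so the ')' at the end of the first
# line is part of the comment; the substitution is closed by the ')'
# on the following line.
out=$(echo hello # this paren does not close the substitution )
)
printf '%s\n' "$out"
```

This matches the report above, where the shell kept reading input until
a `)` appeared on a line of its own.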

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/




Re: comments inside command subst are handled inconsistently

2023-07-29 Thread Chet Ramey

On 7/28/23 1:51 PM, Martin D Kealey wrote:
> On the other hand, since everyone has now had 36+ years to update their
> scripts to get rid of backticks, maybe it's time to start issuing a warning
> when they're used at all? 🤪


There's no reason to use `` over $(...), but that form is still a required
POSIX expansion.
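
For illustration only, nesting is where the two forms differ most
visibly: $(...) nests as written, while backquotes need the inner pair
backslash-escaped.  The variable names new and old are arbitrary.

```shell
# Nested substitution with $(...): no extra quoting needed.
new=$(echo "outer $(echo inner)")

# The same with backquotes: the inner backquotes must be escaped.
old=`echo "outer \`echo inner\`"`

printf '%s\n%s\n' "$new" "$old"
```

Both assignments produce the same string; the backquote form stays
valid, it is just harder to read once nested.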

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/