Re: bash --pretty-print and pattern

2025-02-21 Thread Timotei Campian
great hint! many thanks

On Thu, 20 Feb 2025 at 14:26, Koichi Murase wrote:

> On Thu, 20 Feb 2025 at 20:51, Timotei Campian wrote:
> > echo !(file.f*)
> >
> >  *bash --pretty-print test.sh*
>
> If this script file "test.sh" will be used as an independent
> executable file, to make it work, you need to put "shopt -s extglob"
> at the beginning of the file as Greg explained in the other reply. If
> the script file is supposed to be sourced by another file where
> "extglob" is enabled, you don't have to set "shopt -s extglob" again
> at the beginning of the file.
>
> However, even if you make sure that "extglob" is enabled when the
> script file is parsed in real situations as described above,
> "--pretty-print" still doesn't work because it doesn't execute the
> file at all. To parse and print a file with the set of shell options
> under which it is supposed to be parsed, you need to set them in the
> command-line options of Bash. In the present case, you can run it in
> the following way:
>
> $ bash --pretty-print -O extglob test.sh
>
> --
> Koichi
>
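
The parse-time dependency described above can be checked without --pretty-print, which is undocumented. The sketch below is only an illustration: it assumes bash is on PATH and that mktemp is available, and the directory and file names are made up for the test. It runs the same one-line script with extglob enabled and then explicitly disabled via -O / +O.

```shell
# Illustrative check (not from the thread): extglob must be on when the line is parsed.
demo_dir=$(mktemp -d)
touch "$demo_dir/file.foo" "$demo_dir/other.txt"

# With extglob enabled on the command line, the pattern parses and expands.
with_extglob=$(cd "$demo_dir" && bash -O extglob -c 'echo !(file.f*)')

# With extglob explicitly disabled, the same line is rejected by the parser.
if (cd "$demo_dir" && bash +O extglob -c 'echo !(file.f*)') >/dev/null 2>&1; then
    without_status=0
else
    without_status=$?
fi

echo "with extglob:    $with_extglob"
echo "without extglob: exit status $without_status"
rm -rf "$demo_dir"
```

The same -O extglob option is what makes "bash --pretty-print -O extglob test.sh" parse the file as intended.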


Re: [sr #111166] ngettext syntax

2025-02-21 Thread Chet Ramey

On 2/21/25 1:37 PM, Phi Debian wrote:





> > Given the following, which POSIX says is unspecified:
> >
> > printf '%s %3$s %s\n' A B C D
> >
> > ksh93-u+m prints "A D", which is just wrong. No matter how you mix numbered
> > and unnumbered specifications, or whether you implement numbered
> > specifications at all, you can't just drop it.


> Well mine gives
>
> $ echo $KSH_VERSION
> Version AJM 93u+m/1.1.0-alpha+b5087279 2025-01-15
>
> $ printf '%s %3$s %s\n' A B C D
> A C B
> D


I got that behavior with Version AJM 93u+m/1.1.0-alpha 2022-07-31.

I just updated my version using macports and I get the same behavior as
you with Version AJM 93u+m/1.0.10 2024-08-01. So it looks like Martijn
was working on it.



> Which is just right I guess unless I am misled somewhere.
>
> Yet mixing numbered and unnumbered specifications is questionable as you
> said, but so is the re-use of the format string when reaching excess args...
> we are in the invent area here :-)


No, reusing the format string is required by POSIX. The only question is
the conditions under which you do that.
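
A small illustration of the reuse rule, using only unnumbered conversions so that it does not depend on %n$ support. The variable name is made up for the example. With two conversions and three arguments, the format is applied a second time and the missing argument is treated as an empty string.

```shell
# Two %s conversions, three arguments: the format is reused for the leftover
# argument, and the absent fourth argument is supplied as an empty string.
reuse_out=$(printf '[%s|%s]\n' A B C)
printf '%s\n' "$reuse_out"
```

This prints "[A|B]" on one line and "[C|]" on the next.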


> The more C like could be (fully numbered) latest ksh93 gives
> $ printf '%3$s %2$s %1$s\n' A B C D
> C B A
>   D
> Still the D is the args fmt re-use, but output is similar to C (modulo D)


You're not going to be able to give up format re-use. Everyone seems to
agree on the behavior when all of the format conversions are numbered.



> and bash 5.3.0(1)-beta gives
> $ printf '%3$s %2$s %1$s\n' A B C D
> bash: printf: `$': invalid format character
>
> Which indeed is not an invalid format or should I say is a plausible format
> specification.


Of course it does. I haven't implemented any support yet.


> > > I would also like to have %d (and all integer formats) accept a valid
> > > integer expression as argument (printf '%d' 1+1 ; i=1 ; printf '%d' i+1
> > > etc...) again a very low prio, but it is clean, I know that a simple
> > > $((...)) can do this, but it is 5 char of line cluttering for each %
> > > integer format, for not that much clarity.

> > It decreases clarity. If you have an argument '1+1', your proposal
> > makes it depend on the conversion specifier. Right now, there's no
> > ambiguity: 1+1 is a string, and $(( 1+1 )) is 2, regardless of whether
> > or not the conversion specifier accepts an integer argument.


> For me clarity means that %d expects an integer expression value (à la C,
> even though there will be voices saying BASH is not C :-) so the %d arg fetch
> must fetch not a plain string but a string that is an integer expression,
> really removing the need for $((...))


It's not an expression context. That's why $((...)) is available.
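
To make the distinction concrete, here is a small check, assuming bash is on PATH (it is invoked explicitly because other shells may treat printf arguments differently). The variable names are made up for the example. The plain string 1+1 is rejected as a number by %d, while arithmetic expansion is evaluated by the shell before printf ever sees the argument.

```shell
# Plain string argument: bash's printf reports an invalid number and fails.
plain_status=0
bash -c 'printf "%d\n" 1+1' >/dev/null 2>&1 || plain_status=$?

# Arithmetic expansion: the shell evaluates the expression first, so %d
# receives an ordinary integer regardless of the conversion specifier.
expanded_out=$(bash -c 'i=1; printf "%d %d\n" "$((1+1))" "$((i+1))"')

echo "plain 1+1: exit status $plain_status"
echo "expanded:  $expanded_out"
```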

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/




Re: [sr #111166] ngettext syntax

2025-02-21 Thread Phi Debian
On Fri, Feb 21, 2025 at 3:08 PM Chet Ramey wrote:

> On 2/20/25 11:44 PM, Phi Debian wrote:
> >
> >
> > On Thu, Feb 20, 2025 at 11:41 PM Chet Ramey wrote:
> >
> > A response to the question about printf supporting %n$ conversion
> > specifications I posted to savannah.
> >
> >
> > Thanx @chet, I didn't know this thread. The thread says ksh93 is buggy,
> > that's not what I observe with latest patches, but at time of the thread
> > it was probably buggy.
>
> Yesterday?
>


>
> Given the following, which POSIX says is unspecified:
>
> printf '%s %3$s %s\n' A B C D
>
> ksh93-u+m prints "A D", which is just wrong. No matter how you mix numbered
> and unnumbered specifications, or whether you implement numbered
> specifications at all, you can't just drop it.
>

Well mine gives


$ echo $KSH_VERSION
Version AJM 93u+m/1.1.0-alpha+b5087279 2025-01-15

$ printf '%s %3$s %s\n' A B C D
A C B
D

Which is just right I guess unless I am misled somewhere.

Yet mixing numbered and unnumbered specifications is questionable as you
said, but so is the re-use of the format string when reaching excess args...
we are in the invent area here :-)

The more C like could be (fully numbered) latest ksh93 gives
$ printf '%3$s %2$s %1$s\n' A B C D
C B A
  D
Still the D is the args fmt re-use, but output is similar to C (modulo D)

and bash 5.3.0(1)-beta gives
$ printf '%3$s %2$s %1$s\n' A B C D
bash: printf: `$': invalid format character

Which indeed is not an invalid format or should I say is a plausible format
specification.




> > Yet it doesn't prevent an attempt to make it better, low priority I reckon,
> > but if someone wants to cut a patch there is no reason to veto it. Old
> > scripts will still work, and new scripts could benefit from it.
>
> It will happen, but not in bash-5.3.
>

I like that :-)


> >
> > I would also like to have %d (and all integer formats) accept a valid
> > integer expression as argument (printf '%d' 1+1 ; i=1 ; printf '%d' i+1
> > etc...) again a very low prio, but it is clean, I know that a simple
> > $((...)) can do this, but it is 5 char of line cluttering for each %
> > integer format, for not that much clarity.
>
> It decreases clarity. If you have an argument '1+1', your proposal
> makes it depend on the conversion specifier. Right now, there's no
> ambiguity: 1+1 is a string, and $(( 1+1 )) is 2, regardless of whether
> or not the conversion specifier accepts an integer argument.
>

For me clarity means that %d expects an integer expression value (à la C,
even though there will be voices saying BASH is not C :-) so the %d arg fetch
must fetch not a plain string but a string that is an integer expression,
really removing the need for $((...))

At the moment %d fetches a string; applying an internal sh_eval(string)
should be doable... but if that doesn't happen, no big deal... :-)

So the ambiguity you are talking about is the problem of scanning the
format string, recognising the d, i, o, u, x, or X conversion char, and
accepting that the input arg is an integer expression 'string' :-) I would
say it is an internal ambiguity, but at the user level it is not insane to
think the string could be an integer expression. At the moment it must be a
string that is a valid integer... why limit the check to integer vs integer
expression?

Cheers,


Re: [sr #111166] ngettext syntax

2025-02-21 Thread Chet Ramey

On 2/20/25 11:44 PM, Phi Debian wrote:



> On Thu, Feb 20, 2025 at 11:41 PM Chet Ramey wrote:
>
>
> > A response to the question about printf supporting %n$ conversion
> > specifications I posted to savannah.
>
>
> Thanx @chet, I didn't know this thread. The thread says ksh93 is buggy,
> that's not what I observe with latest patches, but at time of the thread it
> was probably buggy.


Yesterday?

Given the following, which POSIX says is unspecified:

printf '%s %3$s %s\n' A B C D

ksh93-u+m prints "A D", which is just wrong. No matter how you mix numbered
and unnumbered specifications, or whether you implement numbered
specifications at all, you can't just drop it.

> Yet it doesn't prevent an attempt to make it better, low priority I reckon,
> but if someone wants to cut a patch there is no reason to veto it. Old
> scripts will still work, and new scripts could benefit from it.


It will happen, but not in bash-5.3.



> I would also like to have %d (and all integer formats) accept a valid
> integer expression as argument (printf '%d' 1+1 ; i=1 ; printf '%d' i+1
> etc...) again a very low prio, but it is clean, I know that a simple
> $((...)) can do this, but it is 5 char of line cluttering for each %
> integer format, for not that much clarity.


It decreases clarity. If you have an argument '1+1', your proposal
makes it depend on the conversion specifier. Right now, there's no
ambiguity: 1+1 is a string, and $(( 1+1 )) is 2, regardless of whether
or not the conversion specifier accepts an integer argument.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/




Re: bash --pretty-print and pattern

2025-02-21 Thread Chet Ramey

On 2/20/25 11:04 AM, Koichi Murase wrote:


> Thank you. I didn't know this behavior. Is that documented? I tried to
> find it in the description of `--pretty-print', but I realized that
> the --pretty-print option itself is undocumented.


It's not. It's just a novelty.


--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/




Re: [sr #111166] ngettext syntax

2025-02-21 Thread Robert Elz
Date: Fri, 21 Feb 2025 09:08:13 -0500
From: Chet Ramey
Message-ID:  <59a1d1d0-b6eb-4652-9e77-1fc4c5992...@case.edu>

  | Given the following, which POSIX says is unspecified:
  |
  | printf '%s %3$s %s\n' A B C D
  |
  | ksh93-u+m prints "A D", which is just wrong.

It actually prints "A D \n" (the \n is obvious, and unimportant, but
the extra space makes a big difference).

  | No matter how you mix numbered
  | and unnumbered specifications, or whether you implement numbered
  | specifications at all, you can't just drop it.

It isn't, and while bizarre indeed, that's a defensible operation.

As you say, this is all unspecified in POSIX, which means anything
is acceptable - here it looks as if the "%3$" is counting the args
after args used already for unnumbered conversions have already been
removed, since the A has already been consumed, the remaining args are
B C D, so the third is D.   So %3$s prints the D, which is followed by
the space which is next in the format string.   The following unnumbered
conversion then takes the arg which follows the last that was used,
which doesn't exist here, meaning the final %s prints "".

Add one more arg, making the command become

printf '%s %3$s %s\n' A B C D E

and the output is "A D E\n" just as predicted.

All perfectly consistent and even rational - though certainly not the
way I would do it.

I abandoned my plans on implementing the numbered conversions when POSIX
insisted that even in the presence of numbered arg conversions, if all
the args aren't consumed by the format, the format string needs to be
repeated, the same as is done when there are no numbered conversions.

Despite most implementations actually working that way, doing so makes
no sense, and makes it much harder to actually use the numbered arg
conversions for their intended purpose -- which isn't just so applications
can supply the args in an order different from what the format string
expects them - it is so the format string can be obtained from a message
catalog, in which different translations (different languages) might easily
need to consume different selections of the args (and in different orders,
which is the point).   Some of the translations might not use the final
args in the list, in which case a POSIX implementation is required to run
the format string again, despite that making no sense at all.

kre




Re: [sr #111166] ngettext syntax

2025-02-21 Thread Chet Ramey

On 2/21/25 10:13 AM, Robert Elz wrote:

> Date: Fri, 21 Feb 2025 09:08:13 -0500
> From: Chet Ramey
> Message-ID:  <59a1d1d0-b6eb-4652-9e77-1fc4c5992...@case.edu>
>
>   | Given the following, which POSIX says is unspecified:
>   |
>   | printf '%s %3$s %s\n' A B C D
>   |
>   | ksh93-u+m prints "A D", which is just wrong.
>
> It actually prints "A D \n" (the \n is obvious, and unimportant, but
> the extra space makes a big difference).
>
>   | No matter how you mix numbered
>   | and unnumbered specifications, or whether you implement numbered
>   | specifications at all, you can't just drop it.
>
> It isn't, and while bizarre indeed, that's a defensible operation.
>
> As you say, this is all unspecified in POSIX, which means anything
> is acceptable - here it looks as if the "%3$" is counting the args
> after args used already for unnumbered conversions have already been
> removed, since the A has already been consumed, the remaining args are
> B C D, so the third is D.


That is bizarre and indefensible. There is no user who would think that
using a numbered conversion specifier is not an absolute position in
the original argument list. ksh93 seems to be alone in its interpretation.


> I abandoned my plans on implementing the numbered conversions when POSIX
> insisted that even in the presence of numbered arg conversions, if all
> the args aren't consumed by the format, the format string needs to be
> repeated, the same as is done when there are no numbered conversions.


It's worse -- even if a format string using only numbered conversions
doesn't consume all the arguments, as long as it consumes the last one,
the format string doesn't get reused. It all depends on whatever peculiar
definition of "satisfy" you use.


--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/




Re: [sr #111166] ngettext syntax

2025-02-21 Thread Robert Elz
Date: Fri, 21 Feb 2025 10:55:56 -0500
From: Chet Ramey
Message-ID:  

  | That is bizarre and indefensible.

Like I said, it isn't what I'd do, but it is easy to see how an
implementation might end up like that (and ksh93's might be one of
the first of them, which kind of sets the standard).   But if it
has been fixed in a later version (I also have not updated the one
I have in a while) then this really no longer matters.

  | There is no user who would think that using a numbered conversion
  | specifier is not an absolute position in the original argument list.

Probably not, but it doesn't matter if the application doesn't mix
numbered and unnumbered conversions, and it shouldn't.

  | ksh93 seems to be alone in its interpretation.

Apparently was.   But there are plenty of other differences between
the implementations when it comes to mixing numbered and unnumbered
args, one being what is the sequence of the unnumbered args when
interleaved with numbered args.

That is, in the example

printf '%s %3$s %s\n' A B C D

The updated ksh93 seems to ignore the numbered conversions when counting
the args for the unnumbered ones (so "A C B" for the first line anyway;
I won't go on about repeating the format string again).

The previous version made an unnumbered arg conversion use the next arg
after the last one used (which would give "A C D" using a more reasonable
interpretation of %3$).

A third possibility is to simply observe that it is the third conversion,
and so should use the third arg, thus giving "A C C".   It might be tempting
to say "except skip any args used by numbered conversions" but that leads
to bizarre special cases, eg: consider

printf '%s %s %s %3$s %s\n' A B C D

and decide what that is supposed to do, and if you conclude something
about that one, apply the same rule to

printf '%s %$4s %s %s %s' A B C D

and see if you like the result.   Try this same test whichever of these
three rules you'd adopt -- or even some other possibility I haven't
considered, another, somewhat weird one, might be to restart the numbering
sequence for unnumbered args after any numbered arg conversion, which would
lead to "A C A" in the original (again, just the first line).

When I was looking at implementing this (I actually had an implementation
that I ended up discarding, as I was never going to do the "rescan the
format" in the numbered conversion case) I tested all the printf
implementations which implemented numbered conversions that I could find,
and they were all amazingly different - it is no surprise at all that
POSIX made mixing the things unspecified (it is also an amazingly useless
thing to support in general, and simply issuing an error and not
proceeding with any further conversions once the first "different" style
of arg reference is encountered is not at all a bad choice).

  | It's worse -- even if a format string using only numbered conversions
  | doesn't consume all the arguments, as long as it consumes the last one,
  | the format string doesn't get reused. It all depends on whatever peculiar
  | definition of "satisfy" you use.

Yes,  which is where the trick of including a conversion "%999$.0s" in the
format string is supposed to stop format string reuse -- as long as there are
no more than 999 args given of course.   This issue has nothing to do with the
mixed numbered/unnumbered problem, which rightly should just be unspecified,
and implementations do whatever they like, reasonable or not (as in the eye
of the coder), but that format string reuse is simply wrong in the numbered
conversion case, no-one really needs that.

kre




Re: [sr #111166] ngettext syntax

2025-02-21 Thread Phi Debian
On Sat, Feb 22, 2025 at 1:54 AM Robert Elz wrote:

>
>
> That is, in the example
>
> printf '%s %3$s %s\n' A B C D
>
> The updated ksh93 seems to ignore the numbered conversions when counting
> the args for the unnumbered ones (so "A C B" for the first line anyway;
> I won't go on about repeating the format string again).
>
> The previous version made an unnumbered arg conversion use the next arg
> after the last one used (which would give "A C D" using a more reasonable
> interpretation of %3$).
>

Yes, since mixed numbered/unnumbered usage was not defined (C libc doesn't
allow the mix, and doesn't define fmt reuse either) I decided to find a
possible way of doing it.

The previous attempt, placing unnumbered conversions after the last numbered
one, led to serious problems when going into numbered/unnumbered num.prec. So
the previous attempt was trying to do this: for '%3$s %s' the %s is next to
the third arg, so it would be equivalent to '%3$s %4$s'. This led to all sorts
of bugs and crashes and so was never used.

The new semantics are simple:
- numbered conversions are indexed arg accesses (easy to understand)
- unnumbered conversions count only the unnumbered ones in the fmt string

This way, when you don't mix them, both work as expected.

When you mix them, you need a little head scratching; it is useless, but
predictable and unbreakable.

Since nothing worked before, i.e. all shells either crashed or refused to run
such complicated constructs, doing this didn't break anything; it just makes
numbered conversions work without breaking unnumbered ones. Now the daredevil
can still mix'n'match: it may not produce what they would like, but at least
it doesn't crash, and it produces the result announced for these semantics.


Again, other semantics could be chosen for mix'n'match...
I would vote for refusing mix'n'match altogether, yet this is not what I've
done for ksh93 because, though bugged, it was still defined and one could ask
for a fix.


>
>
> printf '%s %s %s %3$s %s\n' A B C D
>

$ printf '%s %s %s %3$s %s\n' A B C D
A B C C D

Easy to predict: 4 %s taken in that order and one numbered conversion that
accesses the indexed one, no surprise



>
>
> printf '%s %$4s %s %s %s' A B C D
>

$ printf '%s %$4s %s %s %s' A B C D
A /d1/ksh-latest/arch/linux.i386-64/bin/ksh: printf: $: unknown format
specifier

I guess you meant

$ printf '%s %4$s %s %s %s' A B C D
A D B C D

Can't blame the syntax error :-) I am dyslexic too.
Again no surprise here: the %s are counted sequentially and the indexed one is
found directly.

The more interesting is
$ printf '%s %6$s %s %s %s' A B C D
A  B C D

The 6th doesn't exist so it gets ''

Again, a simple rule of thumb: if you don't mix'n'match they all work as
expected; if you insist on mix'n'match you are entering something that has
never worked before, and then you have to test to decide how to mix'n'match



>
>   | It's worse -- even if a format string using only numbered conversions
>   | doesn't consume all the arguments, as long as it consumes the last one,
>   | the format string doesn't get reused. It all depends on whatever peculiar
>   | definition of "satisfy" you use.
>
> Yes,  which is where the trick of including a conversion "%999$.0s" in the
> format string is supposed to stop format string reuse -- as long as there are
> no more than 999 args given of course.   This issue has nothing to do with the
> mixed numbered/unnumbered problem, which rightly should just be unspecified,
> and implementations do whatever they like, reasonable or not (as in the eye
> of the coder), but that format string reuse is simply wrong in the numbered
> conversion case, no-one really needs that.
>
>
Again, the rule I used for ksh93 is that the next arg for fmt reuse is
max( number_of_unnumbered_occurrences, highest_numbered_index )

A simple rule, easy to understand, which makes non-mixed mode work as
expected, again leaving mixed mode to the complexity of counting things.

On top of the fact that those rules are simple, they also ease the fmt scan
and arg fetching.

Another way of saying all this is that %s (unnumbered) are in fact
internally numbered with 'seq', so %s %s is internally %(seq++)s %(seq++)s,
and numbered ones sit in the middle of this: %(seq++)s %4$s %(seq++)s
Next fmt skips max(seq,4)

I forgot to mention your trick to nuke the fmt reuse still works

$ printf '%s %s %s %999$s' A B C D E F G
A B C

:-)

Cheers
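
The rule described in this message can be modelled in a few lines of portable shell. This is only a sketch of the stated semantics, not ksh93 code: mixfmt is a hypothetical helper that understands just %s and %N$s, prints a newline after each pass instead of interpreting \n in the format, and uses global variables for brevity.

```shell
# Hypothetical model of the rule described above (not ksh93 source code).
# Supports only %s, %N$s and literal text; escapes such as \n are not handled.
mixfmt() {
    fmt=$1
    shift
    while :; do
        rest=$fmt
        seq=0     # unnumbered %s conversions seen so far in this pass
        high=0    # highest index used by a numbered %N$s in this pass
        line=
        while [ -n "$rest" ]; do
            case $rest in
                '%s'*)
                    # Unnumbered: counts only other unnumbered conversions.
                    seq=$((seq + 1))
                    eval "val=\${$seq-}"
                    line=$line$val
                    rest=${rest#'%s'}
                    ;;
                '%'[1-9]*)
                    num=${rest#'%'}
                    num=${num%%[!0-9]*}
                    case ${rest#"%$num"} in
                        '$s'*)
                            # Numbered: absolute index into the current arguments.
                            if [ "$num" -gt "$high" ]; then high=$num; fi
                            eval "val=\${$num-}"
                            line=$line$val
                            rest=${rest#"%${num}\$s"}
                            ;;
                        *)
                            # Not a conversion this model knows: copy the % literally.
                            line="${line}%"
                            rest=${rest#'%'}
                            ;;
                    esac
                    ;;
                *)
                    # Literal character: copy it and move on.
                    line=$line${rest%"${rest#?}"}
                    rest=${rest#?}
                    ;;
            esac
        done
        printf '%s\n' "$line"
        # The next pass starts after max(seq, high) arguments.
        skip=$seq
        if [ "$high" -gt "$skip" ]; then skip=$high; fi
        if [ "$skip" -eq 0 ] || [ "$skip" -ge "$#" ]; then break; fi
        shift "$skip"
    done
}

mixfmt '%s %3$s %s' A B C D
```

Under these assumptions the call prints "A C B" and then a second line starting with "D", as in the transcript earlier in the thread, and a format ending in %999$s stops after one pass because max(seq, 999) exceeds the argument count.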