gettext feature request

2021-07-24 Thread Jean-Jacques Brucker


Planning to use the /$"string"/ feature in my bash code got me thinking, 
maybe too much: https://github.com/foopgp/bash-libs/tree/main/i18n



...what I would really *love* to see in bash is a /$'string'/ translation 
feature, which doesn't parse any «`» or «$» characters.
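
To make the distinction concrete, here is a minimal sketch of how bash's three existing quoting forms treat the same text today. The variable names are purely illustrative, and the sketch shows only current quoting behavior, not the requested translation feature.

```shell
#!/bin/bash
# Illustrative names only; this shows current quoting, not translation.
name=world

dq="tab:\t $name"          # "...": $name is expanded, \t is kept as two characters
sq='tab:\t $name'          # '...': nothing is interpreted
cq=$'tab:\t $name `id`'    # $'...': \t becomes a TAB; $ and ` stay literal

printf '%s\n' "$dq" "$sq" "$cq"
```

Only the $'...' form interprets backslash escapes while leaving «$» and «`» alone, which is the property the request relies on.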



Has this already been discussed? Is there any hope of using such a 
powerful syntax without introducing security issues?



Cheers,



OpenPGP_0xA3983A40D1458443.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: gettext feature request

2021-07-24 Thread Jean-Jacques Brucker


On 24/07/2021 at 20:48, Chet Ramey wrote:

> So you want a translation feature without any further interpretation? Or
> one just without command substitution?


Translation feature without any further interpretation.

pros:

 * eases interoperability of *.mo files (e.g. they may then be shared
   with C programs)
 * more consistent with other single-quoted strings (it would differ only
   by interpreting escaped characters even when not used with printf or
   echo -e)
 * encourages better string factorization (it forces things like /printf
   $'file %s is too big\n' .../ instead of /echo $"file $f is too big"/
   and later /echo $"file ${fl[$n]} is too big"/)

I see no cons.
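
The factorization point can be sketched as follows. This is only an illustration with hypothetical file names: in current bash, $'...' is not translated, so the sketch shows just how one constant format string serves several call sites.

```shell
#!/bin/bash
# Hypothetical file names, for illustration only.
f=/tmp/a.log
fl=(/tmp/b.log /tmp/c.log)
n=1

# One constant format string: the only text a translator would ever see.
fmt=$'file %s is too big\n'

out1=$(printf "$fmt" "$f")
out2=$(printf "$fmt" "${fl[$n]}")
printf '%s\n' "$out1" "$out2"
```

Because the variable part is passed as a printf argument, both messages share a single msgid, and that msgid has the same shape a C program would use.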



> What about an option to enable this variant behavior? What do you think
> would be a suitable name?

For testing purposes, the option name could be *gettext*. But if possible, 
I would prefer such a feature to be enabled by default, with *nogettext* 
to disable it.

As of today (bash 5.1.x), the option letters *-g* and *-G* are also still 
unused.

Cheers,
Jean-Jacques.




Re: gettext feature request

2021-07-27 Thread Jean-Jacques Brucker


On 27/07/2021 at 16:39, Chet Ramey wrote:

> That's not a good name. It provides no insight about the option or its
> purpose. Maybe something like `quote_translation'.

You are right. And I was sorely lacking in imagination.

> That's not backwards compatible, so the best thing for you to do would be
> to set this option in the places where it matters for your scripts, maybe
> in a script configuration file.


IMHO bash already suffers from having too many options, and setting (or 
not setting) one of them often breaks compatibility.


(for example: sourcing different bash files that rely on incompatible 
options)


Yet the only case I can see that would break backward compatibility is 
when these two conditions are both met:


 * Some piece of code already uses the TEXTDOMAIN variable and a
   translation feature ($"..." or the gettext.sh shell functions).
 * This piece of code contains at least one /'C-style-string'/ identical
   to a /$"string-to-be-translated"/.


BTW, I now agree this could really happen, so enabling something like 
`quote_translation' by default is unacceptable.



Finally, there is another and maybe better solution: implementing one 
or two more variables:


 * TEXTCDOMAIN (or TEXTQUOTEDOMAIN): if set, enables the `quote_translation'
   feature (like TEXTDOMAIN does for the current translation features).
 * And maybe TEXTCDOMAINDIR (or TEXTQUOTEDOMAINDIR): (like
   TEXTDOMAINDIR does for the current translation features).


pros:

 * no additional option
 * backwards compatible

con:

 * probably more complex implementation


Cheers,

Jean-Jacques






Re: gettext feature request

2021-07-27 Thread Jean-Jacques Brucker
I configured my subscription to receive only digests of the bug-bash 
mailing list, so I did not see your message until I looked it up myself 
in the archive:


https://lists.gnu.org/archive/html/bug-bash/2021-07/msg00059.html

> So, most people have simply written the whole thing off as a lost cause.
> The best advice I can give is, "if you need localization in your program,
> don't write it in bash".

That is not an option for me: bash has too many advantages compared to 
other languages:


 * portability.

 * installed by default on many operating systems (we are not forced to
   install heavy dependencies).

 * powerful syntax that allows writing much less code than in any other
   language (the downside is that the code often appears unreadable,
   mainly to beginners).

 * quick to write, debug and deploy.

 * ... (etc.)


When execution speed is not a bottleneck, bash is (at least for me) the 
perfect language for writing proofs of concept as well as some production 
tools. And I would prefer not to sacrifice time, portability, long-term 
compatibility or maintenance cost (cf. dependencies) by using another 
language.



I like what you wrote 12 years ago ( 
https://lists.gnu.org/archive/html/bug-bash/2009-02/msg00258.html ). 
IMHO, it could have changed some people's minds about bash, and could 
have saved developer energy.



> $'...' already exists, so you can't use that for localization.


I would still like to use it for the "C-style string" feature, and to be 
able to translate it as well. In other words: not changing its current 
usage, but extending it.










Re: gettext feature request

2021-07-29 Thread Jean-Jacques Brucker
I configured my subscription to receive only digests of the bug-bash 
mailing list, so I did not see your message until I looked it up myself 
in the archive:


https://lists.gnu.org/archive/html/bug-bash/2021-07/msg00059.html

> So, most people have simply written the whole thing off as a lost cause.
> The best advice I can give is, "if you need localization in your program,
> don't write it in bash".

That is not an option for me: bash has too many advantages compared to 
other languages:


 * portability.

 * installed by default on many operating systems (we are not forced to
   install heavy dependencies).

 * powerful syntax that allows writing much less code than in any other
   language (the downside is that the code often appears unreadable,
   mainly to beginners).

 * quick to write, debug and deploy.

 * ... (etc.)


When execution speed is not a bottleneck, bash is (at least for me) the 
perfect language for writing proofs of concept as well as some production 
tools. And I would prefer not to sacrifice time, portability, long-term 
compatibility or maintenance cost (cf. dependencies) by using another 
language.



I like what you wrote 12 years ago ( 
https://lists.gnu.org/archive/html/bug-bash/2009-02/msg00258.html ). It 
could have changed some people's minds about bash, and could have saved 
a lot of developer energy.



> $'...' already exists, so you can't use that for localization.


I would still like to use it for the "C-style string" feature, and to be 
able to translate it as well. In other words: not changing its current 
usage, but extending it.








Re: gettext feature request

2021-07-31 Thread Jean-Jacques Brucker


On 29/07/2021 at 21:08, Chet Ramey wrote:

> How about `noexpand_translation'?

It sounds good! (IMHO)

>> IMHO bash already suffers from having too many options. And using, or
>> not, one of those often breaks compatibility.
>
> It's not really a matter of having too many options, as long as the
> defaults are good and preserve backwards compatibility. That way, people
> who want the new behavior can opt in, and people who want no changes don't
> have to change anything. If, at some point, it makes sense to change the
> default behavior, it's a matter of changing the default value of that
> option, which still allows users who want the old behavior a way to get it.
>
> Now, changing that default behavior, no matter what the reason or how long
> the option has been there, is always going to present problems. But
> sometimes you do it.

>> Finally there is another and maybe better solution, by implementing
>> one or two more variables:
>>
>>   * TEXTCDOMAIN (or TEXTQUOTEDOMAIN): if set, enable `quote_translation'
>>     feature (like TEXTDOMAIN does for actual translation features).
>>   * And maybe TEXTCDOMAINDIR (or TEXTQUOTEDOMAINDIR): (like
>>     TEXTDOMAINDIR does for actual translation features).
>
> Did you just invent these variable names? Or are they actually used for
> something right now? I prefer not to have optional behavior depend on
> whether a variable exists. Sometimes you can't avoid it, such as when you
> need a way to indicate more than two possible values, but this isn't one
> of those cases.


Yes, I just invented these names; they have no known usage.

The advantage of adding such variables is that we could more easily use 
different .mo files within the same bash execution:


(at least) one "legacy" catalog (for /$"..."/) and (at least) one 
"C-string" catalog (for /$'...'/), which could then be shared with a C 
program. For example, lazy people could then reuse the coreutils .mo 
files, since they contain a lot of frequently used strings, while using 
their own .mo file for "legacy" strings.



But I agree that the benefit-to-complexity ratio of such an 
implementation might not be worthwhile.


Moreover, it could encourage the use of "legacy" translations, even 
though we know they pose security problems.



If you prefer to enable translation for all /$'C-string'/ using the 
existing TEXTDOMAIN (and TEXTDOMAINDIR) variables, with an option 
'noexpand_translation' to disable it, I am also fine with this idea.



It is an important choice. I would like to be sure we have not left any 
argument out of the balance.



Thanks a lot, Chet.


:-)

Jean-Jacques






Re: gettext feature request

2021-08-11 Thread Jean-Jacques Brucker


On 10/08/2021 at 17:22, Chet Ramey wrote:

> This is the current description:
>
> noexpand_translation
>         If set, bash encloses the translated results of $"..."
>         quoting in single quotes instead of double quotes. If
>         the string is not translated, this has no effect.
>
> It will be in the next devel branch push.



Thanks a lot, Chet.

I still think it would have been more consistent to extend the $'...' 
syntax.


But this change is already a great feature, and I hope the gettext 
developers will warmly welcome it and then update their documentation.


Finally, this functionality, which already improves security, does not 
rule out another option, which could be called 'translate_c_string', to 
translate /$'strings'/ (without any expansion).


(my idea of an additional variable like 'TEXTCDOMAIN' was indeed not 
that great)


If both options existed ('noexpand_translation' AND 
'translate_c_string'), I would prefer to use the second, which is even 
more secure (either my strings are translated or they are not, but they 
can never be expanded).


But I'm already delighted that I no longer have to plan horrible 
workarounds, and if only 'noexpand_translation' is released, I'll be 
happy to use it. :-)
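
For scripts that must also run on older shells, opting in could look like this minimal sketch. It assumes the option keeps the name 'noexpand_translation' when released; the text domain and the variable names are hypothetical, and no message catalog is assumed to be installed.

```shell
#!/bin/bash
# Assumption: noexpand_translation exists only in bash releases that
# include this change; older shells reject it, hence the guard.
if shopt -s noexpand_translation 2>/dev/null; then
    quoting=single
else
    quoting=double
fi

TEXTDOMAIN=myapp    # hypothetical domain; no catalog is assumed to exist
user=alice
msg=$"Hello $user"  # untranslated here, so it expands like "Hello $user"

printf '%s (translated text would get %s quotes)\n' "$msg" "$quoting"
```

Per the description quoted above, the option only changes how a *translated* result is quoted, so without a catalog the message is expanded as usual.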


Cheers,
Jean-Jacques.





Re: gettext feature request

2021-08-13 Thread Jean-Jacques Brucker

On 12/08/2021 at 16:29, Chet Ramey wrote:

> On 8/11/21 6:35 PM, Jean-Jacques Brucker wrote:
>
>> Thanks a lot, Chet.
>>
>> I still think it would have been more consistent to extend the $'...' syntax.
>
> Why would it be more consistent to add translation to a quoting syntax that
> didn't have it than to post-process translations in a different way?
>
> Chet



I'm probably less familiar with the history of shells and bash than you (and
others on this mailing list), but it seems to me that:

First there were the strings:
 * '...' which did not expand anything
 * "..." which expanded the $VAR syntax and the unfortunate `...` [1]

Then the syntax $"..." was introduced, then the syntax $'...' (bash 2.X).

As the character '$' means "interpret (synonym: translate) what follows", it
seems quite consistent to me that $"..." means "translate the entire
following string".

Conversely, the syntax $'...' seems much less consistent to me. (Reminder: C
predates shells, and bash is itself written in C.)

However, the "C-string" feature is very useful (and nowadays probably more
used than the translation feature).

Ideally, if one day we listed all the historical design errors, dared to
break some compatibility, and managed to establish new shell standards (I'm
probably dreaming... but who knows?), then we could have:

 1. "...": expanded shell string
 2. '...': unexpanded shell string
 3. `...`: unexpanded C string (today $'...' :-/ )

and logically:

 4. $"...": translated expanded shell string (NOT to be used, for security
    reasons)
 5. $'...': translated unexpanded shell string (soon released as $"..." with
    shopt noexpand_translation !?)
 6. $`...`: translated unexpanded C string (the feature[2] I dream of the
    most, maybe soon $'...' with a shopt like translate_c_string ?)

[1]: cf. https://www.grymoire.com/Unix/Sh.html#uh-8a

[2]: What I like most about this feature is that I could directly reuse
existing translation strings: not only to draw on the many existing C
translations, but also to avoid redoing the translations when, for various
reasons (performance, etc.), we are forced to rewrite bash code in a
compiled language.





lseek with bash

2011-12-09 Thread Jean-Jacques Brucker
While playing with flock to safely access a file shared by multiple
processes, I noticed there is no documented way to do an lseek on an
open fd with bash:


#!/bin/bash

exec 18<>/tmp/resource
flock 18
# (...) read and analyze the resource file
# ?? there is no documented way to seek or rewind in the resource...
# if I redo "exec 18<>/tmp/resource", it closes the file descriptor and so
# releases the flock lock :-(
# write in the resource file
exec 18>&-

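I solved my problem by writing this small binary (I just needed a rewind):

#include <stdlib.h>
#include <unistd.h>
/* Rewind the inherited file descriptor whose number is given in argv[1]. */
int main(int argc, char *argv[]) { return lseek(atoi(argv[1]), 0L, SEEK_SET); }

But I would be glad to use a standard, polished tool.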
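
One workaround that needs no helper binary, at least on Linux, is to re-open the descriptor through /proc/self/fd. This is a minimal sketch under that assumption (it relies on Linux-specific behavior, not on a bash feature): opening /proc/self/fd/18 creates a new open file description on the same file with its offset at 0, while fd 18 itself stays open, so a lock taken on it with flock should remain held. The temporary file is a stand-in for /tmp/resource.

```shell
#!/bin/bash
# Linux-only sketch: re-opening /proc/self/fd/N yields a fresh open file
# description whose offset starts at 0, while fd N itself stays open.
resource=$(mktemp)                  # hypothetical stand-in for /tmp/resource
printf 'first\nsecond\n' > "$resource"

exec 18<>"$resource"                # open read/write on fd 18
read -r a <&18                      # reads "first"; the offset of fd 18 advances
read -r b <&18                      # reads "second"
read -r again < /proc/self/fd/18    # fresh description: starts at offset 0

printf '%s %s %s\n' "$a" "$b" "$again"
exec 18>&-
rm -f "$resource"
```

This only covers the rewind case, for one redirection at a time; it does not move the offset of fd 18 itself the way the lseek helper does.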

Of course we could make an "lseek" binary with some options to cover
all use cases of the lseek function. But I would prefer to have such
functionality inside bash.

If it does not already exist, here is a proposal:
have bash recognize some characters after a file descriptor number, which
imply an lseek (if the fd is not a pipe or a socket) before reading from or
writing to this fd.
It would look like:
- $fd[sbae[0-9]*]
- or $fd[+$^-[0-9]*]
where s or ^ implies whence=SEEK_SET, and [0-9]* is the (positive)
offset in octets (default=0, s for Start or Set)
where b or - implies whence=SEEK_CUR, and [0-9]* is the (negative)
offset in octets (default=0, b for Before)
where a or + implies whence=SEEK_CUR, and [0-9]* is the (positive)
offset in octets (default=0, a for After)
where e or $ implies whence=SEEK_END, and [0-9]* is the (negative)
offset in octets (default=0, e for End)

and this is how it could be used:
read line <&18s   # rewind before reading the (first) line
read -n 3 endchars <&18e4 # read 3 chars before the last one.
echo -n >&18b4 # just move the current position 4 octets backward.
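
To check that the proposed suffixes are unambiguous, here is a small sketch that decodes them into the fd, whence and offset an lseek call would receive. The function name parse_seek is hypothetical, and nothing like this exists in bash; it only illustrates the grammar described above.

```shell
#!/bin/bash
# parse_seek SPEC: decode a proposed "<fd><op>[offset]" suffix and print
# "fd whence offset". Hypothetical helper, for illustration only.
parse_seek() {
    local re='^([0-9]+)([sbae$^+-])([0-9]*)$'
    [[ $1 =~ $re ]] || return 1
    local fd=${BASH_REMATCH[1]} op=${BASH_REMATCH[2]} n=${BASH_REMATCH[3]:-0}
    case $op in
        s|'^') echo "$fd SEEK_SET $(( 10#$n ))" ;;   # from the start
        b|-)   echo "$fd SEEK_CUR $(( -10#$n ))" ;;  # backward from current
        a|+)   echo "$fd SEEK_CUR $(( 10#$n ))" ;;   # forward from current
        e|'$') echo "$fd SEEK_END $(( -10#$n ))" ;;  # backward from the end
    esac
}

parse_seek 18s
parse_seek 18e4
parse_seek 18b4
```

The three calls correspond to the three usage lines above; any other suffix makes the function return a non-zero status.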

What do you think ?

Thx,
Jean-Jacques.



lseek in bash

2011-12-10 Thread Jean-Jacques Brucker
While playing with flock to safely access a file shared by multiple
processes, I noticed there is no documented way to do an lseek on an
open fd with bash:


#!/bin/bash

exec 18<>/tmp/resource
flock 18
# (...) read and analyze the resource file
# ?? there is no documented way to seek or rewind in the resource...
# if I redo "exec 18<>/tmp/resource", it closes the file descriptor and so
# releases the flock lock :-(
# write in the resource file
exec 18>&-

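I solved my problem by writing this small binary (I just needed a rewind):

#include <stdlib.h>
#include <unistd.h>
/* Rewind the inherited file descriptor whose number is given in argv[1]. */
int main(int argc, char *argv[]) { return lseek(atoi(argv[1]), 0L, SEEK_SET); }

But I would be glad to use a standard, polished tool.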

Of course we could make an "lseek" binary with some options to cover
all use cases of the lseek function. But I would prefer to have such
functionality inside bash.

If it does not already exist, here is a proposal:
have bash recognize some characters after a file descriptor number, which
imply an lseek (if the fd is not a pipe or a socket) before reading from or
writing to this fd.
It would look like:
- $fd[sbae[0-9]*]
- or $fd[+$^-[0-9]*]
where s or ^ implies whence=SEEK_SET, and [0-9]* is the (positive)
offset in octets (default=0, s for Start or Set)
where b or - implies whence=SEEK_CUR, and [0-9]* is the (negative)
offset in octets (default=0, b for Before)
where a or + implies whence=SEEK_CUR, and [0-9]* is the (positive)
offset in octets (default=0, a for After)
where e or $ implies whence=SEEK_END, and [0-9]* is the (negative)
offset in octets (default=0, e for End)

and this is how it could be used:
read line <&18s   # rewind before reading the (first) line
read -n 3 endchars <&18e4 # read 3 chars before the last one.
echo -n >&18b4 # just move the current position 4 octets backward.

What do you think ?

PS: I am not subscribed to the mailing list, so please keep my email
address in your replies ;-).

Thx,
Jean-Jacques.



race bug !?

2021-01-28 Thread Jean-Jacques Brucker


100 % reproducible on my Debian, package version 5.1-2:



$ echo "dc30a6d79f3b47e310b8b9f5fbadba57" | tee >( xxd  -r -p | base64) \
    | tee >(xxd  -r -p | basenc --base64url )

dc30a6d79f3b47e310b8b9f5fbadba57
3DCm1587R+MQuLn1+626Vw==
3DCm1587R-MQuLn1-626Vz0Vhw==


I first thought I had missed something about base64url encoding. After 
hours of digging through documentation, I tried:


$ echo "dc30a6d79f3b47e310b8b9f5fbadba57" | xxd  -r -p | basenc --base64url
3DCm1587R-MQuLn1-626Vw==


Those 4 extra characters "z0Vh" are quite disturbing :-/

Then I played more, cf. attachment.


$ LANG= bash --version
GNU bash, version 5.1.4(1)-release (x86_64-pc-linux-gnu)

Cheers,

---

Jean-Jacques.



bashbug.sh
Description: application/shellscript




Re: race bug !?

2021-01-29 Thread Jean-Jacques Brucker

Many thanks to all.

Indeed there is no bug, and IMO there is not even anything to improve in 
bash.

What misled me is that such an erroneous command appears to work when 
the output of the first "xxd  -r -p | base64" does not begin with 
hexadecimal characters. This is the most frequent case, and the second 
"xxd  -r -p" then ignores that input.


I should have used file descriptors and redirections to parallelize 
correctly.



$ echo "dc30a6d79f3b47e310b8b9f5fbadba573DCm1587" | xxd  -r -p | base64
3DCm1587R+MQuLn1+626Vz0Vhw==


$ echo "68b329da9893e34099c7d8ad5cb9c940" \
    | tee >( head -c 4 ; echo ; ls -l /proc/self/fd ) \
    | { tee >(xxd  -r -p | basenc --base64 ) ; ls -l /proc/self/fd ; }
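
A minimal sketch of the kind of redirection I mean, using tr and wc as stand-ins for xxd and base64 so that it runs anywhere; the choice of fd 3, and of stderr as the side channel, is arbitrary.

```shell
#!/bin/bash
# Leaky: the process substitution inherits the pipe as its stdout, so the
# next stage receives the original line plus the substitution's output.
leaky=$(echo abc | tee >(tr a-z A-Z) | wc -l)

# Isolated: send the substitution's output to a saved descriptor instead.
exec 3>&2       # fd 3 is an arbitrary choice; here it duplicates stderr
clean=$(echo abc | tee >(tr a-z A-Z >&3) | wc -l)
exec 3>&-

echo "leaky=$leaky clean=$clean"
```

The first count should be 2 and the second 1: once the process substitution writes elsewhere, the next pipeline stage sees only the original data.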



Cheers,


On 29/01/2021 at 14:50, Chet Ramey wrote:

> On 1/28/21 6:33 PM, Jean-Jacques Brucker wrote:
>
>> 100 % reproducible on my Debian, package version 5.1-2:
>>
>> $ echo "dc30a6d79f3b47e310b8b9f5fbadba57" | tee >( xxd  -r -p | base64) \
>>     | tee >(xxd  -r -p | basenc --base64url )
>>
>> dc30a6d79f3b47e310b8b9f5fbadba57
>> 3DCm1587R+MQuLn1+626Vw==
>> 3DCm1587R-MQuLn1-626Vz0Vhw==
>
> Without looking at this too closely, this is always going to be somewhat
> `racy', for two reasons. First, the process substitutions are executed
> asynchronously, so their execution is non-deterministic. Second, the word
> expansions in pipelines happen after the shell forks, and after piping is
> done, so the standard output of the process substitution is the same as
> the standard output of its parent: the pipe.
>
> You can see this by using `wc' to count the lines the second pipeline
> element receives:
>
> $ echo "dc30a6d79f3b47e310b8b9f5fbadba57" | tee >( xxd  -r -p | base64 ) | wc
>
>    2   2  58
>
>> I first thought I missed something about base64url encoding, after
>> hours or digging and digging documentation I tried :
>>
>> $ echo "dc30a6d79f3b47e310b8b9f5fbadba57" | xxd  -r -p | basenc --base64url
>>
>> 3DCm1587R-MQuLn1-626Vw==
>>
>> Those 4 extra characters "z0Vh" are quite disturbing :-/
>
> You're not encoding the same data. You've got the original string you
> echoed plus the output of the first encoding process substitution.


