gettext feature request
I have been planning to use the /$"string"/ feature in my bash code, and it made me think quite a lot: https://github.com/foopgp/bash-libs/tree/main/i18n

What I would really *love* to see in bash is a /$'string'/ translation feature, which does not parse any «`» or «$» characters.

Has this already been discussed? Is there any hope of using such a powerful syntax without introducing security issues?

Cheers,
Re: gettext feature request
On 24/07/2021 at 20:48, Chet Ramey wrote:
> So you want a translation feature without any further interpretation? Or
> one just without command substitution?

A translation feature without any further interpretation.

Pros:
* It eases interoperability of *.mo files (e.g. they may then be shared with C programs).
* It is more coherent with other single-quoted strings (it would only differ by interpreting escaped characters, even when not used with printf or echo -e).
* It implies better string factorization (it forces things like /printf $'file %s is too big\n' .../ instead of /echo $'file $f is too big'/ and later /echo $'file ${fl[$n]} is too big'/).

I see no cons.

> What about an option to enable this variant behavior? What do you think
> would be a suitable name?

For testing purposes, the option name could be *gettext*. But if possible, I would prefer this feature to be enabled by default, with *nogettext* to disable it. As of today (bash 5.1.x), *-g* and *-G* are also available.

Cheers,
Jean-Jacques.
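To illustrate the third point, here is a minimal sketch of how the existing $'...' quoting already behaves in bash today, with no translation involved: backslash escapes are interpreted, but «$» and «`» stay literal, so any variable has to be passed as a printf argument. The variable name f and the file name are made up for this example.

```shell
#!/usr/bin/env bash
f='report.txt'    # hypothetical file name, for illustration only

# $'...' interprets backslash escapes but never expands $ or `...`,
# so the variable must be passed as a printf argument.
msg=$(printf $'file %s is too big\n' "$f")

# Inside $'...' the characters $ and ` are kept literally.
literal=$'file $f is `too` big'

printf '%s\n' "$msg" "$literal"
```

Because the format string contains no shell syntax, the same msgid could in principle be shared with a C program that calls printf with a translated format.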
Re: gettext feature request
On 27/07/2021 at 16:39, Chet Ramey wrote:
> That's not a good name. It provides no insight about the option or its
> purpose. Maybe something like `quote_translation'.

You are right. And I was sorely lacking in imagination.

> That's not backwards compatible, so the best thing for you to do would be
> to set this option in the places where it matters for your scripts, maybe
> in a script configuration file.

IMHO bash already suffers from having too many options. And using, or not using, one of them often breaks compatibility (for example, sourcing several bash files that expect incompatible options).

Yet the only case I can see that would break backward compatibility is when these two conditions are both met:
* Some piece of code already uses the TEXTDOMAIN variable and a translation feature ($"..." or the gettext.sh shell functions).
* This piece of code contains at least one /'C-style-string'/ identical to a /$"string-to-be-translated"/.

BTW, I now agree this could really happen, so enabling something like `quote_translation' by default is unacceptable.

Finally, there is another and maybe better solution: implementing one or two more variables:
* TEXTCDOMAIN (or TEXTQUOTEDOMAIN): if set, enables the `quote_translation' feature (like TEXTDOMAIN does for the current translation features).
* And maybe TEXTCDOMAINDIR (or TEXTQUOTEDOMAINDIR) (like TEXTDOMAINDIR does for the current translation features).

Pros:
* no additional option
* backwards compatible

Con:
* probably a more complex implementation

Cheers,
Jean-Jacques
Re: gettext feature request
I configured my subscription to receive only digests of the bug-bash mailing list, so I did not see your message until I looked it up myself in the archive: https://lists.gnu.org/archive/html/bug-bash/2021-07/msg00059.html

> So, most people have simply written the whole thing off as a lost cause.
> The best advice I can give is, "if you need localization in your program,
> don't write it in bash".

That is not an option for me; bash has too many advantages compared to other languages:
* Portability.
* Installed by default on many OSes (we are not forced to install heavy dependencies).
* A powerful syntax that allows writing much less code than in any other language (the downside is that the code often appears unreadable, mainly to beginners).
* Quick to write, debug and deploy.
* ... (etc.)

When speed of execution is not a bottleneck, bash is then (at least for me) the perfect language for writing proofs of concept as well as some "in-production" tools. And I would prefer not to sacrifice time, portability, long-term compatibility or maintenance cost (cf. dependencies) by using some other language.

I like what you wrote 12 years ago ( https://lists.gnu.org/archive/html/bug-bash/2009-02/msg00258.html ). IMHO, it could have changed some people's minds about bash, and could have saved developer energy.

> $'...' already exists, so you can't use that for localization.

I would like to keep using it for the "C-style string" feature, and also to be able to translate it. In other words: not changing its current usage, but extending it.
Re: gettext feature request
On 29/07/2021 at 21:08, Chet Ramey wrote:
> How about `noexpand_translation'?

It sounds good! (IMHO)

>> IMHO bash already suffers of having too many options. And using, or not,
>> one of those often breaks compatibility.
>
> It's not really a matter of having too many options, as long as the
> defaults are good and preserve backwards compatibility. That way, people
> who want the new behavior can opt in, and people who want no changes don't
> have to change anything. If, at some point, it makes sense to change the
> default behavior, it's a matter of changing the default value of that
> option, which still allows users who want the old behavior a way to get it.
>
> Now, changing that default behavior, no matter what the reason or how long
> the option has been there, is always going to present problems. But
> sometimes you do it.
>
>> Finally there is an other and maybe better solution, by implementing one
>> or two more variables:
>> * TEXTCDOMAIN (or TEXTQUOTEDOMAIN): if set, enable `quote_translation'
>>   feature (like TEXTDOMAIN does for actual translation features).
>> * And maybe TEXTCDOMAINDIR (or TEXTQUOTEDOMAINDIR): (like TEXTDOMAINDIR
>>   does for actual translation features).
>
> Did you just invent these variable names? Or are they actually used for
> something right now?
>
> I prefer not to have optional behavior depend on whether a variable exists.
> Sometimes you can't avoid it, such as when you need a way to indicate more
> than two possible values, but this isn't one of those cases.

Yes, I just invented these names; they have no known usage.

The advantage of adding such a variable is that we could more easily use different .mo files within the same bash execution: (at least) one "legacy" file (for /$"..."/) and (at least) one "C-strings" file (for /$'...'/), which could then be shared with a C program. (For example, lazy people could then reuse the coreutils .mo files, as they contain a lot of strings that are frequently used everywhere, while using their own .mo file for "legacy" strings.)

But I agree, the benefit-to-complexity ratio of such an implementation might not be worth it. Moreover, it could encourage the use of "legacy" translations, whereas we know that these pose security problems.

If you prefer to enable translation for all /$'C-string'/ using the existing TEXTDOMAIN (and TEXTDOMAINDIR) variables, with a 'noexpand_translation' option to disable it, I am also fine with that idea.

It is an important choice. I would like to be sure we have not left any argument out of the balance.

Thanks a lot, Chet. :-)

Jean-Jacques
Re: gettext feature request
On 10/08/2021 at 17:22, Chet Ramey wrote:
> This is the current description:
>
> noexpand_translation
>     If set, bash encloses the translated results of $"..." quoting in
>     single quotes instead of double quotes. If the string is not
>     translated, this has no effect.
>
> It will be in the next devel branch push.

Thanks a lot, Chet.

I still think it would have been more consistent to extend the $'...' syntax. But this change is already a great feature, and I hope the gettext developers will warmly welcome it and then update their documentation.

Finally, this functionality, which already improves security, does not rule out another option, which one could call 'translate_c_string', to translate $'strings' (without any expansion). (My idea of an additional variable like 'TEXTCDOMAIN' was indeed not that great.)

If both options existed ('noexpand_translation' AND 'translate_c_string'), I would prefer to use the second, which is even more secure (either my strings are translated or they are not, but they can never be expanded).

But I am already delighted that I no longer have to plan horrible workarounds, and if only 'noexpand_translation' is released, I'll be happy to use it. :-)

Cheers,
Jean-Jacques.
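As a minimal sketch of how a script might opt in, assuming a bash recent enough to know the option (it is not available in 5.1.x): the text domain myapp, the catalog directory and the file name below are hypothetical, and without a matching .mo catalog the string is simply left untranslated.

```shell
#!/usr/bin/env bash
# Opt in only where the option exists; older shells reject the name.
if shopt -s noexpand_translation 2>/dev/null; then
    mode='translations are single-quoted'
else
    mode='option unavailable, legacy behavior'
fi

TEXTDOMAIN=myapp                   # hypothetical text domain
TEXTDOMAINDIR=/usr/share/locale    # assumed catalog location

f='report.txt'                     # hypothetical file name
# Keep variables out of the translatable string: pass them as arguments.
msg=$(printf $"file %s is too big\n" "$f")

printf '%s (%s)\n' "$msg" "$mode"
```

Passing the variable as a printf argument keeps the msgid free of shell syntax, so the script behaves the same whether or not the option is available.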
Re: gettext feature request
On 12/08/2021 at 16:29, Chet Ramey wrote:
> On 8/11/21 6:35 PM, Jean-Jacques Brucker wrote:
>> Thank a lot Chet. I still think it would have been more consistent to
>> extend the $'...' syntax.
>
> Why would it be more consistent to add translation to a quoting syntax that
> didn't have it than to post-process translations in a different way?
>
> Chet

I'm probably less familiar with the history of shells and bash than you (and others on this mailing list), but it seems to me that:

First there were the strings:
* '...', which did not expand anything
* "...", which expanded the $VAR syntax and the unfortunate `...` [1]

Then the $"..." syntax was introduced, then the $'...' syntax (bash 2.X).

As the character '$' means "interpret (synonym: translate) what follows", it seems quite consistent to me that $"..." means "translate the entire following string".

Conversely, the $'...' syntax seems much less consistent to me. (Reminder: C predates the shells, and bash is itself written in C.) However, the "C-string" feature is very useful (and nowadays probably more used than the translation feature).

In absolute terms, if one day we listed all the historical design errors, dared to break some compatibility, and managed to establish new shell standards (I'm probably dreaming... but who knows?), then we could have:
1. "...": expanded shell string
2. '...': unexpanded shell string
3. `...`: unexpanded C string (today $'...' :-/ )

and logically:
4. $"...": translated expanded shell string (NOT to be used, for security reasons)
5. $'...': translated unexpanded shell string (the soon-to-be-released $"..." with shopt noexpand_translation !?)
6. $`...`: translated unexpanded C string (the feature [2] I dream of the most; maybe soon $'...' with a shopt like translate_c_string?)

[1]: cf. https://www.grymoire.com/Unix/Sh.html#uh-8a
[2]: What I like the most about this feature is that I could directly reuse existing translation strings: not only to dig into the many existing C translations, but also to avoid redoing the translations when, for various reasons (performance, etc.), we are forced to rewrite bash code in a compiled language.
lseek in bash
While playing with flock to securely access a file shared by multiple processes, I noticed that there is no documented way to do an lseek on an open fd with bash:

    #!/bin/bash
    exec 18<>/tmp/resource
    flock 18
    # (...) read and analyze the resource file
    # ?? there is no documented way to seek or rewind in the resource...
    # if I redo "exec 18<>/tmp/resource", it closes the file descriptor
    # and so releases the flock lock :-(
    # write to the resource file
    exec 18>&-

I solved my problem by writing this small binary (I just needed a rewind):

    #include <stdlib.h>   /* atoi */
    #include <unistd.h>   /* lseek */

    /* Rewind the file descriptor given as argv[1]; exit status 0 on success. */
    int main(int argc, char *argv[])
    {
        if (argc < 2)
            return 2;
        return lseek(atoi(argv[1]), 0L, SEEK_SET) == (off_t)-1;
    }

But I would be glad to use a standard, finished tool instead. Of course we could make an "lseek" binary with options covering all use cases of the lseek function, but I would prefer to have this functionality inside bash.

If it does not already exist, here is a proposal: bash would understand some characters after a file descriptor, which imply an lseek (if the fd is not a pipe or a socket) before reading from or writing to that fd. It would look like:

    $fd[sbae[0-9]*]   or   $fd[+$^-[0-9]*]

* where s or ^ implies whence=SEEK_SET, and [0-9]* is the (positive) offset in bytes (default=0, s for Start or Set)
* where b or - implies whence=SEEK_CUR, and [0-9]* is the (negative) offset in bytes (default=0, b for Before)
* where a or + implies whence=SEEK_CUR, and [0-9]* is the (positive) offset in bytes (default=0, a for After)
* where e or $ implies whence=SEEK_END, and [0-9]* is the (negative) offset in bytes (default=0, e for End)

And this is how it could be used:

    read line <&18s              # rewind before reading the (first) line
    read -n 3 endchars <&18e4    # read the 3 chars before the last one
    echo -n >&18b4               # just move the current position 4 bytes backward

What do you think?

PS: I am not subscribed to the mailing list, so please keep my email address in your responses ;-).

Thx,
Jean-Jacques.
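Until something like this exists, one workaround that needs no helper binary is to hold the lock on a dedicated descriptor, so that the data file can be reopened (and thereby rewound) without releasing the lock. This is only a sketch under assumptions: it relies on flock(1) from util-linux when available, and the companion lock file and fd 17 are hypothetical choices, not part of the proposal above.

```shell
#!/usr/bin/env bash
resource=$(mktemp)           # stand-in for /tmp/resource
lockfile="$resource.lock"    # hypothetical companion lock file

printf 'first\nsecond\n' > "$resource"

exec 17>"$lockfile"          # fd 17 exists only to carry the lock
if command -v flock >/dev/null 2>&1; then
    flock 17                 # util-linux flock(1); skipped if unavailable
fi

exec 18<"$resource"          # first pass over the data
read -r line1 <&18
read -r line2 <&18

exec 18<"$resource"          # reopening fd 18 starts again at offset 0
read -r again <&18

exec 18<&-                   # close the data descriptor
exec 17>&-                   # closing fd 17 releases the lock
rm -f "$resource" "$lockfile"

printf '%s %s %s\n' "$line1" "$line2" "$again"
```

This only covers the rewind case; seeking to an arbitrary offset would still need an external tool or a builtin like the one proposed.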
race bug !?
100% reproducible on my Debian, package version 5.1-2:

    $ echo "dc30a6d79f3b47e310b8b9f5fbadba57" | tee >( xxd -r -p | base64) | tee >(xxd -r -p | basenc --base64url )
    dc30a6d79f3b47e310b8b9f5fbadba57
    3DCm1587R+MQuLn1+626Vw==
    3DCm1587R-MQuLn1-626Vz0Vhw==

At first I thought I had missed something about base64url encoding. After hours of digging through the documentation, I tried:

    $ echo "dc30a6d79f3b47e310b8b9f5fbadba57" | xxd -r -p | basenc --base64url
    3DCm1587R-MQuLn1-626Vw==

Those 4 extra characters "z0Vh" are quite disturbing :-/

Then I played with it some more, cf. the attached bashbug.sh.

    $ LANG= bash --version
    GNU bash, version 5.1.4(1)-release (x86_64-pc-linux-gnu)

Cheers,
---
Jean-Jacques.
Re: race bug !?
Many thanks to all.

Indeed there is no bug, and IMO there is not even anything to improve in bash.

What misled me is that such an erroneous command appears to work when the output of the first "xxd -r -p | base64" does not begin with hexadecimal characters. That is the most common case, and the second "xxd -r -p" then ignores that input.

I should have played with file descriptors and redirections to parallelize this correctly.

    $ echo "dc30a6d79f3b47e310b8b9f5fbadba573DCm1587" | xxd -r -p | base64
    3DCm1587R+MQuLn1+626Vz0Vhw==

    $ echo "68b329da9893e34099c7d8ad5cb9c940" | tee >( head -c 4 ; echo ; ls -l /proc/self/fd ) | { tee >(xxd -r -p | basenc --base64 ) ; ls -l /proc/self/fd ; }

Cheers,

On 29/01/2021 at 14:50, Chet Ramey wrote:
> On 1/28/21 6:33 PM, Jean-Jacques Brucker wrote:
>> 100 % reproducible on my Debian, package version 5.1-2:
>>
>> $ echo "dc30a6d79f3b47e310b8b9f5fbadba57" | tee >( xxd -r -p | base64) | tee >(xxd -r -p | basenc --base64url )
>> dc30a6d79f3b47e310b8b9f5fbadba57
>> 3DCm1587R+MQuLn1+626Vw==
>> 3DCm1587R-MQuLn1-626Vz0Vhw==
>
> Without looking at this too closely, this is always going to be somewhat
> `racy', for two reasons.
>
> First, the process substitutions are executed asynchronously, so their
> execution is non-deterministic.
>
> Second, the word expansions in pipelines happen after the shell forks, and
> after piping is done, so the standard output of the process substitution is
> the same as the standard output of its parent: the pipe. You can see this
> by using `wc' to count the lines the second pipeline element receives:
>
> $ echo "dc30a6d79f3b47e310b8b9f5fbadba57" | tee >( xxd -r -p | base64 ) | wc
>       2       2      58
>
>> I first thought I missed something about base64url encoding, after hours
>> or digging and digging documentation I tried :
>>
>> $ echo "dc30a6d79f3b47e310b8b9f5fbadba57" | xxd -r -p | basenc --base64url
>> 3DCm1587R-MQuLn1-626Vw==
>>
>> Those 4 extra characters "z0Vh" are quite disturbing :-/
>
> You're not encoding the same data. You've got the original string you
> echoed plus the output of the first encoding process substitution.
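The second point in the quoted explanation can be seen with standard tools only. Here is a minimal sketch that uses tr and wc in place of xxd and base64 (a substitution made purely for illustration): without a redirection, the output of the process substitution lands in the pipe; once it is sent elsewhere, the next stage sees only tee's own copy.

```shell
#!/usr/bin/env bash
# The process substitution inherits the pipe as its stdout, so its output
# is mixed into what the next pipeline stage reads.
leaked=$(echo hello | tee >(tr 'a-z' 'A-Z') | wc -l)

# Sending the substitution's output elsewhere (here: stderr) keeps it
# out of the pipe, so only tee's copy reaches wc.
clean=$(echo hello | tee >(tr 'a-z' 'A-Z' >&2) | wc -l)

printf 'lines seen downstream: %d without redirection, %d with\n' "$leaked" "$clean"
```

In the original command, the same idea would mean redirecting the output of the first >( xxd -r -p | base64 ) away from the pipe, so that the second xxd only ever receives the hexadecimal string.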