Re: excess braces ignored: bug or feature ?
On Sunday, February 19, 2012 04:25:46 PM Chet Ramey wrote:
> I assume you mean the first one. It doesn't matter whether or not the
> variable is set as a side effect of the redirection -- it's in a
> subshell and disappears.
>
> Chet

Forgot to mention though, it's possible in ksh there is no subshell created if you consider this:

$ : "$(&2;})"
1
$ : $(: $( echo ${.sh.subshell} >&2))
2

It even works with the subshell-less command substitution, but there's no typeset output, so either x is automatically unset, it's never set to begin with, or ${ &2 | :
0
~ $ : | { echo $BASH_SUBSHELL >&2; } | :
1
~ $ : | ( echo $BASH_SUBSHELL >&2; ) | :
1
~ $ : | ( ( echo $BASH_SUBSHELL >&2; ) ) | :
2
~ $ : | { ( echo $BASH_SUBSHELL >&2; ) } | :
2
~ $ : | { { echo $BASH_SUBSHELL >&2; } } | :
1

--
Dan Douglas
Process group id of first command in command substitution (bash4 vs bash3)
I'm not sure whether it's a bug, but there is a change in behavior between old bash 3.2 and bash 4.2. When you run this script:

set -m
$(sleep 1; sleep 2)

in bash 4.2 the first sleep has the same process group ID as the parent shell, whereas in bash 3.2 it has a different process group ID.

Is this a bug or not? I'm not able to find documentation for this change, and it seems that POSIX says nothing about it.

RR
Re: Process group id of first command in command substitution (bash4 vs bash3)
> I'm not sure if it's a bug or not, but there is change between old bash
> 3.2 and bash 4.2.
> When you run a script:
> set -m
> $(sleep 1; sleep 2)
>
> in bash 4.2 the first sleep has same group id as parent shell. However
> in bash 3.2 it has different group id.
>
> Is it bug or not? I'm not able to find documentation for this change.
> And seems that POSIX says nothing about it.

How could this possibly matter?

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.
On 2/18/12 5:39 AM, John Kearney wrote:
> Bash Version: 4.2
> Patch Level: 10
> Release Status: release
>
> Description:
> Current u32toutf8 only encode values below 0x correctly.
> wchar_t can be ambiguous size better in my opinion to use
> unsigned long, or uint32_t, or something clearer.

Thanks for the patch. It's good to have a complete implementation, though as a practical matter you won't see UTF-8 characters longer than four bytes. I agree with you about the unsigned 32-bit int type; wchar_t is signed, even if it's 32 bits, on several systems I use.

Chet
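For reference, here is a minimal sketch of an encoder that respects the four-byte limit mentioned above. It is not the function from the patch: the name encode_utf8, the uint32_t parameter, and the choice to reject surrogates and values above 0x10FFFF by returning 0 are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not bash's u32toutf8): encode code point CP as
   UTF-8 into OUT, which must have room for 5 bytes.  Returns the number
   of bytes written before the terminating NUL, or 0 if CP is a surrogate
   or lies above 0x10FFFF. */
int
encode_utf8 (uint32_t cp, char *out)
{
  unsigned char *p = (unsigned char *) out;
  int n;

  assert (out != NULL);

  if (cp < 0x80)
    {
      p[0] = (unsigned char) cp;
      n = 1;
    }
  else if (cp < 0x800)
    {
      p[0] = (unsigned char) (0xC0 | (cp >> 6));
      p[1] = (unsigned char) (0x80 | (cp & 0x3F));
      n = 2;
    }
  else if (cp < 0x10000)
    {
      if (cp >= 0xD800 && cp <= 0xDFFF)
        n = 0;                  /* surrogates are not valid scalar values */
      else
        {
          p[0] = (unsigned char) (0xE0 | (cp >> 12));
          p[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
          p[2] = (unsigned char) (0x80 | (cp & 0x3F));
          n = 3;
        }
    }
  else if (cp <= 0x10FFFF)
    {
      /* This is the branch that matters for values above 0xFFFF. */
      p[0] = (unsigned char) (0xF0 | (cp >> 18));
      p[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      p[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      p[3] = (unsigned char) (0x80 | (cp & 0x3F));
      n = 4;
    }
  else
    n = 0;

  p[n] = '\0';                  /* always terminate the output */
  return n;
}
```

Taking an explicit unsigned 32-bit type sidesteps the signed wchar_t concern raised in the reply; whether out-of-range input should yield 0 or some replacement sequence is a design choice this sketch does not try to settle.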
Re: Questionable code behavior in u32cconv?
On 2/18/12 7:07 AM, John Kearney wrote:
> Configuration Information [Automatically generated, do not change]:
> Machine: x86_64
> OS: linux-gnu
> Compiler: gcc
> Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='x86_64'
> -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='x86_64-pc-linux-gnu'
> -DCONF_VENDOR='pc' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash'
> -DSHELL -DHAVE_CONFIG_H -I. -I../bash -I../bash/include
> -I../bash/lib -g -O2 -Wall
> uname output: Linux DETH00 3.0.0-15-generic #26-Ubuntu SMP Fri Jan 20
> 17:23:00 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
> Machine Type: x86_64-pc-linux-gnu
>
> Bash Version: 4.2
> Patch Level: 10
> Release Status: release
>
> Description:
> Now I may be misreading the code but it looks like the code relating
> to iconv is only checking the destination charset the first time, the
> code is executed.
>
> as such breaking the following functionality.
> LC_CTYPE=C printf '\uff'
> LC_CTYPE=C.UTF-8 printf '\uff'
>
> Repeat-By:
> haven't seen the problem.

I can't reproduce it, even using C, zh_CN, and en_US.UTF-8, but I agree that the static data should be reset when the locale, or at least LC_CTYPE, changes.

Chet
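One way to reset the static data when LC_CTYPE changes is to remember the value it was initialized for and compare on every call. The following is only a sketch of that idea, not what bash does: ctype_changed, cached_ctype, and cache_valid are hypothetical names, and the caller is assumed to redo its own setup (for example, reopening the iconv descriptor) whenever the function returns 1.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical cache key: the LC_CTYPE value the static conversion state
   was last initialized for. */
static char cached_ctype[64];
static int cache_valid = 0;

/* Returns 1 if CTYPE differs from the value seen on the previous call
   (or if this is the first call), meaning the caller should rebuild any
   charset-dependent static state.  Returns 0 if nothing changed. */
int
ctype_changed (const char *ctype)
{
  assert (ctype != NULL);

  if (cache_valid && strcmp (cached_ctype, ctype) == 0)
    return 0;

  /* Names longer than the buffer are truncated, so they never compare
     equal and simply force a re-initialization on every call. */
  strncpy (cached_ctype, ctype, sizeof (cached_ctype) - 1);
  cached_ctype[sizeof (cached_ctype) - 1] = '\0';
  cache_valid = 1;
  return 1;
}
```

With a check like this in front of the conversion code, the two printf invocations quoted above would each trigger a fresh initialization, because "C" and "C.UTF-8" compare unequal.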
Re: Can somebody explain to me what u32tochar in /lib/sh/unicode.c is trying to do?
On 2/19/12 5:07 PM, John Kearney wrote:
> Can somebody explain to me what u32tochar is trying to do?
>
> It seems like dangerous code?
>
> from the context i'm guessing it trying to make a hail mary pass at
> converting utf-32 to mb (not utf-8 mb)

Pretty much. It's a big-endian representation of a 32-bit integer as a character string. It's what you get when you don't have iconv or iconv fails and the locale isn't UTF-8. It may not be useful, but it's predictable. If we have a locale the system doesn't know about or can't translate, there's not a lot we can do.

Chet
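To make "a big-endian representation of a 32-bit integer as a character string" concrete, here is a small sketch. It is not the u32tochar source: the name u32_to_be_string is hypothetical, and emitting 1, 2, or 4 bytes depending on the magnitude of the value is an assumption about the fallback's shape.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration: store X in S as big-endian bytes, using
   1, 2, or 4 bytes depending on its magnitude, followed by a NUL.
   S must have room for 5 bytes.  Returns the number of value bytes. */
int
u32_to_be_string (uint32_t x, char *s)
{
  int len, i;

  assert (s != NULL);

  len = (x <= 0xFF) ? 1 : (x <= 0xFFFF) ? 2 : 4;

  /* Most significant byte first. */
  for (i = 0; i < len; i++)
    s[i] = (char) ((x >> (8 * (len - 1 - i))) & 0xFF);

  s[len] = '\0';
  return len;
}
```

The output is predictable, as the reply says, but a value such as 0x10000 produces embedded NUL bytes, so anything that treats the result as a C string will see it cut short.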
bug in stub_charset rollup diff of changes to unicode code.
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='x86_64' -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='x86_64-pc-linux-gnu' -DCONF_VENDOR='pc' -DLOCALEDIR='/usr/s
uname output: Linux DETH00 3.0.0-15-generic #26-Ubuntu SMP Fri Jan 20 17:23:00 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 4.2
Patch Level: 10
Release Status: release

Description:
stub_charset currently does:

    if locale == '\0'
        return ASCII
    else if locale=~m/.*\.(.*)(@.*)/
        return $1
    else if locale=UTF-8
        return UTF-8
    else
        return ASCII

It should be:

    if locale == '\0'
        return ASCII
    else if locale=~m/.*\.(.*)(@.*)/
        return $1
    else
        return locale

because its output is only used by iconv, so let iconv decide whether the locale makes sense.

I've attached a diff of all my changes to the unicode code, including:

    renamed u32cconv to utf32tomb
    moved the special handling of ASCII characters to the start of the function and removed the related call wrapper code
    tried to rationalize the code in utf32tomb so it's easier to read and understand what is happening
    added utf32toutf16
    use utf32toutf16 in the wchar_t=2 case with wctomb
    removed dangerous code that was using iconv_open (charset, "ASCII"); as a fallback; it was pointless anyway, as we already assign an ASCII value if possible
    added a warning message if the encode fails
    always terminate the mb output string

I haven't started to test these changes yet; first I would like to know whether these changes are acceptable. Any observations? I'm still reviewing it myself for consistency.

Plus, can somebody tell me how this was tested originally? I've got some ideas myself but would like to know what has already been done in that direction.

Repeat-By:
.
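The proposed stub_charset behavior in the Description above can be sketched in C roughly as follows. This is an illustration only, not the code from the attached diff: charset_from_locale and its caller-supplied buffer are hypothetical, and the "@modifier" suffix is treated as optional here, which is looser than the regex in the pseudocode.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the proposed logic: derive a charset name
   from an LC_CTYPE-style locale string and write it into BUF.
     ""  or NULL              -> "ASCII"
     "lang.CHARSET[@mod]"     -> "CHARSET"
     anything else            -> the locale string itself, so that iconv
                                 gets to decide whether it makes sense.
   Copies are bounded by BUFLEN; returns BUF. */
const char *
charset_from_locale (const char *locale, char *buf, size_t buflen)
{
  const char *dot, *at;
  size_t len;

  assert (buf != NULL && buflen > 0);

  if (locale == NULL || *locale == '\0')
    {
      snprintf (buf, buflen, "%s", "ASCII");
      return buf;
    }

  dot = strrchr (locale, '.');
  if (dot == NULL)
    {
      /* No charset part: hand the whole name through unchanged. */
      snprintf (buf, buflen, "%s", locale);
      return buf;
    }

  dot++;                        /* skip the '.' itself */
  at = strchr (dot, '@');       /* optional modifier, e.g. "@euro" */
  len = at ? (size_t) (at - dot) : strlen (dot);
  snprintf (buf, buflen, "%.*s", (int) len, dot);
  return buf;
}
```

Writing into a caller-supplied buffer with a bounded copy avoids both modifying the string returned by the environment lookup and overflowing a fixed-size static array when the locale name is long.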
Fix:
diff --git a/builtins/printf.def b/builtins/printf.def
index 9eca215..3680419 100644
--- a/builtins/printf.def
+++ b/builtins/printf.def
@@ -859,15 +859,9 @@ tescape (estart, cp, lenp, sawc)
 	      *cp = '\\';
 	      return 0;
 	    }
-	  if (uvalue <= UCHAR_MAX)
-	    *cp = uvalue;
-	  else
-	    {
-	      temp = u32cconv (uvalue, cp);
-	      cp[temp] = '\0';
-	      if (lenp)
-	        *lenp = temp;
-	    }
+	  temp = utf32tomb (uvalue, cp);
+	  if (lenp)
+	    *lenp = temp;
 	  break;
 #endif
diff --git a/externs.h b/externs.h
index 09244fa..8868b55 100644
--- a/externs.h
+++ b/externs.h
@@ -460,7 +460,7 @@ extern unsigned int falarm __P((unsigned int, unsigned int));
 extern unsigned int fsleep __P((unsigned int, unsigned int));

 /* declarations for functions defined in lib/sh/unicode.c */
-extern int u32cconv __P((unsigned long, char *));
+extern int utf32tomb __P((unsigned long, char *));

 /* declarations for functions defined in lib/sh/winsize.c */
 extern void get_new_window_size __P((int, int *, int *));
diff --git a/lib/sh/strtrans.c b/lib/sh/strtrans.c
index 2265782..495d9c4 100644
--- a/lib/sh/strtrans.c
+++ b/lib/sh/strtrans.c
@@ -144,16 +144,10 @@ ansicstr (string, len, flags, sawc, rlen)
 		  *r++ = '\\';	/* c remains unchanged */
 		  break;
 		}
-	      else if (v <= UCHAR_MAX)
-		{
-		  c = v;
-		  break;
-		}
 	      else
 		{
-		  temp = u32cconv (v, r);
-		  r += temp;
-		  continue;
+		  r += utf32tomb (v, r);
+		  break;
 		}
 #endif
 	    case '\\':
diff --git a/lib/sh/unicode.c b/lib/sh/unicode.c
index d34fa08..9a557a9 100644
--- a/lib/sh/unicode.c
+++ b/lib/sh/unicode.c
@@ -36,14 +36,6 @@
 #include
-#ifndef USHORT_MAX
-#  ifdef USHRT_MAX
-#define USHORT_MAX USHRT_MAX
-#  else
-#define USHORT_MAX ((unsigned short) ~(unsigned short)0)
-#  endif
-#endif
-
 #if !defined (STREQ)
 #  define STREQ(a, b) ((a)[0] == (b)[0] && strcmp ((a), (b)) == 0)
 #endif /* !STREQ */
@@ -54,13 +46,14 @@ extern const char *locale_charset __P((void));
 extern char *get_locale_var __P((char *));
 #endif

-static int u32init = 0;
+const char *charset;
 static int utf8locale = 0;
 #if defined (HAVE_ICONV)
 static iconv_t localconv;
 #endif

 #ifndef HAVE_LOCALE_CHARSET
+static char CType[40]={0};
 static char *
 stub_charset ()
 {
@@ -69,6 +62,7 @@ stub_charset ()
   locale = get_locale_var ("LC_CTYPE");
   if (locale == 0 || *locale == 0)
     return "ASCII";
+  strcpy(CType, locale);
   s = strrchr (locale, '.');
   if (s)
     {
@@ -77,159 +71,230 @@ stub_charset ()
 	*t = 0;
       return ++s;
     }
-  else if (STREQ (locale, "UTF-8"))
-    return "UTF-8";
   else
-    return "ASCII";
+    return CType;
 }
 #endif

-/* u32toascii ? */
 int
-u32tochar (wc, s)
-     wchar_t wc;
+utf32_2_utf8 (c, s)
+     unsigned lon