Re: Process group id of first command in command substitution (bash4 vs bash3)
On 02/20/2012 10:57 PM, Chet Ramey wrote:
>> I'm not sure if it's a bug or not, but there is a change between old
>> bash 3.2 and bash 4.2. When you run a script:
>>
>>   set -m
>>   $(sleep 1; sleep 2)
>>
>> in bash 4.2 the first sleep has the same group id as the parent shell.
>> However, in bash 3.2 it has a different group id.
>>
>> Is it a bug or not? I'm not able to find documentation for this change.
>> And it seems that POSIX says nothing about it.
>
> How could this possibly matter?

It could matter when sending a signal to (or receiving one in) a process group.

RR
Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.
Eric Blake wrote:
> Not only can wchar_t be either signed or unsigned, you also have to
> worry about platforms where it is only 16 bits, such as cygwin; on the
> other hand, wint_t is always 32 bits, but you still have the issue that
> it can be either signed or unsigned.

What platform uses unsigned wide ints? Is that even posix compat?
Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.
On 02/22/2012 05:19 AM, Linda Walsh wrote:
> Eric Blake wrote:
>> Not only can wchar_t be either signed or unsigned, you also have to
>> worry about platforms where it is only 16 bits, such as cygwin; on the
>> other hand, wint_t is always 32 bits, but you still have the issue that
>> it can be either signed or unsigned.
>
> What platform uses unsigned wide ints? Is that even posix compat?

Yes, it is posix compatible to have wint_t be unsigned. Not only that, but both glibc (32-bit wchar_t) and cygwin (16-bit wchar_t) use a 32-bit unsigned int for wint_t. Any code that expects WEOF to be less than 0 is broken.

--
Eric Blake   ebl...@redhat.com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.
On 02/22/2012 01:59 PM, Eric Blake wrote:
> On 02/22/2012 05:19 AM, Linda Walsh wrote:
>> Eric Blake wrote:
>>> Not only can wchar_t be either signed or unsigned, you also have to
>>> worry about platforms where it is only 16 bits, such as cygwin; on the
>>> other hand, wint_t is always 32 bits, but you still have the issue that
>>> it can be either signed or unsigned.
>>
>> What platform uses unsigned wide ints? Is that even posix compat?
>
> Yes, it is posix compatible to have wint_t be unsigned. Not only that,
> but both glibc (32-bit wchar_t) and cygwin (16-bit wchar_t) use a 32-bit
> unsigned int for wint_t. Any code that expects WEOF to be less than 0
> is broken.

But if what you want is a uint32, use a uint32_t ;)
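To illustrate the uint32_t suggestion against the subject of this thread, here is a minimal sketch of a UTF-8 encoder that takes the code point as a uint32_t, so neither the width nor the signedness of wchar_t or wint_t matters. The name encode_utf8 and its return convention are assumptions made for illustration; this is not the u32toutf8 code from bash's unicode.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not bash's u32toutf8): encode one Unicode code
   point as UTF-8. Writes at most 4 bytes to out and returns the number
   of bytes written, or 0 if cp is a surrogate or is above 0x10FFFF. */
static size_t encode_utf8(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogate halves are not characters */
        return 0;
    if (cp < 0x10000) {                   /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes: values above 0xFFFF */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* outside the Unicode range */
}
```

With this sketch, U+20AC should encode as the three bytes E2 82 AC and U+1F600 as the four bytes F0 9F 98 80; the last branch is the case the thread subject is about.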
Re: printf "%q" "~" not escaped?
On 2/22/12 2:47 AM, John Kearney wrote:
> Bash Version: 4.2
> Patch Level: 10
> Release Status: release
>
> Description:
> printf "%q" "~" not escaped?
>
> which means that this
>   eval echo $(printf "%q" "~")
> results in your home path, not a ~,
> unlike
>   eval echo $(printf "%q" "*")
>
> As far as I can see, it's the only character that isn't treated as I
> expected.

Thanks for the report. This will be fixed in the next bash release.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
Re: Initial test code for \U
On 2/21/12 5:07 PM, John Kearney wrote:
> Initial code for testing \u functionality.

Thanks; this is really good work. In the limited testing I've done, ja_JP.SHIFT_JIS is rare and C.UTF-8 doesn't exist anywhere. ja_JP.SJIS is a somewhat less rare substitute for the former, and en_US.UTF-8 seems to perform acceptably for the latter.

Chet
Re: Initial test code for \U
On 02/22/2012 12:55 PM, Chet Ramey wrote:
> On 2/21/12 5:07 PM, John Kearney wrote:
>> Initial code for testing \u functionality.
>
> Thanks; this is really good work. In the limited testing I've done,
> ja_JP.SHIFT_JIS is rare and C.UTF-8 doesn't exist anywhere.

C.UTF-8 exists on Cygwin. But you are correct that...

> en_US.UTF-8 seems
> to perform acceptably for the latter.
Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.
Eric Blake wrote:
> On 02/22/2012 05:19 AM, Linda Walsh wrote:
>> Eric Blake wrote:
>>> Not only can wchar_t be either signed or unsigned, you also have to
>>> worry about platforms where it is only 16 bits, such as cygwin; on the
>>> other hand, wint_t is always 32 bits, but you still have the issue that
>>> it can be either signed or unsigned.
>>
>> What platform uses unsigned wide ints? Is that even posix compat?
>
> Yes, it is posix compatible to have wint_t be unsigned. Not only that,
> but both glibc (32-bit wchar_t) and cygwin (16-bit wchar_t) use a 32-bit
> unsigned int for wint_t. Any code that expects WEOF to be less than 0
> is broken.

I never had any question that wchar_t could be signed or unsigned.

My question had to do with an unqualified wint_t, not an unsigned wint_t, and what platform existed where an 'int' type, or wide-int_t, was unsigned without qualifiers. I still would like to know -- and posix allows int/wide-ints to be unsigned without the unsigned keyword? That seems very confusing.
Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.
On 02/22/2012 03:01 PM, Linda Walsh wrote:
> My question had to do with an unqualified wint_t not
> unsigned wint_t and what platform existed where an 'int' type or
> wide-int_t, was, without qualifiers, unsigned. I still would like
> to know -- and posix allows int/wide-ints to be unsigned without
> the unsigned keyword?

'int' is signed, and at least 16 bits (these days, it's usually 32). It can also be written 'signed int'.

'unsigned int' is unsigned, and at least 16 bits (these days, it's usually 32).

'wchar_t' is an arbitrary integral type, either signed or unsigned, and capable of holding the value of all valid wide characters. It is possible to define a system where wchar_t and char are identical (limiting yourself to 256 valid characters), but that is not done in practice. More common are platforms that use 65536 characters (only the basic plane of Unicode) for 16 bits, or full Unicode (0 to 0x10FFFF) for 32 bits. Platforms that use 65536 characters and 16-bit wchar_t must have wchar_t be unsigned; whereas platforms that have wchar_t wider than the largest valid character can choose signed or unsigned with no impact.

'wint_t' is an arbitrary integral type, either signed or unsigned, at least as wide as wchar_t, and capable of holding the value of all valid wide characters and the sentinel WEOF. Like wchar_t, it may hold values that are neither WEOF nor valid characters; in fact, it is more likely to do so, since either wchar_t is saturated (all bit values are valid characters) and thus wint_t is a wider type, or wchar_t is sparse (as is the case with 32-bit wchar_t encoding Unicode), and the addition of WEOF to the set does not plug in the remaining sparse values. Using such values has unspecified results on any interface that takes a wint_t. WEOF only has to be distinct; it does not have to be negative.

Don't think of it as 'wide-int'; rather, think of it as 'the integral type that contains both wchar_t and WEOF'. You cannot write 'signed wint_t' nor 'unsigned wint_t'.
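The properties described above can be checked on a given platform. The following is only a sketch: the names TYPE_IS_SIGNED, wint_is_signed, wchar_is_signed and check_wide_type_invariants are made up for illustration, and the "at least as wide as wchar_t" check uses sizeof as a stand-in for the guarantee stated in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <wchar.h>

/* Hypothetical helper macro: true if integer type T is signed here.
   Casting -1 to an unsigned type yields its maximum value, which is
   never less than zero. */
#define TYPE_IS_SIGNED(T) ((T)-1 < (T)0)

/* Platform-dependent answers: portable code should not rely on either. */
static bool wint_is_signed(void)  { return TYPE_IS_SIGNED(wint_t); }
static bool wchar_is_signed(void) { return TYPE_IS_SIGNED(wchar_t); }

/* Checks the relationships the message describes, not any signedness. */
static void check_wide_type_invariants(void)
{
    /* wint_t is expected to be at least as wide as wchar_t. */
    assert(sizeof(wint_t) >= sizeof(wchar_t));
    /* WEOF only has to be distinct from every valid character. */
    assert((wint_t)L'a' != WEOF);
    assert((wint_t)L'\0' != WEOF);
}
```

Calling check_wide_type_invariants() should pass on any conforming platform, while wint_is_signed() and wchar_is_signed() may return different results on different systems.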
Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.
^ Caveat: you can represent the full range up to 0x10FFFF in UTF-16; you just need 2 UTF-16 code units. Check out the latest version of unicode.c for an example of how.

On 02/22/2012 11:32 PM, Eric Blake wrote:
> On 02/22/2012 03:01 PM, Linda Walsh wrote:
>> My question had to do with an unqualified wint_t not
>> unsigned wint_t and what platform existed where an 'int' type or
>> wide-int_t, was, without qualifiers, unsigned. I still would like
>> to know -- and posix allows int/wide-ints to be unsigned without
>> the unsigned keyword?
>
> 'int' is signed, and at least 16 bits (these days, it's usually 32). It
> can also be written 'signed int'.
>
> 'unsigned int' is unsigned, and at least 16 bits (these days, it's
> usually 32).
>
> 'wchar_t' is an arbitrary integral type, either signed or unsigned, and
> capable of holding the value of all valid wide characters. It is
> possible to define a system where wchar_t and char are identical
> (limiting yourself to 256 valid characters), but that is not done in
> practice. More common are platforms that use 65536 characters (only the
> basic plane of Unicode) for 16 bits, or full Unicode (0 to 0x10FFFF) for
> 32 bits. Platforms that use 65536 characters and 16-bit wchar_t must
> have wchar_t be unsigned; whereas platforms that have wchar_t wider than
> the largest valid character can choose signed or unsigned with no impact.
>
> 'wint_t' is an arbitrary integral type, either signed or unsigned, at
> least as wide as wchar_t, and capable of holding the value of all valid
> wide characters and the sentinel WEOF. Like wchar_t, it may hold values
> that are neither WEOF nor valid characters; in fact, it is more
> likely to do so, since either wchar_t is saturated (all bit values are
> valid characters) and thus wint_t is a wider type, or wchar_t is sparse
> (as is the case with 32-bit wchar_t encoding Unicode), and the addition
> of WEOF to the set does not plug in the remaining sparse values. Using
> such values has unspecified results on any interface that takes a
> wint_t. WEOF only has to be distinct; it does not have to be negative.
>
> Don't think of it as 'wide-int'; rather, think of it as 'the integral
> type that contains both wchar_t and WEOF'. You cannot write 'signed
> wint_t' nor 'unsigned wint_t'.
Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.
On 02/22/2012 07:43 PM, John Kearney wrote:
> ^ Caveat: you can represent the full range up to 0x10FFFF in UTF-16;
> you just need 2 UTF-16 code units. Check out the latest version of
> unicode.c for an example of how.

Yes, and Cygwin actually does this.

A strict reading of POSIX states that wchar_t must be wide enough for all supported characters, technically limiting things to just the basic plane if you have 16-bit wchar_t and a POSIX-compliant app. But cygwin has exploited a loophole in the POSIX wording: POSIX does not require that all bit patterns are valid characters. So on paper, rather than representing all 65536 patterns as valid characters, the Cygwin implementation lists the values used in surrogate halves (0xd800 to 0xdfff) as non-characters, so using them triggers undefined behavior per POSIX. In practice, using them treats them as surrogate pairs. That gives the full Unicode character set, but it reintroduces for wchar_t the headaches that multibyte characters had with 'char': you are back to dealing with variable-sized character elements.

Furthermore, the mess of 16-bit vs. 32-bit wchar_t is one of the reasons why C11 has introduced two new character types, 16-bit and 32-bit characters, designed to map to the full Unicode set regardless of what size wchar_t is. It will be interesting to see how the next version of POSIX takes the additions of C11 and retrofits the other wide-character functions that are in POSIX but not C99 to handle the new character types.
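For reference, the surrogate-pair arithmetic mentioned above can be sketched as follows. The function names to_surrogates and from_surrogates are hypothetical; this shows the standard UTF-16 mapping and is not taken from Cygwin or from bash's unicode.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: split a code point above the basic plane
   (0x10000..0x10FFFF) into a UTF-16 high/low surrogate pair. */
static void to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    cp -= 0x10000;                            /* leaves a 20-bit value */
    *hi = (uint16_t)(0xD800 + (cp >> 10));    /* top 10 bits */
    *lo = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* bottom 10 bits */
}

/* Hypothetical helper: recombine a valid surrogate pair. */
static uint32_t from_surrogates(uint16_t hi, uint16_t lo)
{
    assert(hi >= 0xD800 && hi <= 0xDBFF);
    assert(lo >= 0xDC00 && lo <= 0xDFFF);
    return 0x10000 + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
}
```

Under this mapping U+1F600 should become the pair 0xD83D, 0xDE00, which is why a 16-bit wchar_t needs two elements for a single character outside the basic plane.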
Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.
Eric Blake wrote:
> Don't think of it as 'wide-int'; rather, think of it as 'the integral
> type that contains both wchar_t and WEOF'. You cannot write 'signed
> wint_t' nor 'unsigned wint_t'.

---
?? You say don't think of it that way, but unless I missed something, just like wchar stood for 'wide char' (and chars have always been signed or unsigned, separate from short ints/unsigned short), the term 'wint' would have come from wide int. But ints have never been unsigned unless specifically prefixed as such... so wints shouldn't have the ambiguity that chars have.

It may very well exist as unsigned somewhere -- but the implementer should be chained to a 1960's card punch and forced to write in cobol.

You still haven't mentioned anyplace where wint_t is an unsigned value. Is this a hypothetical issue? I.e. in theory it could be unsigned, but in practice no one has ever made it so? If so, it might be a good time to shoot that idea in the foot. (or something like that...)...
Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.
And on the up side, if they do ever give in and allow registration of family name characters, we may get a wchar_t, schar_t, lwchar_t and a llwchar_t :) Just imagine a variable-length 64-bit char system.

Everything from Sumerian to Klingon in Unicode, though I think they already are, though not officially, or are being done.

Oh god, what I really want now is bash in Klingon. :)) Just imagine a black background and glaring green text. I know what I'm doing tonight.

Check out (shakes head in disbelief, while chuckling):

Ubuntu Klingon Translators
https://launchpad.net/~ubuntu-l10n-tlh

Expansion: Ubuntu Font should support pIqaD (Klingon)
https://bugs.launchpad.net/ubuntu/+source/ubuntu-font-family-sources/+bug/650729

On 02/23/2012 04:54 AM, Eric Blake wrote:
> On 02/22/2012 07:43 PM, John Kearney wrote:
>> ^ Caveat: you can represent the full range up to 0x10FFFF in UTF-16;
>> you just need 2 UTF-16 code units. Check out the latest version of
>> unicode.c for an example of how.
>
> Yes, and Cygwin actually does this.
>
> A strict reading of POSIX states that wchar_t must be wide enough
> for all supported characters, technically limiting things to just
> the basic plane if you have 16-bit wchar_t and a POSIX-compliant
> app. But cygwin has exploited a loophole in the POSIX wording:
> POSIX does not require that all bit patterns are valid characters.
> So on paper, rather than representing all 65536 patterns as valid
> characters, the Cygwin implementation lists the values used in
> surrogate halves (0xd800 to 0xdfff) as non-characters, so using
> them triggers undefined behavior per POSIX. In practice, using them
> treats them as surrogate pairs. That gives the full Unicode
> character set, but it reintroduces for wchar_t the headaches that
> multibyte characters had with 'char': you are back to dealing with
> variable-sized character elements.
>
> Furthermore, the mess of 16-bit vs. 32-bit wchar_t is one of the
> reasons why C11 has introduced two new character types, 16-bit and
> 32-bit characters, designed to map to the full Unicode set
> regardless of what size wchar_t is. It will be interesting to see
> how the next version of POSIX takes the additions of C11 and
> retrofits the other wide-character functions that are in POSIX but
> not C99 to handle the new character types.
Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.
On 02/22/2012 10:02 PM, Linda Walsh wrote:
> Eric Blake wrote:
>> Don't think of it as 'wide-int'; rather, think of it as 'the integral
>> type that contains both wchar_t and WEOF'. You cannot write 'signed
>> wint_t' nor 'unsigned wint_t'.
>
> ---
> ?? You say don't think of it that way, but unless I missed something,
> just like wchar stood for 'wide char' (and chars have always been
> signed or unsigned, separate from short ints/unsigned short), the
> term 'wint' would have come from wide int. But ints have never been
> unsigned unless specifically prefixed as such... so wints shouldn't
> have the ambiguity that chars have.
>
> It may very well exist as unsigned somewhere -- but the implementer
> should be chained to a 1960's card punch and forced to write in cobol.
>
> You still haven't mentioned anyplace where wint_t is an unsigned
> value.

Yes, I have:
https://lists.gnu.org/archive/html/bug-bash/2012-02/msg00070.html

"both glibc (32-bit wchar_t) and cygwin (16-bit wchar_t) use a 32-bit unsigned int for wint_t."

$ printf '#include <wchar.h>\n' | gcc -E - | grep wint_t | head -n1
typedef unsigned int wint_t;

> Is this a hypothetical issue? I.e. in theory it could
> be unsigned, but in practice no one has ever made it so?

No, it is not hypothetical. It is real. wint_t can be either signed or unsigned, and portable code cannot assume either.
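In practical terms, "cannot assume" means an end-of-input test has to compare against WEOF itself rather than test for a negative value. A minimal sketch, using a made-up function name count_wide_chars:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: count the wide characters remaining on fp.
   The loop compares against WEOF directly. A test such as (c < 0)
   would never be true where wint_t is unsigned, so the loop would
   not terminate at end of input. */
static size_t count_wide_chars(FILE *fp)
{
    assert(fp != NULL);

    size_t n = 0;
    wint_t c;   /* wint_t, not wchar_t, so that WEOF is representable */

    while ((c = fgetwc(fp)) != WEOF)
        n++;
    return n;
}
```

Note that fgetwc also returns WEOF on a read or conversion error, so a caller that needs to tell the two apart would check ferror(fp) after the loop.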