Re: accents

2011-05-16 Thread Andreas Schwab
Chet Ramey writes:

> That's a non sequitur.  My point is that, as I read it, UTF-8 requires the
> use of the shortest sequence that can represent a particular character.

The character is U+0301, and the shortest sequence is 0xcc 0x81.
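
To make the byte values concrete, here is a minimal Python sketch (not
from the thread; the helper name is_valid_utf8 is made up for
illustration).  It shows that the shortest-form rule concerns how a
single code point is turned into bytes: U+0301 encodes as 0xcc 0x81, and
a hand-built three-byte "overlong" form of the same code point is
rejected by a conforming decoder.

```python
# U+0301 COMBINING ACUTE ACCENT needs two bytes in UTF-8.
combining_acute = "\u0301"
shortest = combining_acute.encode("utf-8")
print(shortest.hex(" "))  # cc 81

# Hand-built three-byte (overlong) encoding of the same code point.
# It carries the same bits but is not the shortest form, so it is
# not valid UTF-8.
overlong = bytes([0xE0, 0x8C, 0x81])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(shortest))  # True
print(is_valid_utf8(overlong))  # False
```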

> In this case, that means that U+00E9 must be used to represent e with
> acute instead of e plus U+0301.

Those are two different Unicode character sequences.

> The point is that the utf-8 encodings of precomposed and decomposed
> unicode are different

Of course, they are different characters.  Encoding and normalisation
are different concepts: the former is about the physical representation
of Unicode, the latter is about conversion between abstract Unicode values.
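
As an illustration only (a Python sketch, not something posted in the
thread; the variable names are made up), the two layers can be seen
separately.  Normalisation maps between the precomposed and decomposed
code point sequences, while UTF-8 just serialises whichever sequence it
is handed.

```python
import unicodedata

precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

# Different abstract sequences, so the strings compare unequal.
print(precomposed == decomposed)  # False

# Encoding layer: each sequence has its own valid shortest-form UTF-8 bytes.
print(precomposed.encode("utf-8").hex(" "))  # c3 a9
print(decomposed.encode("utf-8").hex(" "))   # 65 cc 81

# Normalisation layer: NFC composes, NFD decomposes.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
print(nfc == precomposed)  # True
print(nfd == decomposed)   # True
```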

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



Re: accents

2011-05-16 Thread Chet Ramey
On 5/9/11 10:46 AM, Thomas De Contes wrote:
> Configuration Information [Automatically generated, do not change]:
> Machine: i386
> OS: darwin10.7.0
> Compiler: /usr/bin/gcc-4.2
> Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='i386' 
> -DCONF_OSTYPE='darwin10.7.0' -DCONF_MACHTYPE='i386-apple-darwin10.7.0' 
> -DCONF_VENDOR='apple' 
> -DLOCALEDIR='/Users/thomas/Administration-ordinateur/autoinstall/macports/share/locale'
>  -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H -DMACOSX   -I.  -I. -I./include 
> -I./lib  
> -I/Users/thomas/Administration-ordinateur/autoinstall/macports/include -pipe 
> -O2 -arch x86_64
> uname output: Darwin tDeContes-fixe.local 10.7.0 Darwin Kernel Version 
> 10.7.0: Sat Jan 29 15:17:16 PST 2011; root:xnu-1504.9.37~1/RELEASE_I386 i386
> Machine Type: i386-apple-darwin10.7.0
> 
> Bash Version: 4.2
> Patch Level: 8
> Release Status: release
> 
> 
> Description:
> 
> 1
> when i do
> PS1="&# $PS1"
> then I have problems since there are some accents in my command lines:
> http://cjoint.com/data1/1dweGsOZD6M.htm
> http://cjoint.com/data1/1dweHvDnu8H.htm
> http://cjoint.com/data1/1dweHSHlKQq.htm
> http://cjoint.com/data1/1dweIqnfKNe.htm
> http://cjoint.com/data1/1dweINbOCMJ.htm
> http://cjoint.com/data1/1dweJgK8K6m.htm
> http://cjoint.com/data1/1dweJGszDSY.htm
> http://cjoint.com/data1/1dweLRlowWS.htm
> 
> i heard that you had the problem without having to do
> PS1="&# $PS1"
> and that you corrected it in that case
> 
> 2
> without doing
> PS1="&# $PS1"
> I don't have a problem when there is just one accent on a short line, but I 
> have a lot of them when the command line is wider than the terminal and is 
> displayed on several lines in the terminal
> 
> Do you need screen captures too ?
> 
> 
> Repeat-By:
> 
> make a file and give it a name containing an accent
> 
> 1
> - execute
> PS1="&# $PS1"
> - drag & drop the file with the accent
> - use the up and down arrow keys to move through the history:
> each time you move onto the line containing an accent, it eats one character
> 
> 2
> - type "enter" several times, to get a lot of lines printed on the terminal
> - drag & drop the file with the accent several times, until it makes a 
> command line too long to be written on only one line (the command line should 
> be larger than the terminal) :
> it should already begin to have a strange behavior, at this time (it doesn't 
> continue on the next line)
> - use the up and down arrow keys to move through the history:
> each time you move onto the line containing an accent, it eats not only 
> characters, but lines

These are both the result of a bug in Mac OS X.  Its implementation of
wcwidth doesn't return correct results for Unicode and UTF-8 combining
characters.  Take a look at my messages from yesterday for more details.
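
To show why the width of a combining character matters for cursor
positioning, here is a rough Python sketch.  It is an approximation
under stated assumptions, not the libc wcwidth and not readline's code:
the functions char_width and display_width are hypothetical, and the
rules (zero columns for combining marks, two for East Asian wide
characters, one otherwise) are deliberately simplified.

```python
import unicodedata


def char_width(ch: str) -> int:
    """Approximate column width of one character (simplified rules)."""
    # Combining marks attach to the previous character: zero columns.
    if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    # East Asian wide and fullwidth characters take two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def display_width(text: str) -> int:
    """Sum of per-character widths, the way a line editor might count."""
    return sum(char_width(ch) for ch in text)


decomposed = "e\u0301"   # e followed by U+0301, as in OS X file names
precomposed = "\u00e9"

print(len(decomposed), display_width(decomposed))    # 2 1
print(len(precomposed), display_width(precomposed))  # 1 1
```

If the width function reported 1 for U+0301, the editor would count two
columns for a glyph that occupies one, and its idea of the cursor
position would drift by one column per accent, which matches the
symptoms described in the report.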

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/