Bash and Unicode
Hi,

I don't know where to ask this or where to report a bug about it.

I found an issue with Unicode (Polish) characters when using figlet:

$ figlet ó
/\/|
|/\/|__ /
/_\ |_ \
/ _ \___/
/_/ \_\

ó is 243. I can call ord(u'ó') from Python, but not from bash:

$ python -c 'print ord(u"ó")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: ord() expected a character, but string of length 2 found

figlet displays 2 characters, and I found that they are 195 and 179, because on my system I get:

$ echo -n ó | python -c 'import sys;
for c in sys.stdin.read():
    print ord(c)'
195
179

I thought it had something to do with bash, so I tried calling the exec function from C (I think it doesn't use the shell):

#include <unistd.h>

int main(void) {
    execl("/usr/bin/figlet-figlet", "", "ó", NULL);
    return 1;  /* reached only if execl fails */
}

and got the same result. What's wrong? Does anybody know how to fix it? Where should I report this? Or maybe there is something wrong with my system.

I'm using Ubuntu 11.10 with Xfce.

--
Jakub Jankiewicz, Web Developer
http://jcubic.pl
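The numbers in the message above can be reproduced with a minimal sketch. It is written for Python 3 (the thread uses Python 2), and the helper names `code_points` and `byte_values` are made up for illustration. It shows that ó is one character with code point 243, while its UTF-8 encoding is the two bytes 195 and 179 that the byte-by-byte loop printed.

```python
# Python 3 sketch; helper names are hypothetical, not from the thread.
CHAR = "\u00f3"  # ó, written as an escape so the file encoding does not matter


def code_points(text):
    """Return the Unicode code point of each character in the text."""
    return [ord(ch) for ch in text]


def byte_values(text, encoding="utf-8"):
    """Return the integer value of each byte of the encoded text."""
    return list(text.encode(encoding))


print(code_points(CHAR))  # [243]
print(byte_values(CHAR))  # [195, 179]
```

Iterating over undecoded input, as the Python 2 one-liner does, walks over the bytes, which is why two values come out for a single letter.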
Re: Bash and Unicode
On Thu, 20 Sep 2012 19:23:02 -0400 DJ Mills wrote:

> On Thu, Sep 20, 2012 at 6:00 PM, Jakub Jankiewicz wrote:
> > [...]
>
> This would be an issue with the figlet program, which has nothing to
> do with bash.
>
> The man page (http://linux.die.net/man/6/figlet) gives the email
> address i...@figlet.org

I wanted to send this to that list first, but then I found that I can't run this from bash:

$ python -c 'print ord(u"ó")'

but I can from Python:

$ python
>>> print ord(u"ó")

How come Python gets 2 characters in the first case and one in the second? What is the difference between how bash handles this one letter and how Python does? Maybe it has something to do with readline, because Python does the same thing figlet does: it handles the characters that are passed to it.
Maybe the Python interpreter does something to work around an issue that affects the whole system; maybe it's because Python uses a different library or doesn't use readline. Or maybe it's just a configuration issue with my Ubuntu installation, or with Ubuntu itself. It's definitely not related to figlet, but I'm not sure whether it's related to bash either. I'm asking here because you know how bash works and can point me in the right direction, and figlet is not the right one.

--
Jakub Jankiewicz, Web Developer
http://jcubic.pl
Re: Bash and Unicode
On Fri, 21 Sep 2012 09:11:45 +0200 Andreas Schwab wrote:

> Your keyboard input apparently uses UTF-8 encoding, so the single
> character ó is represented by two bytes with the values 195 and 179.
> This has nothing at all to do with the shell, most likely your locale
> settings are messed up.
>
> Andreas.

Does that mean Unicode works normally with bash, and that it displays one character when everything is set up right? Does this code work on other systems?

python -c 'print "ó"'

I could report it to Ubuntu, but maybe it's something larger or smaller. Maybe it's a bug that no one has noticed, or just a misconfiguration. My locale is pl_PL.UTF-8, which I didn't change, and everything works except bash. I have the same issue on Red Hat 4.4.6-3 over ssh, but maybe it has to do with how the keyboard is handled (Python run inside ssh works the same, but not from the command line).

How does bash handle input? It uses readline, right? So the next thing I should check is there, right? Do you know which steps and which libraries/code are executed between the moment the user hits a key and the moment the character is processed by bash? Then I could trace back and check each step. Linux driver, readline, bash: are there more?

--
Jakub Jankiewicz, Web Developer
http://jcubic.pl
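One way to check the locale side of this from inside Python is sketched below. It assumes Python 3, and the function name `describe_encodings` is hypothetical. It only reports which encodings the interpreter derives from the environment; it does not show what bash or readline do with the input.

```python
import locale
import sys


def describe_encodings():
    """Return the encodings Python derives from the current environment."""
    return {
        # Encoding implied by the locale settings (e.g. LANG, LC_ALL).
        "locale": locale.getpreferredencoding(False),
        # Encodings attached to the standard streams; may be None if a
        # stream has been replaced by an object without an encoding.
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
    }


for name, encoding in describe_encodings().items():
    print(name, encoding)
```

With a locale such as pl_PL.UTF-8 one would expect UTF-8 to show up here; a different value would point at the locale configuration.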
Re: Bash and Unicode
On Fri, 21 Sep 2012 11:46:26 +0200 Andreas Schwab wrote:

> Did you read <http://www.python.org/peps/pep-0263.html>?
>
> Andreas.

I forgot about it. This works:

python -c '# coding: utf-8
print ord(u"ó")'

Interactive Python probably has it enabled by default.

I also tested with sed whether I can replace one character, to check that it isn't being handled as 2 characters:

echo -n ó | sed -e 's/[ó]\{1\}/_/'

and it isn't. The output is a single "_".

So thanks for your help, and sorry for the trouble.

--
Jakub Jankiewicz, Web Developer
http://jcubic.pl
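The effect of the PEP 263 coding line can be sketched as follows. This is a Python 3 illustration, not the Python 2 behaviour from the thread: Python 3 already assumes UTF-8 source, so the sketch compiles the same source bytes twice with an explicit declaration to make the difference visible. Using Latin-1 as the "wrong" encoding is an assumption chosen because it reproduces the "string of length 2" symptom; the function name `literal_length` is hypothetical.

```python
def literal_length(declared_encoding):
    """Compile source containing the UTF-8 bytes of ó under the given
    coding declaration and return the length of the resulting string."""
    source = (
        b"# coding: " + declared_encoding.encode("ascii") + b"\n"
        b"result = len('\xc3\xb3')\n"  # 0xC3 0xB3 are the UTF-8 bytes of ó
    )
    namespace = {}
    # compile() honours the coding declaration when given bytes.
    exec(compile(source, "<demo>", "exec"), namespace)
    return namespace["result"]


print(literal_length("latin-1"))  # 2: each byte becomes its own character
print(literal_length("utf-8"))    # 1: the two bytes decode to one character
```

The same two bytes arrive in both cases; only the declared encoding decides whether they are read as one character or two.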