Bash and Unicode

2012-09-20 Thread Jakub Jankiewicz
Hi,

I don't know where to ask about this or where to report a bug.

I found an issue with Unicode characters (Polish) when using figlet:

$ figlet ó

  /\/|
 |/\/|__ /
  /_\ |_ \
 / _ \___/
/_/ \_\

ó is code point 243. I can call ord(u'ó') from Python, but not from bash:

$ python -c 'print ord(u"ó")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: ord() expected a character, but string of length 2 found

So it sees 2 characters. I found that they are:

195+179

which I checked by running:

$ echo -n ó | python -c 'import sys
for c in sys.stdin.read():
    print ord(c)'
195
179

on my system. I thought it was something to do with bash, so I tried
calling an exec function from C (which I think doesn't use the shell):

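#include <unistd.h>

int main(void) {
    /* execl replaces this process directly; no shell is involved. */
    execl("/usr/bin/figlet-figlet", "figlet", "ó", (char *)NULL);
    return 1; /* reached only if execl fails */
}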

and got the same result. What's wrong? Does anybody know how to fix it?
Where should I report this? Or maybe something is wrong with my
system.
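
The C experiment above can also be sketched in Python 3 (not the Python 2 used elsewhere in this thread). This is only an illustration under stated assumptions: it assumes a POSIX system, and the helper name argument_bytes and the child one-liner are hypothetical. It starts a child process without any shell and reports the raw bytes the child received as its argument.

```python
import subprocess
import sys

# Child program: print the raw bytes of its first argument.
# os.fsencode() reverses the decoding Python applied to sys.argv.
CHILD = "import os, sys; print(list(os.fsencode(sys.argv[1])))"


def argument_bytes(text):
    """Run a child without a shell and return the byte list it reports."""
    result = subprocess.run(
        # The argument is passed as explicit UTF-8 bytes (POSIX only).
        [sys.executable, "-c", CHILD, text.encode("utf-8")],
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").strip()


print(argument_bytes("ó"))  # [195, 179]
```

The two bytes arrive unchanged even though no shell took part, which is consistent with the shell not being the component that splits the character.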

I'm using Ubuntu 11.10 with Xfce.


--
Jakub Jankiewicz, Web Developer
http://jcubic.pl


signature.asc
Description: PGP signature


Re: Bash and Unicode

2012-09-20 Thread Jakub Jankiewicz


On Thu, 20 Sep 2012 19:23:02 -0400
DJ Mills wrote:

> On Thu, Sep 20, 2012 at 6:00 PM, Jakub Jankiewicz wrote:
> 
> 
> This would be an issue with the figlet program, which has nothing to
> do with bash.
> 
> The man page (http://linux.die.net/man/6/figlet) gives the email
> address i...@figlet.org
> 

I wanted to send this to that list first, but then I found that I
can't run this from bash:

$ python -c 'print ord(u"ó")'

but I can from Python:

$ python
>>> print ord(u"ó")

How come Python got 2 characters in the first case and one in the second?

What is the difference between how bash handles this one letter and
how Python does? Maybe it's something to do with readline, because
Python does the same thing figlet does: it handles the characters that
are passed to it. Maybe the Python interpreter does something to work
around an issue that exists in the whole system, maybe it's because
Python uses a different library or doesn't use readline. Or maybe it's
just a configuration issue with my Ubuntu installation, or with Ubuntu
itself.

It's definitely not related to figlet, but I'm not sure it's related
to bash either. I'm asking here because you know how bash works and
can point me in the right direction, and the figlet list is not the
right place.

--
Jakub Jankiewicz, Web Developer
http://jcubic.pl




Re: Bash and Unicode

2012-09-21 Thread Jakub Jankiewicz

On Fri, 21 Sep 2012 09:11:45 +0200
Andreas Schwab wrote:

> Your keyboard input apparently uses UTF-8 encoding, so the single
> character ó is represented by two bytes with the values 195 and 179.
> This has nothing at all to do with the shell, most likely your locale
> settings are messed up.
> 
> Andreas.
> 
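
The relationship between the code point 243 and the two byte values 195 and 179 can be checked directly. The following is a minimal sketch in Python 3 (not the Python 2 used in this thread); the function name describe is made up for illustration.

```python
def describe(char):
    """Return the code point and the UTF-8 byte values of one character."""
    return ord(char), list(char.encode("utf-8"))


code_point, utf8_bytes = describe("ó")
print(code_point)  # 243
print(utf8_bytes)  # [195, 179]
```

So 243 and 195+179 describe the same character: the first is its Unicode code point, the second is its UTF-8 encoding.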

Does that mean Unicode normally works with bash, and it shows one
character when everything is set up correctly? Does this code work on
other systems:

python -c 'print "ó"'

I could report it to Ubuntu, but maybe it's something larger or
smaller. Maybe it's a bug that no one has noticed, or just a
misconfiguration.

My locale is pl_PL.UTF-8, which I didn't change, and everything works
except bash.
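
One way to see which encoding a program actually derives from the locale settings is to ask for it. The sketch below is Python 3 and only illustrative; the function name encoding_report is hypothetical, and the values it prints depend entirely on the environment it runs in.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the locale-related settings visible to this process."""
    try:
        # Adopt the locale named by the environment (LANG, LC_* variables).
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The environment names a locale that is not installed.
        pass
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        "preferred": locale.getpreferredencoding(False),
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
    }


for name, value in encoding_report().items():
    print(name, value)
```

With a locale such as pl_PL.UTF-8 in effect, one would expect the preferred encoding to be reported as UTF-8.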

I have the same issue on Red Hat 4.4.6-3 over ssh, but maybe it has to
do with how the keyboard is handled (python started inside the ssh
session behaves the same: it works there, but not from the command
line).

How does bash handle input? It uses readline, right? So the next thing
I should do is look there, right?

Do you know which steps and which libraries/code are executed between
the moment the user hits a key and the moment the character is
processed by bash? Then I could trace back and check each step.

Linux driver, readline, bash - are there more?

--
Jakub Jankiewicz, Web Developer
http://jcubic.pl




Re: Bash and Unicode

2012-09-21 Thread Jakub Jankiewicz


On Fri, 21 Sep 2012 11:46:26 +0200
Andreas Schwab wrote:

> Did you read <http://www.python.org/peps/pep-0263.html>?
> 
> Andreas.
> 

I forgot about that. This works:

python -c '# coding: utf-8
print ord(u"ó")'
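
A possible reading of why the coding declaration matters: the same two bytes give one character when decoded as UTF-8, and two characters when every byte is taken as a character of its own. The sketch below is Python 3, and it uses Latin-1 as an assumed stand-in for that byte-per-character reading; the function name decode_both is made up for illustration.

```python
def decode_both(raw):
    """Decode the same bytes as UTF-8 and as Latin-1 (one byte per character)."""
    return raw.decode("utf-8"), raw.decode("latin-1")


raw = "ó".encode("utf-8")  # the two bytes 195 and 179
as_utf8, as_latin1 = decode_both(raw)

print(len(as_utf8), [ord(c) for c in as_utf8])      # 1 [243]
print(len(as_latin1), [ord(c) for c in as_latin1])  # 2 [195, 179]
```

The two-character result is "Ã³", which resembles the A with a tilde followed by a small 3 in the figlet output at the start of this thread.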

Probably interactive Python has it enabled by default. I also tested
with sed whether I can replace one character, to check that it isn't by
chance handled as 2 characters:

echo -n ó | sed -e 's/[ó]\{1\}/_/'

and it isn't. The output is a single "_".
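
For comparison, here is a sketch of what the two outcomes of that sed test look like, written in Python 3 with the re module instead of sed (a swapped-in tool, chosen only for illustration; both function names are hypothetical). Matching on text treats ó as one character, while matching on raw bytes replaces only the first of its two bytes and leaves a stray byte behind.

```python
import re


def replace_first_char(text):
    """Replace the first ó in a text string, matching whole characters."""
    return re.sub("[ó]", "_", text, count=1)


def replace_first_byte(raw):
    """Replace the first matching byte, with ó spelled as its two UTF-8 bytes."""
    return re.sub(b"[\xc3\xb3]", b"_", raw, count=1)


print(replace_first_char("ó"))                  # _
print(replace_first_byte("ó".encode("utf-8")))  # b'_\xb3'
```

A single clean "_" therefore suggests the tool is working on characters rather than on bytes.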

So thanks for your help, and sorry for the trouble.

--
Jakub Jankiewicz, Web Developer
http://jcubic.pl

