different answers with isalpha()

2007-07-12 Thread nuno
Hi,

I have Python with sys.version_info = (2, 4, 4, 'final', 0).

In IDLE, print 'á'.isalpha() gives True. When I put the same code in a
script file and run it, I get False.

Why do I get different answers?


Thank you

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: different answers with isalpha()

2007-07-13 Thread nuno
On Jul 13, 6:07 am, Jyotirmoy Bhattacharya <[EMAIL PROTECTED]>
wrote:
> On Jul 13, 5:05 am, [EMAIL PROTECTED] wrote:
>
> > In Idle when I do print 'á'.isalpha() I get True. When I make and
> > execute a script file with the same code I get False.
>
> > Why do I get different answers?
>
> Non-ASCII characters in ordinary (8-bit) strings have all kinds of
> strangeness. First, the answer of isalpha() and friends depends on the
> current locale. By default, Python uses the "C" locale where the
> alphabetic characters are a-zA-Z only. To set the locale to whatever
> is the OS setting for the current user, put this near the beginning of
> your script:
>
> import locale
> locale.setlocale(locale.LC_ALL,'')
>
> Apparently IDLE does this for you. Hence the discrepancy you noted.
>
> Second, there is the matter of encoding. String literals like the one
> you used in your example are stored in whatever encoding your text
> editor chose to store your program in. If it doesn't match the
> encoding used by the current locale, once again the program fails.
>
> As I see it, the only way to properly handle characters outside the
> ASCII set is to use Unicode strings.

Jyotirmoy,

You are right. Thank you for the information.

I will follow your advice, but it leads me to another problem with
string.maketrans/translate that I can't solve.
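
For reference, here is a minimal sketch of the Unicode approach suggested
above. The character is written as an escape so that the result does not
depend on the encoding the editor saved the file in. The maketrans/translate
problem is not described in this message, so the last lines only assume it
concerns mapping accented characters; the strip_accents table is a made-up
example, not something from the thread.

```python
import unicodedata

# U+00E1 (a with acute accent), written as an escape so the source file's
# encoding does not matter.
ch = u'\u00e1'

# For Unicode strings the answer comes from the Unicode character database,
# not from the current locale.
print(ch.isalpha())                # True
print(unicodedata.category(ch))    # 'Ll' (lowercase letter)

# The same character is a different byte sequence in each encoding, which is
# why an 8-bit string literal depends on how the file was saved.
print(repr(ch.encode('latin-1')))
print(repr(ch.encode('utf-8')))

# Assumption: translate() is wanted for replacing accented letters.
# On a Unicode string it takes a dict keyed by code point instead of a
# 256-character table built by string.maketrans.
strip_accents = {0x00e1: u'a', 0x00e9: u'e'}   # hypothetical mapping
plain = u'caf\u00e9 \u00e1gua'.translate(strip_accents)
print(plain)
```

The u'' prefix is kept so the same lines behave the same way on Python 2
and on recent Python 3 releases.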



Help with libxml2dom

2009-08-19 Thread Nuno Santos
I have just started using libxml2dom to read HTML files and I have some
questions that I hope you can answer.


The page I am working on (teste.htm):

 
   
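[The list archive stripped the HTML tags from this listing. Going by the
session below, the nesting is html > head > title and
html > body > table > tr > td > font > a. The only text that survived is
"Title" and "8/15/2009".]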


>>> import libxml2dom
>>> foo = open('teste.htm', 'r')
>>> str1 = foo.read()
>>> doc = libxml2dom.parseString(str1, html=1)
>>> html = doc.firstChild
>>> html.nodeName
u'html'
>>> head = html.firstChild
>>> head.nodeName
u'head'
>>> title = head.firstChild
>>> title.nodeName
u'title'
>>> body = head.nextSibling
>>> body.nodeName
u'body'
>>> table = body.firstChild
>>> table.nodeName
u'text'  # Why? Shouldn't it be a table? (1)
>>> table = body.firstChild.nextSibling  # Why does this work? Is there a hidden text element? (2)

>>> table.nodeName
u'table'
>>> tr = table.firstChild
>>> tr.nodeName
u'tr'
>>> td = tr.firstChild
>>> td.nodeName
u'td'
>>> font = td.firstChild
>>> font.nodeName
u'text' # (1)
>>> font = td.firstChild.nextSibling # (2)
>>> font.nodeName
u'font'
>>> a = font.firstChild
>>> a.nodeName
u'text' #(1)
>>> a = font.firstChild.nextSibling #(2)
>>> a.nodeName
u'a'


It seems that there are sometimes 'hidden' text elements. This is probably
standard DOM behaviour that I am simply not familiar with, and I would very
much appreciate it if someone could explain it to me.


Thanks.
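
A minimal sketch of what is most likely behind the 'hidden' nodes in the
session above: in the DOM, the newline and indentation between two tags is
itself a text node, so firstChild returns that whitespace before it reaches
the element. The example uses xml.dom.minidom from the standard library as a
stand-in for libxml2dom, and SOURCE and child_elements are hypothetical names
made up for this illustration. minidom calls these nodes '#text' where the
session above shows 'text'.

```python
from xml.dom import minidom

# Hypothetical stand-in for teste.htm: small, well-formed, indented markup.
SOURCE = """<body>
  <table>
    <tr><td>8/15/2009</td></tr>
  </table>
</body>"""


def child_elements(node):
    """Return only the element children, skipping text (whitespace) nodes."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


body = minidom.parseString(SOURCE).documentElement

first = body.firstChild
print(first.nodeName)      # '#text'
print(repr(first.data))    # the newline and two spaces that follow <body>

# Skipping the whitespace nodes gives the element that was expected.
table = child_elements(body)[0]
print(table.nodeName)      # 'table'

# No whitespace between <tr> and <td>, so firstChild is the element itself.
tr = child_elements(table)[0]
print(tr.firstChild.nodeName)   # 'td'
```

Filtering on nodeType avoids having to know in advance where the source file
happens to contain line breaks, unlike chaining firstChild.nextSibling.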