Different answers with isalpha()
Hi,

I have Python with sys.version_info = (2, 4, 4, 'final', 0).

In IDLE, when I do print 'á'.isalpha() I get True. When I create and
execute a script file with the same code, I get False.

Why do I get different answers?

Thank you
--
http://mail.python.org/mailman/listinfo/python-list
Re: Different answers with isalpha()
On Jul 13, 6:07 am, Jyotirmoy Bhattacharya <[EMAIL PROTECTED]> wrote:
> On Jul 13, 5:05 am, [EMAIL PROTECTED] wrote:
>
> > In Idle when I do print 'á'.isalpha() I get True. When I make and
> > execute a script file with the same code I get False.
> >
> > Why do I have different answers ?
>
> Non-ASCII characters in ordinary (8-bit) strings have all kinds of
> strangeness. First, the answer of isalpha() and friends depends on the
> current locale. By default, Python uses the "C" locale where the
> alphabetic characters are a-zA-Z only. To set the locale to whatever
> is the OS setting for the current user, put this near the beginning of
> your script:
>
> import locale
> locale.setlocale(locale.LC_ALL,'')
>
> Apparently IDLE does this for you. Hence the discrepancy you noted.
>
> Second, there is the matter of encoding. String literals like the one
> you used in your example are stored in whatever encoding your text
> editor chose to store your program in. If it doesn't match the
> encoding used by the current locale, once again the program fails.
>
> As I see it, the only way to properly handle characters outside the
> ASCII set is to use Unicode strings.

Jyotirmoy,

You are right. Thank you for the information.
I will follow your advice, but it leads me to another problem with
string.maketrans/translate that I can't solve.
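A minimal sketch of the Unicode-string approach suggested in the reply, so that isalpha() no longer depends on the locale. The helper name is_alpha_text and the sample byte strings are made up for illustration. The last part assumes the maketrans/translate problem is the usual one: string.maketrans builds a table for 8-bit strings only, while translate() on a Unicode string takes a dict keyed by code point.

```python
def is_alpha_text(raw_bytes, encoding):
    """Decode raw_bytes with the given encoding, then test isalpha() on the Unicode text."""
    text = raw_bytes.decode(encoding)
    return text.isalpha()

# The same character, a-acute (U+00E1), stored in two different encodings.
latin1_bytes = u'\xe1'.encode('latin-1')  # one byte
utf8_bytes = u'\xe1'.encode('utf-8')      # two bytes

print(is_alpha_text(latin1_bytes, 'latin-1'))  # True
print(is_alpha_text(utf8_bytes, 'utf-8'))      # True

# Decoding with the wrong encoding yields two characters, one of which
# is punctuation, so the answer changes.
print(is_alpha_text(utf8_bytes, 'latin-1'))    # False

# translate() on a Unicode string takes a mapping from code points to
# replacements, rather than a 256-character table from string.maketrans.
strip_accents = {0x00E1: u'a', 0x00E9: u'e'}
print(u'caf\xe9 \xe1gil'.translate(strip_accents))  # cafe agil
```

The key point is that the bytes are decoded with the encoding they were actually written in before any character test is applied. In a script file, that encoding is whatever the editor used to save the file.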
Help with libxml2dom
I have just started using libxml2dom to read HTML files, and I have some
questions that I hope you can answer.
The page I am working on (teste.htm):
Title
8/15/2009
>>> import libxml2dom
>>> foo = open('teste.htm', 'r')
>>> str1 = foo.read()
>>> doc = libxml2dom.parseString(str1, html=1)
>>> html = doc.firstChild
>>> html.nodeName
u'html'
>>> head = html.firstChild
>>> head.nodeName
u'head'
>>> title = head.firstChild
>>> title.nodeName
u'title'
>>> body = head.nextSibling
>>> body.nodeName
u'body'
>>> table = body.firstChild
>>> table.nodeName
u'text' #?! Why!? Shouldn't it be a table? (1)
>>> table = body.firstChild.nextSibling # why does this work? is there a hidden text element? (2)
>>> table.nodeName
u'table'
>>> tr = table.firstChild
>>> tr.nodeName
u'tr'
>>> td = tr.firstChild
>>> td.nodeName
u'td'
>>> font = td.firstChild
>>> font.nodeName
u'text' # (1)
>>> font = td.firstChild.nextSibling # (2)
>>> font.nodeName
u'font'
>>> a = font.firstChild
>>> a.nodeName
u'text' #(1)
>>> a = font.firstChild.nextSibling #(2)
>>> a.nodeName
u'a'
It seems that there are sometimes 'hidden' text elements. This is
probably standard DOM behavior that I am simply not familiar with, and I
would very much appreciate it if anyone could explain it to me.
Thanks.
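The 'hidden' nodes in the session above are most likely whitespace: in the DOM, line breaks and indentation between tags become text nodes, so firstChild can be a text node rather than the element you expect. The sketch below shows the same effect with the standard library's xml.dom.minidom instead of libxml2dom (where text nodes are named '#text' rather than 'text'). The SAMPLE markup is a hypothetical stand-in, since the real contents of teste.htm are not shown, and the helper names are made up for illustration.

```python
from xml.dom import Node, minidom

# Hypothetical stand-in for teste.htm; the real markup is not shown in the post.
SAMPLE = """<html><head><title>Title</title></head><body>
<table><tr><td>
<font>
<a href="#">8/15/2009</a>
</font>
</td></tr></table>
</body></html>"""

def element_children(node):
    """Return only the element children of node, skipping text nodes."""
    return [child for child in node.childNodes
            if child.nodeType == Node.ELEMENT_NODE]

def first_element_child(node):
    """Return the first element child of node, or None if there is none."""
    children = element_children(node)
    if children:
        return children[0]
    return None

doc = minidom.parseString(SAMPLE)
body = doc.getElementsByTagName("body")[0]

# The first child of <body> is the newline that follows the opening tag.
first = body.firstChild
print(first.nodeType == Node.TEXT_NODE)  # True
print(repr(first.data))                  # '\n'

# Skipping text nodes gives the element that was expected.
table = first_element_child(body)
print(table.nodeName)                    # table
```

Filtering on nodeType is more robust than chaining firstChild.nextSibling, because it does not depend on exactly where the whitespace falls in the file.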
