[Tutor] accented characters to unaccented

2010-06-07 Thread KB SU

Hi,

I opened the URL and read it like this:

>>> import urllib
>>> txt = urllib.urlopen("http://www.terme-catez.si").read()  # Python 2: txt is a byte str
>>> txt

This gives output like the following (other parts are skipped):
r\n  2010\r\n  http://www.terme-catez.si"
target="_blank">Terme \xc4\x8cate\xc5\xbe\r\n  Slovenija\r\n  \r\n  Spletne
re\xc5\xa1itve\r\n  © 1996-\r\n  2010\r\n  http://www.tmedia.biz" target="_blank">(T)media\r\n  \r\n\r\n\r\n\r\n  \r\n\r\n  \r\n\r\n
\r\n\r\n\r\nhttp://www.google-analytics.com/urchin.js"
type="text/javascript">\r\n\r\n_uacct = "UA-1815955-1";\r\nurchinTracker();\r\n\r\n\r\n\r\n\r\n'
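The escapes such as \xc4\x8c in the output above match the UTF-8 encoding of Č, ž and š. A minimal sketch, in Python 3 rather than the Python 2 of the post and assuming the page really is UTF-8 encoded, showing that decoding the bytes gives back the accented text:

```python
# Python 3 sketch. The bytes below are copied from the repr in the post.
raw = b"Terme \xc4\x8cate\xc5\xbe"

# Assumption: the page is UTF-8 encoded, so decoding turns each
# two-byte sequence back into one accented character.
text = raw.decode("utf-8")
print(text)  # Terme Čatež

# Going the other way shows which bytes each letter is stored as.
for ch in "Čžš":
    print(ch, ch.encode("utf-8"))
```

Typing txt at the Python 2 prompt shows the repr of a byte string, which is why the letters appear as escapes rather than characters; in Python 2 the equivalent decoding step is txt.decode('utf-8').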

As you can see above, that chunk of HTML contains text like 'Terme
\xc4\x8cate\xc5\xbe' (the original is 'Terme Čatež'). I want to convert
codes like '\xc4\x8c' or '\xc5\xbe' to unaccented characters, so that 'Terme
\xc4\x8cate\xc5\xbe' becomes 'Terme Catez'. Is there any way to do this
conversion for the whole HTML?

Thanks in advance.
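One common approach, sketched here in Python 3 (the post uses Python 2) and assuming the page is UTF-8, is to decode the bytes to text, split each accented letter into base letter plus combining mark with Unicode NFKD normalization, and drop the marks. strip_accents is a hypothetical helper name, not a library function:

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with combining accent marks removed (e.g. 'Č' -> 'C')."""
    # NFKD splits a precomposed letter into base letter + combining mark,
    # e.g. 'Č' (U+010C) becomes 'C' followed by U+030C COMBINING CARON.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep everything except the combining marks; plain ASCII (including
    # HTML tags) passes through unchanged, so a whole page can be processed.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Assumption: the page bytes are UTF-8 (check the charset the server declares).
page_bytes = b"Terme \xc4\x8cate\xc5\xbe\r\n  Slovenija\r\n  Spletne re\xc5\xa1itve"
print(strip_accents(page_bytes.decode("utf-8")))

print(strip_accents("Terme Čatež"))  # Terme Catez
```

Letters that have no Unicode decomposition, such as the Croatian đ, pass through unchanged and would need an explicit mapping. NFKD also folds compatibility characters (a non-breaking space becomes a plain space); NFD would leave those alone. In Python 2 the same unicodedata calls work on the unicode object returned by txt.decode('utf-8').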
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

[Tutor] help

2010-06-17 Thread KB SU

help