Bug#339818: parseString doesn't handle unicode strings

Josh Triplett Sun, 16 Apr 2006 12:57:34 -0700

reopen 339818
retitle 339818 parseString fails silently on unicode strings
thanks

Igor Stroh wrote:
> utidylib actually does handle unicode; you just have to encode
> your unicode objects appropriately first and pass the char_encoding
> option to parseString:
> 
>>>> import tidy
>>>> print tidy.parseString(u"<p>hello".encode("utf8"), char_encoding="utf8")
> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
> <html>
> <head>
> <meta name="generator" content=
> "HTML Tidy for Linux/x86 (vers 1 September 2005), see www.w3.org">
> <title></title>
> </head>
> <body>
> <p>hello</p>
> </body>
> </html>

I'm aware that utidylib can handle utf-8; that wasn't the subject of my
bug.  I'd like utidylib to either 1) handle Python's unicode string
objects correctly, or 2) complain that it doesn't handle unicode string
objects when given one.  I'd prefer the solution that makes utidylib
more capable, but I'll settle for a solution that doesn't fail silently.
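For illustration, here is a caller-side sketch of the behaviour requested above. It implements option 1 by encoding text input and setting the matching char_encoding, and option 2 by raising an error for anything it cannot handle. The helper name prepare_tidy_input is hypothetical and is not part of utidylib. It is written for Python 3, where str plays the role of Python 2's unicode type. It assumes, as in the quoted session, that the binding accepts encoded bytes together with a char_encoding option. The same checks could just as well live inside parseString itself.

```python
def prepare_tidy_input(markup, **options):
    """Return (encoded_markup, options) for a parseString-style call.

    Hypothetical helper, not part of utidylib. Text input is encoded
    to UTF-8 and char_encoding="utf8" is set to match. Byte input is
    passed through unchanged. Anything else raises TypeError instead
    of being accepted and then failing silently.
    """
    if isinstance(markup, str):
        # Python 3 str corresponds to Python 2's unicode type.
        requested = options.get("char_encoding", "utf8")
        if requested != "utf8":
            # Refuse a mismatch rather than letting tidy mis-decode silently.
            raise ValueError(
                "text input is encoded as utf8, but char_encoding=%r was given"
                % (requested,)
            )
        options["char_encoding"] = "utf8"
        return markup.encode("utf-8"), options
    if isinstance(markup, (bytes, bytearray)):
        # Already-encoded input: the caller must supply the matching
        # char_encoding option, as in the quoted session.
        return bytes(markup), options
    # Neither text nor bytes: complain loudly.
    raise TypeError("expected str or bytes, got %s" % type(markup).__name__)
```

A caller would then write data, opts = prepare_tidy_input("<p>h\u00e9llo</p>") and pass the result on as tidy.parseString(data, **opts). This assumes the binding takes its options as keyword arguments, the way the quoted session does.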

Thanks,
Josh Triplett
