reopen 339818
retitle 339818 parseString fails silently on unicode strings
thanks

Igor Stroh wrote:
> utidylib actually does handle unicode, you just have to encode
> your unicode objects appropriately first and pass the char_encoding
> option to parseString:
>
> >>> import tidy
> >>> print tidy.parseString(u"<p>hello".encode("utf8"), char_encoding="utf8")
> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
> <html>
> <head>
> <meta name="generator" content=
> "HTML Tidy for Linux/x86 (vers 1 September 2005), see www.w3.org">
> <title></title>
> </head>
> <body>
> <p>hello</p>
> </body>
> </html>

I'm aware that utidylib can handle utf-8; that wasn't the subject of my
bug. I'd like utidylib to either 1) handle Python's unicode string
objects correctly, or 2) complain that it doesn't handle unicode string
objects when given one. I'd prefer the solution that makes utidylib
more capable, but I'll settle for a solution that doesn't fail silently.

Thanks,
Josh Triplett
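
The two behaviours requested above (accept unicode strings, or refuse them loudly) could be sketched roughly as follows. This is only an illustration, not utidylib code: `prepare_tidy_input` is a hypothetical helper name, and the sketch assumes Python 3, where `str` plays the role of the Python 2 `unicode` type. The only name taken from the thread is the `char_encoding` option.

```python
def prepare_tidy_input(text, encoding="utf8"):
    """Hypothetical helper (not part of utidylib).

    Returns a (bytes, options) pair that is safe to hand to a
    byte-oriented parser, or raises instead of failing silently.
    """
    if isinstance(text, bytes):
        # Already encoded: pass through untouched, no extra options.
        return text, {}
    if isinstance(text, str):
        # Option 1: handle unicode strings by encoding them and
        # recording which encoding was used.
        return text.encode(encoding), {"char_encoding": encoding}
    # Option 2: complain loudly about anything unsupported.
    raise TypeError(
        "expected str or bytes, got %s" % type(text).__name__
    )


data, options = prepare_tidy_input("<p>h\u00e9llo")
```

With such a helper, the call quoted above would presumably become `tidy.parseString(data, **options)`; whether utidylib should do this internally or simply raise is the open question of this bug.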