Re: Proposal: Django should support unicode strings

Simon Willison Thu, 12 Jan 2006 05:48:07 -0800


On 10 Jan 2006, at 20:35, Antonio Cavedoni wrote:

Maybe we could start a “unicode” branch right after “magic-removal” is merged back into the trunk?


+1, sounds smart.

I've been bitten by unicode problems a bunch of times while using Django and I'm not even trying to build an internationalized site. For anyone who thinks this stuff isn't relevant to them, consider this: if you write a site that does anything with data from RSS feeds the chances are you will need to consume and process unicode of some sort.

The Flickr Web Services API states the following with regards to unicode:


"""
The Flickr API expects all data to be UTF-8 encoded.

Checks are made for valid UTF-8 sequences. If an invalid sequence is found, the data is presumed to be ISO-8859-1 and converted accordingly to UTF-8.

Sending data in any other encoding will result in garbage into Flickr. It wont be dangerous garbage (we will always store valid UTF-8) but it will still be garbage.


""" http://www.flickr.com/services/api/misc.encoding.html

This approach (assuming anything that is invalid UTF-8 is actually ISO-8859-1 and converting it) appears to work extremely well. We should do this for Django, though we should look at any provided charset data and convert based on that initially (and only assume ISO-8859-1 if that fails).
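
As a rough illustration of that fallback order, here is a minimal sketch in Python 3 syntax. The function name to_unicode and its declared_charset parameter are made up for this example and are not part of Django. It also assumes UTF-8 is tried between the declared charset and the ISO-8859-1 fallback, which is one reading of the approach described above.

```python
def to_unicode(raw, declared_charset=None):
    """Decode a byte string to text using a Flickr-style fallback chain.

    Hypothetical helper: the name and signature are illustrative only.
    Order tried: declared charset (if any), then UTF-8, then ISO-8859-1.
    """
    candidates = []
    if declared_charset:
        candidates.append(declared_charset)
    candidates.append("utf-8")

    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # UnicodeDecodeError: the bytes are not valid in this encoding.
            # LookupError: the declared charset name is unknown to Python.
            continue

    # ISO-8859-1 maps every possible byte to a character, so this last
    # step cannot fail. The result may still be garbage if the guess is
    # wrong, but it will be valid text.
    return raw.decode("iso-8859-1")
```

For example, to_unicode(b"caf\xe9") falls through to ISO-8859-1 because the lone 0xE9 byte is not a valid UTF-8 sequence, while to_unicode(b"caf\xc3\xa9") decodes as UTF-8 on the first attempt. Both give the same four-character string.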


Cheers,

Simon
