On Tue, Dec 04, 2007 at 11:18:38PM +0100, Norbert Tretkowski wrote:
> I've read the whole bugreport a few times now, and I think I have to
> agree with Sean, we should switch the default charset for new databases
> to utf8, but shouldn't touch existing ones.
Agreed. I think the easiest way to accomplish this (from the sysadmin's perspective) would be to have a separate package, maybe mysql-server-utf8? But again, I'm not an experienced packager. I'm sure there are great debianish minds that will choose the best path.

As for why they didn't go with utf8 as the default in the past, I recommend this article:

http://dev.mysql.com/tech-resources/articles/4.1/unicode.html

Actually, I *really* recommend that article. It made the difference for me in terms of understanding what otherwise seemed like a nonsensical charset forest with mysql.

> Having some testers with MySQL and utf8 experience would be great,
> thanks for your offer. Expect a package in experimental if we really
> decide to switch the default charset.

I look forward to it! I'm confident that together we can get something packaged that will make a lot of people's lives significantly easier in the long run.

I did want to chime in with a few more things:

(1) A previous comment seemed to indicate that changes to my.cnf would cover everything. If it's possible to get mysql to do utf8 across the board by default (server, db, client, conn) by only adjusting the my.cnf under debian, would someone please attach such a my.cnf? I was unsuccessful in my attempts to utf8-ify that way.

(2) What are the consequences of changing the default collation? The default collation for utf8 is utf8_general_ci (try 'show character set;'), but the default for latin1 is latin1_swedish_ci (‽‽). I'll tell you now that the consequences could be painful ☮✈⚔⚠☫⚛☠±♫♥ for users of some webapps, including (in the past) drupal:

http://drupal.org/node/66333

In case that wasn't clear, I mean that it can break things.
Note that the drupal example above was specifically a collation issue (http://drupal.org/node/66333#comment-412577), and I feel sorry for the reporter, who got a "won't fix" and "When it does occur, it is relatively easy to fix by hand," which is--with all due respect to the fine drupal people--bogus, imho. The problem is that you can't always anticipate when/where/how charset conversion or collation problems will happen.

Here's a horrible example of how NOT to do latin1 -> utf8:

http://lists.wikimedia.org/pipermail/mediawiki-l/2004-November/002245.html

It would be nice if everybody understood encodings thoroughly and played nice, but doing a little poking turns up tons of examples of webapps behaving badly, and for a variety of reasons. Or maybe it's a clash of expectations/preferences?

My personal non-database favorite encoding hobby horse is mailman lists and their archives. Perhaps it's irrational of me to think that I shouldn't have to change browser settings to view things correctly. Try visiting:

http://lists.ibiblio.org/pipermail/cc-jp/

for example. I promise it's not broken.* Mostly. :)

The more we consolidate on utf8, the better things get, but along the way there will be painful moments. That's life. UTF8 by default is a change that should happen, but carefully, and there will likely need to be legacy support for the old defaults for some time.

Cheers,
-- 
Cristóbal Palmer
ibiblio.org systems administrator

* Hint: View -> Character Encodings -> More Encodings -> East Asian -> EUC-JP

Bonus points if you can tell me why some pages, e.g.
http://lists.ibiblio.org/pipermail/cc-jp/2004-March/000128.html
look broken. Mailing list archives are fun, see?