On 21 Aug, 13:46, Aymeric Augustin <aymeric.augus...@polytechnique.org> wrote:
> Hello,
>
> The first step of porting Django to Python 3 was to switch on
> unicode_literals, as explained here [1]. This change was discussed in
> ticket #18269 [2] and committed in changeset 4a103086d5 [3].
>
> This changeset added `from __future__ import unicode_literals` only
> where necessary, i.e. in modules that contained explicit unicode
> literals. This choice absolutely makes sense. Switching
> unicode_literals on everywhere in Django would have resulted in tons
> of b"" prefixes and/or in an incredible number of changes. Both
> options were unrealistic.
>
> However, it has an unfortunate side effect. In master, some modules
> have unicode_literals and others don't. I find myself constantly
> checking which mode is in effect.
>
> So we have two options at this point.
>
> (1) The status quo
>
> Pros:
> - less work in the short term
> - avoiding the cons of solution (2)
>
> Cons:
> - check-top-of-file syndrome
> - different behavior in Python 2 and Python 3 in some modules
> - different behavior between some modules and others (e.g.
>   moving code isn't safe)
> - cognitive overhead
>
> (2) Progressively turn unicode_literals on throughout the codebase. If
> we do it in small steps, it becomes easier to ensure that the change
> from str literals to unicode literals doesn't result in regressions on
> Python 2. That's how we handled the entire Python 3 port and it worked
> well: several regressions were quickly caught and fixed.
>
> Pros:
> - consistent codebase, easier to maintain in the long term
> - avoiding the cons of solution (1)
>
> Cons:
> - "native strings" have to be expressed as str("...") in
>   modules that need them
> - more changes, higher risk of regressions on Python 2
>
> In my opinion, option (2) is a logical move at this point. However, I
> believe it deserves a public discussion (or at least an explanation).
> What do you think?
>
> Best regards,

I did some benchmark runs some time ago, and it seems that unicode_literals caused a small performance regression in many queryset-related benchmarks. The only result I have available is this:
http://users.tkk.fi/~akaariai/djbench/queryannotate.html

I remembered doing the benchmarks when reading this post and thought I would mention it. The regression is small, and I don't have any ideas on how to solve it. It might be just a testing artefact. So this is in no way a complaint against unicode_literals, just something I thought to share.

BTW, if there happens to be some unused hardware available, I could automate benchmarks such as the one above. The hardware needs to be dedicated, and a virtual machine will not do: benchmarking on a shared or virtual machine leads to inaccurate results. However, the performance of the hardware isn't important at all; an older machine might actually be better for this purpose...

For the actual question: I vote we move every non-empty file to unicode_literals. If we use smaller steps, we can bisect breakages more easily.

 - Anssi
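
For anyone following along, here is a rough sketch of what the per-module switch actually changes. It is only an illustration, not code from Django: the helper name `literal_type` is made up, and it uses the built-in `compile()` with the `unicode_literals` compiler flag to mimic a module with or without the `__future__` import. On Python 3 the flag is a no-op, so the difference only shows up on Python 2.

```python
import __future__


def literal_type(source, unicode_literals):
    """Return the type of the expression `source`, compiled as if it lived
    in a module with (or without) `from __future__ import unicode_literals`.

    Hypothetical helper, for illustration only.
    """
    flags = __future__.unicode_literals.compiler_flag if unicode_literals else 0
    # dont_inherit=True (last argument): ignore any __future__ imports
    # that are active in this file, so only `flags` decides the mode.
    code = compile(source, "<example>", "eval", flags, True)
    return type(eval(code))


for src in ('"abc"', 'b"abc"', 'str("abc")'):
    # Python 2: a plain "abc" is str without the flag and unicode with it.
    # Python 3: a plain "abc" is str in both modes.
    print(src, literal_type(src, False), literal_type(src, True))
```

The `str("abc")` row is the "native string" spelling from the quoted message: it comes out as the built-in `str` type in both modes, whereas a bare `"abc"` depends on what is at the top of the file.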