On Thu, 06 Aug 2009 20:05:52 +0200, Thorsten Kampe wrote:
> > That is significant! So the winner is:
> >
> > unicode('äöüÄÖÜß','utf-8')
>
> Unless you are planning to write a loop that decodes "äöüÄÖÜß" one
> million times, these benchmarks are meaningless.

What if you're writing a loop which takes one million different lines of
text and decodes them once each?

>>> import timeit
>>> setup = 'L = ["abc"*(n%100) for n in xrange(1000000)]'
>>> t1 = timeit.Timer('for line in L: line.decode("utf-8")', setup)
>>> t2 = timeit.Timer('for line in L: unicode(line, "utf-8")', setup)
>>> t1.timeit(number=1)
5.6751680374145508
>>> t2.timeit(number=1)
2.6822888851165771

Seems like a pretty meaningful difference to me.
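
For anyone who wants to repeat the comparison on Python 3, here is a rough sketch rather than a definitive benchmark. It assumes that str(line, "utf-8") is the closest stand-in for Python 2's unicode(line, "utf-8"), and the helper names make_lines, decode_with_method, decode_with_constructor and best_time are made up for illustration. The timings will differ from the Python 2 figures above, and the gap may not look the same at all.

```python
import timeit


def make_lines(count=100000):
    # Same shape of test data as in the session above, but as bytes objects.
    return [b"abc" * (n % 100) for n in range(count)]


def decode_with_method(lines):
    # Counterpart of line.decode("utf-8").
    for line in lines:
        line.decode("utf-8")


def decode_with_constructor(lines):
    # Assumed Python 3 counterpart of unicode(line, "utf-8").
    for line in lines:
        str(line, "utf-8")


def best_time(func, lines, repeat=3):
    # Time one full pass over the list, several times, and keep the fastest.
    return min(timeit.repeat(lambda: func(lines), number=1, repeat=repeat))


if __name__ == "__main__":
    lines = make_lines()
    print("bytes.decode:", best_time(decode_with_method, lines))
    print("str(b, enc): ", best_time(decode_with_constructor, lines))
```

Taking the minimum of a few repeats is only there to cut down on noise from whatever else the machine is doing; a single run, as in the session above, shows the same kind of comparison.
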
--
Steven