Hi, I'm working on Argon (http://www.third-bit.com/trac/argon) with Greg Wilson this summer.
We're having a very strange problem with Python's unicode parsing of source files.

Our CGI script was running extremely slowly on our production box (a pokey dual-Xeon 3GHz w/ 4GB RAM and 15K SCSI drives), to the tune of 6-10 seconds per request. I eventually tracked this down to imports of our source tree: the actual request was completing in 300ms, and the rest of the time was spent in __import__. Some gprof profiling showed that _PyUnicodeUCS2_IsLinebreak was being called 51 million times. Our code is 1.2 million characters, so calling IsLinebreak roughly 50 times per character hardly makes sense, and we're not even importing the entire source tree on every invocation.

Our code is a fork of Trac, and each source file originally had this line at the top:

# -*- coding: iso8859-1 -*-

That made me suspicious, so I removed all of them. The CGI execution time immediately dropped to ~1 second, and gprof showed that _PyUnicodeUCS2_IsLinebreak is not called at all anymore.

Now that our code works fast enough, I don't really care about this, but I thought python-dev might want to know that something weird is going on with unicode splitlines. I documented my investigation of this problem; if anyone wants further details, just email me (I'm not on python-dev):

http://www.third-bit.com/trac/argon/ticket/525

Thanks in advance,
Keir
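
P.S. In case anyone wants to poke at this without checking out our tree, here is a rough, self-contained timing sketch of the kind of comparison I mean. The module names and sizes are made up for illustration, and whether it actually reproduces the slowdown will depend on your Python version and build:

import os
import shutil
import sys
import tempfile
import time

def make_module(dirname, name, with_cookie):
    # Write a throwaway module of ~20,000 trivial assignments, optionally
    # starting with the same coding cookie our Trac fork uses.
    lines = []
    if with_cookie:
        lines.append("# -*- coding: iso8859-1 -*-\n")
    for i in range(20000):
        lines.append("x%d = %d\n" % (i, i))
    f = open(os.path.join(dirname, name + ".py"), "w")
    f.writelines(lines)
    f.close()

def time_import(name):
    # Cold import: the directory is fresh, so there is no cached .pyc yet.
    start = time.time()
    __import__(name)
    return time.time() - start

workdir = tempfile.mkdtemp()
sys.path.insert(0, workdir)
try:
    make_module(workdir, "with_cookie", True)
    make_module(workdir, "without_cookie", False)
    print("with coding cookie:    %.3fs" % time_import("with_cookie"))
    print("without coding cookie: %.3fs" % time_import("without_cookie"))
finally:
    shutil.rmtree(workdir)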
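
P.P.S. For what it's worth, my (unconfirmed) guess about the splitlines connection: a unicode string has to treat a much larger set of characters as line boundaries than a plain str does, which presumably is why the decode path ends up asking IsLinebreak about every character. A quick illustration on Python 2:

# A byte string only splits on \r and \n ...
print("a\x85b".splitlines())     # ['a\x85b']
# ... but a unicode string also splits on NEL (U+0085), LS (U+2028),
# PS (U+2029), and friends, so every character has to be inspected.
print(u"a\x85b".splitlines())    # [u'a', u'b']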