I'm caching the unrendered templates, so there is no filesystem/parsing overhead.
The slowness comes purely from executing the Python code that renders the templates. I can't really cache the template output, since it varies a lot depending on who is viewing it. Has anyone tried using Psyco or something similar to speed up the actual code? I'm using FCGI, so the number of backends is limited to 200 per server (hence the load of 100 :). Each server is a dual Xeon (I'm running the backends on 2 servers now), so I think it should handle a lot more requests than it does now.
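
To make the setup concrete, here is a rough sketch of how the per-request rendering cost could be measured with the standard-library profiler. This is not my actual code: `CACHED_TEMPLATE`, `render_page` and `profile_render` are hypothetical names, `string.Template` only stands in for the real template engine, and the sketch assumes a current Python 3 interpreter.

```python
import cProfile
import io
import pstats
from string import Template

# Hypothetical stand-in for a template that is parsed once and kept in memory.
CACHED_TEMPLATE = Template("Hello $user, you have $count new messages.")


def render_page(user, count):
    # Hypothetical render step: the output depends on the viewer,
    # so only the unrendered template is cached, not the result.
    return CACHED_TEMPLATE.substitute(user=user, count=count)


def profile_render(iterations=1000):
    # Profile repeated renders and return a report of the functions
    # with the highest cumulative time.
    profiler = cProfile.Profile()
    profiler.enable()
    for i in range(iterations):
        render_page("user%d" % i, i)
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(10)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_render())
```

Running something like this against the real render function should show which calls dominate, which would indicate whether a JIT such as Psyco has anything worthwhile to speed up.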