I spent the better part of yesterday mucking around in the dregs of Django's cache middleware and related modules, and in doing so I've come to the conclusion that, due to an accumulation of hindrances and minor bugs, the per-site and per-view caching mechanisms are effectively broken for many fairly typical usage patterns.

Let me demonstrate by fictional example, with what I would consider to be a pretty typical configuration and use case for the per-site cache.

Let's pretend I'm developing a blog powered by Django. I'm using memcached, and I would like to cache pages on that blog for anonymous users, who are going to make up the vast majority of my site's visitors. Ideally, I will serve the exact same cached version of a blog post to every single anonymous visitor to my site, which will help keep server load under control, particularly when I get slashdotted/reddited/what-have-you.

Like any blog, a typical page view features primarily the content (e.g. a blog post). It also has some "auth" stuff at the top right, which will say "Log in / Register" for users who aren't logged in but show a username and welcome message for those who are. Each blog post also has an empty comment form at the bottom where users can leave comments on the post. Like 99% of the websites out there, I will be using Google Analytics to track my visitors etc. Pretty straightforward, right?

Let me count the ways that Django's cache middleware will muck up my goals in the above scenario. First, I'm going to try to use the per-site cache. Here's what's going to go wrong for me:

* It's going to be virtually impossible for me to avoid my cache varying by cookie and thus by visitor. Because my templates check whether the current user is logged in, I'm touching the session, which causes a Vary: Cookie header to be set on the response. That means that if there is any difference in the cookies users send with their requests, each user gets a separately cached page, keyed off the value of the session cookie (the cookie named by SESSION_COOKIE_NAME), which is unique for every visitor.

* Even if I avoid touching the request user somehow, the CSRF middleware presents the same issue. Because I have a comment form on every page, I have a unique CSRF token for each visitor.
  Thankfully, Django doesn't let me completely shoot myself in the foot by caching the page with one user's token and serving it to everybody else. At least it helpfully sets a CSRF token cookie and varies on it to prevent this. However, that cookie is different for every unique user, which triggers the same problem as above: I again cannot avoid caching a unique page for each unique visitor.

* Unfortunately, my troubles are not over, even if I resign myself to having a cache that varies per visitor. You see, Google Analytics actually sets a handful of other cookies with each page request. And guess what? The values for those cookies are unique *for each request*. This means... I'm actually not caching at all. Cookies are unique for each and every page request thanks to Google Analytics. My per-site cache configuration is totally and completely inoperable, all because I'm using a tracking service that pretty much *everybody* uses.

Since that didn't work, I wonder if it'll work if I do per-view caching? It shouldn't work at all, should it, since none of the factors I outlined above are any different whether I use the @cache_page decorator or the per-site cache. Well, the sad news is that caching does "work" when I use cache_page, and that's not a good thing:

* @cache_page caches the direct output of the view/render function. It skips over the middleware that might have very good reason to introduce Vary headers, and it doesn't introduce any Vary headers of its own. So now, with this applied, I *am* serving a cached version of this page even though I absolutely should not be. Some poor user's token is now being sent to everybody. My only chance of redemption is if I happen to have read the docs and discovered that this incantation is required to prevent cache_page from improperly caching the page:

    from django.views.decorators.cache import cache_page
    from django.views.decorators.csrf import csrf_protect

    @cache_page(60 * 15)
    @csrf_protect
    def my_view(request):
        # ...

Of course, the above just puts me right back where I started at the per-site level.
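
To make the per-visitor fragmentation described above concrete, here is a toy model of a cache whose key is the URL path plus the request's value for each header named in Vary. It is only a sketch: `toy_cache_key`, the visitor dictionaries, and all cookie values are made up for illustration, and Django's real key construction differs in detail.

```python
import hashlib


def toy_cache_key(path, request_headers, vary_on):
    """Toy cache key: the path plus the request's value for each header in Vary.

    This is a simplified stand-in, not Django's actual key-building code.
    """
    digest = hashlib.sha256()
    for name in sorted(h.lower() for h in vary_on):
        digest.update(f"{name}={request_headers.get(name, '')};".encode())
    return f"{path}.{digest.hexdigest()}"


# Hypothetical anonymous visitors; header names are lowercased for simplicity.
visitor_a = {"cookie": "sessionid=aaa111; csrftoken=tok-a"}
visitor_b = {"cookie": "sessionid=bbb222; csrftoken=tok-b"}
# Same visitor on a later request, with one analytics cookie value changed.
visitor_a_again = {"cookie": "sessionid=aaa111; csrftoken=tok-a; __utmb=2"}

# Without Vary: Cookie, everyone shares one cache entry.
shared = toy_cache_key("/blog/1/", visitor_a, []) == toy_cache_key("/blog/1/", visitor_b, [])

# With Vary: Cookie, each visitor gets a separate entry...
split = toy_cache_key("/blog/1/", visitor_a, ["Cookie"]) != toy_cache_key("/blog/1/", visitor_b, ["Cookie"])

# ...and any change in the Cookie header is a miss, even for the same visitor.
miss = toy_cache_key("/blog/1/", visitor_a, ["Cookie"]) != toy_cache_key("/blog/1/", visitor_a_again, ["Cookie"])

print(shared, split, miss)  # prints: True True True
```

In this model, as soon as the response varies on Cookie, the number of cache entries per page grows with the number of distinct Cookie headers rather than staying at one.
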
There was never any chance of making cache_page work any differently from the per-site cache, but it is certainly a temptation for a hurried developer who is frustrated that the per-site cache isn't working and "thankful" that the cache starts "working" with the cache_page decorator.

Hopefully the above example makes it clear how all of the seemingly minor bugs and imperfections add up to a broken situation for someone coming to this with a pretty standard set of expectations and requirements.

Anyhow, the good news is that a good portion of what I have written about already has open tickets, which in some cases are close to being ready for checkin:

* Google Analytics is a known issue with a proposed patch: https://code.djangoproject.com/ticket/9249
* CSRF is known not to play nicely with caching; that, at least, is documented: https://docs.djangoproject.com/en/dev/ref/contrib/csrf/#caching
* The actual underlying cache_page issue is ticketed: https://code.djangoproject.com/ticket/15855

Still, I can't help but feel that, to an extent, these are band-aids. There is still an exceptionally narrow set of circumstances that would allow me to serve a single cached page to all anonymous visitors to my site: namely, I can't touch request.user and I can't use CSRF. Quite honestly, I'm not even sure you should be using a framework like Django if most of your pages don't have logic pertaining to a logged-in vs. anonymous user, or some kind of form that requires CSRF protection. Even if all of the above tickets got fixed, it seems like we're still in kind of a bad place.

I don't know that I have good solutions to any of this (though I am very much willing to contribute work toward such a solution). I do have a few ideas/questions to pose in conclusion:

* Is it reasonable to set as a goal that Django should attempt to support per-site caching for the scenario I described above?
  I mean, am I wrong in thinking that in an ideal world, it should be possible to serve the same cached page to all anonymous users most of the time, even if there are forms or anonymous vs. logged-in user logic on it?

* Is an embedded token the only form CSRF protection can take? Why can't the token be set as a cookie and the value of that cookie serve as the CSRF verification (without varying on it in the cache, obviously)? Or perhaps there's a way to dynamically generate a CSRF token via ajax after the page load? I'm certain someone much smarter and more knowledgeable than I will point out why these are dreadfully horrible, unworkable ideas, but the embedded token is sort of a deal breaker for effective caching, and these days many, many sites have forms on almost every page (e.g. a hidden login form that's revealed when you press login, a comment form, etc.).

* Why does the cache have to vary on the cookie when the request user object is touched in the template, even though that user is not authenticated? If the sessionid isn't even in the request cookie (i.e. for a first-time visitor), then it doesn't require a real "check" of the session. And correct me if I'm wrong, but doesn't the session key get cycled when a user logs in anyway? In other words, a session key that represents an anonymous user will *always* represent an anonymous user. Perhaps there's a way to keep track of those anonymous session ids so the same anonymous cached view can be served to them all. What a waste to generate the entire page dynamically for each individual anonymous user, all because of one simple key lookup. Again, this is probably a hopelessly naive idea with a sensible, obvious rebuttal, but perhaps there is some merit in coming up with a creative solution?
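
One way to picture the last idea, purely as a sketch and not as anything Django or the tickets above actually do: reduce the Cookie header to only the part that should influence the cache key. The helper name `cookie_key_part`, the `__utm` prefix for analytics cookies, and the use of `sessionid` as the session cookie name are all assumptions made for illustration. The sketch also deliberately ignores the embedded CSRF token problem, which would need its own answer.

```python
from http.cookies import SimpleCookie

# Assumptions for illustration: the session cookie is named "sessionid" and
# analytics cookies share the "__utm" prefix.
SESSION_COOKIE = "sessionid"
IGNORED_PREFIXES = ("__utm",)


def cookie_key_part(cookie_header):
    """Return the portion of a Cookie header that should affect the cache key."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    relevant = {
        name: morsel.value
        for name, morsel in jar.items()
        if not name.startswith(IGNORED_PREFIXES)
    }
    # No session cookie at all: treat every such request as the same
    # anonymous visitor, so they can all share one cached page.
    if SESSION_COOKIE not in relevant:
        return "anonymous"
    return ";".join(f"{name}={value}" for name, value in sorted(relevant.items()))


# Two first-time visitors with different analytics cookies share a key part.
print(cookie_key_part("__utma=1.2.3; __utmb=7") == cookie_key_part("__utma=9.8.7"))
# A visitor with a session cookie still gets a key part of their own.
print(cookie_key_part("sessionid=abc; __utmb=7"))
```

A cache keyed on this reduced value instead of the raw Cookie header would stop fragmenting on analytics cookies, at the cost of having to decide very carefully which cookies are safe to ignore.
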

I have to guess some of you have already spent some brain cycles thinking about the issues I've raised, in whole or in part, and I apologize if I'm rehashing an old debate, or if I'm so totally off-base that I've wasted your time if you made it this far. My intent, again, is not to complain, but to see if others agree that the current state of the per-site cache is not so great, and if so, to elicit some ideas on how best to address it. It also seems to me that there is more than one problem standing in the way, so "success" might require something of a coordinated effort.

Please do let me know if my concerns make sense, if my goal is a legitimate one, if I'm wrong in part or in whole, etc. As I said earlier, if there's a path forward on any of the above, I am happy to contribute to the effort.

Thanks for listening.

-- 
You received this message because you are subscribed to the Google Groups "Django developers" group.
To view this discussion on the web visit https://groups.google.com/d/msg/django-developers/-/G7iNJsARF4IJ.