Hi devs, When prefetching related items for a queryset returning a large amount of items, the generated SQL can be quite inefficient. Here's an example:
class Category(models.Model): type = models.PositiveIntegerField(db_index=True) class Item(models.Model): category = models.ForeignKey(Category, related_name='%(class)ss') If I have 50.000 categories of type=5, then "Category.objects.filter(type=5).prefetch_related('items')" will generate an SQL on the Item table with an IN clause containing the 50.000 Category ID's. This is because get_prefetch_queryset() on the Manager classes in django.db.models.related all do this: def get_prefetch_queryset(self, instances, queryset=None): [...] query = {'%s__in' % self.query_field_name: instances} queryset = queryset._next_is_sticky().filter(**query) [...] While this is great when instances is a short list, we can hope to do better than that when instances is large. A much more efficient query in the above case would be queryset._next_is_sticky().filter(category__type=5). We just need to make sure this alternative query will fetch (at least) the same items as the query with the IN clause would. I thought I could use Prefetch('items', queryset=Item.objects.filter(category__type=5))to accomplish this, but while the custom queryset *is* used, get_prefetch_queryset() still adds the IN clause unconditionally. This is A) redundant B) sends a huge SQL string over the wire for the database to process, and C) seriously messes up the database query planner, often generating an inefficient execution plan. I would like to fix this problem. The easiest way seems to be to let Prefetch() tell get_prefetch_queryset() to skip the IN clause. django.db.models.prefetch_one_level() is in charge of passing the Prefetch queryset to get_prefetch_queryset(), so one option would be to add a skip_in_clause attribute to the Prefetch model which prefetch_one_level() would pass to get_prefetch_queryset(). The latter could then do: def get_prefetch_queryset(self, instances, queryset=None, skip_in_clause= False): [...] if not skip_in_clause: query = {'%s__in' % self.query_field_name: instances} queryset = queryset._next_is_sticky().filter(**query) [...] In my preliminary testing on real data, this: Category.objects.filter(type=5).prefetch_related(Prefetch('items', queryset= Item.objects.filter(category__type=5), skip_in_clause=True)) is about 20x faster than the current implementation when the query returns 50.000 categories. The improvement would be greater on larger datasets. Any comments? If needed, I can provide a pull request with a suggested implementation. Apart from the Prefetch class and prefetch_one_level(), there are 4 instances of get_prefetch_queryset() in django.db.models.related that would need to be changed. Thanks, Erik -- You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To post to this group, send email to django-developers@googlegroups.com. Visit this group at http://groups.google.com/group/django-developers. To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/8fe6b600-854c-491e-9acf-e13399d1eafb%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.