#33015: QuerySet.extra Use Case: use db indexes when filtering on large lists of
tuples
-------------------------------------+-------------------------------------
               Reporter:  kris-      |          Owner:  nobody
  swann                              |
                   Type:             |         Status:  new
  Uncategorized                      |
              Component:  Database   |        Version:  2.2
  layer (models, ORM)                |
               Severity:  Normal     |       Keywords:  QuerySet.extra
           Triage Stage:             |      Has patch:  0
  Unreviewed                         |
    Needs documentation:  0          |    Needs tests:  0
Patch needs improvement:  0          |  Easy pickings:  0
                  UI/UX:  0          |
-------------------------------------+-------------------------------------
 Hello,

 While reading through the documentation for .extra(..) it looked like use
 cases should be logged as a ticket here -- so here is that.

 In an effort to speed up some slow queries that we've been encountering
 I've been looking into taking advantage of db indexes.

 Basically we have run into the same issue as this SO question:
 https://stackoverflow.com/questions/23991090/django-filter-multiple-
 columns-with-a-list-of-tuples

 We are using Postgres as our backing db.

 We have a model that looks like this

 {{{
 class ExampleModel(models.Model):
     val1 = models.TextField()
     val2 = models.TextField()
     # etc...

     class Meta:
         indexes = [
             models.Index(fields=['val1', 'val2'])
         ]
 }}}


 For demonstration purposes, we'll fill it with sample data

 {{{
 sample_data = [
     ExampleModel(val1=str(v1), val2=str(v2))
     for v1 in range(0,1000)
     for v2 in range(0,1000)
 ]
 ExampleModel.objects.bulk_create(sample_data)
 }}}

 Using a or-chained Q object is significantly slower

 Note: In practice this can be a list of any arbitrary combinations, but
 for ease of generation, we'll use an easy to generate list of tuples
 {{{
 items_to_fetch = [
     (i, j)
     for i in range(0,100)
     for j in range(0,100)
 ]
 }}}


 Using an or-chained Q object:
 {{{
 import time
 query = Q()
 for v1, v2 in items_to_fetch:
     query |= Q(val1=v1, val2=v2)
 start = time.perf_counter()
 ExampleModel.objects.filter(query)
 end = time.perf_counter()
 print(f"{end - start} seconds")

 >>> OUTPUT:
 >>> 43.73997112699999 seconds
 }}}

 VS.

 Using .extra(...)
 {{{
 import time
 start = time.perf_counter()
 ExampleModel.objects.extra(
     where=["(val1, val2) in %s"],
     params=[tuple(items_to_fetch)]
 )
 end = time.perf_counter()
 print(f"{end - start} seconds")

 >>> OUTPUT:
 >>> 0.0004218950000449695 seconds
 }}}



 If there are any alternatives worth trying out, that would be really
 helpful info. I wasn't able to find anything in the docs (although I might
 have just been looking in the wrong places too).

-- 
Ticket URL: <https://code.djangoproject.com/ticket/33015>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/053.c10780c7e581f6f9f1af286899bb91dc%40djangoproject.com.

Reply via email to