#33015: QuerySet.extra Use Case: use db indexes when filtering on large lists of
tuples
-------------------------------------+-------------------------------------
Reporter: kris- | Owner: nobody
swann |
Type: | Status: new
Uncategorized |
Component: Database | Version: 2.2
layer (models, ORM) |
Severity: Normal | Keywords: QuerySet.extra
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
Hello,
While reading through the documentation for .extra(..) it looked like use
cases should be logged as a ticket here -- so here is that.
In an effort to speed up some slow queries that we've been encountering
I've been looking into taking advantage of db indexes.
Basically we have run into the same issue as this SO question:
https://stackoverflow.com/questions/23991090/django-filter-multiple-
columns-with-a-list-of-tuples
We are using Postgres as our backing db.
We have a model that looks like this
{{{
class ExampleModel(models.Model):
val1 = models.TextField()
val2 = models.TextField()
# etc...
class Meta:
indexes = [
models.Index(fields=['val1', 'val2'])
]
}}}
For demonstration purposes, we'll fill it with sample data
{{{
sample_data = [
ExampleModel(val1=str(v1), val2=str(v2))
for v1 in range(0,1000)
for v2 in range(0,1000)
]
ExampleModel.objects.bulk_create(sample_data)
}}}
Using a or-chained Q object is significantly slower
Note: In practice this can be a list of any arbitrary combinations, but
for ease of generation, we'll use an easy to generate list of tuples
{{{
items_to_fetch = [
(i, j)
for i in range(0,100)
for j in range(0,100)
]
}}}
Using an or-chained Q object:
{{{
import time
query = Q()
for v1, v2 in items_to_fetch:
query |= Q(val1=v1, val2=v2)
start = time.perf_counter()
ExampleModel.objects.filter(query)
end = time.perf_counter()
print(f"{end - start} seconds")
>>> OUTPUT:
>>> 43.73997112699999 seconds
}}}
VS.
Using .extra(...)
{{{
import time
start = time.perf_counter()
ExampleModel.objects.extra(
where=["(val1, val2) in %s"],
params=[tuple(items_to_fetch)]
)
end = time.perf_counter()
print(f"{end - start} seconds")
>>> OUTPUT:
>>> 0.0004218950000449695 seconds
}}}
If there are any alternatives worth trying out, that would be really
helpful info. I wasn't able to find anything in the docs (although I might
have just been looking in the wrong places too).
--
Ticket URL: <https://code.djangoproject.com/ticket/33015>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-updates/053.c10780c7e581f6f9f1af286899bb91dc%40djangoproject.com.