#33025: Avoid accessing ConnectionsHandler.__getitem__ in Query.build_lookup
unless
necessary, for performance.
-------------------------------------+-------------------------------------
Reporter: Keryn | Owner: Keryn Knight
Knight |
Type: | Status: assigned
Cleanup/optimization |
Component: Database | Version: dev
layer (models, ORM) |
Severity: Normal | Keywords:
Triage Stage: | Has patch: 1
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
Currently, `build_lookup` contains:
{{{
# For Oracle '' is equivalent to null. The check must be done at this
# stage because join promotion can't be done in the compiler. Using
# DEFAULT_DB_ALIAS isn't nice but it's the best that can be done here.
# A similar thing is done in is_nullable(), too.
if
(connections[DEFAULT_DB_ALIAS].features.interprets_empty_strings_as_nulls
and
lookup_name == 'exact' and lookup.rhs == ''):
return lhs.get_lookup('isnull')(lhs, True)
}}}
but going through `connections` requires accessing the thread critical
`asgiref.Local` which is only strictly necessary if the cheap comparisons
are truthy and can be short-circuited like so:
{{{
if (lookup_name == 'exact' and lookup.rhs == '' and
connections[DEFAULT_DB_ALIAS].features.interprets_empty_strings_as_nulls):
return lhs.get_lookup('isnull')(lhs, True)
}}}
This is coming off the back of the report in #33015, which is a nice
pathological case we can use:
{{{
In [1]: from django.db.models import Q
In [2]: from testapp.models import ExampleModel
In [3]: items_to_fetch = [(i, j) for i in range(0,100) for j in
range(0,100)]
In [4]: query = Q()
...: for v1, v2 in items_to_fetch:
...: query |= Q(val1=v1, val2=v2)
%timeit x = ExampleModel.objects.filter(query)
644 ms ± 6.26 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
}}}
If we cProfile that `ExampleModel.objects.filter(query)` we get something
like:
{{{
ncalls tottime percall cumtime percall filename:lineno(function)
30001/1 0.178 0.000 1.268 1.268 query.py:1221(build_filter)
40000 0.142 0.000 0.157 0.000 query.py:1457(names_to_path)
10002/1 0.077 0.000 1.268 1.268 query.py:1386(_add_q)
20000 0.046 0.000 0.379 0.000 query.py:1156(build_lookup)
}}}
and if we do `%lprun -f Query.build_filter -f Query.build_lookup
ExampleModel.objects.filter(query)` on it we'll get (snipped for brevity):
{{{
Function: build_filter at line 1221
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1329 20000 651989.0 32.6 24.9 condition =
self.build_lookup(lookups, col, value)
Function: build_lookup at line 1156
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1195 20000 200730.0 10.0 38.1 if
(connections[DEFAULT_DB_ALIAS].features.interprets_empty_strings_as_nulls
and
1196
lookup_name == 'exact' and lookup.rhs == ''):
1197 return
lhs.get_lookup('isnull')(lhs, True
}}}
Because I'm using sqlite, that 38% of time is essentially free to reclaim
by switching the if condition as above, in which case we get cProfile
output like so:
{{{
ncalls tottime percall cumtime percall filename:lineno(function)
30001/1 0.171 0.000 1.082 1.082 query.py:1220(build_filter)
40000 0.145 0.000 0.159 0.000 query.py:1456(names_to_path)
10002/1 0.076 0.000 1.082 1.082 query.py:1385(_add_q)
20000 0.042 0.000 0.177 0.000 query.py:1559(setup_joins)
220005 0.039 0.000 0.064 0.000 {built-in method
builtins.isinstance}
20000 0.037 0.000 0.223 0.000 query.py:1156(build_lookup)
}}}
and line profile information like so:
{{{
Function: build_filter at line 1220
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1328 20000 447682.0 22.4 18.9 condition =
self.build_lookup(lookups, col, value)
Function: build_lookup at line 1156
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1195 20000 12744.0 0.6 3.9 if (lookup_name
== 'exact' and lookup.rhs == '' and
1196
connections[DEFAULT_DB_ALIAS].features.interprets_empty_strings_as_nulls):
1197 return
lhs.get_lookup('isnull')(lhs, True
}}}
which ultimately changes the timed performance for non oracle backends to:
{{{
In [5]: %timeit x = ExampleModel.objects.filter(query)
547 ms ± 3.59 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
}}}
(YMMV; it's being tested on a laptop and I've seen it ranging from 519ms
to 600ms average.)
Patch will be forthcoming as a PR shortly, to confirm it doesn't break CI.
--
Ticket URL: <https://code.djangoproject.com/ticket/33025>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-updates/052.e6e48a16690562614c7ee62bcad19ba6%40djangoproject.com.