#32970: Investigate feasibility of improving WhereNode clone performance
-------------------------------------+-------------------------------------
     Reporter:  Keryn Knight         |      Owner:  Keryn Knight
         Type:  Cleanup/optimization |     Status:  assigned
    Component:  Database layer       |    Version:  dev
                (models, ORM)        |
     Severity:  Normal               |   Keywords:
 Triage Stage:  Unreviewed           |  Has patch:  1
 Needs documentation:  0             |  Needs tests:  0
 Patch needs improvement:  0         |  Easy pickings:  0
        UI/UX:  0                    |
-------------------------------------+-------------------------------------
This relates to #32948 insofar as it's about figuring out how shallow or
deep a copy needs to be when cloning `Node` instances/subclasses, and a
bit to #28455, which acknowledges the need to sometimes avoid cloning
because it's costly.
The PR I made for optimising Q combining/inverting
(https://github.com/django/django/pull/14673) wants to introduce a
`__copy__` method onto `Q` objects to indicate what level of operation is
needed. Unlike that PR, it's decidedly not possible to take the same
approach with WhereNode and just do `clone.__dict__ =
self.__dict__.copy()` (things broke horribly) because of attributes like
`contains_aggregate` and `output_field` which are cached properties on the
WhereNode instance. Also `resolved` gets set after the fact by
`resolve_expression`. But, as far as I can tell, the looping/testing/child
cloning ''may be removable''. Perhaps once upon a time they were required
and other components changed such that it's now safe to consider. I can't
readily say, but initial experimentation suggests it's OK to look at, at
least in theory.
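To illustrate why blindly copying `__dict__` goes wrong here, a minimal sketch. The `FakeWhereNode` and `FakeAggregate` classes below are hypothetical stand-ins, not Django's real classes: a primed `cached_property` value rides along in the copied `__dict__` and goes stale on the clone.
{{{
from functools import cached_property

class FakeWhereNode:
    # Hypothetical stand-in for WhereNode, just to show the hazard;
    # the real class lives in django.db.models.sql.where.
    def __init__(self, children):
        self.children = children

    @cached_property
    def contains_aggregate(self):
        # Stands in for the real (expensive) traversal of child expressions.
        return any(getattr(child, "is_aggregate", False) for child in self.children)

class FakeAggregate:
    is_aggregate = True

original = FakeWhereNode(children=[])
assert original.contains_aggregate is False  # primes the cache in __dict__

# The "fast" clone copies the cached value along with everything else:
clone = FakeWhereNode.__new__(FakeWhereNode)
clone.__dict__ = original.__dict__.copy()
clone.children = clone.children + [FakeAggregate()]

# Stale: the cached False came along for the ride, despite the new children.
assert clone.contains_aggregate is False
# A freshly built node with the same children gives the right answer:
assert FakeWhereNode(clone.children).contains_aggregate is True
}}}
The same class of problem applies to `output_field` and to `resolved` being set after the fact by `resolve_expression`.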
There is a PR already for this, here:
https://github.com/django/django/pull/14709 which is currently marked
draft/WIP because I couldn't easily run the whole test suite locally and
needed to see if the more exotic parts caused problems. They didn't, which
was unexpected, so here I am. The PR currently replaces the
`WhereNode.clone` method like so:
{{{
clone = self.__class__._new_instance(
    children=[], connector=self.connector, negated=self.negated)
for child in self.children:
    if hasattr(child, 'clone'):
        clone.children.append(child.clone())
    else:
        clone.children.append(child)
}}}
with:
{{{
clone = self.__class__._new_instance(
    children=None, connector=self.connector, negated=self.negated)
clone.children = self.children[:]
}}}
But I ''think'' that Q and WhereNode ''can'' both just return a shallow
copy that only copies the children list (the child objects themselves are
shared), at which point the `__copy__` method could be implemented on
`Node` directly, mirroring the existing `__deepcopy__` method. Preferring
those stdlib names draws a line in the sand over what level of copy should
be expected; the existing `clone` and `copy` methods can become aliases
for it.
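As a rough sketch of what that could look like - this `Node` is a simplified stand-in, not Django's actual class - note that the stdlib `copy.copy()` dispatches to `__copy__`, and that the children ''list'' is copied while the child objects themselves are shared:
{{{
import copy

class Node:
    # Simplified stand-in for django.db.models.tree.Node, to sketch the idea.
    def __init__(self, children=None, connector="AND", negated=False):
        self.children = children if children is not None else []
        self.connector = connector
        self.negated = negated

    def __copy__(self):
        clone = self.__class__(children=None, connector=self.connector,
                               negated=self.negated)
        # New list, same child objects: a properly shallow copy.
        clone.children = self.children[:]
        return clone

    clone = __copy__  # the existing API name becomes a plain alias

child = ("username", "test")
node = Node(children=[child])
shallow = copy.copy(node)  # the stdlib function dispatches to __copy__

assert shallow.children is not node.children  # separate lists...
assert shallow.children[0] is child           # ...sharing the same children
shallow.children.append(("email", "x"))
assert len(node.children) == 1                # mutation doesn't leak back
}}}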
=== Before any changes ===
Let's now examine the before and after with as much data as I can easily
gather. First, a bit of prelude:
{{{
In [1]: from django.db.models.sql.where import WhereNode
In [2]: from django.db.models import QuerySet
In [3]: from django.db.models.sql import Query
In [4]: from django.contrib.auth.models import User
In [5]: x = User.objects.all()
In [6]: y = User.objects.filter(username='test', email='[email protected]').exclude(username='test').filter(email='nottest')
In [7]: %load_ext line_profiler
}}}
Now let's establish a ballpark for how much time is spent where; I've
stripped the line profiling down to just the relevant bits for clarity.
==== Queryset with no clauses (`x`) ====
{{{
In [8]: %lprun -f QuerySet._chain -f QuerySet._clone -f Query.chain -f Query.clone -f WhereNode.clone for _ in range(1000): x._chain()
Function: QuerySet._chain
Line #   Hits      Time  Per Hit  % Time  Line Contents
==============================================================
  1325                                    def _chain(self, **kwargs):
   ...
  1330    1000   88169.0     88.2    97.8      obj = self._clone()
Function: QuerySet._clone
Line #   Hits      Time  Per Hit  % Time  Line Contents
==============================================================
  1337                                    def _clone(self):
   ...
  1342    1000   80796.0     80.8    86.3      chained = self.query.chain()
  1343    1000    7240.0      7.2     7.7      c = self.__class__(model=self.model, query=chained, using=self._db, hints=self._hints)
Function: Query.chain
Line #   Hits      Time  Per Hit  % Time  Line Contents
==============================================================
   341                                    def chain(self, klass=None):
   ...
   346    2000  116961.0     58.5    94.8      obj = self.clone()
Function: Query.clone
Line #   Hits      Time  Per Hit  % Time  Line Contents
==============================================================
   290                                    def clone(self):
   ...
   304    2000   18294.0      9.1    27.0      obj.where = self.where.clone()
Function: WhereNode.clone
Line #   Hits      Time  Per Hit  % Time  Line Contents
==============================================================
   142                                    def clone(self):
   ...
   148    4000    6227.0      1.6    63.7      clone = self.__class__._new_instance(
   149    2000    1170.0      0.6    12.0          children=[], connector=self.connector, negated=self.negated)
   150    2000    1407.0      0.7    14.4      for child in self.children:
   151                                             if hasattr(child, 'clone'):
   152                                                 clone.children.append(child.clone())
   153                                             else:
   154                                                 clone.children.append(child)
   155    2000     969.0      0.5     9.9      return clone
}}}
==== Queryset with where clauses (`y`) ====
You can see the cost begin to go up:
{{{
In [9]: %lprun -f QuerySet._chain -f QuerySet._clone -f Query.chain -f Query.clone -f WhereNode.clone for _ in range(1000): y._chain()
Function: Query.clone
Line #   Hits      Time  Per Hit  % Time  Line Contents
==============================================================
   290                                    def clone(self):
   ...
   304    2000   60065.0     30.0    59.1      obj.where = self.where.clone()
Function: WhereNode.clone
Line #   Hits      Time  Per Hit  % Time  Line Contents
==============================================================
   142                                    def clone(self):
   148    8000   11473.0      1.4    33.1      clone = self.__class__._new_instance(
   149    4000    2379.0      0.6     6.9          children=[], connector=self.connector, negated=self.negated)
   150   14000    7241.0      0.5    20.9      for child in self.children:
   151   10000    5553.0      0.6    16.0          if hasattr(child, 'clone'):
   152    2000    1255.0      0.6     3.6              clone.children.append(child.clone())
   153                                             else:
   154    8000    5069.0      0.6    14.6              clone.children.append(child)
   155    4000    1715.0      0.4     4.9      return clone
}}}
And the timings for those two querysets (`x` and `y`), plus an example of
prefetching related data:
{{{
In [10]: %timeit x._chain()
6.43 µs ± 52 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [11]: %timeit y._chain()
8.24 µs ± 35.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [12]: %timeit y.query.where.clone()
2.79 µs ± 38.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [13]: %prun for _ in range(100000): y._chain()
3400003 function calls (3300003 primitive calls) in 1.506 seconds
   ncalls         tottime  percall  cumtime  percall  filename:lineno(function)
   100000           0.357    0.000    1.007    0.000  query.py:290(clone)
   200000/100000    0.301    0.000    0.539    0.000  where.py:142(clone)  <-- WhereNode
   100000           0.172    0.000    1.344    0.000  query.py:1337(_clone)
   100000           0.068    0.000    1.086    0.000  query.py:341(chain)
   100000           0.058    0.000    1.415    0.000  query.py:1325(_chain)
In [14]: %timeit tuple(User.objects.prefetch_related('groups__permissions', 'user_permissions'))
18.2 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
}}}
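The shape of the difference can be reproduced outside Django with a standalone sketch; this `Node` and its two clone methods are hypothetical stand-ins mirroring the old and proposed implementations, and the exact timings will vary by machine:
{{{
import timeit

class Node:
    # Simplified stand-in used only to reproduce the shape of the change
    # outside Django; not the real WhereNode.
    def __init__(self, children, connector="AND", negated=False):
        self.children = [] if children is None else children
        self.connector = connector
        self.negated = negated

    def clone(self):
        # The old approach: per-child hasattr test and recursive clone.
        clone = Node([], self.connector, self.negated)
        for child in self.children:
            if hasattr(child, "clone"):
                clone.children.append(child.clone())
            else:
                clone.children.append(child)
        return clone

    def clone_shallow(self):
        # The proposed approach: one list slice, child objects shared.
        clone = Node(None, self.connector, self.negated)
        clone.children = self.children[:]
        return clone

# Roughly mirrors the `y` queryset: a few lookups plus a nested node.
node = Node([("username", "test"), ("email", "[email protected]"), Node([("pk", 1)])])
old = timeit.timeit(node.clone, number=100_000)
new = timeit.timeit(node.clone_shallow, number=100_000)
print(f"loop clone: {old:.3f}s  slice clone: {new:.3f}s")
}}}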
=== After replacing the clone method ===
Given those data points, here are the same bits of line profiling, showing
only those where the needle has moved:
==== Queryset with no clauses (`x`) ====
{{{
Function: Query.clone
Line #   Hits      Time  Per Hit  % Time  Line Contents
==============================================================
   290                                    def clone(self):
   ...
   304    2000   14134.0      7.1    22.8      obj.where = self.where.clone()
Function: WhereNode.clone
Line #   Hits      Time  Per Hit  % Time  Line Contents
==============================================================
   142                                    def clone(self):
   ...
   148    2000    5586.0      2.8    71.0      clone = self.__class__._new_instance(children=None, connector=self.connector, negated=self.negated)
   149    2000    1412.0      0.7    18.0      clone.children = self.children[:]
   150    2000     865.0      0.4    11.0      return clone
}}}
==== Queryset with where clauses (`y`) ====
{{{
Function: Query.clone
Line #   Hits      Time  Per Hit  % Time  Line Contents
==============================================================
   290                                    def clone(self):
   ...
   304    2000   13730.0      6.9    24.7      obj.where = self.where.clone()
Function: WhereNode.clone
Line #   Hits      Time  Per Hit  % Time  Line Contents
==============================================================
   142                                    def clone(self):
   ...
   148    2000    5273.0      2.6    69.0      clone = self.__class__._new_instance(children=None, connector=self.connector, negated=self.negated)
   149    2000    1498.0      0.7    19.6      clone.children = self.children[:]
   150    2000     871.0      0.4    11.4      return clone
}}}
And the timings: in the empty case (`x`) it's ''basically'' just paying
the cost of class creation, but where there are clauses the time is more
consistent (i.e. ''nearer'' constant time than linear growth).
{{{
In [10]: %timeit x._chain()
6.51 µs ± 47.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [11]: %timeit y._chain()
6.25 µs ± 36.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [12]: %timeit y.query.where.clone()
1.16 µs ± 9.09 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [13]: %prun for _ in range(100000): y._chain()
3800003 function calls in 1.926 seconds
   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
   100000    0.355    0.000    0.606    0.000  query.py:290(clone)
   100000    0.169    0.000    0.939    0.000  query.py:1337(_clone)
   100000    0.068    0.000    0.139    0.000  where.py:142(clone)  <-- WhereNode
   100000    0.067    0.000    0.683    0.000  query.py:341(chain)
   100000    0.058    0.000    1.009    0.000  query.py:1325(_chain)
In [14]: %timeit tuple(User.objects.prefetch_related('groups__permissions', 'user_permissions'))
17.9 ms ± 345 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
}}}
Gut feeling says the "final" implementation could be something like the
following, directly on the `Node` class.
{{{
def __copy__(self):
    clone = self._new_instance(children=None, connector=self.connector,
                               negated=self.negated)
    clone.children = self.children[:]
    return clone
copy = __copy__
clone = __copy__

def __deepcopy__(self, memo):
    ...
    clone.children = copy.deepcopy(self.children, memo)
    ...
}}}
which would remove the need for methods on either subclass. Note also that
aliasing the methods that way, rather than doing `def copy(self): return
self.clone()` as `WhereNode` currently does, is ''slightly'' quicker.
We're talking `0.10µs` - small potatoes, but it doesn't hurt, and it's one
fewer stack frame.
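The difference between the two styles can be sketched with two hypothetical classes (neither is Django's real code); the alias makes all the names resolve to the very same function object, while the wrapper pays for an extra call on every invocation:
{{{
import timeit

class Wrapped:
    # Mirrors the current WhereNode style: copy() delegates to clone().
    def clone(self):
        return self.__class__()

    def copy(self):
        return self.clone()  # an extra stack frame on every call

class Aliased:
    # The proposed style: one underlying function bound to several names.
    def __copy__(self):
        return self.__class__()

    copy = __copy__   # same function object, no delegation
    clone = __copy__

# All three names resolve to the very same function:
assert Aliased.copy is Aliased.__copy__ is Aliased.clone

w, a = Wrapped(), Aliased()
print(f"wrapper: {timeit.timeit(w.copy, number=100_000):.3f}s")
print(f"alias:   {timeit.timeit(a.copy, number=100_000):.3f}s")
}}}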
A final note then: if this properly shallow clone is done, I suspect
`resolve_expression` can be altered too, to avoid calling `_resolve_node`
if it's already been done on `self` - the shallow copy would ensure that
anything previously resolved is kept around and doesn't need doing again,
I think.
Other stats: of the 14,000 tests I can run easily, only once was
`resolved` present in `self.__dict__` when `clone` was called (the value
was `True`). `contains_aggregate` turned up 209 times (207 with a value of
`False`, 2 with `True`), and `output_field` appeared only twice (both
`BooleanField`). Those are the only extra attributes I spotted.
--
Ticket URL: <https://code.djangoproject.com/ticket/32970>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.