#32970: Investigate feasibility of improving WhereNode clone performance
-------------------------------------+-------------------------------------
     Reporter:  Keryn Knight         |                    Owner:  Keryn Knight
         Type:  Cleanup/optimization |                   Status:  assigned
    Component:  Database layer       |                  Version:  dev
                (models, ORM)        |
     Severity:  Normal               |               Resolution:
     Keywords:                       |             Triage Stage:  Unreviewed
    Has patch:  1                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Description changed by Keryn Knight:
New description:
This relates to #32948 insofar as it's about figuring out how shallow or
deep a copy is necessary for cloning `Node` instances/subclasses. And a
bit to #28455 which acknowledges the need to avoid cloning sometimes due
to it being costly.
The PR I made for optimising Q combining/inverting
(https://github.com/django/django/pull/14673) wants to introduce a
`__copy__` method onto `Q` objects to indicate what level of operation is
needed. Unlike that PR, it's decidedly not possible to take the same
approach with WhereNode and just do `clone.__dict__ =
self.__dict__.copy()` (things broke horribly) because of attributes like
`contains_aggregate` and `output_field` which are cached properties on the
WhereNode instance. Also `resolved` gets set after the fact by
`resolve_expression`. But, as far as I can tell, the looping/testing/child
cloning ''may be removable''. Perhaps it was once required and other
components have since changed such that removing it is now safe to consider. I
can't readily say, but initial experimentation suggests it's worth looking at,
at least in theory.
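To illustrate why the `__dict__.copy()` shortcut breaks here: `cached_property` stores its computed value in the instance `__dict__`, so a dict copy drags stale caches onto the clone. A minimal sketch with a toy `Node` and a hypothetical aggregate-bearing child (names are stand-ins, not Django's actual classes):
{{{
from functools import cached_property

class Node:
    """Toy stand-in for WhereNode; illustrative only."""
    def __init__(self, children=None):
        self.children = children or []

    @cached_property
    def contains_aggregate(self):
        # Stand-in for the real cached property; inspects children once.
        return any(getattr(c, "contains_aggregate", False) for c in self.children)

    def naive_clone(self):
        # The approach that "broke horribly": cached values are copied too.
        clone = self.__class__.__new__(self.__class__)
        clone.__dict__ = self.__dict__.copy()
        return clone

class Agg:
    contains_aggregate = True  # hypothetical aggregate-bearing child

node = Node()
assert node.contains_aggregate is False   # computed and cached on the instance
node.children.append(Agg())               # node now *does* contain an aggregate
clone = node.naive_clone()
assert clone.contains_aggregate is False  # stale: the cached value came along
assert Node([Agg()]).contains_aggregate is True  # a fresh node computes correctly
}}}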
There is a PR already for this, here:
https://github.com/django/django/pull/14709 which is currently marked
draft/WIP because I couldn't easily run the whole test suite locally and
needed to see if the more exotic parts caused problems. They didn't, which
was unexpected, so here I am. The PR currently replaces the
`WhereNode.clone` method like so:
{{{
clone = self.__class__._new_instance(
    children=[], connector=self.connector, negated=self.negated)
for child in self.children:
    if hasattr(child, 'clone'):
        clone.children.append(child.clone())
    else:
        clone.children.append(child)
}}}
with:
{{{
clone = self.__class__._new_instance(
    children=None, connector=self.connector, negated=self.negated)
clone.children = self.children[:]
}}}
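The important property of the replacement is what `self.children[:]` actually shares: the clone gets its own list object (so appending to one side doesn't leak into the other), while the child objects inside it are shared rather than cloned. A standalone sketch, with plain `object()` sentinels standing in for child expressions:
{{{
# Hypothetical stand-ins for child expressions; only identity matters here.
child_a, child_b = object(), object()

original = [child_a, child_b]
clone = original[:]          # shallow copy: a new list holding the same objects

clone.append(object())       # mutate the clone's list...
assert len(original) == 2    # ...the original's list is unaffected
assert clone[0] is child_a   # but the child objects themselves are shared
}}}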
But I ''think'' that Q and WhereNode ''can'' both just return a shallow copy
by only copying the children list, at which point I think the `__copy__`
method could be implemented on `Node` directly and mirror the existing
`__deepcopy__` method. Preferring those stdlib names draws a line in the sand
over what level of copy should be expected. The existing `clone` and `copy`
methods can become aliases for it.
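As a sketch of that shape (a toy `Node` with a simplified constructor, not Django's actual `_new_instance`): implementing `__copy__` means the stdlib `copy.copy()` protocol dispatches to it automatically, and `clone`/`copy` can be plain aliases of the same function object.
{{{
import copy as copy_module  # avoid clashing with the `copy` alias below

class Node:
    """Toy version of django.db.models.tree.Node; simplified constructor."""
    def __init__(self, children=None, connector="AND", negated=False):
        self.children = children if children is not None else []
        self.connector = connector
        self.negated = negated

    def __copy__(self):
        # Shallow copy: fresh Node and fresh list, shared child objects.
        clone = self.__class__(connector=self.connector, negated=self.negated)
        clone.children = self.children[:]
        return clone

    # Stdlib name is canonical; the existing names become aliases of it.
    copy = __copy__
    clone = __copy__

n = Node(children=["a", "b"])
c1 = copy_module.copy(n)     # the stdlib protocol dispatches to __copy__
c2 = n.clone()               # the alias is the very same function
assert c1.children == n.children and c1.children is not n.children
assert c2.children == n.children and Node.clone is Node.__copy__
}}}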
=== Before any changes
Now let's examine the before and after with as much data as I can easily
gather. First, a bit of prelude:
{{{
In [1]: from django.db.models.sql.where import WhereNode
In [2]: from django.db.models import QuerySet
In [3]: from django.db.models.sql import Query
In [4]: from django.contrib.auth.models import User
In [5]: x = User.objects.all()
In [6]: y = User.objects.filter(username='test', email='[email protected]').exclude(username='test').filter(email='nottest')
In [7]: %load_ext line_profiler
}}}
Now let's establish a ballpark for how much time is spent where; I've
stripped the line profiling down to just the relevant bits for clarity.
==== Queryset with no clauses (`x`)
{{{
In [8]: %lprun -f QuerySet._chain -f QuerySet._clone -f Query.chain -f Query.clone -f WhereNode.clone for _ in range(1000): x._chain()

Function: QuerySet._chain
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  1325                                           def _chain(self, **kwargs):
   ...
  1330      1000      88169.0     88.2     97.8      obj = self._clone()

Function: QuerySet._clone
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  1337                                           def _clone(self):
   ...
  1342      1000      80796.0     80.8     86.3      chained = self.query.chain()
  1343      1000       7240.0      7.2      7.7      c = self.__class__(model=self.model, query=chained, using=self._db, hints=self._hints)

Function: Query.chain
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   341                                           def chain(self, klass=None):
   ...
   346      2000     116961.0     58.5     94.8      obj = self.clone()

Function: Query.clone
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   290                                           def clone(self):
   ...
   304      2000      18294.0      9.1     27.0      obj.where = self.where.clone()

Function: WhereNode.clone
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   142                                           def clone(self):
   ...
   148      4000       6227.0      1.6     63.7      clone = self.__class__._new_instance(
   149      2000       1170.0      0.6     12.0          children=[], connector=self.connector, negated=self.negated)
   150      2000       1407.0      0.7     14.4      for child in self.children:
   151                                                   if hasattr(child, 'clone'):
   152                                                       clone.children.append(child.clone())
   153                                               else:
   154                                                   clone.children.append(child)
   155      2000        969.0      0.5      9.9      return clone
}}}
==== Queryset with where clauses (`y`)
You can see the cost begin to go up:
{{{
In [9]: %lprun -f QuerySet._chain -f QuerySet._clone -f Query.chain -f Query.clone -f WhereNode.clone for _ in range(1000): y._chain()

Function: Query.clone
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   290                                           def clone(self):
   ...
   304      2000      60065.0     30.0     59.1      obj.where = self.where.clone()

Function: WhereNode.clone
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   142                                           def clone(self):
   148      8000      11473.0      1.4     33.1      clone = self.__class__._new_instance(
   149      4000       2379.0      0.6      6.9          children=[], connector=self.connector, negated=self.negated)
   150     14000       7241.0      0.5     20.9      for child in self.children:
   151     10000       5553.0      0.6     16.0          if hasattr(child, 'clone'):
   152      2000       1255.0      0.6      3.6              clone.children.append(child.clone())
   153                                               else:
   154      8000       5069.0      0.6     14.6          clone.children.append(child)
   155      4000       1715.0      0.4      4.9      return clone
}}}
and the timings for those 2 (`x` and `y`) querysets, plus an example of
prefetching related data:
{{{
In [10]: %timeit x._chain()
6.43 µs ± 52 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [11]: %timeit y._chain()
8.24 µs ± 35.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [12]: %timeit y.query.where.clone()
2.79 µs ± 38.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [13]: %prun for _ in range(100000): y._chain()
3400003 function calls (3300003 primitive calls) in 1.506 seconds
   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
   100000    0.357    0.000    1.007    0.000  query.py:290(clone)
200000/100000  0.301  0.000    0.539    0.000  where.py:142(clone) <-- WhereNode
   100000    0.172    0.000    1.344    0.000  query.py:1337(_clone)
   100000    0.068    0.000    1.086    0.000  query.py:341(chain)
   100000    0.058    0.000    1.415    0.000  query.py:1325(_chain)
In [14]: %timeit tuple(User.objects.prefetch_related('groups__permissions', 'user_permissions'))
18.2 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
}}}
=== After replacing the clone method
Given those data points, here are the same bits of line profiling, showing
only those where the needle has moved:
==== Queryset with no clauses (`x`)
{{{
Function: Query.clone
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   290                                           def clone(self):
   ...
   304      2000      14134.0      7.1     22.8      obj.where = self.where.clone()

Function: WhereNode.clone
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   142                                           def clone(self):
   ...
   148      2000       5586.0      2.8     71.0      clone = self.__class__._new_instance(children=None, connector=self.connector, negated=self.negated)
   149      2000       1412.0      0.7     18.0      clone.children = self.children[:]
   150      2000        865.0      0.4     11.0      return clone
}}}
==== Queryset with where clauses (`y`)
{{{
Function: Query.clone at line 290
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   290                                           def clone(self):
   ...
   304      2000      13730.0      6.9     24.7      obj.where = self.where.clone()

Function: WhereNode.clone
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   142                                           def clone(self):
   ...
   148      2000       5273.0      2.6     69.0      clone = self.__class__._new_instance(children=None, connector=self.connector, negated=self.negated)
   149      2000       1498.0      0.7     19.6      clone.children = self.children[:]
   150      2000        871.0      0.4     11.4      return clone
}}}
And the timings: in the empty case (`x`), it's ''basically'' just paying the
cost of class creation, but where there are clauses it is more consistent
(i.e. ''nearer'' constant time than linear growth).
{{{
In [10]: %timeit x._chain()
6.51 µs ± 47.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [11]: %timeit y._chain()
6.25 µs ± 36.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [12]: %timeit y.query.where.clone()
1.16 µs ± 9.09 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [13]: %prun for _ in range(100000): y._chain()
2100003 function calls in 1.087 seconds
   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
   100000    0.355    0.000    0.606    0.000  query.py:290(clone)
   100000    0.169    0.000    0.939    0.000  query.py:1337(_clone)
   100000    0.068    0.000    0.139    0.000  where.py:142(clone) <-- WhereNode
   100000    0.067    0.000    0.683    0.000  query.py:341(chain)
   100000    0.058    0.000    1.009    0.000  query.py:1325(_chain)
In [14]: %timeit tuple(User.objects.prefetch_related('groups__permissions', 'user_permissions'))
17.9 ms ± 345 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
}}}
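The move toward flatter timings makes sense: `self.children[:]` copies the pointer array at C speed, while the old loop pays Python-level `hasattr` and `append` costs per child. A standalone comparison of the two shapes (toy data, nothing from Django):
{{{
import timeit

# Toy children: plain ints have no 'clone' attribute, like raw lookup tuples.
children = list(range(50))

def loop_clone():
    # Shape of the old WhereNode.clone body: per-child Python-level work.
    out = []
    for child in children:
        if hasattr(child, 'clone'):
            out.append(child.clone())
        else:
            out.append(child)
    return out

def slice_clone():
    # Shape of the replacement: one C-level copy of the list.
    return children[:]

assert slice_clone() == loop_clone()      # identical result for clone-less children
loop_t = timeit.timeit(loop_clone, number=20_000)
slice_t = timeit.timeit(slice_clone, number=20_000)
assert slice_t < loop_t                   # slicing wins comfortably
}}}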
Gut feeling says the "final" implementation could be something like the
following, directly on the `Node` class.
{{{
def __copy__(self):
    clone = self._new_instance(children=None, connector=self.connector,
                               negated=self.negated)
    clone.children = self.children[:]
    return clone

copy = __copy__
clone = __copy__

def __deepcopy__(self, memo):
    ...
    clone.children = copy.deepcopy(self.children, memo)
    ...
}}}
which would remove the need for those methods on either subclass. Note also
that aliasing the methods that way, rather than doing `def copy(self): return
self.clone()` as `WhereNode` currently does, is ''slightly'' quicker. We're
talking `0.10µs`: small potatoes, but it doesn't hurt, and it's one fewer
stack frame.
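The alias-vs-delegation difference is easy to see in isolation (toy classes, not the real `Node`): the delegating `copy` is a wrapper that adds a call, while the alias is literally the same function object.
{{{
class Delegating:
    def clone(self):
        return object()
    def copy(self):
        return self.clone()   # wrapper: one extra call and stack frame

class Aliased:
    def clone(self):
        return object()
    copy = clone              # alias: the very same function object

assert Aliased.copy is Aliased.clone            # no wrapper at all
assert Delegating.copy is not Delegating.clone  # two distinct functions
}}}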
A final note: if this properly shallow clone is done, I suspect
`resolve_expression` can be altered too, to avoid calling `_resolve_node` if
it's already been done on `self`; the shallow copy would ensure that anything
previously resolved is kept around and doesn't need doing again, I think.
Other stats: across the 14,000 tests I can run easily, `resolved` was present
in `self.__dict__` when `clone` was called only once (the value was `True`).
`contains_aggregate` turned up 209 times (207 with a value of `False`, 2 with
`True`), and `output_field` appeared only twice (both `BooleanField`). Those
are the only extra attributes I spotted.
--
Ticket URL: <https://code.djangoproject.com/ticket/32970#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.