#32970: Investigate feasibility of improving WhereNode clone performance
-------------------------------------+-------------------------------------
               Reporter:  Keryn      |          Owner:  Keryn Knight
  Knight                             |
                   Type:             |         Status:  assigned
  Cleanup/optimization               |
              Component:  Database   |        Version:  dev
  layer (models, ORM)                |
               Severity:  Normal     |       Keywords:
           Triage Stage:             |      Has patch:  1
  Unreviewed                         |
    Needs documentation:  0          |    Needs tests:  0
Patch needs improvement:  0          |  Easy pickings:  0
                  UI/UX:  0          |
-------------------------------------+-------------------------------------
 This relates to #32948 insofar as it's about figuring out how shallow or
 deep a copy is necessary for cloning `Node` instances/subclasses. And a
 bit to #28455 which acknowledges the need to avoid cloning sometimes due
 to it being costly.

 The PR I made for optimising Q combining/inverting
 (https://github.com/django/django/pull/14673) wants to introduce a
 `__copy__` method onto `Q` objects to indicate what level of operation is
 needed. Unlike that PR, it's decidedly not possible to take the same
 approach with WhereNode and just do `clone.__dict__ =
 self.__dict__.copy()` (things broke horribly) because of attributes like
 `contains_aggregate` and `output_field` which are cached properties on the
 WhereNode instance. Also `resolved` gets set after the fact by
 `resolve_expression`. But, as far as I can tell, the looping/testing/child
 cloning ''may be removable''. Perhaps once upon a time they were required
 and other components changed such that it's now safe to consider. I can't
 readily say, but initial experimentation suggests it's OK to look at, at
 least in theory.

 There is a PR already for this, here:
 https://github.com/django/django/pull/14709 which is currently marked
 draft/WIP because I couldn't easily run the whole test suite locally and
 needed to see if the more exotic parts caused problems. They didn't, which
 was unexpected, so here I am. The PR currently replaces the
 `WhereNode.clone` method like so:
 {{{
 clone = self.__class__._new_instance(children=[],
 connector=self.connector, negated=self.negated)
 for child in self.children:
     if hasattr(child, 'clone'):
         clone.children.append(child.clone())
     else:
         clone.children.append(child)
 }}}
 with:
 {{{
 clone = self.__class__._new_instance(children=None,
 connector=self.connector, negated=self.negated)
 clone.children = self.children[:]
 }}}


 But I ''think'' that Q and WhereNode ''can'' both just return a shallow
 copy by only aliasing the children property, at which point I think the
 `__copy__` method could be implemented on `Node` directly and mirror the
 existing `__deepcopy__` method. By preferring those stdlib names it draws
 a line in the sand over what level of copy should be expected. The
 existing `clone` and `copy` methods can become aliases to same.


 === Before any changes

 If we now begin to examine the before and after with as much data as I can
 easily gather. First a bit of prelude:

 {{{
 In [1]: from django.db.models.sql.where import WhereNode
 In [2]: from django.db.models import QuerySet
 In [3]: from django.db.models.sql import Query
 In [4]: from django.contrib.auth.models import User
 In [5]: x = User.objects.all()
 In [6]: y = User.objects.filter(username='test',
 email='[email protected]').exclude(username='test').filter(email='nottest')
 In [7]: %load_ext line_profiler
 }}}

 Now let's establish a ballpark for how much time is spent where, I've
 stripped the line profiling down to just the relevant bits for clarity.

 ==== Queryset with no clauses (`x`)

 {{{
 In [8]: %lprun -f QuerySet._chain -f QuerySet._clone -f Query.chain -f
 Query.clone -f WhereNode.clone for _ in range(1000): x._chain()

 Function: QuerySet._chain
 Line #      Hits         Time  Per Hit   % Time  Line Contents
 ==============================================================
   1325                                               def _chain(self,
 **kwargs):
   ...
   1330      1000      88169.0     88.2     97.8          obj =
 self._clone()

 Function: QuerySet._clone
 Line #      Hits         Time  Per Hit   % Time  Line Contents
 ==============================================================
   1337                                               def _clone(self):
   ...
   1342      1000      80796.0     80.8     86.3          chained =
 self.query.chain()
   1343      1000       7240.0      7.2      7.7          c =
 self.__class__(model=self.model, query=chained, using=self._db,
 hints=self._hints))

 Function: Query.chain
 Line #      Hits         Time  Per Hit   % Time  Line Contents
 ==============================================================
    341                                               def chain(self,
 klass=None):
    ...
    346      2000     116961.0     58.5     94.8          obj =
 self.clone()

 Function: Query.clone
 Line #      Hits         Time  Per Hit   % Time  Line Contents
 ==============================================================
    290                                               def clone(self):
    ...
    304      2000      18294.0      9.1     27.0          obj.where =
 self.where.clone()

 Function: WhereNode.clone
 Line #      Hits         Time  Per Hit   % Time  Line Contents
 ==============================================================
    142                                               def clone(self):
    ...
    148      4000       6227.0      1.6     63.7          clone =
 self.__class__._new_instance(
    149      2000       1170.0      0.6     12.0              children=[],
 connector=self.connector, negated=self.negated)
    150      2000       1407.0      0.7     14.4          for child in
 self.children:
    151                                                       if
 hasattr(child, 'clone'):
    152
 clone.children.append(child.clone())
    153                                                       else:
    154
 clone.children.append(child)
    155      2000        969.0      0.5      9.9          return clone
 }}}

 ==== Queryset with where clauses (`y`)

 Youu can see the cost begin to go up:
 {{{
 In [9]: %lprun -f QuerySet._chain -f QuerySet._clone -f Query.chain -f
 Query.clone -f WhereNode.clone for _ in range(1000): y._chain()

 Function: Query.clone
 Line #      Hits         Time  Per Hit   % Time  Line Contents
 ==============================================================
    290                                               def clone(self):
    ...
    304      2000      60065.0     30.0     59.1          obj.where =
 self.where.clone()

 Function: WhereNode.clone
 Line #      Hits         Time  Per Hit   % Time  Line Contents
 ==============================================================
    142                                               def clone(self):
    148      8000      11473.0      1.4     33.1          clone =
 self.__class__._new_instance(
    149      4000       2379.0      0.6      6.9              children=[],
 connector=self.connector, negated=self.negated)
    150     14000       7241.0      0.5     20.9          for child in
 self.children:
    151     10000       5553.0      0.6     16.0              if
 hasattr(child, 'clone'):
    152      2000       1255.0      0.6      3.6
 clone.children.append(child.clone())
    153                                                       else:
    154      8000       5069.0      0.6     14.6
 clone.children.append(child)
    155      4000       1715.0      0.4      4.9          return clone

 }}}

 and the timings for those 2 (`x` and `y`) querysets, plus an example of
 prefetching related data:

 {{{
 In [10]: %timeit x._chain()
 6.43 µs ± 52 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
 In [11]: %timeit y._chain()
 8.24 µs ± 35.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops
 each))
 In [12]: %timeit y.query.where.clone()
 2.79 µs ± 38.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
 In [13]: %prun for _ in range(100000): y._chain()
 3400003 function calls (3300003 primitive calls) in 1.506 seconds
        ncalls  tottime  percall  cumtime  percall
 filename:lineno(function)
        100000    0.357    0.000    1.007    0.000 query.py:290(clone)
 200000/100000    0.301    0.000    0.539    0.000 where.py:142(clone) <--
 WhereNode
        100000    0.172    0.000    1.344    0.000 query.py:1337(_clone)
        100000    0.068    0.000    1.086    0.000 query.py:341(chain)
        100000    0.058    0.000    1.415    0.000 query.py:1325(_chain)
 In [14]: %timeit
 tuple(User.objects.prefetch_related('groups__permissions',
 'user_permissions'))
 18.2 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 100 loops each))
 }}}

 === After replacing the clone method

 Given those data points, here are the same bits of line profiling, showing
 only those where the needle has moved:

 ==== Queryset with no clauses (`x`)

 {{{
 Function: Query.clone
 Line #      Hits         Time  Per Hit   % Time  Line Contents
 ==============================================================
    290                                               def clone(self):
    ...
    304      2000      14134.0      7.1     22.8          obj.where =
 self.where.clone()

 Function: WhereNode.clone
 Line #      Hits         Time  Per Hit   % Time  Line Contents
 ==============================================================
    142                                               def clone(self):
    ...
    148      2000       5586.0      2.8     71.0          clone =
 self.__class__._new_instance(children=None, connector=self.connector,
 negated=self.negated)
    149      2000       1412.0      0.7     18.0          clone.children =
 self.children[:]
    150      2000        865.0      0.4     11.0          return clone
 }}}

 ==== Queryset with where clauses (`y`)

 {{{
 Function: Query.clone at line 290
 Line #      Hits         Time  Per Hit   % Time  Line Contents
 ==============================================================
    290                                               def clone(self):
    ...
    304      2000      13730.0      6.9     24.7          obj.where =
 self.where.clone()

 Function: WhereNode.clone
 Line #      Hits         Time  Per Hit   % Time  Line Contents
 ==============================================================
    142                                               def clone(self):
    ...
    148      2000       5273.0      2.6     69.0          clone =
 self.__class__._new_instance(children=None, connector=self.connector,
 negated=self.negated)
    149      2000       1498.0      0.7     19.6          clone.children =
 self.children[:]
    150      2000        871.0      0.4     11.4          return clone
 }}}

 And the timings; in the empty case (`x`), it's ''basically'' just paying
 the cost of class creation, but where there are clauses it is more
 consistent (ie: ''nearer'' a constant time vs linear growth).
 {{{
 In [10]: %timeit x._chain()
 6.51 µs ± 47.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops
 each))
 In [11]: %timeit y._chain()
 6.25 µs ± 36.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
 In [12]: %timeit y.query.where.clone()
 1.16 µs ± 9.09 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops
 each)
 In [13]: %prun for _ in range(100000): y._chain()
 3800003 function calls in 1.926 seconds
 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 100000    0.355    0.000    0.606    0.000 query.py:290(clone)
 100000    0.169    0.000    0.939    0.000 query.py:1337(_clone)
 100000    0.068    0.000    0.139    0.000 where.py:142(clone) <---
 WhereNode
 100000    0.067    0.000    0.683    0.000 query.py:341(chain)
 100000    0.058    0.000    1.009    0.000 query.py:1325(_chain)
 In [14]: %timeit
 tuple(User.objects.prefetch_related('groups__permissions',
 'user_permissions'))
 17.9 ms ± 345 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
 }}}

 Gut feeling says the "final" implementation could be something like the
 following, directly on the `Node` class.
 {{{
 def __copy__(self):
     clone = self._new_instance(children=None, connector=self.connector,
 negated=self.negated)
     clone.children = self.children[:]
     return clone

 copy = __copy__
 clone = __copy__

 def __deepcopy__(self, memo):
     ...
     clone.children = copy.deepcopy(self.children, memo)
     ...
 }}}

 which would remove the need for methods on either subclass. Note also that
 aliasing the methods that way rather than doing `def copy(): return
 self.clone()` as `WhereNode` currently does is also ''slightly'' quicker.
 We're talking `0.10µs` - small potatoes, but it doesn't hurt to have +
 it's 1 fewer stack frame.

 A final note then: if this properly shallow clone is done, I suspect
 `resolve_expression` can be altered too, to avoid calling `_resolve_node`
 if it's already been done on `self` - the shallow copy would ensure that
 anything which had previously been resolved is kept around and doesn't
 need doing again, I think.

 Other stats: Of the 14,000 tests I can run easily, only 1 time was
 `resolved` in `self.__dict__` when `clone` was called (the value was
 `True`). `contains_aggregate` turned up 209 times (207 with a value of
 `False`, 2 with `True`), and `output_field` only appeared 2 times (both
 `BooleanField`). Those are the only extra attribute I spotted.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/32970>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/052.4d956bcf3f5630124f8d48a1b615cba9%40djangoproject.com.

Reply via email to