[Django] #33003: Micro-optimisation for QuerySet._chain

Django Sun, 08 Aug 2021 09:01:14 -0700

#33003: Micro-optimisation for QuerySet._chain
-------------------------------------+-------------------------------------
               Reporter:  Keryn      |          Owner:  Keryn Knight
  Knight                             |
                   Type:             |         Status:  assigned
  Cleanup/optimization               |
              Component:  Database   |        Version:  dev
  layer (models, ORM)                |
               Severity:  Normal     |       Keywords:
           Triage Stage:             |      Has patch:  0
  Unreviewed                         |
    Needs documentation:  0          |    Needs tests:  0
Patch needs improvement:  0          |  Easy pickings:  0
                  UI/UX:  0          |
-------------------------------------+-------------------------------------
 Whilst working on #28455 I noticed that `_chain` accepts `kwargs`, but
 that appears to only be to facilitate pickling, with
 `Prefetch.__getstate__` making use of the functionality. Thus most of the
 time, kwargs is empty and there's no reason to touch/update the
 `__dict__`. And it's faster not to do so. Note that in comparison with the
 benefits of #28455 this is absolutely peanuts, but still, it's unrelated
 to that ticket so here we are:


 {{{
 In [1]: class A:
    ...:     def __init__(self, **kwargs):
    ...:         self.__dict__.update(**kwargs)
 In [2]: a = A(a=1, b=2, c=3, d=4)
 In [3]: empty = {}
 In [4]: full = {'d': 3, 'c': 1}
 In [5]: %timeit a.__dict__.update(empty)
 91.2 ns ± 1.25 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops
 each)
 In [6]: %timeit a.__dict__.update(full)
 133 ns ± 0.92 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops
 each)
 }}}
 If we just ''check'' the dictionary isn't empty (in `_chain` that's
 `kwargs`):
 {{{
 In [7]: %timeit if empty: a.__dict__.update(empty)
 19 ns ± 0.377 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops
 each)
 In [8]: %timeit if full: a.__dict__.update(full)
 156 ns ± 1.33 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops
 each)
 }}}

 You can see the paying the basic cost of an if (for me, that appears to be
 `20ns` for an empty dict, and more like `6ns` for an interned/singleton
 value like `1` or `True` or `''` as a baseline) is worthwhile, and having
 pickling a Prefetch be `20ns` slower isn't too bad, given it's not exactly
 the most common use case...

 The difference is small but can be made more obvious by checking the
 cProfile timings over many (''many'') iterations; here's the before across
 100 users each with a group and permission:
 {{{
 In [2]: %prun -stime -l_chain for _ in range(1000):
 tuple(User.objects.prefetch_related('groups', 'user_permissions',
 'groups__permissions'))
 48095003 function calls (46202003 primitive calls) in 34.811 seconds

 Ordered by: internal time
 List reduced from 384 to 1 due to restriction <'_chain'>

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 312000    4.070    0.000    8.791    0.000 query.py:1325(_chain)
 In [3]: %timeit -n1000 -r10 tuple(User.objects.prefetch_related('groups',
 'user_permissions', 'groups__permissions'))
 18.3 ms ± 391 µs per loop (mean ± std. dev. of 10 runs, 100 loops each)
 }}}
 And after applying the change as follows:
 {{{
 def _chain(...):
     ...
     if kwargs:
         obj.__dict__.update(kwargs)
     ...
 }}}
 we end up with:
 {{{
 In [2]: %prun -stime -l_chain for _ in range(1000):
 tuple(User.objects.prefetch_related('groups', 'user_permissions',
 'groups__permissions'))
 47779003 function calls (45886003 primitive calls) in 33.220 seconds

 Ordered by: internal time
 List reduced from 384 to 1 due to restriction <'_chain'>

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 312000    0.267    0.000    5.644    0.000 query.py:1325(_chain)
 In [3]: %timeit -n1000 -r10 tuple(User.objects.prefetch_related('groups',
 'user_permissions', 'groups__permissions'))
 18.1 ms ± 289 µs per loop (mean ± std. dev. of 10 runs, 100 loops each)
 }}}

 I have to assume that the overall timing differences measured by cProfile
 and timeit are both just standard fluctuations, despite cProfile's
 `tottime` reporting there.

 Branch/PR to follow. It won't be a shock if the diff is 3 lines long,
 given the change is shown above.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/33003>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/052.9e039a6558d8e5ef01abdf9b7476a453%40djangoproject.com.

[Django] #33003: Micro-optimisation for QuerySet._chain

Reply via email to