[Django] #33973: Performance regression when moving from 3.1 to 3.2

Django Thu, 01 Sep 2022 07:47:26 -0700

#33973: Performance regression when moving from 3.1 to 3.2
-------------------------------------+-------------------------------------
               Reporter:  Marc       |          Owner:  nobody
  Parizeau                           |
                   Type:             |         Status:  new
  Uncategorized                      |
              Component:             |        Version:  3.2
  Uncategorized                      |       Keywords:  performance
               Severity:  Normal     |  regression
           Triage Stage:             |      Has patch:  0
  Unreviewed                         |
    Needs documentation:  0          |    Needs tests:  0
Patch needs improvement:  0          |  Easy pickings:  0
                  UI/UX:  0          |
-------------------------------------+-------------------------------------
 I am seing a sharp increase in execution time for some of my queries when
 moving from Django 3.1 to 3.2. And the performance hit appears to be the
 same for Django 4.


 My backend is Postgres 14.2.

 My Django project has forums for different types of content. The forum app
 consists essentially of 5 tables:
 1. a `Post` table that contains forum posts (essentially a text field);
 2. a `Thread` table where rows point to a specific content and post;
 3. a `FollowUp` table where rows point to a specific thread and post;
 4. a `ThreadEntry` table where rows point to a thread, a user, and the
 last seen thread post for this user;
 5. a `FollowUpEntry` table where rows point to a followup, a user, and the
 last seen followup post for this user.

 Here is an example query that executes 2 times slower on 3.2 than on 3.1:

 {{{
 Content.objects.all().annotate(
   has_unread_posts=Greatest(
     # a content is unread if at least one condition is true
     Exists(
       # a thread has never been read (has no user entry)
       Thread.objects.filter(
         content=OuterRef('pk'),
       ).exclude(threadentry__user=user)
     ),
     Exists(
       # a thread entry is not up-to-date
       ThreadEntry.objects.filter(
         thread__content=OuterRef('pk'),
         user=user,
       ).exclude(post=F('thread__post'))
     ),
     Exists(
       # a followup has never been read
       FollowUp.objects.filter(
         thread__content=OuterRef('pk')
       ).exclude(followupentry__user=user)
     ),
     Exists(
       # a followup entry is not up-to-date
       FollowUpEntry.objects.filter(
         followup__thread__content=OuterRef('pk'),
         user=user,
       ).exclude(post=F('followup__post'))
     ),
   )
 ).filter(
   has_unread_posts=True,
 ).order_by(
   'course__uid',
   '-version__start',
 ).select_related(
   'course',
   'version',
 )
 }}}


 `Course` and `Version` are other tables related to `Content`.

 I want to know with this query, for each content, whether or not there is
 something new in the corresponding forum for a given user.  There is
 something new if any one of the following condition is true:
 1. there exists a thread for which the user has no thread entry (an entry
 is added when the thread is first read by the user);
 2. there exists a user thread entry for which the last read post is not up
 to date with the current thread post (the thread owner has modified the
 post since);
 3. there exists a followup for which the user has no followup entry (an
 entry is added when the followup is first read by the user);
 4. there exists a user followup entry for which the last read post is not
 up to date with the followup post (the followup owner has modified the
 post since).

 On my machine, just by changing the Django version using pip, and nothing
 else, this query takes about 1 second of execution on Django 3.1.14, and a
 little more than 2 seconds on Django 3.2.15, so about a 2x increase. Here
 are the current table sizes for my forum app:
 - `Thread`: ~30K
 - `FollowUp`: ~46K
 - `ThreadEntry`: ~1.3M
 - `FollowUpEntry`: ~4.5M
 - `Post`: ~103K

 And there is 33 `Content` rows.

 Am I the only one observing such performance regressions with Django 3.2?
 On other more complex queries that contain nested subqueries, I have seen
 up to **30x** execution time increases.

 Did something major happen in SQL generation from 3.1 to 3.2?

 Am I doing something wrong? How can this happen?

 Any help on understanding what is going on with Django 3.2 would be much
 appreciated.

 Best regards,

-- 
Ticket URL: <https://code.djangoproject.com/ticket/33973>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/01070182f9854cef-5995e223-f96e-4bff-8050-72f4b0c63c3c-000000%40eu-central-1.amazonses.com.

[Django] #33973: Performance regression when moving from 3.1 to 3.2

Reply via email to