#36248: Bulk deletion of model referred to by a SET_NULL key can exceed
parameter limit
-------------------------------------+-------------------------------------
     Reporter:  bobince              |                     Type:  Uncategorized
       Status:  new                  |                Component:  Database layer (models, ORM)
      Version:  5.1                  |                 Severity:  Normal
     Keywords:                       |             Triage Stage:  Unreviewed
    Has patch:  1                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
 To avoid hitting DBMS-specific limits on the number of parameters that can
 be used in a parameterised query, Django performs bulk operations in
 limited-size batches. This includes db.models.deletion.Collector.collect,
 which calls get_del_batches to split a list of objects into manageable
 chunks for deletion. This batching was introduced in #16426.
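
 For reference, this is roughly what get_del_batches does (paraphrased from
 Django's source; exact details vary by version):

 {{{
 # Paraphrased from django/db/models/deletion.py; assumes
 # `from django.db import connections` as in that module.
 def get_del_batches(self, objs, fields):
     """Return the objs in suitably sized batches for the connection."""
     field_names = [field.name for field in fields]
     conn_batch_size = max(
         connections[self.using].ops.bulk_batch_size(field_names, objs), 1
     )
     if len(objs) > conn_batch_size:
         return [
             objs[i : i + conn_batch_size]
             for i in range(0, len(objs), conn_batch_size)
         ]
     return [objs]
 }}}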

 If there are keys on other models referring to the model being deleted,
 and those keys have an on_delete setting that causes an update to that
 field (such as SET_NULL), the Collector's field_updates will gather
 per-chunk updates to each such field. However, Collector.delete then
 OR-combines the QuerySets generated for each update into a single combined
 filter. If the total number of rows exceeds the parameter limit, that
 combined query fails.
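
 A minimal setup that can trigger this (model names are mine, purely
 illustrative):

 {{{
 # Hypothetical models, not from the ticket.
 from django.db import models

 class Author(models.Model):
     pass

 class Book(models.Model):
     # Deleting Authors requires: UPDATE book SET author_id = NULL
     # WHERE author_id IN (...) -- that IN list carries the parameters.
     author = models.ForeignKey(Author, null=True,
                                on_delete=models.SET_NULL)

 # With more referenced Authors than the backend's parameter limit, this
 # can fail, e.g. with "too many SQL variables" on SQLite:
 #     Author.objects.all().delete()
 }}}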

 It seems straightforward to stop doing that:

 {{{
 @@ -481,9 +481,8 @@ class Collector:
                          updates.append(instances)
                      else:
                          objs.extend(instances)
 -                if updates:
 -                    combined_updates = reduce(or_, updates)
 -                    combined_updates.update(**{field.name: value})
 +                for update in updates:
 +                    update.update(**{field.name: value})
                  if objs:
                      model = objs[0].__class__
                      query = sql.UpdateQuery(model)
 }}}
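
 For context, here is what the removed lines do in isolation: reduce(or_,
 updates) OR-combines the per-batch QuerySets, so the single UPDATE carries
 every batch's parameters at once (sketch; field and value come from the
 surrounding loop in Collector.delete):

 {{{
 from functools import reduce
 from operator import or_

 # updates is a list of QuerySets, one per deletion batch. OR-ing them
 # yields one QuerySet whose WHERE clause contains all batches'
 # parameters, roughly: WHERE pk IN (batch 1 ...) OR pk IN (batch 2 ...)
 combined_updates = reduce(or_, updates)
 combined_updates.update(**{field.name: value})  # one UPDATE, all params
 }}}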

 However:

 - I'm not certain what the original intent was behind combining the
 updates here. Can there be multiple updates for some other reason than
 batch chunking that we'd want to minimise queries for? The combining was
 introduced in #33928.

 - Testing it is a bit tricky, since the SQLite parameter limit is now
 often higher than the traditional 999, and Python's sqlite3 module doesn't
 let us read or change it. On my Ubuntu box here it's 250000, which is
 rather more than is comfortable to exercise in a test. We can count
 queries as DeletionTests.test_large_delete does, but that isn't _exactly_
 checking the thing we actually care about: that the number of parameters
 stays within bounds. (It's not currently possible to read the number of
 parameters used from the queries_log; see the sketch below for one
 possible workaround.)
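
 One way to observe parameter counts directly, as a sketch (my idea, not
 part of the patch), is Django's connection.execute_wrapper():

 {{{
 # Sketch: record the largest parameter list any query uses. Assumes the
 # Author model from the earlier sketch; executemany calls would need
 # extra handling, since params is then a list of parameter lists.
 from django.db import connection

 max_params_seen = 0

 def count_params(execute, sql, params, many, context):
     global max_params_seen
     if params is not None:
         max_params_seen = max(max_params_seen, len(params))
     return execute(sql, params, many, context)

 with connection.execute_wrapper(count_params):
     Author.objects.all().delete()

 assert max_params_seen <= 999  # or whatever bound we want to enforce
 }}}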

 (Diversion: in practice I'm personally affected by this issue more in
 mssql-django, but there are a bunch of confounding factors around that
 backend's own prior, somewhat wonky attempts to work around the parameter
 limit. https://github.com/microsoft/mssql-django/issues/156 has some
 background.)
-- 
Ticket URL: <https://code.djangoproject.com/ticket/36248>