Re: MySQL data loss possibility with concurrent ManyToManyField saves

Shai Berger Mon, 21 Mar 2016 00:28:52 -0700

First of all, I would like to say that I strongly support the move to READ 
COMITTED, including backporting it to 1.8.x.

But we also need to explain: REPEATABLE READ is a higher transaction isolation 
level than READ COMMITTED. If you have problematic code, it should lead to 
more deadlocks and/or transactions failing at commit time (compared to READ 
COMMITTED), not to data loss. The reason we get data losses is MySql's unique 
interpretation of REPEATABLE READ. If you're interested in the details (and if 
you use MySql, you should be), read on.

With MySql's REPEATABLE READ, the "read" operations -- SELECT statements -- 
indeed act like they act in the usual REPEATABLE READ: Once you've read some 
table, changes made to that table by other transactions will not be visible 
within your transaction. But "write" operations -- UPDATE, DELETE, INSERT and 
the like -- act as if they're under READ COMMITTED, affecting (and affected by) 
changes committed by other transactions. The result is, essentially, that 
within a transaction, the reads are not guaranteed to be consistent with the 
writes [1].

In particular, in the bug[2] that caused this discussion, we get the following  
behavior in one transaction:

        (1) BEGIN TRANSACTION

        (2) SELECT ... FROM some_table WHERE some_field=some_value
                (1 row returned)

        (3) (some other transactions commit)

        (4) SELECT ... FROM some_table WHERE some_field=some_value
                (1 row returned, same as above)

        (5) DELETE some_table WHERE some_field=some_value
                (answer: 1 row deleted)

        (6) SELECT ... FROM some_table WHERE some_field=some_value
                (1 row returned, same as above)

        (7) COMMIT
                (the row that was returned earlier is no longer in the database)

Take a minute to read this. Up to step (5), everything is as you would expect; 
you should find steps (6) and (7) quite surprising.

This happens because the other transactions in (3) deleted the row that is 
returned in (2), (4) & (6), and inserted another one where 
some_field=some_value; that other row is the row that was deleted in (5). The 
row that this transaction selects was not seen by the DELETE, and hence not 
changed by it, and hence continues to be visible by the SELECTs in our 
transaction. But when we commit, the row (which has been deleted) no longer 
exists.

I have expressed elsewhere my opinion of this behavior as a general database 
feature, and feel no need to repeat it here; but I think that, if possible, it 
is Django's job as a framework to protect its users from it, at least as a 
default.

On Monday 21 March 2016 02:25:37 Cristiano Coelho wrote:
> What performance changes can you expect doing this change? It is probably
> that default on MySQL for a good reason.

The Django project is usually willing to give up quite a lot of performance in 
order to prevent data losses. I agree that this default on MySql is probably 
for a reason, but I don't think it can be a good reason for Django.

Have fun,
        Shai.

[1] https://dev.mysql.com/doc/refman/5.7/en/innodb-consistent-read.html
[2] https://code.djangoproject.com/ticket/26347

Re: MySQL data loss possibility with concurrent ManyToManyField saves

Reply via email to