#36489: OneToOneField + concurrent get_or_create results in wrong object in
field
cache
--------------------------------+-----------------------------------------
Reporter: Brian Atkinson | Type: Uncategorized
Status: new | Component: Uncategorized
Version: 5.2 | Severity: Normal
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------+-----------------------------------------
When using `get_or_create` and `update_or_create` in a context where
concurrent creations are happening, the caller that loses the creation
race will end up with an incomplete object in the field cache.
Example models:
{{{
from django.db import models
class Parent(models.Model):
name = models.TextField()
class Child(models.Model):
parent = models.OneToOneField(Parent, on_delete=models.CASCADE)
}}}
In effect, we have two separate processes both expecting to be able to
call `Child.objects.get_or_create(parent=common_parent, defaults={...})`
and have the resulting `common_parent.child` be the correct result. We are
seeing both processes fail the initial get call
(https://github.com/django/django/blob/ff0ff98d427982b7225df59f454a86bdf66251d6/django/db/models/query.py#L977)
and proceeding to attempt the create call
(https://github.com/django/django/blob/ff0ff98d427982b7225df59f454a86bdf66251d6/django/db/models/query.py#L984).
One of the processes succeeds on the insert, and the other will fail the
insert. The process that fails the insert correctly issues the second get
call
(https://github.com/django/django/blob/ff0ff98d427982b7225df59f454a86bdf66251d6/django/db/models/query.py#L987)
and returns the result.
The bug is that for the process that fails the insert, the object at
`common_parent.child` is not the value returned from the get_or_create. It
seems to be the object that was created for insert, but wasn't cleared
when the insert failed. This can be observed through the
`common_parent.child.id` field being None. This has caused data corruption
and other downstream bugs due to the wrong data being cached.
Here is a test case that is a linear representation of the concurrent bug
described above:
{{{
from unittest import mock
from django.db.models import QuerySet
from django.test import TestCase
from sample.models import Child, Parent
class TestBug(TestCase):
def test_bug(self):
parent = Parent.objects.create(name="Test name")
triggered = False
def effect(*args, **kwargs):
nonlocal triggered
if not triggered and kwargs.get("parent") == parent:
triggered = True
Child.objects.update_or_create(
# Force a brand new instance (emulates racing thread).
parent=Parent.objects.get(pk=parent.pk),
defaults={},
)
return mock.DEFAULT
with mock.patch.object(
QuerySet,
"_extract_model_params",
autospec=True,
wraps=QuerySet._extract_model_params,
side_effect=effect,
):
child, created = Child.objects.get_or_create(parent=parent,
defaults={})
assert not created
assert child.parent_id is not None
assert parent.child.id is not None
}}}
At first glance this test case will seem quite janky. The patching is
being done to emulate the effect of two racing threads by injecting a
concurrent call to `get_or_create` in the middle of the outer
`get_or_create`.
--
Ticket URL: <https://code.djangoproject.com/ticket/36489>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion visit
https://groups.google.com/d/msgid/django-updates/01070197c2b6fe8c-a11991f9-26e4-435a-997a-f8b4ef8ecdee-000000%40eu-central-1.amazonses.com.