#36489: OneToOneField + concurrent get_or_create results in wrong object in
field
cache
--------------------------------+--------------------------------------
Reporter: Brian Atkinson | Owner: (none)
Type: Uncategorized | Status: new
Component: Uncategorized | Version: 5.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------+--------------------------------------
Description changed by Brian Atkinson:
Old description:
> When using `get_or_create` and `update_or_create` in a context where
> concurrent creations are happening, the caller that loses the creation
> race will end up with an incomplete object in the field cache.
>
> Example models:
>
> {{{
> from django.db import models
>
> class Parent(models.Model):
> name = models.TextField()
>
> class Child(models.Model):
> parent = models.OneToOneField(Parent, on_delete=models.CASCADE)
> }}}
>
> In effect, we have two separate processes both expecting to be able to
> call `Child.objects.get_or_create(parent=common_parent, defaults={...})`
> and have the resulting `common_parent.child` be the correct result. We
> are seeing both processes fail the initial get call
> (https://github.com/django/django/blob/ff0ff98d427982b7225df59f454a86bdf66251d6/django/db/models/query.py#L977)
> and proceeding to attempt the create call
> (https://github.com/django/django/blob/ff0ff98d427982b7225df59f454a86bdf66251d6/django/db/models/query.py#L984).
> One of the processes succeeds on the insert, and the other will fail the
> insert. The process that fails the insert correctly issues the second get
> call
> (https://github.com/django/django/blob/ff0ff98d427982b7225df59f454a86bdf66251d6/django/db/models/query.py#L987)
> and returns the result.
>
> The bug is that for the process that fails the insert, the object at
> `common_parent.child` is not the value returned from the get_or_create.
> It seems to be the object that was created for insert, but wasn't cleared
> when the insert failed. This can be observed through the
> `common_parent.child.id` field being None. This has caused data
> corruption and other downstream bugs due to the wrong data being cached.
>
> Here is a test case that is a linear representation of the concurrent bug
> described above:
>
> {{{
> from unittest import mock
>
> from django.db.models import QuerySet
> from django.test import TestCase
>
> from sample.models import Child, Parent
>
> class TestBug(TestCase):
> def test_bug(self):
> parent = Parent.objects.create(name="Test name")
>
> triggered = False
>
> def effect(*args, **kwargs):
> nonlocal triggered
> if not triggered and kwargs.get("parent") == parent:
> triggered = True
> Child.objects.update_or_create(
> # Force a brand new instance (emulates racing
> thread).
> parent=Parent.objects.get(pk=parent.pk),
> defaults={},
> )
> return mock.DEFAULT
>
> with mock.patch.object(
> QuerySet,
> "_extract_model_params",
> autospec=True,
> wraps=QuerySet._extract_model_params,
> side_effect=effect,
> ):
> child, created = Child.objects.get_or_create(parent=parent,
> defaults={})
> assert not created
>
> assert child.parent_id is not None
> assert parent.child.id is not None
> }}}
>
> At first glance this test case will seem quite janky. The patching is
> being done to emulate the effect of two racing threads by injecting a
> concurrent call to `get_or_create` in the middle of the outer
> `get_or_create`.
New description:
When using `get_or_create` and `update_or_create` in a context where
concurrent creations are happening, the caller that loses the creation
race will end up with an incomplete object in the field cache.
Example models:
{{{
from django.db import models
class Parent(models.Model):
name = models.TextField()
class Child(models.Model):
parent = models.OneToOneField(Parent, on_delete=models.CASCADE)
}}}
In effect, we have two separate processes both expecting to be able to
call `Child.objects.get_or_create(parent=common_parent, defaults={...})`
and have the resulting `common_parent.child` be the correct result. We are
seeing both processes fail the initial get call
(https://github.com/django/django/blob/5.2.3/django/db/models/query.py#L946)
and proceeding to attempt the create call
(https://github.com/django/django/blob/5.2.3/django/db/models/query.py#L953).
One of the processes succeeds on the insert, and the other will fail the
insert. The process that fails the insert correctly issues the second get
call
(https://github.com/django/django/blob/5.2.3/django/db/models/query.py#L956)
and returns the result.
The bug is that for the process that fails the insert, the object at
`common_parent.child` is not the value returned from the get_or_create. It
seems to be the object that was created for insert, but wasn't cleared
when the insert failed. This can be observed through the
`common_parent.child.id` field being None. This has caused data corruption
and other downstream bugs due to the wrong data being cached.
Here is a test case that is a linear representation of the concurrent bug
described above:
{{{
from unittest import mock
from django.db.models import QuerySet
from django.test import TestCase
from sample.models import Child, Parent
class TestBug(TestCase):
def test_bug(self):
parent = Parent.objects.create(name="Test name")
triggered = False
def effect(*args, **kwargs):
nonlocal triggered
if not triggered and kwargs.get("parent") == parent:
triggered = True
Child.objects.update_or_create(
# Force a brand new instance (emulates racing thread).
parent=Parent.objects.get(pk=parent.pk),
defaults={},
)
return mock.DEFAULT
with mock.patch.object(
QuerySet,
"_extract_model_params",
autospec=True,
wraps=QuerySet._extract_model_params,
side_effect=effect,
):
child, created = Child.objects.get_or_create(parent=parent,
defaults={})
assert not created
assert child.parent_id is not None
assert parent.child.id is not None
}}}
At first glance this test case will seem quite janky. The patching is
being done to emulate the effect of two racing threads by injecting a
concurrent call to `get_or_create` in the middle of the outer
`get_or_create`.
--
--
Ticket URL: <https://code.djangoproject.com/ticket/36489#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion visit
https://groups.google.com/d/msgid/django-updates/01070197c2b970b0-bb712210-7e16-46dc-ad23-d925e02f1bb5-000000%40eu-central-1.amazonses.com.