Hello,
Fetching 3 GB of existing records just to pass the few missing ones to
bulk_create() afterwards is not feasible.
I found a way to make PostgreSQL report which rows were not inserted by
INSERT ... ON CONFLICT DO NOTHING.
Consider this table:
    CREATE TABLE t (
        id SERIAL PRIMARY KEY,
        name VARCHAR(10) NOT NULL UNIQUE,
        comment VARCHAR(10) NOT NULL
    );
And now the magic:
    WITH
    to_be_inserted AS (
        SELECT 'name1' AS "name", 'comment1' AS "comment" UNION ALL
        SELECT 'name4', 'comment4' UNION ALL
        SELECT 'name5', 'comment5'),
    successfully_inserted AS (
        INSERT INTO t ("name", "comment")
        SELECT * FROM to_be_inserted
        ON CONFLICT DO NOTHING
        RETURNING *)
    SELECT s.id
    FROM to_be_inserted AS b
    LEFT JOIN successfully_inserted AS s
        ON (b.name = s.name AND b.comment = s.comment);
This returns one row for each record in to_be_inserted, with a single
column "id" that is NULL if the record already existed and holds the new
identifier otherwise.
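To make the shape of that query concrete for an arbitrary batch, here is a minimal Python sketch that builds it with placeholders. The helper name build_insert_report_query is hypothetical, the %s placeholders assume a DB-API driver with that parameter style (psycopg2, for example), and the extra b."name" column in the final SELECT is my addition so that result rows can be matched to inputs without relying on row order. Table and column names are interpolated as-is and must come from trusted code.

```python
def build_insert_report_query(table, columns, rows):
    """Return (sql, params) for an INSERT ... ON CONFLICT DO NOTHING
    wrapped in CTEs that report which input rows were inserted.

    Hypothetical helper, not part of Django. Identifiers are not escaped.
    """
    if not rows:
        raise ValueError("rows must not be empty")

    col_list = ", ".join(f'"{c}"' for c in columns)
    # The first SELECT names the columns; the following ones only add values.
    first = "SELECT " + ", ".join(f'%s AS "{c}"' for c in columns)
    rest = "SELECT " + ", ".join(["%s"] * len(columns))
    selects = [first] + [rest] * (len(rows) - 1)
    join_cond = " AND ".join(f'b."{c}" = s."{c}"' for c in columns)

    sql = (
        "WITH to_be_inserted AS (" + " UNION ALL ".join(selects) + "), "
        f"successfully_inserted AS (INSERT INTO {table} ({col_list}) "
        "SELECT * FROM to_be_inserted ON CONFLICT DO NOTHING RETURNING *) "
        # Assumption: also return the first column as a matching key.
        f'SELECT b."{columns[0]}", s.id FROM to_be_inserted AS b '
        f"LEFT JOIN successfully_inserted AS s ON ({join_cond})"
    )
    # Flatten the rows in the same order as the placeholders.
    params = [value for row in rows for value in row]
    return sql, params
```

The result would be passed to something like cursor.execute(sql, params); depending on the driver and column types, the placeholders in the first CTE may need explicit casts.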
With this, bulk_create() can be implemented so that it forwards
ON CONFLICT DO NOTHING to the database, sends the post_save signal for
every record actually created, and returns only those objects from its
input that were actually created.
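The bookkeeping this implies could look roughly like the following Python sketch. It is not Django code: Record, split_created and bulk_create_sketch are hypothetical names, the signal is replaced by a plain callback, and the result rows are assumed to be (name, id) pairs, which means the final SELECT would have to return b.name next to s.id.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # Stand-in for a model instance; not a Django model.
    name: str
    comment: str
    id: Optional[int] = None


def split_created(objs, result_rows):
    """Return the objects that were inserted, with their new ids set.

    result_rows is assumed to be an iterable of (name, id) pairs where
    id is None for rows that already existed.
    """
    ids_by_name = dict(result_rows)
    created = []
    for obj in objs:
        new_id = ids_by_name.get(obj.name)
        if new_id is not None:
            obj.id = new_id
            created.append(obj)
    return created


def bulk_create_sketch(objs, result_rows, send_post_save):
    """Notify send_post_save for each created object and return them."""
    created = split_created(objs, result_rows)
    for obj in created:
        # Placeholder for dispatching the post_save signal.
        send_post_save(obj)
    return created
```

Objects whose row already existed are left untouched and are not part of the return value, which matches the behaviour proposed above.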
Looking at the existing code, my feeling is that this query does not fit
into the current approach at all, so I would be very glad if somebody
gets inspired by this idea and implements it in Django.
Greetings
Дилян
On 09/28/2017 07:20 PM, Tom Forbes wrote:
I've been in similar situations before. You can usually get away with
using a single query to fetch the existing records and only passing data
that doesn't exist yet to bulk_create. This works great for a single
identity column, but with multiple columns it gets messy.
It seems all supported databases offer at least ON CONFLICT IGNORE in
some form or another, with pretty similar syntax.
On 28 Sep 2017 18:11, "Дилян Палаузов" <dpa-dja...@aegee.org> wrote:
Hello,
After a user request, I want to be sure that certain objects are
stored in a Postgres database, even if some of them were already
there before the request.
The only way I can do this with Django, short of raw SQL, is with
"for obj in objects: Model.objects.get_or_create(obj)". It works, but
it issues a separate INSERT for each object and is hence suboptimal.
I cannot use bulk_create(), which squeezes all the INSERTs into a
single one, because it fails if any of the to-be-inserted rows is
already in the database.
In Postgresql this can be achieved by sending "INSERT ... ON
CONFLICT DO NOTHING".
I propose changing the interface of QuerySet.bulk_create to accept
one more parameter, on_conflict, which can be a string such as "DO
NOTHING" or Q-objects (which could be used to implement ON CONFLICT
DO UPDATE SET ... WHERE ...):
def bulk_create(self, objs, batch_size=None, on_conflict=None): ...
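As a rough illustration of how the string form of such a parameter might be turned into SQL, here is a minimal Python sketch. It is not how Django builds its INSERT statements; on_conflict_clause and insert_sql are hypothetical helpers, only the "DO NOTHING" string is handled, and the Q-object variant is left out.

```python
def on_conflict_clause(on_conflict=None):
    """Map the proposed on_conflict argument to an SQL suffix."""
    if on_conflict is None:
        return ""
    if on_conflict.strip().upper() == "DO NOTHING":
        return " ON CONFLICT DO NOTHING"
    raise ValueError(f"unsupported on_conflict value: {on_conflict!r}")


def insert_sql(table, columns, n_rows, on_conflict=None):
    """Build a multi-row INSERT with %s placeholders (DB-API style).

    Identifiers are interpolated as-is and must come from trusted code.
    """
    cols = ", ".join(f'"{c}"' for c in columns)
    row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    values = ", ".join([row] * n_rows)
    return f"INSERT INTO {table} ({cols}) VALUES {values}" + on_conflict_clause(on_conflict)
```

Leaving on_conflict at None keeps the statement identical to a plain multi-row INSERT, so existing callers would not be affected.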
What are the arguments against or in favour?
Furthermore, bulk_create() does not send the post_save signal, because
this is difficult to implement with the standard backends other than
PostgreSQL.
I propose extending the implementation to send the signal when
PostgreSQL is used:
https://code.djangoproject.com/ticket/28641#comment:1
I assume there are no users who want to get a (post_save) signal on
save() but not on bulk_create().
Combining ON CONFLICT DO NOTHING with sending post_save gets nasty,
however, because "INSERT ... ON CONFLICT DO NOTHING RETURNING id;"
returns nothing for unchanged rows. In the end the system knows how
many rows were changed, but not which ones, so it cannot determine
for which objects to send post_save. At least I have not found a
way to figure out which rows were inserted and which were not.
However, this can be achieved with RETURNING * and then comparing the
returned rows to the objects that were sent, which would also let
bulk_create() return only the objects actually inserted in the database.
These changes would allow switching to a single INSERT on PostgreSQL.
Regards
Дилян
--
You received this message because you are subscribed to the Google
Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to django-developers+unsubscr...@googlegroups.com
<mailto:django-developers%2bunsubscr...@googlegroups.com>.
To post to this group, send email to
django-developers@googlegroups.com
<mailto:django-developers@googlegroups.com>.
Visit this group at
https://groups.google.com/group/django-developers
<https://groups.google.com/group/django-developers>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-developers/daa88462-c095-dfcc-2ce7-6d34f6bbc2f6%40aegee.org
<https://groups.google.com/d/msgid/django-developers/daa88462-c095-dfcc-2ce7-6d34f6bbc2f6%40aegee.org>.
For more options, visit https://groups.google.com/d/optout
<https://groups.google.com/d/optout>.