Hello,

fetching 3GB of existing records to only pass afterwards to bulk_create() some non-existent ones is not feasible.

I found a way to convince Postgresql to report which rows were not inserted on INSERT ON CONFLICT DO UPDATE:

Consider this:
CREATE TABLE t (
  id SERIAL PRIMARY KEY,
  name VARCHAR(10) NOT NULL UNIQUE,
  comment VARCHAR(10) NOT NULL);

And now the magic:

WITH
  to_be_inserted AS (
     SELECT 'name1' AS "name", 'comment1' as "comment" UNION ALL
     SELECT 'name4',           'comment4'              UNION ALL
     SELECT 'name5',           'comment5'),
  successfully_inserted AS (
      INSERT INTO t ("name", "comment" ) SELECT *
        FROM to_be_inserted ON CONFLICT DO NOTHING RETURNING *)
SELECT s.id FROM to_be_inserted AS b
LEFT JOIN successfully_inserted AS s ON (b.name = s.name AND b.comment = s.comment);

Returns a column "id" where for each record from to_be_inserted the id is NULL for already existing records, or the new identifier.

This way bulk_create() can be implemented, so that it sends post_save signal for all records created, forwards ON CONFLICT DO NOTHING to the database and returns only the objects from its input, which were actually created.

Looking at the existing code, my feeling is that this query does not fit anyhow in the current approaches, hence I will be very glad if somebody gets expired from this idea and implements it in Django.

Greetings
  Дилян

On 09/28/2017 07:20 PM, Tom Forbes wrote:
I've been in similar situations before, you can usually get away with using a single query to fetch existing records and only pass data that doesn't exist to bulk_create. This works great for a single identity column, but if you have multiple it gets messy.

It seems all supported databases offer at least ON CONFLICT IGNORE in some form or another, with pretty similar syntax.

On 28 Sep 2017 18:11, "Дилян Палаузов" <dpa-dja...@aegee.org <mailto:dpa-dja...@aegee.org>> wrote:


    Hello,

    I want after a user request to be sure that certain objects are
    stored in a Postgres database, even if before the request some of
    the objects were there.

    The only way I can do this with django, not talking about raw sql,
    is with "for obj in objects: Model.objects.get_or_create(obj)".  It
    works, but creates several INSERTs, and is hence suboptimal.

    I cannot use bulk_create(), which squeezes all the INSERTs to a
    single one, as it does not work, if any of the to-be-inserted rows
    was already in the database.

    In Postgresql this can be achieved by sending "INSERT ... ON
    CONFLICT DO NOTHING".

    I propose changing the interface of QuerySet.bulk_create to accept
    one more parameter on_conflict, that can be a string e.g. "DO
    NOTHING" or Q-objects (which could be used to implement ON CONFLICT
    DO UPDATE SET ... WHERE ...

    def bulk_create(self, objs, batch_size=None, on_conflict=None): ...

    What are the arguments against or in favour?

    The further, bulk_create() does not send post_save signal, because
    it is difficult to implement with the standard backends, except with
    postgresql.

    I propose extending the implementation to send the signal:

    https://code.djangoproject.com/ticket/28641#comment:1
    <https://code.djangoproject.com/ticket/28641#comment:1>

    when Postgresql is used.  I assume there a no users, who want to get
    a (post_save) signal on save() but not on bulk_create().

    Combining ON CONFLICT DO NOTHING with sending post_save gets however
    nasty, as "INSERT ... ON CONFLICT DO NOTHING RETURNING id;" does not
    return anything on unchanged rows, hence the system knows at the end
    how much rows were changed, but not which, so it cannot determine
    for which objects to send post_save.  At least I have not found a
    way how to figure out which rows were inserted/not inserted.

    However, this can be achieved by RETURNING * and then comparing the
    returned objects to the sent objects, eventually making
    bulk_create() return the objects actually inserted in the database.

    These changes will allow a switch to a single INSERT on Postgresql.

    Regards
       Дилян

-- You received this message because you are subscribed to the Google
    Groups "Django developers  (Contributions to Django itself)" group.
    To unsubscribe from this group and stop receiving emails from it,
    send an email to django-developers+unsubscr...@googlegroups.com
    <mailto:django-developers%2bunsubscr...@googlegroups.com>.
    To post to this group, send email to
    django-developers@googlegroups.com
    <mailto:django-developers@googlegroups.com>.
    Visit this group at
    https://groups.google.com/group/django-developers
    <https://groups.google.com/group/django-developers>.
    To view this discussion on the web visit
    
https://groups.google.com/d/msgid/django-developers/daa88462-c095-dfcc-2ce7-6d34f6bbc2f6%40aegee.org
    
<https://groups.google.com/d/msgid/django-developers/daa88462-c095-dfcc-2ce7-6d34f6bbc2f6%40aegee.org>.
    For more options, visit https://groups.google.com/d/optout
    <https://groups.google.com/d/optout>.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com <mailto:django-developers+unsubscr...@googlegroups.com>. To post to this group, send email to django-developers@googlegroups.com <mailto:django-developers@googlegroups.com>.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAFNZOJOUGzYixc4cvPF2%2B_VTo2YwqwVDk%2BTkErxW3hjXvpaXbQ%40mail.gmail.com <https://groups.google.com/d/msgid/django-developers/CAFNZOJOUGzYixc4cvPF2%2B_VTo2YwqwVDk%2BTkErxW3hjXvpaXbQ%40mail.gmail.com?utm_medium=email&utm_source=footer>.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "Django 
developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/bdd293c6-e1cd-fe2d-fa8f-45f803bcd1df%40aegee.org.
For more options, visit https://groups.google.com/d/optout.

Reply via email to