bulk_update() support for unique fields instead of only primary key

2022-06-06 Thread Ebram Shehata
I've already created a ticket that ended up with 'WONTFIX' here 
.
Please read it first.

The scenario that led me trying to add that feature:
Well, I'v a model that has a primary key and unique field...
Every  X period of time I've a cron that calls an external API to load 
updated data and store it in my Django project db.
Thing is I want to execute the update query without having to have the pk 
with me.. I already have the unique field in hand (got it from the API) and 
it should be enough to identify what object to be updated..

example code that will do so..

*# call external API and load data as list[dict]*
*objs = list()*
*for record in data:*
*# Just create instance of MyModel class with values needed..*
*objs.append(MyModel(my_unique_field=data['national_id'], name="New 
Name"))*

*# I expect to do:MyModel.objects.bulk_update(objs, fields=["name"], 
unique_field="my_unique_field")*

The current implementation requires every object to have its primary key 
set, but, that's not always the case. Also, if the primary key is 
my-django-project-generated, there's no way AFAIK to get those PKs without 
having to query them from DB, which will be a very expensive query to run 
if the data is large.

You can find the POC PR here: https://github.com/django/django/pull/15764

I really thing that'd be a very useful update to add.

I also have a question about this function: We can't use it to update 
primary keys.. I'm wondering why ? I want to also add that support to it.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/f532b4fd-aba6-44dd-b373-e6e934ff8b67n%40googlegroups.com.


Re: bulk_update() support for unique fields instead of only primary key

2022-06-07 Thread Ebram Shehata

Thank you for your reply.

> Implementation might get complicated in multi table
> inheritance, where the unqiue field may not reside on the target table
> itself (would need a similar tables+rows lookup expansion as done for pk).

How can it be complicated ? I thought it's just something like 
getattr(MyModel, field_name) to get value and
model._meta.get_field(unique_field) to get the field instance..
Where can I find explanation for what you mean? and where can I find the PK 
code you're referring to?

> Overall this sounds like a valuable API addition to bulk_update

Thank you. I was asked to re-open the ticket if there's an agreement to add 
the update..
Should I re-open it or we will need more than one person to agree on the 
update ?
(I'm sorry, this is my first contribution)
On Tuesday, June 7, 2022 at 8:03:13 AM UTC+2 j.bre...@netzkolchose.de wrote:

> Imho the pk field being treated special by bulk_update (as the only 
> record filter predicate, non updatable) comes from one simple idea - 
> stable object identity during the updates. For an ORM proper object 
> identification is rather important to not mix things up, shifting 
> identities might create havoc further down the road or in internal code, 
> if it is not prepared to deal with identity shifts. Whether stable 
> identities are really needed for bulk_update or could be lifted by 
> making pk updatable I cannot tell for sure.
>
> About record identification by another unique field - imho that should 
> be safe as long as the unique field can sufficiently identify rows in 
> question on a table. Implementation might get complicated in multi table 
> inheritance, where the unqiue field may not reside on the target table 
> itself (would need a similar tables+rows lookup expansion as done for pk).
> Overall this sounds like a valuable API addition to bulk_update, whether 
> it ultimately could lift the non updatable state of pk itself, remains 
> uncertain (prolly needs tons of tests and a serious code review of some 
> internals).
>
> Cheers,
> jerch
>
> NB: Btw fetching proper pks upfront from some other unique field is 
> typically very cheap compared to bulk_update runtime itself, given you 
> have indexed those columns.
>
>
> Am 07.06.22 um 02:15 schrieb Ebram Shehata:
> > I've already created a ticket that ended up with 'WONTFIX' here 
> > <https://code.djangoproject.com/ticket/33770>.
> > Please read it first.
> > 
> > The scenario that led me trying to add that feature:
> > Well, I'v a model that has a primary key and unique field...
> > Every  X period of time I've a cron that calls an external API to load 
> > updated data and store it in my Django project db.
> > Thing is I want to execute the update query without having to have the 
> > pk with me.. I already have the unique field in hand (got it from the 
> > API) and it should be enough to identify what object to be updated..
> > 
> > example code that will do so..
> > 
> > /# call external API and load data as list[dict]/
> > /objs = list()/
> > /for record in data:/
> > /# Just create instance of MyModel class with values needed../
> > /objs.append(MyModel(my_unique_field=data['national_id'], name="New 
> > Name"))/
> > /# I expect to do:
> > MyModel.objects.bulk_update(objs, fields=["name"], 
> > unique_field="my_unique_field")/
> > 
> > The current implementation requires every object to have its primary key 
> > set, but, that's not always the case. Also, if the primary key is 
> > my-django-project-generated, there's no way AFAIK to get those PKs 
> > without having to query them from DB, which will be a very expensive 
> > query to run if the data is large.
> > 
> > You can find the POC PR here: 
> https://github.com/django/django/pull/15764
> > 
> > I really thing that'd be a very useful update to add.
> > 
> > I also have a question about this function: We can't use it to update 
> > primary keys.. I'm wondering why ? I want to also add that support to it.
> > 
> > -- 
> > You received this message because you are subscribed to the Google 
> > Groups "Django developers (Contributions to Django itself)" group.
> > To unsubscribe from this group and stop receiving emails from it, send 
> > an email to django-develop...@googlegroups.com 
> > <mailto:django-develop...@googlegroups.com>.
> > To view this discussion on the web visit 
> > 
> https://groups.google.com/d/msgid/django-developers/f532b4fd-aba6-44dd-b373-e6e934ff8b67n%40googlegroups.com
>  
> &g

Re: bulk_update() support for unique fields instead of only primary key

2022-06-19 Thread Ebram Shehata
Ah, I see. Thank you for explaining the thing and linking the code.
It looks more interesting now! I'm interested to start working on it.
I'll wait for approval from other people, too.

On Wednesday, June 8, 2022 at 2:17:32 PM UTC+2 j.bre...@netzkolchose.de 
wrote:

>
> > How can it be complicated ? I thought it's just something like 
> > getattr(MyModel, field_name) to get value and
> > model._meta.get_field(unique_field) to get the field instance..
> > Where can I find explanation for what you mean? and where can I find the 
> > PK code you're referring to?
>
> Field updates for multi-table models are internally split into multiple 
> tables updates, if the fields reside on several tables from the 
> inheritance chain. The identity of records is matched against pk values 
> stored in OneToOne xxx_ptr fields on the subtables. You can find the 
> construction logic for this in UpdateQuery.get_related_updates 
> (
> https://github.com/django/django/blob/49b470b9187b6be60e573fed08c8f4a87f133750/django/db/models/sql/subqueries.py#L124
> ).
>
> With using a different field for record identification without having 
> the pk (thus from degenerated model instances) this wont work anymore 
> (note it directly refers to pk__in selection). Most likely you will have 
> to do an explicit db roundtrip resolving "some_unique_field -> pk" 
> upfront to keep it working for multi-table spanning updates. Thats where 
> the impl might get complicated/messy.
>
>
> > Thank you. I was asked to re-open the ticket if there's an agreement to 
> > add the update..
> > Should I re-open it or we will need more than one person to agree on the 
> > update ?
> > (I'm sorry, this is my first contribution)
>
> You def. should get for more feedback, a single opinion does not mean 
> much here.
>
> Cheers, jerch
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/aa2081d9-90ea-4bd5-b51b-ad21468324b5n%40googlegroups.com.