ORM difficulties

2015-12-05 Thread Raphael Gaschignard


Hi list, 

  I want to preface this by saying I’m really glad to see the ORM where it 
is in 1.8, it’s gotten really far and I now think it’s not hopeless to 
imagine writing complex things in the ORM…


   So earlier this week I wrote a bit of a rant in the bug tracker about 
how annotate is confusing (https://code.djangoproject.com/ticket/25834).


  A short summary is that annotate and values (with a smattering of order_by) 
are basically what decide grouping, but there’s a lot of trial and error 
involved in getting the right grouping, with a tricky interaction between 
the two.


  A behaviour that exemplifies this is that 
queryset.annotate(foo=thing).annotate(bar=other_thing) is *not* the same as 
queryset.annotate(foo=thing, bar=other_thing) in certain cases. This goes 
against the intuitive interpretation of the queryset API IMO.


  I think there should be some update to the API to make grouping more 
explicit. Absent that, given that the docs are now oriented in a sort of 
“You shouldn’t need extras anymore” fashion, there really should be a 
“SQL to ORM” migration guide/cookbook to point out explicitly how to go from 
SELECT stuff FROM table GROUP BY properties to the right annotate/values. I 
found what jarshwah pointed out in the ticket really helpful.
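
As a minimal sketch of what one entry in such a cookbook might look like, the snippet below runs the raw SQL with Python's sqlite3 module and notes the corresponding ORM call in a comment. The book table and the Book model are hypothetical, and the ORM lines follow the grouping rules described in the Django aggregation docs; they are not taken from the ticket.

```python
import sqlite3

# Hypothetical table standing in for a Django model Book(author, pages).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book (id INTEGER PRIMARY KEY, author TEXT, pages INTEGER)"
)
conn.executemany(
    "INSERT INTO book (author, pages) VALUES (?, ?)",
    [("ann", 100), ("ann", 250), ("bob", 300)],
)

# Group by one column. In the ORM, values() comes first:
#   Book.objects.values("author").annotate(total=Sum("pages"))
per_author = conn.execute(
    "SELECT author, SUM(pages) FROM book GROUP BY author ORDER BY author"
).fetchall()

# Group by the row itself. Without a preceding values(), annotate()
# aggregates per object:
#   Book.objects.annotate(total=Sum("pages"))
per_row = conn.execute(
    "SELECT id, SUM(pages) FROM book GROUP BY id ORDER BY id"
).fetchall()

# Ordering by another column pulls that column into the grouping:
#   Book.objects.values("author").annotate(n=Count("id")).order_by("pages")
split_by_order = conn.execute(
    "SELECT author, COUNT(id) FROM book GROUP BY author, pages ORDER BY pages"
).fetchall()

print(per_author)      # [('ann', 350), ('bob', 300)]
print(per_row)         # [(1, 100), (2, 250), (3, 300)]
print(split_by_order)  # [('ann', 1), ('ann', 1), ('bob', 1)]
```

The third query is the surprising one: the order_by silently splits the per-author groups.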


  Also, some random feature requests:

   - .values('stuff', my_thing=Coalesce('thing', 'stuff')) should work
   - there should be built-in Year and Month functions to extract 
   years/months from date fields

  Anyways I wanted to share this experience so that anyone who has the 
courage to write new docs/continue evolving things can know of at least one 
team’s difficulties.


 Raphael

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/8d175352-d15e-4dc2-a471-014d6b0bc3e2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Faster Migrations! But at what cost?

2019-05-19 Thread Raphael Gaschignard
Hi Developers,

  We have a decently-sized "large project", around 240 models across 90 
apps, with roughly 500 migrations to work off of. We do periodically squash 
migrations to keep the migration count under control, but because of all 
this, migrations on our testing server take 3-5 minutes to run to 
completion. 

I am not sure about what the size of a typical Django project is (or 
rather, a typical "large project") so it's hard for me to quantify how big 
of an issue this is.

After looking through the migration code and doing some profiling, I found a 
place where caching was possible (on the ModelState -> Model rendering, 
based on some of the invariants stated in ModelState code), which would cut 
*our* full migration from 230 seconds to 50 seconds (on my machine at 
least). With the specific caching I did, I was hitting a 90% cache hit rate 
on our full migration run.
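
The general idea can be sketched in plain Python. This is only an illustration of memoizing an expensive render step and counting hits. It is not the code from the PR, and ToyModelState, RenderCache and cache_key are hypothetical names that do not exist in Django.

```python
class ToyModelState:
    """Hypothetical stand-in for a migration model state: a name plus fields."""

    def __init__(self, name, fields):
        self.name = name
        self.fields = dict(fields)

    def cache_key(self):
        # The cache is only safe if everything that affects rendering
        # is part of this key.
        return (self.name, tuple(sorted(self.fields.items())))


class RenderCache:
    """Memoizes rendering and counts hits and misses."""

    def __init__(self):
        self._rendered = {}
        self.hits = 0
        self.misses = 0

    def render(self, state):
        key = state.cache_key()
        if key in self._rendered:
            self.hits += 1
            return self._rendered[key]
        self.misses += 1
        # Stand-in for the expensive step of building a model class.
        rendered = type(state.name, (), dict(state.fields))
        self._rendered[key] = rendered
        return rendered


cache = RenderCache()
author = ToyModelState("Author", {"name": "CharField"})
book = ToyModelState("Book", {"title": "CharField"})

# Simulate ten migration steps that each re-render every model, while
# only one step actually changes a model.
for step in range(10):
    if step == 5:
        book.fields["pages"] = "IntegerField"
    for state in (author, book):
        cache.render(state)

print(cache.hits, cache.misses)  # 17 3
```

Most steps leave most models untouched, which is why a high hit rate is plausible. The risk is in the key: anything that influences rendering but is missing from it produces stale results.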

Caching is always a bit scary, though, and there are a *lot* of places in 
the apps registry code/model registration code in particular where caches 
are constantly being wiped. So this stuff scares me quite a bit. In my 
personal ideal, I would love to be able to check in my caching thing but 
have it be behind some MIGRATIONS_FASTER_BUT_MAYBE_UNSAFE flag. I am not 
recommending this for Django because it's not how the project tends to do 
things; this is just my personal feeling. After all, you're rarely running 
all your migrations in production, so this is a testing problem more than 
anything.

I do think there would be an alternative way to move forward though. 
Currently the migrations Operation class relies on having the from_state 
and to_state for DB operations in particular. But I think that we could 
change up this API based on how these properties are used in 
Django-provided Operation classes to avoid having to copy the state to 
provide from_state and to_state. I haven't gone through with this 
investigation too much yet but I think this would improve things a bit.

So this is a multi-pronged question:

- Have there ever been any surveys about the size of Django projects? I 
don't know the value of investigating this further except for our own usage.

- Does the caching of ModelState.render as done in this PR (still need to 
work through a couple failing tests) sound reasonable? Or is this veering 
too far in the performance/safety guarantee tradeoff?
- Is the migration operation infrastructure considered a public API? As in, 
would changing the Operation model API (potentially breaking subclasses) be 
considered a major undertaking? Or would it be an acceptable cost to pay 
for some performance improvements?

I am still trying to wrap my head around some of this problem space, so any 
insight would be much appreciated.

Thanks, 
   Raphael



Re: Faster Migrations! But at what cost?

2019-05-20 Thread Raphael Gaschignard
a.get_field("author").related_model` would not return `` but `"myapp.Author"`. However, as far as I can tell, that's highly 
backward incompatible. And at this point, I also don't see a way to make this backward compatible.

Cheers,

Markus

On Mon, May 20, 2019, at 5:26 PM, charettes wrote:

Hello Raphael,


Have there ever been any surveys about the size of Django projects? I don't 
know the value of investigating this further except for our own usage.

I'm not aware of any similar surveys in recent years but I would
say *240 models across 90 apps, with roughly 500 migrations* would be
considered a really large project in my experience. Did you look into
squashing these 500 migrations by any chance? Something we did at
$DAYJOB to speed up the test bootstrapping process is to prebuild
containers with migrations already applied in production so CI running
PRs only applies new migrations on top of them.


Does the caching of ModelState.render as done in this PR (still need to work 
through a couple failing tests) sound reasonable? Or is this veering too far in 
the performance/safety guarantee tradeoff?

While the layer you added seems to yield significant benefits I would
argue that it complicates an already too complex apps rendering caching
layer. As you'll probably come to discover while trying to resolve the
currently failing tests, models.Field equality is not implemented how
you'd expect it to be[0] and thus requires costly deconstruction to be
used as a cache staleness predicate[1].
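
A minimal sketch of the pitfall, in plain Python: ToyField is a hypothetical class that mimics the equality referenced in [0], where two fields compare equal when their creation counters match, regardless of their options. Its deconstruct() is a simplified three-part version of the real method.

```python
import copy
import itertools


class ToyField:
    """Hypothetical stand-in that compares by creation order, as in [0]."""

    _counter = itertools.count()

    def __init__(self, max_length=None):
        self.max_length = max_length
        self.creation_counter = next(ToyField._counter)

    def __eq__(self, other):
        if isinstance(other, ToyField):
            return self.creation_counter == other.creation_counter
        return NotImplemented

    def __hash__(self):
        return hash(self.creation_counter)

    def deconstruct(self):
        # Simplified: (import path, positional args, keyword args).
        kwargs = {}
        if self.max_length is not None:
            kwargs["max_length"] = self.max_length
        return ("toy.ToyField", [], kwargs)


a = ToyField(max_length=10)
b = ToyField(max_length=10)   # same definition, created later
c = copy.copy(a)              # same counter as a ...
c.max_length = 20             # ... but a different definition

print(a == b)  # False, although the definitions match
print(a == c)  # True, although the definitions differ

# Comparing deconstructed forms reflects the definition instead.
print(a.deconstruct() == b.deconstruct())  # True
print(a.deconstruct() == c.deconstruct())  # False
```

A cache keyed on field equality would therefore miss when it should hit and hit when it should miss. Deconstructing every field to build a reliable key is what makes the staleness check costly.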


Is the migration operation infrastructure considered a public API? As in, would 
changing the Operation model API (potentially breaking subclasses) be 
considered a major undertaking? Or would it be an acceptable cost to pay for 
some performance improvements?

Given the large adoption of migrations and the fact the Operation API
is publicly documented[2] I would say the performance benefits would
need to be quite substantial to break backward compatibility. In my
opinion, and I think that's something Markus Holtermann, who also worked
a lot on speeding up migrations, would agree with, we should focus our
efforts on avoiding model rendering at all costs. We've already made all
state mutation (Operation.state_forwards) avoid all accesses to .apps
and I think the next step would be to make `database_forwards` and
`database_backwards` do the same. This is something Markus worked on a
few years ago[3].

Cheers,
Simon

[0]
https://github.com/django/django/blob/1d0bab0bfd77edcf1228d45bf654457a8ff1890d/django/db/models/fields/__init__.py#L495-L499
[1]
https://github.com/django/django/blob/1d0bab0bfd77edcf1228d45bf654457a8ff1890d/django/db/migrations/autodetector.py#L49-L87
[2]
https://docs.djangoproject.com/en/2.2/ref/migration-operations/#writing-your-own
[3]
https://github.com/django/django/compare/master...MarkusH:schemaeditor-modelstate
