Faster Migrations! But at what cost?

Raphael Gaschignard Sun, 19 May 2019 19:13:30 -0700

Hi Developers,

We have a decently sized "large project": around 240 models across 90
apps, with roughly 500 migrations. We periodically squash migrations to
keep the count under control, but with all of this, running the full set
of migrations on our testing server still takes 3-5 minutes.

I am not sure what the size of a typical Django project is (or rather, a
typical "large project"), so it's hard for me to quantify how big an issue
this is.

While looking through the migration code and doing some profiling, I found a
place where caching was possible: the ModelState -> Model rendering, relying
on some of the invariants stated in the ModelState code. This would cut *our*
full migration run from 230 seconds to 50 seconds (on my machine, at least).
With the specific caching I implemented, I was seeing a 90% cache hit rate
over our full migration run.
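To make the idea concrete, here is a toy sketch of memoizing the state -> model rendering step. It is not Django's code: ToyModelState and RenderCache are hypothetical stand-ins. The key assumption is that a state never changes once it has been used as a cache key; the toy enforces this with a frozen dataclass, whereas a real cache would have to rely on whatever invariants ModelState actually guarantees. It also ignores relations between models, which is exactly where a real cache gets risky.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


# Hypothetical, simplified stand-in for a migration ModelState: a model is
# described by its app label, its name and an ordered tuple of
# (field_name, field_type) pairs. Frozen, so instances are hashable and
# cannot be mutated after being used as cache keys.
@dataclass(frozen=True)
class ToyModelState:
    app_label: str
    name: str
    fields: Tuple[Tuple[str, str], ...]


class RenderCache:
    """Memoize the (assumed expensive) state -> model-class rendering step."""

    def __init__(self) -> None:
        self._cache: Dict[ToyModelState, type] = {}
        self.hits = 0
        self.misses = 0

    def render(self, state: ToyModelState) -> type:
        cached = self._cache.get(state)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        # Stand-in for the costly part: building a new class at runtime.
        model = type(state.name, (), {name: ftype for name, ftype in state.fields})
        self._cache[state] = model
        return model

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Equal states share one rendered class, so replaying many migrations that leave most models untouched mostly hits the cache.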

Caching is always a bit scary, though, and there are a *lot* of places in
the apps registry and model registration code in particular where caches
are constantly being wiped, so this stuff worries me quite a bit. Ideally,
I would love to be able to check in my caching change but have it sit
behind some MIGRATIONS_FASTER_BUT_MAYBE_UNSAFE flag. I am not recommending
this for Django, since it's not how the project tends to do things; it's
just my personal feeling. After all, you rarely run all of your migrations
in production, so this is more of a testing problem than anything.
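A rough sketch of the opt-in idea, assuming a hypothetical MIGRATIONS_FASTER_BUT_MAYBE_UNSAFE setting (no such Django setting exists; a SimpleNamespace stands in for django.conf.settings here). The default path recomputes every time, as today, and memoization only kicks in while the flag is set. render_model is a hypothetical placeholder for the expensive rendering step.

```python
import functools
from types import SimpleNamespace

# Stand-in for django.conf.settings. MIGRATIONS_FASTER_BUT_MAYBE_UNSAFE is the
# hypothetical flag from this proposal, not an existing Django setting.
settings = SimpleNamespace(MIGRATIONS_FASTER_BUT_MAYBE_UNSAFE=False)


def cached_if_opted_in(func):
    """Memoize func, but only consult the cache while the opt-in flag is set."""
    cached = functools.lru_cache(maxsize=None)(func)

    @functools.wraps(func)
    def wrapper(key):
        # The flag is read on every call, so it can be toggled at runtime.
        if getattr(settings, "MIGRATIONS_FASTER_BUT_MAYBE_UNSAFE", False):
            return cached(key)
        # Default: recompute every time, which is the current behavior.
        return func(key)

    # Let callers (e.g. tests) drop entries cached while the flag was on.
    wrapper.cache_clear = cached.cache_clear
    return wrapper


render_calls = []


@cached_if_opted_in
def render_model(state_key):
    # Hypothetical expensive rendering step; records each real invocation.
    render_calls.append(state_key)
    return object()
```

Reading the flag at call time rather than import time means a test suite can switch it on and off around individual tests. Note that lru_cache entries survive the flag being switched off, hence the exposed cache_clear().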

I do think there is an alternative way forward, though. Currently the
migrations Operation class relies on being given a from_state and a
to_state, for database operations in particular. Based on how these states
are used in the Django-provided Operation classes, I think we could change
this API so that the state no longer has to be copied just to provide
from_state and to_state. I haven't investigated this very far yet, but I
think it would improve things a bit.
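A toy model of the copying cost described above, assuming a simplified project state (a dict of models) and a hypothetical reads() hook on each operation; neither is part of Django's Operation API, and ToyAddField is only loosely modelled on AddField. The first runner deep-copies the whole state before every operation to give it a "before" state; the second snapshots only the models the operation declares it will read. Both produce the same output.

```python
import copy
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Toy project state: (app_label, model_name) -> {field_name: field_type}.
ModelKey = Tuple[str, str]
State = Dict[ModelKey, Dict[str, str]]


@dataclass
class ToyAddField:
    """Toy operation, loosely modelled on migrations.AddField (not Django's class)."""
    app_label: str
    model_name: str
    name: str
    field_type: str

    def state_forwards(self, state: State) -> None:
        # Mutate the project state in place.
        state[(self.app_label, self.model_name)][self.name] = self.field_type

    def reads(self) -> List[ModelKey]:
        # Hypothetical hook: the models this operation needs from "before".
        return [(self.app_label, self.model_name)]

    def database_forwards(self, before: State, after: State) -> str:
        key = (self.app_label, self.model_name)
        added = sorted(set(after[key]) - set(before[key]))
        # Toy SQL string; just enough to compare the two runners.
        return f"ALTER TABLE {self.app_label}_{self.model_name} ADD {', '.join(added)}"


def apply_with_full_clone(state: State, operations, counter: Dict[str, int]) -> List[str]:
    sql = []
    for op in operations:
        before = copy.deepcopy(state)  # copies every model, every time
        counter["models_copied"] += len(state)
        op.state_forwards(state)
        sql.append(op.database_forwards(before, state))
    return sql


def apply_with_partial_snapshot(state: State, operations, counter: Dict[str, int]) -> List[str]:
    sql = []
    for op in operations:
        # Copy only the models this operation says it will read.
        before = {key: dict(state[key]) for key in op.reads()}
        counter["models_copied"] += len(before)
        op.state_forwards(state)
        sql.append(op.database_forwards(before, state))
    return sql
```

The catch is the one raised in the questions that follow: any third-party Operation subclass that expects a complete from_state would break under the second scheme.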

So this is a multi-pronged question:

- Have there ever been any surveys about the size of Django projects? I
don't know how valuable it would be to investigate this further beyond our
own usage.

- Does caching ModelState.render as done in this PR
<https://github.com/django/django/pull/11388> (I still need to work through a
couple of failing tests) sound reasonable? Or does it veer too far in the
performance/safety-guarantee tradeoff?

- Is the migration operation infrastructure considered a public API? That
is, would changing the Operation API (potentially breaking subclasses) be
considered a major undertaking? Or would it be an acceptable cost to pay
for some performance improvements?

I am still trying to wrap my head around some of this problem space, so any
insight would be greatly appreciated.

Thanks,
Raphael

--
You received this message because you are subscribed to the Google Groups
"Django developers (Contributions to Django itself)" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-developers/fd945497-ef84-4135-b92a-5473ca098809%40googlegroups.com.
