Hi everyone,
I agree that there is room for improvement around migrations in Django
with respect to the expected downtime/table locking. But I don’t know
whether this should in Django itself or handled by a 3rd party library.
I’m maintaining the django-migration-linter (the very idea was to have
an automated tool since reviewing migrations is not something new Django
devs can do easily), which currently tackles another issue than the one
you mention. It tries, in a best effort manner as you’ve seen through
the SQL checks, to understand if the migration is introducing a backward
incompatible change to your schema, which will desynchronize your code
and database schema (so you’ll get errors when you deploy one without
the other – and you’ll have a hard time doing a rollback).
It’s a different aspect, compared to checking that the forward migration
will be instant and non-locking. However, it is planned to add this to
the linter too
(https://github.com/3YOURMIND/django-migration-linter/issues/99
<https://github.com/3YOURMIND/django-migration-linter/issues/99>).
You’ve noticed that it’s based on the SQL strings, which has advantages
and drawbacks. The checks are easy to implement (just a set of rules
basically), it could be completely Django-agnostic and it can lint
RunSQL operations. But it’s a bit harder to represent a state when
multiple SQL statements are executed, to check if they cancel each other
out for instance.
An idea about something that could be in Django would be, somehow like
the “database features”, marking for each migration operation
(AlterField, AddField, …) on each database backend, if they can
potentially mean downtime. It would be a hint about the risk of the
migration, and display a warning if necessary.*
*Going one step further, for some operations, Django could query the
number of rows of the corresponding table and have some risk heuristics.
Basically: “you’re adding a column to a table with less than 100 rows,
this should not be risky” or “you’re adding a column to a table with
millions of rows, be careful!”.
However, as these heuristics would be rather arbitrary, I’m not sure
that the Django project would want to make this kind of choices – so it
might be a better fit for a library. (what I’m describing is more or
less the ultimate goal the django-migration-linter)
Hope this gives some insights about the project and how it could help on
this topic :)
Cheers,
David
On 23/12/2020 19:02, Tom Forbes wrote:
Thanks for all the input here!
I didn’t know about django-migration-linter, and it does seem quite
interesting. It works by looking at the complete set of SQL statements
that would be run, and running regular expressions/string comparisons
on it
(https://github.com/3YOURMIND/django-migration-linter/blob/master/django_migration_linter/sql_analyser/base.py#L59-L62
<https://github.com/3YOURMIND/django-migration-linter/blob/master/django_migration_linter/sql_analyser/base.py#L59-L62>).
I don’t hate this idea, but I think that having direct access to the
migration objects themselves could greatly simplify the implementation.
Django-pg-zero-downtime-migrations also looks interesting and is more
in-line with what I was imagining
(https://github.com/tbicr/django-pg-zero-downtime-migrations/blob/master/django_zero_downtime_migrations/backends/postgres/schema.py
<https://github.com/tbicr/django-pg-zero-downtime-migrations/blob/master/django_zero_downtime_migrations/backends/postgres/schema.py>)
- it overrides the schema editor to detect unsafe migrations, and
provides a nice way to abort migrations that take locks for too long.
However I feel that the implementation could be greatly simplified
with some support for Django - you shouldn’t really need to override
the schema editor, Django should pass you a list of operations and ask
“is this safe?”. The results can be reported via the checks framework
which allows application owners can decide to treat these warnings as
errors or silence them as needed. It also brings those warnings
front-and-centre while developing, rather than having it fail when
running “migrate” later on in the deployment process.
Django-safemigrate is very related and something I wanted to cover in
a later message. I feel that some of this functionality could (or even
should) be integrated with Django, as migrations in “real” apps are
usually a two step process: you run the schema migrations, which might
add a nullable column, deploy your app, then run data migrations to
populate the column. At least bringing the concept of a pre-deploy and
a post-deploy migration to Django would be pretty useful I think.
I’d say the open questions so far are:
1. Does it make sense to add a specific system check registration
function for migrations? If not, how do we expose this to users
wishing to write system checks?
2. Are we comfortable with creating a public API for some of the
migrations internals?
3. Should we (or can we?) make the distinction between pre-deploy and
post-deploy migrations?
Another consideration with all this is that for a non-trivial amount
of applications all this doesn’t really matter. If you’ve got an
internal app that is used during business hours almost none of this
applies - you can get away with downtime and “dangerous” migrations,
and similarly if you are starting a new project you don’t want Django
to start complaining about adding a non-null field to a table that has
no rows in a project that isn’t even deployed yet. This is why I think
the MVP of this would just be to allow users to write their own
business-specific checks for migrations and allow third party apps to
more easily integrate with the migration framework, rather than
anything more complicated.
Tom
On 23 Dec 2020 at 17:32:23, Ryan Hiebert <r...@ryanhiebert.com
<mailto:r...@ryanhiebert.com>> wrote:
This is an interesting discussion that is separate from, but
related to, django-safemigrate, which we use to separate which
migrations may run before or after deployment. I intend follow
along with this discussion, to see if there are reasonable ways to
identify when these zero-downtime operations should be run
compared with deployments.
https://github.com/aspiredu/django-safemigrate
<https://github.com/aspiredu/django-safemigrate>
On Wed, Dec 23, 2020 at 10:01 AM Paveł Tyślacki
<pavel.tysla...@gmail.com <mailto:pavel.tysla...@gmail.com>> wrote:
Want to share lib I work previously for migration safety with
postgres some time ago:
https://github.com/tbicr/django-pg-zero-downtime-migrations
<https://github.com/tbicr/django-pg-zero-downtime-migrations>
there are also checker and replacement of unsafe mirations
with safe replacement.
ср, 23 дек. 2020 г. в 12:18, Adam Johnson <m...@adamj.eu
<mailto:m...@adamj.eu>>:
Hi Tom,
I love this idea and would like to help developers write
safe migrations too.
You didn't mention this pre-existing project:
https://github.com/3YOURMIND/django-migration-linter
<https://github.com/3YOURMIND/django-migration-linter> .
It might be worth checking how that works.
I'm not sure what any public API could look like here. In
django-linear-migrations (
https://github.com/adamchainz/django-linear-migrations
<https://github.com/adamchainz/django-linear-migrations> )
I tapped a bit into the MigrationLoader class, which is
undocumented but fairly stable. Perhaps we could just
document a few more of the migration internals?
On Tue, 22 Dec 2020 at 22:58, Tom Forbes <t...@tomforb.es
<mailto:t...@tomforb.es>> wrote:
Hey,
A fairly common problem with large and/or highly
trafficked web applications is migration safety: you
want your migrations to be non-locking and
instantaneous. The specifics are very database
dependent but in Postgres simply altering the
nullability of a column can incur serious downtime
depending on the size of the table in question. The
same thing goes for adding a new non-nullable column
to a table or even just creating an index without the
`CONCURRENTLY` option.
Code review is one way of catching these issues but an
automated solution that runs as part of the test suite
would be the preferred solution. I don’t think using
some kind of linter for this is a good idea as it
doesn’t have the context to decide if it _is_ a safe
migration (i.e removing `NOT NULL` from a column in
the same migration that creates the column is safe). A
linter also doesn’t know which migrations are pending
and which have been applied.
Rails has the pretty nice strong_migrations Gem
(https://github.com/ankane/strong_migrations
<https://github.com/ankane/strong_migrations>) which
attempts to detect dangerous migrations. Unless I’m
missing something right now it seems quite hard to add
automated checks like this to Django migrations that
can inspect the full set of “pending migration
operations". We have the `pre_migrate` signal that
takes a “plan” argument with the scary sounding caveat
that the passed object has no public API, but using a
signal feels kind of wrong. It should be part of a
system check - i.e a warning that says “your app may
die if this migration is applied”.
I’d like to make this simpler to implement on a
per-project basis (or in a third party app) but I’m
not sure how. It should be a system check but the only
suitable hook we have is the generic `register()`
callback that takes a set of `app_configs`, and I’d
expect going from that to a pending set of operations
would be non trivial. A user-facing interface for this
might be a function that is called for every specific
type of migration operation with the operation itself
and a list of all operations that would be applied
of “migrate” was run.
```
@some_migration_callback_decorator(AlterField)
def check_null(operation, pending_operations):
If operation.is_going_to_not_null:
if not
table_being_created(operation.table_name,
pending_operations):
return Warning(f’Setting
{operation.field_name} to NOT NULL may cause downtime’)
```
I’m not sure if Django should ship migration warnings
itself but having the ability for projects to enforce
arbitrary constraints on their migrations might be an
interesting feature?
Slightly related to
https://code.djangoproject.com/ticket/31700
<https://code.djangoproject.com/ticket/31700>, where
we want to add more contextual/coloured outputs for
dangerous operations.
Tom
--
You received this message because you are subscribed
to the Google Groups "Django developers (Contributions
to Django itself)" group.
To unsubscribe from this group and stop receiving
emails from it, send an email to
django-developers+unsubscr...@googlegroups.com
<mailto:django-developers+unsubscr...@googlegroups.com>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-developers/CAFNZOJNoi583wUoQp%2BRBB_%2B3i_vqSPRbVNwhiU_OG7V2eL8x3g%40mail.gmail.com
<https://groups.google.com/d/msgid/django-developers/CAFNZOJNoi583wUoQp%2BRBB_%2B3i_vqSPRbVNwhiU_OG7V2eL8x3g%40mail.gmail.com?utm_medium=email&utm_source=footer>.
--
Adam
--
You received this message because you are subscribed to
the Google Groups "Django developers (Contributions to
Django itself)" group.
To unsubscribe from this group and stop receiving emails
from it, send an email to
django-developers+unsubscr...@googlegroups.com
<mailto:django-developers+unsubscr...@googlegroups.com>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-developers/CAMyDDM1vtHgDU2VO-kyV9CM%2BxXRPbPmHKgUVJqYGxd2POdou%2BQ%40mail.gmail.com
<https://groups.google.com/d/msgid/django-developers/CAMyDDM1vtHgDU2VO-kyV9CM%2BxXRPbPmHKgUVJqYGxd2POdou%2BQ%40mail.gmail.com?utm_medium=email&utm_source=footer>.
--
You received this message because you are subscribed to the
Google Groups "Django developers (Contributions to Django
itself)" group.
To unsubscribe from this group and stop receiving emails from
it, send an email to
django-developers+unsubscr...@googlegroups.com
<mailto:django-developers+unsubscr...@googlegroups.com>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-developers/CAL6gfQY-hCG3pH7asZ1GCqBuOOpYnOHgzMDMfyaYVpEkme4s8A%40mail.gmail.com
<https://groups.google.com/d/msgid/django-developers/CAL6gfQY-hCG3pH7asZ1GCqBuOOpYnOHgzMDMfyaYVpEkme4s8A%40mail.gmail.com?utm_medium=email&utm_source=footer>.
--
You received this message because you are subscribed to the Google
Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to django-developers+unsubscr...@googlegroups.com
<mailto:django-developers+unsubscr...@googlegroups.com>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-developers/CABpHFHTDHevJLPe8XAGKWR%2BrgxznJJPSSUxM_j8aTjc5j5aUDw%40mail.gmail.com
<https://groups.google.com/d/msgid/django-developers/CABpHFHTDHevJLPe8XAGKWR%2BrgxznJJPSSUxM_j8aTjc5j5aUDw%40mail.gmail.com?utm_medium=email&utm_source=footer>.
--
You received this message because you are subscribed to the Google
Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to django-developers+unsubscr...@googlegroups.com
<mailto:django-developers+unsubscr...@googlegroups.com>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-developers/CAFNZOJMJm_g20VjxEpojj46e4Zjp5h4qwq7QS6DMUCw4TeinRw%40mail.gmail.com
<https://groups.google.com/d/msgid/django-developers/CAFNZOJMJm_g20VjxEpojj46e4Zjp5h4qwq7QS6DMUCw4TeinRw%40mail.gmail.com?utm_medium=email&utm_source=footer>.
--
You received this message because you are subscribed to the Google Groups "Django
developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-developers/eb1590cc-1e07-2749-39a7-30e67b9d4955%40gmail.com.