On 18/03/2012, at 7:38 PM, Kushagra Sinha wrote:

> Abstract
> ------------------------------------------------------------------------------
> A database migration helper has been one of the longest-standing feature
> requests in Django. Though Django has an excellent database creation helper,
> when faced with schema design changes, developers have to resort to either
> writing raw SQL and manually performing the migrations, or using third party
> apps like South[1] and Nashvegas[2].
> 
> Clearly Django will benefit from having a database migration helper as an
> integral part of its codebase.
> 
> From [3], the consensus seems to be on building a Ruby on Rails ActiveRecord
> Migrations[4] like framework, which will essentially emit python code after
> inspecting user models and current state of the database.

Check the edit dates on that wiki -- most of the content on that page is 
historical, reflecting discussions that were happening over 3 years ago. There 
have been many more recent discussions.

The "current consensus" (at least, the consensus of what the core team is 
likely to accept) is better reflected by the GSoC project that was accepted, 
but not completed last year. I posted to Django-developers about this a week or 
so ago [1]; there were some follow up conversations in that thread, too [2].

[1] http://groups.google.com/group/django-developers/msg/cf379a4f353a37f8
[2] http://groups.google.com/group/django-developers/msg/2f287e5e3dc9f459

> The python code
> generated will then be fed to a 'migrations API' that will actually handle the
> task of migration. This is the approach followed by South (as opposed to
> Nashvegas's approach of generating raw SQL migration files). This ensures
> modularity, one of the trademarks of Django.

I don't think you're going to be able to ignore raw SQL migrations quite that 
easily. Just like the ORM isn't able to express every query, there will be 
migrations that you can't express in any schema migration abstraction. Raw SQL 
migrations will always need to be an option (even if they're feature limited).
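
A minimal sketch of how the two can coexist, assuming a design where every operation just yields SQL statements. AddColumn, RunSQL and apply_operations are names invented for illustration; they are not part of Django or South.

```python
import sqlite3


class AddColumn:
    """Hypothetical abstract operation: the framework generates the SQL."""

    def __init__(self, table, column, column_type):
        self.table = table
        self.column = column
        self.column_type = column_type

    def to_sql(self):
        return ['ALTER TABLE "%s" ADD COLUMN "%s" %s'
                % (self.table, self.column, self.column_type)]


class RunSQL:
    """Hypothetical escape hatch: the developer supplies the SQL verbatim."""

    def __init__(self, *statements):
        self.statements = list(statements)

    def to_sql(self):
        return self.statements


def apply_operations(connection, operations):
    """Execute every statement from every operation, in order."""
    for operation in operations:
        for statement in operation.to_sql():
            connection.execute(statement)
    connection.commit()


connection = sqlite3.connect(":memory:")
connection.execute('CREATE TABLE "book" ("id" INTEGER PRIMARY KEY, "title" TEXT)')
connection.execute("INSERT INTO book (title) VALUES ('  Dune ')")

apply_operations(connection, [
    AddColumn("book", "slug", "TEXT"),                    # expressible abstractly
    RunSQL("UPDATE book SET slug = lower(trim(title))"),  # raw SQL fallback
])
```

The raw operation is opaque to the framework, which is why it may end up feature limited: nothing here could reverse it or port it to another backend automatically.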

> Third party developers can create
> their own inspection and ORM versioning tools, provided the inspection tool
> emits python code conforming to our new migrations API.
> 
> To sum up, the complete migrations framework will need, at the highest level:
> 1. A migrations API that accepts python code and actually performs the
>    migrations.

This is certainly needed. I'm a little concerned by your phrasing of an "API 
that accepts python code", though. An API is something that Python code can 
invoke, not the other way around. We're looking for 
django.db.backends.migration as an analog of django.db.backends.creation, not a 
code consuming utility library.
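
To illustrate the direction of the call, here is a rough sketch, assuming a per-backend class in the spirit of the creation helpers. The class and method names are hypothetical, not an existing Django API.

```python
class BaseDatabaseMigration:
    """Hypothetical per-backend helper that turns schema changes into SQL."""

    def quote_name(self, name):
        return '"%s"' % name

    def rename_table_sql(self, old_name, new_name):
        return ["ALTER TABLE %s RENAME TO %s"
                % (self.quote_name(old_name), self.quote_name(new_name))]

    def add_column_sql(self, table, column, column_type):
        return ["ALTER TABLE %s ADD COLUMN %s %s"
                % (self.quote_name(table), self.quote_name(column), column_type)]


class MySQLDatabaseMigration(BaseDatabaseMigration):
    """Hypothetical MySQL override: different quoting and rename syntax."""

    def quote_name(self, name):
        return "`%s`" % name

    def rename_table_sql(self, old_name, new_name):
        return ["RENAME TABLE %s TO %s"
                % (self.quote_name(old_name), self.quote_name(new_name))]


# The caller (a migration file, South, or any other tool) invokes the API.
backend = MySQLDatabaseMigration()
statements = backend.rename_table_sql("blog_post", "blog_entry")
statements += backend.add_column_sql("blog_entry", "slug", "varchar(50)")
```

Python code calls the backend; the backend never has to parse or "accept" Python code.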

> 2. An inspection tool that generates the appropriate python code after
>    inspecting models and current state of database.

The current consensus is that this shouldn't be Django's domain -- at least, 
not in the first instance. It might be appropriate to expose an API to extract 
the current model state in a Pythonic form, but not a fully-fledged, 
user-accessible "tool".

> 3. A versioning tool to keep track of migrations. This will allow 'backward'
>    migrations.

If backward migrations are the only reason to have a versioning tool, then I'd 
argue you don't need versioning.

However, that's not the only reason to have versioning, is it :-)

> South's syncdb:
> class Command(NoArgsCommand):
>     def handle_noargs(self, migrate_all=False, **options):

As a guide for the future -- large wads of code like this aren't very 
compelling as part of a proposal unless you're trying to demonstrate something 
specific. In this case, you're just duplicating some of South's internals -- 
"I'm going to take South's lead" is all you really needed to say.

> If migrations become a core part of Django, every user app will have a
> migration folder(module) under it, created at the time of issuing
> django-admin.py startapp. Thus by modifying the startapp command to create a
> migrations module for every app it creates, we will be able to use South's
> syncdb code as is and will also save the user from issuing
> schemamigration --initial for all his/her apps.
> 
> Now that we have a guaranteed migrations history for every user app, migrate
> command will also be more or less a copy of South's migrate command.

What does this "history" look like? Are migrations named? Are they dated? 
Numbered? How do you handle dependencies? Ordering? Collisions between parallel 
development? 

*This* is the sort of thing a proposal should be elaborating. 
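
As one example of the level of detail meant here, a sketch under assumed conventions: migrations are identified by (app, name), names carry a numeric prefix, and each migration lists the migrations it depends on. None of this is South's or Django's actual format.

```python
from graphlib import TopologicalSorter

# Hypothetical history: each migration maps to the migrations that must be
# applied before it.
MIGRATIONS = {
    ("auth", "0001_initial"): [],
    ("blog", "0001_initial"): [("auth", "0001_initial")],
    ("blog", "0002_add_slug"): [("blog", "0001_initial")],
    ("blog", "0002_add_tags"): [("blog", "0001_initial")],  # parallel branch
}


def plan(migrations):
    """Return an order in which every migration follows its dependencies."""
    return list(TopologicalSorter(migrations).static_order())


def find_collisions(migrations):
    """Return migrations in the same app that share a numeric prefix."""
    seen = {}
    for app, name in migrations:
        prefix = name.split("_", 1)[0]
        seen.setdefault((app, prefix), []).append(name)
    return {key: sorted(names) for key, names in seen.items() if len(names) > 1}


order = plan(MIGRATIONS)
collisions = find_collisions(MIGRATIONS)
```

Here the two 0002 migrations, written on separate branches, are both valid in dependency terms, but their relative order is undefined. A proposal needs to say what happens in that case.
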
> 
> As much as I would have liked to use Django creation API's code for creating
> and destroying models, we cannot. The reason for this is Django's creation API
> uses its inspection tools to generate *SQL* which is then directly fed to
> cursor.execute. What we need is a migrations API which gobbles up *python*
> code generated by the inspection tool. Moreover deprecating/removing Django's
> creation API to use the new migrations API everywhere will give rise to
> performance issues since time will be wasted in generating python code and
> then converting python to SQL for Django's core apps which will never have
> migrations anyways.

This sounds like a false economy to me. If we're talking about the core 
pipeline for handling an HTTP request, then every method call and abstraction 
counts. However, that's not what we're talking about. We're talking about 
utilities used to synchronize the database. They're called by manual 
invocation, infrequently, and *never* as part of the request/response cycle. 

Yes, there will probably be a slowdown -- but we get the benefit of a 
consistent interface to database creation. However, unless the slowdown to 
syncdb is such that it becomes *seriously* observable -- e.g., turns syncdb 
into a 1 minute operation, rather than a 1 second operation -- then you're 
advocating for duplicating code paths in order to maintain a false economy.

> The creation API and code that depends on it (syncdb, sql, django.test.simple
> and django.contrib.gis.db.backends) will be left as is.
> 
> Therefore much of the code for our new migrations API will come from South.

Again, the code snippet highlights nothing here. Anyone qualified to review 
your proposal is at least familiar with South, so there's no need to give a 
page long example of South's usage unless you're trying to say something 
specific about South's API and usage.

> Schedule and Goal
> ------------------------------------------------------------------------------
> Week 1    : Discussion on API design and overriding django-admin startapp
> Week 2-3  : Developing the base migration API
> Week 4    : Developing migration extensions and overrides for PostgreSQL
> Week 5    : Developing migration extensions and overrides for MySQL
> Week 6    : Developing migration extensions and overrides for SQLite
> Week 7    : Developing the inspection tools
> Week 8    : Developing the ORM versioning tools and glue code
> Week 9-10 : Writing tests/documentation
> Week 11-12: Buffer weeks for the unexpected, Oracle DB? and
>             django.contrib.gis.backends?
> 

Week 13 - profit.

Seriously, this is a very unconvincing timetable. What are you basing these 
estimates on?

Some of the things that raise flags for me:

 * What makes you think that MySQL, PostgreSQL and SQLite are all equally 
complex when it comes to migrations? SQLite doesn't let you rename a table. 
Tracking MySQL index changes is non-trivial. 

 * On what basis do you assert that "developing inspection tools" -- presumably 
for all three databases covered in weeks 4-6 -- will take 1 week? 

 * If you're not working on tests until week 9-10, how do you plan to establish 
that the work you do in week 1 actually works?
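
To make the point about backends differing concrete: an operation that is a single statement on one database can need a full table rebuild on another. A minimal sketch, assuming an SQLite release whose ALTER TABLE has no DROP COLUMN; the helper name is invented for illustration.

```python
import sqlite3


def drop_column_by_rebuild(connection, table, keep_columns):
    """Drop columns by copying the kept ones into a replacement table.

    keep_columns maps column name -> SQL type for the columns to keep.
    Identifiers are interpolated directly, so they must be trusted input.
    Indexes and constraints on the old table are NOT carried over here.
    """
    column_defs = ", ".join('"%s" %s' % item for item in keep_columns.items())
    column_names = ", ".join('"%s"' % name for name in keep_columns)
    temp = table + "__new"
    with connection:  # commits the pending transaction if no error is raised
        connection.execute('CREATE TABLE "%s" (%s)' % (temp, column_defs))
        connection.execute('INSERT INTO "%s" (%s) SELECT %s FROM "%s"'
                           % (temp, column_names, column_names, table))
        connection.execute('DROP TABLE "%s"' % table)
        connection.execute('ALTER TABLE "%s" RENAME TO "%s"' % (temp, table))


connection = sqlite3.connect(":memory:")
connection.execute(
    'CREATE TABLE "book" ("id" INTEGER PRIMARY KEY, "title" TEXT, "legacy" TEXT)')
connection.execute("INSERT INTO book (title, legacy) VALUES ('Dune', 'x')")

drop_column_by_rebuild(
    connection, "book", {"id": "INTEGER PRIMARY KEY", "title": "TEXT"})
columns = [row[1] for row in connection.execute('PRAGMA table_info("book")')]
```

Even this toy version has to copy data and would need extra work to preserve indexes, which is the kind of per-backend cost a flat one-week-per-database estimate hides.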

> Note: Work on Oracle and GIS may not be possible as part of GSoC
> 
> I will personally consider my project to be successful if I have created and
> tested at least the base API + PostgreSQL extension and inspection + version
> tools.

If that's the case, then why does your schedule say you're going to complete 
MySQL and SQLite, and possibly Oracle as well? 

I can see that you're obviously enthused by this project, but as it stands, I 
can't say this is a very compelling proposal.

 * It ignores the most recent activity in the area (last year's GSoC, in 
particular)

 * It is extremely light on detail about how some very big pieces (like your 
"versioning tools") will work

 * The proposed schedule reads more like a list of things you know you need to 
do, not a detailed work breakdown backed by realistic estimates.

Thanks for taking the time to submit this proposal. I'd encourage you to have a 
second swing at this. Read the recent discussions on the topic; take a look at 
last year's GSoC proposal; and spend some time elaborating on the details that 
I've highlighted.

Yours,
Russ Magee %-)
