Simple schema migration for Django
Users who downloaded 0.03 - please download 0.04. 0.03 was a corrupt release and I've been without an internet connection to get the fix up for a while!
Why did this project start?
Django is excellent at creating a schema from your models. A simple syncdb command will generate SQL and install your schema. Currently, the main Django trunk has no way of migrating an existing schema to a new structure. So if you want to change your model structure and roll the new structure out to a live site, you have to write SQL scripts that change the schema to fit your new models.
The schema-evolution branch in Django looks like a great way to generate SQL that changes your schema. However, that is not what this project attempts to do (currently). This project is a way for developers to roll out pre-prepared schema changes in an automated way. It would match up very well with a project that generated base and update migrations that a developer could then review and add to the migration list for rollout. Automatically applying schema changes (as the original proposal for schema-evolution outlined) is very troublesome, for reasons I ramble on and on about further down the page [2].
How it works.
DbMigration works by specifying a series of migration scripts at the application level. These scripts deal with any DDL and data manipulation you need to roll out in an automated way. The SQL and/or Python scripts are written by the developer who needs to roll out the change.
DbMigration supports migrations written in SQL or Python.
Note: If you use migrations to roll out an application, the default automatic schema creation that syncdb usually does will not run for it. You need to create a migration to set up your table structure. (You can use the output of 'python manage.py sqlall' to get the SQL for it. For an explanation of why this is, please see appendix 1.)
Setting up your project
First, unpack dbmigration into the django/contrib directory. Then add the line :
'django.contrib.dbmigration',
to your project's INSTALLED_APPS in settings.py. This must come before any apps that use migrations so that the migration models can be installed before the migrators try to use them. Now apply the management.py.patch to management.py in django/core. This integrates the migration procedure into the syncdb command.
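As a rough illustration of the ordering requirement, the relevant part of settings.py might look like the sketch below. Every entry except 'django.contrib.dbmigration' is a placeholder for whatever your own project uses, and myapp is just an example name.

```python
# settings.py (fragment) - a sketch; every entry except
# 'django.contrib.dbmigration' is a placeholder for your own project.
INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.dbmigration',  # must come before any app that uses migrations
    'myapp',                       # example app that has a migrations module
)
```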
Setting up your application
To add migrations to your application, you need to create a migrations module in your application and set up a default list of migrations.
mkdir migrations
touch migrations/__init__.py
echo "migration_list = []" > migrations/migrator.py
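If you are on a platform without touch and echo, the same scaffolding can be sketched in Python. The helper name create_migrations_package is made up for this example and is not part of dbmigration. Unlike the echo command, it leaves an existing migrator.py alone rather than overwriting it.

```python
import os


def create_migrations_package(app_dir):
    """Create app_dir/migrations with an empty __init__.py and a migrator.py
    holding an empty migration_list. Existing files are left untouched."""
    migrations_dir = os.path.join(app_dir, 'migrations')
    if not os.path.isdir(migrations_dir):
        os.makedirs(migrations_dir)
    defaults = {'__init__.py': '', 'migrator.py': 'migration_list = []\n'}
    for filename, content in defaults.items():
        path = os.path.join(migrations_dir, filename)
        if not os.path.exists(path):
            with open(path, 'w') as handle:
                handle.write(content)
    return migrations_dir
```

Run it once from your application directory, e.g. create_migrations_package('.').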
Examples
Using SQL migrations
Say that your application is called myapp and has a model called WooYay. You have added a new column called my_attribute to the WooYay model. You need to roll out this to your live and dev servers, so you need to create an SQL migration that will add the column definition.
In this example, we're going to call the migration AddMyAttribute.
First, set up the directory to hold the sql migrations, if it does not already exist.
mkdir migrations/sql
Now create the sql file in migrations/sql/AddMyAttribute.sql
ALTER TABLE myapp_wooyay ADD COLUMN my_attribute VARCHAR(255);
Now add a line to the list of migrations in migrator.py
'AddMyAttribute',
That's it! Your migrations will be run when you use syncdb.
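For reference, if AddMyAttribute is the only migration so far, migrations/migrator.py would end up looking something like this:

```python
# migrations/migrator.py
# Each name in the list refers to a migration, here the SQL file
# migrations/sql/AddMyAttribute.sql created above.
migration_list = [
    'AddMyAttribute',
]
```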
Engine specific SQL migrations
Sometimes the sql to alter your schema will be different depending on what database engine you use. In this example, we will say that we're using the postgresql engine and want a postgresql specific sql file for the migration we used above.
First, create the directory for postgresql specific sql
mkdir migrations/sql/postgresql
Now you can create your postgresql specific migration sql in that directory.
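To picture how an engine-specific file might be chosen over the generic one, here is a minimal sketch. It assumes the engine-specific file has the same file name as the generic one (AddMyAttribute.sql) and that the generic file is used as a fallback. Neither detail is spelled out above, and find_sql_file is a made-up name, not dbmigration's actual lookup code.

```python
import os


def find_sql_file(migrations_dir, name, engine):
    """Return the path of the SQL file to run for a migration, or None.

    Assumptions: the engine-specific file shares the generic file's name,
    and the generic file is the fallback when no specific one exists.
    """
    candidates = [
        os.path.join(migrations_dir, 'sql', engine, name + '.sql'),
        os.path.join(migrations_dir, 'sql', name + '.sql'),
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```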
Using Python migrations
Sometimes it might be easier to do your migration using Python and the Django models. To define a python migration, simply define a function with the same name as the migration in migrator.py
For example, let's say that your app is called "myapp" and you want to do something special to all User records with an aol email address, but that something special is easier to do in python than it is in sql. Your app structure should look like this :
myapp/
    migrations/
        __init__.py
        migrator.py
The content of migrator.py could be :
migration_list = ['migration1',]

def migration1():
    # Python code that you need to run to complete the migration goes here,
    # e.g.
    from django.contrib.auth.models import User
    users = User.objects.filter(email__endswith='@aol.com')
    for user in users:
        # do something to the user here, then save it
        user.save()
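The name-based lookup described above can be sketched roughly as follows. This is an illustration of the idea only, not the code dbmigration ships. In particular, letting a Python function take precedence over an SQL file of the same name is my assumption, and run_migration and run_sql_file are made-up names.

```python
def run_migration(migrator, name, run_sql_file):
    """Run one migration and report which kind it was.

    A function called `name` on the migrator module is treated as a Python
    migration; otherwise the name is handed to run_sql_file, a callable
    that is expected to execute the SQL file of that name.
    """
    func = getattr(migrator, name, None)
    if callable(func):
        func()
        return 'python'
    run_sql_file(name)
    return 'sql'
```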
Pre-Migration Checks
Sometimes you might want to check if a certain condition exists and only run the migration if it does. An example of this might be that you are converting an existing application to use migrations, but your live servers have been installed already without them. You have created an 'init' migration that you do not want to run if the initial schema already exists. You can define a method with the same name as the migration prefixed with pre_ to run as a check before the actual migration runs. Here's how your migrator.py might look in that case :
migration_list = ['init', 'update_1',]

def pre_init():
    from django.db import connection
    cursor = connection.cursor()
    # Replace this placeholder with whatever code you need to find out
    # whether the myapp_mymodel table exists.
    table_exists = False
    if table_exists:
        # the table exists, do not run this migration
        return False
    # table is not there, run the migration
    return True
This example pre_ check should find out whether the table that corresponds to the model in this application exists. (At the moment you will have to supply this code yourself. The next release will contain helper code to check if tables exist or not.) If the table is found, the initial structure must already be there, so the check returns False to tell the migrator not to run the init migration. If the table does not exist, it returns True, telling the migrator that we do want to run the migration.
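Until that helper exists, here is one way such a check could look. To keep the sketch self-contained it uses Python's sqlite3 module and SQLite's sqlite_master catalog. Other engines keep their table list elsewhere, and Django's own cursor uses %s rather than ? for parameters, so treat this as a pattern rather than drop-in code. Asking the catalog, instead of selecting from the table and catching the failure, means a missing table does not raise a database error.

```python
import sqlite3


def table_exists(cursor, table_name):
    """Return True if table_name exists, by asking SQLite's catalog.

    Querying the catalog avoids the database error that a SELECT
    against a missing table would raise.
    """
    cursor.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    )
    return cursor.fetchone()[0] > 0


# Usage against a throwaway in-memory database.
connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
cursor.execute('CREATE TABLE myapp_mymodel (id INTEGER PRIMARY KEY)')
found = table_exists(cursor, 'myapp_mymodel')
```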
If a pre-check returns False, the migration is logged as having been run so the pre-check will not be run again next time you roll out your database.
Pre-checks must not cause any database errors, or else on certain database engines the transaction will abort and following migrations will not run.
pre_ methods can be defined for both SQL and Python migrations.
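Putting the pre-check rules together, the migrator's behaviour can be pictured with the sketch below. It is a simplified model, not the real implementation: the set called applied stands in for the migration log in the database, and run_pending and run_sql_file are made-up names.

```python
def run_pending(migrator, applied, run_sql_file):
    """Apply every migration in migrator.migration_list not yet in `applied`.

    `applied` is a set of names standing in for the migration log.
    Returns the names whose migration body actually ran.
    """
    ran = []
    for name in migrator.migration_list:
        if name in applied:
            continue
        pre_check = getattr(migrator, 'pre_' + name, None)
        if pre_check is None or pre_check():
            func = getattr(migrator, name, None)
            if callable(func):
                func()              # Python migration
            else:
                run_sql_file(name)  # SQL migration
            ran.append(name)
        # Logged whether it ran or was skipped by its pre-check, so the
        # pre-check is not evaluated again on the next rollout.
        applied.add(name)
    return ran
```

With the init example above on a server that already has its tables, pre_init returns False, so init is logged without running and only update_1 is applied.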
Downloads
If you've got this far, you must be serious about using an automated, if slightly more demanding, rollout system. And you probably think I've made a modicum of sense. Great! I could do with your help, suggestions and related whatnottery. My code might not be terribly pythonic so please feel free to submit fixes or suggestions for better ways to do things!
These downloads are for the 0.96 release of Django.
dbmigration-0.04.tgz - includes the patch for management.py, a prepatched management.py and the test project used to make sure it's all working ;)
Contact the Author
mike a t mugwuffin d o t com
Appendix
1. Why syncdb does not create tables for applications that use migrations
The default syncdb will always create the schema based on the latest version of your model. But when you're rolling out to a live server you need to be able to run the migrations to bring the schema up to date. So you need to be able to tell what state the schema is in and run the migrations that bring it up to the latest state. The existing syncdb is not capable of this, and so we have to rely on migrations to create the schema from scratch. The SQL for the initial migration can be created easily by running python manage.py sqlall [appname].
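The idea of "telling what state the schema is in" boils down to comparing the migration list with a record of what has already been applied. A minimal sketch, with a made-up function name and plain lists standing in for the real log:

```python
def pending_migrations(migration_list, applied):
    """Return the migrations still to run, in their declared order."""
    done = set(applied)
    return [name for name in migration_list if name not in done]


# A fresh database needs everything, starting with the schema-creating
# 'init' migration; a live server only needs what it has not seen yet.
fresh = pending_migrations(['init', 'update_1', 'update_2'], [])
live = pending_migrations(['init', 'update_1', 'update_2'], ['init', 'update_1'])
```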
2. Why can't your code just work out what sql needs to be run and run it?
Because trying to automatically generate sql to migrate a schema from one state to another is a nightmare. Not that Django makes it difficult to do, or that the code in the schema-evolution branches is bad. It's simply because automatically generating SQL to migrate a schema from one state to another is incredibly difficult.
Let's take a simple example. You want to change a numeric column to a character column. Simple? If you're using MySQL, yes. If you're using PostgreSQL 8, possibly. If you're using PostgreSQL 7, no. MySQL and PostgreSQL 8 can use a simple ALTER on the column (unless the column is a foreign key). In PostgreSQL 7 you have to create a new column, copy the data, delete the old column and rename the new column to the old name.
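To make the four-step workaround concrete, this sketch builds the statements as strings. The helper name and the _new suffix are made up, and I have not verified that the CAST works for every pair of types on every PostgreSQL version. It also ignores defaults, constraints and indexes on the old column, which is exactly the kind of detail a developer has to review by hand.

```python
def retype_column_statements(table, column, new_type):
    """Build the 'add, copy, drop, rename' sequence as SQL strings.

    Identifiers are interpolated directly, so only pass trusted names.
    """
    temp = column + '_new'
    return [
        'ALTER TABLE %s ADD COLUMN %s %s;' % (table, temp, new_type),
        'UPDATE %s SET %s = CAST(%s AS %s);' % (table, temp, column, new_type),
        'ALTER TABLE %s DROP COLUMN %s;' % (table, column),
        'ALTER TABLE %s RENAME COLUMN %s TO %s;' % (table, temp, column),
    ]


statements = retype_column_statements('myapp_wooyay', 'my_attribute', 'VARCHAR(255)')
```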
Now let's say you want to alter a column that is referenced in some kind of foreign key relationship. You'd better hope that your foreign keys have been created as deferrable, or else you're going to have to create a whole new table structure to record the existing data, delete everything from the structure you want to alter, then alter the empty tables and copy transformed data back in - because you won't be able to alter the column that the foreign key depends on while there is data still in there.
Now let's say you're using Oracle. Let's say you have triggers installed on inserts or updates to notify people in your organization or to carry out indexing or other system tasks. You might need to turn some of these off during a migration. How can we tell what needs to be turned off and what doesn't? Unfortunately, the answer is: we can't. An automated system simply can't make an informed decision without input from a developer.
Oh, and if you're using a database that supports it, let's say that there's a view that uses the column you're altering. Suddenly, after the automatic evolution, the view breaks. Currently, Django cannot manage views. You've just hit a brick wall.
That simple alteration of a column has suddenly got very complex.
What if you're merging or splitting a column? You need to handle that in code, which the developer responsible for that migration must write and test. This isn't possible to do automatically. With this project, the developers responsible for those transformations write the code and no other developers need to know about it - they just apply the migrations automatically.
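As a hypothetical example, suppose a single full-name column has to be split into first and last name. The splitting rule is a judgment call that only someone who knows the data can make. The function below is one possible rule, written so it can be tested on its own before a Python migration calls it for each row.

```python
def split_full_name(full_name):
    """Split 'First Last' into (first, last) on the first run of whitespace.

    Everything after the first word is treated as the last name; a single
    word becomes the first name with an empty last name.
    """
    parts = full_name.strip().split(None, 1)
    if not parts:
        return ('', '')
    if len(parts) == 1:
        return (parts[0], '')
    return (parts[0], parts[1])
```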
Automatically generating basic migration scripts does seem like a very good idea, though, as a way to give developers a head start in writing the migrations.