tl;dr
Some argue that thinking about databases purely in terms of migrations provides repeatability and reliable deployments. Others argue that working declaratively, treating the database state as code, is a better development experience providing immutability, reliable testing and greater productivity. Most people have a strong opinion – not enough people discuss the relative pros and cons in a balanced way. This makes it hard for people who need to make rational choices for real projects to decide which will suit them.
This post aims to address that. Comments welcome.
The most significant choice any team must make when choosing a strategy for delivering their database:
Migrations or state?
I’m taking it as a given that using a standardised process for source control is a good thing. This will make your lives as database developers much easier and it will help you work more closely with your DBAs/ops teams as it will give you more control over your deployments. In order to do this you do need to pick either the state or migrations format to manage your database scripts. This means that whatever option you choose you will have to live with its flaws and find ways to manage them.
This post follows on from my article in NDC magazine (p16) and covers some of the same ground as my “Elephant in the Room” talk. It describes the relative pros and cons of each approach, including a technical example, and suggests which might be better suited to your projects – because often projects are better suited to one approach or the other.
Defining the problem and the two solutions
You need to update your database schema without changing/deleting the data. This means, unlike with application code, you can’t simply delete the old database and create a new one every time you deploy your changes. As well as defining how the database should look, you need to write an upgrade script.
The two approaches, at their root, are about which is your source of truth, the definition of how you want the database to look, or the scripts you produce to handle the upgrade. The former is referred to as the declarative or state-based model. The latter is referred to as the imperative or migrations-based model. There are also a myriad of other names given to the two approaches. In this post I will use the terms ‘state’ and ‘migrations’.
State: Your source of truth is how the database should be
Migrations: Your source of truth is how the database should change
Some would say state is a better development experience, providing immutability and reliable testing. Others would say migrations provides more repeatable and reliable deployments. Most have a strong opinion about which is better and find it hard to understand how the other can ever be reliable or agile. In my opinion there is not enough balanced consideration of their relative strengths. That is what I will aim to provide in this post.
Our scenario
In order to explore the relative strengths of the two approaches, let’s imagine a scenario. I’m going to start with the following table and stored procedure:
-- table_foo.sql CREATE TABLE table_foo (col1 NVARCHAR(max), col2 NVARCHAR(max)) -- myproc.sql CREATE PROCEDURE myproc AS SELECT col1 FROM table_foo
And then I have two tasks to perform in order to implement two separate features. These tasks are given to different developers:
- Alter the stored procedure so that it selects data from both columns
- Rename table_foo to table_bar
Now let’s discuss everything that is about to go wrong. Let’s start by trying a migrations approach.
The first developer is going to write some code to define the change they need to make. It will probably look a little bit like this:
ALTER PROCEDURE myproc AS BEGIN SELECT col1, col2 FROM table_foo END
All good.
Now comes the second developer, tasked with renaming the table. It is not hard to imagine the second developer writing some code that looks like this in order to handle the dependency as well as the table rename:
EXEC sp_rename 'table_foo', 'table_bar‘ ALTER PROCEDURE myproc AS BEGIN SELECT col1 FROM table_bar END
Of course, this second developer should have checked that they were working off the most recent version of the database – but they didn’t.
The problem here is that the second developer just overwrote the first developer’s changes – the stored procedure no longer selects col2. With ALTER statements the last one to run always wins. This means that with a migrations-based strategy, team discipline is exceptionally important and managing the order of upgrade scripts is a very important task.
Now imagine you are using branches. Imagine the merge conflict as you have to painstakingly go through every script, line by line, to work out the order that they need to run in or if there are conflicts.
So clearly the migrations approach is hard and we should use state instead, right? Well let’s consider how a state-based approach would handle this scenario:
First, the stored procedure. Instead of writing an ALTER statement the developer will write a new definition of the stored procedure as a CREATE statement, something a bit like this:
-- myproc.sql CREATE PROCEDURE myproc AS SELECT col1, col2 FROM table_foo
And the second developer will do the same thing, something a bit like this:
-- myproc.sql CREATE PROCEDURE myproc AS SELECT col1 FROM table_bar
And we still have a conflict, but this time simple diff tools can help us to spot and resolve it:
So in the state model it is easier to spot and handle conflicts. Great. But what about that table rename?
When using a state-based solution you will most often be using a diff tool like those provided by Redgate or Visual Studio to examine the differences and generate an upgrade script. While this is a very efficient solution for most changes, with table renames and a few other types of table refactoring they can do bad things, a bit like this:
DROP TABLE table_foo CREATE TABLE table_bar (col1 NVARCHAR(max), col2 NVARCHAR(max)) ALTER PROCEDURE myproc AS SELECT col1, col2 FROM table_bar
In this scenario, when doing a deployment, a diff tool is likely to see that there is a table in the production database called foo, which should be deleted, and a table in the source called bar, which should be created. It does not know the context of the change.
This is an area where a purely migrations-based approach is strong and a purely state-based approach is weak since developers can write scripts to define precisely how a deployment should be handled.
And here’s another problem: Drift
So we have already seen that migrations- and state-based approaches are strong and weak in different areas. Let’s consider a problem that both suffer with, but in different ways.
Do you work with that frustrating person who is always making ‘hot-fix’ changes to QA or even production databases? Perhaps it is you, because the cost of delaying the fix outweighs the cost of the risk? Either way, let’s just assume that some change has happened on the production environment and was not made in the development environment.
What would a state-based approach do? It would roll back the fix. It would make the target database look like the version in development that did not have the fix. Oops.
What would a migrations-based approach do? It would ignore the hot-fix. It is possible the deployment would fail, or (in my opinion worse) it would pass. This is bad because now you are running a version of the database in production that has never been tested. You could very easily have integration issues. What is even more damaging is that, unless you have some sort of check in place, you will never find out, until you have a problem.
So both state and migrations fall short here. The only solution is to make efforts to eradicate drift and implement monitoring systems to alert you whenever drift does occur so that it can be dealt with. Redgate offers a free tool that can help with this: DLM Dashboard
So where does that leave us?
For state-based solutions you get the benefits of a much easier development workflow. It enables larger teams to work together on more complicated databases with more complicated dependency webs to unpick. This lends itself to those monolithic legacy systems that so many of us are lumbered with. For similar reasons it also lends itself to fast development, perhaps in the early phase of a project, where breaking changes aren’t so important and the database is frequently being updated.
The caveat is that you need to really understand your deployment tooling and you probably need a way to test and review your upgrade scripts. You probably want to have a staging environment where you can give all deployment scripts a dry run. You also need to have a plan for what to do when an upgrade script generated by your software is not sufficient.
For migrations-based solutions you are giving control of the deployment process to your developers. This gives them great power and great responsibility. It works really well for databases that are simple data stores where most of the changes are table refactorings or data migrations. However it is a very inefficient way to work with stored procedures and functions etc, where there are no data migrations to worry about. It also becomes quite painful quite quickly when there are frequent database changes because the task of managing the order of scripts and reviewing all the code from vX to vY becomes a big burden. In order to manage this effectively you need to enforce rock solid processes to ensure you avoid conflicts.
In both cases you need to think hard about drift and be aware of the consequences of failing to deal with it appropriately.
It is worth knowing that many of the tools you are likely to use to support you offer features that patch around the problems with their standard approach. For example, among state-based tools Visual Studio SSDT lets you use pre- and post-deployment scripts and Redgate has a ‘migrations scripts’ feature, both allowing users to hard code certain parts of the deployment process. Of the many migrations-based tools ReadyRoll has an interesting feature where you can effectively use a purely migrations approach for your tables at the same time as using a purely state approach for your functions etc. These sort of hybrid solutions do solve some of the problems, but they also bring extra complexity and will normally live mostly in one camp with only a few token gestures at the other.
So what should I use for my project?
In summary, if you are a large development team, making frequent changes, on a database that contains many stored procedures and which has a complicated dependency network, you will probably find a state based solution more successful.
If you are a small team that does not frequently edit your database, which you mostly use as a simple data dump, and as a result most of the changes are table refactorings or data migrations, you are likely to have more success with a migrations-based solution.
Of course, the chances are you won’t fit neatly into one category or the other so you’ll have to make a decision based on what mostly fits your needs and what your team feels most comfortable with. And by ‘team’ I include all the developers, testers and DBAs, as well as anyone else who plays a role delivering your database.
And of course, you may find that throughout the lifecycle of your database you decide to switch from one approach to the other. For example, when working on a greenfield project that has not yet been released, the experience of working in a state-based environment is really nice. You can work quickly making many changes and refactorings as you try different designs – and deployments are no concern just yet. Later on, when you have an established product with paying customers and you make fewer changes to the database you may find the migrations approach is more suitable. And after a while, when that exciting new tool has turned into a monolithic, complex, legacy nightmare where changing something over here breaks something over there, you might find the state-based solution becomes attractive once more.
I guess, if you take one thing from this post, please let it be the message that it is a wonderful thing to be open-minded. And to take anyone who tells you there is only one correct way to do something with a pinch of salt.
I wish you every success. Please do let me know what you have tried and what worked well – and not so well.
13 comments for “Critiquing two different approaches to delivering databases: Migrations vs state”