@_AlexYates_ Hi, Could you give insight how rollback work with Database CI/CD?
— Gourav Sharma (@sharmagourav) October 30, 2017
A thorny topic. Rolling back code is easy. Normally you can just redeploy the old binaries and you’re done. Databases are more difficult – because data. The existence of persistent data has two significant consequences:
- You can’t just redeploy the entire database – you need to create a script to migrate your existing database from the existing (undesirable) state back to the old (desired) state. This probably isn’t an automated process.
- You need to think about new data that has entered the database since the deployment. Do you want to keep it? Do you need to process it to get it into an older format? (E.g. if you need to undo a column split.)
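To make the second point concrete, here is a minimal sketch (using Python's built-in `sqlite3` with an illustrative `customers` table, not any real production schema) of a roll-back that undoes a column split while keeping the rows that arrived after the deployment:

```python
import sqlite3

# Hypothetical scenario: an upgrade split customers.full_name into
# first_name/last_name. The roll-back script must recombine them,
# including rows inserted *after* the deployment.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Post-upgrade schema (the state we want to roll back FROM)
    CREATE TABLE customers (id INTEGER PRIMARY KEY,
                            first_name TEXT, last_name TEXT);
    INSERT INTO customers (first_name, last_name) VALUES ('Ada', 'Lovelace');
    -- A row added after the deployment, which we want to keep
    INSERT INTO customers (first_name, last_name) VALUES ('Alan', 'Turing');
""")

# Roll-back migration: rebuild the old shape and migrate ALL the data.
conn.executescript("""
    CREATE TABLE customers_old (id INTEGER PRIMARY KEY, full_name TEXT);
    INSERT INTO customers_old (id, full_name)
        SELECT id, first_name || ' ' || last_name FROM customers;
    DROP TABLE customers;
    ALTER TABLE customers_old RENAME TO customers;
""")

rows = conn.execute("SELECT full_name FROM customers ORDER BY id").fetchall()
print(rows)  # both pre- and post-deployment rows survive the roll-back
```

Notice that the hard part is not the schema change but deciding what to do with the data; that decision is rarely automatable.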
In this post I will discuss four of the most common approaches to rolling back database changes and explain when you may or may not want to use them. I’ll finish by providing some general advice to reduce the complexity of your roll-backs.
1: Restore a backup
This is the most reliable (and possibly the most disruptive) approach. As long as you have a working backup you can always restore it. However, this will involve downtime and you will lose any data that was created/updated after the backup was taken. Some people perform a backup (and a restore to test the backup) as an automated pre-deploy step to minimise any potential data loss should a backup be required. However, for many the overhead makes this impractical as anything other than a last resort.
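A pre-deploy backup step can be sketched in a few lines. This example uses SQLite's online backup API via Python's `sqlite3` module (the table and databases are illustrative); the key idea is that the restore is tested immediately, not when disaster strikes:

```python
import sqlite3

# Illustrative "production" database with some data in it.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
source.execute("INSERT INTO orders (total) VALUES (9.99)")
source.commit()

# 1. Take the backup immediately before deploying.
backup = sqlite3.connect(":memory:")  # in production this would be a file
source.backup(backup)

# 2. Verify the backup is restorable NOW, before the deployment proceeds.
restored_count = backup.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
assert restored_count == 1, "backup is not restorable - abort the deployment"
```

The same pattern applies with `BACKUP DATABASE`/`RESTORE VERIFYONLY` on SQL Server or `pg_dump`/`pg_restore` on PostgreSQL; the overhead of the backup and verification is the price of the safety net.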
2: Create roll-back scripts up-front
Many DBAs require that any upgrade script comes with a corresponding roll-back script. This forces teams to consider potential roll-backs up front, and it often means the roll-back scripts can be tested in advance. If chained together using some form of migration tool it is possible to create an automated way to upgrade or downgrade your database from any version to any version – which can be pretty cool.
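The chained up/down idea can be sketched as a tiny migration runner. This is illustrative only (the `MIGRATIONS` list and `migrate` function are made up for this post, not any real tool's API), but it shows how paired scripts let you move between any two versions:

```python
import sqlite3

# Each migration carries a paired up/down script, in the spirit of tools
# like Flyway or Rails migrations (names here are illustrative).
MIGRATIONS = [
    {"version": 1, "up": "CREATE TABLE a (x INT)", "down": "DROP TABLE a"},
    {"version": 2, "up": "CREATE TABLE b (y INT)", "down": "DROP TABLE b"},
]

def migrate(conn, current, target):
    """Upgrade or downgrade between any two versions by chaining scripts."""
    if target > current:
        for m in MIGRATIONS:  # apply 'up' scripts in ascending order
            if current < m["version"] <= target:
                conn.execute(m["up"])
    else:
        for m in reversed(MIGRATIONS):  # apply 'down' scripts in reverse
            if target < m["version"] <= current:
                conn.execute(m["down"])
    return target

conn = sqlite3.connect(":memory:")
v = migrate(conn, 0, 2)   # upgrade from empty to version 2
v = migrate(conn, v, 1)   # roll back to version 1
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # table 'b' has been rolled back; 'a' remains
```

The cost, as discussed below, is that every `down` script has to be written and tested up front.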
However, this also adds significant overhead to your development process. Writing the roll-back script is often harder than writing the upgrade script, so developers spend a disproportionate amount of their time working on scripts that are unlikely to ever be used. And if the upgrade script didn’t work as expected, you may not trust the downgrade script anyway – so a large group of people don’t have a lot of faith in this approach.
3: Create roll-back scripts when you need them
Since creating roll-back scripts up front can add significant overhead, some people prefer to create the downgrade scripts only when they need them. Often these can be generated for you very quickly by using a comparison tool like Visual Studio schema compare or Redgate SQL Compare.
However, if you have complicated table migrations to consider or you need to handle new data (added/updated after the deployment) as well as old data (added/updated before the deployment) this might be complicated, and it might take you more time. Not a fun task when your boss is standing over your shoulder looking at their watch and complaining that the business is losing a zillion dollars per second. Under pressure people are more likely to make mistakes.
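At its core, what a comparison tool does is diff the current schema against the last known-good schema and emit a script that closes the gap. Here is a deliberately naive sketch of that idea using SQLite's `sqlite_master` catalog (the `orders`/`audit_log` tables are made up); real tools like SQL Compare handle vastly more cases, and note the data-loss caveat in the comment:

```python
import sqlite3

def table_schemas(conn):
    """Map table name -> CREATE statement from the SQLite catalog."""
    return {name: sql for name, sql in conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type='table'")}

# Last known-good schema (e.g. restored from source control).
good = sqlite3.connect(":memory:")
good.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")

# Current (post-deployment) schema with an extra table.
current = sqlite3.connect(":memory:")
current.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
current.execute("CREATE TABLE audit_log (id INTEGER PRIMARY KEY)")

# Generate a downgrade script from the diff.
downgrade = []
for table in set(table_schemas(current)) - set(table_schemas(good)):
    downgrade.append(f"DROP TABLE {table};")  # naive: destroys any new data!
print(downgrade)
```

The `DROP TABLE` line is exactly the sort of generated statement you need to review carefully: a schema diff knows nothing about whether the data in that table matters.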
4: Roll forward
Since rolling back database changes is a fundamentally complicated process, there is a growing movement of people who believe in rolling forward instead. This means the team don’t bother trying to get back to the old state. Instead they focus on getting the new state to work. This cuts a large amount of development overhead and allows you to become more agile, which makes it popular among the DevOps/CI/CD community.
However, a roll-forward strategy places a lot of faith in the ability of your people to fix issues quickly while under pressure. Sure, there are plenty of benefits, but these should be weighed up against the worst-case scenario of a deployment which goes horribly wrong and is managed poorly. How old is your last (working) backup and how long would it take to restore it?
A roll-forward approach works well when you are good at breaking your work down into small chunks that can be deployed independently, automatically and reliably. If your team are able to split up a big feature into little tasks that are small enough to be written and deployed 10 times per day (for example) then there is a good chance that the failed deployment was only a small change (which is probably easier to fix) and that the team have the ability to develop the fix and deploy it within one tenth of a day.
The best approach?
The annoying thing is that there is no one approach which will be the best in 100% of cases.
One universal truth is that you are only ever as good as your last (working) backup so please ensure you have healthy backup practices in place. While this is probably unlikely to be your primary roll-back strategy it probably will be your last resort so you want to know exactly how painful it would be should things come to that and you want to take proportionate steps to minimise that pain.
Generally, roll-forward is considered the most mature approach and some would call it a “best practice”. However, this is only practical if you have invested in Agile, DevOps and Database Lifecycle Management (DLM) etc. For example, if you have just manually deployed three years’ worth of work in a highly regulated industry and you spot a critical security issue, you might not have the time, experience or ability to roll forward within an acceptable timeframe.

Whichever approach you choose, here is some general advice to reduce the complexity of your roll-backs:
- Use source control. Without good source control practices it can be very difficult to know with confidence what the state of the database was last week when everything was working fine. If you don’t know what the database looked like last week, good luck getting back to that working state.
- Deploy small and often. It’s a lot easier to roll back a small change than a big change. Also, if you deploy often you are likely to be better at it and you are less likely to make mistakes. Especially if you invest in automation.
- Take backups often (and restore them to ensure they work). Remember, in the worst case scenario you are only ever as good as your most recent restore.
- Distinguish between deployments that involve table changes and deployments that don’t. It’s much easier to revert stored procedures and views to an old state than it is to revert tables.
- Use an expand/contract model where you separate the deployment of new tables/columns from the dropping of old tables/columns and maintain backwards compatibility for at least a few versions. This way you might be able to revert to a working state quickly simply by re-deploying the old application code or the database stored procedures etc, while giving yourself time to think about repairing new table structures without your boss breathing down your neck. (Just don’t forget to delete the old tables/columns once you are confident the new stuff works. No-one likes to maintain databases with all sorts of legacy columns all over the place that no-one ever deleted. If you aren’t sure about deleting a table/column, try renaming it first and see if anyone complains. Renaming to something like “TableName_DeleteOn20171130” should make your intentions pretty clear.)
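The expand/contract pattern above can be sketched end to end. This example (illustrative `customers` schema, using Python's `sqlite3`; the `RENAME COLUMN` step assumes SQLite 3.25+) adds the new columns alongside the old one, so both old and new code keep working, and then contracts by renaming rather than dropping:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.execute("INSERT INTO customers (full_name) VALUES ('Ada Lovelace')")

# EXPAND: add the new columns alongside the old one and backfill them.
conn.execute("ALTER TABLE customers ADD COLUMN first_name TEXT")
conn.execute("ALTER TABLE customers ADD COLUMN last_name TEXT")
conn.execute("""UPDATE customers SET
    first_name = substr(full_name, 1, instr(full_name, ' ') - 1),
    last_name  = substr(full_name, instr(full_name, ' ') + 1)""")

# Old application code still reads full_name; new code reads the new columns.
# Rolling back the app is now just a code re-deploy - no table changes needed.
old_read = conn.execute("SELECT full_name FROM customers").fetchone()[0]
new_read = conn.execute(
    "SELECT first_name, last_name FROM customers").fetchone()

# CONTRACT (later, once confident): rename rather than drop, so anyone
# still depending on the old column complains before it disappears for good.
conn.execute("ALTER TABLE customers "
             "RENAME COLUMN full_name TO full_name_DeleteOn20171130")
print(old_read, new_read)
```

During the overlap window both schemas are live, which is exactly what makes the “just re-deploy the old binaries” roll-back possible for the application tier.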