It’s always the case that my words are my own and do not necessarily reflect those of my company. This post is intended as an honest description of a complicated and difficult project and a celebration of its eventual success.
Also, this post is in no way related to my previous post. In fact, I was planning to publish this one first, but life happens.
Also, I know many of you are keen to hear more about my plans with DLM Consultants. It’s coming, you’ll just need to wait a few extra days. 🙂
Right, let’s get stuck in…
Let’s talk about ‘The Phoenix Project’
If you are involved in IT and you haven’t yet read The Phoenix Project, shame on you. Perhaps that’s harsh, but really, you should read it. It’s an easy and funny read about a fictional IT project gone wrong, with some very clear and important lessons that you need to learn.
Here’s a summary of the IT project at the centre of the novel:
- According to the VP of marketing the project is genius and it is going to turn the company around and make everyone rich. The IT guys should all take the blame for not being able to deliver it to schedule. Don’t they realise how important it is?
- According to the Ops team the project is sapping resources away from maintaining the stuff that actually makes the company money and the developers keep breaking stuff because they don’t know what they are doing. All this for some unproven marketing idea.
- According to the Dev manager Ops keep getting in the way and marketing keep throwing new requirements at them making the project more complicated and harder to deliver.
- (There’s also an annoying data security character who generally gets in the way – but I didn’t like him much and he isn’t relevant to my metaphor so I’m going to conveniently forget about him. Call it artistic licence.)
Sound familiar? It certainly struck a nerve with me when I read it. Partly because I recognised the project but mostly because I recognised one of the characters – it was me. That book certainly made me think differently about how I approached my role.
Anyway, remember those bullets. They’ll be important later.
The novel opens at a point where the project has pretty much stalled. The company is enormously dysfunctional. Their competitor is able to deliver customer value much more quickly.
The CEO brings in a Yoda-type character who gives the one-page synopsis of many of the pivotal books about DevOps, and the company makes some changes. They start tracking their changes better, they start collaborating across organisational silos, and they spin up a much smaller team tasked with starting from scratch and delivering a tiny part of the project quickly. Then they work iteratively, adding further functionality in small, frequent increments.
Following this new approach the company is able to deliver a better product, more quickly and for less money. In the end the company succeeds and those who helped bring about the change all get rewarded.
That’s pretty much the gist of it. Go and read the book.
Now let’s talk about Redgate
More specifically, let’s talk about SQL Source Control. And let’s talk about ‘migration scripts’.
Let’s start with some context. SQL Source Control is a tool that scripts out your database in a state-based format and commits the result to your source control repository. (I discuss state- and migrations-based approaches to Database Lifecycle Management (DLM) here. Summary: neither is perfect. Each has a place.) Given that a purely state-based approach to DLM has its flaws, some clever product manager at Redgate came up with the concept of a feature we called ‘migration scripts’. (Because the term ‘migration’ isn’t overloaded enough! :-P)
I’m not going to get too technical, but the principle is this: state-based tools don’t deal very well with the tricky parts of some deployments. By adding the ability to hard code your own migration scripts for those parts, we would produce a hybrid state- AND migrations-based solution that achieves the best of both worlds. The product managers and marketers claimed this would be revolutionary. We would set a new standard for how to implement DLM. People would write books and create cat memes about us. Chuck Norris might acknowledge us – briefly. (Not publicly of course, but we’d know, and that would be enough.)
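To make the ‘tricky parts’ concrete, here is a minimal sketch of why a purely state-based diff struggles with something as simple as a column rename, and how a hand-written script can override it. This is an illustration only, not how SQL Compare or SQL Source Control actually work: the function names, the `Person` table, the hard-coded column type and the lookup keyed on column sets are all my own assumptions.

```python
def naive_state_diff(table, current_columns, target_columns):
    """Generate DDL by comparing column names only (a deliberately naive diff)."""
    current, target = set(current_columns), set(target_columns)
    statements = []
    for name in sorted(current - target):
        statements.append(f"ALTER TABLE {table} DROP COLUMN {name};")
    for name in sorted(target - current):
        # The column type is hard-coded purely to keep the sketch short.
        statements.append(f"ALTER TABLE {table} ADD {name} NVARCHAR(100) NULL;")
    return statements


def plan_deployment(table, current_columns, target_columns, migration_scripts):
    """Hybrid sketch: use a hand-written script if one covers this exact change."""
    key = (frozenset(current_columns), frozenset(target_columns))
    if key in migration_scripts:
        return migration_scripts[key]
    return naive_state_diff(table, current_columns, target_columns)


current = ["Id", "Surname"]
target = ["Id", "LastName"]

# The diff cannot know this was a rename, so it drops the data in Surname.
state_only_plan = naive_state_diff("Person", current, target)

# A hypothetical hand-written migration script that preserves the data.
scripts = {
    (frozenset(current), frozenset(target)): [
        "EXEC sp_rename 'Person.Surname', 'LastName', 'COLUMN';"
    ]
}
hybrid_plan = plan_deployment("Person", current, target, scripts)
```

The state-only plan drops `Surname` and adds an empty `LastName`. The hybrid plan runs the rename instead. Working out when a given script applies is the hard part, which is where most of the rest of this story comes from.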
“According to the VP of marketing the project is genius and it is going to turn the company around and make everyone rich.”
We started the project in 2010 and first shipped ‘migration scripts’ in 2011 as part of SQL Source Control v3, back when SVN and TFS were the big guns in the source control arena and Git was still mostly a taboo in the Microsoft community, discussed in darkened rooms by hippy, long-haired developers, spotty graduates and that rebel who had a Linux box under their desk. It was one of many up and coming source control systems on our radar, but not the only one.
We didn’t know that distributed source control systems (DVCSs) would be the future. We didn’t know who would win the battle of the DVCS. Many of us thought it would be Mercurial. A crystal ball would have been nice back then. We didn’t know that GitHub was about to hit the mainstream. Hindsight is a wonderful thing.
There were many problems with our first attempt at ‘migration scripts’ in SQL Source Control. They didn’t work with branching (this didn’t really seem too important for databases in the pre-Git era), our implementation couldn’t handle data changes that didn’t include schema changes and we assumed that all source control commits would have sequential version numbers. We weren’t prepared for those strange Git commit hashes.
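The sequential version number assumption is easier to see in code. The sketch below is hypothetical (the script names, revision numbers and commit hashes are invented, and it is not taken from our implementation), but it shows why ‘run every script between revision A and revision B’ is trivial with SVN-style integers and needs the commit history once you only have hashes.

```python
def scripts_between_revisions(scripts, deployed_rev, target_rev):
    """SVN-style: revisions are integers, so a simple range comparison works."""
    return [
        name
        for rev, name in sorted(scripts.items())
        if deployed_rev < rev <= target_rev
    ]


def scripts_between_commits(scripts, history, deployed_commit, target_commit):
    """Git-style: hashes carry no order, so we have to walk the history.

    `history` is assumed to be a linear list of commits, oldest first.
    Real Git history is a graph, which makes branching harder still.
    """
    start = history.index(deployed_commit)
    end = history.index(target_commit)
    return [scripts[c] for c in history[start + 1 : end + 1] if c in scripts]


svn_scripts = {12: "split_name_column.sql", 15: "backfill_country.sql"}
svn_plan = scripts_between_revisions(svn_scripts, 10, 15)

git_scripts = {"9fceb02": "split_name_column.sql", "1a410ef": "backfill_country.sql"}
history = ["c3d8e9a", "9fceb02", "4b825dc", "1a410ef"]
git_plan = scripts_between_commits(git_scripts, history, "c3d8e9a", "1a410ef")

# Sorting the hashes themselves puts the scripts in the wrong order.
wrong_order = [git_scripts[c] for c in sorted(git_scripts)]
```

Even this version quietly assumes a single line of history. As soon as two branches each add a migration script there is no single ‘oldest first’ list to walk.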
We also made some architectural decisions based on some poor assumptions which hurt us later on. Most significantly we asked people to set up a separate repository for their migration scripts rather than saving them in the same directory as their ‘state’ scripts. (We thought users would want this flexibility. They didn’t care.)
@_AlexYates_ do you think there’s a requirement for DBA’s to read this book?
— Kevin Chant (@kevchant) June 28, 2016
Let’s talk about ‘migrations v2’
By 2012 we had worked out that we had a problem, so we started work on ‘migrations v2’.
But first of all, a diversion…
In 2012 a different team at Redgate had started work on a new tool called ‘Deployment Manager’. It was a Release Management tool that was a fork of the Octopus Deploy code-base but with first class support for database deployments. It should have inherited the migrations feature from SQL Source Control. (People created migration scripts in SQL Source Control and those migration scripts would be used by SQL Compare or Deployment Manager when deploying the database.)
But Deployment Manager never (really) supported ‘migration scripts’. The original implementation was going to be dropped and ‘migrations v2 was just a few months away’. (A phrase that was still being uttered three years later. We’ll get to that.) ‘Migrations v1’ and ‘migrations v2’ were incompatible and we didn’t want to waste resources supporting the original implementation that was going to be replaced soon anyway.
I was working as a pre-sales engineer on Deployment Manager at the time. Many of my early blog posts featured it. I was sitting with the development team and feeding them comments, stats and insights from our prospective customers. The biggest feature request that we had was for migration scripts. (By an order of magnitude.)
It didn’t help that while migration scripts were enormously important for anyone wanting to set up release management for SQL Server, and hence for Deployment Manager users, they had a mediocre take-up among SQL Source Control users. Therefore they weren’t an enormous priority for the SQL Source Control team. It’s also important to consider that SQL Source Control was already one of Redgate’s most important products with thousands of users who were paying the licence fees that kept the lights on. Deployment Manager was brand new and had a much smaller user base. And remember, this is before Redgate really started thinking about the bigger DLM picture with the clarity that it does today. The SQL Source Control team’s brief was to solve the source control problem, not to support a full end to end deployment pipeline. And ‘migration scripts’ solved a deployment problem, not a source control problem, not their problem.
“According to the Ops team the project is sapping resources away from maintaining the stuff that actually makes the company money.”
We had silos, but they were based on different products, rather than different functional teams.
Back to migrations v2, which was being developed by one of the two or three teams working on SQL Source Control.
Getting migrations right was a hard problem and the ‘migrations v2’ team soon realised this. The ‘migrations v2’ concept had been designed up front and it was certainly complete. If you want a full description of how it worked I blogged about it a while back.
While the feature worked in theory, our users struggled. They needed a deep understanding of how it worked and they were required to write complicated and unintuitive ‘guard clauses’ to ensure that the migration scripts could run on any branch. Those users often made mistakes. When they did they sometimes got their databases into pretty nasty, broken states that we couldn’t easily help them to fix.
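For anyone who hasn’t met the idea, a guard clause is a check at the top of a migration script that decides whether the script still needs to run against this particular database. The sketch below is a generic illustration, not the actual ‘migrations v2’ syntax. I’ve used SQLite through Python’s standard library purely so that it runs anywhere, and the `Person` table, the column names and the helper functions are all assumptions.

```python
import sqlite3


def column_exists(conn, table, column):
    """Return True if `table` already has a column called `column`."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)  # row[1] is the column name


def split_name_migration(conn):
    """Split Person.FullName into FirstName and LastName.

    Assumes every FullName contains exactly one space. The original
    FullName column is left in place to keep the sketch short.
    """
    # Guard clause: if the change is already present, do nothing.
    if column_exists(conn, "Person", "FirstName"):
        return False
    conn.execute("ALTER TABLE Person ADD COLUMN FirstName TEXT")
    conn.execute("ALTER TABLE Person ADD COLUMN LastName TEXT")
    conn.execute(
        "UPDATE Person SET "
        "FirstName = substr(FullName, 1, instr(FullName, ' ') - 1), "
        "LastName = substr(FullName, instr(FullName, ' ') + 1)"
    )
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (Id INTEGER PRIMARY KEY, FullName TEXT)")
conn.execute("INSERT INTO Person (FullName) VALUES ('Ada Lovelace')")

first_run = split_name_migration(conn)   # applies the change
second_run = split_name_migration(conn)  # guard clause skips it
```

This guard is easy because it only asks one question. The guards our users had to write needed to be correct for every branch and every state a database might be in, and a wrong guard either skips a script that should have run or re-runs one that shouldn’t.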
We tried to UX around it or offer decent warnings/error messages, we tried to document around it, I wrote a blog post, but frankly, we were asking too much from our users. ‘Migrations v2’ was complicated. It certainly wasn’t ‘ingeniously simple’.
Ops team: “The developers keep breaking stuff because they don’t know what they are doing.”
VP of marketing: “The IT guys should all take the blame for not being able to deliver it to schedule. Don’t they realise how important it is?”
Given the complexity of the problem and the lack of adoption of what was now being referred to as ‘migrations v1’ (arguably because it was increasingly evident that ‘migrations v1’ was flawed) solving the migrations problem went in and out of fashion. For some it was the next killer feature, for others it was a complicated and expensive feature that was sapping resources away from more achievable improvements. Between 2012 and 2016 several different product managers were responsible for SQL Source Control and each had their own opinions.
I’ll put my hands up… I was always that marketing character. I would regularly wax lyrical about how important it was that we delivered a credible migrations solution with *all the features*. I used to get frustrated whenever a developer told me that they were having problems delivering it, or whenever a product manager decided to put resources on other work. There were certainly some lessons for me in the Phoenix Project.
“According to the Dev manager Ops keep getting in the way and marketing keep throwing new requirements at them making the project more complicated and harder to deliver.”
We created some problems for ourselves. Back in 2012, when ‘v2 was only a few months from completion’, it had felt like a good idea to build the new migrations functionality on a new branch to isolate a complicated and risky refactor from our core product. With every extra month the branches became more and more divergent, as the core SQL Source Control team was doing regular maintenance and other feature work while the migrations team tried to tackle the migrations problem. Merging the branches became a hugely expensive and complicated task that was pushed back several times in favour of easier wins or general maintenance, exacerbating the problem.
In 2013, after a decision by the SQL Source Control team to pause development on the ‘migrations v2’ branch, the Deployment Manager team* decided to roll up their sleeves and get it done themselves. (Bickering silos.) After all, ‘migrations v2’ was in many ways more important for Deployment Manager users than it was for people who only used SQL Source Control.
*Actually, in 2013 the Deployment Manager team split into two teams**, one of which was called the Continuous Automation Team (the CAT team… purr). It was the CAT team that did the heroic work. Credit where due.
** Actually the team changes were more complicated than that. The CAT team included some people from the SQL Compare team, for example, but one of the tasks of a writer is to simplify complexities that aren’t directly relevant to their core message*** to give their prose more clarity and readability.
*** This isn’t the only over-simplification I have made – let’s move on.
It took a lot of effort but they managed to merge the two branches and get something working. They built a feature flag into various tools that allowed users to enable the ‘migrations v2 BETA’ or stick with the default v1 version that was unsupported in Deployment Manager. Finally there was progress – we were getting somewhere. We were forcing this beast to work with Deployment Manager, warts and all.
The following week the decision was made to kill Deployment Manager.
The frustration, anger and sadness were immense. One or two people resigned. The team had done a wonderful job with the product but, ultimately, the strategy was wrong. We were fighting on two fronts, building a combined release management platform and database deployment engine. Octopus Deploy, our main competitor, with a single focus, was beating us on the Release Management front. Microsoft bought InRelease and ploughed investment into it, creating the Release functionality that now exists in VSTS.
No-one else was making a serious go at the database problem but our fix was shackled to a failing release management tool. We should have been focusing on building a stand-alone database deployment engine, and partnering with Release Management platforms like Octopus and Microsoft in order to offer the complete solution. So we pivoted – by sacrificing Deployment Manager. It hurt.
While Deployment Manager didn’t succeed in its own right, it paved the way for our current (successful) DLM strategy, joining up the siloed products (the first way – systems thinking), working with partners like Octopus Deploy and a few new products. The old Deployment Manager team was asked to build DLM Automation and a drift monitoring tool called SQL Lighthouse instead. Later Lighthouse turned into DLM Dashboard, but that’s a different story. Fun fact: DLM Dashboard still contains a whole bunch of old Deployment Manager code that the team was able to re-use.
‘Migrations v2’ had been merged back into SQL Source Control, but it was still an expensive and thorny problem. By now we were approaching 2014. ‘A few months’ had already turned into two years, and now we had two poor solutions to the problem: v1 lacked the required features and was architected such that developing it further was not practical, while v2 was unintuitive, buggy and downright dangerous. (After the Deployment Manager bombshell the team were never able to prioritise fixing some bugs that had been discovered since the merge, so they’d been left to rot. We discovered later that we were silently eating our own error logs (oops!), making problems very hard to debug.)
Bottom line: We realised we were still a long way from solving the problem. After the immediate fire-fighting associated with dropping Deployment Manager we felt like we’d taken a six-month step backwards. Our most loyal users, who had been using Deployment Manager and trying out the migrations v2 BETA for us, were getting impatient and frustrated with us.
For the next year or so a lot of people debated. Call me that awful marketing person from The Phoenix Project again but I was still banging my drum that solving the migrations problem was fundamental to the Redgate DLM story. (Maybe it was my war-wounds following those rose-tinted Deployment Manager days talking?) Other people argued it was too big, complicated, expensive and risky. They probably had a point.
It wasn’t our greatest hour.
Let’s talk about small autonomous teams
The feedback kept coming in that solving the migrations issue was critical if our customers were going to adopt SQL Source Control, DLM Automation and the SQL Toolbelt for DLM. It was also getting embarrassing that the migrations v2 BETA had been out for so long and had basically stalled. But ‘migrations v2’ was so complicated, so buggy and so unusable that we weren’t prepared to remove the BETA label and trying to fix these issues just seemed too complicated.
The problem had become painful enough that the business was prepared to invest to make it go away. They decided to throw a full sized dev team purely at the migrations problem with the brief ‘just fix it’.
However, a couple of developers intervened. They could see the requirements were sprawling and that not many people had a very clear idea exactly what was required. They could see that throwing resources at the problem wouldn’t help – we’d just end up with an even bigger and thornier beast. They persuaded the decision makers to try something different.
The two developers were given just a few weeks to take a step back and take a fresh look at the problem. They were challenged to come up with a new concept and to deliver the smallest sliver of functionality in a few weeks. Both developers had had some experience with both v1 and v2 and were able to learn from the lessons of each. By now Redgate had also standardised on GitHub for source control, so the developers had much more experience using distributed source control systems too, which helped.
They didn’t start cutting code right away. They talked a lot, testing out different ideas with different stakeholders. There were certainly trade-offs to be considered and they were building up a clearer idea of the priorities. At times they got frustrated at how long it felt they were spending without writing any code, but it turns out the fastest way to prototype different solutions with stakeholders and users actually isn’t to waste time writing any code at all. (The second way – amplify feedback loops.)
After a few sprints they had a hacky command line implementation of a single use-case for a single feature. They demoed it to a few of our most battle-worn users and to a bunch of internal stake-holders, myself included. They seemed to be on to something. They had a viable plan, and by being focussed and making conscious decisions to avoid feature creep they were able to prototype the concept and get some real world feedback from users.
We gave them another few weeks and we challenged them to implement the next slice. It worked. They iterated a few times. They changed the design quite a lot in the early days as they were learning and figuring out what the eventual solution should look like. They re-wrote the entire thing from scratch more than once, but each time they showed their work to our users and the internal stake-holders we were building more confidence that we might be getting close to a viable solution. (The third way – culture of continual experimentation and learning).
Eventually we started making real progress and we had enough confidence to start investing more. We added a UX expert to the team and more developers to help turn the thing from a hacky thought-exercise to a polished product. We invited more users to try it. We fixed a lot of bugs. We kept seeking more feedback from users and stakeholders.
We didn’t make the mistake of building this on a different branch. It was shipped to thousands of SQL Source Control users every week on “Release Wednesday”; we just neglected to tell most of them about the secret feature flag that a select few used to enable the new functionality. (Which definitely was not called ‘migrations v3’!)
And a couple of weeks ago we removed the BETA label and we released it as a fully supported feature. I won’t go into detail here but you can read the docs if you are interested in how it all worked.
We wrapped it up with some other bits and pieces, like a brand new UI and the first bits of SQL Server 2016 support (more is coming!), and called it SQL Source Control v5. I can honestly say I have never been more proud (and simultaneously more ashamed) of any release in my five and a half years at Redgate.
Let’s talk about helping your business win
The tagline for The Phoenix Project is “A novel about IT, DevOps and helping your business win”. After all, shouldn’t helping your business win be the entire point of everything you do in your job?
For me, SQL Source Control v5, combined with SQL Compare and DLM Automation, is the most complete hybrid DLM solution for SQL Server available today. SSDT does not allow users to hard code those tricky one-off database refactors that diff engines, including DacFX, can’t handle. (There are various partial fixes, like the migration log and pre/post-deploy scripts, but nothing as powerful as migration scripts. For example, try automating a column split with SSDT.) In my opinion open source migrations frameworks like DbUp and Flyway require too much manual scripting to be practical for large projects. ReadyRoll, another hybrid DLM solution from Redgate, is good, but it is better suited to people who want to work in a migrations-first approach, as opposed to a state-first approach.
‘Migrations v2’ bore the characteristics I bulleted out at the top of this post. We were having the same pains that were described in familiar detail in the book. We followed a similar (not identical, similar) strategy and yielded a similar result. Both the original developers are now team leads and the product and project managers responsible for the new approach have both moved on to new exciting roles in Redgate. I’m leaving – but that’s unrelated, and I’m pretty excited about that too! (There were plenty of others involved, many were more involved than me, but including everyone would make a boring read. I’m not naming names. You know who you are.)
If you have a project that is getting out of control I’m not saying you should copy us – but I am saying you should at least read the book. Perhaps something will resonate, and perhaps you’ll start thinking about the problem differently, and perhaps that will help. If your project is costing your business as much as ours cost us, isn’t it worth a try?
SQL Source Control v5 is a Phoenix. It gives me an enormous sense of satisfaction (and relief) to see it fly out the door while I’m still here.