Developing databases: shared vs dedicated?

If your dev team builds C# projects, each developer will be compiling their code locally to test it out before committing any changes to source control. Then the version of the source code that exists in source control will be used whenever the code is deployed to live. This means there is a single source of truth and a complete audit trail of changes. Anything else would seem odd and enormously fragile and inefficient to a C# developer.

But if your dev team builds SQL Server databases there is a good chance you have a shared development server. That’s the equivalent of saving a C# project on a network share and letting everyone hack away at once, cherry-picking the changes that they are ready to commit to source control. To a C# developer that sounds like madness.

Teams often use the shared model because databases are complicated, persistent beasts that are more onerous to maintain. There are also significant issues with the security of data if there is any sensitive data on the dev database. Allowing the developers to store that stuff on their local machine is the sort of thing that data breaches are made of. Also, frankly, databases can use up a lot of the resources of your dev workstation so devs might prefer not to have to install that stuff locally, especially if they don’t make database changes very often.

So should your team use a shared database server or dedicated dev databases for each developer?

Everyday development issues

Working with a shared development database, at first glance, looks a lot simpler. If you only have three database servers (Dev, Test and Prod, for example) then it might feel like your dev process is much simpler. Some people might argue for shared dev databases for reasons like:

  1. As an application developer I don’t need to worry about the database, it’s just there and the DBA looks after it for me.
  2. Connection strings are easier to manage if there are fewer potential endpoints.
  3. If I use a shared database my team’s changes are integrated with other teams’ changes by default, so I’m already doing continuous integration (CI), right?
  4. We can easily manage our changes simply by diffing the Dev and Test/Prod databases.

However, upon greater inspection these arguments do not stand up.

  1. As an application developer you often need to raise tickets for your DBAs if you want them to provision a new database server for you. You might need approval from umpteen managers in different departments. This can cause enormous delays which can result in missing deadlines. You are also reliant on DBAs for stuff like test data sets – more on that later.
  2. If you set the default connection string in your apps to look for the local server you can override it when you deploy to Test/Prod servers etc. This is possible in various ways and isn’t normally too complicated. Presumably you already have a process for this whenever you deploy to Test/Prod anyway?
  3. Arguably yes, your changes are integrated automatically on the shared model. However, when you consider the fact that most of the time you want to release different bits of work at different times the claim about CI is no longer true. In the world of application development people typically use tools like branching or feature flags to manage different streams of work and enable people to release each piece independently. I’ve never seen anyone do that properly with a shared database.
  4. Leading on from point 3, if you want to release one team’s work to production but not another team’s work the process of diffing the Dev and Test database becomes a complicated and error prone one. It’s not simply a case of merging/deploying the appropriate branch. Was this change yours? What about that one? Are you sure you don’t have a dependency on something someone else did? Have you tested your changes in isolation? Probably not. When we deploy these changes to production it will probably be the first time we have ever run the database with this specific set of changes – which doesn’t sound very smart. (“But it worked on my machine!”)

Using a shared dev database can work reasonably well for small teams, but once you start to scale it simply doesn’t work. You run into problems like:

  • Oops, Bob broke the database, now the entre dev team are disrupted because they all depend on it.
  • Oops, Mary forgot to commit her changes to source control, or she added her sql scripts to source control but forgot to run them on dev. Or did she? I’m confused.
  • Oops, I forgot to delete that test code from the dev database last year. Now I’m not sure if we need it or not. And what’s this thing that Jo committed last year? Does she even still work here?
  • Can anyone remember if I should trust the source control version of the code, or the dev database? Or is the production database the master version? Which is the truth?

Frankly, as a development experience, the overhead of running the database locally is almost always worth it to avoid the issues above, which frankly, in 2017, are solved problems.

Provisioning, maintenance and DBAs

Maintaining databases can require a somewhat specialist skillset. That’s why folks who care about the performance, integrity and security of their data will generally want someone in the team with this specialist knowledge. Typically that’s your DBA.

The DBA’s primary responsibility will be to look after the production database. They may also look after your staging and testing databases too. If you have a shared dev database this sometime also falls into their remit. They might ask reasonable questions if you were to suggest moving to dedicated databases:

  1. As a DBA, I understand that maintaining databases is a big task and it makes my life easier to reduce the number of databases I have to support. Frankly, I’m busy with production so supporting dev will never be my priority. Moving to dedicated databases will probably give me a bunch of grief that I don’t have time for.
  2. As a DBA stuff like managing backups and managing the monitoring software is my DNA. How am I supposed to do that in a sensible way with the databases on your dev boxes.
  3. As a DBA I’m not really familiar with dev processes like managing multiple dev streams in source control so I don’t really understand the point in this whole exercise. And frankly, production is on fire so you’ll have to wait.
  4. As a DBA I’m instinctively sceptical and distrustful. I don’t like the idea of letting the devs run databases on their own machines because they’ll screw stuff up and I’ll have to come fix it.

Again, most of these ideas are debunked with one simple rule that everyone needs to buy into:

Dev databases are the dev’s responsibility. They are disposable and can be generated from source control. The devs will skill up enough to handle these basic tasks and the DBAs agree to empower the devs to handle the basic database provisioning and dev functions on their own machines.

If the DBAs are still nervous about this, remind them that if the devs need to start maintaining their own databases, they might learn a thing or two about database administration – and as a consequence they might do dumb stuff with data less often.

Also, the DBA probably does not enjoy the complicated manual deployment process that is so common with shared database development. Moving to the dedicated model should enable developers to manage their changes better, meaning the deployments will become simpler.

Data, testing and security

The remaining concern is data. This is the compromise that everyone has to make. Moving to dedicated databases has enormous benefits, but there are downsides. The biggest of which is test data.

Developers need test data to test their changes properly. The easier it is for them to test stuff, the less likely it is that they’ll send buggy scripts to the DBAs.

The issue is that the data can eat up diskspace and may well be sensitive. Managing sensitive data has always been important from an ethical perspective, but with increasingly stringent regulations like the GDPR coming into force, it’s becomingly more important from a business perspective to manage sensitive data properly too.

I’ve worked with some organisations that moved to dedicated database development and seen real people’s email addresses and marketing data like hobbies and interests on dev workstations. I told them not to tell me if they had even more sensitive information like financial, medical, political, sexual preference but that it was very important to clean up their act.

If you found out your bank, healthcare provider or email provider allowed your data to be stored on dev workstations you would be outraged – and quite rightly so. And I’m sure the regulators would be outraged too.

Whether you have a shared development database or dedicated databases for developers, this data should not exist in the dev domain. One could argue that if the data is on a shared dev server the data is less likely to be accidentally left on a train when a developer forgets their laptop, and if you did have a malicious developer it would be easier to lock down their access should the need arise. However, these arguments are pretty weak. Fundamentally you should have a process for masking sensitive data regardless. If you have a suitable, repeatable process for delivering clean data for development the fact that you are moving to dedicated databases should not cause too much of a problem.

So the remaining question is diskspace. If you need terabytes of data to test efficiently dedicated database development might be problematic. I propose three solutions:

  • Use less data: You probably want enough data on your local machine to test the logic in your sprocs etc, but testing performance etc might be a task for a shared integration or test server. You should also reduce dependencies between your tests and your data. Adopting a unit test framework like tSQLt that supports mocking and faking data would help.
  • Use virtualisation technologies like Delphix or Redgate SQL Clone etc: There are various commercial tools that allow you to generate virtual databases. Typically you will have a master image that all developers can connect to. However, the changes are stored locally. This can give developers the experience of working on their own isolated database without using up anything like as much space on their hard drive.
  • Buy bigger hard drives: Sometimes throwing hardware at a problem is the simplest, quickest and cheapest fix – and there is nothing wrong with that. A few hundred bucks per developer for a larger SSDs might sound expensive if you have a big team, but if you have a big team the cost of the inefficiencies, errors and complications around shared database development will cost you an order of magnitude more.

What does an ideal solution look like?

The golden rule is that source control should be the source of truth. All your databases should be deployed from source control. The dev databases should be a reflection of what’s in source control, whether that’s on a shared server or on the dev’s local workstation.

Making source control the source of truth has various benefits. For example, you can use branches to separate the changes necessary to deploy different features. Of course, everyone has their own opinions about which is the best branching strategy, but regardless of your favourite, branches can help. And branching does not work on a shared database (at least not well). Let me give you an example.

If the Tatooine dev team want to switch to the branch where they are building that complicated new feature, what are the Alderaan dev team supposed to do when their dev database switched to a different schema under their feet? Team Alderaan and team Tatooine need their own databases. In a lot of places I’ve worked that involves sending a ticket to the DBA to provision a new instance (as well as various managers to discuss licencing, strategy, the meaning of “agile”, the next most important planet in the Star Wars universe, (Hoth? Degobah? Coruscant?), various other things that I can’t remember because I was asleep and, finally, waiting 6 months).

Once the team has finally finished the Naboo feature (I know right? Ruddy millennials!), merging changes back can become a painful process too. Changes and features got backed up and merged in one big go after 12 months of dev work. (Ouch.) If a single reader has ever done a merge like that in a painless way I invite you to leave a comment.

Alternatively, if you had been using the dedicated model branching becomes simple and by now we’d have used up all the Star Wars, Star Trek, Battlestar Galactica, Babylon 5 and Thunderbirds references and we’d be on to Space 1999 or some other 70s garbage.

Once you have started using your dedicated databases I would advise you re-purpose your old shared dev database. Re-name it to your “integration” database. Let all developers know that every time anyone commits to master the “integration” database will be blown away and re-deployed from source control. At the same time revoke all the developer’s permissions to make schema changes.

Now the “integration” database is your canary. If a developer commits a bad SQL script the canary database will blow up. An appropriate notification should be fired off and now the top priority is fixing the source code. Since whatever you are doing holds no value if the source code doesn’t even compile. If you do this, you will be able to legitimately claim that your shared dev (integration) database demonstrates continuous integration.

This will encourage good practices, enforce legal SQL syntax, and ensure that the source code truly is the sole source of truth. This is a great foundation, enabling you to start thinking about automated deployments to other environments (Test, Prod etc) and a proper automated database testing strategy.

If you were to move to dedicated databases and a sensible source control/branching strategy you would see:

  • A significant reduction in the amount of garbage test code and partially finished work on your dev server.
  • You would be much more confident that the code on your dev server matches the code on the production server because both were deployed from the same source.
  • Your dev team will no longer be bottlenecked by tickets submitted to the DBA team and the DBA team would get bothered less by devs about dev DB issues.
  • It will become much easier to manage changes from different development streams.
  • Your deployments would become much simpler because it will be easier to call out which changes need to be deployed. And hence your production deployments will become simpler and more reliable.
  • You’ll be able to scale your database development and velocity much more easily.
  • It will become more important to manage test data properly, but that’s always been important.
  • You will be conforming with industry standard approaches for managing source code.
  • Developer’s may become more restricted on the quantity of data in their dev database. This is a pain but should encourage better development and testing practices. On balance, it’s a small price to pay given the benefits.

Leave a Reply

Your email address will not be published.