This post is for Redgate SQL Source Control users who have large amounts of data in source control and experience performance issues as a result.
The objective: source control appropriate data
All database developers should be keeping their database schema in source control. This is the foundation of any fit for purpose change management process, reliable testing or deployment pipeline. In general it is not useful or appropriate to put data in source control.
Let’s consider your ‘contacts’ table for example. This table may be very large, the data is not controlled by the developers and there are likely to be data protection issues if developers can see it. You should not put this in source control.
However, there is some data that developers do maintain and where changes need to be managed.
Now let’s consider your “CountryCodes” table. It’s likely that this table is relatively small and that the data does not change often. It is also likely that this sort of ‘static’, ‘reference’ or ‘lookup’ data is referenced in other places so we have dependencies to manage. When we do need to make a change, for example if we had needed to add Scotland last year or if we need to add Catalonia in the future, we may need to make other associated changes at the same time either to the database or other systems like other databases, or .NET applications.
If we were ever to build a particular version of our database we need to know that we have the right CountryCodes data for that particular revision. Often when I speak to clients they will tell me that their database just won’t work without certain reference data so it is important this is kept in source control too. This way all developers can access and work on the same data from a single source of truth.
Redgate SQL Source Control has a feature to manage this called ‘static data’. While by default the data is not added to source control you are able to mark particular tables as static data tables. The contents of these tables will now be added to your source control repository.
The problem: performance issues with large amounts of static data
However, there can be a problem. The static data feature in SQL Source Control was designed for small tables (a few hundred rows etc). If you put a lot of data into source control (many tables with 1000s of rows across many columns etc) you can get performance issues. The main reason is that comparing large tables of data takes longer than comparing the relatively small amount of code that defines your schema. Also, the underlying SQL Compare engine takes longer to compare scripts files (like you get in source control) than it does to compare SQL Server databases.
However, this potential poor performance is especially exaggerated in the case of SQL Source Control as opposed to SQL Compare/SQL Data Compare. Many people assume that when checking for changes SQL Source Control simply does a diff between your source control repository and your dev database. In fact it is more complicated. In order to work out what changes you made and what changes other people made SQL Source Control is actually doing three separate diffs behind the scenes between a working base source control location, a transient source control location and your development database. More detail are available here. In each case SQL Source Control is using the SQL Data Compare engine to do a line by line comparison of the scripts folders that define your static data tables. If SQL Source Control is running on slow hardware or across unreliable or slow networks the performance problems can be compounded.
This slow performance can be felt every time users open SQL Source Control and want to refresh the uncommitted changes screen or the ‘get latest’ screen, regardless of whether there are any static data changes, because SQL Source Control does not know if there are any changes until it has completed the data diffs.
After so much potential bad news I feel like I should re-assure users that smaller tables are absolutely fine and that many people around the world use this feature with no problems! 🙂
The solution: switch to temporarily ignore static data
There is a way to tell SQL Source Control to temporarily ignore static data. Data that is already in source control will stay there but SQL Source Control will ignore it.
This means that while your team are not making static data changes they do not suffer the static data related performance issues. When static data changes are made the feature can be turned back on temporarily so that users can commit/update to the latest version and then users can turn it off again. This is a good solution for users who require that large quantities of static data are kept in source control but where the data is modified infrequently
- Go to the SQL Source Control config files folder. By default, this is located at %localappdata%\Red Gate\SQL Source Control
- Open RedGate_SQLSourceControl_Engine_EngineOptions.xml in a text editor.
- Inside the
<EngineOptions>
tags, add:<IgnoreDataDirectory>True</IgnoreDataDirectory>
- Save and close the file
And then commit it back to source control.
Now SQL Source Control will start ignoring all your static data so any static data related performance issues should go away.
To stop ignoring static data, for example when you want to commit or retrieve a static data update, either remove the tag or edit it to false.
Since SQL Source Control runs on the developer’s workstation each developer will need to toggle this on/off for themselves. As a team you will also need to ensure you can communicate effectively when changes do occur because while SQL Source Control is ignoring static data users will not be alerted to new static data updates through SQL Source Control.
One final note
If you are experiencing performance issues that are getting worse over time it might be worth reading this post by Art of SQL:
http://artofsql.net/guides/improving-sql-source-control-and-sql-data-compare-performance/
At the time of writing there is a known bug (SC-7647) where performance gets worse over time when using static data. In this post Art of SQL documents a relatively straightforward work-around until the bug has been fixed.
1 comment for “Managing performance when working with large amounts of static data with Redgate SQL Source Control”