When it comes to database versioning there are quite a few options and tools out there, each catering for different development and technology scenarios. I have come across two vastly different strategies in my (short) life:
Main repository strategy
In this strategy, you set up a reference database on a central server that all developers can access. This database is accepted as the definitive model, and any change that has not been committed to its structure is assumed to be non-existent or still in development. When changes are made to the solution, a developer first compares their local database with this reference database before starting a unit of work. When updates are run on external servers - whether QA, staging or production - a comparison is run between the database being updated and the reference database. Most differences identified this way can be applied automatically by the comparison tool, but some will have to be applied by hand.
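The comparison step can be sketched in a few lines. This is a minimal illustration using Python and SQLite rather than SQL Server (where dedicated comparison tools exist); the `customer` table and the function names are hypothetical.

```python
import sqlite3

def schema(conn):
    """Return a dict of object name -> CREATE statement for the database."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE sql IS NOT NULL"
    ).fetchall()
    return dict(rows)

def diff_schemas(reference, local):
    """Objects missing from, or defined differently in, the local database."""
    ref, loc = schema(reference), schema(local)
    missing = [name for name in ref if name not in loc]
    changed = [name for name in ref if name in loc and ref[name] != loc[name]]
    return missing, changed

# Example: the reference database has a column the local copy lacks.
ref = sqlite3.connect(":memory:")
ref.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
loc = sqlite3.connect(":memory:")
loc.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

missing, changed = diff_schemas(ref, loc)
print(missing, changed)  # → [] ['customer']
```

A real tool would go further and generate the ALTER statements needed to reconcile the differences, which is exactly the part that sometimes has to be done by hand.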
Benefits
of this strategy:
- At any given moment in time, anyone on the team can find out and apply the database structure as it should be in order not to break the solution.
- There are multiple comparison tools in the market, both open source and commercial.
- A database administrator can copy the reference database to obtain a clean starting point. The major benefit of this is that you wouldn't have to pre-populate any lookup tables, since the reference database already carries this information.
Challenges
of this strategy:
- Theoretically, two developers might make conflicting changes to the reference database at the same time, which in turn causes a broken build. If the data layer is accessed through Entity Framework or any other technology with a central access point, this problem can be circumvented using Team Foundation Server. Institute an internal rule that all database changes are committed in a short sprint: the developer first checks out and locks all database access files, commits the database change both in the reference database and in the code, and then checks in the access files' changes. No two developers can hold the lock on these files at the same time, which effectively enforces one-at-a-time access to the database. The sprint doesn't have to contain the entire solution for the requested change - only the database scaffolding and any breaking code fixes should be included.
- Updates to remote servers would require VPN access or exposure to the Internet and might also be terribly slow. This can be circumvented by either developing a small in-house tool or finding an existing one that lets you save the database structure from the reference database in a file. One can then copy this file to the remote server which also has a copy of the tool installed, and compare against the file instead of a direct database compare.
- Additional data manipulations would have to be communicated either by email or by checking in scripts together with the solution. Issues such as a wrongly assumed default value for a non-nullable column can get quite out of hand - not to mention columns that should be populated by transforming a subset of other data.
- The version of any given database can only be determined through a comparison. Rolling back to any given version might become a nightmare, since there are no clearly defined version intervals. This makes tracking down specific bugs difficult unless a copy of the database on which they occur can be obtained.
Versioning scripts strategy
The general gist of this versioning option is that each change to the database is accompanied by a change script, which is checked in to TFS together with the code of your solution. The first step of this strategy is to create a baseline script that generates your entire database and populates any lookup tables. This baseline can be generated by hand or with Microsoft SQL Server tools. It can be one big file, or comprise a file per table, stored procedure or other object. The preferred method is to generate all the table scripts in one file and split each stored procedure and view into its own separate file.
This baseline is then augmented by change scripts, each of which carries a sequential number. Each change script can include any additional data manipulations, such as default values for non-nullable columns, data drawn from other columns, and so forth.
These change scripts should be accompanied by a table in each database that carries versioning information. This table not only defines clearly which version the database is on and which change scripts have not yet been run, but also allows tools to apply the change scripts automatically. Whenever the major release number changes, it is advisable to create a new baseline script in order to make fresh roll-outs easier.
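The version table and the apply-pending-scripts loop can be sketched as follows. Again this is an illustration in Python against SQLite, not SQL Server; the `schema_version` table name and the in-memory `CHANGE_SCRIPTS` dictionary (which would be `.sql` files on disk in practice) are assumptions.

```python
import sqlite3

# Hypothetical change scripts, keyed by sequential version number.
CHANGE_SCRIPTS = {
    1: "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)",
    2: "ALTER TABLE customer ADD COLUMN email TEXT",
}

def current_version(conn):
    """Read the highest applied version from the versioning table (0 if none)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0

def apply_pending(conn):
    """Run every change script newer than the recorded version, in order."""
    version = current_version(conn)
    for number in sorted(CHANGE_SCRIPTS):
        if number > version:
            conn.execute(CHANGE_SCRIPTS[number])
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (number,))
    conn.commit()
    return current_version(conn)

conn = sqlite3.connect(":memory:")
print(apply_pending(conn))  # → 2 (both scripts applied)
print(apply_pending(conn))  # → 2 (idempotent: nothing left to run)
```

Because the applied version is recorded in the database itself, the same routine can be run safely against a fresh database, a developer's local copy, or a production server that is several versions behind.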
Benefits
of this strategy:
- Simultaneous updates to the database structure are explicitly handled by TFS: if two developers try to check in change scripts at the same time, TFS stops the one that arrives a millisecond later. This only works, however, if an internal rule is instituted that change scripts carry a specific name such as "c[major release].[minor release].[point release].sql".
- A developer can easily create a clean database simply by running the baseline and the change scripts against it. In the same way, a developer can generate a database only up to a certain version in order to recreate a specific bug thrown by solutions that run on an earlier database version.
- There might not be a definitive
database structure stored somewhere, but one can be easily created.
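One detail worth getting right when following a naming convention like "c[major].[minor].[point].sql" is that the version parts must be sorted numerically, not as strings; a plain alphabetical sort would run c1.10.0 before c1.2.0. A minimal sketch, assuming that convention:

```python
import re

# Assumed naming convention from the post: c[major].[minor].[point].sql
SCRIPT_NAME = re.compile(r"^c(\d+)\.(\d+)\.(\d+)\.sql$")

def script_version(filename):
    """Parse a change-script filename into a sortable (major, minor, point) tuple."""
    match = SCRIPT_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a change script: {filename}")
    return tuple(int(part) for part in match.groups())

scripts = ["c1.10.0.sql", "c1.2.0.sql", "c1.2.1.sql"]
print(sorted(scripts, key=script_version))
# → ['c1.2.0.sql', 'c1.2.1.sql', 'c1.10.0.sql']
```

Sorting on the parsed integer tuple guarantees the scripts are applied in the order they were created, regardless of how many digits each version component has.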
Challenges
of this strategy:
- If a tool isn't used to run the change scripts, it might take some time and training before developers get in the habit of running them manually at frequent intervals.
- When major release numbers change and new baselines are created, a situation might arise where a certain implementation is still on the previous version but needs a bug fix that necessitates a database change. In these situations, one can either generate a change script for the previous version and augment the baseline of the new version, or generate change scripts for both versions. In both options you would have to keep track of which implementation is running which version and apply only the necessary change scripts.
- The sheer number of change
scripts might become a problem on solutions where frequent database changes are
committed if tools aren't used to manage this.
Database versioning might seem unnecessary in a one-man development, but the moment multiple developers or multiple deployments are involved, it becomes a critical problem if not managed correctly. Database versioning is something you should at least consider and start to plan for before writing even a line of code on a new project. If your database is not being versioned at the moment, the best time to start would've been yesterday.