When we and our Thoughtworks colleagues started doing agile projects, we realized
    that we needed to solve the problem of how to evolve a database to support this
    evolution of architecture. We began around 2000 with a project whose database ended up
with around 600 tables. As we worked on this project we developed techniques that allowed us to change the schema and migrate existing data comfortably. This allowed our
    database to be completely flexible and evolvable. We described these techniques in an
    earlier version of this article, a description that's inspired other teams and
    toolsets. Since then we've used and further developed these techniques on hundreds of
    projects world-wide, from small teams to large multi-national programs of work. We've
long been intending to bring this article up to date, and now we've had the chance to
    give it a proper refresh.
Jen implements a new story
To get a feel for how all of this works, let's outline what happens when a
      developer (Jen) writes some code to implement a new user story. The story states
      that the user should be able to see, search, and update the location, batch, and
serial numbers of a product in inventory. Looking at the database schema, Jen sees that currently there are no such fields in the inventory table, just a single inventory_code field which is the concatenation of these three fields.
      She has to take this single code and split it into three separate fields:
      location_code, batch_number and
      serial_number.
Here are the steps she'll need to go through:
- Add new columns to the inventory table in the existing schema.
- Write a data migration script to split data from the existing inventory_code column, updating the location_code, batch_number, and serial_number columns.
- Change the application code to use the new columns.
- Change any database code, such as views, stored procedures, and triggers, to use the new columns.
- Change any indexes based on the inventory_code column.
- Commit this database migration script and all the application code changes to the version control system.
To add the new columns and migrate the data, Jen writes a SQL migration script that she can run against the current schema. This will both change the schema and migrate all the existing data in the inventory table.
ALTER TABLE inventory ADD location_code VARCHAR2(6) NULL;
ALTER TABLE inventory ADD batch_number VARCHAR2(6) NULL;
ALTER TABLE inventory ADD serial_number VARCHAR2(10) NULL;
UPDATE inventory SET location_code = SUBSTR(inventory_code,1,6);
UPDATE inventory SET batch_number = SUBSTR(inventory_code,7,6);
UPDATE inventory SET serial_number = SUBSTR(inventory_code,13,10);
DROP INDEX uidx_inventory_code;
CREATE UNIQUE INDEX uidx_inventory_identifier
  ON inventory (location_code,batch_number,serial_number);
ALTER TABLE inventory DROP COLUMN inventory_code;
Jen runs this migration script on a local copy of the database on her machine. She then proceeds to update the application code to use these new columns. As she does this she runs the existing test suite against this code to detect any changes in the application's behavior. Some tests, those that relied on the combined column, need to be updated. Some tests may need to be added. Once Jen has done all this, and the
      application has all its tests green on her machine, Jen pushes all her changes to
      the shared project version control repository - which we call the
      mainline. These changes include the migration scripts and the
      application code changes.
If Jen isn't too familiar with making this change, she's fortunate that it's a common one to make to a database. So she can look it up in the database refactoring book; there's also a summary online.
Once the changes are in mainline they will be picked up by
      the Continuous Integration server. It will run the
      migration scripts on the mainline copy of the database, and then run all the application
tests. If all is green, this process will be repeated across the whole Deployment Pipeline, including QA and Staging environments. The same migration scripts are finally run against production, updating the live database's schema and data.
A small user story like this needs only a single database migration; larger stories are often broken down into several separate migrations, one for each change to the database. Our usual rule is to make each database change as small as possible. The
      smaller it is, the easier it is to get right, and any errors are quick to spot and
      debug. Migrations like this compose easily, so it's best to do many small ones.
Dealing with Change
As agile methods spread in popularity in the early 2000s, one of their most obvious characteristics was their attitude towards change. Before they appeared on the scene, most of the thinking about software process was about understanding requirements early, signing off on these requirements, using the requirements as a basis for design, signing off on that, and then proceeding with construction. This is a plan-driven cycle, often referred to (usually with derision) as the waterfall approach.
Such approaches look to minimize changes by doing extensive up-front work. Once
      the early work is done, changes cause significant problems. As a result such
      approaches run into trouble if requirements are changing, and requirements churn is
      a big problem for such processes.
Agile processes approach change differently. They seek to embrace change,
      allowing changes to occur even late in a development project. Changes are
      controlled, but the attitude of the process is to enable change as much as possible.
      Partly this is in response to the inherent instability of requirements in many
      projects, partly it is to better support dynamic business environments by helping
      them change with the competitive pressures.
In order to make this work, you need a different attitude to design. Instead of
      thinking of design as a phase, which is mostly completed before you begin
      construction, you look at design as an on-going process that is interleaved with
      construction, testing, and even delivery. This is the contrast between planned and
      evolutionary design.
One of the vital contributions of agile methods is that they have come up with
      practices that allow evolutionary design to work in a controlled manner. So instead
      of the common chaos that often happens when design isn't planned up-front, these
methods provide techniques to control evolutionary design and make it practical.
An important part of this approach is iterative development, where you run the
      entire software life-cycle many times during the life of a project. Agile processes
      run complete life cycles in each iteration, completing the iteration with working,
      tested, integrated code for a small subset of the requirements of the final product.
These iterations are short, anywhere from a few hours to a couple of weeks, with more skillful teams using shorter iterations.
While these techniques have grown in use and interest, one of the biggest
      questions is how to make evolutionary design work for databases. For a long time,
      people in the database community
      considered database design as something that absolutely needs up-front planning.
      Changing the database schema late in the development tended to cause wide-spread
      breakages in application software. Furthermore changing a schema after deployment
      resulted in painful data migration problems. 
Over the course of the last decade and a half we've been involved in many large projects that have used evolutionary database design and made it work. Some projects involved more than 100 people in multiple sites world-wide. Others involved over half a million lines of code and over 500 tables. Some had multiple versions of the application in production and applications that needed 24*7 uptime. During these projects we have seen iteration lengths ranging from a month down to a week; the shorter iterations worked better. The techniques we describe below are the ones that we used to make
      this work.
Since the early days we have tried to spread these techniques over more of our
      projects, gaining more experience from more cases and now all our projects use this
      approach. We've also found inspiration, ideas, and experience from other agile
      practitioners.
Limitations
Before we dive into the techniques, it's important to state that we haven't
        solved all the problems of evolutionary database design.
We have had projects with hundreds of retail stores having their own database,
all of which need to be upgraded together. However we haven't yet explored the
        situation where there is a lot of customization on such a large group of sites.
        An example might be a small business application that allows customization of
        the schema, deployed into thousands of different small companies.
Increasingly we are seeing people use multiple schemas as part of a single
        database environment. We've worked with projects using a handful of schemas like
        this, but haven't yet cranked this up to tens or hundreds of schemas. This is a
        situation that we anticipate having to deal with in the next few years.
We don't consider these problems to be inherently unsolvable. After all when we
        wrote the original version of this article we hadn't solved the problems of 24*7
        uptime or integration databases. We found ways to deal with those, and expect
        we'll push the limits of evolutionary database design further too. But until we
        do, we won't claim we can solve such problems.
The Practices
Our approach to evolutionary database design depends on several important
      practices.
DBAs collaborate closely with developers
One of the tenets of agile methods is that people with different skills and backgrounds need to collaborate very closely together. They can't communicate mainly through formal meetings and documents. Instead they need to be out talking with each other and working with each other all the time. Everybody is affected by this: analysts, PMs, domain experts, developers... and DBAs.
Every task that a developer works on potentially needs a DBA's help. Both the
        developers and the DBA need to consider whether a development task is going to
        make a significant change to the database schema. If so, the developer needs to
        consult with the DBA to decide how to make the change. The developer knows what
        new functionality is needed, and the DBA has a global view of the data in the
        application and other surrounding applications. Many times developers have
        visibility into the application they are working on and not necessarily into all
        the other upstream or downstream dependencies on the schema. Even if it's a single
        database application, there could be dependencies in the database that a
        developer isn't aware of.
At any time a developer can call on the DBA and ask to pair to sort out a database
        change. By pairing, the developer learns about how the database works, and the DBA
        learns the context of the demands on the database. For most changes it's up to the
        developers to call on the DBA if they are concerned about the database impact of
        changes. But DBAs also take initiative. When they see stories that they think are
        likely to have a significant data impact, they can seek out the developers
        involved to talk through the database impact. DBAs can also review the migrations
        as they get committed into version control. While it's annoying to reverse a
        migration, we again gain from each migration being small, which makes it easier to
        reverse.
To make this happen the DBA has to make herself approachable and available. She
        needs to make it easy for a developer to just pop over for a few minutes and ask
some questions, maybe on the Slack channel or HipChat room or whatever
        communication medium the developers are using. When setting up the project space,
        make sure the DBAs and developers sit close to each other so they can easily get
        together. Ensure that DBAs are told about any application design sessions
        so they can pop in easily. In many environments we see people erecting
        barriers between the DBA and application development functions. These barriers
        must come down for an evolutionary database design process to work.
All database artifacts are version controlled with application code
Developers benefit a lot from using version control for all their artifacts: application code, unit and functional tests, and other code such as build scripts, and Puppet or Chef scripts used to create an environment.
Similarly all the database artifacts should be in version control, in the same repository that's being used by everyone else. The benefits of this are:
- There's only one place to look, making it easier for anyone on the project to find things.
- Every change to the database is stored, allowing easy audit if any problems occur. We can trace every deployment of the database to the exact state of the schema and supporting data.
- We prevent deployments where the database is out of sync with the application, which leads to errors retrieving and updating data.
- We can easily create new environments: for development, testing, and indeed production. Everything needed to create a running version of the software should be in a single repository, so it can be quickly checked out and built.
All database changes are migrations
In many organizations we see a process where developers make changes to a development database using schema editing tools and ad-hoc SQL for standing data. Once they've finished their development task, DBAs compare the development database to the production database and make the corresponding changes to the production database when promoting the software to live. But doing this at production time is tricky as the context for the changes in development is lost. The purpose of the changes needs to be understood again, by a different group of people.
To avoid this we prefer to capture the change during development, and keep
        the change as a first class artifact that can be tested and deployed to production
        with the same process and control as changes to application code. We do this by
        representing every change to the database as a database migration script
        which is version controlled together with application code changes. These
        migration scripts include:
        schema changes, database code changes, reference data updates, transaction data
        updates, and  fixes to production data problems caused by bugs.
Here is a change adding min_insurance_value and
        max_insurance_value to the equipment_type table, with some
        default values.
ALTER TABLE equipment_type ADD(
  min_insurance_value NUMBER(10,2),
  max_insurance_value NUMBER(10,2)
);
UPDATE equipment_type SET
  min_insurance_value = 3000,
  max_insurance_value = 10000000;
This change adds some standing data to the location and
        equipment_type tables.
-- Create new warehouse locations #Request 497
INSERT INTO location (location_code, name , location_address_id,
  created_by, created_dt)
VALUES ('PA-PIT-01', 'Pittsburgh Warehouse', 4567,
  'APP_ADMIN' , SYSDATE);
INSERT INTO location (location_code, name , location_address_id,
  created_by, created_dt)
VALUES ('LA-MSY-01', 'New Orleans Warehouse', 7134,
  'APP_ADMIN' , SYSDATE);
-- Create new equipment_type #Request 562
INSERT INTO equipment_type (equipment_type_id, name,
  min_insurance_value, max_insurance_value, created_by, created_dt)
VALUES (seq_equipment_type.nextval, 'Lift Truck',
  40000, 4000000, 'APP_ADMIN', SYSDATE);
With this way of working, we never use schema editing tools such as Navicat, DBArtisan or SQL Developer to alter schemas,
        nor do we ever run ad hoc DDL or DML in order to add standing data or fix problems. Other
        than updates to the database that occur due to the application software, all
        changes are made by migrations.
Defining migrations as sets of SQL commands is part of the story, but in order to apply them properly we need some extra things to manage them:
- Each migration needs a unique identifier.
- We need to track which migrations have been applied to the database.
- We need to manage the sequencing constraints between the migrations. In the example above, we must apply the ALTER TABLE migration first, otherwise the second migration will fail to insert the equipment type.
We handle these demands by giving each migration a sequence number. This acts
        as a unique identifier and ensures we can maintain the order that they're applied
        to the database. When a developer creates a migration she puts the SQL into a text
        file inside a migrations folder within the project's version control repository.
        She looks up the highest currently used number in the migrations folder, and uses
        that number together with a description to name the file. Thus the earlier pair of
        migrations might be called
        0007_add_insurance_value_to_equipment_type.sql and
0008_data_location_equipment_type.sql.
To track the application of the migrations to the database we use a changelog
        table. Database migration frameworks typically create this table and automatically
        update it whenever a migration is applied. This way the database can always report
which migration it is synchronized with. If we don't use such a framework (after all, they didn't exist when we started doing this), we automate this tracking with a script.
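If we do roll our own tracking, a minimal changelog table is enough. Here's a sketch of what such a table might look like (the table and column names are illustrative, not taken from any particular framework):

CREATE TABLE schema_changelog (
  migration_number NUMBER(6) NOT NULL,
  description      VARCHAR2(200),
  applied_at       TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL,
  CONSTRAINT pk_schema_changelog PRIMARY KEY (migration_number)
);
-- each time a migration is applied, the applying script records it here
INSERT INTO schema_changelog (migration_number, description)
VALUES (7, 'add insurance value to equipment type');

Querying the highest migration_number then tells us which migration the database is synchronized with.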
With this numbering scheme in place, we can then track changes as they apply to
        the many databases that we manage.
Some of these data migrations may have to be released more frequently than the migrations related to new features. In that scenario, we have found it useful to have a separate migration repository or folder for data-related bug fixes. Each of these folders can be tracked separately by database migration tools such as Flyway, dbdeploy, or MyBatis, with a separate table to store the migration numbers. In Flyway, for example, the flyway.table property is used to change the name of the table where the migration metadata is stored.
Everybody gets their own database instance
Most development organizations share a single development database, which is
        used by all members of the organization. Perhaps a separate database is used for
        QA or staging, but the notion is to limit how many databases are running live.
        Sharing databases like this is a consequence of database instances being difficult
        to set up and manage, leading organizations to minimize how many there are.
Controls on who can alter the schema in such situations vary: some places require all changes to be made through the DBA team, others allow any developer to change the schema of the development database, with the DBAs getting involved when changes are promoted downstream.
When we started work with agile database projects, we noted that application
        developers usually follow a pattern where they work in a private working copy of
        the code. People learn by trying things out, so in programming terms developers
        experiment with how to implement a certain feature and may make a few attempts
before picking one. It's important to be able to experiment in a private workspace and push to a shared area when things are more stable. If everyone is working
        in a shared area, then they are constantly interrupting each other with half-done
        changes. Although we favor Continuous Integration, where integrations occur after
        no more than a few hours, the private working copy is still important. Version
        control systems support this work, allowing developers to work independently while
        supporting integrating their work in a mainline copy.
This style of separate working works with files, but it can also work with databases.
        Each developer gets their own database instance which they can freely modify
        without touching other people's work. When they are ready they can push and share
        their changes, as we'll see in the next section.
These separate databases can either be separate schemas on a shared server or,
        more commonly these days, a separate database running on a developer's laptop or
        workstation. A decade ago, database licensing costs could make individual database
        instances prohibitively expensive - but these days this is rarely the case,
        particularly as open-source databases have grown in popularity. We've found it
        handy to run a database in a virtual machine running on a developer's machine. We
        define the build of the database VM using Vagrant and
        Infrastructure As Code, so the developer doesn't need to know the
        details of setting up the database VM, or have to do it manually.
Many DBAs still see multiple databases as anathema, too difficult to work in
        practice, but we've found that you can easily manage a hundred or so application
        database instances. The vital thing is to have tools to allow you to manipulate
        databases much as you would manipulate files.
<target name="create_schema"
        description="create a schema as defined in the user properties">
    <echo message="Admin UserName: ${admin.username}"/>
    <echo message="Creating Schema: ${db.username}"/>
    <sql password="${admin.password}" userid="${admin.username}"
         url="${db.url}" driver="${db.driver}" classpath="${jdbc.classpath}"
         >
        CREATE USER ${db.username} IDENTIFIED BY ${db.password} DEFAULT TABLESPACE ${db.tablespace};
        GRANT CONNECT,RESOURCE, UNLIMITED TABLESPACE TO ${db.username};
        GRANT CREATE VIEW TO ${db.username};
        ALTER USER ${db.username} DEFAULT ROLE ALL;
    </sql>
</target>
Creation of developer schemas can be automated, using the build script to
        reduce workload on the DBA. This automation can also be restricted just to
        environments in development.
<target name="drop_schema">
    <echo message="Admin UserName: ${admin.username}"/>
    <echo message="Working UserName: ${db.username}"/>
    <sql password="${admin.password}" userid="${admin.username}"
         url="${db.url}" driver="${db.driver}" classpath="${jdbc.classpath}"
         >
        DROP USER ${db.username} CASCADE;
    </sql>
</target>
For example, a developer joins a project, checks out the code base and starts to set up her local development environment. She uses the template build.properties file and makes changes, such as setting db.username to Jen and so forth for the rest of the settings. Once these settings are done she can just run ant create_schema and get a schema of her own, either on the team development database server or on the database server on her laptop.
With the schema created, she can then run the database migration script to
        build all the database content to populate her database instance: tables, indexes,
        views, sequences, stored procedures, triggers, synonyms and other database
        specific objects.
Similarly, there are scripts to  delete schemas - either because they are no
        longer needed, or merely because the developer wishes
        to clean up and start again with a fresh schema. Database environments should be
        phoenixes - regularly burnt down and rebuilt at
will. That way there's less danger of environments accumulating characteristics that aren't reproducible or audited.
This need for a private workspace is true for developers, but also true for
        everyone else on the team. QA staff should create their own databases, so they
        also can work without danger of getting confused by changes outside their
        knowledge. DBAs should be able to experiment with their own database copy as they
        explore modeling options, or performance tuning.
Developers continuously integrate database changes
Although developers can experiment frequently in their own sandbox, it's vital
        to integrate their different changes back together frequently using Continuous Integration (CI). CI involves setting up an
        integration server that automatically builds and tests the mainline software. Our
        rule of thumb is that each developer should integrate into mainline at least
        once a day.
Many tools exist to help with CI, including GoCD, Snap CI, Jenkins, Bamboo, and Travis CI.
 Figure 7 shows the flow of how database
        migrations are developed, tested locally, checked
        into  source control, picked up by the CI server and applied to
        the integration database, tested again, and packaged for downstream use.
Let's take an example.
1. Jen starts a development task that includes a database schema change. If the change is easy, such as adding a column, Jen decides how to make the change directly. If it's complicated she grabs the DBA and talks it over with her.
Once she has sorted out the change, she writes the migration.
ALTER TABLE project ADD projecttypeid NUMBER(10) NULL;
ALTER TABLE project ADD (CONSTRAINT fk_project_projecttype
  FOREIGN KEY (projecttypeid)
  REFERENCES projecttype DEFERRABLE INITIALLY DEFERRED);
UPDATE project
      SET projecttypeid = (SELECT projecttypeid
                  FROM projecttype
                  WHERE name='Integration');
Adding a nullable column is a backwards compatible change, so she can integrate the change without needing to change any application code. However, if it isn't a backwards compatible change, such as splitting a table, then Jen would need to modify the application code too.
2. Once Jen has finished her changes, she is ready to integrate. The first step in integration is updating her local copy from mainline, picking up the changes other members of the team have pushed while she's been working on her task. She then checks that her changes work with these updates by rebuilding the database and running all the tests.
If she runs into problems, due to the other developers’ changes interfering with hers, she needs to fix those problems on her copy. Usually such clashes are easy to sort out, but occasionally they are more involved. Often these more complex conflicts trigger a conversation between Jen and her teammates so they can sort out how to resolve overlapping changes.
Once she has her local copy working again, she checks to see if any more changes have been pushed to mainline while she's been working; if so she needs to repeat the integration with the new changes. Usually, however, it doesn't take more than one or two of these cycles before her code is fully integrated with mainline.
3. Jen pushes the change to mainline. Since the change is backwards compatible with the existing application code, she can integrate the database change before updating the application code to use it - a common example of Parallel Change.
4. The CI server detects the change in mainline and starts a new build which contains the database migration.
5. The CI server uses its own database copy for the build, so it applies the database migration script to that database to bring it up to date. In addition it runs the rest of the build steps: compile, unit tests, functional tests, etc.
6. Once the build finishes successfully, the CI server packages the build artifacts and publishes them. These build artifacts contain the database migration scripts, so that they can be applied to the databases in downstream environments, such as a Deployment Pipeline. The build artifacts also contain the application code packaged into a jar, war, dll, etc.
This is exactly the practice of Continuous
        Integration, which is commonly used with application source code
        management. The steps above are just about treating the database code as another
        piece of source code. As such the database code - DDL, DML, Data, views, triggers,
        stored procedures - is kept under configuration management in the same way as the
        source code. Whenever we have a successful build, by packaging the database
        artifacts along with the application artifacts, we have a complete and
        synchronized version history of both application and database.
With application source code, much of the pain of integrating with changes can
        be handled by source code control systems and using various tests in local
        environments. For databases there's a bit more effort involved as there is data
        (state) in the database that needs to preserve its business meaning. (We'll talk
        more about automated database refactorings like this shortly.) In addition the DBA
        needs to look at any database changes and ensure that they fit within the overall
        scheme of the database schema and data architecture. For all this to work smoothly,
        big changes shouldn't come as surprises at integration time - hence the need for
        the DBA to collaborate closely with the developers.
We emphasize integrating frequently because we've found that it's much easier
        to do frequent small integrations rather than infrequent large integrations - a
        case of Frequency Reduces Difficulty. The
        pain of integration increases exponentially with the size of the integration, so
        doing many small changes is much easier in practice, even though it appears
        counter-intuitive to many.
A database consists of schema and data
When we talk about a database here, we mean not just the schema of the database
        and database code, but also a fair amount of data. This data consists of common
        standing data for the application, such as the inevitable list of all the states,
        countries, currencies, address types and various application specific data. We may
        also include some sample test data such as a few sample customers, orders etc. This
        sample data would not make it to production, unless specifically needed for sanity
        testing or semantic monitoring.
This data is there for a number of reasons. The main reason is to enable
        testing. We are great believers in using a large body of automated tests to help
        stabilize the development of an application. Such a body of tests is a common
        approach in agile methods. For these tests to work efficiently, it makes sense to
        work on a database that is seeded with some sample test data, which all tests can
        assume is in place before they run.
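As a sketch, such a seed migration might look like the following; the customer table and the values are purely illustrative, and in a real project the seed data reflects the application's own domain:

-- Sample data intended for development and test databases, not for production
INSERT INTO customer (customer_id, first_name, last_name, created_by, created_dt)
VALUES (9001, 'Ada', 'Lovelace', 'TEST_SEED', SYSDATE);
INSERT INTO customer (customer_id, first_name, last_name, created_by, created_dt)
VALUES (9002, 'Grace', 'Hopper', 'TEST_SEED', SYSDATE);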
This sample data needs to be version controlled, so we know where to look for
        it when we need to populate a new database, and so we have a record of changes
        that's synchronized with the tests and application code. 
As well as helping test the code, this sample test data also allows us to test our migrations as we alter the schema of the database. By having sample data, we are forced to ensure that any schema changes also handle the sample data.
In most projects we've seen this sample data be fictional. However in a few
        projects we've seen people use real data for the samples. In these cases this
        data's been extracted from prior legacy systems with automated data conversion
        scripts. Obviously you can't convert all the data right away, as in early
        iterations only a small part of the new database is actually built. But we can use
        Incremental Migration to develop the conversion scripts to
        provide necessary data just in time.
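As a sketch of such a just-in-time conversion script, the following copies only the subset of legacy data needed for the part of the schema built so far; the legacy_customer table and its columns are hypothetical:

-- Convert only the legacy customers needed for the tables built so far;
-- later iterations add further conversion scripts as the schema grows.
INSERT INTO customer (customer_id, first_name, last_name, created_by, created_dt)
SELECT legacy_id, first_name, surname, 'CONVERSION', SYSDATE
FROM legacy_customer
WHERE status = 'ACTIVE';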
Not only does this help flush out data conversion problems early, it also makes it much easier for domain experts to work with the growing system, as they are
        familiar with the data they are looking at and can often help to identify 
        cases that may cause problems for the database and application design. As a result
        we are now of the view that you should try to introduce real data from the very
        first iteration of your project. We've found Jailer to
        be a useful tool to help with this process.
All database changes are database refactorings
The changes that we make to the database alter the way the database stores
information, introduce new ways to store information, or remove storage that's
        no longer needed. But none of the database changes, on their own, change the
        overall behavior of the software. Consequently we can see them as fitting the
        definition of a refactoring.
a change made to the internal structure of software to make it easier to
          understand and cheaper to modify without changing its observable behavior
-- Refactoring (chapter 2)
Recognizing this, we collected and documented many
        of these refactorings. By writing such a catalog, we make it easier to
        make these changes correctly since we can follow the steps we've successfully used
        before.
One of the big differences about database refactorings is that they involve three different changes that have to be done together:
- Changing the database schema
- Migrating the data in the database
- Changing the database access code
Thus whenever we describe a database refactoring, we have to describe all three
        aspects of the change and ensure that all three are applied before we apply any
        other refactorings.
Like code refactoring, database refactorings are very small. The concept of
        chaining together a sequence of very small changes is much the same for databases
        as it is for code. The three dimensional nature of the change makes it all the
        more important to keep to small changes.
Many database refactorings, such as Introduce New
        Column, can be done without having to update all the code that accesses the
system. If code runs against the new schema without being aware of the new column, the column will just go unused. Many changes, however, don't have this property; we call these destructive changes. Destructive changes need a bit more care, the degree
        of which depends on the degree of destruction involved. 
An example of a minor
        destructive change is Make Column Non Nullable,
        which changes a nullable column to not nullable. This is destructive because if
        any existing code doesn't set it to a value, then we'll get an error. We'll also
        get problems if there are any nulls in the existing data.
We can avoid the problem with existing nulls (at the cost of a slightly different one) by assigning default data to any rows that have nulls in this column. For the problem of application code not assigning a value (or assigning null) we have two options. One is to set a default value on the column.
ALTER TABLE customer
  MODIFY last_usage_date DEFAULT sysdate;
UPDATE customer
  SET last_usage_date =
    (SELECT MAX(order_date) FROM orders
      WHERE orders.customer_id = customer.customer_id)
  WHERE last_usage_date IS NULL;
UPDATE customer
  SET last_usage_date = last_updated_date
  WHERE last_usage_date IS NULL;
ALTER TABLE customer
  MODIFY last_usage_date NOT NULL;
The other way of dealing with a lack of assignment is to change the application
        code as part of the refactoring. This is the option we prefer if we can
        confidently get to all the code that updates the database, which is usually easy
        if the database is only used by a single application, but is hard if it's a
        shared database.
A more complex case is Split Table,
        particularly if the access to the table is spread widely across the application
        code. If this is the case it's important to let everyone know that the change is
        coming up so they can prepare themselves for it. It may also be wise to wait for a
        relatively quiet moment, such as the start of an iteration.
Any destructive change is much easier if database access is all channeled
        through a few modules of the system. That makes it easier to find and update the
        database access code.
The important thing overall is to choose a procedure that's appropriate for the
        kind of change that you're making. If in doubt try to err on the side of making
        changes easier. Our experience is that we got burned much less frequently than
        many people would think, and with a strong configuration control of the entire
        system it's not difficult to revert should the worst happen.
Taking care of database changes, including DDL, DML, and data migration, during development provides the most context for the data team, and avoids the data team having to batch-migrate all the changes at deployment time without that context.
Transition phase
We've already alluded to the difficulties we run into when we get a destructive
          database refactoring and we can't easily change the access code. These problems
          grow horns and big sharp teeth when you have a shared database, which may have
          many applications and reports using it. In this situation, you have to take much
          more care over something like a Rename Table. To
          keep us safe from such horns and teeth, we turn to the transition
          phase.
A transition phase is a period of time when the database supports
          both the old access pattern and the new ones simultaneously. This allows older
          systems time to migrate over to the new structures at their own pace.
ALTER TABLE customer RENAME to client;
CREATE VIEW customer AS
SELECT id, first_name, last_name FROM client;
For the Rename Table example, the developer
          would create a script that renames the table customer to
          client and also creates a view named
          customer that existing applications can use. This Parallel Change supports new and old access. It does add complexity, so
it's important that it gets removed once downstream systems have had time to
          migrate. In some organizations this can be done in a couple of months and in some
          other organizations it may take years.
Views are one technique to enable transition phases. We also make use of database triggers, which are handy for things like Rename Column.
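As a rough sketch of the trigger approach, suppose we rename the last_name column on client to family_name. During the transition phase a trigger can keep the old and new columns in sync so that both old and new access code keep working (the new column name and its size are illustrative):

ALTER TABLE client ADD family_name VARCHAR2(50);
UPDATE client SET family_name = last_name;

CREATE OR REPLACE TRIGGER trg_client_sync_name
  BEFORE INSERT OR UPDATE ON client
  FOR EACH ROW
BEGIN
  -- writers that only know the old column populate the new one, and vice versa
  IF :new.family_name IS NULL THEN
    :new.family_name := :new.last_name;
  ELSIF :new.last_name IS NULL THEN
    :new.last_name := :new.family_name;
  END IF;
END;
/

Once downstream systems have moved over to family_name, a later migration drops the trigger and the old column.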
Automate the refactorings
Since refactoring became well known for application code, many languages have
        seen good support for automated refactorings. These simplify and speed up
        refactoring by swiftly carrying out the various steps with no human involved to
        make mistakes. Such automation is also available for databases. Frameworks
        like Liquibase and Active Record Migrations provide a DSL to apply
        database refactorings, allowing a standard way to apply database migrations.
However these kinds of standardized refactorings don't work so well for
        databases since the rules for dealing with data migration and legacy data are very
        dependent on a team's specific context. So we prefer to handle database
        refactoring by writing scripts for migration and focus on tools to automate how
        to apply them.
We write each script, as we've shown so far, by combining SQL DDL (for the
        schema change) and DML (for the data migration) and putting the result in a
        folder in our version-controlled repository. Our automation ensures we never
        apply these changes manually, they are only applied by the automation tooling.
        That way we maintain the ordering of the refactorings and update the database
        metadata.
We can apply the refactorings to any database instance, to bring them up to
date with the latest mainline, or to any previous version. The tooling uses the
        metadata information in the database to find out its current version, then applies
        each refactoring between it and the desired version. We can use this approach to
        update development instances, test instances, and production databases.
Updating production databases isn't any different from updating test databases: we run the same set of scripts against different data. We prefer releasing frequently, since that keeps the updates small, which means they apply more quickly and any problems that come up are easier to deal with. The easiest way to do these updates is to take the production database down while we apply them, which works well for most situations. If we have to do them while keeping the application live, it is possible, but the techniques we use will need another article to explain.
So far, we've found that this technique has worked remarkably well. By breaking
        down all the database changes into a sequence of small, simple changes; we've been
        able to make quite large changes to production data without getting ourselves in
        trouble.
As well as automating the forward changes, you can consider automating reverse
        changes for each refactoring. If you do this you'll be able to back out changes to
a database in the same automated way. We haven't found this to be cost-effective enough to do all the time, and we've not had much demand for it, but it's the same basic principle. On the whole we prefer to write our migrations
        so that the database access section can work with both the old and new version of
        the database. This allows us to update the database to support a future need and
        make it live, have it running in production for a while, and then only push the
        update that uses the new data structures once we've found they have settled down
        without problems.
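If we did want a reverse migration, it would simply undo the corresponding forward migration. For the earlier insurance-value change, a reverse script might look like this; it's only a sketch, since reversing a change that discarded data needs more care:

-- Reverse of 0007_add_insurance_value_to_equipment_type.sql
ALTER TABLE equipment_type DROP (min_insurance_value, max_insurance_value);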
These days there are many tools that automate applying database migrations, including Flyway, Liquibase, MyBatis Migrations, and DBDeploy. Here's an example of applying migrations with Flyway.
psadalag:flyway-4 $ ./flyway migrate
Flyway 4.0.3 by Boxfuse
Database: jdbc:oracle:thin:@localhost:1521:xe (Oracle 11.2)
Successfully validated 9 migrations (execution time 00:00.021s)
Creating Metadata table: "JEN_DEV"."schema_version"
Current version of schema "JEN_DEV": << Empty Schema >>
Migrating schema "JEN_DEV" to version 0 - base version
Migrating schema "JEN_DEV" to version 1 - asset
Migrating schema "JEN_DEV" to version 2 - asset type
Migrating schema "JEN_DEV" to version 3 - asset parameters
Migrating schema "JEN_DEV" to version 4 - inventory
Migrating schema "JEN_DEV" to version 5 - split inventory
Migrating schema "JEN_DEV" to version 6 - equipment type
Migrating schema "JEN_DEV" to version 7 - add insurance value to equipment type
Migrating schema "JEN_DEV" to version 8 - data location equipment type
Successfully applied 9 migrations to schema "JEN_DEV" (execution time 00:00.394s).
psadalag:flyway-4 $ 
Developers can update their databases on demand
As we explained above, the first step in integrating our changes into mainline
        is pulling any changes that have occurred while we were doing our own bit of work.
Not only is this essential during the integration step, it is often useful before
        we're done, so we can assess the impact of any changes we've heard our colleagues
        talk about. In both cases, it's important to be able to easily pull changes from
        the mainline and apply them to our local database.
We start this by pulling changes from mainline into our local workspace. Often
        that's pretty simple, but sometimes we'll find that our colleagues have pushed a
        migration into mainline while we've been working. If we have written a migration
        with the sequence number 8, we'll see another migration with that number appear
in our migrations folder. Running our migration tool should detect this:
psadalag:flyway-4 $ ./flyway migrate
Flyway 4.0.3 by Boxfuse
Database: jdbc:oracle:thin:@localhost:1521:xe (Oracle 11.2)
ERROR: Found more than one migration with version 8
Offenders:
-> /Users/psadalag/flyway-4/sql/V8__data_location_equipment_type.sql (SQL)
-> /Users/psadalag/flyway-4/sql/V8__introduce_fuel_type.sql (SQL)
psadalag:flyway-4 $
Once we see we have a clash, our first step is simple: we renumber our migration to 9 so it will apply on top of the new migration from mainline. Once we've renumbered, we need to test that there aren't any conflicts between the migrations. To do this we clean out the database and then apply all the migrations, including the new 8 and our renumbered 9, to a blank database copy.
psadalag:flyway-4 $ mv sql/V8__introduce_fuel_type.sql sql/V9__introduce_fuel_type.sql
psadalag:flyway-4 $ ./flyway clean
Flyway 4.0.3 by Boxfuse
Database: jdbc:oracle:thin:@localhost:1521:xe (Oracle 11.2)
Successfully cleaned schema "JEN_DEV" (execution time 00:00.031s)
psadalag:flyway-4 $ ./flyway migrate
Flyway 4.0.3 by Boxfuse
Database: jdbc:oracle:thin:@localhost:1521:xe (Oracle 11.2)
Successfully validated 10 migrations (execution time 00:00.013s)
Creating Metadata table: "JEN_DEV"."schema_version"
Current version of schema "JEN_DEV": << Empty Schema >>
Migrating schema "JEN_DEV" to version 0 - base version
Migrating schema "JEN_DEV" to version 1 - asset
Migrating schema "JEN_DEV" to version 2 - asset type
Migrating schema "JEN_DEV" to version 3 - asset parameters
Migrating schema "JEN_DEV" to version 4 - inventory
Migrating schema "JEN_DEV" to version 5 - split inventory
Migrating schema "JEN_DEV" to version 6 - equipment type
Migrating schema "JEN_DEV" to version 7 - add insurance value to equipment type
Migrating schema "JEN_DEV" to version 8 - data location equipment type
Migrating schema "JEN_DEV" to version 9 - introduce fuel type
Successfully applied 10 migrations to schema "JEN_DEV" (execution time 00:00.435s).
psadalag:flyway-4 $
Usually this works just fine, but occasionally we'll see a conflict - perhaps the other developers renamed the table that we're making a change to - in which case we need to figure out how to resolve it. Here the small size of the migrations makes it easier to spot and deal with conflicts.
Finally once the database changes are integrated, we need to rerun the
        application test suite, in case the migration we got from mainline causes any of
        our tests to break.
This procedure allows us to work independently for short periods, even without
        a network connection, and then integrate whenever it suits us. It is entirely up
        to us when and how often we do this integration - as long as we
        ensure we are synchronized before we push to mainline.
Clearly separate all database access code
To understand the consequences of database refactorings, it's important to be
        able to see how the database is used by the application. If SQL is scattered
        willy-nilly around the code base, this is very hard to do. As a result it's
        important to have a clear database access layer to show where the database is
        being used and how. To do this we suggest following one of the data source
architectural patterns from P of EAA.
Having a clear database layer has a number of valuable side benefits. It minimizes the areas of the system where developers need SQL knowledge to manipulate the database, which makes life easier for developers who are often not particularly skilled with SQL. For the DBA it provides a clear section of the code that she can look at to see how the database is being used. This helps in preparing indexes, doing database optimization, and looking at the SQL to see how it could be reformulated to perform better, giving the DBA a better understanding of how the database is used.
Release frequently
When we wrote the original version of this article over a decade ago, there was
        little support for the idea that software should be frequently released to
        production. Since then the rise of the internet giants has shown that a rapid
        sequence of releases is a key part of a successful digital strategy.
With every change captured in a migration, we can easily deploy new changes
        into test and production environments. The kind of evolutionary database design we
        discuss here is both a vital part of enabling frequent releases, and also benefits
        from the learning we get from the feedback of seeing software used for real.