DISQUS

Adam @ Heroku: Database Versioning

  • jollyjerry · 9 months ago
    I like your idea of a schema.yml for defining the actual structure of
    the database. To make development easier, we could have a task 'rake
    db:sync' that is run after fetching the latest version of schema.yml
    from version control. This task would examine the schema.yml diff and
    apply it to the developer's local database. This would also eliminate
    the need for add_column, remove_column, and various change_column
    methods.

    I would rename the second part of this puzzle to 'data migrations'. A
    common pattern that I've noticed when running data migrations is:

    * do some data manipulation before the db schema changes.
    * change the db schema.
    * do some data manipulation after the db schema changes.

    Building upon the syntax you proposed, my ideal data migration would
    look something like:


    migrate :from => '123abc', :to => '456def' do |db|
    before_migrate do |old_schema|
    # optional massaging of data while schema is '123abc'
    end

    after_migrate |new_schema|
    # manipulation after schema is '456def'
    end
    end


    When the db:sync task is run, each schema migration would look for an
    optional corresponding data migration. If a data migration is found,
    call the before_migrate and after_migrate callbacks before and after
    the schema migration is run respectively.

    Here's an example flow:

    * My working directory is on schema revision 123abc.
    * I update from source control; The new incoming schema revision is
    456def.
    * I run rake db:sync
    * this looks at my current schema revision (123abc) and my
    destination revision (456def) and finds there is 1 data migration to be
    run (:from => '123abc', :to '456def').
    * we call the before_migrate data migration callback, update the
    db schema, and then call the after_migrate data migration callback.
  • misfo · 9 months ago
    I like this idea, but I'm wondering something: are there problems with migrating a schema just by looking at the diffs between two schema files? I would think that the DataMapper's column definitions in the model would have obviated schema migrations altogether if it was possible to reliably migrate a schema just by looking at the difference between the current schema and the desired one...
  • jollyjerry · 9 months ago
    I hand-waved that part. I don't think it's possible to get the difference through a code diff, but I do think it can be done. One possible solution is to have dry-run the current and incoming schema files into an in memory Schema objects. The Schema object would know how to diff themselves.


    current_schema = Schema.parse("old_schema.yml")
    incoming_schema = Schema.parse("new_schema.yml")

    Schema.differences_between(current_schema, incoming_schema) do |diff|
    # diff would know if it's a add_column, remove_column, or change_column
    # and use your db credentials to make the changes.
    diff.apply
    end


    Slightly less handwavy, but still pseudocode :)
  • Erik · 9 months ago
    I don't think this can handle the example Adam used, where you need to "message" the data *between* manipulations in a single migration.

    In his example, you need to split the name AFTER adding the first_name and last_name columns, but BEFORE you delete the name column. I don't see how you could do that without mingling your schema changes and your data changes together.
  • Erik · 9 months ago
    sorry, "massage" not "message". :)
  • Alan Pinstein · 9 months ago
    Good ideas. I am using PHP's propel ORM which has no migrations support. What you describe is effectively what I've been doing manually for some time:

    1. Maintain canonical DB schema in a single place (for me, an actual create_schema.sql file).
    2. Have a script for use in development that rebuilds the DB from scratch from the canonical schema. I run this all the time in development; this means I am always working off of the canonical DB schema.
    3. Manage a per-production-update "update.sql" script that I run ONCE on production data to update it from current release to "new" schema. Sometimes if I need programmatic migration I'll write a script to do that.

    As I've been envisioning what a "structured" db management system would look like, it's a lot like the one you describe.
  • dreur · 9 months ago
    Or replace Propel by PhpDoctrine ;)
  • technoweenie · 9 months ago
    I'm not sure I agree with most of your points. First, migrations were never meant to access your application models. It is too easy to run into migrations that access old models and whatnot. If you did need to access the model, you'd create a new ActiveRecord class inside the migration class and sidestep those problems. Sure, ActiveRecord would benefit greatly for a great raw SQL interface like SQL... but that's a whole other discussion (curse you Twitter, for keeping Nick Kallen too busy to work on ActiveRelation). Also, why would you even run hundreds of migrations when bootstrapping an app? I always run db:schema:load, and a custom task to import essential data.

    I'm not really sure that generating a schema yml is any better than generating a ruby file. I've found that the problems with schema differences between commits has to do with git branches, and differing column orders in everyone's development databases. Also, git branching seems to be the cause of consistency issues. Someone adds a field in a branch, commits. Then goes to master and does something and inadvertently pushes their modified schema to the master branch. It happens quite often, and is a good argument against my practice of running `rake db:schema:load`. We may need to be able to branch databases as well. Branch, make your updates, merge to master, and run the migrations fresh against the master dev database.

    I'm curious how you'd propose to handle the case where a migration in master and one in a feature branch that's ready for merging both attempt to migrate from the same version? This was the problem we had with the old counter-based migration numbers that ActiveRecord used to have.
  • agile · 9 months ago
    I branch my db when I branch my code, http://github.com/agile/db_branch/tree/master

    saves me from a lot of that hassle.
  • Kyle Lahnakoski · 9 months ago
    I never had the luxury of BOFD, so I have been doing plenty of migrations. Like you, I have also found that putting the version information into the database helps with ensuring the correct migration script are applied, and the correct client versions are used.

    I have no idea what language your scripts are using: But I find that SQL is perfect for what you are proposing (with a little boilerplate version verification at the beginning

    Maintaining schema.rb separate from the schema in the production database seems like a maintenance nightmare. I much prefer a “scrubbing” migration that can act on the existing database to generate a “fresh” database. Then I extract of schema and data as SQL. The scrubbing migration needs to be maintained, but not to the extent schema.rb needs to: It only needs to remove data from the tables.

    I must agree with technoweenie, the *real* next step in database versioning is being able to merge disparate migrations automatically. This will allow the version control system to deal with separate branches, and possibly rollbacks. With proper version control over a database, we can leverage multiple developers without the pain of migration integration.
  • Giles Bowkett · 9 months ago
    I think the answer is to use Err's auto_migrations plugin, and never use migrations.
  • technoweenie · 9 months ago
    Can you rename a column with auto_migrations? I believe it's effectively the same as Datamapper's auto_upgrade feature.
  • rjspotter · 9 months ago
    Sort of. When you rename a column in your schema.rb auto_migrate drops the old column and creates a new one with the new name and the same column definition.

    --R
  • john watson · 9 months ago
    Have you looked at how liquid base solves this in java? We are using it at work to handle migrations. I don't know a lot about it but I think it has a md5 checksum piece to know if some changes have been applied or not. Maybe this tool could inspire something new in ruby?
  • john watson · 9 months ago
    Sorry the tool is called liquibase. Not liquid base.
  • Paul Keeble · 9 months ago
    I think my main fear is one of not running the same thing in development that would run in production. If you create a clean schema in Development then when you come to going into a QA like environment then you could be looking at a number of bugs, simply because you never run the upgrade process itself. Using the migrations although slightly slower has the advantage of always going through the same reliable process.

    I do think you are onto something however with the ability to generate an intermediate description of databases that can produce a reliable structure difference so that a database can be checked after the event for conformance to some set of rules. The structure is likely automateable but the inserted data has the potential to report failures when everything is fine, but do so only in production like environments (where say a config like setting in the database was changed by a user at some point but the initial default value was inserted by a migration).
  • technoweenie · 9 months ago
    I migrate my databases in development mode incrementally and only ever do a full schema load when bootstrapping new installations. Migrations are tested on every developer's machine and staging before making it to the live server.
  • trevorturk · 9 months ago
    I agree that running migrations on a new app is bad practice. That's actually been a nitpick of mine about Heroku from the get-go. Using rake db:schema:load is the clear winner, and I've actually used application code instead of a rake db:bootstrap task to do any necessary setup. I think that's the best approach, personally, because then all you've got to worry about when installation an application is ensuring that the schema has been loaded and firing up the app.

    As for some of the other problems you've mentioned, I believe there are more low-tech solutions. At my work, we've been using the concept of a DataMigrator class to help with migration problems. In our world, any migration that does data manipulation is put through the DataMigrator. So, for example, you may have a problem with counter_caches, so you'd write a class method DataMigrator.fix_counter_caches and run that in the migration. This method could be tested in the handy dandy test/unit/data_migrator_test.rb file, so you're much less likely to have errors in your data manipulations because... you have tests!

    So, yeah, everyone I know has to fight with migrations from time to time - they're not perfect. I'm not sure that the scenario you're describing is readily attainable, though, but you Heroku guys are a forward-thinking lot :)
  • Carlos Júnior · 9 months ago
    I really like the way that DataMapper do, because it keeps in sync the Model (which is versioned by some SCM) and your database, so if you rollback some version in your SCM and re-migrate, you'll get the database for your current code.
  • marius a. eriksen · 9 months ago
    You could also tie migrations to your VCS — especially if you're using a powerful one like Git. For example, to run a migration, you first set your source code repository version to the one where the migration was created, and then run.

    In reality, that's probably a bit trickier because your code depends not only on other versioned code around it, but also on its surrounding environment. But if you *do* have an environment that's entirely contained within your VCS, you're all set.
  • postscript07 · 9 months ago
    I follow very a similar process to the ideas laid out in the post and those of Adam Pinstein (see above). Principles:

    1. a canonical schema definition for development and bootstrapping (though in fact I keep the schema *in the models* a la datamapper)
    2. forward-only migrations, only for the production database