Migration Bug: Post-Mortem

Bug Description

During deployment, run_migrations_online() failed because of a missing revision: the production database still referenced a revision that had since been deleted from the codebase. As a result, the API service crashed every couple of seconds.

Timeline

  1. During development, an initial set of revisions (1) was created.

  2. The changes were pushed to the remote repository.

  3. Some time later, the changes were deployed to production by accident, which applied revisions (1) to the production database.

  4. After the team aligned on a different solution, the initial revisions (1) were deleted and new revisions (2) were created. These changes were pushed to the remote repository and deployed.

  5. The API service started crashing with a large number of errors.

Root Cause Analysis

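The API crashed because of the missing revision. The revision was missing because the wrong change had been deployed: production had already been upgraded to revisions (1), which were later deleted from the repository. The issue was therefore caused mainly by human error.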
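
To illustrate the failure mode, here is a minimal sketch in plain Python. It is not the migration tool's actual implementation; plan_upgrade, MissingRevisionError and the revision ids are hypothetical names chosen for this example. It assumes a linear revision history in which each revision records its parent.

```python
class MissingRevisionError(Exception):
    """Raised when the database points at a revision the code no longer has."""


def plan_upgrade(revisions, current, head):
    """Return the revision ids to apply, oldest first, to get from current to head.

    revisions maps each revision id to its parent id (None for the first one).
    current is the id recorded in the database (None for an empty database).
    """
    # The database remembers a revision id; the code must still contain it.
    if current is not None and current not in revisions:
        raise MissingRevisionError(f"cannot locate revision {current!r}")

    # Walk back from head through the parents until the current revision.
    path = []
    node = head
    while node != current:
        if node is None:
            raise MissingRevisionError(f"{current!r} is not an ancestor of {head!r}")
        path.append(node)
        node = revisions[node]
    return list(reversed(path))


# Hypothetical state after step 4: the code only contains revisions (2),
# while the production database was already upgraded to a revision from (1).
revisions_in_code = {"a2": None, "b2": "a2"}
database_revision = "b1"
```

With this state, plan_upgrade(revisions_in_code, database_revision, "b2") raises MissingRevisionError on every start, which matches the repeated crashes. Adding the deleted revisions back makes the recorded id resolvable again.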

Lessons Learned

First and foremost, once a revision has been pushed to the remote repository, it should never be deleted. Secondly, the current approach of upgrading the databases on every deployment is not optimal, as it might cause similar problems in the future.
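
The first rule could be enforced automatically before a deployment. The following is a minimal sketch, not an existing check in this project. The function name removed_revision_files and the directory migrations/versions are assumptions, and the two file lists are expected to come from the published branch and the branch about to be deployed.

```python
from pathlib import PurePosixPath


def removed_revision_files(published_files, candidate_files,
                           versions_dir="migrations/versions"):
    """Return revision files that were published but are missing from the candidate.

    Both arguments are iterables of repository-relative paths with forward
    slashes. versions_dir is an assumed location for the revision files.
    """
    versions = PurePosixPath(versions_dir)

    def is_revision(path):
        # Only Python files directly inside the versions directory count.
        p = PurePosixPath(path)
        return p.parent == versions and p.suffix == ".py"

    candidate = set(candidate_files)
    return sorted(
        path for path in published_files
        if is_revision(path) and path not in candidate
    )


# Hypothetical file lists resembling the incident.
published = [
    "app/main.py",
    "migrations/versions/a1_initial.py",
    "migrations/versions/b1_users.py",
]
candidate = [
    "app/main.py",
    "migrations/versions/a2_initial.py",
]
missing = removed_revision_files(published, candidate)
```

A non-empty result, as in this example, would be a reason to stop the pipeline before any database upgrade runs.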

Bugfix

To fix the issue, all the missing revisions (1) were added back and the service was redeployed with these revisions present.

Closure

The severity of this bug was high, yet its cause was rather banal. This shows that the current approach of upgrading the databases needs to change. As a follow-up, a link to the documentation of the new implementation will be added here.