Django Backups: dumpdata Versus SQL Dump (PostgreSQL And MySQL)

This article looks at backing up Django project data for the use case of restoring it on a new server, for example after a server crash or when moving to a different machine. When it comes to backing up your Django project data you have two options: you can use Django's own dumpdata and loaddata utilities, or you can take a SQL dump of the underlying database and later restore your data from that. A third option would be to save and recover the database files directly, but that won't be covered here. In this article I have a look at the first two options and discuss the pros and cons of each. For the SQL dump approach, PostgreSQL and MySQL will be used as examples.

First, let me start off with an easy recommendation: nothing keeps you from using both approaches at the same time, just to be safe. Dumping the data to SQL or to Django's own serialization format is the easy part of the backup process; the harder part is restoring the project data on a new server. It doesn't hurt to have a second option available in case the first one unexpectedly fails or runs into difficulties for one reason or another. So I would recommend regularly backing up your data both as a SQL dump and as a serialization dump via dumpdata.
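
To make this concrete, here is a minimal sketch of a backup script that produces both dumps in one go. The paths, file names, and the use of PostgreSQL are assumptions you would adapt to your own project.

#!/bin/bash
# Minimal nightly backup sketch; paths and names are placeholders.
set -e
STAMP=$(date +%Y%m%d-%H%M%S)
BACKUP_DIR=/var/backups/myproject
# SQL dump (PostgreSQL shown here; for MySQL use mysqldump instead).
sudo -u postgres pg_dump dbname > "$BACKUP_DIR/dump-$STAMP.sql"
# Django serialization dump; must be run in the project's environment.
python manage.py dumpdata > "$BACKUP_DIR/dump-$STAMP.json"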

Using SQL dump

That being said, my favorite approach when it comes to backing up and restoring a whole project is the SQL dump approach. And that's because this approach can afford to be dumb about Django's ORM and the way it works internally. That makes it a simple approach as long as the whole database is cloned. And it avoids some of the possible complications with dumpdata that we will discuss later. There are only two preparations that need to be made before restoring the project data from a SQL dump:

  1. The code base on the target server must be exactly the same as on the server that is being backed up. For that purpose it makes sense to mark the backups with the git revision number, or the revision number of whatever other VCS you are using (see the sketch after this list). This way you know which code base to install before the database restore is run.
  2. A blank database needs to be set up with the access credentials specified in the Django project before the data are restored from the SQL dump.
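
Regarding point 1, one way to record which code base a backup belongs to is to put the VCS revision into the dump's file name. Here is a minimal sketch with git; the file name and the PostgreSQL command are placeholders you would adapt.

# Tag the dump with the current git commit so the matching code base can be checked out before restoring.
REV=$(git rev-parse --short HEAD)
sudo -u postgres pg_dump dbname > "dump-$REV.sql"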

So let's see how that works in practice.

PostgreSQL

All commands and scripts must be run by the postgres user.

Step 1: Back up the project data.

pg_dump dbname > dump.sql

"dbname" must be the database name defined in DATABASES in settings.py. The SQL dump ends up in a file called dump.sql, which we will use below to restore the data.
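
Depending on your setup you may not have an interactive shell as the postgres user. A hedged variant that runs pg_dump via sudo and compresses the dump on the fly could look like this:

# Run pg_dump as the postgres system user and compress the dump on the fly.
sudo -u postgres pg_dump dbname | gzip > dump.sql.gz
# Restore later with: gunzip -c dump.sql.gz | sudo -u postgres psql dbname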

Step 2: Set up a blank project database.

DROP DATABASE IF EXISTS dbname;
CREATE USER dbuser WITH PASSWORD 'dbpass';
CREATE DATABASE dbname;
GRANT ALL PRIVILEGES ON DATABASE dbname TO dbuser;

(createdb.sql)

"dbname", "dbuser", and "dbpass" must correspond to the DATABASES values in settings.py. The DROP DATABASE statement is just to make sure that the database gets deleted, if it was created and possibly prepopulated by a setup script. After dropping and re-creating the database you can start from a clean slate.

In order for the script to work, make sure that no other process has a connection to the database; the Django webserver should be stopped for the same reason. After that the script can be run as follows:

psql < createdb.sql
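
If the DROP DATABASE statement still fails because other sessions are connected, they can be terminated first. This is a minimal sketch, run as the postgres user and assuming PostgreSQL 9.2 or later (older versions use procpid instead of pid):

# Terminate any sessions still connected to dbname so that DROP DATABASE can succeed.
psql -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'dbname';"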

Step 3: Restore the backed-up project data.

psql dbname < dump.sql
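
After the restore it is worth running a quick sanity check. One hedged option, assuming a Django version with migrations (1.7 or later), is to count the rows in the django_migrations table:

# Confirm that Django's tables and data are back after the restore.
psql dbname -c "SELECT COUNT(*) FROM django_migrations;"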

MySQL

Step 1: Back up the database.

# rootpass is the root password. No space between -p and rootpass.
mysqldump -u root -prootpass dbname > dump.sql

Note: If you are worried about putting the MySQL root password on the command line, you can set the password in my.cnf and omit the -p parameter from the command above. The MySQL documentation lists the locations that are searched for my.cnf on each operating system.
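
As a hedged example of that, on a Unix-like system the credentials can go into ~/.my.cnf with restricted permissions. The [mysqldump] option group and the user/password settings are standard MySQL options; the rootpass value is of course a placeholder.

# Store the credentials in ~/.my.cnf so that mysqldump no longer needs -u/-p.
cat > ~/.my.cnf <<'EOF'
[mysqldump]
user=root
password=rootpass
EOF
chmod 600 ~/.my.cnf
mysqldump dbname > dump.sql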

Step 2: Create a blank project database.

DROP DATABASE IF EXISTS dbname;
CREATE DATABASE dbname;
-- On MySQL 8.0+, GRANT ... IDENTIFIED BY is no longer supported, so create the user separately.
CREATE USER IF NOT EXISTS 'dbuser'@'localhost' IDENTIFIED BY 'dbpass';
GRANT ALL ON dbname.* TO 'dbuser'@'localhost';

(createdb.sql)

The same remarks apply as for the corresponding PostgreSQL step: "dbname", "dbuser", and "dbpass" must correspond to the values in DATABASES in settings.py, and all connections to the database, including the Django webserver, should be closed before running the script.
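
Assuming the statements above were saved as createdb.sql, the script can be run like this (rootpass is again the MySQL root password):

mysql -u root -prootpass < createdb.sql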

Step 3: Restore the backed-up project data.

mysql -u root -prootpass dbname < dump.sql
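
As with PostgreSQL, a quick sanity check does not hurt. One hedged option, which also confirms that dbuser's grants work, is to count the rows in django_migrations (present in any migrated Django 1.7+ project):

mysql -u dbuser -pdbpass dbname -e "SELECT COUNT(*) FROM django_migrations;"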

SQLite

SQLite is usually not used in production, which is where backups matter, so I will not cover that case.

The dumpdata approach

Step 1: Back up the project data.

python manage.py dumpdata > dump.json

This must be run in the project's environment.
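
The plain call above works, but a couple of optional dumpdata flags, all part of Django itself, make the dump easier to read and a bit more robust across primary key differences; whether you want them depends on your project.

# Use natural keys and pretty-print the JSON; both flags are optional.
python manage.py dumpdata --natural-foreign --natural-primary --indent 2 > dump.json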

Step 2: Create the project database.

This step is equivalent to step 2 in the SQL approach: the database must be created at some point. If the database was already created by a setup script, it should in theory not be necessary to drop and re-create it right before proceeding to step 3. On the other hand, as will be discussed later, a database that has been prepopulated in this way is exactly where loaddata can run into conflicts and fail to restore the data.
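
One detail worth spelling out: unlike a SQL dump, dump.json contains only data and no schema, so the tables have to exist before loaddata is run. Assuming the blank database from above, the schema is created with Django's migrate command:

# Create the tables on the blank database; loaddata (step 3) only inserts rows.
python manage.py migrate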

Step 3: Restore the backed-up project data.

python manage.py loaddata dump.json

Comparison and issues

The SQL approach guarantees that the underlying database ends up in exactly the same state as on the original server. Combined with the fact that the code base is also in exactly the same state, this should make the new server a pretty good clone of the original. dumpdata and loaddata, on the other hand, do not necessarily guarantee that the database on the new server is in the same state as the one that was backed up. This is because the data are not dumped at the database level but at the Django model level; the database is then re-created from that dump by loaddata, and some things can get lost in that translation. Here is a list of a few things to keep in mind when using dumpdata for backups.

  • Throughout the Django documentation there are warnings that if you change the default Manager for a Model, it will have an impact on what data will be dumped by dumpdata and subsequently restored via loaddata. If the default manager does not retrieve all data, then those data will be lost and won't be restored on the target server. (Django docs)
  • A full database restore using loaddata can fail with IntegrityError or ProgrammingError exceptions, depending on the dumped data and the state of the database. If you run into these kinds of problems, you have to figure out where things went wrong and try to find a fix. Sometimes it helps to do a reduced dump, for example you could try
    python manage.py dumpdata --exclude auth.permission --exclude contenttypes > dump.json
    If that also doesn't work, you can go down the rabbit hole of trying to figure out what went wrong or you can switch to the SQL dump approach instead.

On the other hand, if you know exactly what you are doing, dumpdata offers greater flexibility. For example, it allows you to move your data from one database system to another, or to back up and restore only parts of your data. The SQL approach would be infeasible for these use cases. For a full database backup, however, dumpdata can turn out to be the more complex option.
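
As a hedged illustration of that flexibility, here is one way to move the data to a second database configured under a hypothetical "newdb" alias in DATABASES; the excludes mirror the ones mentioned above.

# Dump from the default database, leaving out tables that migrate will repopulate.
python manage.py dumpdata --natural-foreign --exclude contenttypes --exclude auth.permission > dump.json
# Create the schema on the target database, then load the data into it.
python manage.py migrate --database newdb
python manage.py loaddata --database newdb dump.json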