Introduction
With the release of PostgreSQL 17, a game-changing feature has arrived: incremental backups. In prior versions, such as PostgreSQL 16 and earlier, users could only perform full backups. This approach, while reliable, had its drawbacks—particularly the time it took to complete these backups and the storage they required.
Previously, users relied on WAL (Write-Ahead Logging) or archive logs as a workaround to achieve incremental-style backups. However, this method comes with a major downside: slow recovery times. During restoration, each archived WAL segment had to be processed in sequence to replay every transaction, which extended the time required for a full database recovery.
PostgreSQL 17’s native support for incremental backups is designed to address these challenges by streamlining backup processes and enhancing recovery speed.
How the incremental backups works
In PostgreSQL 17, incremental backups are designed to back up only the data changed since the last full backup, making backups faster and more efficient. This is achieved using the pg_basebackup tool with the –incremental option, which backs up all non-relation files completely while only capturing modified blocks and metadata required to reconstruct changed database blocks.
How it Works
- PostgreSQL uses WAL summaries, which are stored in the pg_wal/summaries directory, to determine which blocks have changed since the last backup. These summaries cover the Log Sequence Numbers (LSNs) from the start of the previous backup to the start of the current one.
- By analyzing these WAL summaries, PostgreSQL identifies the modified blocks that need to be backed up, ensuring efficient incremental backups without re-backing up unchanged data.
Restoration Requirements
To restore an incremental backup, all previous backups in the sequence are required to reconstruct the database. Each incremental backup builds on its predecessors, allowing PostgreSQL to recreate the database state efficiently by including only the necessary blocks in each backup.
PostgreSQL does not automatically track which backups are required as a foundation for restoring future incremental backups. This means that users must manage the relationships between their full and incremental backups manually, ensuring that earlier backups are retained if they may be needed for restoration.
When to Use Incremental Backups
- Large Databases with Minimal Changes: Incremental backups are most beneficial for sizable databases where a significant portion of the data remains unchanged or changes gradually. In these cases, incremental backups can save both time and storage.
- Small Databases: For smaller databases, full backups are often simpler to manage and sufficient for most recovery needs, making incremental backups unnecessary.
- Heavily Modified Large Databases: If a large database undergoes frequent, widespread modifications, incremental backups might offer limited benefits, as they won’t be substantially smaller than full backups.
How to take incremental backups and restore
Prerequisites
To perform the incremental backups below configuration parameters need to be set
- wal_level
- max_wal_senders
- summarize_wal
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 | [postgres@localhost ~]$ psql psql (17.0) Type "help" for help. postgres=# alter system set summarize_wal to true ; ALTER SYSTEM postgres=# select pg_reload_conf(); pg_reload_conf ---------------- t (1 row) postgres=# postgres=# show summarize_wal ; summarize_wal --------------- on (1 row) postgres=# show wal_level ; wal_level ----------- replica (1 row) postgres=# show max_wal_senders ; max_wal_senders ----------------- 10 (1 row) postgres=# |
Steps to perform incremental backup and restore
For this test I am creating a test table with some data.
1 2 3 4 5 6 7 8 9 10 11 | template1=# create table test(id int, name varchar); CREATE TABLE template1=# template1=# insert into test values (1,'Anurag'); INSERT 0 1 template1=# select * from test; id | name ----+-------- 1 | Anurag (1 row) |
Take a full backup
Now taking full backup using the pg_basebackup command. Pg_basebackup is a command line tool used to take online backups in PostgreSQL .
1 2 3 4 5 6 7 8 9 10 11 | [postgres@localhost ~]$ pg_basebackup -D full_backup --verbose pg_basebackup: initiating base backup, waiting for checkpoint to complete pg_basebackup: checkpoint completed pg_basebackup: write-ahead log start point: 3/2000028 on timeline 1 pg_basebackup: starting background WAL receiver pg_basebackup: created temporary replication slot "pg_basebackup_209564" pg_basebackup: write-ahead log end point: 3/200A2C0 pg_basebackup: waiting for background process to finish streaming ... pg_basebackup: syncing data to disk ... pg_basebackup: renaming backup_manifest.tmp to backup_manifest pg_basebackup: base backup completed |
Now the full backup is completed. Logged into the database and inserted some more records to make changes to the database.
1 2 3 4 5 6 7 8 9 10 | template1=# insert into test values (2,'Prathap'); INSERT 0 1 template1=# template1=# select * from test; id | name ----+--------- 1 | Anurag 2 | Prathap (2 rows) |
Take an incremental backup
To backup the changes made in the previous setup, take the incremental backup using the pg_basebackup command line tool. While taking the incremental backup the backup_manifest of the previous backup needs to be passed to the “–incremental” option.
1 2 3 4 5 6 7 8 9 10 11 | [postgres@localhost ~]$ pg_basebackup -D incremental_backup --incremental=full_backup/backup_manifest --verbose pg_basebackup: initiating base backup, waiting for checkpoint to complete pg_basebackup: checkpoint completed pg_basebackup: write-ahead log start point: 3/4000028 on timeline 1 pg_basebackup: starting background WAL receiver pg_basebackup: created temporary replication slot "pg_basebackup_209755" pg_basebackup: write-ahead log end point: 3/4000120 pg_basebackup: waiting for background process to finish streaming ... pg_basebackup: syncing data to disk ... pg_basebackup: renaming backup_manifest.tmp to backup_manifest pg_basebackup: base backup completed |
Restore the backup
To restore an incremental backup, you’ll need to combine all previous backups since the last full backup using the pg_combinebackup tool. This tool rebuilds a complete backup by merging the full backup with each subsequent incremental backup.
Key Considerations for Restoration:
Maintain Backup Sequence: Ensure backups are provided in the correct sequence, starting with the full backup followed by each incremental backup in order. Any deviation from this order will cause an error.
Complete Backup Chain Required: All incremental backups, starting with the initial full backup, must be available for successful restoration. Omitting any part of the sequence or skipping the full backup will result in an error
How to Use pg_combinebackup:
- List all backups in sequence from oldest to newest, starting with the full backup, followed by each incremental backup in the order they were taken.
- Specify the output directory where the combined full backup will be stored.
1 2 3 | [postgres@localhost ~]$ pg_combinebackup full_backup incremental_backup --output=complete_backup [postgres@localhost ~]$ du -sh complete_backup 39M complete_backup |
Steps to Start the Server with the Backup:
- Navigate to the Backup Directory: Ensure that the complete backup is located in the designated folder.
- Start the PostgreSQL Server on Port 5434: Use the pg_ctl command to initiate the server, specifying the data directory and port.
This setup allows you to run the restored instance alongside your main server instance, enabling you to verify the backup or perform other maintenance without interference.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | [postgres@localhost ~]$ /usr/pgsql-17/bin/pg_ctl -D complete_backup start waiting for server to start....2024-11-04 03:53:48.797 PST [210479] LOG: redirecting log output to logging collector process 2024-11-04 03:53:48.797 PST [210479] HINT: Future log output will appear in directory "log". done server started [postgres@localhost ~]$ psql -p 5434 psql (17.0) Type "help" for help. postgres=# c template1 You are now connected to database "template1" as user "postgres". template1=# template1=# select * from test; id | name ----+--------- 1 | Anurag 2 | Prathap (2 rows) |
Conclusion