Setting Up and Restoring PostgreSQL with WAL: A Comprehensive Guide

Introduction to WAL

First and foremost, it's important to understand what Write-Ahead Logging (WAL) is and why it's so crucial for PostgreSQL. When we talk about database integrity, we mean that despite various failures—be it a power outage, disk failure, or network issue—the data remains as accessible and consistent as possible. To ensure this reliability, PostgreSQL uses the WAL mechanism.

What is Write-Ahead Logging (WAL)?

WAL (Write-Ahead Logging) is a log that records all data changes before they are permanently applied to the database files. In other words, before an update or insert is "written" to disk, PostgreSQL writes a "description" of this change to the WAL log file. This can be compared to drafting notes or outlines before finalizing a document.

Why is it necessary?

  • Data Integrity: In case of a failure, the system can "read through" the log, determine what data was changed, and restore it to a consistent state.
  • Operational Continuity: Because the WAL file is very fast to write to (essentially a sequential log), the bulk of the write load doesn’t hit the actual database files immediately, allowing data operations to proceed more efficiently.

Main Types of Backups

When administering PostgreSQL, there are two primary backup methods:

  1. Logical Backups:
    • Performed using utilities like pg_dump or pg_dumpall.
    • They save the database structure (DDL) and data (DML) as SQL scripts or in custom binary formats.
    • Advantages include easy schema and data migration to other PostgreSQL versions, and the ability to restore individual tables or schemas.
    • The downside is that for large datasets, creating a dump can take a considerable amount of time and consume significant resources.
  2. Physical Backups:
    • Involves copying the database’s entire file structure (the data directory, which contains all PostgreSQL files, including WAL logs).
    • Typically performed with utilities like pg_basebackup, or, when using Docker, through configured volumes and archiving tools.
    • Physical backups more precisely capture the database state than logical dumps, but they can be much larger in size. Additionally, moving a physical copy to a different PostgreSQL version (e.g., from 12 to 14) is more complex, so version compatibility must be considered.

Both methods are valid and are usually chosen based on the situation. In most cases, however, it's recommended to combine regular dumps with WAL archiving, as described in the next section.
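
For orientation, here is roughly what each method looks like from the command line (connection details are placeholders; complete, container-aware versions of these commands appear in the later chapters):

# Logical backup: an SQL dump of one database
pg_dump -U myuser -d mydatabase > mydatabase.sql

# Physical backup: a file-level copy of the whole cluster, including WAL
pg_basebackup -U myuser -D /backup/base -Fp -Xs -P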

Why Combine Regular Dumps with WAL?

WAL acts as a "black box" or "change journal" that allows you to roll back or replay operations performed between backups. Imagine a typical scenario: you create a backup (dump) once a day. However, between the time of the dump and a potential failure, changes (new orders, transactions, records, etc.) have been made to the database. If you only restore from the dump, you’d lose all changes made during that day.

How does WAL help here?

While the database is running, PostgreSQL continuously writes all changes to WAL files. If WAL archiving is enabled (more on that in the following chapters), these files are stored in a separate, secure location (for example, on a remote server or in the cloud).

After a failure, you can first restore the database from the latest dump, then "replay" the WAL logs recorded after that dump. This brings the database up to the exact state it was in just before the failure.

Thus, combining regular dumps (logical or physical) with WAL archiving allows you to:

  • Quickly restore the "bulk" of the data using the dump.
  • Recover changes made after the dump by replaying WAL logs.

In practice, this approach provides an optimal balance between backup storage costs and assured data preservation. If the storage running PostgreSQL fails, you restore from the most recent backup (dump or physical backup), then apply the accumulated WAL logs from between the backup and the failure.

Setting Up the Environment

In Chapter 2, we will learn how to configure Docker Compose for PostgreSQL and prepare the basic configuration needed for further WAL setup. Special attention will be paid to the proper directory structure (volumes) to ensure that WAL files and backups can be stored without issues.

Creating and Configuring a docker-compose.yml with PostgreSQL

Below is an example of a basic docker-compose.yml file that can be used as a starting point. It sets up a single PostgreSQL service and mounts a directory for storing data and logs outside the container (using a volume). This not only simplifies backups but also ensures that data persists even if the container is recreated.

services:
  postgres:
    image: postgres:14
    container_name: my_postgres
    restart: always
    environment:
      POSTGRES_USER: myuser
      POSTGRES_PASSWORD: mypassword
      POSTGRES_DB: mydatabase
    volumes:
      - postgres_data:/var/lib/postgresql/data
      # Optionally, you can add another volume for logs,
      # but by default PostgreSQL writes logs to stdout
      # - postgres_logs:/var/log/postgresql
    ports:
      - "5432:5432"

volumes:
  postgres_data:
    # A local volume managed by Docker. You can configure
    # type, name, and other parameters if needed.

Key Points in the Example:

  • Image: Uses the official postgres:14 image. You can choose another version, but make sure that all the parameters we discuss later are present in that version.
  • Environment: Sets the default user, password, and database name. This is a convenient minimal set of parameters.
  • Volumes:
    • postgres_data ensures that all database files (including WAL files) are stored not inside the container, but on the host machine (in a Docker-managed volume).
    • When the container is recreated, data in postgres_data will not be lost, which is crucial for the integrity and persistent storage of WAL.
  • Ports: Exposes port 5432 for external database access.

You can place this docker-compose.yml file in any directory on the host. After that, simply run:

docker-compose up -d

and the PostgreSQL container will be ready to work.
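
To confirm that everything is up before moving on, a quick sanity check (using the container, user, and database names from the compose file above) might look like this:

# Check that the container is running and PostgreSQL accepts connections
docker ps --filter name=my_postgres
docker exec -it my_postgres pg_isready -U myuser -d mydatabase
docker exec -it my_postgres psql -U myuser -d mydatabase -c "SELECT version();"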

Overview of Key PostgreSQL Parameters Important for WAL

While many PostgreSQL settings can be configured through environment variables or additional config files, there are several key options related to WAL that you should know in advance:

  • wal_level:
    Determines the level of detail recorded in the WAL.
    By default, for modern PostgreSQL versions (starting with 10), wal_level is set to replica, which is sufficient for most cases including replication and archiving.
    If you plan to use logical replication or logical decoding, change it to logical; otherwise leave it at replica.
  • archive_mode:
    Turns WAL archiving on (on) or off (off).
    By default, it's off, so for a complete backup and recovery system, you'll need to set archive_mode = on.
  • archive_command:
    The command that PostgreSQL calls to copy each completed WAL file to an archive.
    For example:

    archive_command = 'cp %p /path/to/archive/%f'
    

    We'll discuss this in more detail in the next chapter when we set up WAL archiving.

  • max_wal_size, min_wal_size:
    Parameters that bound how much WAL PostgreSQL keeps in pg_wal: max_wal_size is a soft limit that triggers a checkpoint when exceeded, while min_wal_size controls how much WAL is recycled rather than removed after a checkpoint.
    In most cases, you can leave the default values, but for very high-load systems, you might increase max_wal_size to avoid frequent checkpoints.
  • wal_compression:
    Enables compression of full-page images written to WAL (it does not compress the WAL files themselves).
    This can noticeably reduce WAL volume at the cost of some extra CPU.

Keep in mind that not every parameter can be conveniently set through environment variables; alternatives include POSTGRES_INITDB_ARGS or an additional configuration file. Sometimes it's easier to prepare your own postgresql.conf and mount it via a volume into /var/lib/postgresql/data or as /etc/postgresql/postgresql.conf inside the container. In most cases, though, the basic parameters passed through environment variables (or additional SQL scripts during initialization) are sufficient.
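
For a quick experiment outside Compose, the official image also accepts server settings as command-line flags, which it passes straight to the postgres process. A minimal sketch (the container name, volume name, and values are illustrative, and the archives directory still has to exist before archiving can succeed):

# Sketch: pass WAL-related settings as server flags instead of editing postgresql.conf
docker run -d --name wal_test \
  -e POSTGRES_USER=myuser -e POSTGRES_PASSWORD=mypassword -e POSTGRES_DB=mydatabase \
  -v wal_test_data:/var/lib/postgresql/data \
  postgres:14 \
  -c wal_level=replica \
  -c archive_mode=on \
  -c archive_command='cp %p /var/lib/postgresql/data/archives/%f'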

Directory Structure Inside and Outside the Container (Volumes)

When working with PostgreSQL via Docker, it's important to understand where data, particularly WAL files, are physically stored. By default, PostgreSQL inside the container uses the directory:

/var/lib/postgresql/data

Within this directory, you’ll find both the main database directory and:

  • The pg_wal folder (previously pg_xlog) where WAL files reside.
  • Configuration files (during initialization).
  • The entire table structure and other auxiliary files.

When we mount postgres_data:/var/lib/postgresql/data, Docker stores this data in a special place on the host (or an external volume driver). Thus, when stopping or removing the container:

  • The data (including WAL) is not automatically deleted.
  • You can back it up using standard methods or copy the entire folder if you need to make a "physical copy" of the database.
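
To see where WAL actually lives, you can peek inside the running container and ask Docker where the volume sits on the host. The commands below assume the container name from our compose file; the volume name is prefixed with your Compose project (directory) name, so adjust it accordingly:

# Inspect the data directory and the WAL segments inside the container
docker exec -it my_postgres ls /var/lib/postgresql/data
docker exec -it my_postgres ls /var/lib/postgresql/data/pg_wal

# Find out where Docker keeps the named volume on the host
docker volume inspect --format '{{ .Mountpoint }}' my_project_postgres_data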

If a separate location for logs or archives is required, you can add another volume, mounting it to a host directory:

volumes:
  postgres_logs:
    # ...

services:
  postgres:
    # ...
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./logs:/var/log/postgresql

Then you can directly examine PostgreSQL logs on the host in the ./logs folder. For WAL archives, you can specify a directory in the PostgreSQL configuration (for example, /var/lib/postgresql/data/archives), and then mount a separate volume there. In subsequent chapters, we’ll go into more detail on how to work with this.

Key Considerations When Preparing the Environment

  • Keep everything that needs backup outside the container.
    Using volumes is the best way. If the container is removed, the data remains, allowing you to recreate the container at any time without losing the database.
  • Choose a stable PostgreSQL image.
    Generally, it's better to use an updated major release (e.g., 14 or 15) to get the latest features and patches. But if you have a production system, ensure that the required version is supported and that you understand its behavior.
  • Store configuration files alongside data or attach them via a separate volume.
    It’s often more convenient to have your own postgresql.conf, pg_hba.conf, or a folder with SQL initialization scripts. This will allow you to quickly make configuration changes, especially regarding WAL, without needing to rebuild the image.
  • Consider disk space.
    Make sure the host has enough space to store both the main data and the potential for WAL archiving, especially if the database handles a lot of writes. A disk that’s too small may lead to a full volume and a halted container.

At this stage, we have a basic docker-compose.yml and a general understanding of how PostgreSQL stores data and WAL. In the following chapters, we’ll configure parameters for WAL archiving to create a complete data recovery system by combining backups with log "replaying."

Configuring PostgreSQL

Now that we've set up a basic Docker environment and briefly reviewed the PostgreSQL parameters that affect WAL, it's time to configure the system to archive WAL files for later recovery.

Key Configuration Parameters

To fully utilize WAL, you need to enable and properly configure several key settings in postgresql.conf (or pass them via environment variables when starting the container):

wal_level
Determines the detail level of WAL records.
Common values:

  • replica (default in modern PostgreSQL versions)
  • logical (if planning logical replication)

For most archiving and standard replication cases, replica suffices.

archive_mode
Turns WAL file archiving on or off.
For backup and recovery with WAL, set this to on.

archive_command
Specifies the command PostgreSQL uses to copy each completed WAL file to permanent storage.
For example:

archive_command = 'cp %p /var/lib/postgresql/data/archives/%f'

Here, %p is the path to the WAL file in the working directory, and %f is its filename.
This assumes there's enough space in /var/lib/postgresql/data/archives.
For remote archiving, you can use tools like scp, rsync, etc., provided access is correctly set up.

archive_timeout (optional)
Forces PostgreSQL to close a WAL file after a specified time interval, even if it's not full.
Useful for predictable archiving schedules, especially under low load.

wal_log_hints (for some replication and failover setups)
Usually not required, but it must be enabled (unless data checksums are on) if you plan to use pg_rewind, for example to re-attach a former primary as a standby after a failover.

max_wal_size, min_wal_size
Settings that control how much WAL data PostgreSQL retains before the next checkpoint.
For heavy write loads and high-performance requirements, you might increase max_wal_size.
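
If you prefer to adjust these settings on a running instance instead of editing the file up front, a minimal sketch using ALTER SYSTEM (which writes to postgresql.auto.conf) could look like this. Note that wal_level and archive_mode only take effect after a restart, while archive_command can be picked up by a reload:

# Sketch: set WAL-related parameters on a running container
docker exec -it my_postgres psql -U myuser -d mydatabase -c "ALTER SYSTEM SET archive_mode = 'on';"
docker exec -it my_postgres psql -U myuser -d mydatabase -c "ALTER SYSTEM SET archive_command = 'cp %p /var/lib/postgresql/data/archives/%f';"
docker exec -it my_postgres psql -U myuser -d mydatabase -c "SELECT pg_reload_conf();"
docker-compose restart postgres   # restart so archive_mode (and wal_level, if changed) apply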

Configuring Files and Placement in Docker

Method 1: Using Environment Variables
PostgreSQL's Docker image supports initialization via environment variables and SQL scripts. However, for settings like archive_mode or archive_command, it's usually simpler to use a custom configuration file.

Method 2: Using a Custom postgresql.conf
A common practice is to specify an additional volume in docker-compose.yml that overrides the default postgresql.conf or mounts your own file into the configuration directory:

services:
  postgres:
    image: postgres:14
    container_name: my_postgres
    environment:
      POSTGRES_USER: myuser
      POSTGRES_PASSWORD: mypassword
      POSTGRES_DB: mydatabase
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - archives:/var/lib/postgresql/data/archives
      - ./my_postgresql.conf:/etc/postgresql/postgresql.conf
      - ./pg_hba.conf:/etc/postgresql/pg_hba.conf
    ports:
      - "5432:5432"
    command: >
      -c 'config_file=/etc/postgresql/postgresql.conf'
      -c 'hba_file=/etc/postgresql/pg_hba.conf'

volumes:
  postgres_data:
  archives:

With this setup, your host folder structure might look like:

my_project/
  ├─ docker-compose.yml
  ├─ my_postgresql.conf
  └─ pg_hba.conf (optional)

Save the default settings to my_postgresql.conf by running:

docker run --rm postgres:14 cat /usr/share/postgresql/postgresql.conf.sample > my_postgresql.conf

Then edit my_postgresql.conf, uncommenting and modifying lines as needed, for example:

# WAL Settings
wal_level = replica                     # Level of detail in WAL
archive_mode = on                       # Enable archiving
archive_command = 'cp %p /var/lib/postgresql/data/archives/%f'  # Command to archive WAL
archive_timeout = 300                   # Archive timeout (5 minutes)
max_wal_size = 2GB                      # Maximum WAL size
min_wal_size = 80MB                     # Minimum WAL size
wal_buffers = 16MB                      # Buffers for WAL records (optimal for high load)

# Checkpoint Settings
checkpoint_timeout = 5min               # Frequency of checkpoints
checkpoint_completion_target = 0.7      # Target completion ratio before timeout
checkpoint_warning = 30s                # Warning if checkpoints occur too frequently

# Parallel Operation Optimization
effective_cache_size = 512MB            # Estimated system cache (typically 50%-75% of RAM)
default_statistics_target = 100         # Statistics for query planner

Inside the container, PostgreSQL will read the settings from /etc/postgresql/postgresql.conf, which we've mounted. We specify this file path using the -c 'config_file=...' option in docker-compose.yml.

The pg_hba.conf file controls access (authentication). While it doesn't directly affect WAL, if you're moving to a custom setup, ensure it has correct settings for your environment (e.g., allowing user myuser connections from desired IP addresses). For example:

# Allow local connections for all users
local   all             all                                     trust

# Allow network connections using MD5 authentication
host    all             all             0.0.0.0/0               md5
host    all             all             ::/0                    md5

# Replication connections match only "replication" entries, so add them
# explicitly if you plan to use pg_basebackup or streaming replication
local   replication     all                                     trust
host    replication     all             0.0.0.0/0               md5

In Docker, it might be useful to specify something like host all all 172.17.0.0/16 md5 (or your specific range), depending on Compose network settings.

Verifying Settings and Restarting the Container

Step 1: Create Directory for Archives
Before starting the container, ensure the directory for WAL archives exists if you haven't created a volume as in our example.
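
If you did not mount a dedicated volume for the archives (as in the compose example above, where Docker creates the mount point for you), you can create the directory in the running container. A minimal sketch, assuming the container from the previous chapter is up:

# Create the archive directory inside the data volume and hand it to the postgres user
docker exec -it my_postgres mkdir -p /var/lib/postgresql/data/archives
docker exec -it my_postgres chown postgres:postgres /var/lib/postgresql/data/archives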

Step 2: Start or Restart the Container
If the container is already running, restart it to apply new settings:

docker-compose down
docker-compose up -d

Check logs with:

docker-compose logs -f postgres

Make sure the server starts cleanly and that no archiving-related errors appear in the log; the actual values of archive_mode and archive_command are easiest to confirm from inside psql, as shown in Step 3.

Step 3: Verify Settings Inside PostgreSQL
Connect to PostgreSQL using psql. For example, to connect to database mydatabase:

docker exec -it my_postgres psql -U myuser -d mydatabase

If you need to connect to the default postgres database to create mydatabase, use:

docker exec -it my_postgres psql -U myuser -d postgres

Then list databases with:

\l

If mydatabase doesn't exist, create it:

CREATE DATABASE mydatabase;

Check the WAL-related settings with these SQL commands inside the psql shell:

SHOW wal_level;
SHOW archive_mode;
SHOW archive_command;

Step 4: Test WAL Archiving
To test, make some changes in the database so PostgreSQL rolls over a WAL file and triggers archive_command. In the archives folder (either in the volume or on the host), you should see a file named something like 00000001000000000000000A.
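
If you don't want to wait for a segment to fill up naturally, you can force a switch with pg_switch_wal(). A sketch of such a test (wal_test is just an illustrative table name):

# Generate a change, then force PostgreSQL to close the current WAL segment
docker exec -it my_postgres psql -U myuser -d mydatabase \
  -c "CREATE TABLE IF NOT EXISTS wal_test (id serial PRIMARY KEY, note text); INSERT INTO wal_test (note) VALUES ('archiving check');"
docker exec -it my_postgres psql -U myuser -d mydatabase -c "SELECT pg_switch_wal();"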

List the files:

docker exec -it my_postgres ls -l /var/lib/postgresql/data/archives

If files are appearing, WAL archiving is working. If not, check PostgreSQL logs—issues with write permissions or incorrect paths might prevent cp %p /.../%f from executing.

By following these steps, we've enabled WAL archiving, set the destination for archived files, and verified that the configuration works. Next, we'll explore how to manage the storage of these archives, efficiently control their size, and eventually use them to recover data between dump intervals.

Organizing WAL Archiving

In this section, we'll explore where and how to store WAL files, what tools and commands are used for their archiving, and introduce compression, rotation, and monitoring mechanisms. Properly configured archiving will later allow us to "replay" operations during restoration from a dump—ensuring no important changes are lost.

Methods for Storing WAL Files

  1. Local Disk
    • Pros: Easy setup (simply specify a copy command to a local folder), quick file access, no network dependency.
    • Cons: If the local disk fails, the archive is lost. Space can quickly run out if WAL volume is high.
  2. Network Storage (NFS, SMB, etc.)
    • Pros: The archive is stored separately from the main DB server disk, increasing reliability. Multiple servers can connect to the same storage.
    • Cons: Requires a stable network with sufficient bandwidth. Network issues can interrupt the archiving process.
  3. Cloud Storage (S3, GCS, etc.)
    • Pros: Data "lives" in the cloud with built-in mechanisms for reliable storage and replication.
    • Cons: Integration can be complex (special utilities like s3cmd, rclone, or custom scripts may be needed). Network latency may be higher than local or network storage.
  4. Remote Server (using scp, rsync)
    • Pros: Simple to implement, only requires SSH access; the archive resides separate from the current DB server.
    • Cons: Ensure proper permissions and sufficient space on the remote server. Network issues can cause archiving to hang or fail.

In practice, many start with local disk (or a Docker volume) for debugging and testing, then move the archive to a more reliable location: such as a network share or S3.

Configuring archive_command

To have PostgreSQL "hand off" WAL files to an archive, you need to set the archive_command in postgresql.conf (or via environment variables). A basic example for local archiving:

archive_mode = on
archive_command = 'cp %p /var/lib/postgresql/data/archives/%f'

Here:

  • %p is the path to the source WAL file.
  • %f is the filename (typically something like 00000001000000000000000A).
  • /var/lib/postgresql/data/archives/ is the directory where files are stored. This can be a local directory or a mounted volume.

Common Errors and Their Solutions

Missing Archive Directory
If the archives/ folder doesn’t exist or PostgreSQL lacks write permissions, errors like could not create file /... Permission denied will appear in the logs each time a WAL file completes.

  • Solution: Ensure the directory exists and has proper permissions. In a Docker container, the folder must exist inside the container if it's not mounted externally.

Overwriting Existing Files
By default, cp %p /archives/%f will overwrite a file if it already exists. This might be an issue in rare cases (e.g., archiving from multiple replication nodes).

  • Solution: Add a file existence check, for example:

    archive_command = 'test ! -f /archives/%f && cp %p /archives/%f'
    

    This prevents overwriting an existing WAL file.

Network Unavailability (for Remote Storage)
If using scp or rsync to a remote server, network downtime will cause PostgreSQL to repeatedly try executing the command, hanging on archiving.

  • Solution: Ensure stable network access, use additional rsync flags like --partial --append, configure SSH keys for automatic login, and monitor PostgreSQL logs during serious outages.
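
For illustration only, a remote variant of archive_command might look like the line below. The host and target path are placeholders, SSH keys must allow non-interactive login for the postgres user, and rsync/ssh are not included in the stock postgres image, so treat this as a sketch to adapt rather than a drop-in setting:

archive_command = 'rsync -a %p backup@backup.example.com:/archives/%f'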

Insufficient Storage Space
Heavy load and a large number of WAL files can quickly fill up the archive. When the directory is full, PostgreSQL won't be able to finish writing WAL files, leading to critical errors.

  • Solution: Regularly clear or rotate old files, monitor storage usage, and expand storage as needed.

Compression, Rotation, and Monitoring Mechanisms

  1. Compression

    To avoid clogging storage with gigabytes of WAL files, compress them. For example, using gzip:

    archive_command = 'cp %p /var/lib/postgresql/data/archives/%f && gzip /var/lib/postgresql/data/archives/%f'
    

    Or a combined command:

    archive_command = 'gzip -c %p > /var/lib/postgresql/data/archives/%f.gz'
    

    Keep in mind that during recovery, these files will need to be decompressed (PostgreSQL can't work directly with .gz files). Recovery scripts or procedures must handle decompression accordingly.

    Compression options:

    • gzip: Balanced speed and compression, almost universally available.
    • bzip2 or xz: Higher compression but more CPU-intensive.
    • zstd: A modern algorithm offering good speed/compression ratio, but requires installing the corresponding utility.
  2. Archive Rotation

    It's impossible to store all WAL files indefinitely—they can take up terabytes of space. You need a rotation policy:

    • Automatic Rotation: If using logical or streaming replication, some WAL files might no longer be needed. Tools like pg_archivecleanup can delete WAL files not required for a certain recovery point.
    • External Scripts: Use cron jobs to delete old files (e.g., older than N days) if you only need to keep them for a certain period; a sketch of such a job appears after this list.
    • Logical Dumps: Once you have a "golden" dump (or physical copy) from a certain date, there's no need to keep extremely old WAL files since you wouldn't restore beyond that date.
  3. Monitoring Archiving Status

    To catch issues early, monitor:

    • PostgreSQL Logs: Each time a WAL file is copied, PostgreSQL logs the event. Repeating errors should trigger immediate investigation.
    • Archive Directory Size: A sudden spike in size could indicate peak load or failure to delete old files.
    • Number of Files: If daily dumps occur, the WAL volume between dumps should be predictable. An unusually high number of new files might indicate a transaction surge or cyclical errors.

    Integrate monitoring systems (Prometheus, Zabbix, ELK, Grafana, etc.) to track disk usage, directory growth rates, and log error patterns.
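
As a sketch of the "external scripts" approach mentioned above, a small host-side script run nightly from cron could remove archived WAL older than a retention window and report disk usage. The path and the 7-day retention are assumptions tied to this guide's layout, and deleting is only safe if you also keep a dump or base backup newer than that window:

#!/bin/sh
# clean_wal_archives.sh: remove archived WAL segments older than 7 days
# and print current disk usage. Schedule it from cron, for example:
#   30 2 * * * /usr/local/bin/clean_wal_archives.sh
# ARCHIVE_DIR is an assumption: point it at the host mountpoint of the
# "archives" volume (docker volume inspect shows the exact path).
ARCHIVE_DIR=/var/lib/docker/volumes/my_project_archives/_data
find "$ARCHIVE_DIR" -type f -mtime +7 -delete
du -sh "$ARCHIVE_DIR"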

In summary, an effective archiving strategy involves storing WAL files on reliable media (local, network, or cloud), compressing them if necessary to save space, and regularly cleaning up old logs that are no longer needed for recovery. In the following sections, we'll dive into how to actually use these files to restore data between dump points and a failure moment—essentially, how to "replay" transactions from WAL.

The Recovery Process

In earlier sections, we covered how to configure PostgreSQL for writing and archiving WAL files. Now, let's break down the concrete steps that allow us to "replay" operations (transactions) occurring between the time a dump was created and the point of failure. The main idea is that a dump restores the base state, and then WAL files are used to recover all subsequent changes made after that dump.

Step 1: Preparing a Fresh Dump

To have a starting point for recovery using WAL, we need a current dump (logical or physical). Suppose we regularly (for example, once a day or several times a day) create a logical dump using pg_dump:

pg_dump -U myuser -d mydatabase > /backup/pg_dumps/mydatabase_20250101.sql

Or we perform a physical backup using pg_basebackup (which preserves the database’s file system structure). In any case, we should have a snapshot of the database state at the time of the backup.

Step 2: Preparing the Recovery Directory

When a failure occurs, we need to create a new clean environment (a new PostgreSQL data folder) and "deploy" our backup there. In a Docker container, this usually looks like:

  1. Stop the PostgreSQL container if it's still running.
  2. Clear out or create a new folder (volume) for the database:

    docker-compose down
    # Assuming we store data in volume postgres_data
    # If necessary, recreate the volume:
    docker volume rm my_project_postgres_data
    docker volume create my_project_postgres_data
    
  3. Start the container with a new (empty) volume:

    docker-compose up -d
    

Now, /var/lib/postgresql/data contains a fresh PostgreSQL file system.

Step 3: Restoring from the Dump

Option 1: Logical Dump (.sql or custom format)
Connect to the now-empty database using psql:

psql -U myuser -d mydatabase -h 127.0.0.1 -p 5432 < /backup/pg_dumps/mydatabase_20250101.sql

Or, if the dump is in custom format, use pg_restore:

pg_restore -U myuser -d mydatabase -h 127.0.0.1 -p 5432 /backup/pg_dumps/mydatabase_20250101.dump

Wait for the restoration to finish. At this point, the database data reflects its state at the time the dump was made.

Option 2: Physical Copy (via pg_basebackup)
Unpack the physical backup into a data folder (or corresponding volume):

pg_basebackup -D /var/lib/postgresql/data_restored -U myuser -Fp -Xs -P

Then, start PostgreSQL pointing to this directory in Docker (or by copying files into the appropriate volume). With a physical backup, we already have the "live" state of the entire database file system, including pg_wal. However, for full recovery, additional setup of recovery files may be necessary—this will be covered in the following steps.

Step 4: Applying WAL to Recover Data Between the Dump and Failure

Now for the crucial part: "replaying" all WAL files accumulated after the dump. The logic is as follows:

  1. When PostgreSQL starts, it enters recovery mode and begins sequentially reading archived WAL files.
  2. All changes recorded in WAL are applied to the database.
  3. Once it reaches the desired point (usually the last available WAL file before the failure), the restored database becomes up-to-date.

Configuring Recovery Mode

Historically, PostgreSQL used a file named recovery.conf to specify where to retrieve WAL files for recovery. Starting with version 12, these parameters live in postgresql.conf, and recovery mode is triggered by the presence of a recovery.signal (or standby.signal) file in the data directory. The logic remains roughly the same:

  • restore_command — The command PostgreSQL uses to fetch the required WAL file from the archive. It's the reverse of archive_command.
  • recovery_target_time or recovery_target_xid (optional) — If you need to recover to a specific point in time or a particular transaction.
  • recovery_target_action — What to do after reaching the target point (stop or continue operation).

Example snippet in postgresql.conf configured for recovery:

# Recovery settings (PostgreSQL 12+); recovery mode itself is triggered by recovery.signal
restore_command = 'cp /var/lib/postgresql/data/archives/%f %p'

# If you want to recover to a specific point in time:
# recovery_target_time = '2025-01-01 10:30:00'

# Upon reaching the recovery target, stop reading WAL and "promote" to normal mode
recovery_target_action = 'promote'

In older PostgreSQL versions (pre-12), you could create a recovery.conf file in the data directory with:

restore_command = 'cp /var/lib/postgresql/data/archives/%f %p'
recovery_target_action = 'promote'

Placing Configuration in Docker

Create or edit an additional configuration file—say recovery.conf for PostgreSQL <12, or include these parameters directly in postgresql.conf for PostgreSQL ≥12.

In your docker-compose.yml, you might specify:

services:
  postgres:
    # ...
    volumes:
      - ./postgresql.conf:/etc/postgresql/postgresql.conf
      - ./my_recovery_signal_file:/var/lib/postgresql/data/recovery.signal
    command: >
      -c 'config_file=/etc/postgresql/postgresql.conf'

Why ./my_recovery_signal_file?
The recovery.signal file signals PostgreSQL to start in recovery mode. It's required in cases such as:

  • Recovery from WAL Archive: PostgreSQL uses restore_command to sequentially restore changes from archived WAL files.
  • Point-In-Time Recovery (PITR): If recovery_target_time is set, PostgreSQL will halt recovery at that specified time.
  • Replication: When setting up a database replica, recovery mode is necessary to sync with the master.

Make sure no conflicting settings exist on startup (for example, archive_mode=on while simultaneously trying to start recovery).
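
Instead of bind-mounting a file as in the compose snippet above, you can simply create the signal file inside the data directory. A minimal sketch, using the container name from this guide:

# Create an empty recovery.signal owned by postgres, then restart so
# PostgreSQL starts up in recovery mode
docker exec my_postgres touch /var/lib/postgresql/data/recovery.signal
docker exec my_postgres chown postgres:postgres /var/lib/postgresql/data/recovery.signal
docker-compose restart postgres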

Launching and Verifying

Once PostgreSQL starts in recovery mode, it will sequentially copy WAL files using restore_command. Log messages should appear such as:

LOG:  starting point-in-time recovery to 2025-01-01 10:30:00
LOG:  restored log file "00000001000000000000000A" from archive
LOG:  consistent recovery state reached

When all necessary files are processed, the database will "promote" to normal operational mode.

Step 5: Nuances and Common Pitfalls

Version Matching
The dump, the WAL files, and the PostgreSQL version used for the restore must match (or be compatible). For instance, a physical copy and its WAL taken from PostgreSQL 14 cannot simply be started under version 15. Logical dumps are more forgiving, but it's still best to avoid major version jumps during a recovery.

Clean Data Directory
If the data directory already contains "old" files from another PostgreSQL installation, restoration may fail or not start at all. Ensure the folder is clean before deploying a backup.

Number of WAL Files
If the archive lacks necessary WAL files, full recovery to the most recent state isn’t possible. Verify that archiving was continuous and that all files from the time of the dump until the failure are present. If a WAL file is missing, PostgreSQL will error out with something like could not restore file ..., and you'll need to find workarounds (such as aborting recovery after the last available file).

Conflicting Settings
In modern versions (12+), recovery.conf is no longer used. Having such a file will cause PostgreSQL to throw an error at startup. Use signal files (recovery.signal or standby.signal) and place recovery parameters in postgresql.conf.

Permissions and Paths
When using Docker, issues often arise from incorrect paths inside the container. Double-check that the path in restore_command points to the actual archive location. If the archive is mounted from outside, ensure PostgreSQL inside the container can "see" the exact directory you specified.

Summary of the Full Recovery Process

  1. Deploy a new environment or clear the existing one.
  2. Import a fresh dump (logical or physical).
  3. Configure paths to the WAL archive and set up commands for fetching them.
  4. Start PostgreSQL in recovery mode, monitor the logs, and wait for recovery to complete.
  5. Obtain an up-to-date database that includes all transactions performed after the dump was created.

Practical Examples

In this section, we'll look at specific commands and verification methods to ensure that our backup and WAL setup is functioning flawlessly. We'll also share handy tips and discuss next steps for diving deeper into backup strategies and scaling.

Docker and psql Commands for Backups

Below are some commands you can run in a standard Docker environment with PostgreSQL.

1. Creating a Logical Dump

Assuming our container is named my_postgres and we want to back up the database mydatabase as user myuser.

docker exec my_postgres \
  pg_dump -U myuser mydatabase > /backup/pg_dumps/mydatabase_$(date +%Y%m%d_%H%M).sql
  • docker exec my_postgres — runs pg_dump inside the container (avoid -t here: a pseudo-TTY can add carriage returns to redirected output).
  • pg_dump -U myuser mydatabase — performs the backup.
  • > /backup/pg_dumps/... — the redirection is processed by your host shell, so this path must exist on the host machine.

Note: If you want the dump written inside the container instead (for example, into a mounted backup volume), wrap the whole command in the container's shell so the redirection happens there:

docker exec my_postgres sh -c 'pg_dump -U myuser mydatabase > /backup/pg_dumps/backup.sql'

2. Creating a Physical Backup Using pg_basebackup

docker exec -it my_postgres \
  pg_basebackup -U myuser -D /backup/physical_backup -Fp -Xs -P
  • -D /backup/physical_backup — directory to save the physical files.
  • -Fp — plain format, copies files directly.
  • -Xs — streams the WAL generated during the backup, so the copy is consistent on its own.
  • -P — shows progress.

After execution, you'll find a complete file structure in /backup/physical_backup, which you can use to restore in place of /var/lib/postgresql/data.

3. Restoring a Logical Dump

Assuming you want to restore to the same database (mydatabase) in the same container:

docker exec -i my_postgres \
  psql -U myuser -d mydatabase < /backup/pg_dumps/mydatabase_20250101.sql
  • psql -U myuser -d mydatabase < ... — applies the SQL dump script to the existing database.

If you need a "clean" restore (for example, reset the database before loading the dump), you can first drop and recreate the database:

DROP DATABASE mydatabase;
CREATE DATABASE mydatabase;

Then load the dump.

WAL Recovery Examples

Checking for Archived WAL Files

ls /var/lib/postgresql/data/archives

If everything is working, you'll see files named like 00000001000000000000000A, etc.

Verifying Archiving Parameters in a Running Container

docker exec -it my_postgres psql -U myuser -d mydatabase -c "SHOW archive_mode;"
docker exec -it my_postgres psql -U myuser -d mydatabase -c "SHOW archive_command;"
docker exec -it my_postgres psql -U myuser -d mydatabase -c "SHOW wal_level;"

Make sure archive_mode = on, archive_command is set, and wal_level is either replica or logical.

Simulating a Failure and Recovery

  1. Stop the container and reset or change the volume (or clear it).
  2. Start the container with an empty database.
  3. Load the latest dump.
  4. Set the path to the archived WAL files in the restore_command parameter (in postgresql.conf or recovery.conf for versions <12).
  5. Restart the container and wait while PostgreSQL "replays" the WAL.
  6. Verify that the expected data (created after the dump) has been recovered.
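
A simple way to check the last step is to query for data you know was written after the dump was taken; the table and column here are purely illustrative:

# Verify that post-dump changes survived the WAL replay (orders is a hypothetical table)
docker exec -it my_postgres psql -U myuser -d mydatabase \
  -c "SELECT count(*) FROM orders WHERE created_at > '2025-01-01';"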

Handy Tips and Lifehacks

  1. Regular Test Restores:
    Practice Plan B: once backups and WAL are set up, try restoring the database on a test system using these backups. Only by actually performing a restore can you be confident that your backup won't be useless in an emergency.
    Schedule: Once a week or month, perform a "dry run" of your recovery procedure so you won't waste time during a real crisis.
  2. Rotating Old WAL Files:
    Use tools like pg_archivecleanup to automatically delete unneeded WAL files (if you don't intend to roll back to very old points).
    In large systems, thousands of WAL files can consume a lot of space and complicate monitoring.
  3. Monitoring and Alerts:
    • PostgreSQL Logs: Regularly check logs for archiving errors. If archive_command returns a non-zero status, relevant messages will appear.
    • Third-Party Tools: Connect Prometheus/Grafana or another monitoring system to track disk usage, new WAL file creation, etc.
  4. Backup Automation:
    • Cron Jobs: Set up cron inside the container (or on the host) to run pg_dump or pg_basebackup at scheduled times. Log cron output to detect potential errors.
    • Docker Approach: Some prefer a separate container that "attaches" to the database to perform backups, then uploads results to S3, a local folder, or network storage.
  5. Version Compatibility Checks:
    When upgrading PostgreSQL (e.g., from 14 to 15), ensure beforehand that physical backups and WAL are compatible. Logical dumps are usually easier to transfer, but review Release Notes carefully.
  6. Storing Backups Off-Server:
    If the entire system (server/Docker host) fails, local backups won't save you. Always keep backups (dumps and WAL archives) in another data center or physical location.

What to Explore Next

  • Streaming Replication:
    Provides a "hot" standby server that receives WAL files in real time. In case of failure, you can quickly switch over to the replica.
  • Logical Replication:
    Allows selective replication of tables or even individual data rows to other clusters.
  • Sharding, Patroni, Stolon, and Other Cluster Managers:
    For high loads and stringent availability requirements, use additional tools that simplify the management of multiple PostgreSQL nodes.
  • Automated Backup Tools:
    Tools like Barman or WAL-G handle rotation, monitoring, and disk space optimization automatically.

By following these chapters, you can build a working system—from a basic Docker Compose file and setting archive_mode to testing recovery in a failure scenario. Regular checks, log monitoring, and timely WAL rotation are key to reliability and will help you avoid critical data loss.


Choose a login method