Saturday, 12 September 2020

All about Physical Replication and Log shipping in Postgres

In my previous post, I tried to explain how we can configure the replication step by step. Today, I thought I will mention a few things about how it actually works.

Physical Replication

The transaction log entries can be taken from one database server (the master server) and be applied to the data files on another server (the standby server). In this case, the standby server will then have the exact replica of the database of the master server. The process of transferring transactional log entries and applying them to another server is called a Physical replication.

It is called like this because a transaction log recovery works on the level of data files and the database replica on the stand by the server will have the exact byte copy of the master database.

Physical replication works for the entire database cluster for all databases. When a new database is created this is reflected in the transaction log and therefore replicated to the standby server, the standby server can be configured to allow read-only queries. In this case, it will be called Hot standby.

Log shipping

One of the ways to set up a physical replication is to constantly ship your WAL files from the master server to the standby server and apply them there to have a synchronized copy of the database. This scenario is called log shipping.

When this implementation is set up and both servers are started, the master server will copy all the WAL Files to the archive location and the standby server will take them from there and replace them over it's base-backup.

This process will continue until configure or otherwise in the recovery configuration file. No transaction will be possible on the standby server but it can be configured for READ-ONLY access.

If the master server crashes and it's necessary to switch to the standby site, it should be promoted. This means that it should stop the recovery and allow the READ-WRITE transaction.

To do this. Simply remove the recovery configuration file and restart the server. After this, the standby server becomes the new master. It would make total sense, to make the all Master a standby after it's back in service to keep the class redundant. In that case, the preceding step should be repeated with server switching roles.

Never start the master server as the master after the standby was promoted. It would cause differences in the replicas and eventual data loss.

Log shipping is asynchronous, there could be a delay before the data that it's committed on the master server appears on the slave. The Slaves are eventually consistent with the master. This also means that there is a chance of data loss when the primary server crashes, as there may be transactions that have not been shipped yet.

The benefit of replication implemented via log shipping are as follows.

It's relatively easy to setup.
The number of standby server can be greater than 1.
There is no need to connect to the standby server and master server.
Also the master doesn't know that there is a standby server and doesn't depend on it. In fact, standby can be used as a master for other standby servers. This would be called cascading replication.

On the other hand, there are also some problems.

- It is necessary to provide a network location for the WAL archive which both master and the standby can access. This implies the involvement of a third entity.

- It is possible to archive the WAL files directly to the stand by server but then, the setup will not be symmetrical and in case of a failure of the master and promotion of the standby, the setup over the new standby will be a bit more complex.

- Another disadvantage is the standby can replay a WAL File only after it was archived. This happens only when the file is complete, meaning that it reaches its size of 16 megabytes. This means that the latest transaction performed on the master may not be reflected on the STANDBY, especially if the transaction on the master are small and don't happen very often.

In that case, it may happen that the Backup database doesn't have the recent transaction and its state is behind the master database. In the case of a master failure, some data will be lost.

Disadvantages of Physical and Logical data replication.

Physical: For example, when an index is created on a table it is not the create index command that it sends to the standby server, but the actual content of the data file with the index data.

This can create excessive load on the network and can become a bottleneck for a system under heavy load.

Another disadvantage of physical replication is that, every change in the data files will be replicated, even if that doesn't change the data itself.

For example, this happens when the vacuum command is executed on the MASTER that removes that dead tuples from the table or the cluster command that rearranges the rows.

Logical replication doesn't send the result of a SQL command but the command itself. It's an alternative to physical replication.

In this case, the amount of data sent over a network can be much smaller and the servers don't need to be identical. What's more the data structure on the servers doesn't have to be the same. Logical replication can be treated as if all the data are changing SQL commands which are performed on the publisher tables or sent to the subscriber, the subscriber will then apply the commands in its database to the same tables. This happens at the level of SQL, not at a lower level of physical replication.

This allows the subscriber to have a different data structure as long as the SQL commands can be executed.

Logical replication has the following benefits over physical replication.

It's simple to set up, which means it doesn't require any third-party software or hardware and works out-of-the-box in PostgreSQL
It can be very flexible, which means it doesn't require an identical database schema on the two servers and doesn't require an identical set up of the servers in general.
A publisher can be configured to replicate only certain types of operations.

For example, inserts and deletes but not update the changes in data on the physical level such as vacuum or cluster are not replicated. On the other hand, flexibility brings complexity.

When implementing logical replication, the following should be kept in mind.

We could have possible inconsistencies. For example, logical replication doesn't respect the foreign keys so it can bring the target database into an inconsistent state.
When the schema changes on the publisher side, the replication can suddenly break if the schema on the subscriber is not compatible.
Another thing is that logical replication only replicates changes from the publisher to the subscriber. If somebody changes the data directly on the subscriber, the replication will not bring the tables back in sync.
Another downside is that only tables are replicated other Schema objects are not. This can be an issue when auto-incremented fields that are based on sequences are used in the database.