Switchover - master server and warm standby (with streaming replication) in PostgreSQL | Learn DBA : A Life Long Learning Experience

Wednesday, 8 April 2020

Switchover - master server and warm standby (with streaming replication) in PostgreSQL

Switch-over of a master database to replica is pretty easy. Lets see how we do it..

I have 2 servers : 5522(data_replica2) and 5432(data), and at this time, 5522 is the master server and 5432 is a warm standby with streaming replication.

The initial setup looks something like this as below image. I can write the data to 5522, where 5432 is used for read-only and I cannot perform any operation.

("warm standby" is usually used to refer to a second server which is keeping up to date, but not accepting connection, used as read-only)

The proper way to promote a standby to a primary is by using the trigger file or running pg_ctl promote

-bash-4.1$ /usr/pgsql-9.6/bin/pg_ctl promote
server promoting

Standby has been promoted as master and a new timeline followed which you can notice in logs
user=,db=,28043LOG:  received promote request
user=,db=,28043LOG:  redo done at 0/77000024
user=,db=,28043LOG:  last completed transaction was at log time 2020-04-08 01:20:40.119967+05:30
user=,db=,28043LOG:  selected new timeline ID: 4
user=,db=,28043LOG:  archive recovery complete
user=,db=,28043LOG:  MultiXact member wraparound protections are now enabled
user=,db=,28044LOG:  checkpoint starting: force
user=,db=,28041LOG:  database system is ready to accept connections
user=,db=,30934LOG:  autovacuum launcher started
user=,db=,28044LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s, total=0.040 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=15119 kB

Promoting the standby will make it pick a new timeline, so we will need to add the parameter - recovery_target_timeline = 'latest' if we want the new standby to follow that timeline switch.

Otherwise, You may end up with errors as below when a new timeline is forked.
(Read here)

cp: cannot stat `pg_xlog/RECOVERYXLOG': No such file or directory
cp: cannot stat `pg_xlog/RECOVERYXLOG': No such file or directory
user=,db=,9262FATAL:  requested timeline 3 is not a child of this server's history
user=,db=,9262DETAIL:  Latest checkpoint is at 0/6A000024 on timeline 2, but in the history of the requested timeline, the server forked off from that timeline at 0/690000C0.
user=,db=,9259LOG:  startup process (PID 9262) exited with exit code 1

In this situation, The basic thing is you can't switch these other than:

- Turn off primary

- Promote secondary

- Rebuild the former primary as new secondary off the new primary.

Step 1: Perform a clean shutdown of Primary[5522]

-bash-4.1$ /usr/pgsql-9.6/bin/pg_ctl -D /var/lib/pgsql/9.6/data_replica2 stop

Check for sync status and recovery status of Standby[5432] before promoting it:

-bash-4.1$ psql -p 5432 -c 'select pg_last_xlog_receive_location() "receive_location",
> pg_last_xlog_replay_location() "replay_location",pg_is_in_recovery() "recovery_status"

;'

receive_location | replay_location | recovery_status
------------------+-----------------+-----------------
 0/7700008C       | 0/7700008C      | t
(1 row)

Standby in complete sync. At this stage and we are safe to promote it as Primary.

Step 2: Open the Standby as new Primary by pg_ctl promote or creating a trigger file.

-bash-4.1$ /usr/pgsql-9.6/bin/pg_ctl promote
server promoting

user=,db=,28043LOG:  received promote request
user=,db=,28043LOG:  redo done at 0/77000024
user=,db=,28043LOG:  last completed transaction was at log time 2020-04-08 01:20:40.119967+05:30
user=,db=,28043LOG:  selected new timeline ID: 4
user=,db=,28043LOG:  archive recovery complete
user=,db=,28043LOG:  MultiXact member wraparound protections are now enabled
user=,db=,28044LOG:  checkpoint starting: force
user=,db=,28041LOG:  database system is ready to accept connections
user=,db=,30934LOG:  autovacuum launcher started
user=,db=,28044LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s, total=0.040 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=15119 kB

Standby has been promoted as master and a new timeline followed which you can notice in logs

Step 3: Edit the recovery.conf file

Restart old Primary as standby and allow to follow the new timeline by specifying "recovery_target_timline='latest' " in $PGDATA/recovery.conf file

-bash-4.1$ pwd
/var/lib/pgsql/9.6/data_replica2

-bash-4.1$ cat /var/lib/pgsql/9.6/data_replica2/recovery.conf

recovery_target_timeline='latest'
standby_mode = 'on'
primary_conninfo = 'user=replicator password=replicator host=127.0.0.1 port=5432 
sslmode=prefer sslcompression=1 krbsrvname=postgres'
primary_slot_name = 'slot1'

In the above recovery.conf file it is clear that old Primary will now try to connect to 5432 (primary) port as new Standby pointing to common WAL Archives location.

Step 4: Now, Start the standby and verify the new standby log :

-bash-4.1$ ls -ltr
-rwxrwxrwx 1 postgres postgres 141687 Apr  8 02:14 postgresql-Wed.log

user=,db=,31220DETAIL:  End of WAL reached on timeline 3 at 0/7700008C.
user=,db=,31220LOG:  restarted WAL streaming at 0/77000000 on timeline 3
user=,db=,31289LOG:  entering standby mode
user=,db=,31289LOG:  consistent recovery state reached at 0/7700008C
user=,db=,31287LOG:  database system is ready to accept read only connections
user=,db=,31293LOG:  started streaming WAL from primary at 0/77000000 on timeline 4
user=,db=,31289LOG:  redo starts at 0/7700008C
user=[unknown],db=[unknown],31297LOG:  connection received: host=[local]
.
.
user=,db=,31290LOG:  recovery restart point at 0/770280E4
user=,db=,31290DETAIL:  last completed transaction was at log time 2020-04-08 01:49:21.588317+05:30
user=,db=,31290LOG:  restartpoint starting: time
user=,db=,31290LOG:  restartpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.003 s, sync=0.000 s, total=0.422 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=144 kB

Step 5. Validate the new Standby status. We can use pg_controldata

Perfect.. without any re-setup we have brought back old Primary as new Standby 😉

Let us try to input some data and see if it replicates to new standby now.

Note :

trigger_file :

While doing Role-switch using trigger file, specify the parameter in recovery.conf file as below.

trigger_file='/var/lib/pgsql/9.6/data/primary_down.txt'

-bash-4.1$ touch /var/lib/pgsql/9.6/data/primary_down.txt

This will cause Postgres to:

Notice the file. It finishes off the slave process and recovers any log files transferred from the main server that still needs recovery.

Further Renames the 'recovery.conf' file to 'recovery.done'… when that happens, it is now the primary server.

recovery_target_timeline :

- If we use recovery_target_timeline='latest' then it will use the timeline which was found in the archive.

- If you want the switch-back functionality, you will have to set recovery_target_timeline='latest', as any promotion will increment the timeline.

- The recovery_target_timeline is used to specify a timeline in the event that you are recovering after having already previously performed a point-in-time recovery, which creates divergent timelines.

Further, if you wish to promote your replica to master (into open; read/write mode), you will not need to change the hot_standby parameter in postgresql.conf , because it will not affect the database with the primary role.

The rule "host replication all" in pg_hba.conf file influences only who could connect to your new primary to create a hot standby database / streaming replication.

These days pg_rewind can be used to turn the former master into a standby of the new master much more quickly than a complete rebuild. Previously rsync or similar would have been a quicker way to do this.

All about Physical Replication and Log shipping in Postgres

Streaming-Replication Sync and Async, benefits of streaming replication over Log-shipping

Source :
https://help.theatremanager.com/book/export/html/3679

https://www.enterprisedb.com/blog/switchover-switchback-in-postgresql-9-3