PostgreSQL의 auto-failover 솔루션인 pgpool 은 multi-master 를 지원하지 않기 때문에
MySQL의 MMM 처럼 failover 된 OLD Master 를 자동으로 cluster의 slave 로 포함시키는 기능이 없습니다.
그래서 old master 를 new master의 slave 로 재구성하는 작업을 수작업으로 해야만 하는데
이 과정을 pgpool 의 recovery_node 기능을 통해 자동화 할 수 있습니다.
pgpool 구성 방법은 아래 내용 참고
자동화 설정
수행 과정
- db down
- pgpool failover 감지
- failover command 수행 됨
- failover 완료 후 pcp_recovery_node -n node-id ( new slave 노드 ) 커맨드 수행
- pgpool.conf의 recovery_1st_stage_command = ‘recover.sh’ 설정에 따라 recover.sh 수행 됨
- old master를 slave 로 재구성 후
- old master 가 pgpool cluster로 attach 됨
아래 extension 설치 , pgpool 설정, 스크립트를 모든 노드에 동일하게 설정한다
pgpool_recovery extension 설치
cp /usr/pgsql-11/lib/pgpool* /home1/testuser/psql/engn/postgresql-11.7/lib/
cp /usr/pgsql-11/share/extension/pgpool* /home1/testuser/psql/engn/postgresql-11.7/share/extension
postgres=# create extension pgpool_recovery;
CREATE EXTENSION
postgres=# select * from pg_extension;
extname | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition
--------------------+----------+--------------+----------------+------------+-----------+--------------
plpgsql | 10 | 11 | f | 1.0 | |
pg_stat_statements | 10 | 2200 | t | 1.6 | |
pgpool_recovery | 10 | 2200 | t | 1.2 | |
(3 rows)
postgres=# \c template1
You are now connected to database "template1" as user "testuser".
template1=# create extension pgpool_recovery;
CREATE EXTENSION
template1=# select * from pg_extension;
extname | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition
--------------------+----------+--------------+----------------+------------+-----------+--------------
plpgsql | 10 | 11 | f | 1.0 | |
pg_stat_statements | 10 | 2200 | t | 1.6 | |
pgpool_recovery | 10 | 2200 | t | 1.2 | |
(3 rows)
postgres=# select proname,pronamespace,prosrc from pg_proc where proname like '%pgpool%';
proname | pronamespace | prosrc
---------------------+--------------+---------------------
pgpool_recovery | 2200 | pgpool_recovery
pgpool_recovery | 2200 | pgpool_recovery
pgpool_recovery | 2200 | pgpool_recovery
pgpool_remote_start | 2200 | pgpool_remote_start
pgpool_pgctl | 2200 | pgpool_pgctl
pgpool_switch_xlog | 2200 | pgpool_switch_xlog
(6 rows)
=> pgpool_recovery extension 을 postgres , template1 database 에 설치
pgpool 설정
[testuser@testserver2 12:04:52 ~/psql/scripts ]$ vi /etc/pgpool-II/pgpool.conf
#------------------------------------------------------------------------------
# ONLINE RECOVERY
#------------------------------------------------------------------------------
recovery_user = 'testuser'
recovery_password = 'qhdks123'
recovery_1st_stage_command = 'recover.sh'
script 생성
[testuser@testserver2 11:59:52 ~/psql/scripts ]$ pwd
/home1/testuser/psql/scripts
[testuser@testserver2 11:59:57 ~/psql/scripts ]$ ls -ltr
total 16
-rwx------ 1 testuser testuser 177 Sep 26 20:49 start_instance.sh
-rwx------ 1 testuser testuser 280 Sep 26 20:49 rewind.sh
-rwx------ 1 testuser testuser 146 Sep 26 20:49 stop_instance.sh
-rwx------ 1 testuser testuser 724 Sep 26 23:31 failover.sh
- stop_instance.sh
#!/bin/bash
PGDATA="/home1/testuser/psql/engn/PGSQL"
PATH="/home1/testuser/psql/engn/postgresql-11.7/bin/:$PATH"
pg_ctl -D $PGDATA stop -m immediate
- start_instance.sh
#!/bin/bash
PGDATA="/home1/testuser/psql/engn/PGSQL"
PATH="/home1/testuser/psql/engn/postgresql-11.7/bin/:$PATH"
pg_ctl -w -s -D $PGDATA start 2>/dev/null 1>/dev/null < /dev/null &
- rewind.sh
#!/bin/bash
master_node_host_name=$1
PGDATA="/home1/testuser/psql/engn/PGSQL"
PATH="/home1/testuser/psql/engn/postgresql-11.7/bin/:$PATH"
pg_rewind --target-pgdata=${PGDATA} --source-server="port=3000 user=testuser dbname=postgres host=${master_node_host_name}" >> /var/tmp/recover.log
- failover.sh
#!/bin/bash
if [ $# -ne 3 ]
then
echo "failover failed_node new_master trigger_file"
exit 1
fi
FAILED_NODE=$1
NEW_MASTER=$2
TRIGGER_FILE=$3
# Do nothing if standby server goes down
if [$FAILED_NODE = 1]; then
echo "Standby Server is downed\n" >> /etc/pgpool-II/failover.log
exit 0
fi
echo "failover.sh FAILED_NODE:${FAILED_NODE}; NEW_MASTER:${NEW_MASTER}; at $(date)\n" >> /etc/pgpool-II/failover.log
#sudo -u testuser ssh -T testuser@$NEW_MASTER touch $TRIGGER_FILE
su - testuser -c "ssh -T testuser@$NEW_MASTER touch $TRIGGER_FILE"
su - testuser -c "ssh -T testuser@$NEW_MASTER 'psql -p 3000 -d postgres -c "\""select * from pg_create_physical_replication_slot('\''repl_slot_01'\'')"\""'"
exit 0
- recover.sh => recover.sh 는 반드시 $PGDATA 아래에 있어야 함, pgpool 에 path가 하드코딩 되어있음
#!/bin/sh
set -x
master_node_host_name=`hostname`
master_db_cluster=$1
recovery_node_hostname=$2
recovery_db_cluster=$3
tmp=/tmp/mytemp$$
trap "rm -f $tmp" 0 1 2 3 15
PGDATA="/home1/testuser/psql/engn/PGSQL"
PATH="/home1/testuser/psql/engn/postgresql-11.7/bin/:$PATH"
export PATH PGDATA
# make sure the old master is down
ssh -T $recovery_node_hostname /home1/testuser/psql/scripts/stop_instance.sh
# rewind the old master
ssh -T ${recovery_node_hostname} /home1/testuser/psql/scripts/rewind.sh ${master_node_host_name}
cat > $tmp <<EOF
standby_mode = 'on'
primary_conninfo = 'host=$master_node_host_name port=3000 user=testuser'
primary_slot_name = 'repl_slot_01'
recovery_target_timeline = 'latest'
trigger_file='/home1/testuser/psql/engn/PGSQL/failover_trigger'
EOF
scp $tmp $recovery_node_hostname:$recovery_db_cluster/recovery.conf
ssh -T ${recovery_node_hostname} /home1/testuser/psql/scripts/start_instance.sh
- pgpool_remote_start => recover.sh 수행 후 $PGDATA/pgpool_remote_start 커맨드도 수행되도록 pgpool 에 하드코딩 되어있음 new slave 기동하는 커맨드를 recover.sh 에 포함했으므로 touch로 파일 생성만 함, $PGDATA 아래에 생성해야함
[testuser@testserver2 12:51:01 ~/psql/engn/PGSQL ]$ cat pgpool_remote_start
exit 0
테스트
- status
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | last_status_change
---------+-------------------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
0 | testserver1 | 3000 | up | 0.500000 | standby | 688 | true | 0 | 2020-09-27 11:42:00
1 | testserver2 | 3000 | up | 0.500000 | primary | 2632 | false | 0 | 2020-09-27 10:23:29
=> t1802 가 primary 인 상황
- primary db down
[testuser@testserver2 12:19:45 ~/psql/scripts ]$ stop
waiting for server to shut down.... done
server stopped
- failover 확인
[testuser@testserver1 12:20:02 ~ ]$ psql -p 4000 -d postgres
psql (11.7)
Type "help" for help.
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | last_status_change
---------+-------------------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
0 | testserver1 | 3000 | up | 0.500000 | primary | 826 | true | 0 | 2020-09-27 12:20:17
1 | testserver2 | 3000 | down | 0.500000 | standby | 2596 | false | 0 | 2020-09-27 12:20:17
(2 rows)
=> t1801로 failover 완료
- pcp_recovery_node 커맨드 수행
[testuser@testserver1 12:20:42 ~/psql/scripts ]$ pcp_recovery_node -n 1
Password:
pcp_recovery_node -- Command Successful
2020-09-27 12:21:04: pid 48880: DEBUG: PCP: processing recovery request
2020-09-27 12:21:04: pid 48880: DETAIL: start online recovery
2020-09-27 12:21:04: pid 48880: LOG: starting recovering node 1
2020-09-27 12:21:04: pid 48880: LOG: executing recovery
2020-09-27 12:21:04: pid 48880: DETAIL: starting recovery command: "SELECT pgpool_recovery('recover.sh', 'testserver2', '/home1/testuser/psql/engn/PGSQL', '3000', 1)"
2020-09-27 12:21:04: pid 48880: LOG: executing recovery
2020-09-27 12:21:04: pid 48880: DETAIL: disabling statement_timeout
2020-09-27 12:21:04: pid 48880: DEBUG: executing recovery, start recovery
2020-09-27 12:21:04: pid 48880: DEBUG: executing recovery, finish recovery
2020-09-27 12:21:04: pid 48880: LOG: node recovery, 1st stage is done
2020-09-27 12:21:04: pid 48880: DEBUG: executing remote start
2020-09-27 12:21:04: pid 48880: DETAIL: start pgpool_remote_start
2020-09-27 12:21:05: pid 48880: DEBUG: executing remote start
2020-09-27 12:21:05: pid 48880: DETAIL: finish pgpool_remote_start
=> t1801, t1802 아무 노드나 pgpool 살아있는 곳에서 수행하면 됨
- 클러스터 확인
[testuser@testserver1 12:21:00 ~/psql/engn/PGSQL ]$ psql -p 4000 -d postgres
psql (11.7)
Type "help" for help.
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | last_status_change
---------+-------------------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
0 | testserver1 | 3000 | up | 0.500000 | primary | 868 | false | 0 | 2020-09-27 12:21:10
1 | testserver2 | 3000 | up | 0.500000 | standby | 2602 | true | 0 | 2020-09-27 12:21:10
(2 rows)
=> old master인 t1802 가 slave 로 재구성 후 cluster 에 자동으로 포함된 것 확인