pgloader란?

csv 같은 File이나 실제 DB로부터 데이터를 읽어와 target postgresql 로 데이터 migration을 지원해주는 툴
아래와 같은 단계로 source DB로 부터 데이터를 읽어와 target DB로 copy 수행

  • Fetch meta data and catalogs
### table, column metadata

select c.table_name, t.table_comment,
       c.column_name, c.column_comment,
       c.data_type, c.column_type, c.column_default,
       c.is_nullable, c.extra
from information_schema.columns c
join information_schema.tables t using(table_schema, table_name)
where c.table_schema = 'source DB' and t.table_type = 'BASE TABLE'

    
### FK 제약조건
SELECT s.table_name, s.constraint_name, s.ft, s.cols, s.fcols,
       rc.update_rule, rc.delete_rule
FROM
 (
  SELECT tc.table_schema, tc.table_name,
         tc.constraint_name, k.referenced_table_name ft,
             group_concat(         k.column_name
                          order by k.ordinal_position) as cols,
             group_concat(         k.referenced_column_name
                          order by k.position_in_unique_constraint) as fcols
        FROM information_schema.table_constraints tc
        LEFT JOIN information_schema.key_column_usage k
               ON k.table_schema = tc.table_schema
              AND k.table_name = tc.table_name
              AND k.constraint_name = tc.constraint_name
      WHERE     tc.table_schema = 'source DB'
            AND k.referenced_table_schema = 'source DB'
            AND tc.constraint_type = 'FOREIGN KEY'
   GROUP BY tc.table_schema, tc.table_name, tc.constraint_name, ft
 ) s
             JOIN information_schema.referential_constraints rc
               ON rc.constraint_schema = s.table_schema
              AND rc.constraint_name = s.constraint_name
              AND rc.table_name = s.table_name
                  

### index 구성
SELECT table_name, index_name, index_type,
       sum(non_unique),
       cast(GROUP_CONCAT(column_name order by seq_in_index) as char)
FROM information_schema.statistics
WHERE table_schema = 'source DB'
GROUP BY table_name, index_name, index_type

=> source DB(MySQL) 의 catalog를 쿼리하여 가져오는 단계

  • Prepare the target PostgreSQL database
source DB의 catalog를 target postgresql의 catalog로 변환 후 
schema를 생성하고
casting rule 이 있다면 정의된 casting rule을 적용함 
(column type 이나 default value 같은 설정)
  • data copy
source db에서 data를 select하는 reader Threads 와
target db로 data를 write하는 writer threads 로 나뉘어 migration 작업을 수행함
parallel로 수행 가능
  • Create the indexes in parallel mode
object copy 후 index 생성하는 단계, parallel로 수행됨
  • 제약조건 적용 및 pgloader 종료
PK, FK 제약조건 적용 후 migration 종료

pgloader 설치

yum install -y sbcl git curl patch unzip devscripts pandoc libsqlite3-dev freetds-dev

wget https://github.com/dimitri/pgloader/archive/v3.6.1.tar.gz
tar xvf v3.6.1.tar.gz
cd pgloader-3.6.1/
make pgloader

sudo mv ./build/bin/pgloader /usr/local/bin/

pgloader --version
pgloader version "3.6.2"
compiled with SBCL 1.3.6

source MySQL table schema

CREATE TABLE `place` (
  `place_seq` bigint(20) NOT NULL AUTO_INCREMENT,
  `place_type_code` varchar(20) NOT NULL,
  `nation_code` char(2) NOT NULL DEFAULT 'KR',
  `name` varchar(200) NOT NULL,
  `description` varchar(1000) DEFAULT NULL,
  `latitude` decimal(11,8) NOT NULL,
  `longitude` decimal(11,8) NOT NULL,
  `area_info_seq` bigint(20) NOT NULL,
  `region_sido_area_info_seq` int(11) NOT NULL DEFAULT '0',
  `region_sigungu_area_info_seq` int(11) NOT NULL DEFAULT '0',
  `region_upmyundong_area_info_seq` int(11) NOT NULL DEFAULT '0',
  `address_detail` varchar(400) NOT NULL,
  `postcode` char(5) NOT NULL,
  `homepage_url` varchar(1000) DEFAULT NULL,
  `sns_urls` json DEFAULT NULL,
  `telephone` varchar(30) DEFAULT NULL,
  `telephone_display` varchar(30) DEFAULT NULL,
  `main_image_url` varchar(1000) DEFAULT NULL,
  `additional_info` varchar(1000) DEFAULT NULL,
  `min_price` int(11) NOT NULL DEFAULT '0',
  `max_price` int(11) NOT NULL DEFAULT '0',
  `ad_yn` char(1) NOT NULL DEFAULT 'N',
  `partner_yn` char(1) NOT NULL DEFAULT 'N',
  `search_yn` char(1) NOT NULL DEFAULT 'Y',
  `delete_yn` char(1) NOT NULL DEFAULT 'N',
  `source_info` varchar(100) DEFAULT NULL,
  `load_ymdt` datetime DEFAULT NULL,
  `search_indexing_ymdt` datetime DEFAULT NULL,
  `register_ymdt` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `modify_ymdt` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`place_seq`)
) ENGINE=InnoDB AUTO_INCREMENT=293057 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

migtraion test

  • prepare
### postgre schema 생성
CREATE SCHEMA postgre authorization testuser;
=>MySQL source DB가 postgresql 의 schema로 migration 되기 때문에 사전에 생성해둠

### search_path 설정
* DB level
ALTER DATABASE "testdb" SET search_path TO testuser, postgre;

* session level
set search_path=testuser,postgre;
  • pgloader 수행
$ pgloader mysql://\[user\[:password\]@\]\[netloc\]\[:port\]\[/dbname\]\[?option=value&…\] postgresql://\[user\[:password\]@\]\[netloc\]\[:port\]\[/dbname\]\[?option=value&…\]


[irteam@testserver2 15:11:26 ~/psql/pgloader-bundle-3.6.2/bin ]$ pgloader mysql://mig_test:'qhdks123'@11.111.11.11:13306/postgre postgresql://testuser:''@22.222.22.22:3000/testdb
2020-05-30T15:16:58.031000+09:00 LOG pgloader version "3.6.2"
2020-05-30T15:16:58.119000+09:00 LOG Migrating from #<MYSQL-CONNECTION mysql://mig_test@11.111.11.11:13306/postgre {100892E793}>
2020-05-30T15:16:58.119000+09:00 LOG Migrating into #<PGSQL-CONNECTION pgsql://testuser@22.222.22.22:3000/testdb {1008B80E03}>
    
    
2020-05-30T15:17:04.811000+09:00 ERROR PostgreSQL Database error 42501: must be owner of database testdb
QUERY: ALTER DATABASE "testdb" SET search_path TO testuser, postgre;
2020-05-30T15:17:04.815000+09:00 LOG report summary reset
                     table name     errors       rows      bytes      total time
-------------------------------  ---------  ---------  ---------  --------------
                fetch meta data          0          7                     0.087s
                 Create Schemas          0          0                     0.000s
               Create SQL Types          0          0                     0.007s
                  Create tables          0          6                     0.031s
                 Set Table OIDs          0          3                     0.005s
-------------------------------  ---------  ---------  ---------  --------------
postgre.mobile_verification_log          0          0                     0.335s
                  postgre.place          0     225127    69.7 MB          6.255s
postgre.place_category_mappings          0     412454    19.7 MB          2.232s
-------------------------------  ---------  ---------  ---------  --------------
        COPY Threads Completion          0          4                     6.254s
                 Create Indexes          0          4                     0.589s
         Index Build Completion          0          4                     0.119s
                Reset Sequences          0          2                     0.041s
                   Primary Keys          0          3                     0.003s
            Create Foreign Keys          0          0                     0.000s
                Create Triggers          0          0                     0.000s
                Set Search Path          1          0                     0.001s
               Install Comments          0          0                     0.000s
-------------------------------  ---------  ---------  ---------  --------------
              Total import time          ?     637581    89.4 MB          7.007s
  • pgloader 결과
testdb=> \dt
                  List of relations
 Schema  |          Name           | Type  |  Owner
---------+-------------------------+-------+----------
 postgre | place                   | table | testuser
 postgre | place_category_mappings | table | testuser
(2 rows)

testdb=> select count(*) from place;
 count
--------
 225127
(1 row)

testdb=> select count(*) from place_category_mappings;
 count
--------
 412454
(1 row)


testdb=> \d place
                                                    Table "postgre.place"
             Column              |           Type           | Collation | Nullable |                 Default
---------------------------------+--------------------------+-----------+----------+------------------------------------------
 place_seq                       | bigint                   |           | not null | nextval('place_place_seq_seq'::regclass)
 place_type_code                 | character varying(20)    |           | not null |
 nation_code                     | character(2)             |           | not null | 'KR'::bpchar
 name                            | character varying(200)   |           | not null |
 description                     | character varying(1000)  |           |          |
 latitude                        | numeric(11,8)            |           | not null |
 longitude                       | numeric(11,8)            |           | not null |
 area_info_seq                   | bigint                   |           | not null |
 region_sido_area_info_seq       | bigint                   |           | not null | '0'::bigint
 region_sigungu_area_info_seq    | bigint                   |           | not null | '0'::bigint
 region_upmyundong_area_info_seq | bigint                   |           | not null | '0'::bigint
 address_detail                  | character varying(400)   |           | not null |
 postcode                        | character(5)             |           | not null |
 homepage_url                    | character varying(1000)  |           |          |
 sns_urls                        | json                     |           |          |
 telephone                       | character varying(30)    |           |          |
 telephone_display               | character varying(30)    |           |          |
 main_image_url                  | character varying(1000)  |           |          |
 additional_info                 | character varying(1000)  |           |          |
 min_price                       | bigint                   |           | not null | '0'::bigint
 max_price                       | bigint                   |           | not null | '0'::bigint
 ad_yn                           | character(1)             |           | not null | 'N'::bpchar
 partner_yn                      | character(1)             |           | not null | 'N'::bpchar
 search_yn                       | character(1)             |           | not null | 'Y'::bpchar
 delete_yn                       | character(1)             |           | not null | 'N'::bpchar
 source_info                     | character varying(100)   |           |          |
 load_ymdt                       | timestamp with time zone |           |          |
 search_indexing_ymdt            | timestamp with time zone |           |          |
 register_ymdt                   | timestamp with time zone |           | not null | CURRENT_TIMESTAMP
 modify_ymdt                     | timestamp with time zone |           | not null | CURRENT_TIMESTAMP
Indexes:
    "idx_49946_primary" PRIMARY KEY, btree (place_seq)

=> 데이터 건 수 , schema 모두 정상적으로 이관 됨

pgloader 옵션 주면서 수행하기

  • ORACLE parfile 처럼 export, import 설정을 파일에 저장해놓고 사용할 수 있음
  • Source / Target DB 에 대한 설정, pgloader 작업 전 / 후 의 작업도 설정가능함
$ vi test.load

LOAD DATABASE
  FROM mysql://mig_test:'qhdks123'@11.111.11.11:13306/postgre
  INTO postgresql://testuser:''@22.222.22.22:3000/testdb

  WITH on error stop, include drop, batch rows=10000, workers = 4, concurrency = 1,
       multiple readers per thread,
       max parallel create index =4 
  SET MySQL PARAMETERS
      net_read_timeout  = '120',
      net_write_timeout = '120'

  BEFORE LOAD DO
    $$ create schema if not exists testuser; $$

  AFTER LOAD DO
    $$ set search_path=testuser,postgre; $$
;

### pgloader load 수행

$ pgloader test.load

2020-05-31T00:53:42.044000+09:00 LOG pgloader version "3.6.2"
2020-05-31T00:53:42.178000+09:00 LOG Migrating from #<MYSQL-CONNECTION mysql://mig_test@11.111.11.11:13306/postgre {1009750FE3}>
2020-05-31T00:53:42.178000+09:00 LOG Migrating into #<PGSQL-CONNECTION pgsql://testuser@22.222.22.22:3000/testdb {10097523F3}>
2020-05-31T00:53:48.688000+09:00 LOG report summary reset
                     table name     errors       rows      bytes      total time
-------------------------------  ---------  ---------  ---------  --------------
                    before load          0          1                     0.017s
                fetch meta data          0          7                     0.086s
                 Create Schemas          0          0                     0.001s
               Create SQL Types          0          0                     0.006s
                  Create tables          0          6                     0.035s
                 Set Table OIDs          0          3                     0.005s
-------------------------------  ---------  ---------  ---------  --------------
postgre.mobile_verification_log          0          0                     0.118s
                  postgre.place          0     225127    69.7 MB          6.089s
postgre.place_category_mappings          0     412454    19.7 MB          2.234s
-------------------------------  ---------  ---------  ---------  --------------
        COPY Threads Completion          0          4                     6.084s
                 Create Indexes          0          4                     0.485s
         Index Build Completion          0          4                     0.108s
                Reset Sequences          0          2                     0.023s
                   Primary Keys          0          3                     0.003s
            Create Foreign Keys          0          0                     0.000s
                Create Triggers          0          0                     0.000s
               Install Comments          0          0                     0.000s
                     after load          0          1                     0.014s
-------------------------------  ---------  ---------  ---------  --------------
              Total import time          ?     637581    89.4 MB          6.717s