꽤 오래전 일이지만 재미있었던 장애 사례를 공유드리고자 합니다.
간단히 요약하자면, 신규로 컬럼을 추가하고 쿼리를 검수하여 인덱스를 생성했지만 인덱스에 대한 통계가 갱신되지 않아서
기대했던 대로 인덱스를 타지 못하고 풀스캔을 하면서 발생했던 장애입니다.
MySQL의 통계와 DDL 작업에 대해 다시한번 생각해보고 좀 더 꼼꼼히 작업할 수 있게 된 계기가 된 장애입니다
장애 개요
- 테이블의 통계 갱신이 안된 상태에서 신규 쿼리가 유입되어 적절한 인덱스를 타지 못하고 풀스캔한 현상
- 데이터 변화량이 MySQL의 통계 수집 기준에 못미쳐서 자동갱신이 안되었음 ( 기준 10%이상, 실제 변경량 4% )
- 동일한 컬럼을 인덱싱하는 인덱스를 재생성하여 통계를 새로 반영하면서 해소됨
문제의 slowquery
select
tb_test0_.`id` as id1_53_,
tb_test0_.`amt` as amt2_53_,
tb_test0_.`intr_inv_id` as intr_inv5_53_,
tb_test0_.manual_version as manual_v3_53_,
tb_test0_.`version` as version4_53_
from
`tb_test` tb_test0_
inner join `tb_test1` tb_test1 on tb_test0_.`intr_inv_id` = tb_test1.`id`
inner join `tb_test2` tb_test2 on tb_test1.`lct_id` = tb_test2.`id`
left outer join `tb_test3` tb_test3 on tb_test2.`tb_test3_id` = tb_test3.`id`
inner join `tb_test4` tb_test4 on tb_test1.`intr_sku_id` = tb_test4.`id`
inner join tb_test5 tb_test5 on tb_test5.id = tb_test4.tb_test5_id
where
tb_test2.`slbl` = 1
and tb_test5.inventory_id = 530123
and tb_test0_.`amt` > 0
order by
tb_test4.`exp_dt` asc,
tb_test4.`rcv_date` asc,
tb_test3.`seq` asc,
tb_test2.`code` asc
실행계획
+----+-------------+--------------+------------+--------+------------------------------------------------------------------+----------------------------+---------+--------------------------------------------+--------+----------+--------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+--------+------------------------------------------------------------------+----------------------------+---------+--------------------------------------------+--------+----------+--------------------------------------------------------------+
| 1 | SIMPLE | tb_test5 | NULL | ref | PRIMARY,ix_sale_group_inventory_id | ix_sale_group_inventory_id | 8 | const | 1 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | tb_test0_ | NULL | range | uix_tb_test_tb_test1,ix_tb_test_amt | ix_tb_test_amt | 9 | NULL | 918574 | 100.00 | Using index condition; Using join buffer (Block Nested Loop) |
| 1 | SIMPLE | tb_test1 | NULL | eq_ref | PRIMARY,uix_tb_test1_tb_test2_sku,fk_intr_inv_intr_sku | PRIMARY | 8 | baemin_market_wms.tb_test0_.intr_inv_id | 1 | 100.00 | NULL |
| 1 | SIMPLE | tb_test2 | NULL | eq_ref | PRIMARY,idx_tb_test2_01 | PRIMARY | 8 | baemin_market_wms.tb_test1.lct_id | 1 | 50.00 | Using where |
| 1 | SIMPLE | tb_test3 | NULL | eq_ref | PRIMARY | PRIMARY | 8 | baemin_market_wms.tb_test2.tb_test3_id | 1 | 100.00 | NULL |
| 1 | SIMPLE | tb_test4 | NULL | eq_ref | PRIMARY,ix_tb_test4_tb_test5_id | PRIMARY | 8 | baemin_market_wms.tb_test1.intr_sku_id | 1 | 100.00 | Using where |
+----+-------------+--------------+------------+--------+------------------------------------------------------------------+----------------------------+---------+--------------------------------------------+--------+----------+--------------------------------------------------------------+
6 rows in set, 1 warning (0.01 sec)
장애 원인
## tb_test5 tb_test4 테이블 조인 시 인덱스를 못타서 쿼리가 비효율적으로 풀리는 현상
select
*
from tb_test5 tb_test5
inner join `tb_test4` tb_test4 use index(ix_tb_test4_tb_test5_id) on tb_test4.tb_test5_id = tb_test5.id
where
tb_test5.inventory_id = 530123
+----+-------------+--------------+------------+------+-----------------------------------------+----------------------------+---------+-------+----------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+------+-----------------------------------------+----------------------------+---------+-------+----------+----------+----------------------------------------------------+
| 1 | SIMPLE | tb_test5 | NULL | ref | PRIMARY,ix_sale_group_inventory_id | ix_sale_group_inventory_id | 8 | const | 1 | 100.00 | NULL |
| 1 | SIMPLE | tb_test4 | NULL | ALL | ix_tb_test4_tb_test5_id | NULL | NULL | NULL | 11609224 | 100.00 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+--------------+------------+------+-----------------------------------------+----------------------------+---------+-------+----------+----------+----------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
비효율 플랜 원인
mysql> show index from tb_test4;
+--------------+------------+-----------------------------------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+-----------------------------------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| tb_test4 | 0 | PRIMARY | 1 | id | A | 11609224 | NULL | NULL | | BTREE | | |
| tb_test4 | 0 | uix_tb_test4 | 1 | sku_id | A | 16611 | NULL | NULL | | BTREE | | |
| tb_test4 | 0 | uix_tb_test4 | 2 | crtd_cntr_id | A | 526440 | NULL | NULL | | BTREE | | |
| tb_test4 | 0 | uix_tb_test4 | 3 | rcv_date | A | 11340375 | NULL | NULL | | BTREE | | |
| tb_test4 | 0 | uix_tb_test4 | 4 | exp_dt | A | 11562687 | NULL | NULL | YES | BTREE | | |
| tb_test4 | 0 | uix_tb_test4 | 5 | mnf_dt | A | 11552975 | NULL | NULL | YES | BTREE | | |
| tb_test4 | 1 | fk_intr_sku_crtd_center | 1 | crtd_cntr_id | A | 56 | NULL | NULL | | BTREE | | |
| tb_test4 | 1 | ix_tb_test4_tb_test5_id | 1 | tb_test5_id | A | 1 | NULL | NULL | YES | BTREE | | |
+--------------+------------+-----------------------------------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
ix_tb_test4_tb_test5_id index의 cardinality 가 1로 optimizer는 해당 인덱스를 타는 것이 비효율적이라고 판단하여 fullscan을 하였음
장애 조치내용
alter table tb_test4 add key idx_tb_test4_tb_test5_id (tb_test5_id),algorithm=inplace,lock=none;
mysql> show index from tb_test4;
+--------------+------------+------------------------------------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+------------------------------------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| tb_test4 | 0 | PRIMARY | 1 | id | A | 11609224 | NULL | NULL | | BTREE | | |
| tb_test4 | 0 | uix_tb_test4 | 1 | sku_id | A | 16611 | NULL | NULL | | BTREE | | |
| tb_test4 | 0 | uix_tb_test4 | 2 | crtd_cntr_id | A | 526440 | NULL | NULL | | BTREE | | |
| tb_test4 | 0 | uix_tb_test4 | 3 | rcv_date | A | 11340374 | NULL | NULL | | BTREE | | |
| tb_test4 | 0 | uix_tb_test4 | 4 | exp_dt | A | 11562688 | NULL | NULL | YES | BTREE | | |
| tb_test4 | 0 | uix_tb_test4 | 5 | mnf_dt | A | 11552976 | NULL | NULL | YES | BTREE | | |
| tb_test4 | 1 | fk_intr_sku_crtd_center | 1 | crtd_cntr_id | A | 56 | NULL | NULL | | BTREE | | |
| tb_test4 | 1 | ix_tb_test4_tb_test5_id | 1 | tb_test5_id | A | 1 | NULL | NULL | YES | BTREE | | |
| tb_test4 | 1 | idx_tb_test4_tb_test5_id | 1 | tb_test5_id | A | 262408 | NULL | NULL | YES | BTREE | | |
+--------------+------------+------------------------------------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
=> tb_test5_id 컬럼을 인덱싱 하는 인덱스를 새로 생성하여 테이블 통계를 새로 반영함
index cardinality가 1 이었던 이유 ?
mysql> select count(*) from tb_test4 where tb_test5_id is null;
+----------+
| count(*) |
+----------+
| 11225307 |
+----------+
1 row in set (4.12 sec)
mysql> select count(*) from tb_test4 where tb_test5_id is not null;
+----------+
| count(*) |
+----------+
| 442238 |
+----------+
1 row in set (0.14 sec)
mysql> select 442238/11225307 * 100;
+-----------------------+
| 442238/11225307 * 100 |
+-----------------------+
| 3.9397 |
+-----------------------+
1 row in set (0.00 sec)
- tb_test4 문제 테이블의 컬럼과 인덱스는 12/20일날 추가되었음
- MySQL의 통계정보 갱신 정책은 테이블의 변경되는 row가 10% 혹은 6.25% 이상이어야 변경됨 (hard coding 되어있음)
- 1,000만건 테이블에 컬럼과 인덱스를 추가했을 땐 NULL 값으로 cardinality가 1로 index가 생성되었음 (정상)
- 그 이후 신규 데이터가 들어왔지만 44만건으로, 전체 테이블 중 4%정도밖에 되지 않기 때문에 인덱스의 통계가 갱신되지 않은 상태에서 문제가 되는 신규 쿼리가 유입됨
- 이후 9:43 분, 인덱스를 새로 생성하면서 통계가 갱신되어 이슈 해소
MySQL 통계 관련 코드
vi mysql-8.0.25/storage/innobase/row/row0mysql.cc 1100번째 라인
/** Updates the table modification counter and calculates new estimates
for table and index statistics if necessary. */
UNIV_INLINE
void row_update_statistics_if_needed(dict_table_t *table) /*!< in: table */
{
ib_uint64_t counter;
ib_uint64_t n_rows;
if (!table->stat_initialized) {
DBUG_EXECUTE_IF("test_upd_stats_if_needed_not_inited",
fprintf(stderr,
"test_upd_stats_if_needed_not_inited"
" was executed\n"););
return;
}
counter = table->stat_modified_counter++;
n_rows = dict_table_get_n_rows(table);
if (dict_stats_is_persistent_enabled(table)) {
if (counter > n_rows / 10 /* 10% */
&& dict_stats_auto_recalc_is_enabled(table)) {
dict_stats_recalc_pool_add(table);
table->stat_modified_counter = 0;
}
return;
}
/* Calculate new statistics if 1 / 16 of table has been modified
since the last time a statistics batch was run.
We calculate statistics at most every 16th round, since we may have
a counter table which is very small and updated very often. */
if (counter > 16 + n_rows / 16 /* 6.25% */) {
ut_ad(!mutex_own(&dict_sys->mutex));
/* this will reset table->stat_modified_counter to 0 */
dict_stats_update(table, DICT_STATS_RECALC_TRANSIENT);
}
}
관련 파라미터
- innodb_stats_persistent
- ON -> 테이블 변경 10% 발생시 통계 갱신
- optimizer가 사용할 통계를 disk에 저장하여 서버가 재시작되어도 통계정보를 지속적으로 유지할 수 있음
- (mysql.innodb_table_stats , mysql.innodb_index_stats)
- OFF -> 6.25% 이상 발생시 통계 갱신
- ON -> 테이블 변경 10% 발생시 통계 갱신