Setup
- Uses the Prometheus and Alertmanager from the prometheus-operator installed via Helm
- A new PrometheusRule object for Redis is created and wired into Prometheus
- The PrometheusRule has to be tied to the Helm chart and its Pods so that newly created or deleted Pods are picked up automatically (see the sketch right after this list)
- A Slack API URL is used as the webhook target
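A minimal sketch of what that chart integration could look like, assuming the rule is shipped as a template inside the redis-cluster chart; the file path and the use of .Release.Name are illustrative assumptions, not part of the chart used here:

templates/prometheusrule.yaml (hypothetical):
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: {{ .Release.Name }}-redis-cluster
  namespace: {{ .Release.Namespace }}
  labels:
    release: monitoring          # must match the Prometheus ruleSelector (see below)
spec:
  groups:
  - name: redis-cluster
    rules:
    - alert: RedisDown
      expr: redis_up{service="{{ .Release.Name }}-redis-cluster-metrics"} == 0
      for: 1s
      labels:
        severity: error

Because the expression selects on the metrics Service rather than on individual Pod names, Pods added or removed by the chart show up (or disappear) as redis_up series automatically.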
Creating the PrometheusRule
- vi redis-rule.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  annotations:
    meta.helm.sh/release-name: kimdubi-test
    meta.helm.sh/release-namespace: default
  labels:
    app: prometheus-operator
    app.kubernetes.io/instance: kimdubi-test
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: redis-cluster
    helm.sh/chart: redis-cluster-4.3.1
    release: monitoring
  name: kimdubi-test-redis-cluster
  namespace: default
spec:
  groups:
  - name: redis-cluster
    rules:
    - alert: RedisDown
      annotations:
        description: Redis(TM) instance {{$labels.pod}} is down.
        summary: Redis(TM) instance {{$labels.pod}} is down
      expr: redis_up{service="kimdubi-test-redis-cluster-metrics"} == 0
      for: 1s
      labels:
        severity: error
    - alert: RedisMemoryHigh
      annotations:
        description: Redis(TM) instance {{$labels.pod}} is using {{ $value }} of its available memory.
        summary: Redis(TM) instance {{$labels.pod}} is using too much memory
      expr: |
        redis_memory_used_bytes{service="kimdubi-test-redis-cluster-metrics"} * 100 / redis_memory_max_bytes{service="kimdubi-test-redis-cluster-metrics"} > 90
      for: 2m
      labels:
        severity: error
- Create the rule
$ kubectl apply -f redis-rule.yaml
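A quick sanity check that the object exists and carries the label Prometheus selects on (plain kubectl, names as in the manifest above):

$ kubectl get prometheusrules -n default kimdubi-test-redis-cluster --show-labels
$ kubectl get prometheusrules -n default kimdubi-test-redis-cluster -o jsonpath='{.metadata.labels.release}'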
Integrating with Alertmanager
- Check the Prometheus configuration
$ kubectl get prometheus -n monitoring
NAME                                     VERSION   REPLICAS   AGE
monitoring-prometheus-oper-prometheus    v2.18.2   1          9m55s
$ kubectl describe prometheus monitoring-prometheus-oper-prometheus -n monitoring
.
.
.
Rule Namespace Selector:
Rule Selector:
  Match Labels:
    App:      prometheus-operator
    Release:  monitoring
.
.
.
=> Through its Rule Selector, Prometheus only evaluates PrometheusRule objects that carry the release: monitoring label.
The newly created PrometheusRule above therefore also needs the release: monitoring label, otherwise its alerts are never checked.
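If a rule is missing that label, it can be listed and labeled after the fact; a generic kubectl sketch using the rule created above:

$ kubectl get prometheusrules -A -l release=monitoring
$ kubectl label prometheusrule kimdubi-test-redis-cluster -n default release=monitoring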
- Edit the Alertmanager service
$ kubectl edit service/monitoring-prometheus-oper-alertmanager -n monitoring
spec:
  clusterIP: 10.254.88.241
  externalTrafficPolicy: Cluster
  ports:
  - name: web
    nodePort: 30903
    port: 9093
    protocol: TCP
    targetPort: 9093
  selector:
    alertmanager: monitoring-prometheus-oper-alertmanager
    app: alertmanager
  sessionAffinity: None
  type: NodePort
=> Change type from ClusterIP to NodePort and map port 9093 to node port 30903
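The same change can be made non-interactively with kubectl patch instead of kubectl edit (a sketch with the port/nodePort values above; the analogous command with 9090/30900 covers the Prometheus service in the next step):

$ kubectl patch service monitoring-prometheus-oper-alertmanager -n monitoring \
    -p '{"spec":{"type":"NodePort","ports":[{"name":"web","port":9093,"nodePort":30903}]}}'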
- Edit the Prometheus service
$ kubectl edit service/monitoring-prometheus-oper-prometheus -n monitoring
spec:
  clusterIP: 10.254.41.25
  externalTrafficPolicy: Cluster
  ports:
  - name: web
    nodePort: 30900
    port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app: prometheus
    prometheus: monitoring-prometheus-oper-prometheus
  sessionAffinity: None
  type: NodePort
=> Change type from ClusterIP to NodePort and map port 9090 to node port 30900
- Verify the modified services
$ kubectl get service -n monitoring
NAME                                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-operated                      ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   2m14s
service/monitoring-grafana                         NodePort    10.254.217.6     <none>        80:31000/TCP                 2m19s
service/monitoring-kube-state-metrics              ClusterIP   10.254.89.125    <none>        8080/TCP                     2m19s
service/monitoring-prometheus-node-exporter        ClusterIP   10.254.233.58    <none>        9100/TCP                     2m19s
service/monitoring-prometheus-oper-alertmanager    NodePort    10.254.165.227   <none>        9093:30903/TCP               2m19s
service/monitoring-prometheus-oper-operator        ClusterIP   10.254.239.202   <none>        8080/TCP,443/TCP             2m19s
service/monitoring-prometheus-oper-prometheus      NodePort    10.254.41.25     <none>        9090:30900/TCP               2m19s
service/prometheus-operated                        ClusterIP   None             <none>        9090/TCP                     2m4s
=> grafana, alertmanager, and prometheus have been switched from ClusterIP to NodePort and are now exposed on external ports
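With the NodePorts in place, basic reachability can be checked from outside the cluster; <node-or-fip> stands for whatever address the node ports are reachable on in this environment:

$ curl -s http://<node-or-fip>:30900/-/healthy    # Prometheus
$ curl -s http://<node-or-fip>:30903/-/healthy    # Alertmanager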
Verifying alerts
The UIs can be reached via the FIP mapped in 토클 and the external ports mapped through NodePort.
(screenshot: Prometheus Rules page)
(screenshot: Prometheus Alerts page)
=> To confirm that the alert actually fires, the expression was temporarily changed to redis_up == 1. It works as expected.
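The firing alert can also be confirmed from the command line using the ports mapped above (same <node-or-fip> placeholder):

$ curl -s http://<node-or-fip>:30900/api/v1/alerts | grep -i RedisDown     # Prometheus
$ curl -s http://<node-or-fip>:30903/api/v2/alerts | grep -i RedisDown     # Alertmanager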
Slack webhook integration
- vi <prometheus-operator Helm chart location>/values.yaml
config:
  global:
    slack_api_url: 'https://hooks.slack.com/services/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'
  route:
    receiver: 'redis-team'
    group_by: ['alertname']
    group_wait: 0s
    group_interval: 30s
    routes:
    - receiver: 'redis-team'
      group_wait: 0s
  receivers:
  - name: 'redis-team'
    slack_configs:
    - channel: '#kimdubi'
      text: '{{ template "custom_title" . }}{{- "\n" -}}{{ template "custom_slack_message" . }}'
  templates:
  - /alertmanager/template.tmpl
templateFiles:
  template.tmpl: |-
    {{ define "custom_title" }}
    {{- if (eq .Status "firing") -}}
    {{- printf "*Triggered: %s (%s)*\n" .CommonAnnotations.triggered .CommonAnnotations.identifier -}}
    {{- else if (eq .Status "resolved") -}}
    {{- printf "*Recovered: %s (%s)*\n" .CommonAnnotations.resolved .CommonAnnotations.identifier -}}
    {{- else -}}
    {{- printf "Unknown status reported: %s\n" .CommonAnnotations.triggered -}}
    {{- end -}}
    {{ end }}
    {{ define "custom_slack_message" }}
    {{- if gt (len .Alerts.Firing) 0 -}}
    {{- range .Alerts.Firing -}}
    {{- printf "[alerts] : %s\n" .Annotations.summary -}}
    {{- end -}}
    {{- end -}}
    {{ end }}
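For the new values to take effect, the release has to be upgraded so the operator regenerates the Alertmanager configuration; the chart reference below (stable/prometheus-operator) is an assumption based on the setup described above. The Slack webhook itself can be smoke-tested with a plain curl POST, independent of Alertmanager:

$ helm upgrade monitoring stable/prometheus-operator -n monitoring -f values.yaml
$ curl -X POST -H 'Content-type: application/json' --data '{"text":"webhook test"}' <slack_api_url above>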