Prometheus Alertmanager: Alert Routing and Cluster High Availability Configuration
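How do you configure Prometheus Alertmanager for smart alert routing (for example, sending alerts of different severities to different recipients) and for cluster high availability (multiple instances plus the gossip protocol)? Please give a complete alertmanager.yml that includes inhibit_rules (alert inhibition, to avoid redundant alerts), route (routing rules that dispatch by severity), and receivers (Email/DingTalk/Webhook). Also explain how Mesh communication works in an Alertmanager cluster.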
Answer
编译有声
Alertmanager alert routing and high availability configuration:
1. Complete alertmanager.yml:
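```yaml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts'
  smtp_auth_password: '****'

# Inhibition rules: suppress redundant alerts
# (matchers syntax requires Alertmanager >= 0.22; older versions use source_match/target_match/match)
inhibit_rules:
  - source_matchers:
      - severity="critical"        # while a critical alert is firing...
    target_matchers:
      - severity="warning"         # ...mute the warning with the same alertname on the same instance
    equal: ['instance', 'alertname']
  - source_matchers:
      - alertname="NodeDown"       # while the node is down...
    target_matchers:
      - alertname="HighCpuUsage"   # ...its CPU alert adds no information
    equal: ['instance']

# Routing tree: child routes are evaluated in order and the first match wins (continue: false by default)
route:
  receiver: 'default'
  group_by: ['alertname']          # without group_by, all alerts on a route form one single group
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # Specific routes must come first: a FlinkCheckpointFailed alert labeled
    # severity=critical/warning would otherwise be captured by the severity routes below
    - matchers:
        - alertname="FlinkCheckpointFailed"
      receiver: 'flink-oncall'
    - matchers:
        - severity="critical"
      receiver: 'sre-team'
      repeat_interval: 1h          # re-notify critical alerts every hour
    - matchers:
        - severity="warning"
      receiver: 'data-team'

# Receivers
receivers:
  - name: 'default'
    webhook_configs:
      - url: 'http://webhook/alert'
  - name: 'sre-team'
    webhook_configs:
      # DingTalk robots do not accept Alertmanager's webhook JSON format, so send to a bridge
      # such as prometheus-webhook-dingtalk (host name is a placeholder); the bridge is configured
      # with https://oapi.dingtalk.com/robot/send?access_token=xxx and reformats the message
      - url: 'http://dingtalk-bridge:8060/dingtalk/webhook1/send'
        send_resolved: true
    email_configs:
      - to: 'sre@example.com'
  - name: 'data-team'
    email_configs:
      - to: 'data-team@example.com'
  - name: 'flink-oncall'           # referenced by the route above; an undefined receiver fails config loading
    webhook_configs:
      - url: 'http://flink-oncall-webhook/alert'   # placeholder URL
```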
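To see why the order of child routes matters and what the two inhibition rules actually mute, here is a simplified Python model of the routing tree and inhibit_rules, with the FlinkCheckpointFailed route placed first. It is a sketch under stated assumptions, not Alertmanager's Go implementation: only `=` matchers are modeled, there is a single routing level with `continue: false`, and the alert label sets are hypothetical.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]

# Child routes in evaluation order: (equality matchers, receiver).
# Only "=" matchers are modeled; regex matchers and "continue: true" are not.
ROUTES: List[Tuple[Labels, str]] = [
    ({"alertname": "FlinkCheckpointFailed"}, "flink-oncall"),
    ({"severity": "critical"}, "sre-team"),
    ({"severity": "warning"}, "data-team"),
]
DEFAULT_RECEIVER = "default"

# Inhibition rules: (source matchers, target matchers, labels that must be equal)
INHIBIT_RULES: List[Tuple[Labels, Labels, List[str]]] = [
    ({"severity": "critical"}, {"severity": "warning"}, ["instance", "alertname"]),
    ({"alertname": "NodeDown"}, {"alertname": "HighCpuUsage"}, ["instance"]),
]


def matches(labels: Labels, matchers: Labels) -> bool:
    """True if every matcher label is present with exactly that value."""
    return all(labels.get(k) == v for k, v in matchers.items())


def route(labels: Labels) -> str:
    """The first matching child route wins; otherwise use the root receiver."""
    for matchers, receiver in ROUTES:
        if matches(labels, matchers):
            return receiver
    return DEFAULT_RECEIVER


def is_inhibited(target: Labels, firing: List[Labels]) -> bool:
    """Muted if another firing alert matches a rule's source side and agrees
    with the target on every 'equal' label (missing on both sides counts as equal)."""
    for src_m, tgt_m, equal in INHIBIT_RULES:
        if not matches(target, tgt_m):
            continue
        for source in firing:
            if source is target or not matches(source, src_m):
                continue
            if all(source.get(name) == target.get(name) for name in equal):
                return True
    return False


def dispatch(firing: List[Labels]) -> List[Tuple[str, str]]:
    """(alertname, receiver) for every alert that is not inhibited."""
    return [(a["alertname"], route(a)) for a in firing if not is_inhibited(a, firing)]


if __name__ == "__main__":
    firing = [
        {"alertname": "NodeDown", "instance": "node1", "severity": "critical"},
        {"alertname": "HighCpuUsage", "instance": "node1", "severity": "warning"},
        {"alertname": "FlinkCheckpointFailed", "instance": "jm1", "severity": "critical"},
        {"alertname": "DiskFull", "instance": "node2", "severity": "warning"},
    ]
    print(dispatch(firing))
    # [('NodeDown', 'sre-team'), ('FlinkCheckpointFailed', 'flink-oncall'), ('DiskFull', 'data-team')]
```

HighCpuUsage on node1 is muted by the NodeDown rule. If the Flink route were listed after the severity routes, the critical FlinkCheckpointFailed alert would go to sre-team instead, because the first matching child route wins.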
2. Alertmanager cluster high availability (Mesh/Gossip):
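```bash
# Three Alertmanager instances (am1, am2, am3) form a gossip cluster.
# Start each one with the other two as peers; --cluster.peer is repeated
# once per peer rather than taking a comma-separated list. On am1:
./alertmanager --config.file=/etc/alertmanager/alertmanager.yml \
  --cluster.listen-address=0.0.0.0:9094 \
  --cluster.peer=am2:9094 \
  --cluster.peer=am3:9094
```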
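Cluster behavior:
- Transport: the name "Mesh" comes from the Weaveworks Mesh library used before v0.15; since v0.15, peers communicate through HashiCorp memberlist, a gossip protocol, on the cluster port (9094, TCP and UDP).
- What is gossiped: cluster state, namely silences and the notification log (which alert group was sent to which receiver, and when). Alerts themselves are not replicated between peers, which is why Prometheus must send every alert to every Alertmanager.
- Deduplication: when a group is due, each peer waits (its position in the sorted member list) × `--cluster.peer-timeout` (default 15s) and then checks the notification log. If a peer ahead of it has already notified, it skips sending, so normally only one notification goes out.
- Fault tolerance: because every instance already holds the alerts, if am1 is down the next peer in line sends the notification after its wait. The design favors availability: during a network partition each side may notify, so duplicates are possible.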
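Alertmanager deduplicates across peers by having each peer wait (position in the sorted member list) × peer timeout and then consult the gossiped notification log before sending. The Python sketch below models that for a single due notification. Assumptions: failed peers have already been removed from the member list, a log entry reaches the other peers a fixed `gossip_delay_s` after it is written, and the peer names are hypothetical; the real memberlist-based Go implementation is more involved.

```python
from typing import Dict, List, Optional, Set

PEER_TIMEOUT_S = 15.0  # default of --cluster.peer-timeout


def simulate(peers: List[str], alive: Set[str],
             gossip_delay_s: float) -> Dict[str, Optional[float]]:
    """For one due notification, return each live peer's send time in seconds
    after the flush, or None if it skipped because it already saw a log entry.

    Assumptions: failed peers have already left the member list, and a
    notification-log entry reaches other peers gossip_delay_s after sending.
    """
    members = sorted(p for p in peers if p in alive)  # position = index here
    first_sent: Optional[float] = None
    result: Dict[str, Optional[float]] = {}
    for position, peer in enumerate(members):
        send_at = position * PEER_TIMEOUT_S  # wait position x peer timeout
        if first_sent is not None and first_sent + gossip_delay_s <= send_at:
            result[peer] = None  # an earlier peer's entry arrived in time: skip
        else:
            result[peer] = send_at  # nothing seen yet: send
            if first_sent is None:
                first_sent = send_at
    return result


if __name__ == "__main__":
    peers = ["am1", "am2", "am3"]
    print(simulate(peers, {"am1", "am2", "am3"}, gossip_delay_s=1.0))
    # {'am1': 0.0, 'am2': None, 'am3': None}
    print(simulate(peers, {"am2", "am3"}, gossip_delay_s=1.0))
    # {'am2': 0.0, 'am3': None}
    print(simulate(peers, {"am1", "am2", "am3"}, gossip_delay_s=float("inf")))
    # {'am1': 0.0, 'am2': 15.0, 'am3': 30.0}  (partition: duplicates)
```

The three runs show the healthy case (one notification), a failed first peer (the next one takes over), and a partition where gossip never arrives (every side notifies, trading duplicates for availability).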
3. Integrating Prometheus with Alertmanager:
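```yaml
# prometheus.yml
rule_files:
  - '/etc/prometheus/alerts/*.yml'

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            # Prometheus sends every alert to ALL listed Alertmanagers (no load balancing);
            # the cluster then deduplicates notifications. Do not put a load balancer in between.
            - 'am1:9093'
            - 'am2:9093'
            - 'am3:9093'
```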
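Because alerts are not gossiped, the sender has to fan out: Prometheus delivers each alert batch to every configured Alertmanager instead of load-balancing across them. The Python sketch below illustrates that fan-out against Alertmanager's v2 endpoint `POST /api/v2/alerts`, which takes a JSON array of alerts with a `labels` object. It is only an illustration: the helper names `http_post` and `fan_out` are hypothetical, and Prometheus's real notifier (written in Go) has its own queueing and periodic re-sending.

```python
import json
import urllib.request
from typing import Callable, Dict, List

ALERTMANAGERS = ["http://am1:9093", "http://am2:9093", "http://am3:9093"]


def http_post(url: str, body: bytes) -> int:
    """POST a JSON body and return the HTTP status code."""
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


def fan_out(alerts: List[dict], targets: List[str],
            post: Callable[[str, bytes], int] = http_post) -> Dict[str, object]:
    """Send the same alert batch to every Alertmanager (no load balancing).
    A failure on one target does not stop delivery to the others."""
    body = json.dumps(alerts).encode("utf-8")
    results: Dict[str, object] = {}
    for base in targets:
        try:
            results[base] = post(base + "/api/v2/alerts", body)
        except OSError as exc:  # URLError, HTTPError and timeouts are OSError subclasses
            results[base] = exc
    return results


if __name__ == "__main__":
    alerts = [{"labels": {"alertname": "NodeDown", "instance": "node1", "severity": "critical"},
               "annotations": {"summary": "node1 is unreachable"}}]
    for target, outcome in fan_out(alerts, ALERTMANAGERS).items():
        print(target, outcome)
```

With one instance down, the other two still receive the alert, which is what lets the cluster keep notifying.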
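4. Grouping and timing parameters:

| Parameter | Meaning | Suggested value |
|------|------|--------|
| group_wait | How long a new group waits before its first notification, so that alerts arriving together are batched | 30s |
| group_interval | Minimum wait before notifying about new alerts added to a group that has already been notified | 5m |
| repeat_interval | How long to wait before re-sending a notification for alerts that are still firing and unchanged | 4h (warning) / 1h (critical) |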
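To make the three intervals concrete, the Python sketch below computes when a single alert group would notify under a simplified model (an assumption, not Alertmanager's exact code): the group flushes at group_wait and then every group_interval, and a flush sends a notification if the group changed since the last notification or if at least repeat_interval has passed. Real timing can shift a repeat by one group_interval depending on exact timestamps.

```python
from typing import List, Optional


def notification_times(group_wait: int, group_interval: int, repeat_interval: int,
                       changes: List[int], horizon: int) -> List[int]:
    """Seconds (after the first alert at t=0) at which the group notifies.
    changes: times at which additional alerts join the group."""
    sent: List[int] = []
    last_sent: Optional[int] = None
    t = group_wait  # first flush
    while t <= horizon:
        changed = last_sent is None or any(last_sent < c <= t for c in changes)
        due = last_sent is not None and t - last_sent >= repeat_interval
        if changed or due:
            sent.append(t)
            last_sent = t
        t += group_interval  # subsequent flushes
    return sent


if __name__ == "__main__":
    # warning route: 30s / 5m / 4h, a second alert joins at t=2m, watch for 5h
    print(notification_times(30, 300, 4 * 3600, changes=[120], horizon=5 * 3600))
    # [30, 330, 14730]
    # critical route: repeat_interval 1h, no new alerts, watch for ~2h05m
    print(notification_times(30, 300, 3600, changes=[], horizon=7500))
    # [30, 3630, 7230]
```

In the warning case, the first notification goes out after group_wait (30s), the alert that joined at 2m is reported at the next group_interval tick (5m30s), and the unchanged group is re-sent about 4h later.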