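1. Introduction
This article describes how Blackbox Exporter, a monitoring component of a TiDB cluster, works and how to configure it. Blackbox Exporter is an official Prometheus exporter that probes network services over several protocols, including HTTP, HTTPS, DNS, TCP, and ICMP. Through these probes it can measure key indicators such as network latency, service availability, and certificate validity. Blackbox Exporter runs as a standalone application alongside the Prometheus server. It is written in Go, a compiled language known for its efficiency, and has a modular design that allows it to be extended later with new protocols and endpoints.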
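Blackbox Exporter use cases
- HTTP test: define request headers and validate the HTTP status, HTTP response headers, and HTTP body content
- TCP test: check the listening state of service component ports, and define and check application-layer protocols
- ICMP test: host liveness check
- POST test: API connectivity
- SSL certificate expiration time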
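As an illustration of the certificate use case, Blackbox Exporter exposes the earliest certificate expiry of a TLS probe as a Unix timestamp in the probe_ssl_earliest_cert_expiry metric. The sketch below shows how such a timestamp could be turned into "days remaining". The helper name days_until_expiry is hypothetical and is not part of the original setup.

```python
import time
from typing import Optional


def days_until_expiry(expiry_unix_seconds: float, now: Optional[float] = None) -> float:
    """Days remaining before a certificate expires (negative if already expired).

    expiry_unix_seconds is assumed to be the value of
    probe_ssl_earliest_cert_expiry, i.e. a Unix timestamp in seconds.
    """
    if now is None:
        now = time.time()
    return (expiry_unix_seconds - now) / 86400.0
```

A value below a chosen threshold, for example 30 days, would be a reasonable trigger for a renewal reminder.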
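2. Blackbox Exporter architecture
Blackbox Exporter works by probing endpoints and returning metrics based on the probe results. It runs as a standalone program that exposes its probing capability as an HTTP service, so a probe can be triggered by calling its interface with curl, for example: curl http://10.2.103.54:9115/probe?target=10.2.103.129\&module=icmp. The returned metrics show whether the probed target is healthy. Blackbox Exporter is usually integrated with Prometheus: Prometheus is configured with the target endpoint addresses, the probe module, and the probe frequency, which provides continuous monitoring of external services.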
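The manual curl check described above can also be scripted. The following is a minimal Python sketch under stated assumptions, not the exporter's own client: it builds the /probe URL for a target and module, and reads probe_success out of the text the exporter returns. The function names build_probe_url, parse_probe_success, and probe are hypothetical helpers introduced only for illustration.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def build_probe_url(exporter: str, target: str, module: str) -> str:
    """Return the /probe URL for one target and module (hypothetical helper)."""
    query = urlencode({"target": target, "module": module})
    return f"http://{exporter}/probe?{query}"


def parse_probe_success(metrics_text: str) -> Optional[bool]:
    """Return True/False from the probe_success sample, or None if it is absent."""
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "probe_success":
            return float(parts[1]) == 1.0
    return None


def probe(exporter: str, target: str, module: str, timeout: float = 10.0) -> Optional[bool]:
    """Call the exporter over HTTP and report whether the probe succeeded."""
    url = build_probe_url(exporter, target, module)
    with urlopen(url, timeout=timeout) as resp:
        return parse_probe_success(resp.read().decode("utf-8"))
```

With the addresses used in this article, probe("10.2.103.54:9115", "10.2.103.129", "icmp") would issue the same request as the curl command and return True when the exporter reports probe_success 1.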
3. Blackbox Exporter deployment
3.1 Download and install
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.22.0/blackbox_exporter-0.22.0.linux-amd64.tar.gz
tar zxvf blackbox_exporter-0.22.0.linux-amd64.tar.gz
cp blackbox_exporter-0.22.0.linux-amd64/blackbox_exporter /usr/local/bin
cp blackbox_exporter-0.22.0.linux-amd64/blackbox.yml /etc/
blackbox_exporter --version
blackbox_exporter, version 0.22.0 (branch: HEAD, revision: 0bbd65d1264722f7afb87a72ec4128b9214e5840)
build user: root@4d81de342d10
build date: 20220802-13:56:00
go version: go1.18.5
platform: linux/amd64
3.2 Managing the service with systemctl
- Prepare the service unit file
vim /usr/lib/systemd/system/blackbox_exporter.service
[Unit]
Description=blackbox_exporter
After=network.target
[Service]
User=root
Type=simple
ExecStart=/usr/local/bin/blackbox_exporter --config.file=/etc/blackbox.yml
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
[Install]
WantedBy=multi-user.target
- Start the service
systemctl start blackbox_exporter && systemctl enable blackbox_exporter
ps -ef |grep blackbox_exporter
- View the Blackbox Exporter service over HTTP
HTTP access test (blackbox_exporter listens on port 9115 by default)
4. Prometheus blackbox_exporter configuration
- Using the ICMP prober
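- job_name: "blackbox_exporter_10.2.103.54:9115_icmp"
  scrape_interval: 6s
  metrics_path: /probe
  params:
    module: [icmp]
  static_configs:
    - targets:
        - '10.2.103.54'
        - '10.2.103.162'
        - '10.2.103.74'
        - '10.2.103.125'
        - '10.2.103.44'
        - '10.2.103.42'
        - '10.2.103.78'
  # relabel_configs added to mirror the http-status job further down: without it,
  # Prometheus would scrape the targets directly instead of the exporter.
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 10.2.103.54:9115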
curl http://10.2.103.54:9115/probe?target=10.2.103.129\&module=icmp
# TYPE probe_success gauge
probe_success 1
- TCP port status
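- job_name: "monitor_port_probe"
  scrape_interval: 30s
  metrics_path: /probe
  params:
    module: [tcp_connect]
  static_configs:
    - targets:
        - '10.2.103.78:3000'
      labels:
        group: 'grafana'
    - targets:
        - '10.2.103.54:9100'
        - '10.2.103.162:9100'
        - '10.2.103.74:9100'
        - '10.2.103.125:9100'
        - '10.2.103.44:9100'
        - '10.2.103.42:9100'
        - '10.2.103.78:9100'
      labels:
        group: 'node_exporter'
  # relabel_configs added to mirror the http-status job further down, so that
  # the scrape is sent to the exporter and the target is passed as ?target=.
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 10.2.103.54:9115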
curl http://10.2.103.54:9115/probe?target=10.2.103.78:3000\&module=tcp_connect
# TYPE probe_success gauge
probe_success 1
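Conceptually, the tcp_connect probe above reports whether a TCP connection to the target port can be opened. The Python sketch below is only an analogy for that check, useful for a quick test from another host. It is not how the exporter is implemented, and the helper name tcp_port_open is hypothetical.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For instance, tcp_port_open("10.2.103.78", 3000) would be expected to agree with the probe_success value returned for the Grafana target.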
- Using the HTTP prober
Test the Grafana web page
- job_name: http-status
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
    - targets:
        - http://10.2.103.78:3000
      labels:
        group: web
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 10.2.103.54:9115
curl http://10.2.103.54:9115/probe?target=http://\&module=http_2xx
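The relabel_configs in the http-status job are what make Prometheus scrape the exporter rather than the target itself. The sketch below mimics the three rules in plain Python for a single target, to show which label ends up where. It is a simplified model, not Prometheus code, and apply_blackbox_relabel is a hypothetical name.

```python
from typing import Dict


def apply_blackbox_relabel(target: str, exporter_addr: str) -> Dict[str, str]:
    """Mimic the three relabel rules of the http-status job for one static target."""
    # Prometheus starts with the configured target in __address__.
    labels = {"__address__": target}
    # Rule 1: copy __address__ into the ?target= URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Rule 2: keep the probed address as the instance label.
    labels["instance"] = labels["__param_target"]
    # Rule 3: send the actual scrape to the Blackbox Exporter.
    labels["__address__"] = exporter_addr
    return labels
```

Calling apply_blackbox_relabel("http://10.2.103.78:3000", "10.2.103.54:9115") yields an __address__ of 10.2.103.54:9115, while instance and __param_target both keep the Grafana URL, so dashboards and alerts still show the probed service.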
Reload Prometheus to apply the new configuration.
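5. Alert settings
Whether an ICMP, TCP, HTTP, or POST probe is healthy can be seen from the probe_success metric.
probe_success == 0 ## connectivity failure
probe_success == 1 ## connectivity OK
Configure an alert for the TiDB port. The rule matches series labeled group="tidb", so it assumes the TCP probe job contains a target group with that label.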
- name: alert.rules
  rules:
    - alert: TiDB_server_is_down
      expr: probe_success{group="tidb"} == 0
      for: 1m
      labels:
        env: tidb-v6
        level: emergency
        expr: probe_success{group="tidb"} == 0
      annotations:
        description: 'cluster: tidb-v6, instance: {{ $labels.instance }}'
        value: '{{ $value }}'
        summary: TiDB server is down
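The "for: 1m" clause means the alert fires only after the expression has stayed true for a full minute. Until then it is shown as pending. The Python sketch below models that behavior on a list of (timestamp, probe_success) samples. It is a simplification that treats every sample as one rule evaluation, and alert_states is a hypothetical name.

```python
from typing import List, Optional, Tuple


def alert_states(samples: List[Tuple[float, float]], hold_seconds: float = 60.0) -> List[str]:
    """Return inactive/pending/firing for each (timestamp, probe_success) sample."""
    states: List[str] = []
    failing_since: Optional[float] = None  # when the current failure streak began
    for ts, value in samples:
        if value == 0:
            if failing_since is None:
                failing_since = ts
            # Fire only once the failure has lasted at least hold_seconds.
            state = "firing" if ts - failing_since >= hold_seconds else "pending"
        else:
            # Any successful probe resets the streak.
            failing_since = None
            state = "inactive"
        states.append(state)
    return states
```

For samples taken every 30 seconds where the probe fails from t=30 onward, the alert is pending at t=30 and t=60 and starts firing at t=90, which filters out short blips.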
Check the TiDB_server_is_down alert on the Prometheus -> Alerts page.