Prometheus: monitor domain SSL certificate expiry and push alerts to Slack and DingTalk

English version in the End

Install blackbox_exporter and enable it at boot

# Download the blackbox_exporter tarball
wget -O /opt/blackbox_exporter-0.25.0.linux-amd64.tar.gz https://github.com/prometheus/blackbox_exporter/releases/download/v0.25.0/blackbox_exporter-0.25.0.linux-amd64.tar.gz

# Extract the blackbox_exporter tarball
cd /opt
tar -zxvf blackbox_exporter-0.25.0.linux-amd64.tar.gz

# Create the blackbox_exporter systemd service file
cat > /etc/systemd/system/blackbox_exporter.service << EOF
[Unit]
Description=Prometheus Blackbox Exporter
After=network.target

[Service]
User=root
ExecStart=/opt/blackbox_exporter-0.25.0.linux-amd64/blackbox_exporter --config.file=/opt/blackbox_exporter-0.25.0.linux-amd64/blackbox.yml

[Install]
WantedBy=multi-user.target
EOF

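# Create the blackbox_exporter configuration file
# (the heredoc delimiter is quoted so the shell does not expand ${1} in the irc_banner module)
cat > /opt/blackbox_exporter-0.25.0.linux-amd64/blackbox.yml <<'EOF'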
modules:
  http_2xx:
    prober: http
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2"]
      valid_status_codes: [200]
      method: GET
      preferred_ip_protocol: "ip4"
  http_post_2xx:
    prober: http
    http:
      method: POST
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  grpc:
    prober: grpc
    grpc:
      tls: true
      preferred_ip_protocol: "ip4"
  grpc_plain:
    prober: grpc
    grpc:
      tls: false
      service: "service1"
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^SSH-2.0-"
      - send: "SSH-2.0-blackbox-ssh-check"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
      - send: "NICK prober"
      - send: "USER prober prober prober :prober"
      - expect: "PING :([^ ]+)"
        send: "PONG ${1}"
      - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
  icmp_ttl5:
    prober: icmp
    timeout: 5s
    icmp:
      ttl: 5
EOF

# Reload the systemd daemon
sudo systemctl daemon-reload

# Start the blackbox_exporter service
sudo systemctl start blackbox_exporter

# Check the blackbox_exporter service status
sudo systemctl status blackbox_exporter

# Enable blackbox_exporter at boot
sudo systemctl enable blackbox_exporter
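
Once the service is running, a probe can be requested by hand to confirm that the exporter returns certificate data. The helper below is only a minimal sketch: `metric_value` is a hypothetical name, and the example assumes the exporter listens on `localhost:9115` and that `https://domain1.com` is a placeholder target.

```shell
# Print the value of one metric from blackbox_exporter /probe output read on stdin.
# Comment lines (# HELP / # TYPE) are skipped because their first field is "#".
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# Example (assumed address and target):
#   curl -s 'http://localhost:9115/probe?module=http_2xx&target=https://domain1.com' \
#     | metric_value probe_ssl_earliest_cert_expiry
```

A `probe_success` value of 1 means the probe passed; `probe_ssl_earliest_cert_expiry` is the expiry time as a Unix timestamp.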

Add the Prometheus scrape job

172.30.171.60:9115 is the address blackbox_exporter listens on

rule_files:
  #- "/etc/prometheus/first_rules.yml"
  - "/etc/prometheus/ssl_cert_alerts.yml"
scrape_configs:
  - job_name: 'SSL证书监控'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - https://domain1.com
        - https://domain2.com
        - https://domain3.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 172.30.171.60:9115

Add the Prometheus alerting rules

vim ssl_cert_alerts.yml

groups:
- name: "SSL证书过期提醒"
  rules:
  - alert: "证书过期时间<30天"
    expr: probe_ssl_earliest_cert_expiry{job="SSL证书监控"} - time() < 86400 * 30
    for: 0s
    labels:
      severity: "提示"
    annotations:
      summary: "{{ $labels.instance }} SSL 证书将在30天后过期,请注意及时续期!"
      description: "{{ $labels.instance }} SSL 证书将在30天后过期,请注意及时续期!"
  - alert: "证书过期时间<7天"
    expr: probe_ssl_earliest_cert_expiry{job="SSL证书监控"} - time() < 86400 * 7
    for: 0s
    labels:
      severity: "告警"
    annotations:
      summary: "{{ $labels.instance }} SSL 证书将在7天后过期,请注意及时续期!"
      description: "{{ $labels.instance }} SSL 证书将在7天后过期,请注意及时续期!"
  - alert: "证书过期时间<1天"
    expr: probe_ssl_earliest_cert_expiry{job="SSL证书监控"} - time() < 86400 * 1
    for: 0s
    labels:
      severity: "灾难"
    annotations:
      summary: "{{ $labels.instance }} SSL 证书将在1天后过期,请注意及时续期!"
      description: "{{ $labels.instance }} SSL 证书将在1天后过期,请注意及时续期!"
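
Each rule subtracts the current time from the certificate's expiry timestamp and compares the remaining seconds with a threshold (86400 seconds per day). The sketch below repeats that arithmetic in shell; `cert_tier` and its tier names are made up for illustration, and it works on integer timestamps only. Unlike Prometheus, where all three rules fire together once less than one day remains, it prints only the most severe tier.

```shell
# Classify a certificate by remaining lifetime, using the same thresholds as the rules.
# $1: expiry as an integer Unix timestamp (probe_ssl_earliest_cert_expiry)
# $2: current time as an integer Unix timestamp (time() in PromQL)
cert_tier() {
  seconds_left=$(( $1 - $2 ))
  if [ "$seconds_left" -lt $(( 86400 * 1 )) ]; then
    echo "lt1d"
  elif [ "$seconds_left" -lt $(( 86400 * 7 )) ]; then
    echo "lt7d"
  elif [ "$seconds_left" -lt $(( 86400 * 30 )) ]; then
    echo "lt30d"
  else
    echo "ok"
  fi
}

# Example: classify a certificate that expires 10 days from now
#   cert_tier "$(( $(date +%s) + 86400 * 10 ))" "$(date +%s)"
```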

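Reload the Prometheus configuration

# The /-/reload endpoint only works when Prometheus is started with --web.enable-lifecycle
# With basic auth
curl -X POST -u user1:password http://localhost:9090/-/reload
# Without authentication
curl -X POST http://localhost:9090/-/reload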

Add the Alertmanager notification configuration

DingTalk alert webhook

docker run -d -p 8060:8060 \
--name webhook1  timonwong/prometheus-webhook-dingtalk:v1.4.0 \
--ding.profile="webhook1=https://oapi.dingtalk.com/robot/send?access_token=511cb5651xxxae2267xxxxx649c86c511579bf5dd7dbc" 

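Getting the Slack webhook URL
To get a Slack webhook URL, follow these steps:

  1. Log in to your Slack workspace.
  2. Go to the channel or direct message you want to add the webhook to.
  3. Click the settings icon in that channel or direct message (usually a gear or three dots).
  4. In the settings menu, choose "Automations", "Add-ons", or "Integrations" (the exact name may vary).
  5. On that page, look for the "Incoming WebHook" option.
  6. Click "Incoming WebHook", then choose "Add Incoming WebHook".
  7. The dialog that opens shows a unique webhook URL. This is your Slack webhook address.
  8. Copy the URL and store it in the application or service that needs to send messages to Slack.

Keep the webhook URL private. Do not publish it in public or untrusted places, or unauthorized people will be able to post messages to your Slack channel.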
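
Before wiring the URL into Alertmanager, it can be worth posting a test message by hand. Slack incoming webhooks accept a JSON body with a `text` field; the helper below is a small sketch that builds such a body. `slack_payload` and the `SLACK_WEBHOOK_URL` variable are hypothetical names, and only backslashes and double quotes are escaped.

```shell
# Build the JSON body for a Slack incoming webhook: {"text":"..."}.
# Escapes backslashes and double quotes; newlines and control characters are not handled.
slack_payload() {
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"text":"%s"}\n' "$escaped"
}

# Example (SLACK_WEBHOOK_URL is assumed to hold your own webhook address):
#   curl -X POST -H 'Content-type: application/json' \
#     --data "$(slack_payload 'test message from the monitoring host')" "$SLACK_WEBHOOK_URL"
```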

global:
    resolve_timeout: 5m
route:
    group_by: ['alertname']
    group_wait: 30s
    group_interval: 1m
    repeat_interval: 2m
    receiver: 'webhook1'
    routes:
    - receiver: "webhook1"
      continue: true
    - receiver: "slack"
receivers:
  - name: 'slack'
    slack_configs:
    - send_resolved: true
      username: 'Alertmanager'
      channel: '#运维'  # replace with your Slack channel
      #title: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}"
      text: "{{ range .Alerts }}{{ .Annotations.description }}\n{{ end }}"
      #text: "<!channel> \nsummary: {{ .CommonAnnotations.summary }}\ndescription: {{ .CommonAnnotations.description }}"
      api_url: 'https://hooks.slack.com/services/xxxxxxxx'  # replace with your Slack webhook URL
  - name: 'webhook1'
    webhook_configs:
      - url: 'http://172.30.171.61:8060/dingtalk/webhook1/send'
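        send_resolved: true     # also send a notification when the alert resolves
#    Note on inhibition (inhibit_rules, not configured here):
#    when an already-sent alert matches the target_match / target_match_re rules,
#    a new alert matches source_match or the defined matchers,
#    and the labels listed under equal are identical in both alerts,
#    inhibition takes effect and the new alert is not sent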
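
To check that both receivers deliver without waiting for a certificate to approach expiry, a hand-made alert can be posted to Alertmanager's v2 API. This is only a sketch: `test_alert_json` is a hypothetical helper, the label values are invented, and the example assumes Alertmanager listens on its default port at `localhost:9093`.

```shell
# Print a one-element alert list in the format accepted by POST /api/v2/alerts.
# $1: value for the instance label (also used in the description annotation)
test_alert_json() {
  printf '[{"labels":{"alertname":"TestAlert","severity":"test","instance":"%s"},' "$1"
  printf '"annotations":{"description":"%s test alert sent by hand"}}]\n' "$1"
}

# Example (assumed Alertmanager address):
#   curl -s -X POST -H 'Content-Type: application/json' \
#     -d "$(test_alert_json https://domain1.com)" http://localhost:9093/api/v2/alerts
```

With the routing above, the test alert should appear in both DingTalk and Slack after `group_wait` has elapsed.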

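Alertmanager rule details

Single vs. dual notification

I work in mainland China, where reaching Slack requires a stable proxy connection, so I added DingTalk as a second notification channel

    # Single receiver
    receiver: 'webhook1'
    # Dual receivers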
    receiver: 'webhook1'
    routes:
    - receiver: "webhook1"
      continue: true
    - receiver: "slack"

Slack notification display issue

Without the text option, Slack shows no details, only the following

[FIRING:4] 证书过期时间<30天 (SSL证书监控 提示)

After adding

text: "{{ range .Alerts }}{{ .Annotations.description }}\n{{ end }}"

each alert's description is shown, and clicking the title opens Alertmanager

[FIRING:1] 证书过期时间<30天 (https://xxx.com SSL证书监控 提示)
https://xxx.com SSL 证书将在30天后过期,请注意及时续期!

Official notification template examples: https://prometheus.io/docs/alerting/latest/notification_examples/

grafana Dashboard

Dashboards: https://grafana.com/grafana/dashboards/

Dashboard ID (SSL certificate monitoring): 13230

Dashboard ID (HTTP status monitoring): 13659

Dashboard ID (SSL, TCP and HTTP monitoring): 9965

I use 9965

Reference articles:
https://blog.csdn.net/IT_ZRS/article/details/129812920
https://blog.csdn.net/IT_ZRS/article/details/129860080

English version

Install blackbox_exporter and enable it at boot

# Download the blackbox_exporter tarball
wget -O /opt/blackbox_exporter-0.25.0.linux-amd64.tar.gz https://github.com/prometheus/blackbox_exporter/releases/download/v0.25.0/blackbox_exporter-0.25.0.linux-amd64.tar.gz

# Extract the blackbox_exporter tarball
cd /opt
tar -zxvf blackbox_exporter-0.25.0.linux-amd64.tar.gz

# Create the blackbox_exporter systemd service file
cat > /etc/systemd/system/blackbox_exporter.service << EOF
[Unit]
Description=Prometheus Blackbox Exporter
After=network.target

[Service]
User=root
ExecStart=/opt/blackbox_exporter-0.25.0.linux-amd64/blackbox_exporter --config.file=/opt/blackbox_exporter-0.25.0.linux-amd64/blackbox.yml

[Install]
WantedBy=multi-user.target
EOF

# Create the blackbox_exporter configuration file
# (the heredoc delimiter is quoted so the shell does not expand ${1} in the irc_banner module)
cat > /opt/blackbox_exporter-0.25.0.linux-amd64/blackbox.yml <<'EOF'
modules:
  http_2xx:
    prober: http
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2"]
      valid_status_codes: [200]
      method: GET
      preferred_ip_protocol: "ip4"
  http_post_2xx:
    prober: http
    http:
      method: POST
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  grpc:
    prober: grpc
    grpc:
      tls: true
      preferred_ip_protocol: "ip4"
  grpc_plain:
    prober: grpc
    grpc:
      tls: false
      service: "service1"
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^SSH-2.0-"
      - send: "SSH-2.0-blackbox-ssh-check"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
      - send: "NICK prober"
      - send: "USER prober prober prober :prober"
      - expect: "PING :([^ ]+)"
        send: "PONG ${1}"
      - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
  icmp_ttl5:
    prober: icmp
    timeout: 5s
    icmp:
      ttl: 5
EOF

# Reload systemd daemon
sudo systemctl daemon-reload

# Start the blackbox_exporter service
sudo systemctl start blackbox_exporter

# Check the blackbox_exporter service status
sudo systemctl status blackbox_exporter

# Enable blackbox_exporter at boot
sudo systemctl enable blackbox_exporter

Add the Prometheus scrape job

172.30.171.60:9115 is the address blackbox_exporter listens on

rule_files:
  #- "/etc/prometheus/first_rules.yml"
  - "/etc/prometheus/ssl_cert_alerts.yml"
scrape_configs:
  - job_name: 'SSL Certificate monitoring'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - https://domain1.com
        - https://domain2.com
        - https://domain3.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 172.30.171.60:9115

Add the Prometheus alerting rules

vim ssl_cert_alerts.yml

groups:
- name: "SSL certificate expiration reminder"
  rules:
  - alert: "Certificate expiration <30 days"
    # the job label must match job_name in the scrape configuration exactly (case-sensitive)
    expr: probe_ssl_earliest_cert_expiry{job="SSL Certificate monitoring"} - time() < 86400 * 30
    for: 0s
    labels:
      severity: "Prompt"
    annotations:
      summary: "{{ $labels.instance }} SSL certificate will expire in 30 days. Please renew it in time!"
      description: "{{ $labels.instance }} SSL certificate will expire in 30 days. Please renew it in time!"
  - alert: "Certificate expiration <7 days"
    expr: probe_ssl_earliest_cert_expiry{job="SSL Certificate monitoring"} - time() < 86400 * 7
    for: 0s
    labels:
      severity: "Alarm"
    annotations:
      summary: "{{ $labels.instance }} SSL certificate will expire in 7 days. Please renew it in time!"
      description: "{{ $labels.instance }} SSL certificate will expire in 7 days. Please renew it in time!"
  - alert: "Certificate expiration <1 day"
    expr: probe_ssl_earliest_cert_expiry{job="SSL Certificate monitoring"} - time() < 86400 * 1
    for: 0s
    labels:
      severity: "disaster"
    annotations:
      summary: "{{ $labels.instance }} SSL certificate will expire in 1 day. Please renew it in time!"
      description: "{{ $labels.instance }} SSL certificate will expire in 1 day. Please renew it in time!"

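Reload the Prometheus configuration

# The /-/reload endpoint only works when Prometheus is started with --web.enable-lifecycle
# With basic auth
curl -X POST -u user1:password http://localhost:9090/-/reload
# Without authentication
curl -X POST http://localhost:9090/-/reload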

Add the Alertmanager notification configuration

DingTalk alert webhook

docker run -d -p 8060:8060 \
--name webhook1 timonwong/prometheus-webhook-dingtalk:v1.4.0 \
--ding.profile="webhook1=https://oapi.dingtalk.com/robot/send?access_token=511cb5651xxxae2267xxxxx649c86c511579bf5dd7dbc"

Getting the Slack webhook URL
To get a Slack webhook URL, follow these steps:

  1. Log in to your Slack workspace.
  2. Go to the channel or direct message you want to add the webhook to.
  3. Click the settings icon in that channel or direct message (usually a gear or three dots).
  4. In the settings menu, choose "Automations", "Add-ons", or "Integrations" (the exact name may vary).
  5. On that page, look for the "Incoming WebHook" option.
  6. Click "Incoming WebHook", then choose "Add Incoming WebHook".
  7. The dialog that opens shows a unique webhook URL. This is your Slack webhook address.
  8. Copy the URL and store it in the application or service that needs to send messages to Slack.

Keep the webhook URL private. Do not publish it in public or untrusted places, or unauthorized people will be able to post messages to your Slack channel.

global:
    resolve_timeout: 5m
route:
    group_by: ['alertname']
    group_wait: 30s
    group_interval: 1m
    repeat_interval: 2m
    receiver: 'webhook1'
    routes:
    - receiver: "webhook1"
      continue: true
    - receiver: "slack"
receivers:
  - name: 'slack'
    slack_configs:
    - send_resolved: true
      username: 'Alertmanager'
      channel: '#operations'  # replace with your Slack channel
      #title: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}"
      text: "{{ range .Alerts }}{{ .Annotations.description }}\n{{ end }}"
      #text: "<!channel> \nsummary: {{ .CommonAnnotations.summary }}\ndescription: {{ .CommonAnnotations.description }}"
      api_url: 'https://hooks.slack.com/services/xxxxxxxx'  # replace with your Slack webhook URL
  - name: 'webhook1'
    webhook_configs:
      - url: 'http://172.30.171.61:8060/dingtalk/webhook1/send'
        send_resolved: true     # also send a notification when the alert resolves
#    Note on inhibition (inhibit_rules, not configured here):
#    when an already-sent alert matches the target_match / target_match_re rules,
#    a new alert matches source_match or the defined matchers,
#    and the labels listed under equal are identical in both alerts,
#    inhibition takes effect and the new alert is not sent

Alertmanager rule details

Single vs. dual notification

I work in mainland China, where reaching Slack requires a stable proxy connection, so I added DingTalk as a second notification channel.

    # Single receiver
    receiver: 'webhook1'
    # Dual receivers
    receiver: 'webhook1'
    routes:
    - receiver: "webhook1"
      continue: true
    - receiver: "slack"

Slack notification display issue

Without the text option, Slack shows no details, only the following

[FIRING:4] Certificate expiration <30 days (SSL Certificate monitoring Prompt)

After adding

text: "{{ range .Alerts }}{{ .Annotations.description }}\n{{ end }}"

each alert's description is shown, and clicking the title opens Alertmanager

[FIRING:1] Certificate expiration <30 days (https://xxx.com SSL Certificate monitoring Prompt)
https://xxx.com SSL certificate will expire in 30 days. Please renew it in time!

Official notification template examples: https://prometheus.io/docs/alerting/latest/notification_examples/

grafana Dashboard

Dashboards: https://grafana.com/grafana/dashboards/

Dashboard ID (SSL certificate monitoring): 13230

Dashboard ID (HTTP status monitoring): 13659

Dashboard ID (SSL, TCP and HTTP monitoring): 9965

I use 9965
