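I. Installing and configuring Prometheus
1. Download Prometheus from https://prometheus.io/download/. The exporters used later are downloaded from the same page.
2. In the extracted directory, open the configuration file prometheus.yml: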
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
This example has Prometheus scraping itself. Add further jobs as needed.
3. Start Prometheus
/home/soft/prometheus/prometheus --config.file="/home/soft/prometheus/prometheus.yml" &
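Adjust the paths to match your installation.
Prometheus listens on port 9090 by default. Open http://192.168.179.185:9090/ in a browser and go to Status -> Targets in the menu bar to check the scrape targets.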
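The Status -> Targets page is backed by the Prometheus HTTP API endpoint `/api/v1/targets`, so the same check can be scripted. Below is a minimal Python sketch. It assumes the server address used in this article (adjust `PROMETHEUS_URL`). `summarize_targets` and `fetch_targets` are illustrative helper names, not part of any library.

```python
import json
import urllib.request

# Assumption: the Prometheus address used throughout this article.
PROMETHEUS_URL = "http://192.168.179.185:9090"


def summarize_targets(payload: dict) -> list[tuple[str, str, str]]:
    """Return (job, instance, health) for every active scrape target."""
    result = []
    for target in payload["data"]["activeTargets"]:
        labels = target["labels"]
        result.append((labels.get("job", ""), labels.get("instance", ""), target["health"]))
    return result


def fetch_targets(base_url: str = PROMETHEUS_URL) -> dict:
    """GET /api/v1/targets, the same data shown on Status -> Targets."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)
```

For example, `summarize_targets(fetch_targets())` returns a list of tuples such as `("prometheus", "localhost:9090", "up")`. A target reported as `down` is exactly the kind that the `up == 0` alerting rule in the next step reacts to.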
4. Alerting rules
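In the Prometheus directory, create a directory named rule and a file node-alerts.yml inside it:
/home/soft/prometheus/rule/node-alerts.yml
Prometheus only loads rule files listed under rule_files in prometheus.yml, so also add an entry such as rule_files: ["rule/*.yml"] there. Relative paths are resolved against the directory that contains prometheus.yml. Then give node-alerts.yml the following content: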
groups:
  - name: server-metrics
    rules:
      - alert: InstanceDown
        # expr is the PromQL condition. up is 1 for every target whose last scrape succeeded and 0 otherwise,
        # so up == 0 catches any exporter instance that cannot be scraped.
        expr: up == 0
        # for: the alert stays "pending" until expr has been true continuously for this long;
        # only then does it become "firing" and get sent to Alertmanager.
        # Do not set for lower than scrape_interval. With scrape_interval 30s and for 15s, the check at 15s still
        # sees the previous (unchanged) sample, so once the rule triggers it will fire after the for period regardless.
        # The for wait applies only when the alert first becomes active.
        for: 30s
        # labels attaches extra labels to the alert.
        labels:
          severity: Disaster
        # annotations holds extra information that is not part of the alert's identity, typically used for display.
        annotations:
          summary: "Node unreachable"
      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 75
        for: 1m
        labels:
          user: prometheus
          severity: warning
          db: sql
        annotations:
          summary: "Server {{ $labels.instance }}: memory alert"
          description: "{{ $labels.instance }}: memory usage above 75% (current value: {{ $value }}%)"
          value: "{{ $value }}"
      - alert: HighCpuUsage
        expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[1m]))) * 100 > 80
        for: 1m
        labels:
          user: prometheus
          severity: warning
        annotations:
          summary: "Server {{ $labels.instance }}: CPU alert"
          description: "{{ $labels.instance }}: CPU usage above 80% (current value: {{ $value }}%)"
          value: "{{ $value }}"
      - alert: HighDiskUsage
        expr: (node_filesystem_size_bytes - node_filesystem_avail_bytes) / node_filesystem_size_bytes * 100 > 80
        for: 1m
        labels:
          user: prometheus
          severity: warning
        annotations:
          summary: "Server {{ $labels.instance }}: disk alert"
          description: "{{ $labels.instance }}: disk device {{ $labels.device }} usage above 80% (mount point: {{ $labels.mountpoint }}, current value: {{ $value }}%)"
          value: "{{ $value }}"
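  - name: database-metrics
    rules:
      - alert: MySQLSlowQueries
        # rate() is per second: fires above an average of 10 slow queries per second over 5 minutes
        expr: rate(mysql_global_status_slow_queries[5m]) > 10
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }}: slow query alert"
          description: "Server {{ $labels.instance }}: the MySQL slow query count exceeds the threshold; please investigate and optimize."
      - alert: MySQLTooManyConnections
        expr: mysql_global_status_threads_connected > 100
        for: 10m
        labels:
          severity: Critical
        annotations:
          summary: "Too many connections"
          description: "More than 100 MySQL connections; please handle promptly."
      - alert: MySQLRowLockWaits
        # Innodb_row_lock_waits counts every time an operation had to wait for a row lock,
        # which is a broader signal than deadlocks alone.
        expr: increase(mysql_global_status_innodb_row_lock_waits[5m]) > 0
        for: 5m
        labels:
          severity: Critical
        annotations:
          summary: "InnoDB row lock waits (possible deadlock)"
          description: "MySQL InnoDB row lock waits occurred in the last 5 minutes (lock contention or deadlock); please check immediately."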
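  - name: microservice-metrics
    rules:
      - alert: HikariConnectionsHigh
        # The pool label must match your pool name; Hikari's auto-generated name is usually HikariPool-1.
        expr: hikaricp_connections{pool="HikariPool"} > 10
        for: 5m
        labels:
          severity: Critical
        annotations:
          summary: "Too many Hikari connections"
          description: "Hikari connection pool {{ $labels.pool }} has more than 10 connections; please monitor and optimize."
      - alert: HttpRequestSlow
        # The quantile="0.95" series only exists when Micrometer client-side percentiles are enabled,
        # e.g. management.metrics.distribution.percentiles.http.server.requests: 0.95
        expr: http_server_requests_seconds{quantile="0.95"} > 1
        for: 5m
        labels:
          severity: Warning
        annotations:
          summary: "HTTP request handling too slow"
          description: "95th percentile HTTP request time above 1 second (current value: {{ $value }})"
      - alert: HttpRequestRateAbnormal
        # Micrometer's status label holds the actual code (200, 201, ...), so 2xx is matched with a regex.
        # rate() is per second.
        expr: rate(http_server_requests_seconds_count{method="GET", status=~"2.."}[1m]) > 100
        for: 5m
        labels:
          severity: Warning
        annotations:
          summary: "Abnormal endpoint request rate"
          description: "Successful GET requests exceed 100 per second; please check endpoint performance."
      - alert: JvmHeapUsageHigh
        expr: jvm_memory_used_bytes{area="heap"} / jvm_memory_max_bytes{area="heap"} > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "JVM heap usage alert"
          description: "JVM heap usage above 80%."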
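The memory, CPU and disk expressions above are plain arithmetic over node_exporter metrics. The Python sketch below repeats the same arithmetic, which makes it easy to sanity-check a threshold against values copied from the /metrics page. The function names are hypothetical and are not part of Prometheus.

```python
def memory_usage_percent(total: float, free: float, buffers: float, cached: float) -> float:
    """(MemTotal - (MemFree + Buffers + Cached)) / MemTotal * 100, as in the memory rule."""
    return (total - (free + buffers + cached)) / total * 100


def cpu_usage_percent(idle_rates: list[float]) -> float:
    """100 - avg(idle rate) * 100, as in the CPU rule.

    Each entry is irate(node_cpu_seconds_total{mode="idle"}) for one core:
    idle seconds per second, i.e. the idle fraction between 0 and 1.
    """
    return 100 - sum(idle_rates) / len(idle_rates) * 100


def disk_usage_percent(size: float, avail: float) -> float:
    """(size - avail) / size * 100, as in the disk rule."""
    return (size - avail) / size * 100
```

For instance, 16 GiB total with 2 GiB free, 1 GiB buffers and 3 GiB cache gives 62.5%, below the 75% memory threshold. Because the CPU formula works from the idle fraction, 80% usage corresponds to an average idle rate of 0.2. Where the kernel exports it, node_memory_MemAvailable_bytes is a common alternative to summing Free + Buffers + Cached.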
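Restart Prometheus, open http://192.168.179.185:9090/ in a browser and go to Alerts in the menu bar. Before restarting, you can validate the rule file with promtool, which ships in the same archive: ./promtool check rules rule/node-alerts.yml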
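The Alerts page shows each rule as inactive, pending or firing, which is the visible effect of the for clause. The Python sketch below is a simplified model of that behaviour. It ignores Prometheus internals such as resend timing and stale samples. `AlertState` and `evaluate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    state: str = "inactive"               # "inactive", "pending" or "firing"
    active_since: Optional[float] = None  # time at which expr first became true


def evaluate(prev: AlertState, now: float, condition: bool, for_seconds: float) -> AlertState:
    """One rule evaluation in a simplified model of the `for` clause."""
    if not condition:
        # expr is false: back to inactive (a firing alert resolves)
        return AlertState()
    # Keep the original start time for as long as the condition stays true
    since = prev.active_since if prev.active_since is not None else now
    if now - since >= for_seconds:
        return AlertState("firing", since)
    return AlertState("pending", since)
```

Take for: 30s with one evaluation every 15 seconds. A condition that is true at 0 s, 15 s and 30 s moves through pending, pending, firing. A single false evaluation resets the alert to inactive, and the next true evaluation starts a new pending period.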

5. Start/stop scripts
startup.sh
#!/bin/bash
/home/soft/prometheus/prometheus --config.file="/home/soft/prometheus/prometheus.yml" > /dev/null 2>&1 &
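stop.sh
#!/bin/bash
service_name=prometheus
# -x matches the process name exactly, so this script (whose path may contain
# "prometheus") is never matched, unlike pgrep -f
pids=$(pgrep -x "${service_name}")
if [ -n "${pids}" ]; then
  echo "Stopping ${service_name}..."
  # pgrep may print several PIDs; leave ${pids} unquoted so each is a separate argument
  kill -15 ${pids}
  # SIGTERM is asynchronous: wait up to 30 s for the process to exit
  for _ in $(seq 1 30); do
    pgrep -x "${service_name}" > /dev/null || break
    sleep 1
  done
  if pgrep -x "${service_name}" > /dev/null; then
    echo "${service_name} is still running."
  else
    echo "${service_name} has been stopped."
  fi
else
  echo "${service_name} is not running."
fi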
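startup.sh runs Prometheus in the background and returns immediately, so it can help to confirm that the server actually came up. Prometheus serves a readiness endpoint at `/-/ready`. Below is a Python sketch that polls it, assuming the address used in this article. `http_probe` and `wait_until_ready` are hypothetical helpers. The probe is passed in as a function so the polling logic can be tested without a running server.

```python
import time
import urllib.request
from typing import Callable

# Assumption: the Prometheus address used throughout this article.
READY_URL = "http://192.168.179.185:9090/-/ready"


def http_probe(url: str = READY_URL) -> bool:
    """True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError (e.g. 503) and connection errors
        return False


def wait_until_ready(probe: Callable[[], bool], timeout: float = 60.0,
                     interval: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `probe` until it succeeds or about `timeout` seconds of polling have passed."""
    waited = 0.0
    while waited <= timeout:
        if probe():
            return True
        sleep(interval)
        waited += interval
    return False
```

For example, running `wait_until_ready(http_probe, timeout=30)` right after startup.sh returns `False` if Prometheus did not become ready within about 30 seconds.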
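II. Installing and configuring Grafana
Download: https://grafana.com/grafana/download
Setup is simple: extract the archive, go to the bin directory and start it. Then open http://192.168.179.185:3000/ in a browser. The default username and password are both admin.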
nohup ./grafana-server > /dev/null 2>&1 &
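Configure the Prometheus data source: add a new data source of type Prometheus and set its URL to http://192.168.179.185:9090.
Leave the other settings at their defaults and save.
Import exporter dashboards: use Dashboards -> Import to load ready-made dashboards for the exporters below.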
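The data source can also be created through Grafana's HTTP API (`POST /api/datasources`), which is handy when setting up several Grafana instances. The Python sketch below only builds the request. It assumes the addresses in this article and the default admin/admin login. `build_datasource_request` is a hypothetical helper.

```python
import base64
import json
import urllib.request

# Assumptions: addresses from this article and Grafana's default admin/admin login.
GRAFANA_URL = "http://192.168.179.185:3000"
PROMETHEUS_URL = "http://192.168.179.185:9090"


def build_datasource_request(user: str = "admin", password: str = "admin") -> urllib.request.Request:
    """Build a POST /api/datasources request that registers Prometheus in Grafana."""
    body = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": PROMETHEUS_URL,
        "access": "proxy",  # Grafana's backend queries Prometheus on the browser's behalf
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{GRAFANA_URL}/api/datasources",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
```

Send it with `urllib.request.urlopen(build_datasource_request())`. If the admin password was changed at first login, pass the current one instead.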

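III. Using exporters
Two common exporters are covered here: node_exporter and mysqld_exporter.
1. node_exporter
Start it directly from the extracted directory (it listens on port 9100 by default):
/home/soft/exporter/node_exporter/node_exporter &
Add a job to Prometheus by editing /home/soft/prometheus/prometheus.yml, then restart Prometheus.
(To monitor more servers, start node_exporter on each of them and add the corresponding job.)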
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node_exporter"
    static_configs:
      - targets: ["192.168.179.185:9100"]
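To check that node_exporter is serving data, open http://192.168.179.185:9100/metrics or read it programmatically. Below is a minimal Python sketch of a parser for the text exposition format. It assumes simple lines of the form `name{labels} value` and keeps the label set as part of the key. For anything more demanding, the `prometheus_client` package ships a full parser (`prometheus_client.parser.text_string_to_metric_families`). `parse_samples` and `fetch_metrics` are hypothetical helpers.

```python
import urllib.request

# Assumption: the node_exporter address from the job above.
NODE_EXPORTER_URL = "http://192.168.179.185:9100/metrics"


def parse_samples(text: str) -> dict[str, float]:
    """Map 'name{labels}' (or bare 'name') to its value, skipping # HELP / # TYPE lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Split after the label set so spaces inside label values are kept
            end = line.rindex("}") + 1
            key, rest = line[:end], line[end:].split()
        else:
            key, *rest = line.split()
        samples[key] = float(rest[0])  # float() also accepts NaN and +Inf
    return samples


def fetch_metrics(url: str = NODE_EXPORTER_URL) -> str:
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

For example, `parse_samples(fetch_metrics())["node_memory_MemTotal_bytes"]` returns the total memory in bytes.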
2. mysqld_exporter
In the extracted directory, configure the my.cnf file (create it if it does not exist):
[client]
port=3306
user=exporter
password=123456
host=192.168.170.243
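Start mysqld_exporter. Note that it is configured here to listen on port 9104. To monitor several MySQL servers, run one exporter per server on different ports and add the corresponding jobs.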
/home/soft/exporter/mysqld_exporter/mysqld_exporter --web.listen-address=:9104 --config.my-cnf=/home/soft/exporter/mysqld_exporter/my.cnf &
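For the multi-instance setup described above (one mysqld_exporter per MySQL server, each on its own port), the per-instance my.cnf files, start commands and scrape targets follow a simple pattern. Below is a hypothetical Python helper that generates them. The second host address, the `my-<port>.cnf` file naming and the `CHANGE_ME` password are illustrative only.

```python
# Hypothetical helper: file layout, user name and hosts are illustrative assumptions.
BASE_DIR = "/home/soft/exporter/mysqld_exporter"


def exporter_plan(mysql_hosts: list[str], first_port: int = 9104,
                  exporter_host: str = "192.168.179.185") -> list[dict]:
    """One my.cnf, start command and scrape target per MySQL server, on consecutive ports."""
    plan = []
    for i, host in enumerate(mysql_hosts):
        port = first_port + i
        cnf_path = f"{BASE_DIR}/my-{port}.cnf"
        plan.append({
            "cnf_path": cnf_path,
            "cnf": f"[client]\nport=3306\nuser=exporter\npassword=CHANGE_ME\nhost={host}\n",
            "command": f"{BASE_DIR}/mysqld_exporter --web.listen-address=:{port} --config.my-cnf={cnf_path}",
            "target": f"{exporter_host}:{port}",
        })
    return plan


def scrape_job_yaml(plan: list[dict], job_name: str = "mysql_exporter") -> str:
    """A scrape_configs entry listing every exporter as a target of one job."""
    targets = ", ".join(f'"{p["target"]}"' for p in plan)
    return f'  - job_name: "{job_name}"\n    static_configs:\n      - targets: [{targets}]\n'
```

Putting all exporters under a single mysql_exporter job, as `scrape_job_yaml` does, is an alternative to one job per instance. The built-in `instance` label still tells the servers apart in queries and alerts.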
Add a job to Prometheus by editing /home/soft/prometheus/prometheus.yml, then restart Prometheus:
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "mysql_exporter"
    static_configs:
      - targets: ["192.168.179.185:9104"]
  - job_name: "node_exporter"
    static_configs:
      - targets: ["192.168.179.185:9100"]
3. Monitoring a Spring Boot microservice
Add the dependencies:
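<!-- Spring Boot Actuator: application monitoring endpoints -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<!-- Micrometer Prometheus registry: exposes /actuator/prometheus.
     Spring Boot's dependency management normally supplies this version;
     if you pin it, keep it compatible with your Spring Boot release. -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>1.9.0</version>
</dependency>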
Edit application.yml:
# Expose the Spring Boot Actuator endpoints, including /actuator/prometheus
management:
  endpoints:
    web:
      exposure:
        include: '*'
  endpoint:
    prometheus:
      enabled: true
    health:
      show-details: always
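The request-timeout rule in section I reads a client-side `quantile="0.95"` series. Micrometer only publishes that series when percentiles are configured (e.g. `management.metrics.distribution.percentiles.http.server.requests: 0.95`), and client-side percentiles cannot be aggregated across instances. An alternative is to publish histogram buckets with `management.metrics.distribution.percentiles-histogram.http.server.requests: true`. Prometheus then computes the quantile with `histogram_quantile(0.95, sum by (le) (rate(http_server_requests_seconds_bucket[5m])))`. The Python sketch below is a simplified mirror of the linear interpolation that `histogram_quantile` performs. The function is illustrative, not Prometheus code.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    The buckets mirror the `le` label and value of *_bucket series; the last bound must be +Inf.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    if len(buckets) < 2:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls into the +Inf bucket: return the highest finite bound
                return prev_bound
            # Assume observations are spread evenly inside the bucket
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return math.nan
```

With 100 requests split into cumulative buckets 0.1 s: 50, 0.5 s: 90, 1.0 s: 98 and +Inf: 100, the 95th percentile falls in the 0.5 to 1.0 s bucket. It is interpolated to 0.8125 s, so an alert on "> 1" would not fire.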
Add a job to Prometheus by editing /home/soft/prometheus/prometheus.yml, then restart Prometheus:
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "jb_service"
    scrape_interval: 1s
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ["192.168.179.185:28085"]
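Once the job is up, the service's `/actuator/prometheus` output should contain the metrics that the microservice alert rules in section I query. Below is a Python sketch that reports which of them are missing, assuming the service address from the job above. `metric_names` and `missing_metrics` are hypothetical helpers. The `http_server_requests_seconds_count` series only appears after the service has handled at least one HTTP request, and `hikaricp_*` metrics require a Hikari data source.

```python
import urllib.request

# Assumption: service address and actuator path from the jb_service job above.
ACTUATOR_URL = "http://192.168.179.185:28085/actuator/prometheus"

# Metric names queried by the microservice alert rules in section I.
REQUIRED = ("hikaricp_connections", "http_server_requests_seconds_count",
            "jvm_memory_used_bytes", "jvm_memory_max_bytes")


def metric_names(text: str) -> set[str]:
    """Collect metric names from exposition text (the name ends at '{' or the first space)."""
    names = set()
    for line in text.splitlines():
        if line and not line.startswith("#"):
            names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def missing_metrics(text: str, required=REQUIRED) -> list[str]:
    present = metric_names(text)
    return [name for name in required if name not in present]


def fetch(url: str = ACTUATOR_URL) -> str:
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

For example, `missing_metrics(fetch())` returning an empty list means every metric the rules rely on is being exported.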
