Global configuration
Reference (official docs): https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config
global:
  # How frequently to scrape targets by default.
  [ scrape_interval: <duration> | default = 1m ]
  # How long until a scrape request times out.
  [ scrape_timeout: <duration> | default = 10s ]
  # How frequently to evaluate rules.
  [ evaluation_interval: <duration> | default = 1m ]
  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    [ <labelname>: <labelvalue> ... ]
# Rule files specifies a list of globs. Rules and alerts are read from
# all matching files.
rule_files:
  [ - <filepath_glob> ... ]
# A list of scrape configurations.
scrape_configs:
  [ - <scrape_config> ... ]
# Alerting specifies settings related to the Alertmanager.
alerting:
  alert_relabel_configs:
    [ - <relabel_config> ... ]
  alertmanagers:
    [ - <alertmanager_config> ... ]
# Settings related to the remote write feature.
remote_write:
  [ - <remote_write> ... ]
# Settings related to the remote read feature.
remote_read:
  [ - <remote_read> ... ]
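To make the defaults above concrete, here is a minimal Python sketch of how one job's effective scrape interval and timeout could be derived from the global section. The helper names (parse_duration, effective_scrape_settings) are hypothetical, only single-unit durations such as 30s or 1m are parsed, and capping an inherited timeout at the job's interval is an assumption of this sketch, not something stated above.

```python
import re

# Defaults from the global section above.
GLOBAL_DEFAULTS = {"scrape_interval": "1m", "scrape_timeout": "10s", "evaluation_interval": "1m"}

# Unit multipliers in seconds; this sketch accepts only single-unit durations like "30s".
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> float:
    """Convert a duration such as '30s' or '1m' into seconds (simplified parser)."""
    m = re.fullmatch(r"(\d+)(ms|s|m|h|d)", text)
    if m is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def effective_scrape_settings(global_cfg: dict, job_cfg: dict) -> tuple[float, float]:
    """Return (scrape_interval, scrape_timeout) in seconds for one scrape job."""
    merged = {**GLOBAL_DEFAULTS, **global_cfg}
    # A job without its own scrape_interval inherits the global one.
    interval = parse_duration(job_cfg.get("scrape_interval", merged["scrape_interval"]))
    if "scrape_timeout" in job_cfg:
        timeout = parse_duration(job_cfg["scrape_timeout"])
        if timeout > interval:
            raise ValueError("scrape_timeout must not be greater than scrape_interval")
    else:
        # Assumption of this sketch: an inherited timeout is capped at the job's interval.
        timeout = min(parse_duration(merged["scrape_timeout"]), interval)
    return interval, timeout
```

With the job shown in the next section (scrape_interval: 30s, no scrape_timeout), effective_scrape_settings({}, {"scrape_interval": "30s"}) gives a 30 s interval and the global 10 s timeout.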
Changing metric labels
Labels can be rewritten at three points: before scraping, after scraping, and when alerting.
- Before scraping: relabel_configs
- After scraping, before the samples are written to storage: metric_relabel_configs
- Before alerts are sent to Alertmanager: alert_relabel_configs
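The three hook points can be pictured as a small pipeline. The Python sketch below is written under loose assumptions: each stage is a hypothetical callable that returns a new label set, or None to drop it, and details such as honor_labels conflict handling and the default instance label are left out.

```python
from typing import Callable, Optional

Labels = dict[str, str]
# A relabeling stage receives a label set and returns a new one, or None to drop it.
Stage = Callable[[Labels], Optional[Labels]]


def scrape_and_store(target: Labels, scrape: Callable[[Labels], list[Labels]],
                     relabel_configs: Stage, metric_relabel_configs: Stage) -> list[Labels]:
    # 1. relabel_configs: runs on the discovered target before anything is scraped.
    relabeled = relabel_configs(dict(target))
    if relabeled is None:
        return []  # the target was dropped, so it is never scraped
    # Labels starting with "__" (such as __meta_*) are not attached to stored series.
    target_labels = {k: v for k, v in relabeled.items() if not k.startswith("__")}
    stored = []
    for series in scrape(relabeled):
        # 2. metric_relabel_configs: runs on every scraped series before it is stored.
        result = metric_relabel_configs({**series, **target_labels})
        if result is not None:
            stored.append(result)
    return stored


def send_alert(alert: Labels, alert_relabel_configs: Stage) -> Optional[Labels]:
    # 3. alert_relabel_configs: runs on an alert just before it goes to Alertmanager.
    return alert_relabel_configs(dict(alert))
```

The point of the sketch is the ordering: a target dropped in stage 1 costs no scrape at all, while stage 2 can only discard series that were already fetched.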
Job configuration
- job_name: prometheus
  honor_labels: false
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - monitoring
  scrape_interval: 30s
  relabel_configs:
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_prometheus
    regex: k8s
  - source_labels:
    - __meta_kubernetes_endpoint_address_target_kind
    - __meta_kubernetes_endpoint_address_target_name
    separator: ;
    regex: Pod;(.*)
    replacement: ${1}
    target_label: pod
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
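kubernetes_sd_configs: automatically discovers Kubernetes nodes, services, pods, endpoints, and ingresses and adds them as monitoring targets. See the official documentation for details, including the meaning of every __meta_kubernetes_* label.
endpoints: targets are discovered through Endpoints objects. Kubernetes creates a matching Endpoints object for each Service that has a selector, so the endpoints role reaches every pod behind that Service.
- The rule below tells Prometheus to scrape an endpoint only if its Service carries the label prometheus=k8s. This is why we later create a Service for Prometheus and give it the label prometheus: k8s.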
- action: keep
  source_labels:
  - __meta_kubernetes_service_label_prometheus
  regex: k8s
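- The next rule works as follows. Suppose __meta_kubernetes_endpoint_address_target_kind is Pod and __meta_kubernetes_endpoint_address_target_name is prometheus-0. Joined with the separator ;, the two values become Pod;prometheus-0. The regular expression Pod;(.*) matches this string, ${1} refers to the first capture group (prometheus-0), and because the rule's action defaults to replace, that value is written to the pod label. Every metric scraped from this endpoint therefore gets the label pod=prometheus-0. If the target kind is not Pod, the regex does not match and no label is added.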
- source_labels:
  - __meta_kubernetes_endpoint_address_target_kind
  - __meta_kubernetes_endpoint_address_target_name
  separator: ;
  regex: Pod;(.*)
  replacement: ${1}
  target_label: pod
- The job does not set a URL path, so Prometheus scrapes the default path /metrics (the default value of metrics_path).
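The three relabel rules of this job can be simulated to check what they do to a discovered endpoint. The following Python sketch is a simplified relabeling engine, not Prometheus's own code: it supports only the keep and replace actions, translates only the ${N} replacement form, and uses Python's re module where Prometheus uses Go's RE2 syntax (equivalent for the simple patterns here). The helper names and the sample address are hypothetical.

```python
import re
from typing import Optional

Labels = dict[str, str]


def _source_value(labels: Labels, rule: dict) -> str:
    # Missing source labels contribute an empty string; values are joined with the separator.
    sep = rule.get("separator", ";")
    return sep.join(labels.get(name, "") for name in rule.get("source_labels", []))


def relabel(labels: Labels, rules: list[dict]) -> Optional[Labels]:
    """Apply 'keep' and 'replace' rules in order; return None if the target is dropped."""
    out = dict(labels)
    for rule in rules:
        # Relabel regexes are anchored at both ends, hence fullmatch; the default regex is (.*).
        match = re.fullmatch(rule.get("regex", "(.*)"), _source_value(out, rule))
        action = rule.get("action", "replace")
        if action == "keep":
            if match is None:
                return None  # no match: the target is dropped and never scraped
        elif action == "replace":
            if match is None:
                continue  # no match: labels stay untouched
            # Translate ${1} (only the braced form) into Python's \g<1> before expanding.
            template = re.sub(r"\$\{(\w+)\}", r"\\g<\1>", rule.get("replacement", "${1}"))
            value = match.expand(template)
            if value:
                out[rule["target_label"]] = value
            else:
                out.pop(rule["target_label"], None)  # an empty result removes the label
        else:
            raise ValueError(f"action not covered by this sketch: {action}")
    return out


def scrape_url(labels: Labels) -> str:
    # Without scheme or metrics_path in the job, the defaults http and /metrics apply.
    scheme = labels.get("__scheme__", "http")
    path = labels.get("__metrics_path__", "/metrics")
    return f"{scheme}://{labels['__address__']}{path}"


# The relabel_configs of the prometheus job above, written as Python dicts.
JOB_RULES = [
    {"action": "keep", "source_labels": ["__meta_kubernetes_service_label_prometheus"], "regex": "k8s"},
    {"source_labels": ["__meta_kubernetes_endpoint_address_target_kind",
                       "__meta_kubernetes_endpoint_address_target_name"],
     "separator": ";", "regex": "Pod;(.*)", "replacement": "${1}", "target_label": "pod"},
    {"source_labels": ["__meta_kubernetes_namespace"], "target_label": "namespace"},
]

if __name__ == "__main__":
    endpoint = {
        "__address__": "10.244.1.7:9090",  # hypothetical pod IP and port
        "__meta_kubernetes_service_label_prometheus": "k8s",
        "__meta_kubernetes_endpoint_address_target_kind": "Pod",
        "__meta_kubernetes_endpoint_address_target_name": "prometheus-0",
        "__meta_kubernetes_namespace": "monitoring",
    }
    result = relabel(endpoint, JOB_RULES)
    print(result["pod"], result["namespace"], scrape_url(result))
    # prints: prometheus-0 monitoring http://10.244.1.7:9090/metrics
```

Changing the Service label to anything other than exactly k8s (for example k8s-dev) makes relabel return None, because the anchored regex no longer matches.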
Defining alerting rules
groups:
- name: example
  rules:
  - alert: HighRequestLatency
    expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
    for: 10m
    labels:
      severity: page
    annotations:
      summary: High request latency
Reference (official docs): https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
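- for: Prometheus waits until the condition in expr has held continuously for 10 minutes before the alert fires. Until then the alert is in the pending state.
- labels: a set of additional labels to attach to the alert. Any existing conflicting labels are overwritten. Label values can be templated.
- annotations: a set of informational labels for longer additional information, such as an alert description or a runbook link. Annotation values can be templated.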
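The for clause behaves like a small state machine. Below is a minimal Python sketch, assuming one rule evaluation per evaluation_interval and ignoring how resolved alerts are reported to Alertmanager; the function alert_states is hypothetical.

```python
from typing import Optional


def alert_states(evaluations: list[tuple[int, bool]], for_seconds: int) -> list[tuple[int, str]]:
    """Map each (timestamp in seconds, condition holds?) to inactive/pending/firing."""
    active_since: Optional[int] = None
    states = []
    for ts, condition_true in evaluations:
        if not condition_true:
            # The condition stopped holding: any pending or firing state is reset.
            active_since = None
            states.append((ts, "inactive"))
            continue
        if active_since is None:
            active_since = ts  # first evaluation at which the condition holds
        # Fire only once the condition has held for at least the for duration.
        states.append((ts, "firing" if ts - active_since >= for_seconds else "pending"))
    return states


# Evaluations every 60 s (the default evaluation_interval) with for: 10m = 600 s.
evals = [(t, True) for t in range(0, 660, 60)] + [(660, False)]
for ts, state in alert_states(evals, 600):
    print(ts, state)
```

In this sketch the alert stays pending from t=0 through t=540, fires at t=600, and returns to inactive at t=660. A single false evaluation in the middle would restart the 10-minute wait.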
Templating
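Label and annotation values can be templated using console templates. The $labels variable holds the label key/value pairs of an alert instance. Configured external labels can be accessed via the $externalLabels variable. The $value variable holds the evaluated value of an alert instance.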
# To insert a firing element's label values:
{{ $labels.<labelname> }}
# To insert the numeric expression value of the firing element:
{{ $value }}
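Prometheus renders these templates with Go's text/template. Purely to illustrate what $labels and $value resolve to, here is a toy Python substitute that handles only {{ $labels.<labelname> }} and {{ $value }}. The function render is hypothetical, it is not the real template engine, and its float formatting may differ from Go's.

```python
import re


def render(template: str, labels: dict[str, str], value: float) -> str:
    """Expand {{ $labels.<name> }} and {{ $value }} only; other expressions are rejected."""
    def substitute(m: re.Match) -> str:
        expr = m.group(1)
        if expr == "$value":
            # Python's general float formatting: 1.5 -> "1.5", 2.0 -> "2".
            return format(value, "g")
        if expr.startswith("$labels."):
            # A missing label renders as an empty string in this sketch.
            return labels.get(expr[len("$labels."):], "")
        raise ValueError(f"expression not supported by this sketch: {expr}")

    return re.sub(r"\{\{\s*(\S+?)\s*\}\}", substitute, template)


alert_labels = {"instance": "10.0.0.5:9100", "job": "node"}  # hypothetical alert instance
print(render("{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.",
             alert_labels, 0))
# prints: 10.0.0.5:9100 of job node has been down for more than 5 minutes.
```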
Example:
groups:
- name: example
  rules:
  # Alert for any instance that is unreachable for >5 minutes.
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
  # Alert for any instance that has a median request latency >1s.
  - alert: APIHighRequestLatency
    expr: api_http_request_latencies_second{quantile="0.5"} > 1
    for: 10m
    annotations:
      summary: "High request latency on {{ $labels.instance }}"
      description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"
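Each series returned by expr becomes its own alert instance, and the rule's labels are merged on top of the series labels, overwriting conflicts. Here is a hedged Python sketch of that for the InstanceDown rule. The sample up series and the helper names (filter_equals, build_alerts) are hypothetical, and dropping the metric name before adding alertname is an assumption of this sketch.

```python
Labels = dict[str, str]
Vector = list[tuple[Labels, float]]


def filter_equals(vector: Vector, threshold: float) -> Vector:
    """Mimic a filtering comparison such as up == 0: keep series whose value equals threshold."""
    return [(series, value) for series, value in vector if value == threshold]


def build_alerts(rule_name: str, rule_labels: Labels, vector: Vector) -> Vector:
    """Turn every series in the result vector into one alert instance."""
    alerts = []
    for series, value in vector:
        # Assumption of this sketch: the metric name is not carried into the alert labels.
        labels = {k: v for k, v in series.items() if k != "__name__"}
        # The rule's labels overwrite conflicting series labels.
        labels.update(rule_labels)
        labels["alertname"] = rule_name
        alerts.append((labels, value))
    return alerts


up = [
    ({"__name__": "up", "job": "node", "instance": "10.0.0.5:9100"}, 1.0),
    ({"__name__": "up", "job": "node", "instance": "10.0.0.6:9100"}, 0.0),
]
for labels, value in build_alerts("InstanceDown", {"severity": "page"}, filter_equals(up, 0)):
    print(labels, value)
```

Only the instance whose up value is 0 produces an alert. After the for: 5m wait, its annotations would be rendered with that alert's own $labels.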