Alertmanager is a custom resource type (CRD) that is added by default when prometheus-operator is installed, so we can create such a resource directly in K8s.
Create alert-test.yaml:
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  generation: 1
  labels:
    app: prometheus-operator-alertmanager
    chart: prometheus-operator-8.2.4
    heritage: Tiller
    release: prometheus-operator
  name: prometheus-operator-alertmanager-test
  namespace: monitoring
spec:
  baseImage: quay.io/prometheus/alertmanager
  version: v0.19.0
  portName: web
  replicas: 1
  retention: 120h
  routePrefix: /
  serviceAccountName: prometheus-operator-alertmanager
  affinity:
    # The required and preferred rules share one podAntiAffinity key;
    # repeating the key is invalid YAML.
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app: alertmanager
              alertmanager: prometheus-operator-alertmanager-test
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            topologyKey: kubernetes.io/hostname
            labelSelector:
              matchLabels:
                app: alertmanager
                alertmanager: prometheus-operator-alertmanager-test
  storage:
    volumeClaimTemplate:
      selector: {}
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
        storageClassName: nfs-client
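Brief notes on the configuration above:
- All configuration options can be found in stable/prometheus-operator/templates/alertmanager/alertmanager.yaml. As described in the earlier post, the prometheus-operator environment was installed with helm, so you can run helm fetch stable/prometheus-operator to download the whole chart locally and use it as a reference. The helm installation also deploys one Alertmanager by default, using the same alertmanager.yaml template.
- kind: set to Alertmanager.
- metadata.name: the name of this Alertmanager. Existing ones can be listed with:
  kubectl get alertmanager -n monitoring
- spec.baseImage / spec.version: set these explicitly. Otherwise the default image version may differ from the one used by the helm installation, the image has to be pulled again, and the deployment becomes very slow.
- spec.storage: the storage for the newly deployed Alertmanager. Setting it is recommended.
- spec.affinity: needs some labels. An Alertmanager object is essentially a StatefulSet, and the Service you create for it later selects its pods by the same labels.
- spec.portName: the name of the port. It is needed later when associating with Prometheus. Keeping the default is recommended.
- metadata.namespace: the namespace. It is needed later when associating with Prometheus. Keeping the default is recommended.
- spec.routePrefix: the path prefix. It is needed later when associating with Prometheus. Keeping the default is recommended.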
Create the Alertmanager from the command line or the dashboard:
kubectl create -f alert-test.yaml
Note:
If you create it at this point, the pod will fail to start with an error similar to: MountVolume.SetUp failed for volume "config-volume" : secrets "alertmanager-XXXX-xX" not found
Cause (reference: https://yunlzheng.gitbook.io/prometheus-book/part-iii-prometheus-shi-zhan/operator/use-operator-manage-monitor):
Prometheus Operator creates the Alertmanager instance as a StatefulSet. By default it looks up a Secret named alertmanager-{ALERTMANAGER_NAME} and mounts the contents of that Secret into the Alertmanager instance as its configuration files. The configuration therefore has to be created for the Alertmanager beforehand.
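Following the Alertmanager configuration in the earlier post, we create alertmanager.yaml and template_1.tmpl, then create the Secret from the command line. The Secret name must follow the format alertmanager-{ALERTMANAGER_NAME}. For example, the Alertmanager above is named prometheus-operator-alertmanager-test, so the Secret is named alertmanager-prometheus-operator-alertmanager-test.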
kubectl create secret generic alertmanager-prometheus-operator-alertmanager-test -n monitoring --from-file=alertmanager.yaml --from-file=template_1.tmpl
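The naming rule can be sketched as a small shell helper. This is only an illustration: alertmanager_secret_name is a hypothetical function, not part of kubectl or the operator, and the script prints the kubectl command rather than running it so that it works without a cluster.

```shell
#!/bin/sh
# alertmanager_secret_name is a hypothetical helper illustrating the
# alertmanager-{ALERTMANAGER_NAME} naming rule described above.
alertmanager_secret_name() {
  printf 'alertmanager-%s\n' "$1"
}

ALERTMANAGER_NAME="prometheus-operator-alertmanager-test"
SECRET_NAME="$(alertmanager_secret_name "$ALERTMANAGER_NAME")"

# Print the command instead of executing it, so no cluster is required.
echo "kubectl create secret generic $SECRET_NAME -n monitoring --from-file=alertmanager.yaml --from-file=template_1.tmpl"
```

Deriving the Secret name from the Alertmanager name keeps the two in sync if the Alertmanager is renamed later.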
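Finally, create the Alertmanager.
Create a Service for the Alertmanager
Here the Service type is set directly to NodePort so that we can reach it easily. In practice, access should go through an Ingress.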
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus-operator-alertmanager
    chart: prometheus-operator-8.2.4
    heritage: Tiller
    release: prometheus-operator
  name: prometheus-operator-alertmanager-test
  namespace: monitoring
spec:
  ports:
    - name: web
      port: 9093
      protocol: TCP
      targetPort: 9093
  selector:
    alertmanager: prometheus-operator-alertmanager-test
    app: alertmanager
  sessionAffinity: None
  type: NodePort
Note: the pods are chosen through selector, which must stay consistent with the labels in the Alertmanager configuration created earlier.
Query the Service's mapped port:
kubectl get svc -n monitoring
With the mapped port, you can access the Alertmanager we just created.
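For a NodePort Service, the PORT(S) column of kubectl get svc has the form port:nodePort/protocol. A minimal sketch of pulling the node port out of such a value follows. node_port_of is a hypothetical helper, and 31234 is a made-up example value rather than output from a real cluster.

```shell
#!/bin/sh
# node_port_of is a hypothetical helper: it extracts the NodePort from one
# PORT(S) value as printed by `kubectl get svc`, e.g. "9093:31234/TCP".
node_port_of() {
  ports="$1"
  node_port="${ports#*:}"            # drop "<port>:"      -> "31234/TCP"
  printf '%s\n' "${node_port%%/*}"   # drop "/<protocol>"  -> "31234"
}

# 31234 is an illustrative value, not taken from a real cluster.
node_port_of "9093:31234/TCP"

# Against a live cluster, jsonpath can return the port directly (not executed here):
# kubectl get svc prometheus-operator-alertmanager-test -n monitoring \
#   -o jsonpath='{.spec.ports[0].nodePort}'
```

The Alertmanager UI should then be reachable on that port at the address of any cluster node.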
At this point the Alertmanager should not show any notifications yet, because the Alertmanager we created has not been associated with Prometheus.
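Associating with Prometheus
How do we associate it with Prometheus? First look at the configuration of the Prometheus instance created by prometheus-operator. prometheus-operator also creates the Prometheus server through a custom resource type (CRD), Prometheus, so it can be inspected directly from the command line.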
kubectl get Prometheus prometheus-operator-prometheus -n monitoring -o yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  creationTimestamp: "2019-11-28T02:42:48Z"
  generation: 1
  labels:
    app: prometheus-operator-prometheus
    chart: prometheus-operator-8.2.4
    heritage: Tiller
    release: prometheus-operator
  name: prometheus-operator-prometheus
  namespace: monitoring
  resourceVersion: "6316434"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/prometheuses/prometheus-operator-prometheus
  uid: de60d68f-6818-484d-ba30-4f381e7cb016
spec:
  alerting:
    alertmanagers:
      - name: prometheus-operator-alertmanager
        namespace: monitoring
        pathPrefix: /
        port: web
  baseImage: quay.io/prometheus/prometheus
  enableAdminAPI: false
  externalUrl: http://prom.deri.com/
  listenLocal: false
  logFormat: logfmt
  logLevel: info
  paused: false
  podMonitorNamespaceSelector: {}
  podMonitorSelector:
    matchLabels:
      release: prometheus-operator
  portName: web
  replicas: 1
  retention: 10d
  routePrefix: /
  ruleNamespaceSelector: {}
  ruleSelector:
    matchLabels:
      app: prometheus-operator
      release: prometheus-operator
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-operator-prometheus
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      release: prometheus-operator
  storage:
    volumeClaimTemplate:
      selector: {}
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 2Gi
        storageClassName: nfs-client
  version: v2.13.1
Prometheus configuration
- spec.alerting.alertmanagers: the Alertmanagers that Prometheus sends alerts to.
- spec.ruleSelector.matchLabels: selects user-created PrometheusRule resources by label.
- spec.serviceMonitorSelector.matchLabels: selects user-created ServiceMonitor resources by label.
Edit the existing Prometheus configuration with:
kubectl edit Prometheus prometheus-operator-prometheus -n monitoring
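Add an endpoint for the new Alertmanager. Its name, namespace, pathPrefix, and port must match the Alertmanager configuration created above; name refers to the Service created for that Alertmanager.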
spec:
  alerting:
    alertmanagers:
      - name: prometheus-operator-alertmanager
        namespace: monitoring
        pathPrefix: /
        port: web
      - name: prometheus-operator-alertmanager-test
        namespace: monitoring
        pathPrefix: /
        port: web
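As a non-interactive alternative to kubectl edit, the same entry could be appended with a JSON Patch. The sketch below only builds and prints the patch document. build_alertmanager_patch is a hypothetical helper, and the kubectl patch invocation is left commented out because it assumes a live cluster with the resource names used in this post.

```shell
#!/bin/sh
# build_alertmanager_patch is a hypothetical helper, not part of kubectl.
# It prints a JSON Patch that appends one entry to
# spec.alerting.alertmanagers of a Prometheus resource.
build_alertmanager_patch() {
  name="$1"
  namespace="$2"
  printf '[{"op":"add","path":"/spec/alerting/alertmanagers/-","value":{"name":"%s","namespace":"%s","pathPrefix":"/","port":"web"}}]\n' "$name" "$namespace"
}

patch="$(build_alertmanager_patch prometheus-operator-alertmanager-test monitoring)"
echo "$patch"

# To apply it against a live cluster (not executed here):
# kubectl patch prometheus prometheus-operator-prometheus -n monitoring \
#   --type=json -p "$patch"
```

The trailing "-" in the patch path appends to the existing list, so the default prometheus-operator-alertmanager entry is left in place.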