MySQL log types

| Log | Purpose |
| --- | --- |
| redo log | Ensures the durability of transactions |
| undo log | Used for rolling transactions back |
| bin log (binary log) | Primary-replica replication |
| relay log | Primary-replica replication (relayed binlog applied on the replica) |
| error log | Server errors and startup/shutdown messages |
| slow query log | Records slow queries |
| general log (general query log) | Records all statements received by the server |
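A quick way to see which of these logs are enabled on a running server is to query the corresponding system variables; a minimal sketch (the credentials are placeholders):

# Hypothetical credentials; adjust to your environment.
mysql -uroot -p123456 -e "SHOW VARIABLES WHERE Variable_name IN
  ('log_bin', 'log_error', 'slow_query_log', 'slow_query_log_file', 'general_log', 'general_log_file', 'innodb_log_file_size')"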
percona-toolkit: introduction

percona-toolkit is a collection of advanced command-line tools for carrying out MySQL and system tasks that would be complex and tedious to perform by hand. Its three most important components are:

- pt-table-checksum: checks whether data on the MySQL primary and its replicas is consistent
- pt-table-sync: repairs the data when primary and replicas have diverged, restoring consistency
- pt-heartbeat: monitors replication lag between primary and replicas

percona-toolkit: installation

# Install dependencies
yum install perl-IO-Socket-SSL perl-DBD-MySQL perl-Time-HiRes perl perl-DBI -y
wget https://www.percona.com/downloads/percona-toolkit/2.2.7/RPM/percona-toolkit-2.2.7-1.noarch.rpm
rpm -ivh percona-toolkit-2.2.7-1.noarch.rpm
percona-toolkit: usage

pt-table-checksum is one of the Percona Toolkit components and is used to check whether the data on a MySQL primary and its replicas is consistent. It works by running statement-based SQL on the primary that computes checksums over chunks of the primary's data; the same statements replicate to the replicas, where checksums of the same chunks are computed, and finally the checksums of matching chunks on primary and replicas are compared to decide whether the data is consistent.
To minimize its impact on the database, pt-table-checksum automatically detects and connects to the replicas; if that fails, the --recursion-method option tells it where they are. It is also tolerant of replication lag: if a replica falls behind, checksumming on that replica pauses until it has caught up with the point at which the primary did the calculation.
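The statements it writes are roughly of the following shape; this is a simplified sketch for illustration only (it borrows the test.t_department table and test2.checksums result table used later in this article, uses just two columns, and omits the chunking boundaries, locking, and lag checks the real tool adds):

mysql -h192.168.41.141 -upt -p123456 <<'SQL'
-- computed on the primary, then replicated to and re-executed on the replicas
REPLACE INTO test2.checksums (db, tbl, chunk, this_cnt, this_crc)
SELECT 'test', 't_department', 1, COUNT(*) AS cnt,
       COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', department_id, department_name)) AS UNSIGNED)), 10, 16)), 0) AS crc
FROM test.t_department;
SQL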
To keep the primary safe, the tool implements several protective measures:

- It sets innodb_lock_wait_timeout to 1s so that it never waits long on row locks.
- When the server load gets too high, pt-table-checksum pauses; the threshold can be tuned with the --max-load option (see the sketch after this list).
- If the task is stopped with Ctrl+C, the tool finishes checking the current chunk cleanly; starting it again with the --resume option continues from the next chunk.
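For example (a sketch reusing the DSN that is granted below; the threshold value is only illustrative):

# Pause checksumming whenever Threads_running on the server exceeds 50 (the default threshold is Threads_running=25).
pt-table-checksum --max-load Threads_running=50 --replicate=test2.checksums --databases=test h=192.168.41.141,u=pt,p=123456,P=3306
# After interrupting a run with Ctrl+C, continue from the next chunk instead of starting over.
pt-table-checksum --resume --replicate=test2.checksums --databases=test h=192.168.41.141,u=pt,p=123456,P=3306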
-- The tool is executed on the primary; the IP below is the primary's IP
mysql> GRANT SELECT, PROCESS, SUPER, REPLICATION SLAVE,CREATE,DELETE,INSERT,UPDATE ON *.* TO 'pt'@'192.168.41.141' identified by '123456';
mysql> flush privileges;
Replication consistency is checked by executing the checksum queries on the master and comparing the checksum values of master and replicas to produce the result.
Note: on the first run, add the --create-replicate-table option so that the checksums table gets created!
Commonly used options:

| Option | Description |
| --- | --- |
| --nocheck-replication-filters | Do not check replication filters; recommended. |
| --no-check-binlog-format | Skip the binlog format check; without this option the tool errors out when binlog_format is ROW. |
| --replicate-check-only | Only report the chunks that differ. |
| --replicate= | Write the checksum information into the specified table; writing it directly into the database being checked is recommended. |
| --databases= | Databases to check, comma-separated. |
| --tables= | Tables to check, comma-separated. |
| --recursion-method | How replicas are discovered; defaults to processlist,hosts. |
| h= | Master address |
| u= | Username |
| p= | Password |
| P= | Port |
-- Replication must be running on the replica
start slave;
show slave status\G
-- The replica shows up in the primary's processlist
mysql> show processlist;
+--------+-------------+---------------+------+------------------+--------+---------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+--------+-------------+---------------+------+------------------+--------+---------------------------------------------------------------+------------------+
| 2 | system user | | NULL | Connect | 589655 | Slave has read all relay log; waiting for more updates | NULL |
| 2049 | slave | mysql02:37382 | NULL | Binlog Dump GTID | 587611 | Master has sent all binlog to slave; waiting for more updates | NULL |
| 588590 | root | localhost | NULL | Query | 0 | starting | show processlist |
+--------+-------------+---------------+------+------------------+--------+---------------------------------------------------------------+------------------+
3 rows in set (0.03 sec)
# Run the check; note that --create-replicate-table is needed on the first run to create the table
[root@mysql01 ~]# pt-table-checksum --nocheck-replication-filters --no-check-binlog-format --replicate=test2.checksums --create-replicate-table --databases=test h=192.168.41.141,u=pt,p=123456,P=3306
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
08-05T17:35:54 0 1 13 1 0 0.091 test.t_department
08-05T17:35:54 0 0 8 1 0 0.202 test.testttt
# Output column descriptions
# TS      : time the check finished.
# ERRORS  : number of errors and warnings hit during the check.
# DIFFS   : 0 means consistent, 1 means inconsistent. With --no-replicate-check it is always 0; with --replicate-check-only only the differing chunks are shown.
# ROWS    : number of rows in the table.
# CHUNKS  : number of chunks the table was split into.
# SKIPPED : number of chunks skipped because of errors, warnings, or being oversized.
# TIME    : execution time.
# TABLE   : name of the table checked.
[root@mysql01 ~]# pt-table-checksum --nocheck-replication-filters --no-check-binlog-format --replicate=test2.checksums --no-replicate-check --databases=test h=192.168.41.141,u=pt,p=123456,P=3306
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
08-06T09:30:43 0 0 13 1 0 0.023 test.t_department
08-06T09:30:43 0 0 8 1 0 0.033 test.testttt
[root@mysql01 ~]# pt-table-checksum --nocheck-replication-filters --no-check-binlog-format --replicate=test2.checksums --replicate-check-only --databases=test h=192.168.41.141,u=pt,p=123456,P=3306
Differences on mysql02
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.t_department 1 0 1
pt-table-sync: efficiently synchronizes data between MySQL tables, either one-way or bidirectionally. It can sync a single table or an entire database. It does not sync table structure, indexes, or any other schema objects, so the tables must already exist before consistency is repaired.
# Option descriptions:
# --replicate=      : the checksum table produced by pt-table-checksum; the two tools are almost always used together.
# --databases=      : databases to sync.
# --tables=         : tables to sync, comma-separated.
# --sync-to-master  : takes the DSN of a replica and finds its master automatically via SHOW PROCESSLIST / SHOW SLAVE STATUS.
# h=                : server address; the command contains two DSNs -- the first is the master, the second is the slave.
# u=                : user.
# p=                : password.
# --print           : print the repair SQL without executing it, so it can be applied to the slave manually to restore consistency.
# --execute         : execute the statements.
[root@mysql01 ~]# pt-table-sync --replicate=test2.checksums h=192.168.41.141,u=pt,p=123456 h=192.168.41.142,u=pt,p=123456 --print
REPLACE INTO `test`.`t_department`(`department_id`, `department_name`, `parent_id`, `level`, `create_time`, `update_time`) VALUES ('7', '8', '0', '1', '2020-07-28 16:06:40', '2020-07-28 16:06:40') /*percona-toolkit src_db:test src_tbl:t_department src_dsn:h=192.168.41.141,p=...,u=pt dst_db:test dst_tbl:t_department dst_dsn:h=mysql02,p=...,u=pt lock:1 transaction:1 changing_src:test2.checksums replicate:test2.checksums bidirectional:0 pid:114329 user:root host:mysql01*/;
[root@mysql01 ~]# pt-table-sync --replicate=test2.checksums h=192.168.41.141,u=pt,p=123456 h=192.168.41.142,u=pt,p=123456 --print --execute
REPLACE INTO `test`.`t_department`(`department_id`, `department_name`, `parent_id`, `level`, `create_time`, `update_time`) VALUES ('7', '8', '0', '1', '2020-07-28 16:06:40', '2020-07-28 16:06:40') /*percona-toolkit src_db:test src_tbl:t_department src_dsn:h=192.168.41.141,p=...,u=pt dst_db:test dst_tbl:t_department dst_dsn:h=mysql02,p=...,u=pt lock:1 transaction:1 changing_src:test2.checksums replicate:test2.checksums bidirectional:0 pid:115260 user:root host:mysql01*/;
# Check again after the sync; primary and replica data now match
[root@mysql01 ~]# pt-table-checksum --nocheck-replication-filters --no-check-binlog-format --replicate=test2.checksums --databases=test h=192.168.41.141,u=pt,p=123456,P=3306
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
08-06T09:36:23 0 0 13 1 0 0.293 test.t_department
08-06T09:36:23 0 0 8 1 0 0.018 test.testttt
The whole thing can be wrapped in a script and scheduled so that primary and standby stay in sync:
#!/bin/bash
# Grab the DIFFS column of the first table row (line 2 of the output); 1 means the data differs.
NUM=$(/usr/bin/pt-table-checksum --nocheck-replication-filters --no-check-binlog-format --replicate=test2.checksums --databases=test h=192.168.41.141,u=pt,p=123456,P=3306 | awk '{print $3}' | sed -n '2p')
if [ "$NUM" -eq 1 ]; then
    # Differences found: repair the replica.
    /usr/bin/pt-table-sync --replicate=test2.checksums h=192.168.41.141,u=pt,p=123456 h=192.168.41.142,u=pt,p=123456 --print --execute
else
    echo "data is ok"
fi
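For example, the script (saved under a hypothetical path) can be scheduled with cron to run every night:

# crontab -e  -- the script path and log file are placeholders
30 2 * * * /root/check_mysql_sync.sh >> /var/log/mysql_sync_check.log 2>&1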
Replication lag between a MySQL primary and its replicas can be monitored with pt-heartbeat.
pt-heartbeat works by writing timestamps into a dedicated table on the primary and then reading them back on the replica, comparing the replicated timestamp with the local system time to derive the lag. The flow: create a heartbeat table and update its timestamp column at a fixed frequency; once the monitor is running, these heartbeat updates flow through replication, so the replica's copy of the table shows how far behind it is.
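A minimal pt-heartbeat sketch, reusing the pt account from above (it would additionally need write access to the heartbeat table; the database name is an assumption):

# On the primary: create the heartbeat table and keep updating its timestamp in the background.
pt-heartbeat -D test --create-table --update -h 192.168.41.141 -u pt -p 123456 --daemonize
# On the replica (or a monitoring host): continuously report the measured replication lag.
pt-heartbeat -D test --monitor -h 192.168.41.142 -u pt -p 123456
# One-shot check that prints the current lag in seconds and exits.
pt-heartbeat -D test --check -h 192.168.41.142 -u pt -p 123456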
MySQL replication inevitably runs into problems that leave the primary and replica inconsistent; when that happens the data has to be re-synchronized by hand.
Maatkit is an open-source toolkit that helps with day-to-day MySQL administration. It contains many tools; two of them are covered here:

- mk-table-checksum: checks whether the table structure and data on the master and the slave are consistent
- mk-table-sync: repairs the data when master and slave have diverged; it works master-first and keeps the tables consistent without having to reload the slave tables

Both Perl scripts lock tables while they run, and how long that takes depends on the size of the tables, so do not run them during peak hours; the early morning hours are a better choice.
# Install dependencies
yum install perl-IO-Socket-SSL perl-DBD-MySQL perl-Time-HiRes perl perl-DBI -y
yum install perl-ExtUtils-Embed perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker -y
yum -y install perl-Digest-MD5
Download the maatkit source package (extraction code: vbi1).

maatkit: installation

# Unpack and enter the directory
tar -zvxf maatkit-7540.tar.gz && cd maatkit-7540
[root@mysql01 maatkit-7540]# perl Makefile.PL
Checking if your kit is complete...
Looks good
Writing Makefile for maatkit
# Build and install
make && make install
# Verify: tab-complete the mk- commands
[root@mysql01 maatkit-7540]# mk-
mk-archiver mk-duplicate-key-checker mk-heartbeat mk-log-player mk-profile-compact mk-query-profiler mk-slave-move mk-table-sync mk-variable-advisor
mk-checksum-filter mk-error-log mk-index-usage mk-merge-mqd-results mk-purge-logs mk-show-grants mk-slave-prefetch mk-table-usage mk-visual-explain
mk-config-diff mk-fifo-split mk-kill mk-parallel-dump mk-query-advisor mk-slave-delay mk-slave-restart mk-tcp-model
mk-deadlock-logger mk-find mk-loadavg mk-parallel-restore mk-query-digest mk-slave-find mk-table-checksum mk-upgrade
Create a MySQL account for the checks

-- Create an account with only the SELECT privilege on both master and slave
mysql> GRANT SELECT ON *.* TO 'data_check'@'%' IDENTIFIED BY '123456';
Query OK, 0 rows affected (0.01 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)
# First h    : master IP
# First u    : user granted on the master
# First p    : password of that user
# Second h   : slave IP
# Second u   : user granted on the slave
# Second p   : password of that user
# Uppercase P: MySQL port
[root@mysql01 maatkit-7540]# mk-table-checksum h=192.168.41.141,u=data_check,p=123456,P=3306 h=192.168.41.142,u=data_check,p=123456,P=3306
DATABASE TABLE CHUNK HOST ENGINE COUNT CHECKSUM TIME WAIT STAT LAG
test t_department 0 192.168.41.141 InnoDB NULL 83657212 0 0 NULL NULL
test t_department 0 192.168.41.142 InnoDB NULL 3344593131 0 0 NULL NULL
test testttt 0 192.168.41.141 InnoDB NULL 3722958311 0 0 NULL NULL
test testttt 0 192.168.41.142 InnoDB NULL 3722958311 0 0 NULL NULL
DATABASE TABLE CHUNK HOST ENGINE COUNT CHECKSUM TIME WAIT STAT LAG
mysql columns_priv 0 192.168.41.141 MyISAM NULL 0 0 0 NULL NULL
mysql columns_priv 0 192.168.41.142 MyISAM NULL 0 0 0 NULL NULL
mysql db 0 192.168.41.141 MyISAM NULL 2629905140 0 0 NULL NULL
mysql db 0 192.168.41.142 MyISAM NULL 2629905140 0 0 NULL NULL
mysql engine_cost 0 192.168.41.141 InnoDB NULL 2678612005 0 0 NULL NULL
...
# -d (or --databases) restricts the check to the given databases, comma-separated; without -d all databases are checked
# --count also computes each table's row count
[root@mysql01 maatkit-7540]# mk-table-checksum h=192.168.41.141,u=data_check,p=123456,P=3306 h=192.168.41.142,u=data_check,p=123456,P=3306 --count -dtest
DATABASE TABLE CHUNK HOST ENGINE COUNT CHECKSUM TIME WAIT STAT LAG
test t_department 0 192.168.41.141 InnoDB 13 5e339d70 0 0 NULL NULL
test t_department 0 192.168.41.142 InnoDB 13 7a4c9153 0 0 NULL NULL
test testttt 0 192.168.41.141 InnoDB 8 41db7636 0 0 NULL NULL
test testttt 0 192.168.41.142 InnoDB 8 41db7636 0 0 NULL NULL
# Output columns
# DATABASE : database checked
# TABLE    : table checked
# CHUNK    : chunk number used for checksumming
# HOST     : address of the master/slave MySQL instance
# ENGINE   : storage engine of the table
# COUNT    : row count of the table
# CHECKSUM : checksum value
# TIME     : time taken
# WAIT     : wait time
# STAT     : return value of MASTER_POS_WAIT()
# LAG      : slave lag
# mk-checksum-filter narrows the output down to the rows that differ
[root@mysql01 maatkit-7540]# mk-table-checksum h=192.168.41.141,u=data_check,p=123456,P=3306 h=192.168.41.142,u=data_check,p=123456,P=3306 --count -dtest | mk-checksum-filter
test t_department 0 192.168.41.141 InnoDB 13 5e339d70 0 0 NULL NULL
test t_department 0 192.168.41.142 InnoDB 13 7a4c9153 0 0 NULL NULL
mk-table-sync repairs data inconsistencies between instances. It works by checking the master's and the slave's tables row by row; wherever they differ it issues DELETE, UPDATE, or INSERT statements until they match.
The data_check account used above for mk-table-checksum has only the SELECT privilege, which is not enough for mk-table-sync to repair data, so an account with sufficient privileges for mk-table-sync has to be created on both the master and the slave:
mysql> GRANT ALL ON *.* to mksync@'%' identified by "123456";
Query OK, 0 rows affected (0.01 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
# Note: the tables and table structures on master and slave must be identical, otherwise the tool errors out!
# --print            show the statements the sync would execute
# --execute          actually execute the sync
# --charset=utf8mb4  set the character set to avoid mojibake on the slave
# --no-check-slave   required here; by default the tool refuses to modify data on a slave
# -t                 tables to sync; for multiple tables use -t=tables1,tables2,...
# > /root/result     append "> /root/result" to the command to save the output to a file
[root@mysql01 maatkit-7540]# mk-table-sync --execute --charset=utf8mb4 --print --no-check-slave --databases test h=192.168.41.141,u=mksync,p=123456,P=3306 h=192.168.41.142,u=mksync,p=123456,P=3306
UPDATE `test`.`t_department` SET `department_name`='44', `parent_id`='0', `level`='1', `create_time`='2020-07-28 16:06:24', `update_time`='2020-07-28 16:06:24' WHERE `department_id`='3' LIMIT 1 /*maatkit src_db:test src_tbl:t_department src_dsn:A=utf8mb4,P=3306,h=192.168.41.141,p=...,u=mksync dst_db:test dst_tbl:t_department dst_dsn:A=utf8mb4,P=3306,h=192.168.41.142,p=...,u=mksync lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:3869 user:root host:mysql01*/;
# Check master and slave again after the repair; the CHECKSUM column shows the data is now identical
[root@mysql01 maatkit-7540]# mk-table-checksum h=192.168.41.141,u=data_check,p=check@123,P=3306 h=192.168.41.142,u=data_check,p=check@123,P=3306 --count -dtest
DATABASE TABLE CHUNK HOST ENGINE COUNT CHECKSUM TIME WAIT STAT LAG
test t_department 0 192.168.41.141 InnoDB 13 5e339d70 0 0 NULL NULL
test t_department 0 192.168.41.142 InnoDB 13 5e339d70 0 0 NULL NULL
test testttt 0 192.168.41.141 InnoDB 8 41db7636 0 0 NULL NULL
test testttt 0 192.168.41.142 InnoDB 8 41db7636 0 0 NULL NULL
Redis builds its network event handler, called the file event handler, on the Reactor pattern. It consists of four parts: multiple sockets, an I/O multiplexing program, a file event dispatcher, and the event handlers. Because the file event dispatcher's queue is consumed by a single thread, Redis is described as a single-threaded model.
Redis has had some multithreading since 4.0, for example deleting objects in the background and blocking commands implemented through Redis modules.
The multithreading added in Redis 6.0 applies to network handling: reading and writing network data and parsing the protocol are done by multiple threads, but note that the core module that executes commands is still single-threaded.
Redis's bottleneck is not the CPU but memory and the network. If memory runs short you can add more or optimize data structures, but the network is where the big wins are: network I/O accounts for most of the CPU time during Redis execution, so moving that part to multiple threads gives overall Redis performance a substantial boost.
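For reference, the Redis 6.0 I/O threading is opt-in and controlled by two configuration directives, shown here as command-line overrides (the config path and thread count are just examples):

# Use 4 I/O threads and let them handle reads as well as writes; command execution remains single-threaded.
redis-server /etc/redis/redis.conf --io-threads 4 --io-threads-do-reads yes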
The Prometheus exporter mysqld_exporter collects MySQL metrics for alerting and dashboards. Here it is deployed with Docker.
First, create a MySQL account for the exporter:
CREATE USER 'exporter'@'%' IDENTIFIED BY 'exporter#2020';
GRANT PROCESS, REPLICATION CLIENT ON *.* TO 'exporter'@'%';
GRANT SELECT ON performance_schema.* TO 'exporter'@'%';
-- GRANT ALL ON *.* TO 'exporter'@'%';
Start mysqld_exporter:
docker run -d \
--name mysql_exporter \
--restart always \
-p 9104:9104 \
-e DATA_SOURCE_NAME="exporter:exporter#2020@(192.168.41.141:3306)/" \
prom/mysqld-exporter
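Once the container is up, the exporter can be checked directly before wiring it into Prometheus; mysql_up should be 1 when the exporter can reach the database:

# mysqld_exporter serves its metrics over HTTP on port 9104.
curl -s http://192.168.41.141:9104/metrics | grep '^mysql_up'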
If everything works, the metrics can be scraped into Prometheus. Edit the Prometheus configuration file and add a target under scrape_configs:
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

rule_files:
  # - "rules.yml"
  # - "second_rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['192.168.41.128:9090']
  - job_name: 'mysql'
    static_configs:
      - targets: ['192.168.41.141:9104']
# Start Prometheus
docker run --name=prometheus -d -p 9090:9090 -v /root/prometheus/config/prometheus.yml:/etc/prometheus/prometheus.yml hub.deri.org.cn/k8s_monitor/prometheus:latest
# vim /usr/local/prometheus/prometheus.yml
scrape_configs:
  - job_name: 'mysql'
    static_configs:
      - targets: ['192.168.41.141:9104']
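Optionally, the configuration can be validated and Prometheus reloaded without a restart; a sketch that assumes promtool is available in the image and that Prometheus was started with --web.enable-lifecycle:

# Validate the configuration file inside the container.
docker exec prometheus promtool check config /etc/prometheus/prometheus.yml
# Trigger a configuration reload through the lifecycle API.
curl -X POST http://192.168.41.128:9090/-/reload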
Note: if Prometheus was installed via prometheus-operator, adding a new target works differently. Example:
Add mysql-exporter-endpoints.yaml to bring the external service into the cluster:
apiVersion: v1
kind: Endpoints
metadata:
  name: mysql-exporter
  namespace: monitoring
subsets:
  - addresses:
      - ip: 192.168.1.47
        targetRef:
          name: huawei-mysql
          namespace: default
    ports:
      - name: metrics
        port: 9104
        protocol: TCP
Add mysql-exporter-service.yaml:
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
  labels:
    app: prometheus-mysql-exporter
    chart: prometheus-mysql-exporter
    heritage: Tiller
    jobLabel: mysql-exporter
    release: prometheus-operator
  # the name must match the Endpoints name above
  name: mysql-exporter
  namespace: monitoring
spec:
  ports:
    - name: metrics
      port: 9104
      targetPort: 9104
      protocol: TCP
  type: ClusterIP
Add mysql-exporter-ServiceMonitor.yaml:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  # note the labels: this prometheus-operator release discovers ServiceMonitors through the release: prometheus-operator label
  labels:
    app: prometheus-operator-mysql-exporter
    chart: prometheus-operator-8.2.4
    heritage: Tiller
    release: prometheus-operator
  name: prometheus-operator-mysql-exporter
  namespace: monitoring
spec:
  endpoints:
    - port: metrics
  jobLabel: jobLabel
  selector:
    # the Service is discovered through its labels
    matchLabels:
      app: prometheus-mysql-exporter
      release: prometheus-operator
kubectl apply -f .
# After creating the three resources above, the new mysql-exporter target shows up on the Prometheus page.
# The MySQL metrics can also be queried from the Prometheus database.
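For example, a quick query against the Prometheus HTTP API confirms the metric is being scraped (the address is the Prometheus server from the earlier config):

# Returns one sample per monitored instance; a value of 1 means the exporter can reach MySQL.
curl -s 'http://192.168.41.128:9090/api/v1/query?query=mysql_up'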
# Alert rule examples for Galera clusters and replication
groups:
- name: GaleraAlerts
  rules:
  - alert: MySQLGaleraNotReady
    expr: mysql_global_status_wsrep_ready != 1
    for: 5m
    labels:
      severity: warning
    annotations:
      description: '{{$labels.job}} on {{$labels.instance}} is not ready.'
      summary: Galera cluster node not ready
  - alert: MySQLGaleraOutOfSync
    expr: (mysql_global_status_wsrep_local_state != 4 and mysql_global_variables_wsrep_desync == 0)
    for: 5m
    labels:
      severity: warning
    annotations:
      description: '{{$labels.job}} on {{$labels.instance}} is not in sync ({{$value}} != 4).'
      summary: Galera cluster node out of sync
  - alert: MySQLGaleraDonorFallingBehind
    expr: (mysql_global_status_wsrep_local_state == 2 and mysql_global_status_wsrep_local_recv_queue > 100)
    for: 5m
    labels:
      severity: warning
    annotations:
      description: '{{$labels.job}} on {{$labels.instance}} is a donor (hotbackup) and is falling behind (queue size {{$value}}).'
      summary: xtradb cluster donor node falling behind
  - alert: MySQLReplicationNotRunning
    expr: mysql_slave_status_slave_io_running == 0 or mysql_slave_status_slave_sql_running == 0
    for: 2m
    labels:
      severity: critical
    annotations:
      description: Slave replication (IO or SQL) has been down for more than 2 minutes.
      summary: Slave replication is not running
  - alert: MySQLReplicationLag
    expr: (mysql_slave_lag_seconds > 30) and on(instance) (predict_linear(mysql_slave_lag_seconds[5m], 60 * 2) > 0)
    for: 1m
    labels:
      severity: critical
    annotations:
      description: The mysql slave replication has fallen behind and is not recovering
      summary: MySQL slave replication is lagging
  - alert: MySQLReplicationLag
    expr: (mysql_heartbeat_lag_seconds > 30) and on(instance) (predict_linear(mysql_heartbeat_lag_seconds[5m], 60 * 2) > 0)
    for: 1m
    labels:
      severity: critical
    annotations:
      description: The mysql slave replication has fallen behind and is not recovering
      summary: MySQL slave replication is lagging
  - alert: MySQLInnoDBLogWaits
    expr: rate(mysql_global_status_innodb_log_waits[15m]) > 10
    labels:
      severity: warning
    annotations:
      description: The innodb logs are waiting for disk at a rate of {{$value}} / second
      summary: MySQL innodb log writes stalling
# General MySQL alert rule examples
groups:
- name: MySQLStatsAlert
  rules:
  - alert: MySQL is down
    expr: mysql_up == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} MySQL is down"
      description: "MySQL database is down. This requires immediate action!"
  - alert: open files high
    expr: mysql_global_status_innodb_num_open_files > (mysql_global_variables_open_files_limit) * 0.75
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} open files high"
      description: "Open files is high. Please consider increasing open_files_limit."
  - alert: Read buffer size is bigger than max. allowed packet size
    expr: mysql_global_variables_read_buffer_size > mysql_global_variables_slave_max_allowed_packet
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} Read buffer size is bigger than max. allowed packet size"
      description: "Read buffer size (read_buffer_size) is bigger than max. allowed packet size (max_allowed_packet). This can break your replication."
  - alert: Sort buffer possibly misconfigured
    expr: mysql_global_variables_innodb_sort_buffer_size < 256*1024 or mysql_global_variables_read_buffer_size > 4*1024*1024
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} Sort buffer possibly misconfigured"
      description: "Sort buffer size is either too big or too small. A good value for sort_buffer_size is between 256k and 4M."
  - alert: Thread stack size is too small
    expr: mysql_global_variables_thread_stack < 196608
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} Thread stack size is too small"
      description: "Thread stack size is too small. This can cause problems when you use Stored Language constructs for example. A typical value is 256k for thread_stack_size."
  - alert: Used more than 80% of max connections limited
    expr: mysql_global_status_max_used_connections > mysql_global_variables_max_connections * 0.8
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} Used more than 80% of max connections limited"
      description: "Used more than 80% of max connections limited"
  - alert: InnoDB Force Recovery is enabled
    expr: mysql_global_variables_innodb_force_recovery != 0
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} InnoDB Force Recovery is enabled"
      description: "InnoDB Force Recovery is enabled. This mode should be used for data recovery purposes only. It prohibits writing to the data."
  - alert: InnoDB Log File size is too small
    expr: mysql_global_variables_innodb_log_file_size < 16777216
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} InnoDB Log File size is too small"
      description: "The InnoDB Log File size is possibly too small. Choosing a small InnoDB Log File size can have significant performance impacts."
  - alert: InnoDB Flush Log at Transaction Commit
    expr: mysql_global_variables_innodb_flush_log_at_trx_commit != 1
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} InnoDB Flush Log at Transaction Commit"
      description: "InnoDB Flush Log at Transaction Commit is set to a value != 1. This can lead to a loss of committed transactions in case of a power failure."
  - alert: Table definition cache too small
    expr: mysql_global_status_open_table_definitions > mysql_global_variables_table_definition_cache
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} Table definition cache too small"
      description: "Your Table Definition Cache is possibly too small. If it is much too small this can have significant performance impacts!"
  - alert: Table open cache too small
    expr: mysql_global_status_open_tables > mysql_global_variables_table_open_cache * 99/100
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} Table open cache too small"
      description: "Your Table Open Cache is possibly too small (old name Table Cache). If it is much too small this can have significant performance impacts!"
  - alert: Thread stack size is possibly too small
    expr: mysql_global_variables_thread_stack < 262144
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} Thread stack size is possibly too small"
      description: "Thread stack size is possibly too small. This can cause problems when you use Stored Language constructs for example. A typical value is 256k for thread_stack_size."
  - alert: InnoDB Buffer Pool Instances is too small
    expr: mysql_global_variables_innodb_buffer_pool_instances == 1
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} InnoDB Buffer Pool Instances is too small"
      description: "If you are using MySQL 5.5 and higher you should use several InnoDB Buffer Pool Instances for performance reasons. Some rules are: each InnoDB Buffer Pool Instance should be at least 1 Gbyte in size, and the number of instances can be set equal to the number of cores of your machine."
  - alert: InnoDB Plugin is enabled
    expr: mysql_global_variables_ignore_builtin_innodb == 1
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} InnoDB Plugin is enabled"
      description: "InnoDB Plugin is enabled"
  - alert: Binary Log is disabled
    expr: mysql_global_variables_log_bin != 1
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} Binary Log is disabled"
      description: "Binary Log is disabled. This prevents you from doing Point in Time Recovery (PiTR)."
  - alert: Binlog Cache size too small
    expr: mysql_global_variables_binlog_cache_size < 1048576
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} Binlog Cache size too small"
      description: "Binlog Cache size is possibly too small. A value of 1 Mbyte or higher is OK."
  - alert: Binlog Statement Cache size too small
    expr: mysql_global_variables_binlog_stmt_cache_size < 1048576 and mysql_global_variables_binlog_stmt_cache_size > 0
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} Binlog Statement Cache size too small"
      description: "Binlog Statement Cache size is possibly too small. A value of 1 Mbyte or higher is typically OK."
  - alert: Binlog Transaction Cache size too small
    expr: mysql_global_variables_binlog_cache_size < 1048576
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} Binlog Transaction Cache size too small"
      description: "Binlog Transaction Cache size is possibly too small. A value of 1 Mbyte or higher is typically OK."
  - alert: Sync Binlog is enabled
    expr: mysql_global_variables_sync_binlog == 1
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} Sync Binlog is enabled"
      description: "Sync Binlog is enabled. This leads to higher data security but at the cost of write performance."
  - alert: IO thread stopped
    expr: mysql_slave_status_slave_io_running != 1
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} IO thread stopped"
      description: "IO thread has stopped. This is usually because it cannot connect to the Master any more."
  - alert: SQL thread stopped
    expr: mysql_slave_status_slave_sql_running == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} SQL thread stopped"
      description: "SQL thread has stopped. This is usually because it cannot apply a SQL statement received from the master."
  - alert: SQL thread stopped
    expr: mysql_slave_status_slave_sql_running != 1
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} SQL thread stopped"
      description: "SQL thread has stopped. This is usually because it cannot apply a SQL statement received from the master."
  - alert: Slave lagging behind Master
    expr: rate(mysql_slave_status_seconds_behind_master[1m]) > 30
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} Slave lagging behind Master"
      description: "Slave is lagging behind Master. Please check if Slave threads are running and if there are some performance issues!"
  - alert: Slave is NOT read only (please ignore this warning indicator)
    expr: mysql_global_variables_read_only != 0
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} Slave is NOT read only"
      description: "Slave is NOT set to read only. You can accidentally manipulate data on the slave and get inconsistencies..."
If needed, deploy Grafana:
docker run -d -p 3000:3000 --name grafana hub.deri.org.cn/k8s_monitor/grafana:6.4.2
Some usable Grafana dashboard IDs; dashboards can be imported directly by ID:
6239
7362
11329
10654
11323
Docker image: mysql:5.7.18; the MySQL configuration directory inside the container is /etc/mysql/.

# Start a container and set the root password
docker run -d --name mysql -e MYSQL_ROOT_PASSWORD=123456 hub.deri.org.cn/library/mysql:5.7
# Copy the configuration out of the container
docker cp mysql:/etc/mysql /root
# Configuration directory layout
[root@mysql01 mysql]# tree
.
├── conf.d
│ ├── docker.cnf
│ └── mysql.cnf
├── my.cnf -> /etc/alternatives/my.cnf
├── my.cnf.fallback
├── mysql.cnf
└── mysql.conf.d
└── mysqld.cnf
2 directories, 6 files
Find the MySQL data directory so that the data inside the container can be persisted to the host.

-- Log in
docker exec -it mysql bash
mysql -uroot -p123456
-- Look up the data directory
mysql> show variables like '%datadir%';
+---------------+-----------------+
| Variable_name | Value |
+---------------+-----------------+
| datadir | /var/lib/mysql/ |
+---------------+-----------------+
1 row in set (0.01 sec)
mysql.cnf
# Reference replication settings for a primary/replica setup
[mysqld]
server-id=1                      # must differ between primary and replica
log-bin=mysql-bin                # not required on the standby
auto-increment-increment=2
auto-increment-offset=2          # differs between primary and replica: 1 and 2
log-slave-updates=true
#replicate-ignore-db=information_schema
max_binlog_size=1024M
binlog-format=ROW
#binlog_rows_query_log_events=on  # keep the original SQL statements in ROW-format binlog events
#master-info-repository=TABLE
#relay-log-info-repository=TABLE
#slave-skip-errors=all            # ignore all replication errors
#slave-skip-errors=1062,1032,1060 # skip known errors such as primary-key conflicts or "table already exists" (error codes 1062, 1032, 1060, ...)
#relay-log-purge = 1              # automatically purge relay logs that are no longer needed; default 1 (enabled)
#expire_logs_days = 30            # delete binlogs older than 30 days
# Map the port, the configuration directory, and the data directory
docker run -d --name mysql -v /root/mysql:/etc/mysql -v /var/lib/mysql/:/var/lib/mysql/ -e MYSQL_ROOT_PASSWORD=123456 -p 3306:3306 hub.deri.org.cn/library/mysql:5.7
# Alternatively, map only the single configuration file mysql.cnf
docker run -d --name mysql -v /root/mysql/mysql.cnf:/etc/mysql/mysql.cnf -v /var/lib/mysql/:/var/lib/mysql/ -e MYSQL_ROOT_PASSWORD=123456 -p 3307:3306 hub.deri.org.cn/library/mysql:5.7
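With both containers running under their respective server-id values, the replica still has to be pointed at the primary; a minimal sketch (the repl account, binlog file, and position are placeholders -- take the real coordinates from SHOW MASTER STATUS on the primary, and the repl account must exist there with the REPLICATION SLAVE privilege):

# Run on the replica's container:
docker exec -i mysql mysql -uroot -p123456 <<'SQL'
CHANGE MASTER TO
  MASTER_HOST='192.168.41.141',
  MASTER_PORT=3306,
  MASTER_USER='repl',
  MASTER_PASSWORD='repl_password',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=154;
START SLAVE;
SHOW SLAVE STATUS\G
SQL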
Installing and configuring keepalived for the Docker deployment: write a script, /etc/docker_mysql.sh, that checks whether the MySQL container is alive.
#!/bin/bash
# Exit 0 if the MySQL container is up, 1 otherwise; keepalived acts on the exit code.
count=$(docker ps | grep hub.deri.org.cn/library/mysql:5.7 | grep Up | wc -l)
if [ "$count" -eq 1 ]; then
    exit 0
else
    exit 1
fi
# Test the script
[root@mysql01 ~]# docker stop mysql
mysql
[root@mysql01 ~]# ./docker_mysql.sh
[root@mysql01 ~]# ./docker_mysql.sh ;echo $?
1
[root@mysql01 ~]# docker start mysql
mysql
[root@mysql01 ~]# ./docker_mysql.sh ;echo $?
0
# keepalived configuration: health-check script definition
vrrp_script chk_mysql {
    script "/etc/docker_mysql.sh"
    interval 2
    timeout 2
    fall 3
}
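The check script then has to be referenced from a vrrp_instance so that a failed MySQL container triggers a VIP failover; a minimal sketch appended to /etc/keepalived/keepalived.conf on both nodes (interface name, router id, priority, and VIP are assumptions for this example):

cat >> /etc/keepalived/keepalived.conf <<'EOF'
vrrp_instance VI_MYSQL {
    state BACKUP
    nopreempt
    interface ens33              # NIC name is environment-specific
    virtual_router_id 51
    priority 100                 # give the standby node a lower priority
    advert_int 1
    virtual_ipaddress {
        192.168.41.100           # the VIP that applications connect to
    }
    track_script {
        chk_mysql                # the vrrp_script defined above
    }
}
EOF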