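Prerequisites

  • A Kubernetes cluster is already deployed
  • Prometheus is deployed inside the cluster (the ServiceMonitor step below relies on the Prometheus Operator)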

Steps

  • Install node-exporter on each external node
    docker run -d --name node-exporter -p 9100:9100 prom/node-exporter
    Once the container is running, the metrics are served at http://IP:9100/metrics.
  • Create the Endpoints object listing the external node IPs
    apiVersion: v1
    kind: Endpoints
    metadata:
      name: node-data
      namespace: kubesphere-monitoring-system
      labels:
        app.kubernetes.io/name: node-data
    subsets:
      - addresses:
          - ip: 192.168.3.17
          - ip: 192.168.3.19
          - ip: 192.168.3.20
        ports:
          - port: 9100
            name: http
  • Create the Service (it has no selector, so it is backed by the Endpoints object of the same name)
    apiVersion: v1
    kind: Service
    metadata:
      name: node-data
      namespace: kubesphere-monitoring-system
      labels:
        app.kubernetes.io/name: node-data
    spec:
      ports:
        - port: 9100
          name: http
  • Create the ServiceMonitor
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: node-exporter-data
      namespace: kubesphere-monitoring-system
    spec:
      endpoints:
        - port: http
      namespaceSelector:
        matchNames:
          - kubesphere-monitoring-system
      selector:
        matchLabels:
          app.kubernetes.io/name: node-data
    After it is created, check the Targets page of the Prometheus web UI for the new targets.
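
When the list of external nodes changes often, the Endpoints manifest can be generated instead of edited by hand. The following is a minimal sketch, not part of the original setup: the helper name build_endpoints and its parameters are hypothetical, and it only builds the manifest as JSON (which kubectl accepts alongside YAML) without applying it.

```python
import json


def build_endpoints(name, namespace, ips, port=9100, port_name="http"):
    """Return an Endpoints manifest (as a dict) listing the given node IPs."""
    return {
        "apiVersion": "v1",
        "kind": "Endpoints",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "labels": {"app.kubernetes.io/name": name},
        },
        "subsets": [
            {
                "addresses": [{"ip": ip} for ip in ips],
                "ports": [{"port": port, "name": port_name}],
            }
        ],
    }


# Same values as the manifest in the steps above.
manifest = build_endpoints(
    "node-data",
    "kubesphere-monitoring-system",
    ["192.168.3.17", "192.168.3.19", "192.168.3.20"],
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied the same way as the YAML version.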

PromQL

  1. CPU usage (%)
    # If a 1m window returns no data, widen it to 3m or 5m
    (1-(sum(increase(node_cpu_seconds_total{mode="idle"}[1m]))by(instance))/(sum(increase(node_cpu_seconds_total[1m]))by(instance)))*100
  2. Memory usage (%)
    (1-(node_memory_MemAvailable_bytes{}/(node_memory_MemTotal_bytes{})))*100
  3. Disk partition usage (%)
    # Set mountpoint to the partition you want to check
    100 - (node_filesystem_free_bytes{mountpoint="/",fstype=~"ext4|xfs"} / node_filesystem_size_bytes{mountpoint="/",fstype=~"ext4|xfs"} * 100)
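
To make the arithmetic behind the three expressions above concrete, here is a small sketch that applies the same formulas to made-up sample values. The function names and the numbers are illustrative assumptions; Prometheus evaluates the real expressions per instance over the chosen time window.

```python
def cpu_usage_percent(before, after):
    """CPU usage from two samples of cumulative per-mode CPU seconds.

    Mirrors (1 - increase(idle) / increase(all modes)) * 100.
    """
    idle_delta = after["idle"] - before["idle"]
    total_delta = sum(after[mode] - before[mode] for mode in after)
    if total_delta <= 0:
        raise ValueError("no CPU time elapsed between the two samples")
    return (1 - idle_delta / total_delta) * 100


def memory_usage_percent(available_bytes, total_bytes):
    """Mirrors (1 - MemAvailable / MemTotal) * 100."""
    return (1 - available_bytes / total_bytes) * 100


def disk_usage_percent(free_bytes, size_bytes):
    """Mirrors 100 - (free / size * 100)."""
    return 100 - free_bytes / size_bytes * 100


# Made-up counter samples taken one minute apart.
before = {"idle": 1000.0, "user": 200.0, "system": 100.0}
after = {"idle": 1045.0, "user": 210.0, "system": 105.0}

print(cpu_usage_percent(before, after))          # 45 idle seconds out of 60
print(memory_usage_percent(2 * 2**30, 8 * 2**30))
print(disk_usage_percent(30 * 2**30, 120 * 2**30))
```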

docker stats

docker stats --no-stream

[root@data1 ~]# docker stats
CONTAINER ID   NAME             CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
b4fb48ce6a23   magical_bell     147.29%   4.207GiB / 7.638GiB   55.08%    8.72GB / 7.79GB   2.46GB / 7.21GB   71
e189b149f025   kafka            3.94%     1.549GiB / 7.638GiB   20.27%    0B / 0B           637GB / 25GB      78
14cc0c468a14   zookeeper        0.24%     178.2MiB / 7.638GiB   2.28%     0B / 0B           381GB / 7.49MB    102

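The memory figures reported by docker stats can differ from what the process actually uses; top and /proc give a direct view of the process.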

TOP

# Get the PID of the container's main process
docker inspect -f '{{.State.Pid}}' container_name
# Inspect that PID with top
[root@data1 ~]# top -p 2203
PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+  COMMAND 
2203 1001      20   0 6470744   1.3g   6076 S   0.0 16.6   2717:49 java

VmRSS

[root@data1 ~]# cat /proc/2203/status
VmPeak:     6475388 kB
VmSize:     6470744 kB
VmLck:           0 kB
VmPin:           0 kB
VmHWM:     1336420 kB
VmRSS:     1329516 kB # resident memory in use; this is the value we are after
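
The VmRSS line can also be extracted programmatically. The sketch below parses status text in the format shown above; the function name vmrss_kb is hypothetical, and the sample text stands in for the contents of /proc/<pid>/status, which on a Linux host would be read from the file instead.

```python
def vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line format: "VmRSS:     1329516 kB"
            return int(line.split()[1])
    return None


# Sample copied from the output above; on Linux use
# open(f"/proc/{pid}/status").read() instead.
SAMPLE = """VmPeak:     6475388 kB
VmSize:     6470744 kB
VmHWM:     1336420 kB
VmRSS:     1329516 kB
"""

rss = vmrss_kb(SAMPLE)
print(f"{rss} kB = {rss // 1024} MiB")
```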

Script

#!/usr/bin/env bash
# Print the resident memory (VmRSS) of the main process of every running container.

# Header
printf "%-15s %-30s %-15s\n" Id Name Mem

# docker ps prints one "<id>|<name>" line per running container
docker ps --format '{{.ID}}|{{.Names}}' | while IFS='|' read -r id name
do
    # PID of the container's main process on the host
    pid=$(docker inspect -f '{{.State.Pid}}' "$id")

    # Resident memory of that process, in kB
    mem=$(awk '/^VmRSS/ {print $2}' "/proc/$pid/status")

    # Print the result in MiB
    printf "%-15s %-30s %-15s\n" "$id" "$name" "$((mem / 1024))M"
done

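Delayed allocation when a node leaves

index.unassigned.node_left.delayed_timeout defaults to 1m and delays the allocation of replica shards that became unassigned because a node left. When a node leaves the cluster for any reason, intentional or otherwise, the master reacts by:

  • Promoting a replica shard to primary to replace any primaries that were on the node.
  • Allocating replica shards to replace the missing replicas (assuming there are enough nodes).
  • Rebalancing shards evenly across the remaining nodes.

If a node has been removed and will never return, and Elasticsearch should allocate the missing shards immediately, set the timeout to 0: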

PUT _all/_settings
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "0"
  }
}
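
For scripting this kind of settings update, a request can be assembled with the Python standard library. This is only a sketch: the helper name build_settings_request and the base URL http://localhost:9200 are assumptions, authentication is omitted, and the request is built but deliberately not sent so the example runs without a cluster.

```python
import json
import urllib.request


def build_settings_request(base_url, index, settings):
    """Build (but do not send) a PUT <index>/_settings request."""
    body = json.dumps({"settings": settings}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = build_settings_request(
    "http://localhost:9200",  # assumed address of one cluster node
    "_all",
    {"index.unassigned.node_left.delayed_timeout": "0"},
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# urllib.request.urlopen(req) would send it against a reachable cluster.
```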

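Index recovery priority

Unassigned shards are recovered in priority order whenever possible. Indices are ordered as follows:

  • The optional index.priority setting (higher before lower)
  • The index creation date (newer before older)
  • The index name (higher before lower)

PUT index_4/_settings
{
  "index.priority": 1
}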
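
The ordering rules above amount to a descending sort on three keys. The sketch below illustrates this with hypothetical index metadata; the IndexInfo class and the sample values are assumptions for illustration, not an Elasticsearch API.

```python
from dataclasses import dataclass


@dataclass
class IndexInfo:
    name: str
    creation_date: int  # epoch milliseconds
    priority: int = 1   # assumed default when index.priority is not set


def recovery_order(indices):
    """Sort by priority, then creation date, then name, all descending."""
    return sorted(
        indices,
        key=lambda i: (i.priority, i.creation_date, i.name),
        reverse=True,
    )


indices = [
    IndexInfo("index_1", creation_date=1000),
    IndexInfo("index_2", creation_date=2000),
    IndexInfo("index_3", creation_date=3000, priority=10),
    IndexInfo("index_4", creation_date=4000, priority=5),
]

for info in recovery_order(indices):
    print(info.name)
```

Here index_3 comes first because it has the highest priority, followed by index_4; index_2 precedes index_1 because it was created later.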

Total shards per node

  • Per index: index.routing.allocation.total_shards_per_node, unlimited by default;
  • Cluster-wide: cluster.routing.allocation.total_shards_per_node, default -1 (unlimited).

Data tier node roles

  • data_content
  • data_hot
  • data_warm
  • data_cold
  • data_frozen

# Before ES 7.13 use the settings below; they are deprecated from 7.13 on
index.routing.allocation.include._tier: data_warm
index.routing.allocation.require._tier: data_warm
index.routing.allocation.exclude._tier: data_warm
# From ES 7.13 on use
index.routing.allocation.include._tier_preference: data_warm,data_hot

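Index blocks

# Make the index and its metadata read-only
index.blocks.read_only: true
# Read-only: documents in the index cannot be deleted, but the index itself can be
index.blocks.read_only_allow_delete: true
# Block read operations, write operations, or metadata reads and writes, respectively
index.blocks.read: true
index.blocks.write: true
index.blocks.metadata: true

Adding a block through the API:

# <block> is one of metadata, read, read_only, write
PUT /my-index-000001/_block/<block>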

Slow log

  • Search slow log
    // Can be set dynamically; thresholds are disabled by default (-1)
    PUT /my-index-000001/_settings
    {
      "index.search.slowlog.threshold.query.warn": "10s",
      "index.search.slowlog.threshold.query.info": "5s",
      "index.search.slowlog.threshold.query.debug": "2s",
      "index.search.slowlog.threshold.query.trace": "500ms",
      "index.search.slowlog.threshold.fetch.warn": "1s",
      "index.search.slowlog.threshold.fetch.info": "800ms",
      "index.search.slowlog.threshold.fetch.debug": "500ms",
      "index.search.slowlog.threshold.fetch.trace": "200ms"
    }
  • Indexing slow log; the log file name ends with _index_indexing_slowlog.log
    PUT /my-index-000001/_settings
    {
      "index.indexing.slowlog.threshold.index.warn": "10s",
      "index.indexing.slowlog.threshold.index.info": "5s",
      "index.indexing.slowlog.threshold.index.debug": "2s",
      "index.indexing.slowlog.threshold.index.trace": "500ms",
      "index.indexing.slowlog.source": "1000"
    }
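
To show how the thresholds above translate into log levels, the following sketch picks the most severe level whose threshold an operation reaches. It is a simplified model for illustration: the function names are hypothetical, only the "s" and "ms" units are parsed, and a missing level is treated as disabled.

```python
def parse_duration(value):
    """Convert a value such as '500ms' or '10s' to seconds."""
    if value.endswith("ms"):
        return float(value[:-2]) / 1000
    if value.endswith("s"):
        return float(value[:-1])
    raise ValueError(f"unsupported duration: {value!r}")


def slowlog_level(took_seconds, thresholds):
    """Return the most severe level whose threshold is reached, or None."""
    for level in ("warn", "info", "debug", "trace"):
        limit = thresholds.get(level)
        if limit is not None and took_seconds >= parse_duration(limit):
            return level
    return None


# The query thresholds from the search slow log settings above.
QUERY_THRESHOLDS = {"warn": "10s", "info": "5s", "debug": "2s", "trace": "500ms"}

for took in (12.0, 6.0, 0.6, 0.1):
    print(took, slowlog_level(took, QUERY_THRESHOLDS))
```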

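Store

The store module controls how index data is stored on and read from disk. Keeping the default is recommended.

  • Node-wide default, in elasticsearch.yml
    index.store.type: hybridfs
  • Per index
    PUT /my-index-000001
    {
      "settings": {
        "index.store.type": "hybridfs"
      }
    }

Available values: fs, simplefs, niofs, mmapfs, hybridfs.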

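Transaction log (translog)

Operations that Elasticsearch applies to Lucene (index, delete, and so on) are written to the translog before they are acknowledged.

# Default. With request, an index, delete, update or bulk request reports success only after the translog has been fsynced on the primary and on every allocated replica
index.translog.durability: request
# With async, the translog is fsynced to disk in the background
index.translog.durability: async
# How often the translog is fsynced in async mode; the minimum is 100ms
index.translog.sync_interval: 5s
# A flush is triggered as soon as the translog reaches this size
index.translog.flush_threshold_size: 512mb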

History retention: soft deletes

# Available since Elasticsearch 6.5.0; defaults to true
index.soft_deletes.enabled: true
# Retention period; defaults to 12h
index.soft_deletes.retention_lease.period: 12h

Index sorting

Indices are not sorted by default.

# Supported field types: boolean, numeric, date and keyword
index.sort.field: ["username"]
# asc or desc
index.sort.order: ["asc"]
# min or max
index.sort.mode: min
# _last or _first
index.sort.missing: _last

Example

PUT my-index-000001
{
  "settings": {
    "index": {
      "sort.field": [ "username", "date" ], 
      "sort.order": [ "asc", "desc" ]       
    }
  },
  "mappings": {
    "properties": {
      "username": {
        "type": "keyword",
        "doc_values": true
      },
      "date": {
        "type": "date"
      }
    }
  }
}
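
The resulting document order (username ascending, then date descending) can be pictured with an ordinary in-memory sort. This sketch only illustrates the ordering; the sample documents and the index_sort function are made up, and Elasticsearch applies the sort inside each segment rather than this way.

```python
docs = [
    {"username": "bob", "date": "2021-03-01"},
    {"username": "alice", "date": "2021-01-15"},
    {"username": "alice", "date": "2021-06-30"},
    {"username": "bob", "date": "2021-02-10"},
]


def index_sort(documents):
    """Order by username ascending, then date descending."""
    # Python's sort is stable, so sort by the secondary key first.
    # ISO-8601 date strings compare correctly as plain strings.
    by_date_desc = sorted(documents, key=lambda d: d["date"], reverse=True)
    return sorted(by_date_desc, key=lambda d: d["username"])


for doc in index_sort(docs):
    print(doc["username"], doc["date"])
```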

indexing pressure

# Defaults to 10% of the heap.
indexing_pressure.memory.limit: 10%

Shard allocation filtering

  • Step 1

Custom attributes can be added to a node in its elasticsearch.yml, for example:

# small, big, ...
node.attr.size: medium

or passed on the command line at startup:

./bin/elasticsearch -Enode.attr.size=medium
Built-in node attributes

|Attribute|Description|
|---|---|
|_name|Match nodes by node name|
|_host_ip|Match nodes by host IP address (IP associated with hostname)|
|_publish_ip|Match nodes by publish IP address|
|_ip|Match either _host_ip or _publish_ip|
|_host|Match nodes by hostname|
|_id|Match nodes by node id|
|_tier|Match nodes by the node's data tier role|

  • Step 2

Set the shard filtering rules when creating the index or by updating its settings. The supported rule types are require, exclude and include.

PUT test/_settings
# Example 1
{
  "index.routing.allocation.include.size": "big,medium"
}
# Example 2
{
  "index.routing.allocation.require.size": "big",
  "index.routing.allocation.require.rack": "rack1"
}
# Example 3
{
  "index.routing.allocation.include._ip": "192.168.2.*"
}
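
The effect of the three rule types can be sketched as a predicate over a node's attributes. This is a simplified model written for illustration, not the Elasticsearch implementation: the function names are hypothetical, comma-separated values are treated as alternatives for one attribute, and wildcards are matched with Python's fnmatch.

```python
from fnmatch import fnmatchcase


def _matches(node, attr, values):
    """True if the node's attribute matches any of the comma-separated patterns."""
    node_value = node.get(attr)
    if node_value is None:
        return False
    return any(fnmatchcase(node_value, p.strip()) for p in values.split(","))


def node_allowed(node, include=None, require=None, exclude=None):
    """Decide whether an index's shards may be placed on the node."""
    include, require, exclude = include or {}, require or {}, exclude or {}
    if any(_matches(node, a, v) for a, v in exclude.items()):
        return False  # the node matches an excluded value
    if not all(_matches(node, a, v) for a, v in require.items()):
        return False  # a required attribute does not match
    if include and not any(_matches(node, a, v) for a, v in include.items()):
        return False  # none of the include rules match
    return True


# A hypothetical node with two custom attributes and a built-in one.
node = {"size": "medium", "rack": "rack1", "_ip": "192.168.2.15"}

print(node_allowed(node, include={"size": "big,medium"}))
print(node_allowed(node, require={"size": "big", "rack": "rack1"}))
print(node_allowed(node, include={"_ip": "192.168.2.*"}))
```

With this node, the first and third rule sets allow allocation, while the second rejects it because the node's size is not big.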

Connecting a client

// Option 1: Spring Data Elasticsearch ClientConfiguration, with basic auth
@Bean
public RestHighLevelClient restHighLevelClient() {
    ClientConfiguration clientConfiguration = ClientConfiguration.builder()
            .connectedTo("192.168.3.17:9200", "192.168.3.19:9200", "192.168.3.20:9200")
            .withBasicAuth("elastic", "elastic")
            .build();
    return RestClients.create(clientConfiguration).rest();
}
// Option 2: build the REST client directly from a list of hosts
@Bean
public RestHighLevelClient restHighLevelClient() {
    RestHighLevelClient restHighLevelClient = new RestHighLevelClient(
            RestClient.builder(
                    new HttpHost("192.168.3.17", 9200, "http"),
                    new HttpHost("192.168.3.19", 9200, "http"),
                    new HttpHost("192.168.3.20", 9200, "http")
            )
    );
    return restHighLevelClient;
}
// Option 3
// Connection without a password
@Bean
public RestHighLevelClient restHighLevelClient() {
    return new RestHighLevelClient(RestClient.builder(new HttpHost[]{
            new HttpHost("localhost", 9200, "http")
    }));
}

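Sending documents

// Index a single document
IndexRequest request = new IndexRequest(indexName);
User user = new User(RandomStringUtils.randomAlphanumeric(50), 20 + new Random().nextInt(30));
request.source(JSON.toJSONString(user), XContentType.JSON);
try {
    restHighLevelClient.index(request, RequestOptions.DEFAULT);
} catch (IOException e) {
    // Do not swallow the failure silently
    throw new RuntimeException("Failed to index document", e);
}

// Bulk indexing
BulkRequest requests = new BulkRequest();
for (int j = 0; j < 100; j++) {
    User user = new User(RandomStringUtils.randomAlphanumeric(50), 20 + new Random().nextInt(30));
    IndexRequest request = new IndexRequest(indexName);
    request.source(JSON.toJSONString(user), XContentType.JSON);
    requests.add(request);
}
try {
    BulkResponse response = restHighLevelClient.bulk(requests, RequestOptions.DEFAULT);
    // A bulk call can succeed as a whole while individual items fail
    if (response.hasFailures()) {
        System.err.println(response.buildFailureMessage());
    }
} catch (IOException e) {
    throw new RuntimeException("Bulk request failed", e);
}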

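Nodes

Elasticsearch nodes are divided into master and data nodes. To avoid split brain, set discovery.zen.minimum_master_nodes to (number of master-eligible nodes / 2) + 1; the default is 1. This setting applies to versions before 7.0; from 7.0 on it is ignored and the cluster manages its voting configuration itself.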
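
The quorum formula uses integer division, which is easy to get wrong for even node counts. A minimal sketch, with a hypothetical function name:

```python
def minimum_master_nodes(master_eligible):
    """Quorum size: (number of master-eligible nodes / 2) + 1, integer division."""
    if master_eligible < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible // 2 + 1


for n in (1, 2, 3, 4, 5):
    print(n, minimum_master_nodes(n))
```

Note that 3 and 4 master-eligible nodes tolerate the same number of failures (one), which is why odd counts are the usual choice.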

index

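Choose the number of shards per index carefully. A single shard can hold at most 2,147,483,519 documents, and a shard size under 30 GB is recommended. Size the shard count by the number of nodes: keep fewer than 3 shards of one index on any single node, and in small clusters a shard count less than or equal to the number of data nodes works best.

Elasticsearch runs each query on a single thread per shard and then merges the per-shard results. Specifying routing improves query efficiency because only the shards that the routing value maps to are searched.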
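
The two per-shard limits above give a lower bound on the number of primary shards. The sketch below is a rough estimate under those two figures only; the function name is hypothetical, and it ignores the node-count guidance, growth and replicas.

```python
import math

MAX_DOCS_PER_SHARD = 2_147_483_519  # hard document limit per shard
MAX_SHARD_SIZE_GB = 30              # recommended upper bound from the text


def min_primary_shards(expected_docs, expected_size_gb):
    """Smallest shard count that respects both per-shard limits."""
    by_docs = math.ceil(expected_docs / MAX_DOCS_PER_SHARD)
    by_size = math.ceil(expected_size_gb / MAX_SHARD_SIZE_GB)
    return max(1, by_docs, by_size)


print(min_primary_shards(1_000_000_000, 100))  # size is the limiting factor
print(min_primary_shards(5_000_000_000, 50))   # document count is the limiting factor
```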

type

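  • 5.x supports multiple types per index
  • 6.x supports only one type per index; its name can be customized
  • 7.x uses _doc as the type everywhere by default; a custom name is still possible but not recommended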

id

  • Auto-generated IDs are 20-character GUIDs
  • IDs can also be supplied manually

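Mapping fields

Use nested and parent/child fields sparingly, because they hurt query performance considerably. Prefer a wide-table (denormalized) design. If nested fields are unavoidable, keep their number small; the default limit in Elasticsearch is 50.
index.mapping.nested_fields.limit: 50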