Setting up ELK on K8s

Monday, April 26, 2021

Overview

This post is a record of my ELK setup on K8s. The configuration is generic, and every part that needs adapting to your own environment is called out explicitly. The overall layout of the configuration files is:

├── elk
│   ├── elk-data.yaml
│   ├── elk-kibana.yaml
│   ├── elk-logstash.yaml
│   ├── elk-master.yaml
│   ├── elk-ns.yaml
│   └── elk-service.yaml
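
These manifests can be applied with kubectl in dependency order, the namespace first. A minimal sketch, assuming the file names above:

kubectl apply -f elk/elk-ns.yaml
kubectl apply -f elk/elk-master.yaml
kubectl apply -f elk/elk-data.yaml
kubectl apply -f elk/elk-service.yaml
kubectl apply -f elk/elk-kibana.yaml
kubectl apply -f elk/elk-logstash.yaml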

My server is a 4C8G machine; after setting up ELK + Grafana and a few other projects, resource usage looks like this:

[Screenshot: overall node resource usage with ELK, Grafana, and other services running]

Things to note:

  • This setup does not enable X-Pack; I will update that part later.

  • All of my PV sizes are configured quite small; adjust them to your actual needs.

  • My machine is not powerful, only 4C8G. Even after trimming the configuration wherever possible, ELK still occupies 2.28Gi of memory; for personal use, though, that is enough.

[Screenshot: memory usage of the ELK pods]

Namespace

First, create a namespace to isolate everything belonging to the ELK stack. My namespace is elk, and every resource below lives in it.

elk-ns.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: elk
  labels:
    app: elasticsearch

Elasticsearch

Elasticsearch is split into master nodes and data nodes, and the index data must be mapped out of the pods. Since I had only one server when setting this up, I mounted it directly onto the host disk.

The Elasticsearch manifests define four things:

  1. PersistentVolume
  2. PersistentVolumeClaim
  3. StatefulSet
  4. PodDisruptionBudget

Things to note:

  1. The master and data index data must be mapped out of the pods; my directories are /data/elk/master and /data/elk/data. Size these two PVs according to your actual situation; since this is for personal use with little data, I set each to only 10Gi.
  2. Both master and data can run as multiple pods, but my machine's CPU and memory are limited, so I configured only one of each. To change this, edit spec.replicas in the corresponding StatefulSet.
  3. Elasticsearch nominally wants at least 2 GB of JVM heap, but I don't have that much to spare, so I configured it smaller. If memory allows, adjust ES_JAVA_OPTS in the data node's StatefulSet; I use -Xms512m -Xmx512m, but with enough memory 2 GB or more is advisable, e.g. -Xms4g -Xmx4g, depending on your situation. The master node needs less; 512MB is fine.
  4. For the master node, mind the value of cluster.initial_master_nodes (under env): it must match the number of pods actually configured. I run a single pod, so the value is elasticsearch-master-0. With more pods, append the names in ordinal order; e.g. for 3 pods it is elasticsearch-master-0,elasticsearch-master-1,elasticsearch-master-2 (see the check after this list).
  5. Both master and data nodes need discovery.seed_hosts (under env); replace the elk in my value with your own namespace.
  6. The PodDisruptionBudget limits how many pods of a replicated application may be down at the same time due to voluntary disruptions; I set it to 1.
  7. Both master and data nodes expose ports 9200 and 9300, but only the master's 9200 is exposed as a node port. Replace your node port in elk-service.yaml with the node port you want to expose externally.
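
To double-check note 4, you can list the StatefulSet pods and compare their ordinal names against cluster.initial_master_nodes; a quick sketch:

# StatefulSet pods are named <name>-0 .. <name>-(replicas-1);
# cluster.initial_master_nodes must list exactly these names.
kubectl get pods -n elk -l app=elasticsearch,role=master -o name
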
elk-master.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv-volume-elastic-master
  namespace: elk
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data/elk/master"

---

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pv-claim-elastic-master
  namespace: elk
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

---

apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: elk
  name: elasticsearch-master
  labels:
    app: elasticsearch
    role: master
spec:
  serviceName: elasticsearch-master
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
      role: master
  template:
    metadata:
      labels:
        app: elasticsearch
        role: master
    spec:
      volumes:
       - name: pv-storage-elastic-master
         persistentVolumeClaim:
           claimName: pv-claim-elastic-master
      containers:
        - name: elasticsearch
          image: elasticsearch:7.2.0
          resources:
            requests:
              memory: 1Gi
              cpu: 0.5
            limits:
              memory: 1Gi
              cpu: 0.5
          command: ["bash", "-c", "ulimit -l unlimited && sysctl -w vm.max_map_count=262144 && chown -R elasticsearch:elasticsearch /usr/share/elasticsearch/data && exec su elasticsearch docker-entrypoint.sh"]
          ports:
            - containerPort: 9200
              name: http
            - containerPort: 9300
              name: transport
          env:
            - name: discovery.seed_hosts
              value: "elasticsearch-master.elk.svc.cluster.local"
            - name: cluster.initial_master_nodes
              value: "elasticsearch-master-0"
            - name: ES_JAVA_OPTS
              value: -Xms512m -Xmx512m

            - name: node.master
              value: "true"
            - name: node.ingest
              value: "false"
            - name: node.data
              value: "false"

            - name: cluster.name
              value: "elasticsearch-cluster-v7"
            - name: node.name
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name

          volumeMounts:
           - mountPath: /usr/share/elasticsearch/data
             name: pv-storage-elastic-master

          # Privileged so the startup command above can run ulimit/sysctl/chown
          securityContext:
            privileged: true

      # Pull image from private repo
      imagePullSecrets:
      - name: regcred-elastic
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  namespace: elk
  name: elasticsearch-master
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: elasticsearch
      role: master
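
The startup command above runs sysctl -w vm.max_map_count=262144 inside a privileged container. If you would rather not run privileged pods, an alternative is to satisfy this standard Elasticsearch requirement on the host instead:

# On the host: raise the limit now and persist it across reboots.
sudo sysctl -w vm.max_map_count=262144
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf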

elk-data.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv-volume-elastic-data
  namespace: elk
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data/elk/data"

---

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pv-claim-elastic-data
  namespace: elk
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

---

apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: elk
  name: elasticsearch-data
  labels:
    app: elasticsearch
    role: data
spec:
  serviceName: elasticsearch-data
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
      role: data
  template:
    metadata:
      labels:
        app: elasticsearch
        role: data
    spec:
      volumes:
       - name: pv-storage-elastic-data
         persistentVolumeClaim:
           claimName: pv-claim-elastic-data
      containers:
        - name: elasticsearch
          image: elasticsearch:7.2.0
          resources:
            requests:
              memory: 1Gi
              cpu: 0.5
            limits:
              memory: 1Gi
              cpu: 0.5
          command: ["bash", "-c", "ulimit -l unlimited && sysctl -w vm.max_map_count=262144 && chown -R elasticsearch:elasticsearch /usr/share/elasticsearch/data && exec su elasticsearch docker-entrypoint.sh"]
          ports:
            - containerPort: 9200
              name: http
            - containerPort: 9300
              name: transport
          env:
            - name: discovery.seed_hosts
              value: "elasticsearch-master.elk.svc.cluster.local"
            - name: ES_JAVA_OPTS
              value: -Xms512m -Xmx512m

            - name: node.master
              value: "false"
            - name: node.ingest
              value: "true"
            - name: node.data
              value: "true"

            - name: cluster.name
              value: "elasticsearch-cluster-v7"
            - name: node.name
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          volumeMounts:
           - mountPath: /usr/share/elasticsearch/data
             name: pv-storage-elastic-data

          # Privileged so the startup command above can run ulimit/sysctl/chown
          securityContext:
            privileged: true

      # Pull image from private repo
      imagePullSecrets:
      - name: regcred-elastic
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  namespace: elk
  name: elasticsearch-data
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: elasticsearch
      role: data

elk-service.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: elk
  name: elasticsearch-master
  labels:
    app: elasticsearch
    role: master
spec:
  clusterIP: None
  selector:
    app: elasticsearch
    role: master
  ports:
    - port: 9200
      name: http
    - port: 9300
      name: node-to-node

---

apiVersion: v1
kind: Service
metadata:
  namespace: elk
  name: elasticsearch
  labels:
    app: elasticsearch
    role: data
spec:
  clusterIP: None
  selector:
    app: elasticsearch
    role: data
  ports:
    - port: 9200
      name: http
    - port: 9300
      name: node-to-node

---

apiVersion: v1
kind: Service
metadata:
  namespace: elk
  name: elasticsearch-service
  labels:
    app: elasticsearch
    role: master
spec:
  type: NodePort
  ports:
    - port: 9200
      targetPort: 9200
      nodePort: your node port
  selector:
    app: elasticsearch
    role: master
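
Once the services are applied, the cluster is reachable through the node port. A quick health check, with <node-ip> and <node-port> standing in for your node address and the nodePort above:

curl "http://<node-ip>:<node-port>/_cluster/health?pretty"
# expect "status" of green (or yellow for a single data node) once both nodes join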

Kibana

There is not much to watch out for with Kibana. The only thing to note is that the value of ELASTICSEARCH_URL (under env) is the name of the master service configured above (elasticsearch-service), on the 9200 port configured above.

elk-kibana.yaml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: kibana
  name: kibana
  namespace: elk
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
        - name: kibana
          image: kibana:7.2.0
          ports:
            - containerPort: 5601
              protocol: TCP
          env:
            - name: "ELASTICSEARCH_URL"
              value: "http://elasticsearch-service:9200"
      imagePullSecrets:
      - name: regcred-elastic
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule

---

kind: Service
apiVersion: v1
metadata:
  labels:
    app: kibana
  name: kibana-service
  namespace: elk
spec:
  type: NodePort
  ports:
    - port: 5601
      targetPort: 5601
      nodePort: 32502
  selector:
    app: kibana
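
After applying, Kibana should answer on the NodePort configured above; a minimal sanity check, with <node-ip> standing in for your node address:

kubectl get pods -n elk -l app=kibana
curl -I "http://<node-ip>:32502/"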

Logstash

Logstash is the main event; there are quite a few things to watch out for, so be careful here.

  1. nginx: When I set this up I tested with the nginx logs on the machine itself, mounting the host's nginx logs into the pod, and I changed the nginx log format to JSON. Of course, if you have the resources you can use Kafka instead.

    The nginx log format is as follows:

    log_format  json  '{"@timestamp":"$time_iso8601",'
                      '"@source":"$server_addr",'
                      '"hostname":"$hostname",'
                      '"ip":"$remote_addr",'
                      '"client":"$remote_addr",'
                      '"request_method":"$request_method",'
                      '"scheme":"$scheme",'
                      '"domain":"$server_name",'
                      '"referer":"$http_referer",'
                      '"request":"$request_uri",'
                      '"args":"$args",'
                      '"size":$body_bytes_sent,'
                      '"status": $status,'
                      '"responsetime":$request_time,'
                      '"upstreamtime":"$upstream_response_time",'
                      '"upstreamaddr":"$upstream_addr",'
                      '"http_user_agent":"$http_user_agent",'
                      '"https":"$https"'
                      '}';
    

    Change the access log format of the corresponding virtual hosts to json:

    server {
    
    	...
      access_log /path/to/your/log/file json;
      ...
    
    }
    
  2. GeoIP: Log parsing uses IP address geolocation, so GeoIP needs to be installed on the host. After updating the city database, map the data file into the pod (a quick file check follows this list). If the installation or update steps are unclear, see my article: GeoIP的安装和更新 (Installing and Updating GeoIP).

  3. In the nginx and GeoIP PV configurations, change the paths to your own nginx log path and GeoIP file path.

  4. In the ConfigMap, adjust the following to your environment:

    • input.file.path must match the access log path mounted into the pod.
    • filter.geoip.database must match the path of the GeoLite2-City data file mounted into the pod.
    • output.elasticsearch.hosts is the internal IP + node port of the Elasticsearch service configured above.
  5. I collect all nginx logs into a single index; together with Grafana, this gives one dashboard showing traffic across all services, while filtering by domain still lets you inspect a single service.
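
Before applying the manifests, it is worth confirming that the GeoLite2 database from notes 2 and 3 actually exists at the host path you are about to mount (shown here with the placeholder path used below):

# The file name must match filter.geoip.database in the ConfigMap.
ls -lh /path/to/GeoIP/GeoLite2-City.mmdb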

elk-logstash.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv-volume-log
  namespace: elk
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/path/to/your/nginx/logs"

---

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pv-claim-log
  namespace: elk
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi

---

kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv-volume-geoip
  namespace: elk
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 200Mi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/path/to/GeoIP"

---

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pv-claim-geoip
  namespace: elk
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Mi

---

kind: ConfigMap
apiVersion: v1
metadata:
  name: logstash-config
  namespace: elk
data:
  logstash-config-named-k8s: |
    input {
      file {
        path => "/var/log/nginx/*access.log"
        type => "nginx-access-log"
        ignore_older => 0 
        codec => json
        start_position => "beginning"
      }
    }

    filter {
      mutate {
        convert => [ "status","integer" ]
        convert => [ "size","integer" ]
        convert => [ "upstreamtime","float" ]
        remove_field => "message"
      }
      date {
        match => [ "timestamp" ,"dd/MMM/YYYY:HH:mm:ss Z" ]
      }
      geoip {
        source => "client"
        target => "geoip"
        database =>"/usr/share/GeoIP/GeoLite2-City.mmdb"
        add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
        add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}"  ]
      }
      mutate {
        # the convert must run after the geoip filter has added [geoip][coordinates]
        convert => ["[geoip][coordinates]", "float"]
        remove_field => "timestamp"
      }
      # geoip lookups fail for private/internal IPs and tag the event with
      # _geoip_lookup_failure; drop those events entirely.
      if "_geoip_lookup_failure" in [tags] { drop { } }
    }

    output {
      #stdout { codec => rubydebug }
      elasticsearch {
        hosts => ["your internal ip:your node port"]
        index => "nginx"
      }
    }    

---

kind: Deployment
apiVersion: apps/v1
metadata:
  name: logstash
  namespace: elk
  labels:
    app: logstash
spec:
  replicas: 1
  selector:
    matchLabels:
      app: logstash
  template:
    metadata:
      labels:
        app: logstash
    spec:
      containers:
      - name: logstash
        image: logstash:7.2.0
        command: ["/bin/sh","-c"]
        args: ["/usr/share/logstash/bin/logstash -f /usr/share/logstash/config/indexer-kafka-named-k8s.conf"]
        volumeMounts:
        - name: vm-config
          mountPath: /usr/share/logstash/config
        - name: pv-storage-log
          mountPath: /var/log/nginx
        - name: pv-storage-geoip
          mountPath: /usr/share/GeoIP
      imagePullSecrets:
      - name: regcred-elastic
      volumes:
      - name: vm-config
        configMap:
          name: logstash-config
          items:
          - key: logstash-config-named-k8s
            path: indexer-kafka-named-k8s.conf
      - name: pv-storage-log
        persistentVolumeClaim:
          claimName: pv-claim-log
      - name: pv-storage-geoip
        persistentVolumeClaim:
          claimName: pv-claim-geoip
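
Once Logstash is up, you can confirm that events are flowing into the nginx index; a sketch, again with <node-ip>/<node-port> as placeholders:

kubectl logs -n elk deploy/logstash --tail=20
curl "http://<node-ip>:<node-port>/nginx/_count?pretty"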

Summary

With ELK deployed on K8s, the handful of services I use regularly have now all been migrated from docker-compose to K8s, and the steps above also collect the nginx logs into ELK. Next, combined with Grafana, we can visualize the nginx logs and get a single unified dashboard for the state of all services.

Kubernetes ELK
