Overview
This post is my working record of setting up ELK on K8s. The configuration is generic, and every part that needs environment-specific changes is called out. The overall layout of the config files:
├── elk
│   ├── elk-data.yaml
│   ├── elk-kibana.yaml
│   ├── elk-logstash.yaml
│   ├── elk-master.yaml
│   ├── elk-ns.yaml
│   └── elk-service.yaml
My server is a 4C8G box; resource usage after setting up ELK + Grafana and a few other services is summarized in the notes below.
Things to note:
- This version does not enable X-Pack; I will cover that in a later update.
- All of the PVs are deliberately sized quite small; adjust them to your actual needs.
- My machine is modest (only 4C8G). Even after trimming every setting I could, ELK still occupies 2.28Gi of memory, which is nonetheless plenty for personal use.
Namespace
First, create a namespace to isolate everything that belongs to the ELK stack. Mine is named elk, and every resource below lives in this namespace.
elk-ns.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: elk
  labels:
    app: elasticsearch
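The manifests below all assume this namespace exists, so apply it first; a quick sketch, assuming you run kubectl from the directory containing elk/:

# create the namespace and confirm it is there
kubectl apply -f elk/elk-ns.yaml
kubectl get ns elk --show-labels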
Elasticsearch
Elasticsearch is split into master and data nodes, and the index data has to be persisted outside the pods. Since I had only a single server when I set this up, I mounted it directly onto the host disk.
The ES manifests define four kinds of resources:
- PersistentVolume
- PersistentVolumeClaim
- StatefulSet
- PodDisruptionBudget
Things to note:
- The master and data index data must be mapped out of the pods; my host paths are /data/elk/master and /data/elk/data, and the directories have to exist on the node first (see the prep snippet after this list). Size these two PVs according to your actual data volume; mine is for personal use with little data, so I set each one to only 10Gi.
- Both master and data can run multiple pods, but my CPU and memory are limited, so I run one of each. To scale out, change the spec.replicas value of the corresponding StatefulSet.
- Elasticsearch's recommended minimum JVM heap is 2GB; I don't have that much to spare, so I went smaller. With enough memory, adjust ES_JAVA_OPTS in the data node's StatefulSet: I use -Xms512m -Xmx512m, but 2GB or more is advisable when you can afford it, e.g. -Xms4g -Xmx4g. The master node needs far less; 512MB is fine.
- Watch the cluster.initial_master_nodes env value on the master node: it must match the number of master pods you actually run. I run a single pod, so the value is elasticsearch-master-0; with more pods, append the names in numeric order, e.g. three pods give elasticsearch-master-0,elasticsearch-master-1,elasticsearch-master-2.
- Both master and data nodes must set the discovery.seed_hosts env value; replace the elk in my value with your own namespace.
- The PodDisruptionBudget limits how many pods of a replicated application may be down at the same time due to voluntary disruptions; I set it to 1.
- Both master and data nodes expose ports 9200 and 9300, but only the master's 9200 is published as a NodePort; replace "your node port" in elk-service.yaml with the node port you want to expose externally.
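Since all of the PVs here are hostPath volumes, the backing directories have to exist on the node before the pods can start; a minimal prep step on the node, assuming the paths above:

# create the hostPath directories backing the master and data PVs
# (the containers chown them to the elasticsearch user at startup)
mkdir -p /data/elk/master /data/elk/data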
elk-master.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv-volume-elastic-master
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data/elk/master"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pv-claim-elastic-master
  namespace: elk
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: elk
  name: elasticsearch-master
  labels:
    app: elasticsearch
    role: master
spec:
  serviceName: elasticsearch-master
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
      role: master
  template:
    metadata:
      labels:
        app: elasticsearch
        role: master
    spec:
      volumes:
        - name: pv-storage-elastic-master
          persistentVolumeClaim:
            claimName: pv-claim-elastic-master
      containers:
        - name: elasticsearch
          image: elasticsearch:7.2.0
          resources:
            requests:
              memory: 1Gi
              cpu: 0.5
            limits:
              memory: 1Gi
              cpu: 0.5
          command: ["bash", "-c", "ulimit -l unlimited && sysctl -w vm.max_map_count=262144 && chown -R elasticsearch:elasticsearch /usr/share/elasticsearch/data && exec su elasticsearch docker-entrypoint.sh"]
          ports:
            - containerPort: 9200
              name: http
            - containerPort: 9300
              name: transport
          env:
            - name: discovery.seed_hosts
              value: "elasticsearch-master.elk.svc.cluster.local"
            - name: cluster.initial_master_nodes
              value: "elasticsearch-master-0"
            - name: ES_JAVA_OPTS
              value: -Xms512m -Xmx512m
            - name: node.master
              value: "true"
            - name: node.ingest
              value: "false"
            - name: node.data
              value: "false"
            - name: cluster.name
              value: "elasticsearch-cluster-v7"
            - name: node.name
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          volumeMounts:
            - mountPath: /usr/share/elasticsearch/data
              name: pv-storage-elastic-master
          # Privileged so the startup command can raise ulimit and set vm.max_map_count
          securityContext:
            privileged: true
      # Pull image from private repo
      imagePullSecrets:
        - name: regcred-elastic
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  namespace: elk
  name: elasticsearch-master
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: elasticsearch
      role: master
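To bring the master up, apply this manifest together with elk-service.yaml further below (discovery.seed_hosts resolves via the headless service defined there), then wait for the pod to become Ready; a sketch, assuming the file layout from the overview:

kubectl apply -f elk/elk-master.yaml -f elk/elk-service.yaml
# block until elasticsearch-master-0 reports Ready
kubectl -n elk rollout status statefulset/elasticsearch-master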
elk-data.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv-volume-elastic-data
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data/elk/data"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pv-claim-elastic-data
  namespace: elk
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: elk
  name: elasticsearch-data
  labels:
    app: elasticsearch
    role: data
spec:
  serviceName: elasticsearch-data
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
      role: data
  template:
    metadata:
      labels:
        app: elasticsearch
        role: data
    spec:
      volumes:
        - name: pv-storage-elastic-data
          persistentVolumeClaim:
            claimName: pv-claim-elastic-data
      containers:
        - name: elasticsearch
          image: elasticsearch:7.2.0
          resources:
            requests:
              memory: 1Gi
              cpu: 0.5
            limits:
              memory: 1Gi
              cpu: 0.5
          command: ["bash", "-c", "ulimit -l unlimited && sysctl -w vm.max_map_count=262144 && chown -R elasticsearch:elasticsearch /usr/share/elasticsearch/data && exec su elasticsearch docker-entrypoint.sh"]
          ports:
            - containerPort: 9200
              name: http
            - containerPort: 9300
              name: transport
          env:
            - name: discovery.seed_hosts
              value: "elasticsearch-master.elk.svc.cluster.local"
            - name: ES_JAVA_OPTS
              value: -Xms512m -Xmx512m
            - name: node.master
              value: "false"
            - name: node.ingest
              value: "true"
            - name: node.data
              value: "true"
            - name: cluster.name
              value: "elasticsearch-cluster-v7"
            - name: node.name
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          volumeMounts:
            - mountPath: /usr/share/elasticsearch/data
              name: pv-storage-elastic-data
          # Privileged so the startup command can raise ulimit and set vm.max_map_count
          securityContext:
            privileged: true
      # Pull image from private repo
      imagePullSecrets:
        - name: regcred-elastic
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  namespace: elk
  name: elasticsearch-data
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: elasticsearch
      role: data
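Once the data StatefulSet is applied as well (kubectl apply -f elk/elk-data.yaml), you can check that both nodes joined the same cluster; a sketch using curl inside the master pod, which the 7.2 image ships with:

# node.role shows m for the master and di for the data/ingest node
kubectl -n elk exec elasticsearch-master-0 -- curl -s http://localhost:9200/_cat/nodes?v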
elk-service.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: elk
  name: elasticsearch-master
  labels:
    app: elasticsearch
    role: master
spec:
  clusterIP: None
  selector:
    app: elasticsearch
    role: master
  ports:
    - port: 9200
      name: http
    - port: 9300
      name: node-to-node
---
apiVersion: v1
kind: Service
metadata:
  namespace: elk
  name: elasticsearch
  labels:
    app: elasticsearch
    role: data
spec:
  clusterIP: None
  selector:
    app: elasticsearch
    role: data
  ports:
    - port: 9200
      name: http
    - port: 9300
      name: node-to-node
---
apiVersion: v1
kind: Service
metadata:
  namespace: elk
  name: elasticsearch-service
  labels:
    app: elasticsearch
    role: master
spec:
  type: NodePort
  ports:
    - port: 9200
      targetPort: 9200
      nodePort: your node port
  selector:
    app: elasticsearch
    role: master
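With the services in place, Elasticsearch should also answer on the node port from outside the cluster; a minimal smoke test, with <node-ip> and <node-port> standing in for your node address and the port you chose above:

curl http://<node-ip>:<node-port>/_cluster/health?pretty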
Kibana
Kibana has few pitfalls. The only thing to watch is that the value of the ELASTICSEARCH_HOSTS environment variable (the 7.x images read ELASTICSEARCH_HOSTS rather than the older ELASTICSEARCH_URL) points at the master-facing service configured above, on the 9200 port configured there.
elk-kibana.yaml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: kibana
  name: kibana
  namespace: elk
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
        - name: kibana
          image: kibana:7.2.0
          ports:
            - containerPort: 5601
              protocol: TCP
          env:
            - name: "ELASTICSEARCH_HOSTS"
              value: "http://elasticsearch-service:9200"
      imagePullSecrets:
        - name: regcred-elastic
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: kibana
  name: kibana-service
  namespace: elk
spec:
  type: NodePort
  ports:
    - port: 5601
      targetPort: 5601
      nodePort: 32502
  selector:
    app: kibana
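Kibana is published on NodePort 32502, so it can be checked from outside the cluster right away; a quick probe, with <node-ip> standing in for one of your node addresses:

# the status API reports Kibana's overall state; the UI itself is at http://<node-ip>:32502
curl http://<node-ip>:32502/api/status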
Logstash
Logstash is the centerpiece and has the most moving parts, so go through this section carefully.
- nginx: for this test I used the nginx logs already on the machine, mounting the host's nginx log directory into the pod, and I changed the nginx log format to JSON; if you have the capacity, Kafka is a good alternative for shipping the logs. The nginx log format is as follows:

  log_format json '{"@timestamp":"$time_iso8601",'
                  '"@source":"$server_addr",'
                  '"hostname":"$hostname",'
                  '"ip":"$remote_addr",'
                  '"client":"$remote_addr",'
                  '"request_method":"$request_method",'
                  '"scheme":"$scheme",'
                  '"domain":"$server_name",'
                  '"referer":"$http_referer",'
                  '"request":"$request_uri",'
                  '"args":"$args",'
                  '"size":$body_bytes_sent,'
                  '"status": $status,'
                  '"responsetime":$request_time,'
                  '"upstreamtime":"$upstream_response_time",'
                  '"upstreamaddr":"$upstream_addr",'
                  '"http_user_agent":"$http_user_agent",'
                  '"https":"$https"'
                  '}';

  Change the access_log of each virtual host you want to collect to the json format (then validate and reload nginx, as shown in the snippet after this list):

  server {
      ...
      access_log /path/to/your/log/file json;
      ...
  }

- GeoIP: log parsing uses IP geolocation, so GeoIP must be installed on the host; after updating the city database, map the data file into the pod. If the install or update steps are unclear, see my earlier post on installing and updating GeoIP (GeoIP的安装和更新).
- In the nginx and GeoIP PV configs, replace the nginx log path and the GeoIP file path with your own.
- Adjust the ConfigMap to your actual setup:
  - input.file.path must match the access log path as mounted into the pod.
  - filter.geoip.database must match the path of the GeoLite2-City data file as mounted into the pod.
  - output.elasticsearch.hosts is the internal IP plus the node port of the ES service configured above.
- I collect all nginx logs into a single index: combined with Grafana later, one dashboard can show the traffic of every service at once, while switching the domain filter drills into a single service's data.
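After switching the log format it is worth confirming that nginx accepts the config and that new entries really are JSON; a small check on the host, with the tail target being your own access log:

# validate the edited config, reload, then eyeball the newest entry
nginx -t && nginx -s reload
tail -n 1 /path/to/your/log/file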
elk-logstash.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv-volume-log
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/path/to/your/nginx/logs"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pv-claim-log
  namespace: elk
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv-volume-geoip
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 200Mi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/path/to/GeoIP"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pv-claim-geoip
  namespace: elk
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Mi
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: logstash-config
  namespace: elk
data:
  logstash-config-named-k8s: |
    input {
      file {
        path => "/var/log/nginx/*access.log"
        type => "nginx-access-log"
        ignore_older => 0
        codec => json
        start_position => "beginning"
      }
    }
    filter {
      mutate {
        convert => [ "status","integer" ]
        convert => [ "size","integer" ]
        convert => [ "upstreamtime","float" ]
        remove_field => "message"
      }
      date {
        match => [ "timestamp" ,"dd/MMM/YYYY:HH:mm:ss Z" ]
      }
      geoip {
        source => "client"
        target => "geoip"
        database => "/usr/share/GeoIP/GeoLite2-City.mmdb"
        add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
        add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
      }
      mutate {
        # Convert after the geoip filter has added the field; filters run in
        # order, so converting before geoip would find nothing to convert.
        convert => [ "[geoip][coordinates]", "float" ]
        remove_field => "timestamp"
      }
      # Private/internal client IPs fail the GeoIP lookup and get tagged
      # _geoip_lookup_failure; drop those events.
      if "_geoip_lookup_failure" in [tags] { drop { } }
    }
    output {
      #stdout { codec => rubydebug }
      elasticsearch {
        hosts => ["your internal ip:your node port"]
        index => "nginx"
      }
    }
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: logstash
  namespace: elk
  labels:
    app: logstash
spec:
  replicas: 1
  selector:
    matchLabels:
      app: logstash
  template:
    metadata:
      labels:
        app: logstash
    spec:
      containers:
        - name: logstash
          image: logstash:7.2.0
          command: ["/bin/sh","-c"]
          args: ["/usr/share/logstash/bin/logstash -f /usr/share/logstash/config/indexer-kafka-named-k8s.conf"]
          volumeMounts:
            - name: vm-config
              mountPath: /usr/share/logstash/config
            - name: pv-storage-log
              mountPath: /var/log/nginx
            - name: pv-storage-geoip
              mountPath: /usr/share/GeoIP
      imagePullSecrets:
        - name: regcred-elastic
      volumes:
        - name: vm-config
          configMap:
            name: logstash-config
            items:
              - key: logstash-config-named-k8s
                path: indexer-kafka-named-k8s.conf
        - name: pv-storage-log
          persistentVolumeClaim:
            claimName: pv-claim-log
        - name: pv-storage-geoip
          persistentVolumeClaim:
            claimName: pv-claim-geoip
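Once the Deployment is running, two checks tell you whether the pipeline works end to end; a sketch, with <internal-ip> and <node-port> standing in for the same values as the ConfigMap's hosts setting:

# follow the logstash output for parse errors
kubectl -n elk logs -f deploy/logstash
# confirm documents are landing in the nginx index
curl "http://<internal-ip>:<node-port>/nginx/_search?size=1&pretty"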
Summary
With ELK deployed on K8s, the handful of services I use regularly have now all migrated from docker-compose to k8s, and the steps above also feed the nginx logs into ELK. Next, combining this with Grafana, we can visualize the nginx logs and get a single, unified dashboard for the status of every service.