
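The Metrics Server That Collects Metrics in NKS

Overview

I could not find a Metrics Server pod in the cluster, yet the output of the top command showed that a Metrics Server was running, and among the API services registered with the API server I found the Metrics Server API service. That raised a question: how are the node metrics actually collected?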

# Search for a metrics pod (none exists)
$ k get pods -n kube-system | grep metrics

# Show metrics
$ k top nodes
NAME                  CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
nks-node-pool-w-a2v   713m         36%    4323Mi          71%
nks-node-pool-w-a2w   270m         13%    5886Mi          97%
nks-node-pool-w-a7a   101m         5%     3245Mi          53%
nks-node-pool-w-al4   88m          4%     3140Mi          51%
nks-node-pool-w-anu   111m         5%     2495Mi          41%
nks-node-pool-w-ap7   88m          4%     1745Mi          28%

# Search the API services for the metrics API
$ k get apiservices.apiregistration.k8s.io | grep -E '(AVAILABLE|metrics)'
NAME                     SERVICE                      AVAILABLE   AGE
v1beta1.metrics.k8s.io   kube-system/metrics-server   True        44h

# Send an HTTP request directly to the API server at the path that returns node metrics, and show the result
$ kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq
{
  "kind": "NodeMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metadata": {
        "name": "nks-node-pool-w-a2v",
        "creationTimestamp": "2024-05-11T06:09:53Z",
        "labels": {
          "beta.kubernetes.io/arch": "amd64",
          "beta.kubernetes.io/instance-type": "SVR.VSVR.STAND.C002.M008.NET.SSD.B050.G002",
          "beta.kubernetes.io/os": "linux",
          "failure-domain.beta.kubernetes.io/region": "8",
          "failure-domain.beta.kubernetes.io/zone": "84",
          "kubernetes.io/arch": "amd64",
          "kubernetes.io/hostname": "nks-node-pool-w-a2v",
          "kubernetes.io/os": "linux",
          "node.kubernetes.io/instance-type": "SVR.VSVR.STAND.C002.M008.NET.SSD.B050.G002",
          "nodeId": "22694748",
          "regionNo": "8",
          "topology.kubernetes.io/region": "8",
          "topology.kubernetes.io/zone": "84",
          "zoneNo": "84"
        }
      },
      "timestamp": "2024-05-11T06:08:50Z",
      "window": "1m0.292s",
      "usage": {
        "cpu": "712742018n",
        "memory": "4427044Ki"
      }
    },
...(omitted)
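
The cpu and memory values in this raw response line up with the first row of the k top nodes output once the units are converted. The following is only a small sketch of that arithmetic: the two function names are made up for this post, and the rounding (CPU rounded up to whole millicores, memory truncated to whole MiB) is my assumption about how kubectl prints these quantities, chosen because it reproduces the numbers shown above.

```shell
# Convert a CPU quantity in nanocores (e.g. "712742018n") to millicores.
# Assumption: round up to the next whole millicore.
nanocores_to_millicores() {
  n="${1%n}"                              # strip the trailing "n" unit
  echo "$(( (n + 999999) / 1000000 ))m"
}

# Convert a memory quantity in KiB (e.g. "4427044Ki") to MiB.
# Assumption: truncate to whole MiB.
kib_to_mib() {
  k="${1%Ki}"                             # strip the trailing "Ki" unit
  echo "$(( k / 1024 ))Mi"
}
```

With the values from nks-node-pool-w-a2v, `nanocores_to_millicores 712742018n` gives 713m and `kib_to_mib 4427044Ki` gives 4323Mi, which is what the top output reports for that node.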

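So I looked through the guide and found the following.

Ncloud Kubernetes Service on Naver Cloud Platform has a built-in Metrics Server that collects resource metrics from the Kubelet and exposes them through the Kubernetes API server.

Here is a brief summary of metrics-server.

  • This Metrics Server collects metrics every 15 seconds
  • The kube API server runs with --enable-aggregator-routing=true
    • This flag is needed when kube-proxy is not running on the host that runs the API server; aggregated API requests are then routed to the endpoint IPs instead of the Service cluster IP
  • Metrics can be collected directly from the kubelet's /metrics/resource path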
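
The kubelet's /metrics/resource endpoint reports CPU as a cumulative counter in core-seconds (node_cpu_usage_seconds_total), so a usage figure has to be derived from two scrapes taken some interval apart. As far as I understand, this is also why the Metrics API response carries a window field. A minimal sketch of that calculation, assuming a hypothetical helper name cpu_millicores and timestamps in milliseconds as they appear in the endpoint output:

```shell
# cpu_millicores <value1> <ts1_ms> <value2> <ts2_ms>
# Average CPU usage between two counter samples, in millicores.
# Hypothetical helper for illustration; not part of metrics-server.
cpu_millicores() {
  awk -v v1="$1" -v t1="$2" -v v2="$3" -v t2="$4" 'BEGIN {
    seconds = (t2 - t1) / 1000            # timestamps are in milliseconds
    cores   = (v2 - v1) / seconds         # core-seconds per second = cores
    printf "%.0fm\n", cores * 1000
  }'
}
```

For example, if the counter reads 14908.95892843 at 1712991076894 and a second, made-up sample taken 15 seconds later reads 14910.45892843 at 1712991091894, then `cpu_millicores 14908.95892843 1712991076894 14910.45892843 1712991091894` gives 100m.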

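Next, let's check which port the worker node's kubelet exposes its metrics on, then use a user token to send an API request directly to the worker node and collect the metrics. As described above, the request goes to /metrics/resource on the worker node's IP.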
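
When the node IP is not reachable from where kubectl runs, the same kubelet endpoint can usually be reached through the API server's node proxy path instead. The sketch below is an assumption-laden alternative, not something the NKS guide describes: node_resource_metrics is a hypothetical wrapper name, and it only works if the account has permission on the nodes/proxy subresource, which I have not confirmed for NKS.

```shell
# node_resource_metrics <node-name>
# Fetch the kubelet's /metrics/resource output via the API server node proxy.
# Hypothetical wrapper; requires RBAC access to nodes/proxy.
node_resource_metrics() {
  node="$1"
  kubectl get --raw "/api/v1/nodes/${node}/proxy/metrics/resource"
}
```

Calling `node_resource_metrics nks-node-pool-w-a2v` would request the metrics of that node using the credentials already configured for kubectl.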

Test

# Log in to the worker node and check the ports in use: the kubelet listens on port 10250
root@ip-192-168-100-9-jpn:/var/run# netstat -nltp | grep "kubelet"
tcp 0 0 127.0.0.1:10248 0.0.0.0:* LISTEN 7944/kubelet
tcp 0 0 192.168.100.9:10250 0.0.0.0:* LISTEN 7944/kubelet
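
Of the two listeners, 127.0.0.1:10248 is the kubelet's default health check port, bound to localhost only, and 10250 is its default HTTPS port, which is the one that serves /metrics/resource. To pull just the kubelet's listening addresses out of netstat output, a small filter like the following could be used (kubelet_listen_addrs is a hypothetical name, and the column positions assume the netstat -nltp layout shown above):

```shell
# Read `netstat -nltp` output on stdin and print the local address of
# every listening socket owned by a process named "kubelet".
kubelet_listen_addrs() {
  awk '$6 == "LISTEN" && $7 ~ /\/kubelet$/ { print $4 }'
}
```

On the worker node this would be run as `netstat -nltp | kubelet_listen_addrs`.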

# The request header needs a bearer token such as a service account token; here I use the token of the user account I use with NKS.
$ ncp-iam-authenticator token --region JPN --clusterUuid 1b712dsadc-bba8-4d01-9223-92acef342824 | jq -r .status.token

# Check the metrics in the output below
$ curl -sSk -H "Authorization: Bearer [TOKEN]" -v https://192.168.100.9:10250/metrics/resource
container_cpu_usage_seconds_total{container="autoscaler",namespace="kube-system",pod="dns-autoscaler-5b44dd6985-hqll4"} 44.246141915 1712991069247
container_cpu_usage_seconds_total{container="cilium-agent",namespace="kube-system",pod="cilium-wtq4s"} 708.178547891 1712991078210
container_cpu_usage_seconds_total{container="cilium-monitor",namespace="kube-system",pod="cilium-monitor-6vnxj"} 7.735380417 1712991067977
container_cpu_usage_seconds_total{container="cilium-operator",namespace="kube-system",pod="cilium-operator-7b8cdb67b4-gkwwn"} 70.707023734 1712991077417
container_cpu_usage_seconds_total{container="coredns",namespace="kube-system",pod="coredns-8657cc7cb4-jj9xv"} 170.055176303 1712991077619
container_cpu_usage_seconds_total{container="csi-attacher",namespace="kube-system",pod="csi-nks-controller-696d68f589-lf472"} 96.144418332 1712991069710
container_cpu_usage_seconds_total{container="csi-external-health-monitor-controller",namespace="kube-system",pod="csi-nks-controller-696d68f589-lf472"} 66.923675876 1712991070183
container_cpu_usage_seconds_total{container="csi-nks-plugin",namespace="kube-system",pod="csi-nks-controller-696d68f589-lf472"} 27.513114908 1712991064219
container_cpu_usage_seconds_total{container="csi-nks-plugin",namespace="kube-system",pod="csi-nks-node-ctcr5"} 20.62431211 1712991071232
container_cpu_usage_seconds_total{container="csi-node-driver-registrar",namespace="kube-system",pod="csi-nks-node-ctcr5"} 34.18489124 1712991070386
container_cpu_usage_seconds_total{container="csi-provisioner",namespace="kube-system",pod="csi-nks-controller-696d68f589-lf472"} 179.590821685 1712991067139
container_cpu_usage_seconds_total{container="csi-provisioner",namespace="kube-system",pod="nks-nas-csi-controller-ffdf697b7-mcgsv"} 122.712034472 1712991073825
container_cpu_usage_seconds_total{container="csi-resizer",namespace="kube-system",pod="csi-nks-controller-696d68f589-lf472"} 98.816083319 1712991069583
container_cpu_usage_seconds_total{container="csi-resizer",namespace="kube-system",pod="nks-nas-csi-controller-ffdf697b7-mcgsv"} 17.982610566 1712991068180
container_cpu_usage_seconds_total{container="csi-snapshotter",namespace="kube-system",pod="csi-nks-controller-696d68f589-lf472"} 95.351346901 1712991073588
container_cpu_usage_seconds_total{container="kube-proxy",namespace="kube-system",pod="kube-proxy-kz9lp"} 41.889627012 1712991062069
container_cpu_usage_seconds_total{container="liveness-probe",namespace="kube-system",pod="csi-nks-controller-696d68f589-lf472"} 16.191674094 1712991065073
container_cpu_usage_seconds_total{container="liveness-probe",namespace="kube-system",pod="csi-nks-node-ctcr5"} 15.406572768 1712991063329
container_cpu_usage_seconds_total{container="liveness-probe",namespace="kube-system",pod="nks-nas-csi-controller-ffdf697b7-mcgsv"} 15.164464783 1712991071347
container_cpu_usage_seconds_total{container="liveness-probe",namespace="kube-system",pod="nks-nas-csi-node-vw5jh"} 15.176487093 1712991063555
container_cpu_usage_seconds_total{container="nfs",namespace="kube-system",pod="nks-nas-csi-controller-ffdf697b7-mcgsv"} 15.221074567 1712991074811
container_cpu_usage_seconds_total{container="nfs",namespace="kube-system",pod="nks-nas-csi-node-vw5jh"} 14.586397807 1712991065139
container_cpu_usage_seconds_total{container="node-cache",namespace="kube-system",pod="nodelocaldns-mxj2l"} 210.726500715 1712991075802
container_cpu_usage_seconds_total{container="node-driver-registrar",namespace="kube-system",pod="nks-nas-csi-node-vw5jh"} 33.826777778 1712991071257
container_cpu_usage_seconds_total{container="snapshot-controller",namespace="kube-system",pod="snapshot-controller-0"} 54.71799964 1712991070776
# HELP container_memory_working_set_bytes [ALPHA] Current working set of the container in bytes
# TYPE container_memory_working_set_bytes gauge
container_memory_working_set_bytes{container="autoscaler",namespace="kube-system",pod="dns-autoscaler-5b44dd6985-hqll4"} 9.576448e+06 1712991069247
container_memory_working_set_bytes{container="cilium-agent",namespace="kube-system",pod="cilium-wtq4s"} 8.0613376e+07 1712991078210
container_memory_working_set_bytes{container="cilium-monitor",namespace="kube-system",pod="cilium-monitor-6vnxj"} 3.080192e+06 1712991067977
container_memory_working_set_bytes{container="cilium-operator",namespace="kube-system",pod="cilium-operator-7b8cdb67b4-gkwwn"} 1.6531456e+07 1712991077417
container_memory_working_set_bytes{container="coredns",namespace="kube-system",pod="coredns-8657cc7cb4-jj9xv"} 1.4577664e+07 1712991077619
container_memory_working_set_bytes{container="csi-attacher",namespace="kube-system",pod="csi-nks-controller-696d68f589-lf472"} 1.2222464e+07 1712991069710
container_memory_working_set_bytes{container="csi-external-health-monitor-controller",namespace="kube-system",pod="csi-nks-controller-696d68f589-lf472"} 1.4516224e+07 1712991070183
container_memory_working_set_bytes{container="csi-nks-plugin",namespace="kube-system",pod="csi-nks-controller-696d68f589-lf472"} 9.744384e+06 1712991064219
container_memory_working_set_bytes{container="csi-nks-plugin",namespace="kube-system",pod="csi-nks-node-ctcr5"} 8.892416e+06 1712991071232
container_memory_working_set_bytes{container="csi-node-driver-registrar",namespace="kube-system",pod="csi-nks-node-ctcr5"} 1.0551296e+07 1712991070386
container_memory_working_set_bytes{container="csi-provisioner",namespace="kube-system",pod="csi-nks-controller-696d68f589-lf472"} 1.527808e+07 1712991067139
container_memory_working_set_bytes{container="csi-provisioner",namespace="kube-system",pod="nks-nas-csi-controller-ffdf697b7-mcgsv"} 1.1001856e+07 1712991073825
container_memory_working_set_bytes{container="csi-resizer",namespace="kube-system",pod="csi-nks-controller-696d68f589-lf472"} 1.349632e+07 1712991069583
container_memory_working_set_bytes{container="csi-resizer",namespace="kube-system",pod="nks-nas-csi-controller-ffdf697b7-mcgsv"} 1.0604544e+07 1712991068180
container_memory_working_set_bytes{container="csi-snapshotter",namespace="kube-system",pod="csi-nks-controller-696d68f589-lf472"} 9.826304e+06 1712991073588
container_memory_working_set_bytes{container="kube-proxy",namespace="kube-system",pod="kube-proxy-kz9lp"} 1.4635008e+07 1712991062069
container_memory_working_set_bytes{container="liveness-probe",namespace="kube-system",pod="csi-nks-controller-696d68f589-lf472"} 3.592192e+06 1712991065073
container_memory_working_set_bytes{container="liveness-probe",namespace="kube-system",pod="csi-nks-node-ctcr5"} 3.54304e+06 1712991063329
container_memory_working_set_bytes{container="liveness-probe",namespace="kube-system",pod="nks-nas-csi-controller-ffdf697b7-mcgsv"} 5.996544e+06 1712991071347
container_memory_working_set_bytes{container="liveness-probe",namespace="kube-system",pod="nks-nas-csi-node-vw5jh"} 5.971968e+06 1712991063555
container_memory_working_set_bytes{container="nfs",namespace="kube-system",pod="nks-nas-csi-controller-ffdf697b7-mcgsv"} 8.51968e+06 1712991074811
container_memory_working_set_bytes{container="nfs",namespace="kube-system",pod="nks-nas-csi-node-vw5jh"} 8.265728e+06 1712991065139
container_memory_working_set_bytes{container="node-cache",namespace="kube-system",pod="nodelocaldns-mxj2l"} 1.3074432e+07 1712991075802
container_memory_working_set_bytes{container="node-driver-registrar",namespace="kube-system",pod="nks-nas-csi-node-vw5jh"} 1.0272768e+07 1712991071257
container_memory_working_set_bytes{container="snapshot-controller",namespace="kube-system",pod="snapshot-controller-0"} 7.487488e+06 1712991070776
# HELP container_start_time_seconds [ALPHA] Start time of the container since unix epoch in seconds
# TYPE container_start_time_seconds gauge
container_start_time_seconds{container="autoscaler",namespace="kube-system",pod="dns-autoscaler-5b44dd6985-hqll4"} 1.7128267394617064e+09 1712826739461
container_start_time_seconds{container="cilium-agent",namespace="kube-system",pod="cilium-wtq4s"} 1.712826696092166e+09 1712826696092
container_start_time_seconds{container="cilium-monitor",namespace="kube-system",pod="cilium-monitor-6vnxj"} 1.7128267794559438e+09 1712826779455
container_start_time_seconds{container="cilium-operator",namespace="kube-system",pod="cilium-operator-7b8cdb67b4-gkwwn"} 1.7128266862560472e+09 1712826686256
container_start_time_seconds{container="coredns",namespace="kube-system",pod="coredns-8657cc7cb4-jj9xv"} 1.712826738990283e+09 1712826738990
container_start_time_seconds{container="csi-attacher",namespace="kube-system",pod="csi-nks-controller-696d68f589-lf472"} 1.712826745634325e+09 1712826745634
container_start_time_seconds{container="csi-external-health-monitor-controller",namespace="kube-system",pod="csi-nks-controller-696d68f589-lf472"} 1.7128267395687413e+09 1712826739568
container_start_time_seconds{container="csi-nks-plugin",namespace="kube-system",pod="csi-nks-controller-696d68f589-lf472"} 1.7128267514047596e+09 1712826751404
container_start_time_seconds{container="csi-nks-plugin",namespace="kube-system",pod="csi-nks-node-ctcr5"} 1.7128266983776684e+09 1712826698377
container_start_time_seconds{container="csi-node-driver-registrar",namespace="kube-system",pod="csi-nks-node-ctcr5"} 1.7128266954860532e+09 1712826695486
container_start_time_seconds{container="csi-provisioner",namespace="kube-system",pod="csi-nks-controller-696d68f589-lf472"} 1.7128267427615244e+09 1712826742761
container_start_time_seconds{container="csi-provisioner",namespace="kube-system",pod="nks-nas-csi-controller-ffdf697b7-mcgsv"} 1.7128267376492622e+09 1712826737649
container_start_time_seconds{container="csi-resizer",namespace="kube-system",pod="csi-nks-controller-696d68f589-lf472"} 1.7128267484574661e+09 1712826748457
container_start_time_seconds{container="csi-resizer",namespace="kube-system",pod="nks-nas-csi-controller-ffdf697b7-mcgsv"} 1.7128267408556705e+09 1712826740855
container_start_time_seconds{container="csi-snapshotter",namespace="kube-system",pod="csi-nks-controller-696d68f589-lf472"} 1.7128267513063169e+09 1712826751306
container_start_time_seconds{container="kube-proxy",namespace="kube-system",pod="kube-proxy-kz9lp"} 1.7128266866632397e+09 1712826686663
container_start_time_seconds{container="liveness-probe",namespace="kube-system",pod="csi-nks-controller-696d68f589-lf472"} 1.7128267514924357e+09 1712826751492
container_start_time_seconds{container="liveness-probe",namespace="kube-system",pod="csi-nks-node-ctcr5"} 1.7128267009935925e+09 1712826700993
container_start_time_seconds{container="liveness-probe",namespace="kube-system",pod="nks-nas-csi-controller-ffdf697b7-mcgsv"} 1.7128267409581392e+09 1712826740958
container_start_time_seconds{container="liveness-probe",namespace="kube-system",pod="nks-nas-csi-node-vw5jh"} 1.7128267064253006e+09 1712826706425
container_start_time_seconds{container="nfs",namespace="kube-system",pod="nks-nas-csi-controller-ffdf697b7-mcgsv"} 1.712826741045665e+09 1712826741045
container_start_time_seconds{container="nfs",namespace="kube-system",pod="nks-nas-csi-node-vw5jh"} 1.7128267108891766e+09 1712826710889
container_start_time_seconds{container="node-cache",namespace="kube-system",pod="nodelocaldns-mxj2l"} 1.7128266868412979e+09 1712826686841
container_start_time_seconds{container="node-driver-registrar",namespace="kube-system",pod="nks-nas-csi-node-vw5jh"} 1.7128267065112371e+09 1712826706511
container_start_time_seconds{container="snapshot-controller",namespace="kube-system",pod="snapshot-controller-0"} 1.7128267394360812e+09 1712826739436
# HELP node_cpu_usage_seconds_total [ALPHA] Cumulative cpu time consumed by the node in core-seconds
# TYPE node_cpu_usage_seconds_total counter
node_cpu_usage_seconds_total 14908.95892843 1712991076894
# HELP node_memory_working_set_bytes [ALPHA] Current working set of the node in bytes
# TYPE node_memory_working_set_bytes gauge
node_memory_working_set_bytes 1.88766208e+09 1712991076894
# HELP pod_cpu_usage_seconds_total [ALPHA] Cumulative cpu time consumed by the pod in core-seconds
# TYPE pod_cpu_usage_seconds_total counter
pod_cpu_usage_seconds_total{namespace="kube-system",pod="cilium-monitor-6vnxj"} 7.746592201 1712991076627
pod_cpu_usage_seconds_total{namespace="kube-system",pod="cilium-operator-7b8cdb67b4-gkwwn"} 70.728407971 1712991064653
pod_cpu_usage_seconds_total{namespace="kube-system",pod="cilium-wtq4s"} 708.180507063 1712991067919
pod_cpu_usage_seconds_total{namespace="kube-system",pod="coredns-8657cc7cb4-jj9xv"} 170.080593489 1712991077368
pod_cpu_usage_seconds_total{namespace="kube-system",pod="csi-nks-controller-696d68f589-lf472"} 580.57088305 1712991074168
pod_cpu_usage_seconds_total{namespace="kube-system",pod="csi-nks-node-ctcr5"} 70.224898467 1712991064619
pod_cpu_usage_seconds_total{namespace="kube-system",pod="dns-autoscaler-5b44dd6985-hqll4"} 44.267483681 1712991067635
pod_cpu_usage_seconds_total{namespace="kube-system",pod="kube-proxy-kz9lp"} 41.898383743 1712991065960
pod_cpu_usage_seconds_total{namespace="kube-system",pod="nks-nas-csi-controller-ffdf697b7-mcgsv"} 171.09327922 1712991078359
pod_cpu_usage_seconds_total{namespace="kube-system",pod="nks-nas-csi-node-vw5jh"} 63.596281541 1712991063782
pod_cpu_usage_seconds_total{namespace="kube-system",pod="nodelocaldns-mxj2l"} 210.726966279 1712991066719
pod_cpu_usage_seconds_total{namespace="kube-system",pod="snapshot-controller-0"} 54.743027879 1712991062708
# HELP pod_memory_working_set_bytes [ALPHA] Current working set of the pod in bytes
# TYPE pod_memory_working_set_bytes gauge
pod_memory_working_set_bytes{namespace="kube-system",pod="cilium-monitor-6vnxj"} 3.743744e+06 1712991076627
pod_memory_working_set_bytes{namespace="kube-system",pod="cilium-operator-7b8cdb67b4-gkwwn"} 1.7383424e+07 1712991064653
pod_memory_working_set_bytes{namespace="kube-system",pod="cilium-wtq4s"} 8.1690624e+07 1712991067919
pod_memory_working_set_bytes{namespace="kube-system",pod="coredns-8657cc7cb4-jj9xv"} 1.5388672e+07 1712991077368
pod_memory_working_set_bytes{namespace="kube-system",pod="csi-nks-controller-696d68f589-lf472"} 7.9450112e+07 1712991074168
pod_memory_working_set_bytes{namespace="kube-system",pod="csi-nks-node-ctcr5"} 2.3789568e+07 1712991064619
pod_memory_working_set_bytes{namespace="kube-system",pod="dns-autoscaler-5b44dd6985-hqll4"} 1.0330112e+07 1712991067635
pod_memory_working_set_bytes{namespace="kube-system",pod="kube-proxy-kz9lp"} 1.548288e+07 1712991065960
pod_memory_working_set_bytes{namespace="kube-system",pod="nks-nas-csi-controller-ffdf697b7-mcgsv"} 3.6864e+07 1712991078359
pod_memory_working_set_bytes{namespace="kube-system",pod="nks-nas-csi-node-vw5jh"} 2.5186304e+07 1712991063782
pod_memory_working_set_bytes{namespace="kube-system",pod="nodelocaldns-mxj2l"} 1.3824e+07 1712991066719
pod_memory_working_set_bytes{namespace="kube-system",pod="snapshot-controller-0"} 8.323072e+06 1712991062708
# HELP scrape_error [ALPHA] 1 if there was an error while getting container metrics, 0 otherwise
# TYPE scrape_error gauge
scrape_error 0
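
The output is in the Prometheus text format, one sample per line as name{labels} value timestamp, so it is easy to post-process. As a rough sketch, the filter below lists pod memory working sets in MiB, largest first. The function name pod_memory_mib is hypothetical, and the parsing assumes label values contain no spaces, which holds for the samples above.

```shell
# Read /metrics/resource output on stdin and print "<pod> <MiB>" for each
# pod_memory_working_set_bytes sample, sorted by memory in descending order.
pod_memory_mib() {
  awk '/^pod_memory_working_set_bytes[{]/ {
         match($0, /pod="[^"]+"/)                    # locate the pod label
         pod = substr($0, RSTART + 5, RLENGTH - 6)   # text between the quotes
         printf "%s %.1f\n", pod, $2 / (1024 * 1024) # bytes to MiB
       }' | sort -k2,2nr
}
```

Piping the curl output above into `pod_memory_mib` would put cilium-wtq4s first at 77.9 MiB.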

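In conclusion, the API server in the NKS control plane presumably has the --enable-aggregator-routing flag set to true, and the built-in Metrics Server collects each node's metrics from the resource metrics endpoint (/metrics/resource) that the kubelet exposes on the worker nodes.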
