Collecting monitoring metrics for Kubernetes components

kevingrace     2022-12-13


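Our production environment runs a Kubernetes cluster, and its components need to be monitored in Zabbix. The metrics fall into two groups: fixed metrics and dynamic metrics. Fixed metrics can be read on the command line from each component's metrics endpoint and are configured in Zabbix through auto-discovery. Dynamic metrics are collected by a Python script that returns a JSON string, which Zabbix consumes through a template or a host-level auto-discovery rule.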

I. Fixed metric collection (Zabbix auto-discovery; recommended collection interval: 5 min)

1. Master metrics [scope: the 3 nodes of the master cluster; 192.168.10.93/94/95 in the test environment]

1) Metric ID: kube_apiserver_process_cpu_seconds_total
Example command: curl -s --cacert kubernetes-ca/ca.pem --cert kubernetes-ca/admin.pem --key kubernetes-ca/admin-key.pem https://192.168.10.93:6443/metrics | grep process_cpu_seconds_total | grep -v '#' | awk '{print $2}'

2) Metric ID: kube_apiserver_process_open_fds
Example command: curl -s --cacert kubernetes-ca/ca.pem --cert kubernetes-ca/admin.pem --key kubernetes-ca/admin-key.pem https://192.168.10.93:6443/metrics | grep process_open_fds | grep -v '#' | awk '{print $2}'

3) Metric ID: kube_apiserver_process_virtual_memory_bytes
Example command: curl -s --cacert kubernetes-ca/ca.pem --cert kubernetes-ca/admin.pem --key kubernetes-ca/admin-key.pem https://192.168.10.93:6443/metrics | grep process_virtual_memory_bytes | grep -v '#' | awk '{print $2}'

4) Metric ID: kube_apiserver_rest_client_requests_total_200_put
Example command: curl -s --cacert kubernetes-ca/ca.pem --cert kubernetes-ca/admin.pem --key kubernetes-ca/admin-key.pem https://192.168.10.93:6443/metrics | grep rest_client_requests_total | grep -v '#' | grep PUT | grep 200 | awk '{print $2}'

5) Metric ID: kube_apiserver_rest_client_requests_total_200_get
Example command: curl -s --cacert kubernetes-ca/ca.pem --cert kubernetes-ca/admin.pem --key kubernetes-ca/admin-key.pem https://192.168.10.93:6443/metrics | grep rest_client_requests_total | grep -v '#' | grep GET | grep 200 | awk '{print $2}'
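The grep | awk pipelines above match substrings. `grep 200` also matches a line where "200" appears inside another label, such as a host address. If rest_client_requests_total carries several label combinations with the same code and method, awk prints several numbers instead of one. Below is a minimal sketch, in the same Python as the discovery script later in this article, of a stricter alternative. It parses the Prometheus text format, matches the metric name exactly, filters on label values, and sums the matching samples. The helper name `metric_value` and the sample text are hypothetical and for illustration only.

```python
import re

# One exposition line: metric name, optional {labels}, value (a trailing timestamp is ignored)
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def metric_value(text, name, **labels):
    """Sum every sample of `name` whose labels contain all given key=value pairs.

    Hypothetical helper, not part of any library. Returns None if nothing matches.
    """
    total = None
    for line in text.splitlines():
        if not line or line.startswith('#'):
            continue  # skip HELP/TYPE comments and blank lines
        m = LINE_RE.match(line)
        if not m or m.group(1) != name:
            continue  # exact name match, unlike grep's substring match
        found = dict(LABEL_RE.findall(m.group(2) or ''))
        if all(found.get(k) == v for k, v in labels.items()):
            total = (total or 0.0) + float(m.group(3))
    return total


if __name__ == '__main__':
    sample = '''# TYPE rest_client_requests_total counter
rest_client_requests_total{code="200",host="[::1]:6443",method="GET"} 120
rest_client_requests_total{code="200",host="127.0.0.1:6443",method="GET"} 30
rest_client_requests_total{code="409",host="[::1]:6443",method="PUT"} 2
process_open_fds 57
'''
    print(metric_value(sample, 'rest_client_requests_total', code='200', method='GET'))  # 150.0
    print(metric_value(sample, 'process_open_fds'))  # 57.0
```

Summing is one possible design choice for counters that are split across several label sets. If you need a single series instead, add the extra label (for example `host=...`) to the filter.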

6) Metric ID: etcd_debugging_mvcc_db_total_size_in_bytes
Example command: curl -s --cacert etcd/ca.pem --cert etcd/healthcheck-client.pem --key etcd/healthcheck-client-key.pem https://192.168.10.93:2379/metrics | grep etcd_debugging_mvcc_db_total_size_in_bytes | grep -v '#' | awk '{print $2}'

7) Metric ID: etcd_server_has_leader
Example command: curl -s --cacert etcd/ca.pem --cert etcd/healthcheck-client.pem --key etcd/healthcheck-client-key.pem https://192.168.10.93:2379/metrics | grep etcd_server_has_leader | grep -v '#' | awk '{print $2}'

8) Metric ID: etcd_server_leader_changes_seen_total
Example command: curl -s --cacert etcd/ca.pem --cert etcd/healthcheck-client.pem --key etcd/healthcheck-client-key.pem https://192.168.10.93:2379/metrics | grep etcd_server_leader_changes_seen_total | grep -v '#' | awk '{print $2}'

9) Metric ID: etcd_server_proposals_failed_total
Example command: curl -s --cacert etcd/ca.pem --cert etcd/healthcheck-client.pem --key etcd/healthcheck-client-key.pem https://192.168.10.93:2379/metrics | grep etcd_server_proposals_failed_total | grep -v '#' | awk '{print $2}'

10) Metric ID: etcd_process_cpu_seconds_total
Example command: curl -s --cacert etcd/ca.pem --cert etcd/healthcheck-client.pem --key etcd/healthcheck-client-key.pem https://192.168.10.93:2379/metrics | grep process_cpu_seconds_total | grep -v '#' | awk '{print $2}'

11) Metric ID: etcd_process_open_fds
Example command: curl -s --cacert etcd/ca.pem --cert etcd/healthcheck-client.pem --key etcd/healthcheck-client-key.pem https://192.168.10.93:2379/metrics | grep process_open_fds | grep -v '#' | awk '{print $2}'

12) Metric ID: etcd_process_virtual_memory_bytes
Example command: curl -s --cacert etcd/ca.pem --cert etcd/healthcheck-client.pem --key etcd/healthcheck-client-key.pem https://192.168.10.93:2379/metrics | grep process_virtual_memory_bytes | grep -v '#' | awk '{print $2}'
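The etcd and kube-apiserver commands authenticate with client certificates through curl's --cacert, --cert and --key options. If collection is moved into Python, the standard library can do the same job. `ssl.create_default_context(cafile=...)` verifies the server certificate against the CA, and `load_cert_chain()` presents the client certificate and key. Below is a minimal sketch, assuming Python 3 and the same certificate files as the commands above. `fetch_metrics` is a hypothetical helper name.

```python
import ssl
import urllib.request


def fetch_metrics(url, cafile=None, certfile=None, keyfile=None, timeout=10):
    """Return a /metrics page as text (hypothetical helper).

    For https URLs this mirrors `curl --cacert CAFILE --cert CERTFILE --key KEYFILE`:
    the server is verified against `cafile` and the client certificate/key pair
    is presented for mutual TLS. Plain http URLs are fetched without TLS.
    """
    context = None
    if url.startswith('https://'):
        context = ssl.create_default_context(cafile=cafile)
        if certfile:
            context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
        return resp.read().decode('utf-8')
```

For example, `fetch_metrics('https://192.168.10.93:2379/metrics', cafile='etcd/ca.pem', certfile='etcd/healthcheck-client.pem', keyfile='etcd/healthcheck-client-key.pem')` should return the same text that the etcd curl commands pipe into grep. As with curl, the etcd server certificate must list 192.168.10.93 as a subject alternative name for verification to pass.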

13) Metric ID: kube_controller_manager_process_cpu_seconds_total
Example command: curl -s 192.168.10.93:10252/metrics | grep process_cpu_seconds_total | grep -v '#' | awk '{print $2}'

14) Metric ID: kube_controller_manager_process_open_fds
Example command: curl -s 192.168.10.93:10252/metrics | grep process_open_fds | grep -v '#' | awk '{print $2}'

15) Metric ID: kube_controller_manager_process_virtual_memory_bytes
Example command: curl -s 192.168.10.93:10252/metrics | grep process_virtual_memory_bytes | grep -v '#' | awk '{print $2}'

16) Metric ID: kube_controller_manager_rest_client_requests_total_200_put
Example command: curl -s 192.168.10.93:10252/metrics | grep rest_client_requests_total | grep -v '#' | grep PUT | grep 200 | awk '{print $2}'

17) Metric ID: kube_controller_manager_rest_client_requests_total_200_get
Example command: curl -s 192.168.10.93:10252/metrics | grep rest_client_requests_total | grep -v '#' | grep GET | grep 200 | awk '{print $2}'

18) Metric ID: kube_scheduler_process_cpu_seconds_total
Example command: curl -s 192.168.10.93:10251/metrics | grep process_cpu_seconds_total | grep -v '#' | awk '{print $2}'

19) Metric ID: kube_scheduler_process_open_fds
Example command: curl -s 192.168.10.93:10251/metrics | grep process_open_fds | grep -v '#' | awk '{print $2}'

20) Metric ID: kube_scheduler_process_virtual_memory_bytes
Example command: curl -s 192.168.10.93:10251/metrics | grep process_virtual_memory_bytes | grep -v '#' | awk '{print $2}'

21) Metric ID: kube_scheduler_rest_client_requests_total_200_put
Example command: curl -s 192.168.10.93:10251/metrics | grep rest_client_requests_total | grep -v '#' | grep PUT | grep 200 | awk '{print $2}'

22) Metric ID: kube_scheduler_rest_client_requests_total_200_get
Example command: curl -s 192.168.10.93:10251/metrics | grep rest_client_requests_total | grep -v '#' | grep GET | grep 200 | awk '{print $2}'
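Items 13 to 22 repeat one pattern. Each plain-HTTP component (kube-controller-manager on port 10252, kube-scheduler on port 10251) gets the same three process_* items plus two rest_client_requests_total items. The Metric ID is the component prefix joined to the metric name. Below is a minimal sketch that generates those IDs and URLs from a small table instead of maintaining them by hand. The table layout and the name `build_items` are hypothetical. The label filter in each tuple stands in for the `grep PUT | grep 200` and `grep GET | grep 200` steps.

```python
# Hypothetical table; ports copied from the commands above (plain HTTP, no client certificate)
COMPONENTS = {
    'kube_controller_manager': 10252,
    'kube_scheduler': 10251,
}
PROCESS_METRICS = [
    'process_cpu_seconds_total',
    'process_open_fds',
    'process_virtual_memory_bytes',
]
# (HTTP status code, method) pairs for the rest_client_requests_total items
REQUEST_SUFFIXES = [('200', 'PUT'), ('200', 'GET')]


def build_items(host):
    """Return (metric_id, url, metric_name, label_filter) tuples for every item."""
    items = []
    for component, port in COMPONENTS.items():
        url = 'http://%s:%d/metrics' % (host, port)
        for metric in PROCESS_METRICS:
            items.append((component + '_' + metric, url, metric, {}))
        for code, method in REQUEST_SUFFIXES:
            metric_id = '%s_rest_client_requests_total_%s_%s' % (component, code, method.lower())
            items.append((metric_id, url, 'rest_client_requests_total',
                          {'code': code, 'method': method}))
    return items


if __name__ == '__main__':
    for metric_id, url, name, labels in build_items('192.168.10.93'):
        print(metric_id, url, name, labels)
```

The Node list below follows the same 3 + 2 pattern for the kubelet (port 10255) and kube-proxy (port 10249). Each of those would be one more table entry.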

2. Node metrics [scope: the 5 Node hosts; 192.168.10.230/231/232/233/234 in the test environment]

1) Metric ID: kubelet_docker_operations_errors_inspect_container
Example command: curl -s 192.168.10.230:10255/metrics | grep kubelet_docker_operations_errors | grep -v '#' | grep inspect_container | awk '{print $2}'

2) Metric ID: kubelet_docker_operations_errors_inspect_image
Example command: curl -s 192.168.10.230:10255/metrics | grep kubelet_docker_operations_errors | grep -v '#' | grep inspect_image | awk '{print $2}'

3) Metric ID: kubelet_docker_operations_errors_start_container
Example command: curl -s 192.168.10.230:10255/metrics | grep kubelet_docker_operations_errors | grep -v '#' | grep start_container | awk '{print $2}'

4) Metric ID: kubelet_docker_operations_errors_stop_container
Example command: curl -s 192.168.10.230:10255/metrics | grep kubelet_docker_operations_errors | grep -v '#' | grep stop_container | awk '{print $2}'

5) Metric ID: kubelet_node_config_error
Example command: curl -s 192.168.10.230:10255/metrics | grep kubelet_node_config_error | grep -v '#' | awk '{print $2}'

6) Metric ID: kubelet_process_cpu_seconds_total
Example command: curl -s 192.168.10.230:10255/metrics | grep process_cpu_seconds_total | grep -v '#' | awk '{print $2}'

7) Metric ID: kubelet_process_open_fds
Example command: curl -s 192.168.10.230:10255/metrics | grep process_open_fds | grep -v '#' | awk '{print $2}'

8) Metric ID: kubelet_process_virtual_memory_bytes
Example command: curl -s 192.168.10.230:10255/metrics | grep process_virtual_memory_bytes | grep -v '#' | awk '{print $2}'

9) Metric ID: kubelet_rest_client_requests_total_200_put
Example command: curl -s 192.168.10.230:10255/metrics | grep rest_client_requests_total | grep -v '#' | grep PUT | grep 200 | awk '{print $2}'

10) Metric ID: kubelet_rest_client_requests_total_200_get
Example command: curl -s 192.168.10.230:10255/metrics | grep rest_client_requests_total | grep -v '#' | grep GET | grep 200 | awk '{print $2}'
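Items 1 to 4 of the Node list each download the whole kubelet /metrics page and pick out one operation type with grep. Below is a minimal sketch that reads the page once and returns every docker operation error counter, keyed by its operation type. It assumes the label is named `operation_type`. It also assumes the counter is called either `kubelet_docker_operations_errors` or `kubelet_docker_operations_errors_total`, depending on the kubelet version; check both assumptions against your own output. `errors_by_operation` is a hypothetical name.

```python
import re

# name{labels} value
SAMPLE_RE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)')


def errors_by_operation(text, names=('kubelet_docker_operations_errors',
                                     'kubelet_docker_operations_errors_total')):
    """Map operation_type label -> error count for the docker operation error counter."""
    result = {}
    for line in text.splitlines():
        m = SAMPLE_RE.match(line)  # comment lines start with '#' and never match
        if not m or m.group(1) not in names:
            continue  # exact name match, so e.g. latency metrics are ignored
        op = re.search(r'\boperation_type="([^"]*)"', m.group(2))
        if op:
            result[op.group(1)] = result.get(op.group(1), 0.0) + float(m.group(3))
    return result
```

A labelled counter usually has no sample until the first error of that type occurs. For that reason, the four items would read `errors.get('inspect_container', 0.0)`, `errors.get('inspect_image', 0.0)` and so on, treating a missing key as 0.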

11) Metric ID: kube_proxy_process_cpu_seconds_total
Example command: curl -s 192.168.10.230:10249/metrics | grep process_cpu_seconds_total | grep -v '#' | awk '{print $2}'

12) Metric ID: kube_proxy_process_open_fds
Example command: curl -s 192.168.10.230:10249/metrics | grep process_open_fds | grep -v '#' | awk '{print $2}'

13) Metric ID: kube_proxy_process_virtual_memory_bytes
Example command: curl -s 192.168.10.230:10249/metrics | grep process_virtual_memory_bytes | grep -v '#' | awk '{print $2}'

14) Metric ID: kube_proxy_rest_client_requests_total_200_put
Example command: curl -s 192.168.10.230:10249/metrics | grep rest_client_requests_total | grep -v '#' | grep PUT | grep 200 | awk '{print $2}'

15) Metric ID: kube_proxy_rest_client_requests_total_200_get
Example command: curl -s 192.168.10.230:10249/metrics | grep rest_client_requests_total | grep -v '#' | grep GET | grep 200 | awk '{print $2}'

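3. Cluster-wide metrics [collect from any one Node; in the test environment, 192.168.10.230 is enough. Because collection is tied to that node, these items fail if it goes down. To avoid this, point collection at the Nginx-Ingress IP or an internal Service IP instead.]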

1) Metric ID: coredns_process_cpu_seconds_total
Example command: curl -s 192.168.10.230:9153/metrics | grep process_cpu_seconds_total | grep -v '#' | awk '{print $2}'

2) Metric ID: coredns_process_open_fds
Example command: curl -s 192.168.10.230:9153/metrics | grep process_open_fds | grep -v '#' | awk '{print $2}'

3) Metric ID: coredns_process_virtual_memory_bytes
Example command: curl -s 192.168.10.230:9153/metrics | grep process_virtual_memory_bytes | grep -v '#' | awk '{print $2}'

4) Metric ID: kube_state_metrics_metrics_process_cpu_seconds_total
Example command: curl -s 192.168.10.230:8081/metrics | grep process_cpu_seconds_total | grep -v '#' | awk '{print $2}'

5) Metric ID: kube_state_metrics_metrics_process_open_fds
Example command: curl -s 192.168.10.230:8081/metrics | grep process_open_fds | grep -v '#' | awk '{print $2}'

6) Metric ID: kube_state_metrics_metrics_process_virtual_memory_bytes
Example command: curl -s 192.168.10.230:8081/metrics | grep process_virtual_memory_bytes | grep -v '#' | awk '{print $2}'
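The note above recommends pointing these items at an Nginx-Ingress or internal Service IP, so that collection does not depend on a single node. Until that is in place, a collector could fall back across several nodes. This only helps if the endpoint answers on more than one node (for example through a NodePort); otherwise the Service or Ingress IP remains the real fix. Below is a minimal sketch, assuming a list of candidate node IPs and a fetch function that raises OSError on failure. `first_available` and `http_get` are hypothetical names.

```python
import urllib.request


def http_get(url, timeout=5):
    """Plain HTTP GET returning the body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode('utf-8')


def first_available(hosts, port, path='/metrics', fetch=http_get):
    """Try each host in order and return (host, body) from the first that answers.

    Re-raises the last error if every host fails (hypothetical helper).
    """
    last_error = None
    for host in hosts:
        url = 'http://%s:%d%s' % (host, port, path)
        try:
            return host, fetch(url)
        except OSError as exc:  # URLError and socket timeouts are OSError subclasses
            last_error = exc
    raise last_error if last_error else ValueError('no hosts given')
```

For example, `first_available(['192.168.10.230', '192.168.10.231', '192.168.10.232'], 9153)` would return the CoreDNS metrics from the first node in the list that responds. Passing `fetch` as a parameter keeps the fallback logic testable without a live cluster.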

II. Dynamic metric collection

The Python script for dynamic metric collection (the individual collection scripts have been merged into one):

# cat zabbix-metrics-find.py
#!/usr/bin/env python
# coding:utf-8

import json
import os
import re
import sys

# kube-state-metrics auto-discovery for Zabbix
# Argument "value"/"values" (case-insensitive): print metric values; any other argument, or none: print metric keys
# Collection scope: any Node (192.168.10.230 in the test environment); later point this at the Nginx-Ingress load-balancer IP or an internal Service IP
# Recommended collection interval: 5 min
# Author: GaoKan
# Created: 2019-5-22
# Updated:
def main():
    ip = '192.168.10.230'
    flag = 'key'
    if len(sys.argv) > 1:
        if sys.argv[1].lower() in ('value', 'values'):
            flag = 'value'
    keys = []
    values = []
    metrics_dict = {
        # DaemonSet metrics
        'kube_daemonset_status_number_misscheduled': {
            'forshort': 'ds_misscheduled',
            'tags': ['namespace', 'daemonset'],
        },
        'kube_daemonset_status_number_unavailable': {
            'forshort': 'ds_unavailable',
            'tags': ['namespace', 'daemonset'],
        },
        # Deployment metrics
        'kube_deployment_status_replicas_unavailable': {
            'forshort': 'deploy_unavailable',
            'tags': ['namespace', 'deployment'],
        },
        # Pod metrics
        'kube_pod_container_status_waiting_reason': {
            'forshort': 'po_cntr_waiting_reason',
            'tags': ['namespace', 'pod', 'container', 'reason'],
        },
        'kube_pod_container_status_terminated_reason': {
            'forshort': 'po_cntr_terminated_reason',
            'tags': ['namespace', 'pod', 'container', 'reason'],
        },
        'kube_pod_container_status_restarts_total': {
            'forshort': 'po_cntr_restarts_total',
            'tags': ['namespace', 'pod', 'container'],
        },
        # ReplicaSet metrics
        'kube_replicaset_status_ready_replicas': {
            'forshort': 'rs_ready_replicas',
            'tags': ['namespace', 'replicaset'],
        },
        'kube_replicaset_status_replicas': {
            'forshort': 'rs_replicas',
            'tags': ['namespace', 'replicaset'],
        },
        # Endpoint metrics
        'kube_endpoint_address_not_ready': {
            'forshort': 'ep_not_ready',
            'tags': ['namespace', 'endpoint'],
        },
    }
    # kube-state-metrics serves the object metrics on port 8080
    metrics = os.popen('curl -s ' + ip + ':8080/metrics')
    for row in metrics:
        if row.startswith('#'):
            continue  # skip HELP/TYPE lines
        pos1 = row.find('{')  # end of metric name, start of labels
        pos2 = row.find('}')  # end of labels
        if pos1 == -1 or row[:pos1] not in metrics_dict:
            continue
        name = row[:pos1]
        labels = row[pos1 + 1:pos2]
        # key = short name + the value of each tag label, joined with "_"
        key = metrics_dict[name]['forshort']
        for tag in metrics_dict[name]['tags']:
            match = re.search(r'\b' + tag + r'="(.*?)"', labels)
            key += '_' + (match.group(1) if match else '')
        keys.append({'{#METRICSNAME}': key})
        values.append({'{#METRICSVALUE}': row[pos2 + 1:].strip()})
    if flag == 'value':
        print(json.dumps({'data': values}, indent=4))
    else:
        print(json.dumps({'data': keys}, indent=4))

if __name__ == "__main__":
    main()
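In this script the key list and the value list come from two separate runs and are linked only by their position in the arrays. If a pod appears or disappears between the two curl requests, the entries no longer line up. Below is a minimal variant, assuming Python 3, that puts both macros into each low-level discovery row. It also takes the metrics text as a parameter, so it can be tested without curl. The function name `discover` and the shortened `METRICS` table are illustrative, not part of the original script.

```python
import json
import re

# Shortened copy of metrics_dict from the script above (illustrative)
METRICS = {
    'kube_deployment_status_replicas_unavailable': {
        'forshort': 'deploy_unavailable',
        'tags': ['namespace', 'deployment'],
    },
    'kube_pod_container_status_restarts_total': {
        'forshort': 'po_cntr_restarts_total',
        'tags': ['namespace', 'pod', 'container'],
    },
}


def discover(text, metrics=METRICS):
    """Build Zabbix LLD JSON in which every row carries both key and value."""
    rows = []
    for row in text.splitlines():
        if row.startswith('#') or '{' not in row:
            continue  # skip comments and unlabelled samples
        name, rest = row.split('{', 1)
        if name not in metrics:
            continue
        labels, _, value = rest.partition('}')
        key = metrics[name]['forshort']
        for tag in metrics[name]['tags']:
            m = re.search(r'\b' + tag + r'="(.*?)"', labels)
            key += '_' + (m.group(1) if m else '')
        rows.append({'{#METRICSNAME}': key, '{#METRICSVALUE}': value.strip()})
    return json.dumps({'data': rows}, indent=4)
```

In the original script the text would come from `os.popen('curl -s ' + ip + ':8080/metrics').read()`, and the result is printed once instead of being split into a keys run and a values run.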

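Run the script; it returns a JSON string. The output lists the running status of the Kubernetes objects covered by metrics_dict (DaemonSets, Deployments, Pods, ReplicaSets and Endpoints). Depending on the workload, there can be hundreds or even thousands of entries.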

# python zabbix-metrics-find.py | head -30
{
    "data": [
        {
            "{#METRICSNAME}": "ds_misscheduled_test-rg_test-rg-005"
        },
        {
            "{#METRICSNAME}": "ds_misscheduled_cattle-system_cattle-node-agent"
        },
        {
            "{#METRICSNAME}": "ds_misscheduled_test-rg_test-rg-001"
        },
        {
            "{#METRICSNAME}": "ds_misscheduled_test-rg_test-rg-002"
        },
        {
            "{#METRICSNAME}": "ds_misscheduled_test-rg_test-rg-003"
        },
        {
            "{#METRICSNAME}": "ds_misscheduled_test-rg_test-rg-004"
        },
        {
            "{#METRICSNAME}": "ds_unavailable_test-rg_test-rg-003"
        },
        {
            "{#METRICSNAME}": "ds_unavailable_test-rg_test-rg-004"
        },
        {
            "{#METRICSNAME}": "ds_unavailable_test-rg_test-rg-005"
        },
...
...
        {
            "{#METRICSNAME}": "po_cntr_restarts_total_test-rg_test-rg-005-jvkm6_test-rg-005"
        },
        {
            "{#METRICSNAME}": "po_cntr_restarts_total_cattle-system_cattle-node-agent-mdl9x_agent"
        },
        {
            "{#METRICSNAME}": "po_cntr_restarts_total_test-rg_test-rg-005-wpsbq_test-rg-005"
        },
        {
            "{#METRICSNAME}": "po_cntr_restarts_total_test-rg_test-rg-004-9s57x_test-rg-004"
        },
        {
            "{#METRICSNAME}": "po_cntr_restarts_total_test-rg_test-rg-005-wxk54_test-rg-005"
        },
        {
            "{#METRICSNAME}": "po_cntr_restarts_total_cattle-system_cattle-node-agent-r46bz_agent"
        },
        {
            "{#METRICSNAME}": "po_cntr_restarts_total_default_mysql-ceph-test-76697d98d6-4gj9v_mysql-ceph-test"
        },
        {
            "{#METRICSNAME}": "po_cntr_restarts_total_kube-system_coredns-5cbf6655f-6wxqz_coredns"
        },
        {
            "{#METRICSNAME}": "po_cntr_restarts_total_kube-system_kube-state-metrics-576fbb446d-ctl4p_addon-resizer"
        },
        {
            "{#METRICSNAME}": "po_cntr_restarts_total_kube-system_kube-state-metrics-576fbb446d-ctl4p_kube-state-metrics"
        },
...
...
        {
            "{#METRICSNAME}": "rs_ready_replicas_test_nginx-5c689d88bb"
        },
        {
            "{#METRICSNAME}": "rs_ready_replicas_two-test_aicase-docker-5784b5749b"
        },
        {
            "{#METRICSNAME}": "rs_ready_replicas_cattle-system_cattle-cluster-agent-d59dbdb55"
        },
        {
            "{#METRICSNAME}": "rs_ready_replicas_test_nginx-589dcbcbd6"
        },
        {
            "{#METRICSNAME}": "rs_ready_replicas_test_nginx-5b677cdf4f"
        },
        {
            "{#METRICSNAME}": "rs_ready_replicas_default_mysql-ceph-test-76697d98d6"
        },
        {
            "{#METRICSNAME}": "rs_ready_replicas_kube-system_kube-state-metrics-75bbc44548"
        },
        {
            "{#METRICSNAME}": "rs_ready_replicas_kube-system_traefik-ingress-controller-6db4877748"
        },
        {
            "{#METRICSNAME}": "rs_ready_replicas_two-test_aicase-docker-57d445cbf"
        }
    ]
}

Query the values:

# python zabbix-metrics-find.py values
{
    "data": [
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "0"
        },
...
...
        {
            "{#METRICSVALUE}": "1"
        },
        {
            "{#METRICSVALUE}": "27"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "3"
        },
        {
            "{#METRICSVALUE}": "0"
        },
...
...
        {
            "{#METRICSVALUE}": "1"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "2"
        },
        {
            "{#METRICSVALUE}": "1"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "2"
        },
        {
            "{#METRICSVALUE}": "0"
        }
    ]
}

 
