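Grafana Plugin (grafana/v1-alpha)
The Grafana plugin is an optional plugin that scaffolds Grafana dashboards so that you can view the default metrics exported by projects that use controller-runtime.
When to use it?
- When you want to view, in Grafana, the metrics that controller-runtime exports and Prometheus collects.
How to use it?
Prerequisites
- Your project must use controller-runtime to expose the default controller metrics, and Prometheus must scrape them.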
- Access to Prometheus:
  - Prometheus must expose a reachable endpoint (for example, http://prometheus-k8s.monitoring.svc:9090 for prometheus-operator).
  - That endpoint must be configured as a data source in Grafana. See Add a data source.
- Access to Grafana. Make sure that:
  - You have dashboard edit permission.
  - A Prometheus data source is configured.

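Basic usage
The Grafana plugin is attached to the init and edit subcommands:
# Initialize a new project with the grafana plugin
kubebuilder init --plugins grafana.kubebuilder.io/v1-alpha
# Enable the grafana plugin on an existing project
kubebuilder edit --plugins grafana.kubebuilder.io/v1-alpha
The plugin creates a new directory and scaffolds JSON files into it (for example, grafana/controller-runtime-metrics.json).
Demo
The following animation shows the plugin being enabled in a project: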

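How to import these dashboards into Grafana
- Copy the contents of the JSON file.
- Open <your-grafana-url>/dashboard/import and follow the prompts to import a new dashboard.
- Paste the JSON into "Import via panel json" and click "Load".
- Select the Prometheus instance to use as the data source.
- Once the import succeeds, the dashboard is ready to use.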
Dashboard Descriptions
Controller Runtime Reconciliation Total and Errors
- Metrics
  - controller_runtime_reconcile_total
  - controller_runtime_reconcile_errors_total
- Query:
  - sum(rate(controller_runtime_reconcile_total{job="$job"}[5m])) by (instance, pod)
  - sum(rate(controller_runtime_reconcile_errors_total{job="$job"}[5m])) by (instance, pod)
- Description
  - Per-second rate of total reconciliations, averaged over the last 5 minutes.
  - Per-second rate of reconciliation errors, averaged over the last 5 minutes.
- Sample:
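To make the per-second rate that rate(...[5m]) reports concrete, here is a minimal Python sketch of the arithmetic. It is a simplification: the real PromQL rate() function also accounts for counter resets and extrapolates to the window boundaries. The function name and the sample values are hypothetical and chosen only for illustration.

```python
def per_second_rate(samples):
    """Average per-second increase of a counter across a window.

    `samples` is a time-ordered list of (timestamp_seconds, counter_value)
    pairs. Simplified on purpose: no counter-reset handling and no
    extrapolation, both of which PromQL's rate() performs.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t_first, v_first = samples[0]
    t_last, v_last = samples[-1]
    if t_last <= t_first:
        raise ValueError("samples must span a positive time range")
    return (v_last - v_first) / (t_last - t_first)


# Hypothetical scrapes taken 300 seconds (5 minutes) apart.
reconcile_total = [(0, 1200.0), (300, 1500.0)]
reconcile_errors_total = [(0, 10.0), (300, 25.0)]

# 300 reconciliations in 300 seconds is 1 reconciliation per second.
print(per_second_rate(reconcile_total))
# 15 errors in 300 seconds is roughly 0.05 errors per second.
print(per_second_rate(reconcile_errors_total))
```

Dividing the error rate by the total rate gives the fraction of reconciliations that fail, which is often easier to alert on than either series alone.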

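Controller CPU and Memory Usage
- Metrics
  - process_cpu_seconds_total
  - process_resident_memory_bytes
- Query:
  - rate(process_cpu_seconds_total{job="$job", namespace="$namespace", pod="$pod"}[5m]) * 100
  - process_resident_memory_bytes{job="$job", namespace="$namespace", pod="$pod"}
- Description
  - CPU usage over the last 5 minutes, expressed as a percentage of one CPU core (the per-second rate of CPU seconds consumed, multiplied by 100).
  - Resident memory, in bytes, of the controller process.
- Sample: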

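P50/90/99 Work Queue Wait Duration (Seconds)
- Metrics
  - workqueue_queue_duration_seconds_bucket
- Query:
  - histogram_quantile(0.50, sum(rate(workqueue_queue_duration_seconds_bucket{job="$job", namespace="$namespace"}[5m])) by (instance, name, le))
  - The query above computes P50. Use 0.90 or 0.99 as the first argument for P90 or P99.
- Description
  - How long an item waits in the work queue before it is picked up.
- Sample: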
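The histogram_quantile function estimates a percentile from cumulative bucket counts by linear interpolation inside the bucket that contains the target rank. The Python sketch below illustrates that idea for classic histograms with positive upper bounds, such as durations. It is an approximation of the PromQL behavior, not a reimplementation, and the bucket values are hypothetical.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound, ending with the +Inf bucket. Assumes 0 < q <= 1 and
    positive finite bounds (as with duration histograms).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        raise ValueError("need at least two buckets, the last being +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if upper == math.inf:
        # The rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower = buckets[i - 1][0] if i > 0 else 0.0
    count_below = buckets[i - 1][1] if i > 0 else 0.0
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - count_below) / (count - count_below)


# Hypothetical per-bucket rates for workqueue_queue_duration_seconds_bucket.
buckets = [(0.01, 40.0), (0.1, 90.0), (1.0, 99.0), (math.inf, 100.0)]
print(histogram_quantile(0.50, buckets))  # about 0.028 seconds
print(histogram_quantile(0.90, buckets))  # about 0.1 seconds
print(histogram_quantile(0.99, buckets))  # about 1 second
```

Because the result is interpolated, its accuracy depends on how finely the buckets divide the range in which most observations fall.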

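P50/90/99 Work Queue Processing Duration (Seconds)
- Metrics
  - workqueue_work_duration_seconds_bucket
- Query:
  - histogram_quantile(0.50, sum(rate(workqueue_work_duration_seconds_bucket{job="$job", namespace="$namespace"}[5m])) by (instance, name, le))
  - The query above computes P50. Use 0.90 or 0.99 as the first argument for P90 or P99.
- Description
  - How long it takes to process an item after it is taken from the work queue.
- Sample: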

Add Rate in Work Queue
- Metrics
  - workqueue_adds_total
- Query:
  - sum(rate(workqueue_adds_total{job="$job", namespace="$namespace"}[5m])) by (instance, name)
- Description
  - Per-second rate of items added to the work queue.
- Sample:

Retries Rate in Work Queue
- Metrics
  - workqueue_retries_total
- Query:
  - sum(rate(workqueue_retries_total{job="$job", namespace="$namespace"}[5m])) by (instance, name)
- Description
  - Per-second rate of retries handled by the work queue.
- Sample:

Number of Workers in Use
- Metrics
  - controller_runtime_active_workers
- Query:
  - controller_runtime_active_workers{job="$job", namespace="$namespace"}
- Description
  - The number of active controller workers.
- Sample:
WorkQueue Depth
- Metrics
  - workqueue_depth
- Query:
  - workqueue_depth{job="$job", namespace="$namespace"}
- Description
  - Current depth of the work queue.
- Sample:
Unfinished Seconds
- Metrics
  - workqueue_unfinished_work_seconds
- Query:
  - rate(workqueue_unfinished_work_seconds{job="$job", namespace="$namespace"}[5m])
- Description
  - Seconds of work that is in progress and has not yet been observed by work_duration.
- Sample:
Visualize Custom Metrics
The Grafana plugin supports scaffolding manifests for custom metrics.
Generate Config Template
When the plugin is triggered for the first time, grafana/custom-metrics/config.yaml is generated.
---
customMetrics:
#  - metric: # Raw custom metric (required)
#    type:   # Metric type: counter/gauge/histogram (required)
#    expr:   # Prom_ql for the metric (optional)
#    unit:   # Unit of measurement, examples: s,none,bytes,percent,etc. (optional)
Add Custom Metrics to Config
You can enter multiple custom metrics in the file. For each element, you need to specify the metric and its type.
The Grafana plugin can automatically generate expr for visualization.
Alternatively, you can provide expr and the plugin will use the specified one directly.
---
customMetrics:
  - metric: memcached_operator_reconcile_total # Raw custom metric (required)
    type: counter # Metric type: counter/gauge/histogram (required)
    unit: none
  - metric: memcached_operator_reconcile_time_seconds_bucket
    type: histogram
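As an illustration of how an expr could be derived when an entry omits it, the following Python sketch validates the required fields and falls back to a default query for each metric type. The default query shapes are assumptions borrowed from the built-in dashboard queries on this page. The expressions the plugin actually generates may differ, and the function names here are hypothetical.

```python
ALLOWED_TYPES = {"counter", "gauge", "histogram"}
# Label selector using the same dashboard variables as the built-in panels.
SELECTOR = '{job="$job", namespace="$namespace"}'


def default_expr(metric, metric_type):
    """Return an illustrative default PromQL expression (assumed shapes)."""
    if metric_type == "counter":
        return f"sum(rate({metric}{SELECTOR}[5m])) by (instance, pod)"
    if metric_type == "histogram":
        return (
            "histogram_quantile(0.90, "
            f"sum(rate({metric}{SELECTOR}[5m])) by (instance, le))"
        )
    # Gauges are plotted as-is.
    return f"{metric}{SELECTOR}"


def resolve_exprs(custom_metrics):
    """Validate entries and fill in `expr` where it is not provided."""
    resolved = []
    for i, entry in enumerate(custom_metrics):
        metric = entry.get("metric")
        metric_type = entry.get("type")
        if not metric:
            raise ValueError(f"entry {i}: 'metric' is required")
        if metric_type not in ALLOWED_TYPES:
            raise ValueError(f"entry {i}: 'type' must be one of {sorted(ALLOWED_TYPES)}")
        # A user-supplied expr is used directly; otherwise derive one.
        expr = entry.get("expr") or default_expr(metric, metric_type)
        resolved.append({**entry, "expr": expr})
    return resolved


# The two entries from the config above, written as Python dicts.
config = [
    {"metric": "memcached_operator_reconcile_total", "type": "counter", "unit": "none"},
    {"metric": "memcached_operator_reconcile_time_seconds_bucket", "type": "histogram"},
]
for item in resolve_exprs(config):
    print(item["metric"], "->", item["expr"])
```

Checking the generated expressions in the Prometheus expression browser before importing the dashboard helps catch label mismatches early.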
Scaffold Manifest
Once config.yaml is configured, run kubebuilder edit --plugins grafana.kubebuilder.io/v1-alpha again.
This time, the plugin generates grafana/custom-metrics/custom-metrics-dashboard.json, which can be imported into the Grafana UI.
Show case:
See an example of how to visualize your custom metrics:

Subcommands
The Grafana plugin implements the following subcommands:
- edit ($ kubebuilder edit [OPTIONS])
- init ($ kubebuilder init [OPTIONS])
Affected files
The following scaffolds will be created or updated by this plugin:
- grafana/*.json
Further resources
- Check out the video showing how the plugin works
- Check out the video showing how the custom metrics feature works
- Refer to a sample of serviceMonitor provided by the kustomize plugin
- Check the plugin implementation
- Grafana docs on importing a JSON file
- The usage of serviceMonitor by Prometheus Operator
