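Grafana Plugin (`grafana/v1-alpha`)

The Grafana plugin is an optional plugin that scaffolds Grafana dashboards for viewing the default metrics exported by projects that use controller-runtime.

When to use it?

When you want to use Grafana to view the controller metrics that your project exports and Prometheus collects.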

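How to use it?

Prerequisites

Your project must use controller-runtime to expose the default controller metrics, and Prometheus must scrape them.
Access to Prometheus:
- Prometheus must expose a reachable endpoint (for example, http://prometheus-k8s.monitoring.svc:9090 with prometheus-operator).
- That endpoint must be configured as a data source in Grafana. See Add a data source.
Access to Grafana, making sure that:
- You have dashboard edit permission.
- The Prometheus data source is configured.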
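
Before importing any dashboard, it can help to confirm that Prometheus actually holds the controller metrics. The sketch below only builds a request URL for the instant-query endpoint of the Prometheus HTTP API (/api/v1/query). The helper name build_query_url and the job label value are illustrative assumptions; substitute your own endpoint and labels.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Return an instant-query URL for the Prometheus HTTP API."""
    # rstrip avoids a double slash when the base URL ends with "/".
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Endpoint taken from the prerequisites; the job label value is hypothetical.
url = build_query_url(
    "http://prometheus-k8s.monitoring.svc:9090",
    'controller_runtime_reconcile_total{job="my-controller"}',
)
print(url)
```

Fetching this URL from somewhere that can reach Prometheus (for example with urllib.request.urlopen) should return a JSON body whose data.result list is non-empty when the metric is being scraped.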

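Basic usage

The Grafana plugin is attached to the init and edit subcommands:

# Initialize a new project with the grafana plugin
kubebuilder init --plugins grafana.kubebuilder.io/v1-alpha

# Enable the grafana plugin on an existing project
kubebuilder edit --plugins grafana.kubebuilder.io/v1-alpha

The plugin creates a new directory and generates JSON files in it (for example, grafana/controller-runtime-metrics.json).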

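How to import these dashboards into Grafana

Copy the contents of the JSON file.
Open <your-grafana-url>/dashboard/import and follow the prompts to import a new dashboard.
Paste the JSON into "Import via panel json" and click "Load".
Select the Prometheus instance to use as the data source.
Once the import succeeds, the dashboard is ready to use.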
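
A quick sanity check before pasting can catch a truncated or malformed copy. The sketch below parses dashboard JSON and lists the panel titles. It assumes the top-level "title" and "panels" fields of the Grafana dashboard JSON model; the sample document is a hypothetical, heavily trimmed stand-in, not the real generated file.

```python
import json


def summarize_dashboard(raw_json: str) -> dict:
    """Parse dashboard JSON and return its title and panel titles."""
    dashboard = json.loads(raw_json)  # raises json.JSONDecodeError on malformed input
    panels = dashboard.get("panels", [])
    return {
        "title": dashboard.get("title", ""),
        "panels": [panel.get("title", "") for panel in panels],
    }


# Hypothetical stand-in for the contents of a generated dashboard file.
sample = json.dumps({
    "title": "Controller-Runtime-Metrics",
    "panels": [{"title": "Reconciliation Total"}, {"title": "WorkQueue Depth"}],
})
print(summarize_dashboard(sample))
```

To check a real file, pass its contents instead of sample, for example the text read from grafana/controller-runtime-metrics.json.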

Dashboard descriptions

Controller Runtime Reconciliation Total and Errors

Metrics:
- controller_runtime_reconcile_total
- controller_runtime_reconcile_errors_total
Query:
- sum(rate(controller_runtime_reconcile_total{job="$job"}[5m])) by (instance, pod)
- sum(rate(controller_runtime_reconcile_errors_total{job="$job"}[5m])) by (instance, pod)
Description:
- Per-second rate of total reconciliations, measured over the last 5 minutes.
- Per-second rate of reconciliation errors, measured over the last 5 minutes.
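
To make "per-second rate over the last 5 minutes" concrete, here is a simplified sketch of what rate() computes from the counter samples inside the window. It handles counter resets but, unlike Prometheus, does not extrapolate to the window edges, so treat it as an approximation. The function name and the sample values are made up for illustration.

```python
def simple_rate(samples):
    """Approximate per-second rate from (timestamp_seconds, counter_value) samples.

    Simplification: no extrapolation to the window boundaries.
    """
    if len(samples) < 2:
        return None  # a rate needs at least two samples
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset (e.g. a pod restart); count from zero.
        increase += cur - prev if cur >= prev else cur
    return increase / elapsed


# Hypothetical scrapes of controller_runtime_reconcile_total, one per minute.
window = [(0, 100), (60, 130), (120, 160), (180, 190), (240, 220), (300, 250)]
print(simple_rate(window))  # 150 reconciles over 300 seconds
```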

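Controller CPU and Memory Usage

Metrics:
- process_cpu_seconds_total
- process_resident_memory_bytes
Query:
- rate(process_cpu_seconds_total{job="$job", namespace="$namespace", pod="$pod"}[5m]) * 100
- process_resident_memory_bytes{job="$job", namespace="$namespace", pod="$pod"}
Description:
- Per-second rate of CPU time consumed over the last 5 minutes, multiplied by 100 to express it as a percentage of one CPU core.
- Resident memory of the controller process, in bytes.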

P50/90/99 Work Queue Wait Duration (Seconds)

Metrics:
- workqueue_queue_duration_seconds_bucket
Query:
- histogram_quantile(0.50, sum(rate(workqueue_queue_duration_seconds_bucket{job="$job", namespace="$namespace"}[5m])) by (instance, name, le))
Description:
- How long an item stays in the work queue before it is picked up.
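
The histogram_quantile call estimates a percentile from cumulative buckets by interpolating linearly inside the bucket that contains the target rank. The sketch below reproduces that idea on plain counts; in the real query the bucket values are per-second rates, but the interpolation is the same. It assumes positive bucket bounds, as with durations, and the bucket values are invented for illustration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper bound,
    with at least one finite bucket and the +Inf bucket last.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for i, (upper_bound, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return buckets[i - 1][0]
            width = count - lower_count
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / width if width else 0.0
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count


# Hypothetical queue-duration buckets: le=0.1, 0.5, 1.0 and +Inf.
buckets = [(0.1, 10), (0.5, 30), (1.0, 40), (math.inf, 40)]
print(histogram_quantile(0.50, buckets))
print(histogram_quantile(0.99, buckets))
```

With these buckets the median lands halfway through the 0.1 to 0.5 bucket, which is why coarse bucket boundaries limit how precise the reported percentiles can be.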

P50/90/99 Work Queue Processing Duration (Seconds)

Metrics:
- workqueue_work_duration_seconds_bucket
Query:
- histogram_quantile(0.50, sum(rate(workqueue_work_duration_seconds_bucket{job="$job", namespace="$namespace"}[5m])) by (instance, name, le))
Description:
- How long it takes to process an item taken from the work queue.

Add Rate in Work Queue

Metrics:
- workqueue_adds_total
Query:
- sum(rate(workqueue_adds_total{job="$job", namespace="$namespace"}[5m])) by (instance, name)
Description:
- Per-second rate of items added to the work queue.

Retries Rate in Work Queue

Metrics:
- workqueue_retries_total
Query:
- sum(rate(workqueue_retries_total{job="$job", namespace="$namespace"}[5m])) by (instance, name)
Description:
- Per-second rate of retries handled by the work queue.

Number of Workers in Use

Metrics:
- controller_runtime_active_workers
Query:
- controller_runtime_active_workers{job="$job", namespace="$namespace"}
Description:
- The number of active controller workers.

WorkQueue Depth

Metrics:
- workqueue_depth
Query:
- workqueue_depth{job="$job", namespace="$namespace"}
Description:
- Current depth of the work queue.

Unfinished Seconds

Metrics:
- workqueue_unfinished_work_seconds
Query:
- rate(workqueue_unfinished_work_seconds{job="$job", namespace="$namespace"}[5m])
Description:
- Number of seconds of work that is in progress and has not yet been observed by work_duration.

Visualize Custom Metrics

The Grafana plugin supports scaffolding manifests for custom metrics.

Generate Config Template

When the plugin is triggered for the first time, grafana/custom-metrics/config.yaml is generated.

---
customMetrics:
#  - metric: # Raw custom metric (required)
#    type:   # Metric type: counter/gauge/histogram (required)
#    expr:   # PromQL expression for the metric (optional)
#    unit:   # Unit of measurement, examples: s,none,bytes,percent,etc. (optional)

You can enter multiple custom metrics in the file. For each element, you need to specify the metric and its type. The Grafana plugin can automatically generate expr for visualization. Alternatively, you can provide expr and the plugin will use the specified one directly.

---
customMetrics:
  - metric: memcached_operator_reconcile_total # Raw custom metric (required)
    type: counter # Metric type: counter/gauge/histogram (required)
    unit: none
  - metric: memcached_operator_reconcile_time_seconds_bucket
    type: histogram
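
The rules stated above (metric and type are required, type is one of counter, gauge, or histogram, and a supplied expr is used directly) can be sketched as a small check over the entries once the YAML has been parsed into dictionaries, for instance with a third-party YAML library. The default_expr fallbacks are purely hypothetical templates modeled on the queries shown earlier in this document; they are not the plugin's actual generation logic.

```python
ALLOWED_TYPES = {"counter", "gauge", "histogram"}


def validate_custom_metrics(entries):
    """Return a list of problems found in parsed customMetrics entries."""
    problems = []
    for i, entry in enumerate(entries):
        if not entry.get("metric"):
            problems.append(f"entry {i}: 'metric' is required")
        if entry.get("type") not in ALLOWED_TYPES:
            problems.append(f"entry {i}: 'type' must be one of {sorted(ALLOWED_TYPES)}")
    return problems


def default_expr(entry):
    """Hypothetical fallback expression; NOT the plugin's actual rule."""
    if entry.get("expr"):
        return entry["expr"]  # a user-supplied expr is used as-is
    selector = f'{entry["metric"]}{{job="$job", namespace="$namespace"}}'
    if entry["type"] == "counter":
        return f"sum(rate({selector}[5m])) by (instance, pod)"
    if entry["type"] == "histogram":
        return f"histogram_quantile(0.50, sum(rate({selector}[5m])) by (instance, le))"
    return selector  # gauge: plot the raw value


# The two entries from the example config above, already parsed.
custom_metrics = [
    {"metric": "memcached_operator_reconcile_total", "type": "counter", "unit": "none"},
    {"metric": "memcached_operator_reconcile_time_seconds_bucket", "type": "histogram"},
]
print(validate_custom_metrics(custom_metrics))
for entry in custom_metrics:
    print(default_expr(entry))
```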

Scaffold Manifest

Once config.yaml is configured, run kubebuilder edit --plugins grafana.kubebuilder.io/v1-alpha again. This time the plugin generates grafana/custom-metrics/custom-metrics-dashboard.json, which can be imported into the Grafana UI.

Showcase:

[Figure: example dashboard visualizing custom metrics]

Subcommands

The Grafana plugin implements the following subcommands:

edit ($ kubebuilder edit [OPTIONS])
init ($ kubebuilder init [OPTIONS])

Affected files

The following scaffolds will be created or updated by this plugin:

grafana/*.json

Further resources

Check out the video showing how the plugin works
Check out the video showing how the custom metrics feature works
Refer to the sample ServiceMonitor provided by the kustomize plugin
Check the plugin implementation
Grafana docs on importing a JSON file
Usage of ServiceMonitor with the Prometheus Operator