Magnus K Karlsson: OpenShift 4.6 Automation and Integration: Cluster Monitoring and Metrics

November 2, 2022

Monitoring Overview

1.2. Understanding the monitoring stack
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/monitoring/index#understanding-the-monitoring-stack_monitoring-overview

Alertmanager

5.7. Applying a custom Alertmanager configuration
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/monitoring/index#applying-custom-alertmanager-configuration_managing-alerts

$ oc extract secret/alertmanager-main --to /tmp/ -n openshift-monitoring --confirm

OCP Web Console

Navigate to Administration -> Cluster Settings -> Global Configuration -> Alertmanager -> YAML.

global:
  resolve_timeout: 5m
route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: default
  routes:
  - match:
      alertname: Watchdog
    repeat_interval: 5m
    receiver: watchdog
receivers:
- name: default
- name: watchdog

Sending Alerts to Email

global:
  resolve_timeout: 5m
  smtp_smarthost: "mail.mkk.se:25"
  smtp_from: alerts@ocp4.mkk.se
  smtp_auth_username: mail_username
  smtp_auth_password: mail_password
  smtp_require_tls: false
route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: default
  routes:
    - match:
        alertname: Watchdog
      repeat_interval: 5m
      receiver: watchdog
    - match:
        severity: critical
      receiver: email-notification
receivers:
  - name: default
  - name: watchdog
  - name: email-notification
    email_configs:
      - to: ocp-admins@mkk.se
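Alertmanager rejects a configuration in which a route names a receiver that is not declared under receivers, so a quick check before uploading an edited file can save a round trip through the logs. The following is a minimal sketch, not a YAML parser: check_receivers is a hypothetical helper name, and it assumes the unquoted, block-style layout used in these notes.

```shell
#!/bin/sh
# check_receivers FILE (hypothetical helper, not part of oc or Alertmanager):
# prints each receiver that a route references but that the receivers list
# does not declare. It matches plain text lines, so it only understands the
# unquoted block-style YAML used in these notes.
check_receivers() {
  file=$1
  # Names referenced by routes, taken from lines such as "receiver: watchdog".
  referenced=$(sed -n 's/^[[:space:]]*receiver:[[:space:]]*//p' "$file" | sort -u)
  # Names declared in the receivers list, taken from lines such as "- name: watchdog".
  declared=$(sed -n 's/^[[:space:]]*-[[:space:]]*name:[[:space:]]*//p' "$file" | sort -u)
  status=0
  for name in $referenced; do
    if ! printf '%s\n' "$declared" | grep -qxF -- "$name"; then
      echo "undeclared receiver: $name"
      status=1
    fi
  done
  return "$status"
}

# Usage: a non-zero exit status means at least one receiver is missing.
# check_receivers /tmp/alertmanager.yaml
```

Because the check is purely textual, a quoted name such as receiver: "default" would be reported as undeclared unless the receivers list quotes it the same way.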

$ oc set data secret/alertmanager-main -n openshift-monitoring --from-file=/tmp/alertmanager.yaml

$ oc logs -f -n openshift-monitoring alertmanager-main-0 -c alertmanager

Grafana

Grafana includes the following default dashboards:

etcd	Information about etcd in the cluster.
Kubernetes / Compute Resources / Cluster	High-level view of cluster resources.
Kubernetes / Compute Resources / Namespace (Pods)	Resource usage for pods per namespace.
Kubernetes / Compute Resources / Namespace (Workloads)	Resource usage per namespace and then by workload type, such as deployment, daemonset, and statefulset.
Kubernetes / Compute Resources / Node (Pods)	Resource usage per node.
Kubernetes / Compute Resources / Pod	Resource usage for individual pods.
Kubernetes / Compute Resources / Workload	Resource usage per namespace, workload, and workload type.
Kubernetes / Networking / Cluster	Network usage in the cluster.
Prometheus	Information about prometheus-k8s pods running in the openshift-monitoring namespace.
USE Method / Cluster	USE: Utilization, Saturation, and Errors.

Persistent Storage

Configuring Prometheus Persistent Storage

2.8.2. Configuring a local persistent volume claim
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/monitoring/index#configuring-a-local-persistent-volume-claim_configuring-the-monitoring-stack

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 15d
      volumeClaimTemplate:
        spec:
          storageClassName: local-storage
          volumeMode: Filesystem
          resources:
            requests:
              storage: 40Gi

Configuring Alertmanager Persistent Storage

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      volumeClaimTemplate:
        spec:
          storageClassName: local-storage
          volumeMode: Filesystem
          resources:
            requests:
              storage: 20Gi
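Both manifests above use the same ConfigMap name and the same config.yaml key, so applying the second one as written would replace the Prometheus settings from the first. A minimal sketch of a combined manifest follows. The file path is an arbitrary choice for this example, and the oc apply step is left as a comment because it needs a logged-in session with sufficient privileges.

```shell
#!/bin/sh
# Path of the combined manifest; an arbitrary choice for this sketch.
manifest=/tmp/cluster-monitoring-config.yaml

# One ConfigMap, one config.yaml key, storage settings for both components.
cat > "$manifest" <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 15d
      volumeClaimTemplate:
        spec:
          storageClassName: local-storage
          volumeMode: Filesystem
          resources:
            requests:
              storage: 40Gi
    alertmanagerMain:
      volumeClaimTemplate:
        spec:
          storageClassName: local-storage
          volumeMode: Filesystem
          resources:
            requests:
              storage: 20Gi
EOF

# Apply from a session logged in with sufficient privileges:
# oc apply -f "$manifest"
```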

$ oc exec -it prometheus-k8s-0 -c prometheus -n openshift-monitoring -- ls -l /prometheus

$ oc exec -it prometheus-k8s-0 -c prometheus -n openshift-monitoring -- df -h /prometheus