Overview Monitoring
1.2. Understanding the monitoring stack
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/monitoring/index#understanding-the-monitoring-stack_monitoring-overview
Alertmanager
5.7. Applying a custom Alertmanager configuration
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/monitoring/index#applying-custom-alertmanager-configuration_managing-alerts
$ oc extract secret/alertmanager-main --to /tmp/ -n openshift-monitoring --confirm
OCP Web Console
Navigate to Administration -> Cluster Settings -> Global Configuration -> Alertmanager -> YAML.
global:
  resolve_timeout: 5m
route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: default
  routes:
  - match:
      alertname: Watchdog
    repeat_interval: 5m
    receiver: watchdog
receivers:
- name: default
- name: watchdog
Sending Alerts to Email
global:
  resolve_timeout: 5m
  smtp_smarthost: "mail.mkk.se:25"
  smtp_from: alerts@ocp4.mkk.se
  smtp_auth_username: mail_username
  smtp_auth_password: mail_password
  smtp_require_tls: false
route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: default
  routes:
  - match:
      alertname: Watchdog
    repeat_interval: 5m
    receiver: watchdog
  - match:
      severity: critical
    receiver: email-notification
receivers:
- name: default
- name: watchdog
- name: email-notification
  email_configs:
  - to: ocp-admins@mkk.se
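Alertmanager refuses to load a configuration in which a route names a receiver that is not defined under receivers, so a quick local check before uploading the file can save a round trip. The following is a minimal sketch, not an official tool: check_receivers is a hypothetical helper name, and it assumes unquoted receiver names written one per line, as in the configuration above.

```shell
#!/usr/bin/env bash
# check_receivers FILE
# Hypothetical helper (not part of oc or Alertmanager): succeeds only if every
# "receiver: NAME" in FILE also appears as a "- name: NAME" entry.
# Assumption: receiver names are unquoted and each sits on its own line.
check_receivers() {
  awk '
    BEGIN { bad = 0 }
    # A receiver referenced by the top-level route or a child route.
    $1 == "receiver:"          { used[$2] = 1 }
    # A receiver declared in the receivers list.
    $1 == "-" && $2 == "name:" { defined[$3] = 1 }
    END {
      for (name in used) {
        if (!(name in defined)) {
          print "undefined receiver: " name > "/dev/stderr"
          bad = 1
        }
      }
      exit bad
    }
  ' "$1"
}
```

For example, running check_receivers /tmp/alertmanager.yaml before the oc set data command below would flag a misspelled receiver name. It is a plain text scan, not a YAML parser, so it will not catch indentation mistakes.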
$ oc set data secret/alertmanager-main -n openshift-monitoring --from-file=/tmp/alertmanager.yaml
$ oc logs -f -n openshift-monitoring alertmanager-main-0 -c alertmanager
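When reading those logs, the point of interest is whether the new configuration was loaded or rejected. A small filter can summarise that. This is a sketch under a stated assumption: that the Alertmanager log lines contain the phrases "Completed loading of configuration file" and "Loading configuration file failed", which may differ between Alertmanager versions. config_reload_status is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# config_reload_status
# Hypothetical helper: reads Alertmanager log lines on stdin and prints
# "ok", "failed", or "unknown" according to the LAST matching line.
# Assumption: the two message phrases below match the running version.
config_reload_status() {
  awk '
    /Loading configuration file failed/       { status = "failed" }
    /Completed loading of configuration file/ { status = "ok" }
    END { print (status == "" ? "unknown" : status) }
  '
}
```

It can be fed with the same log command without the -f flag, so the stream ends: oc logs -n openshift-monitoring alertmanager-main-0 -c alertmanager | config_reload_status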
Grafana
Grafana includes the following default dashboards:
etcd | Information on etcd in the cluster. |
Kubernetes / Compute Resources / Cluster | High-level view of cluster resources. |
Kubernetes / Compute Resources / Namespace (Pods) | Resource usage for pods per namespace. |
Kubernetes / Compute Resources / Namespace (Workloads) | Resource usage per namespace and then by workload type, such as deployment, daemonset, and statefulset. |
Kubernetes / Compute Resources / Node (Pods) | Resource usage per node. |
Kubernetes / Compute Resources / Pod | Resource usage for individual pods. |
Kubernetes / Compute Resources / Workload | Resource usage per namespace, workload, and workload type. |
Kubernetes / Networking / Cluster | Network usage in the cluster. |
Prometheus | Information about prometheus-k8s pods running in the openshift-monitoring namespace. |
USE Method / Cluster | USE method metrics: Utilization, Saturation, and Errors. |
Persistent Storage
Configuring Prometheus Persistent Storage
2.8.2. Configuring a local persistent volume claim
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/monitoring/index#configuring-a-local-persistent-volume-claim_configuring-the-monitoring-stack
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 15d
      volumeClaimTemplate:
        spec:
          storageClassName: local-storage
          volumeMode: Filesystem
          resources:
            requests:
              storage: 40Gi
Configuring Alertmanager Persistent Storage
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      volumeClaimTemplate:
        spec:
          storageClassName: local-storage
          volumeMode: Filesystem
          resources:
            requests:
              storage: 20Gi
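Both manifests above use the same ConfigMap name and the same config.yaml key, so applying them one after the other would leave only the second section in place. To keep persistent storage for Prometheus and Alertmanager together, the two sections can sit side by side under one config.yaml. The sketch below writes such a combined manifest to a file. The MANIFEST path is an assumption, not something from the original notes; the values are the ones shown above.

```shell
#!/usr/bin/env bash
# MANIFEST is an assumed output path, not taken from the original notes.
MANIFEST="${MANIFEST:-/tmp/cluster-monitoring-config.yaml}"

# Both component sections live under the single config.yaml key.
cat > "$MANIFEST" <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 15d
      volumeClaimTemplate:
        spec:
          storageClassName: local-storage
          volumeMode: Filesystem
          resources:
            requests:
              storage: 40Gi
    alertmanagerMain:
      volumeClaimTemplate:
        spec:
          storageClassName: local-storage
          volumeMode: Filesystem
          resources:
            requests:
              storage: 20Gi
EOF

echo "wrote $MANIFEST"
```

The file can then be applied with oc apply -f /tmp/cluster-monitoring-config.yaml, and oc get pvc -n openshift-monitoring should list the resulting claims once the pods have been recreated.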
$ oc exec -it prometheus-k8s-0 -c prometheus -n openshift-monitoring -- ls -l /prometheus
$ oc exec -it prometheus-k8s-0 -c prometheus -n openshift-monitoring -- df -h /prometheus