November 2, 2022

OpenShift 4.6 Automation and Integration: Recovering Failed Worker Nodes

Node Status

$ oc get nodes <NODE>

$ oc adm top node <NODE>

$ oc describe node <NODE> | grep -i taint

OpenShift Taint Effects

3.6.1. Understanding taints and tolerations
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/nodes/index#nodes-scheduler-taints-tolerations-about_nodes-scheduler-taints-tolerations

  • PreferNoSchedule: the scheduler tries to avoid placing new pods on the node, but it is not guaranteed.
  • NoSchedule: new pods are not scheduled on the node unless they tolerate the taint.
  • NoExecute: new pods are not scheduled, and existing pods without a matching toleration are evicted.

Example: a master node carrying the NoSchedule taint:

apiVersion: v1
kind: Node
metadata:
  annotations:
    machine.openshift.io/machine: openshift-machine-api/ci-ln-62s7gtb-f76d1-v8jxv-master-0
    machineconfiguration.openshift.io/currentConfig: rendered-master-cdc1ab7da414629332cc4c3926e6e59c
...
spec:
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
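
Tolerations are set on the pod side. A minimal sketch of a pod that tolerates the master NoSchedule taint above (pod name and image are illustrative, not from the course material):

apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo
spec:
  tolerations:
  # tolerate the NoSchedule taint carried by master nodes
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
  containers:
  - name: app
    image: registry.access.redhat.com/ubi8/ubi
    command: ["sleep", "infinity"]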

Worker Node Not Ready

$ oc describe node/worker01
...output omitted...
Taints:             node.kubernetes.io/not-ready:NoExecute
                    node.kubernetes.io/not-ready:NoSchedule
...
Ready       False   ...     KubeletNotReady        [container runtime is down...
$ ssh core@worker01 "sudo systemctl is-active crio"

$ ssh core@worker01 "sudo systemctl start crio"

$ oc describe node/worker01 | grep -i taints
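
After CRI-O is running again, the kubelet should report Ready and the not-ready taints should clear. A quick way to watch the transition (node name as in the example above):

$ oc get node worker01 -w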

Worker Node Storage Exhaustion

3.6.1. Understanding taints and tolerations
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/nodes/index#nodes-scheduler-taints-tolerations-about_nodes-scheduler-taints-tolerations

node.kubernetes.io/disk-pressure: The node has disk pressure issues. This corresponds to the node condition DiskPressure=True.

$ oc describe node/worker01 
...
Taints:             disk-pressure:NoSchedule 
                    disk-pressure:NoExecute 
...
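
To see what is consuming space on the node, a debug pod can run commands against the host filesystem; on RHCOS, container images and logs live under /var. A sketch (node name as in the example above):

$ oc debug node/worker01 -- chroot /host df -h /var
$ oc debug node/worker01 -- chroot /host crictl images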

Worker Node Capacity

$ oc get pod -o wide
NAME             READY   STATUS    ...  NODE      ...
diskuser-4cfdd   0/1     Pending   ...  <none>    ...
diskuser-ck4df   0/1     Evicted   ...  worker02  ...

$ oc describe node/worker01
...output omitted...
Taints:             node.kubernetes.io/not-ready:NoSchedule
...
Conditions:
  Type             Status  ...   Reason                       ...
  ----             ------  ...   ------                       ...
  DiskPressure     True    ...   KubeletHasDiskPressure       ...
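
Evicted pods are not removed automatically. Once the disk pressure is resolved, one way to clean up the failed pods in the current project (a sketch; add -n <namespace> as needed):

$ oc delete pod --field-selector=status.phase=Failed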

Worker Node Unreachable

3.6.1. Understanding taints and tolerations
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/nodes/index#nodes-scheduler-taints-tolerations-about_nodes-scheduler-taints-tolerations

node.kubernetes.io/unreachable: The node is unreachable from the node controller. This corresponds to the node condition Ready=Unknown.

$ ssh core@worker02 "sudo systemctl is-active kubelet" 

$ ssh core@worker02 "sudo systemctl start kubelet" 
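
If the kubelet is inactive because it was disabled, enabling it also makes it start on the next boot:

$ ssh core@worker02 "sudo systemctl enable --now kubelet"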

OpenShift 4.6 Automation and Integration: Kibana

Filtering Queries

12.3. Kubernetes exported fields
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/logging/index#cluster-logging-exported-fields-kubernetes_cluster-logging-exported-fields

These are the Kubernetes fields exported by the OpenShift Container Platform cluster logging available for searching from Elasticsearch and Kibana.

hostname The hostname of the OpenShift node that generated the message.
kubernetes.flat_labels The labels of the pod that generated the message, in key=value format.
kubernetes.container_name The name of the container in Kubernetes.
kubernetes.namespace_name The name of the namespace in Kubernetes.
kubernetes.pod_name The name of the pod that generated the log message.
level The log level of the message.
message The actual log message.

Example Lucene query:

+kubernetes.namespace_name:"openshift-etcd" +message:elected

Finding OpenShift Event Logs

kubernetes.event
kubernetes.event.involvedObject.name The name of the resource involved in the event.
kubernetes.event.involvedObject.namespace The namespace of the resource involved in the event.
kubernetes.event.reason The reason for the event. Corresponds to the values in the REASON column of the oc get events output.
kubernetes.event.type The type of event, e.g. kubernetes.event.type:warning
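
Example Lucene query for warning events involving a given namespace (the namespace name is just an illustration):

+kubernetes.event.type:warning +kubernetes.event.involvedObject.namespace:"openshift-monitoring"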

Visualizing Time Series with Timelion

Timelion Tutorial – From Zero to Hero
https://www.elastic.co/blog/timelion-tutorial-from-zero-to-hero

Plot counts of HTTP 200, 404, and 500 messages from the logging-query namespace as three series:

.es('+kubernetes.namespace_name:logging-query +message:200'),
.es('+kubernetes.namespace_name:logging-query +message:404'),
.es('+kubernetes.namespace_name:logging-query +message:500')

Error rate: the share of logger container messages that are 500s, expressed as a percentage:

.es('+kubernetes.container_name:logger +message:500')
.divide(.es('+kubernetes.container_name:logger +message:*'))
.multiply(100)

Compare the current 500 count with the same series offset by five minutes:

.es('+kubernetes.container_name:logger +message:500').label(current),
.es(q='+kubernetes.container_name:logger +message:500', offset=-5m).label(previous)

Troubleshooting cluster logging

Chapter 10. Troubleshooting cluster logging
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/logging/index#troubleshooting-cluster-logging

$ oc get -n openshift-logging clusterlogging instance -o yaml

apiVersion: logging.openshift.io/v1
kind: ClusterLogging
....
status:  
...
  logstore:
    elasticsearchStatus:
    - ShardAllocationEnabled:  all
      cluster:
        activePrimaryShards:    5
        activeShards:           5
        initializingShards:     0
        numDataNodes:           1
        numNodes:               1
        pendingTasks:           0
        relocatingShards:       0
        status:                 green
        unassignedShards:       0
      clusterName:             elasticsearch
...
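
The Elasticsearch pods also ship an es_util helper (described in the logging troubleshooting docs) that can query cluster health directly. A sketch, assuming the component=elasticsearch pod label:

$ ES_POD=$(oc get pods -n openshift-logging -l component=elasticsearch \
    -o jsonpath='{.items[0].metadata.name}')
$ oc exec -n openshift-logging -c elasticsearch $ES_POD -- \
    es_util --query=_cluster/health?pretty=true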

Using Grafana

Monitoring -> Dashboards:

Dashboards: Kubernetes / Compute Resources / Node (Pods)
Namespace: openshift-logging

Using Kibana

Infra index
+kubernetes.namespace_name:openshift-logging +kubernetes.container_name:

OpenShift 4.6 Automation and Integration: Cluster Logging

Overview

1.1.8. About cluster logging components
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/logging/index#cluster-logging-about-components_cluster-logging

The major components of cluster logging are:

LogStore

The logStore is the Elasticsearch cluster that

  • Stores the logs into indexes.
  • Provides RBAC access to the logs.
  • Provides data redundancy.

Collection

Implemented with Fluentd. By default, the log collector uses the following sources:

  • journald for all system logs
  • /var/log/containers/*.log for all container logs

The logging collector is deployed as a daemon set that deploys pods to each OpenShift Container Platform node.

Visualization

This is the UI component you can use to view logs, graphs, charts, and so forth. The current implementation is Kibana.

Event Routing

The Event Router is a pod that watches OpenShift Container Platform events so they can be collected by cluster logging. The Event Router collects events from all projects and writes them to STDOUT. Fluentd collects those events and forwards them into the OpenShift Container Platform Elasticsearch instance. Elasticsearch indexes the events to the infra index.

You must manually deploy the Event Router.

Installing cluster logging

Chapter 2. Installing cluster logging https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/logging/index#cluster-logging-deploying

Install the OpenShift Elasticsearch Operator

namespace: openshift-operators-redhat

Install the Cluster Logging Operator

namespace: openshift-logging

Deploying a Cluster Logging Instance

This default cluster logging configuration should support a wide array of environments.

apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    retentionPolicy:
      application:
        maxAge: 1d
      infra:
        maxAge: 7d
      audit:
        maxAge: 7d
    elasticsearch:
      nodeCount: 3
      storage:
        storageClassName: "<storage-class-name>"
        size: 200G
      resources:
        limits:
          memory: "16Gi"
        requests:
          memory: "16Gi"
      proxy:
        resources:
          limits:
            memory: 256Mi
          requests:
            memory: 256Mi
      redundancyPolicy: "SingleRedundancy"
  visualization:
    type: "kibana"
    kibana:
      replicas: 1
  curation:
    type: "curator"
    curator:
      schedule: "30 3 * * *"
  collection:
    logs:
      type: "fluentd"
      fluentd: {}
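
Save the definition to a file and create it (the filename is arbitrary; the namespace is set in the metadata):

$ oc create -f clusterlogging-instance.yaml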

Verify

$ oc get clusterlogging -n openshift-logging instance -o yaml

Install the Event Router

7.1. Deploying and configuring the Event Router
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/logging/index#cluster-logging-eventrouter-deploy_cluster-logging-curator

Creating Kibana Index Patterns

Index Pattern: app-*
Time Filter Field Name: @timestamp

Index Pattern: infra-*
Time Filter Field Name: @timestamp

Index Pattern: audit-*
Time Filter Field Name: @timestamp

OpenShift 4.6 Automation and Integration: Cluster Monitoring and Metrics

Overview Monitoring

1.2. Understanding the monitoring stack
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/monitoring/index#understanding-the-monitoring-stack_monitoring-overview

Alertmanager

5.7. Applying a custom Alertmanager configuration
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/monitoring/index#applying-custom-alertmanager-configuration_managing-alerts

$ oc extract secret/alertmanager-main --to /tmp/ -n openshift-monitoring --confirm

OCP Web Console

Navigate to Administration -> Cluster Settings -> Global Configuration -> Alertmanager -> YAML.

global:
  resolve_timeout: 5m
route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: default
  routes:
  - match:
      alertname: Watchdog
    repeat_interval: 5m
    receiver: watchdog
receivers:
- name: default
- name: watchdog

Sending Alerts to Email

global:
  resolve_timeout: 5m
  smtp_smarthost: "mail.mkk.se:25"
  smtp_from: alerts@ocp4.mkk.se
  smtp_auth_username: mail_username
  smtp_auth_password: mail_password
  smtp_require_tls: false
route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: default
  routes:
    - match:
        alertname: Watchdog
      repeat_interval: 5m
      receiver: watchdog
    - match:
        severity: critical
      receiver: email-notification
receivers:
  - name: default
  - name: watchdog
  - name: email-notification
    email_configs:
      - to: ocp-admins@mkk.se

$ oc set data secret/alertmanager-main -n openshift-monitoring --from-file=/tmp/alertmanager.yaml

$ oc logs -f -n openshift-monitoring alertmanager-main-0 -c alertmanager
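
To confirm the secret now carries the updated configuration, extract it to stdout:

$ oc extract secret/alertmanager-main -n openshift-monitoring --to=-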

Grafana

Grafana includes the following default dashboards:

etcd Information about etcd in the cluster.
Kubernetes / Compute Resources / Cluster High-level view of cluster resources.
Kubernetes / Compute Resources / Namespace (Pods) Resource usage for pods per namespace.
Kubernetes / Compute Resources / Namespace (Workloads) Resource usage per namespace, broken down by workload type, such as deployment, daemonset, and statefulset.
Kubernetes / Compute Resources / Node (Pods) Resource usage per node.
Kubernetes / Compute Resources / Pod Resource usage for individual pods.
Kubernetes / Compute Resources / Workload Resource usage per namespace, workload, and workload type.
Kubernetes / Networking / Cluster Network usage in the cluster.
Prometheus Information about the prometheus-k8s pods running in the openshift-monitoring namespace.
USE Method / Cluster Utilization, Saturation, and Errors (USE) metrics for the cluster.

Persistent Storage

Configuring Prometheus Persistent Storage

2.8.2. Configuring a local persistent volume claim
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/monitoring/index#configuring-a-local-persistent-volume-claim_configuring-the-monitoring-stack

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 15d
      volumeClaimTemplate:
        spec:
          storageClassName: local-storage
          volumeMode: Filesystem
          resources:
            requests:
              storage: 40Gi

Configuring Alert Manager Persistent Storage

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      volumeClaimTemplate:
        spec:
          storageClassName: local-storage
          volumeMode: Filesystem
          resources:
            requests:
              storage: 20Gi
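
Both the prometheusK8s and alertmanagerMain sections belong under the same config.yaml key of the single cluster-monitoring-config ConfigMap, so in practice they are merged into one file before applying (the filename is arbitrary):

$ oc apply -f cluster-monitoring-config.yaml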

$ oc exec -it prometheus-k8s-0 -c prometheus -n openshift-monitoring -- ls -l /prometheus

$ oc exec -it prometheus-k8s-0 -c prometheus -n openshift-monitoring -- df -h /prometheus

OpenShift 4.6 Automation and Integration: Storage

Overview

3.1. Persistent storage overview
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/storage/index#persistent-storage-overview_understanding-persistent-storage

The OpenShift storage architecture has three primary components:

  • Storage Classes
  • Persistent Volumes
  • Persistent Volume Claims

Persistent Volume Claims (pvc)

A project defines a PVC with the following:

  • Storage Size: [G|Gi...]
  • Storage Class:
  • Access Mode: [ReadWriteMany|ReadWriteOnce|ReadOnlyMany]
  • Volume Mode: [Filesystem|Block]

Persistent Volume (pv)

4.11. Persistent storage using NFS
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/storage/index#persistent-storage-using-nfs

Example Persistent Volume

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0001
spec:
  capacity:
    storage: 5Gi
  storageClassName: nfs-storage
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  nfs:
    path: /tmp
    server: 172.17.0.2
  persistentVolumeReclaimPolicy: Retain

This persistent volume uses the NFS volume plug-in. The nfs section defines parameters that the NFS volume plug-in requires to mount the volume on a node. This section includes sensitive NFS configuration information.
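
A claim that could bind to this volume might look as follows (the claim name is illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-claim
spec:
  storageClassName: nfs-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi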

Provisioning and Binding Persistent Volumes

  • Install a storage operator
  • Write and use Ansible Playbooks

Persistent Volume Reclaim Policy

3.2.6. Reclaim policy for persistent volumes
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/storage/index#reclaiming_understanding-persistent-storage

  • Delete: deletes both the PersistentVolume object from OpenShift Container Platform and the associated storage asset in the external infrastructure, such as AWS EBS or VMware vSphere. All dynamically provisioned persistent volumes use the Delete reclaim policy.
  • Retain: allows manual reclamation of the resource for those volume plug-ins that support it (see the patch example below).
  • Recycle: recycles the volume back into the pool of unbound persistent volumes once it is released from its claim.
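
For example, the reclaim policy of an existing volume can be changed with a patch (using the pv0001 volume from the example above):

$ oc patch pv pv0001 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'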

Supported access modes for PVs

Table 3.2. Supported access modes for PVs
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/storage/index#pv-access-modes_understanding-persistent-storage

Available dynamic provisioning plug-ins

7.2. Available dynamic provisioning plug-ins
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/storage/index#available-plug-ins_dynamic-provisioning

Setting a Default Storage Class

7.3.2. Storage class annotations
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/storage/index#storage-class-annotations_dynamic-provisioning

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
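
The annotation can also be set on an existing storage class with a patch (the storage class name is a placeholder):

$ oc patch storageclass <storage-class-name> \
    -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'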

Restricting Access to Storage Resources

5.1.1. Resources managed by quotas
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/applications/index#quotas-resources-managed_quotas-setting-per-project

requests.storage The sum of storage requests across all persistent volume claims in any state cannot exceed this value.
persistentvolumeclaims The total number of persistent volume claims that can exist in the project.
<storage-class-name>.storageclass.storage.k8s.io/requests.storage The sum of storage requests across all persistent volume claims in any state that have a matching storage class, cannot exceed this value.
<storage-class-name>.storageclass.storage.k8s.io/persistentvolumeclaims The total number of persistent volume claims with a matching storage class that can exist in the project.
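
A ResourceQuota using these fields might look like this (the values and the nfs-storage class name are illustrative):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
spec:
  hard:
    requests.storage: 50Gi
    persistentvolumeclaims: "10"
    nfs-storage.storageclass.storage.k8s.io/requests.storage: 20Gi
    nfs-storage.storageclass.storage.k8s.io/persistentvolumeclaims: "5"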

Block Volume

3.5.1. Block volume examples
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/storage/index#block-volume-examples_understanding-persistent-storage

apiVersion: v1
kind: PersistentVolume
metadata:
  name: block-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  volumeMode: Block
  persistentVolumeReclaimPolicy: Retain
  fc:
    targetWWNs: ["50060e801049cfd1"]
    lun: 0
    readOnly: false
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: block-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-block-volume
spec:
  containers:
    - name: fc-container
      image: fedora:26
      command: ["/bin/sh", "-c"]
      args: [ "tail -f /dev/null" ]
      volumeDevices: 
        - name: data
          devicePath: /dev/xvda
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: block-pvc

Persistent storage using iSCSI

4.9. Persistent storage using iSCSI
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/storage/index#persistent-storage-using-iscsi

PersistentVolume object definition

apiVersion: v1
kind: PersistentVolume
metadata:
  name: iscsi-pv
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  storageClassName: iscsi-blk
  accessModes:
    - ReadWriteOnce
  iscsi:
    targetPortal: 10.0.0.1:3260
    iqn: iqn.2016-04.test.com:storage.target00
    lun: 0
    initiatorName: iqn.2016-04.test.com:custom.iqn
    fsType: ext4
    readOnly: false

Persistent storage using local volumes

Installing the Local Storage Operator

4.10.1. Installing the Local Storage Operator
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/storage/index#local-storage-install_persistent-storage-local

$ oc debug node/worker06 -- lsblk
...
vdb    252:16   0   20G  0 disk

$ oc adm new-project openshift-local-storage

$ OC_VERSION=$(oc version -o yaml | grep openshiftVersion | \
    grep -o '[0-9]*[.][0-9]*' | head -1)

Example openshift-local-storage.yaml (applied below):

apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  name: local-operator-group
  namespace: openshift-local-storage
spec:
  targetNamespaces:
    - openshift-local-storage
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: local-storage-operator
  namespace: openshift-local-storage
spec:
  channel: "${OC_VERSION}"
  installPlanApproval: Automatic
  name: local-storage-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace

$ oc apply -f openshift-local-storage.yaml

Verify installation

$ oc -n openshift-local-storage get pods

$ oc get csv -n openshift-local-storage
NAME                                         DISPLAY         VERSION               REPLACES   PHASE
local-storage-operator.4.2.26-202003230335   Local Storage   4.2.26-202003230335              Succeeded

Provisioning local volumes by using the Local Storage Operator

$ export CSV_NAME=$(oc get csv -n openshift-local-storage -o name)

$ oc get ${CSV_NAME} -o jsonpath='{.spec.customresourcedefinitions.owned[*].kind}{"\n"}'
LocalVolume LocalVolumeSet LocalVolumeDiscovery LocalVolumeDiscoveryResult

$ oc get ${CSV_NAME} -o jsonpath='{.metadata.annotations.alm-examples}{"\n"}'
[
  {
    "apiVersion": "local.storage.openshift.io/v1",
    "kind": "LocalVolume",
    "metadata": {
      "name": "example"
    },
    "spec": {
      "storageClassDevices": [
        {
          "devicePaths": [
              "/dev/vde",
              "/dev/vdf"
          ],
          "fsType": "ext4",
          "storageClassName": "foobar",
          "volumeMode": "Filesystem"
        }
      ]
    }
  }
  ...
]

Example LocalVolume using /dev/vdb (as listed by lsblk above):

apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: local-storage
spec:
  storageClassDevices:
  - devicePaths:
    - /dev/vdb
    fsType: ext4
    storageClassName: local-blk
    volumeMode: Filesystem
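
Applying the LocalVolume should result in the local-blk storage class and one PV per matching device (the filename is arbitrary):

$ oc apply -f local-volume.yaml -n openshift-local-storage
$ oc get sc local-blk
$ oc get pv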

OpenShift 4.6 Automation and Integration: Machine Config Pool and Machine Config

Introduction

1.4. About Red Hat Enterprise Linux CoreOS (RHCOS) and Ignition
1.2. About the control plane
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/architecture/index#coreos-and-ignition

Red Hat discourages directly manipulating a RHCOS configuration. Instead, provide initial instance configuration in the form of Ignition files.

After the instance is provisioned, changes to RHCOS are managed by the Machine Config Operator.

7.2.2. Creating a machine set
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/machine_management/index#machineset-creating_creating-infrastructure-machinesets

4.2.7. Customization
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/security_and_compliance/index#customization-2

Example MachineConfig (mc)

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: infra
  name: 50-foo-config
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,LS0t...LQo=
        filesystem: root
        mode: 0644
        path: /etc/foo-config
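
The source field is a data URL, so the file content is base64 encoded before being embedded (the foo-config filename is illustrative):

$ base64 -w0 foo-config    # paste the output after "base64," in the source field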

7.2.4. Creating a machine config pool for infrastructure machines
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/machine_management/index#creating-infra-machines_creating-infrastructure-machinesets

Example MachineConfigPool (mcp)

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""

$ oc get mcp

$ oc get mc --show-labels

$ oc get mc --selector=machineconfiguration.openshift.io/role=infra

Label Nodes

Add a label to worker node

$ oc label node/worker03 node-role.kubernetes.io/infra=

Remove label from worker node

$ oc label node/worker03 node-role.kubernetes.io/infra-

Configuring Pod Scheduling

7.4. Moving resources to infrastructure machine sets
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/machine_management/index#moving-resources-to-infrastructure-machinesets

3.7. Placing pods on specific nodes using node selectors
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/nodes/index#nodes-scheduler-node-selectors

apiVersion: apps/v1
kind: Deployment
metadata:
  name: foo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: foo
  template:
    metadata:
      labels:
        app: foo
    spec:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      containers:
...

4.1.2. Creating daemonsets
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/nodes/index

If oc debug fails on a node, it may be because a defaultNodeSelector is defined for the cluster; in that case, create a project with an empty node selector and run the debug pod from there:

$ oc adm new-project debug --node-selector=""
$ oc debug node/master03 -n debug

Observing Machine Config Pool Updates

https://github.com/openshift/machine-config-operator/blob/master/docs/MachineConfigController.md

The following annotations on the node object are used by the UpdateController to coordinate node updates with the MachineConfigDaemon:

  • machine-config-daemon.v1.openshift.com/currentConfig: defines the current MachineConfig applied by the MachineConfigDaemon.
  • machine-config-daemon.v1.openshift.com/desiredConfig: defines the desired MachineConfig that needs to be applied by the MachineConfigDaemon.
  • machine-config-daemon.v1.openshift.com/state: defines the state of the MachineConfigDaemon; it can be done, working, or degraded.

$ oc describe node/worker03

OpenShift 4.6 Automation and Integration: Adding Working Nodes

Installer-Provisioned Infrastructure

3.2. Scaling a machine set manually
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/machine_management/index#machineset-manually-scaling_manually-scaling-machineset

In an installer-provisioned OCP cluster, the Machine API performs scaling operations automatically: modify the number of replicas in a machine set, and OCP communicates with the provider to provision or deprovision instances.
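
For example, to scale a machine set manually (the machine set name is a placeholder):

$ oc scale machineset <machineset-name> -n openshift-machine-api --replicas=3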

User-Provisioned Infrastructure

Adding compute machines to bare metal

10.4. Adding compute machines to bare metal
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/machine_management/index#adding-bare-metal-compute-user-infra

Here you must create the new machines yourself. You can create new Red Hat Enterprise Linux CoreOS (RHCOS) machines either from an ISO image or by using Preboot eXecution Environment (PXE) boot.

PXE relies on a set of very basic technologies:

  • Dynamic Host Configuration Protocol (DHCP) for locating instances.
  • Trivial File Transfer Protocol (TFTP) for serving the PXE files.
  • HTTP for the ISO images and configuration files.

Example PXE configuration. Note: the APPEND parameters must be on a single line; they are wrapped here for readability.

DEFAULT pxeboot
TIMEOUT 20
PROMPT 0
LABEL pxeboot
  KERNEL http://<HTTP_server>/rhcos-<version>-live-kernel-<architecture>
  APPEND initrd=http://<HTTP_server>/rhcos-<version>-live-initramfs.<architecture>.img 
    coreos.inst.install_dev=/dev/sda 
    coreos.inst.ignition_url=http://<HTTP_server>/worker.ign 
    coreos.live.rootfs_url=http://<HTTP_server>/rhcos-<version>-live-rootfs.<architecture>.img
    coreos.inst=yes
    console=tty0 
    console=ttyS0  
    ip=dhcp rd.neednet=1 

The coreos.inst.ignition_url parameter points to the worker Ignition file (worker.ign in the example above).

5.1.10. Creating the Kubernetes manifest and Ignition config files
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/installing/index#installation-user-infra-generate-k8s-manifest-ignition_installing-bare-metal

The OpenShift Container Platform installation program ($ ./openshift-install create ignition-configs --dir <installation_directory>) generates

  • bootstrap.ign
  • master.ign
  • worker.ign

Example worker.ign

{
  "ignition": {
    "config": {
      "merge": [
        {
          "source": "https://api-int.mkk.example.com:22623/config/worker",
          "verification": {}
        }
      ]
    },
    "security": {
      "tls": {
        "certificateAuthorities": [
          {
            "source": "data:text/plain;charset=utf-8;base64,XXX...XX",
            "verification": {}
          }
        ]
      }
    },
    "version": "3.1.0"
  }
}

certificateAuthorities contains the custom trust store for the internal CA. You can check an HTTPS endpoint's certificate chain with openssl; for the endpoint above:

$ openssl s_client -connect api-int.mkk.example.com:22623 -showcerts

And you can check that it is the same root CA as in worker.ign with

$ echo "XXX...XX" | base64 -d | openssl x509 -text -noout

Red Hat OpenStack Platform HAProxy

Chapter 5. Using HAProxy
https://access.redhat.com/documentation/fr-fr/red_hat_openstack_platform/10/html-single/understanding_red_hat_openstack_platform_high_availability/index#haproxy

On Red Hat OpenStack Platform, you must then add the new nodes to the HAProxy configuration (/etc/haproxy/haproxy.cfg).
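
A minimal sketch of the kind of change involved, assuming ingress backends for ports 80 and 443 already exist in haproxy.cfg (backend names and addresses are made up):

backend ingress-http
  server worker03 192.0.2.13:80 check
backend ingress-https
  server worker03 192.0.2.13:443 check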

Approving the certificate signing requests for your machines

10.4.3. Approving the certificate signing requests for your machines
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html-single/machine_management/index#installation-approve-csrs_adding-bare-metal-compute-user-infra

$ oc get csr -A

$ oc adm certificate approve csr-abc
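
New nodes generate both a client and a serving CSR, so several requests may be pending; a common way to approve them in one pass:

$ oc get csr -o name | xargs oc adm certificate approve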

Verify

You should now see the new worker nodes, although it takes some time for them to reach the Ready state.

$ oc get nodes