Baremetal
Unblock a stuck oc delete bmh
command
Sometimes, when deleting a BMH object with oc delete bmh -n openshift-machine-api <node name>
the delete command is stuck forever.
This happens because ironic is trying to decommision and delete stuff from the node itself, and does not always succeed with that.
To unblock the delete command, simply remove the object finalizer:
oc patch -n openshift-machine-api <node name> -p '{"metadata":{"finalizers":null}}' --type=merge
Reprovisioning a node
Manual way (editing object YAML files)
- for conveniency
oc project openshift-machine-api
- locate the correct secret, it’ll have the same name as the bmh with a ‘-bmc-secret’ postfix.
- save the secret -
oc get secret <bmh-name-bmc-secret> -o yaml > secret.yaml
- save the bmh -
oc get bmh <bmh-name> -o yaml > bmh.yaml
- only then delete the bmh -
oc delete bmh <bmh-name>
- edit the
secert.yaml
file so it includes only the date, type, metadata.name and meteadata.namespace fields - edit the bmh.yaml so it includes only the oc spec, metadata.name and meteadata.namespace fields
Automatic way
- NOTE: Make sure to install
oc neat
if you don’t have it (https://github.com/itaysk/kubectl-neat)
- for conveniency
oc project openshift-machine-api
- locate the correct secret, it’ll have the same name as the bmh with a ‘-bmc-secret’ postfix.
- save the secret -
oc get secret <bmh-name-bmc-secret> -o yaml | oc neat > secret.yaml
- save the bmh -
oc get bmh <bmh-name> -o yaml | oc neat > bmh.yaml
- only then delete the bmh -
oc delete bmh <bmh-name>
- apply -
oc apply -f secert.yaml
and thenoc apply -f bmh.yaml
The node should start reprovisioning and be ready after a while.
Rename a node
Evacuate the node:
oc adm drain NODE --ignore-daemonsets
Delete the node
oc delete node NODE
Make the DNS / hostname change if hostnames are not DNS names, you can use the following command on the node itself:
hostnamectl set-hostname NEW-NAME
Delete old certificates (which are valid only for the old name) on the node:
sudo rm /var/lib/kubelet/pki/*
Reboot the server
sudo reboot
Approve csr either use the procedure here, or look for the pending bootstrapper csr:
$ oc get csr
NAME AGE REQUESTOR CONDITION
csr-6f9w7 33m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Approved,Issued
csr-8b9nm 40m system:node:master-1.nnchange.lab.example.com Approved,Issued
csr-c6w6n 40m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Approved,Issued
csr-g5fpm 31m system:node:worker-1.nnchange.lab.example.com Approved,Issued
csr-hsmlj 33m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Approved,Issued
csr-j4pct 31m system:node:worker-0.nnchange.lab.example.com Approved,Issued
csr-jkkh7 39m system:node:master-2.nnchange.lab.example.com Approved,Issued
csr-jnc5l 39m system:node:master-0.nnchange.lab.example.com Approved,Issued
csr-nlpmv 2m27s system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
csr-pfmcl 40m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Approved,Issued
csr-r2d62 40m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Approved,Issued
$ oc adm certificate approve csr-XXX
Then accept the CSR for the node service account:
$ oc get csr
NAME AGE REQUESTOR CONDITION
csr-6f9w7 35m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Approved,Issued
csr-7sw7b 14s system:node:worker-a.nnchange.lab.example.com Pending
csr-8b9nm 41m system:node:master-1.nnchange.lab.example.com Approved,Issued
csr-c6w6n 41m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Approved,Issued
csr-g5fpm 33m system:node:worker-1.nnchange.lab.example.com Approved,Issued
csr-hsmlj 34m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Approved,Issued
csr-j4pct 33m system:node:worker-0.nnchange.lab.example.com Approved,Issued
csr-jkkh7 41m system:node:master-2.nnchange.lab.example.com Approved,Issued
csr-jnc5l 41m system:node:master-0.nnchange.lab.example.com Approved,Issued
csr-nlpmv 3m51s system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Approved,Issued
csr-pfmcl 41m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Approved,Issued
csr-r2d62 41m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Approved,Issued
$ oc adm certificate approve csr-XXX
And you should now be able to see the node with the new name:
oc get nodes