Etcd service status: check for certs error

The service manager uses the script /etc/init.d/etcd to manage the
etcd service. The call '/etc/init.d/etcd status' uses the etcdctl
health API to determine whether the service is running. If the etcd
certs are replaced with new ones but the service has not yet been
restarted to pick them up, the status call fails even though the
service is running fine, and the service manager treats the service
as failed. 'sm-audit' (which runs periodically) uses the
'/etc/init.d/etcd status' call to determine and maintain the service
health. A service manager that receives a false service status may
introduce a lot of bugs.

One such scenario is that 'sm' ignores the 'service restart' call
if it thinks the service is disabled. As a result, etcd is not
restarted with the new certs during upgrade activate and is not
reachable by the kube-apiserver (which may have started using the
new client certs).

This change modifies the '/etc/init.d/etcd status' call so that it
no longer relies only on the etcd health API to determine whether
the etcd service is running. If the health API fails with the
'bad certificate' error, the call checks for the existence of etcd
runtime information (the PID file and the matching /proc entry).
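
The decision described above can be sketched as a standalone bash
function. This is only an illustration, not the committed code: the
function name classify_etcd_status, its two arguments and the state
names it prints are hypothetical, and the final "not-running" branch
is an assumption about what the script does when neither pattern
matches.

```shell
#!/bin/bash
# Hypothetical helper, not part of /etc/init.d/etcd.
# $1: combined stdout/stderr of the etcdctl health check
# $2: path to the etcd PID file
# Prints a short state name; returns 0 if the service counts as running.
classify_etcd_status()
{
    local health_output="$1"
    local pidfile="$2"
    local pid

    if [[ $health_output =~ "bad certificate" ]]; then
        # The health check cannot be trusted here, so fall back to the
        # runtime information: a PID file that points at a live process.
        if [ -r "$pidfile" ]; then
            pid="$(cat "$pidfile")"
            if [ -n "$pid" ] && [ -d "/proc/$pid" ]; then
                echo "running-bad-certs"
                return 0
            fi
        fi
        echo "not-running-bad-certs"
        return 1
    elif [[ $health_output =~ "is healthy" ]]; then
        echo "running"
        return 0
    fi
    # Assumption: any other health output is treated as not running.
    echo "not-running"
    return 1
}
```

For example, calling the function with the text of a failed TLS
handshake and a PID file that holds the PID of a live process reports
the service as running, which is what keeps 'sm' from marking etcd
as failed between the cert replacement and the restart.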

Test Plan:
PASS: Replace old certs with new certs at /etc/etcd/ and do not
      restart the service. Check that the '/etc/init.d/etcd status'
      is 'running'.
PASS: Replace old certs with new certs at /etc/etcd/ and restart
      the service. Check that the '/etc/init.d/etcd status' is
      'running'.

Closes-Bug: 2033942

Change-Id: Id30a262ca1bde6d8acb85de10882ca9bd4b59bdd
Signed-off-by: kaustubh.dhokte <kaustubh.dhokte@windriver.com>
kaustubh.dhokte 2023-09-02 01:48:14 +00:00
parent 8722928985
commit 3ffe8b7e1e

@@ -44,12 +44,30 @@ ETCD_LISTEN_CLIENT_URL="${URLS[-1]}"
 status()
 {
     if [[ $ETCD_LISTEN_CLIENT_URL =~ "https" ]]; then
-        etcd_health="$(etcdctl --timeout 5s --ca-file /etc/etcd/ca.crt -cert-file /etc/etcd/etcd-server.crt --key-file /etc/etcd/etcd-server.key --endpoints="$ETCD_LISTEN_CLIENT_URL" cluster-health 2>&1 | head -n 1)"
+        etcd_health="$(etcdctl --timeout 5s --ca-file /etc/etcd/ca.crt -cert-file /etc/etcd/etcd-server.crt --key-file /etc/etcd/etcd-server.key --endpoints="$ETCD_LISTEN_CLIENT_URL" cluster-health 2>&1)"
     else
         etcd_health="$(etcdctl --timeout 5s --endpoints="$ETCD_LISTEN_CLIENT_URL" cluster-health 2>&1 | head -n 1)"
     fi
-    if [[ $etcd_health =~ "is healthy" ]]; then
+    # LP: 2033942. In case if the status method is called in between
+    # certs are replaced and etcd service is restarted, etcd health call
+    # will result negative even though service is running fine.
+    # In this case we rely on PID file for the status of the service.
+    if [[ $etcd_health =~ "bad certificate" ]]; then
+        if [ -e $PIDFILE ]; then
+            PIDDIR=/proc/$(cat $PIDFILE)
+            if [ -d $PIDDIR ]; then
+                RETVAL=0
+                echo "$DESC is running but invalid certificates detected."
+                return
+            fi
+            echo "$DESC is Not running. Also, invalid certificates detected."
+            RETVAL=1
+        else
+            echo "$DESC is Not running. Also, invalid certificates detected."
+            RETVAL=1
+        fi
+    elif [[ $etcd_health =~ "is healthy" ]]; then
         RETVAL=0
         echo "$DESC is running"
         return