.. upe1593016272562
.. _software-upgrades:

=================
Software Upgrades
=================

|prod-long| upgrades enable you to move |prod| software from one release to the
next.

.. contents:: |minitoc|
   :local:
   :depth: 1

|prod| software upgrade is a multi-step rolling-upgrade process, where |prod|
hosts are upgraded one at a time while continuing to provide hosting services
to their hosted applications. An upgrade can be performed manually or using
Upgrade Orchestration, which automates much of the upgrade procedure, leaving
only a few manual steps and so reducing the chance of operator oversight. For
more information on manual
upgrades, see :ref:`Manual Platform Components Upgrade
<manual-upgrade-overview>`. For more information on upgrade orchestration, see
:ref:`Orchestrated Platform Component Upgrade
<orchestration-upgrade-overview>`.

.. warning::
    Do NOT use information in the |updates-doc| guide for |prod-dc|
    orchestrated software upgrades. If information in this document is used for
    a |prod-dc| orchestrated upgrade, the upgrade will fail, resulting
    in an outage. The |prod-dc| Upgrade Orchestrator automates a
    recursive rolling upgrade of all subclouds and all hosts within the
    subclouds.

.. xbooklink    For more information on the |prod-dc| Upgrade Orchestrator, see,
    |distcloud-doc|: :ref:`Upgrade Orchestration for Distributed Cloud
    Subclouds Using CLI
    <upgrade-orchestration-for-distributed-cloud-subclouds-using-the-cli>`.

Before starting the upgrade process:

.. _software-upgrades-ul-ant-vgq-gmb:

-   The system must be 'patch current'.

-   There must be no management-affecting alarms present on the system.

-   Ensure that any certificates managed by cert-manager will not be renewed
    during the upgrade process.

-   The new software load must be imported.

-   A valid license file for the new software release must be installed.
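
The checklist above can be read as a pre-flight gate: the upgrade should not
start while any item is unmet. The following is a minimal sketch in Python, not
|prod| tooling. The ``SystemState`` fields and the ``upgrade_blockers``
function are hypothetical names chosen for illustration, and a real check
would collect these facts from the system itself.

.. code-block:: python

    from dataclasses import dataclass
    from typing import List


    @dataclass
    class SystemState:
        """Hypothetical snapshot of the facts the checklist asks about."""
        patch_current: bool
        management_affecting_alarms: int
        cert_renewal_due_during_upgrade: bool
        new_load_imported: bool
        new_release_license_installed: bool


    def upgrade_blockers(state: SystemState) -> List[str]:
        """Return one message per unmet prerequisite (empty list = ready)."""
        blockers = []
        if not state.patch_current:
            blockers.append("system is not patch current")
        if state.management_affecting_alarms > 0:
            blockers.append("management-affecting alarms are present")
        if state.cert_renewal_due_during_upgrade:
            blockers.append("a managed certificate would renew during the upgrade")
        if not state.new_load_imported:
            blockers.append("new software load is not imported")
        if not state.new_release_license_installed:
            blockers.append("no valid license for the new release")
        return blockers

An empty result means every item in the checklist is satisfied; any other
result lists what must be fixed first.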

The upgrade process starts by upgrading the controllers. The standby controller
is upgraded first: it is loaded with the new release of software, and all the
controller services' databases are migrated for the new release. Activity is
then switched to the upgraded controller, which runs in a 'compatibility' mode
where all inter-node messages use message formats from the old release of
software. Prior to upgrading the second controller, you reach a
"point-of-no-return for an in-service abort" of the upgrade process. The second
controller is then loaded with the new release of software and becomes the new
standby controller. For more information on manual upgrades, see
:ref:`Manual Platform Components Upgrade <manual-upgrade-overview>`.
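
To make the ordering of the controller phase explicit, the sketch below lists
the steps in the sequence described above, with a marker where the point of no
return falls. It is an illustrative model only: ``controller_phase`` and
``POINT_OF_NO_RETURN`` are hypothetical names, and the step strings are
descriptions, not commands.

.. code-block:: python

    from typing import List

    # Marker placed between the steps that precede and follow the
    # point of no return for an in-service abort.
    POINT_OF_NO_RETURN = "-- point of no return for an in-service abort --"


    def controller_phase(active: str, standby: str) -> List[str]:
        """Return the controller upgrade steps in the order the text gives."""
        return [
            f"load {standby} with the new release",
            f"migrate controller service databases on {standby}",
            f"switch activity from {active} to {standby} (compatibility mode)",
            POINT_OF_NO_RETURN,
            f"load {active} with the new release; it becomes the new standby",
        ]

For example, ``controller_phase("controller-0", "controller-1")`` places the
marker after the activity switch and before the second controller is loaded.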

If present, storage nodes are locked, upgraded and unlocked one at a time in
order to respect the redundancy model of |prod| storage nodes. Storage nodes
can be upgraded in parallel if using upgrade orchestration.

Worker nodes are then upgraded. A worker node is tainted when it is locked, so
that Kubernetes shuts down any pods on that node and restarts them on another
worker node. During the upgrade, the worker node network boots and installs the
new software from the active controller. After the worker node is unlocked, its
worker services run in a 'compatibility' mode where all inter-node messages use
message formats from the old release of software. Note that worker nodes can be
upgraded in parallel only if using upgrade orchestration.
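
The difference between manual and orchestrated upgrades of storage and worker
nodes comes down to how many hosts are handled at once. The sketch below
models that as batching. ``upgrade_batches`` and its ``max_parallel``
parameter are hypothetical and are not settings taken from |prod|; the actual
degree of parallelism is decided by upgrade orchestration.

.. code-block:: python

    from typing import List


    def upgrade_batches(hosts: List[str],
                        orchestrated: bool = False,
                        max_parallel: int = 2) -> List[List[str]]:
        """Group hosts into the batches that would be upgraded together.

        Without orchestration each batch holds exactly one host. With
        orchestration a batch may hold up to ``max_parallel`` hosts
        (an assumed, illustrative limit).
        """
        if max_parallel < 1:
            raise ValueError("max_parallel must be at least 1")
        size = max_parallel if orchestrated else 1
        return [hosts[i:i + size] for i in range(0, len(hosts), size)]

With three workers, a manual upgrade yields three single-host batches, while an
orchestrated upgrade with ``max_parallel=2`` yields one batch of two hosts
followed by one batch of one.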

The final step of the upgrade process is to activate and complete the upgrade.
This involves disabling 'compatibility' modes on all hosts and clearing the
Upgrade Alarm.

.. only:: partner

    .. include:: /_includes/software-upgrades.rest
       :start-after: software-upgrade-begin
       :end-before: software-upgrade-end

.. _software-upgrades-section-N1002F-N1001F-N10001:

----------------------------------
Rolling Back / Aborting an Upgrade
----------------------------------

In general, any issues encountered during an upgrade should be addressed during
the upgrade with the intention of completing the upgrade after the issues are
resolved. Issues specific to a storage or worker host can be addressed by
temporarily downgrading the host, addressing the issues and then upgrading the
host again, or in some cases by replacing the node.

In extremely rare cases, it may be necessary to abort an upgrade. This is a last
resort and should only be done if there is no other way to address the issue
within the context of the upgrade. There are two scenarios for doing such an
abort:

.. _software-upgrades-ul-dqp-brt-cx:

-   Before controller-0 has been upgraded (that is, only controller-1 has been
    upgraded): In this case the upgrade can be aborted and the system will
    remain in service during the abort. See :ref:`Rolling Back a Software
    Upgrade Before the Second Controller Upgrade
    <rolling-back-a-software-upgrade-before-the-second-controller-upgrade>`.

-   After controller-0 has been upgraded (that is, both controllers have been
    upgraded): In this case the upgrade can only be aborted with a complete
    outage and a reinstall of all hosts. This should be done only as a last
    resort, if there is absolutely no other way to recover the system. See
    :ref:`Rolling Back a Software Upgrade After the Second Controller Upgrade
    <rolling-back-a-software-upgrade-after-the-second-controller-upgrade>`.
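
The two abort scenarios reduce to a single question: has controller-0 been
upgraded yet? The sketch below captures that decision. ``AbortImpact`` and
``abort_impact`` are hypothetical names used for illustration, and the sketch
assumes, as in the scenarios above, that controller-1 is upgraded first.

.. code-block:: python

    from enum import Enum


    class AbortImpact(Enum):
        """Consequence of aborting, as described in the two scenarios."""
        IN_SERVICE = "system remains in service during the abort"
        FULL_OUTAGE = "complete outage and reinstall of all hosts"


    def abort_impact(controller_0_upgraded: bool) -> AbortImpact:
        """Map the upgrade progress to the impact of an abort."""
        if controller_0_upgraded:
            # Both controllers run the new release: past the point of no return.
            return AbortImpact.FULL_OUTAGE
        # Only controller-1 has been upgraded: in-service abort is possible.
        return AbortImpact.IN_SERVICE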