docs/doc/source/updates/kubernetes/manual-host-software-deployment-ee17ec6f71a4.rst
2024-12-11 19:38:21 +00:00


.. WARNING: Add no lines of text between the label immediately following
.. and the title.
.. _manual-host-software-deployment-ee17ec6f71a4:
===============================
Manual Host Software Deployment
===============================
|prod| software management enables you to upversion the |prod| software to a
new patch release or a new major release using a manual procedure of
step-by-step, host-by-host commands.
.. note::
This section applies to users who do not use the dcmanager software
deploy orchestration strategy to manage upgrades across subclouds.
.. rubric:: |context|
This procedure describes upversioning to either a new patch release (in-service
or reboot-required) or a new major release.
.. note::
Upversioning can also be performed to a new major release that has been
pre-patched with the current patches for the major release. This is
packaged and delivered as a pre-patched major release ISO. All the comments in
this section that are specific to a major release also apply to a
pre-patched major release.
This procedure covers all standalone configurations: |AIO-SX|, |AIO-DX| and
standard configuration.
.. note::
For a major release software deployment or patched release deployment, the
following procedure can be aborted and rolled back at any time between the
:command:`software deploy start` and :command:`software deploy delete`
commands. For the rollback procedure, see
:ref:`manual-rollback-host-software-deployment-9295ce1e6e29`.
.. rubric:: |prereq|
- A recent full backup is available. A backup is not explicitly required for
software deployment; however, it is good practice to have a recent full backup
available before making major changes to the system.
- It is highly recommended that the system have no active alarms; otherwise,
upgrades will not proceed.
- If you are using a private container image registry for installs/updates/upgrades:
- The private registry must be populated with any new container images
required for the new software release.
- The list of container images required by the new software release can be
found in your software distribution site.
- For a new major release software deployment only:
- All hosts are unlocked and enabled/available.
- The system should be patch current, that is, all the available patch
releases for the current major release should be deployed.
- The platform issuer (``system-local-ca``) must be configured with an RSA
certificate/private key. If ``system-local-ca`` was configured with a
different type of certificate/private key, use the
:ref:`migrate-platform-certificates-to-use-cert-manager-c0b1727e4e5d` procedure
to reconfigure it with an RSA certificate/private key.
.. rubric:: |proc|
#. For a duplex (dual controller) system, switch the activity from
controller-1 such that controller-0 becomes active.
.. note::
This step is not required for an |AIO-SX| system.
.. code-block::
~(keystone_admin)]$ system host-swact controller-1
Wait for the activity to switch to controller-0. This may take up to a
minute depending on the hardware.
Reconnect to the system.
#. Transfer the new software release files to the active controller-0.
- For a major release, this includes the major release install ISO, the
software signature file, and the license file.
- For a patch release, this includes the patch release ``.patch`` archive file.
#. For major release only, install the license file for the release you are upgrading to.
.. note::
This step is not required for a patch release deployment.
.. code-block::
~(keystone_admin)]$ system license-install <new-release-license-file>
#. Upload the new software release into the system.
#. For a major release
.. code-block::
~(keystone_admin)]$ software upload [ --local ] <new-release>.iso <new-release>.sig
<new-release-id> is now uploaded
+-------------------------------+-------------------+
| Uploaded File | Release |
+-------------------------------+-------------------+
| <new-release>.iso | <new-release-id> |
+-------------------------------+-------------------+
where ``--local`` can be used when running this command in an |SSH|
session on the active controller to optimize performance. With this
option, the system reads the files directly from the local disk rather
than transferring them over the REST APIs that back the |CLI|. When using
the ``--local`` option, the ``<new-release>.iso`` and ``<new-release>.sig``
arguments must be full path names.
This command may take 5-10 minutes depending on the hardware.
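
Because ``--local`` requires full path names, the upload can be wrapped so
that relative arguments are resolved first. The following is a minimal sketch,
not a product feature: the function name ``upload_local`` is hypothetical, and
it assumes that ``readlink -f`` (GNU coreutils) is available on the controller.

.. code-block:: shell

    # Hypothetical helper: resolve both files to absolute paths, then run the
    # upload with --local. $1 = ISO file, $2 = signature file.
    upload_local() {
        iso=$(readlink -f "$1") || return 1
        sig=$(readlink -f "$2") || return 1
        software upload --local "$iso" "$sig"
    }

With this helper, the files can be named relative to the current directory and
the command still receives full path names.
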
#. For a patch release
.. code-block::
~(keystone_admin)]$ software upload <new-release>.patch
<new-release-id> is now uploaded
+-------------------------------+-------------------+
| Uploaded File | Release |
+-------------------------------+-------------------+
| <new-release>.patch | <new-release-id> |
+-------------------------------+-------------------+
#. Ensure that the new software release was successfully uploaded.
.. code-block::
~(keystone_admin)]$ software list
+--------------------------+-------+-----------+
| Release | RR | State |
+--------------------------+-------+-----------+
| starlingx-10.0.0 | True | deployed |
| <new-release-id> | True | available |
+--------------------------+-------+-----------+
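
The check above can also be scripted by reading the ``State`` column from the
table. This is a sketch only: ``release_state`` is a hypothetical helper, and
it assumes the three-column table layout shown in this step.

.. code-block:: shell

    # Hypothetical helper: read "software list" output on stdin and print the
    # State column for the release named in $1 (prints nothing if not found).
    release_state() {
        awk -F'|' -v rel="$1" '
            NF >= 4 {
                name = $2; state = $4
                gsub(/^[ \t]+|[ \t]+$/, "", name)
                gsub(/^[ \t]+|[ \t]+$/, "", state)
                if (name == rel) print state
            }'
    }
    # Usage: software list | release_state <new-release-id>

An uploaded release that is ready to deploy is expected to print
``available``.
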
#. Run software deployment prechecks and confirm that the system is healthy.
.. code-block::
~(keystone_admin)]$ software deploy precheck [ -f ] <new-release-id>
System Health:
All hosts are provisioned: [OK]
All hosts are unlocked/enabled: [OK]
All hosts have current configurations: [OK]
All hosts are patch current: [OK]
Ceph Storage Healthy: [OK]
No alarms: [OK]
All kubernetes nodes are ready: [OK]
All kubernetes control plane pods are ready: [OK]
All PodSecurityPolicies are removed: [OK]
All kubernetes applications are in a valid state: [OK]
Installed license is valid: [OK]
Required patches are applied: [OK]
Resolve any checks that are not OK and re-run the :command:`software deploy precheck` command.
Use the ``-f`` option to ignore non-management-affecting alarms.
.. note::
The failed prechecks must be cleared before software deployment is
allowed to proceed.
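
To pick out the checks that need attention, the precheck output can be
filtered. The sketch below uses a hypothetical helper, ``failed_prechecks``.
It assumes that every check line ends with a bracketed status and that any
status other than ``[OK]`` is a failure; the exact failure text is not shown
in this procedure.

.. code-block:: shell

    # Hypothetical helper: read precheck output on stdin and print every check
    # line whose trailing bracketed status is not [OK].
    failed_prechecks() {
        grep -E '\[[^]]+\][[:space:]]*$' | grep -v -E '\[OK\][[:space:]]*$'
    }
    # Usage: software deploy precheck <new-release-id> | failed_prechecks

No output means that every check line reported ``[OK]``.
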
#. Start the software deployment procedure.
.. note::
The :command:`software deploy start` command automatically runs
the prechecks of the previous step if the prechecks have not been run or
have not passed.
By default, the software deployment procedure cannot be started unless
prechecks pass.
.. note::
The failed prechecks must be cleared before software deployment is
allowed to proceed.
.. note::
Configuration cannot be changed during the software deployment process.
.. code-block::
~(keystone_admin)]$ software deploy start [ -f ] <new-release-id>
Deployment for <new-release-id> started
Then, monitor the progress of :command:`software deploy start` using the
following commands:
.. code-block::
~(keystone_admin)]$ software deploy show
+--------------+------------------+------+---------------+
| From Release | To Release | RR | State |
+--------------+------------------+------+---------------+
| 10.0.0 | <new-release-id> | True | deploy-start |
+--------------+------------------+------+---------------+
~(keystone_admin)]$ software deploy show
+--------------+------------------+------+-------------------+
| From Release | To Release | RR | State |
+--------------+------------------+------+-------------------+
| 10.0.0 | <new-release-id> | True | deploy-start-done |
+--------------+------------------+------+-------------------+
The :command:`software deploy start` command may take 5-10 minutes to reach
the ``deploy-start-done`` state depending on hardware.
.. note::
If :command:`software deploy start` fails, that is, if the state is
``deploy-start-failed``, review ``/var/log/software.log`` on the active
controller for failure details, address the issues, run the
:command:`software deploy delete` command to delete the deployment, and then
re-execute the :command:`software deploy start` command.
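
Instead of re-running :command:`software deploy show` by hand, the state can
be polled. The following is a minimal sketch under stated assumptions: the
helper names and the ``POLL_INTERVAL`` variable are hypothetical, the table
layout is the one shown above, and any state ending in ``-failed`` is treated
as a failure.

.. code-block:: shell

    # Hypothetical helper: print the State column of the data row from
    # "software deploy show" (prints nothing if no deployment is listed).
    deploy_state() {
        software deploy show | awk -F'|' '
            NF >= 5 && $2 !~ /From Release/ {
                gsub(/[ \t]/, "", $5); print $5; exit
            }'
    }

    # Poll until the state equals $1. Return 1 if the state ends in
    # "-failed" or if no deployment is in progress.
    wait_for_deploy_state() {
        while :; do
            state=$(deploy_state)
            case "$state" in
                "$1") return 0 ;;
                "") echo "no deployment in progress" >&2; return 1 ;;
                *-failed) echo "deployment state: $state" >&2; return 1 ;;
            esac
            sleep "${POLL_INTERVAL:-30}"
        done
    }
    # Usage: wait_for_deploy_state deploy-start-done

The sketch has no timeout, so an interactive session should still be watched
and interrupted if the state does not change within the expected time.
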
#. Deploy the new software release to all hosts.
- For an |AIO-SX| system
#. Deploy the new software release to controller-0.
#. Only if the software deployment is ``RR=True`` (reboot required),
lock controller-0.
.. code-block::
~(keystone_admin)]$ system host-lock controller-0
#. Deploy the new software release to controller-0.
.. code-block::
~(keystone_admin)]$ software deploy host controller-0
Host installation request sent to controller-0.
Host installation was successful on controller-0.
After this command completes:
- If ``RR=True``, the host is still running the old software
release; however, the boot parameters have been updated so that the host
boots into the new software release on its next reboot, which occurs
in the next step when the host is unlocked.
- If ``RR=False``, the host is running the new software release.
#. Only if the software deployment is ``RR=True``, unlock controller-0.
.. code-block::
~(keystone_admin)]$ system host-unlock controller-0
The host will now reboot into the new software release. Wait for the
host to finish rebooting and become enabled. This may take 3-5 minutes
depending on hardware.
#. Proceed to step :ref:`Activate the software deployment
<manual-host-software-deployment-ee17ec6f71a4-step>` (software deploy
activate) of the main procedure.
- For an |AIO-DX| system or standard system
#. Deploy the new software release to controller-1 (standby controller).
#. Only if the software deployment is ``RR=True``, lock controller-1.
.. code-block::
~(keystone_admin)]$ system host-lock controller-1
#. Deploy the new software release to controller-1.
.. code-block::
~(keystone_admin)]$ software deploy host controller-1
Host installation request sent to controller-1.
Host installation was successful on controller-1.
After this command completes:
- If ``RR=True``, the host is still running the old software
release; however, the boot parameters have been updated so that the host
boots into the new software release on its next reboot, which occurs
in the next step when the host is unlocked.
- If ``RR=False``, the host is running the new software release.
#. Only if the software deployment is ``RR=True``, unlock controller-1.
.. code-block::
~(keystone_admin)]$ system host-unlock controller-1
The host will now reboot into the new software release. Wait for the
host to finish rebooting and become enabled.
This may take 3-5 minutes depending on hardware.
#. Display the state of the software deployment.
.. code-block::
~(keystone_admin)]$ software deploy show
+--------------+------------------+------+-------------+
| From Release | To Release | RR | State |
+--------------+------------------+------+-------------+
| 10.0.0 | <new-release-id> | True | deploy-host |
+--------------+------------------+------+-------------+
.. code-block::
~(keystone_admin)]$ software deploy host-list
+--------------+--------------+-------------------+-------+----------------------+
| Host | From Release | To Release | RR | State |
+--------------+--------------+-------------------+-------+----------------------+
| controller-0 | 10.0.0 | <new-release-id> | True | deploy-host-pending |
| controller-1 | 10.0.0 | <new-release-id> | True | deploy-host-deployed |
| storage-0 | 10.0.0 | <new-release-id> | True | deploy-host-pending |
| storage-1 | 10.0.0 | <new-release-id> | True | deploy-host-pending |
| worker-0 | 10.0.0 | <new-release-id> | True | deploy-host-pending |
| worker-1 | 10.0.0 | <new-release-id> | True | deploy-host-pending |
+--------------+--------------+-------------------+-------+----------------------+
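
On systems with many hosts, the hosts that still need the new release can be
listed by filtering the table. This is a sketch only: ``pending_hosts`` is a
hypothetical helper, and it assumes the five-column layout shown above.

.. code-block:: shell

    # Hypothetical helper: read "software deploy host-list" output on stdin
    # and print the name of every host still in deploy-host-pending.
    pending_hosts() {
        awk -F'|' '$6 ~ /deploy-host-pending/ { gsub(/[ \t]/, "", $2); print $2 }'
    }
    # Usage: software deploy host-list | pending_hosts
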
#. Switch the activity from controller-0 such that controller-1 becomes active.
.. code-block::
~(keystone_admin)]$ system host-swact controller-0
Wait for the activity to switch to controller-1.
This may take up to a minute depending on hardware.
Reconnect to the system.
#. Deploy the new software release to controller-0 (now the standby controller).
#. Only if the software deployment is ``RR=True``, lock controller-0.
.. code-block::
~(keystone_admin)]$ system host-lock controller-0
#. Deploy the new software release to controller-0.
.. code-block::
~(keystone_admin)]$ software deploy host controller-0
Host installation request sent to controller-0.
Host installation was successful on controller-0.
After this command completes:
- If ``RR=True``, the host is still running the old software
release; however, the boot parameters have been updated so that the host
boots into the new software release on its next reboot, which occurs
in the next step when the host is unlocked.
- If ``RR=False``, the host is running the new software release.
#. Only if the software deployment is ``RR=True``, unlock controller-0.
.. code-block::
~(keystone_admin)]$ system host-unlock controller-0
The host will now reboot into the new software release. Wait for the
host to finish rebooting and become enabled.
This may take 3-5 minutes depending on hardware.
#. Display the state of the software deployment.
.. code-block::
~(keystone_admin)]$ software deploy show
+--------------+------------------+------+-------------+
| From Release | To Release | RR | State |
+--------------+------------------+------+-------------+
| 10.0.0 | <new-release-id> | True | deploy-host |
+--------------+------------------+------+-------------+
.. code-block::
~(keystone_admin)]$ software deploy host-list
+--------------+--------------+-------------------+-------+----------------------+
| Host | From Release | To Release | RR | State |
+--------------+--------------+-------------------+-------+----------------------+
| controller-0 | 10.0.0 | <new-release-id> | True | deploy-host-deployed |
| controller-1 | 10.0.0 | <new-release-id> | True | deploy-host-deployed |
| storage-0 | 10.0.0 | <new-release-id> | True | deploy-host-pending |
| storage-1 | 10.0.0 | <new-release-id> | True | deploy-host-pending |
| worker-0 | 10.0.0 | <new-release-id> | True | deploy-host-pending |
| worker-1 | 10.0.0 | <new-release-id> | True | deploy-host-pending |
+--------------+--------------+-------------------+-------+----------------------+
#. Check the system health to ensure that there are no unexpected alarms.
.. code-block::
~(keystone_admin)]$ fm alarm-list
Clear all the alarms unrelated to the upgrade process.
#. If storage hosts are present, deploy the new software release to the
storage hosts one at a time.
#. Deploy the new software release to storage-0.
#. Only if the software deployment is ``RR=True``, lock storage-0.
.. code-block::
~(keystone_admin)]$ system host-lock storage-0
#. Deploy the new software release to storage-0.
.. code-block::
~(keystone_admin)]$ software deploy host storage-0
Host installation request sent to storage-0.
Host installation was successful on storage-0.
After this command completes:
- If ``RR=True``, the host is still running the old software
release; however, the boot parameters have been updated so that the host
boots into the new software release on its next reboot, which occurs
in the next step when the host is unlocked.
- If ``RR=False``, the host is running the new software release.
#. Only if the software deployment is ``RR=True``, unlock storage-0.
.. code-block::
~(keystone_admin)]$ system host-unlock storage-0
The host will now reboot into the new software release. Wait for
the host to finish rebooting and become enabled. Wait for all the
alarms to clear after the unlock before proceeding to the next
storage host.
This may take 3-5 minutes depending on hardware.
#. Display the state of the software deployment.
.. code-block::
~(keystone_admin)]$ software deploy show
+--------------+------------------+------+--------------+
| From Release | To Release | RR | State |
+--------------+------------------+------+--------------+
| 10.0.0 | <new-release-id> | True | deploy-host |
+--------------+------------------+------+--------------+
.. code-block::
~(keystone_admin)]$ software deploy host-list
+--------------+--------------+-------------------+-------+----------------------+
| Host | From Release | To Release | RR | State |
+--------------+--------------+-------------------+-------+----------------------+
| controller-0 | 10.0.0 | <new-release-id> | True | deploy-host-deployed |
| controller-1 | 10.0.0 | <new-release-id> | True | deploy-host-deployed |
| storage-0 | 10.0.0 | <new-release-id> | True | deploy-host-deployed |
| storage-1 | 10.0.0 | <new-release-id> | True | deploy-host-pending |
| worker-0 | 10.0.0 | <new-release-id> | True | deploy-host-pending |
| worker-1 | 10.0.0 | <new-release-id> | True | deploy-host-pending |
+--------------+--------------+-------------------+-------+----------------------+
#. Repeat the above steps for each storage host.
.. note::
After upgrading the first storage host, you can expect
alarm 800.003. The alarm is cleared after all the storage hosts are
upgraded.
#. If worker hosts are present, deploy the new software release to worker
hosts one at a time.
#. Deploy the new software release to worker-0.
#. Only if the software deployment is ``RR=True``, lock worker-0.
.. code-block::
~(keystone_admin)]$ system host-lock worker-0
#. Deploy the new software release to worker-0.
.. code-block::
~(keystone_admin)]$ software deploy host worker-0
Host installation request sent to worker-0.
Host installation was successful on worker-0.
After this command completes:
- If ``RR=True``, the host is still running the old software
release; however, the boot parameters have been updated so that the host
boots into the new software release on its next reboot, which occurs
in the next step when the host is unlocked.
- If ``RR=False``, the host is running the new software release.
#. Only if the software deployment is ``RR=True``, unlock worker-0.
.. code-block::
~(keystone_admin)]$ system host-unlock worker-0
The host will now reboot into the new software release. Wait for
the host to finish rebooting and become enabled. Wait for all the
alarms to clear after the unlock before proceeding to the next
worker host.
This may take 3-5 minutes depending on hardware.
#. Display the state of the software deployment.
.. code-block::
~(keystone_admin)]$ software deploy show
+--------------+------------------+------+--------------+
| From Release | To Release | RR | State |
+--------------+------------------+------+--------------+
| 10.0.0 | <new-release-id> | True | deploy-host |
+--------------+------------------+------+--------------+
.. code-block::
~(keystone_admin)]$ software deploy host-list
+--------------+--------------+-------------------+-------+----------------------+
| Host | From Release | To Release | RR | State |
+--------------+--------------+-------------------+-------+----------------------+
| controller-0 | 10.0.0 | <new-release-id> | True | deploy-host-deployed |
| controller-1 | 10.0.0 | <new-release-id> | True | deploy-host-deployed |
| storage-0 | 10.0.0 | <new-release-id> | True | deploy-host-deployed |
| storage-1 | 10.0.0 | <new-release-id> | True | deploy-host-deployed |
| worker-0 | 10.0.0 | <new-release-id> | True | deploy-host-deployed |
| worker-1 | 10.0.0 | <new-release-id> | True | deploy-host-pending |
+--------------+--------------+-------------------+-------+----------------------+
#. Repeat the above steps for each worker host.
#. Switch the activity from controller-1 such that controller-0 becomes active.
.. code-block::
~(keystone_admin)]$ system host-swact controller-1
Wait for the activity to switch to controller-0.
This may take up to a minute depending on hardware.
Reconnect to the system.
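
The lock, deploy, and unlock sequence is the same for every host in this step
and depends only on the ``RR`` value. It can be sketched as a single function.
The name ``deploy_one_host`` is hypothetical, and the sketch does not wait for
the host to finish rebooting or for alarms to clear after the unlock, so those
waits remain manual.

.. code-block:: shell

    # Hypothetical helper: deploy the new release to one host.
    # $1 = host name, $2 = RR value ("True" or "False") as shown by
    # "software deploy show".
    deploy_one_host() {
        host=$1
        rr=$2
        if [ "$rr" = "True" ]; then
            system host-lock "$host" || return 1
        fi
        software deploy host "$host" || return 1
        if [ "$rr" = "True" ]; then
            system host-unlock "$host" || return 1
        fi
    }
    # Usage: deploy_one_host worker-0 True

The function stops at the first command that fails, so a host is not unlocked
after a failed deployment.
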
#. Activate the software deployment.
.. _manual-host-software-deployment-ee17ec6f71a4-step:
.. code-block::
~(keystone_admin)]$ software deploy activate
Deploy activate has started
When the :command:`software deploy activate` command runs, new configurations
are applied to the controller. The 250.001 (Configuration is out-of-date)
alarms are raised and then cleared as the configurations are applied.
The software deployment state goes from ``deploy-activate`` to
``deploy-activate-done`` once the deployment is activated. For a major release
software deployment, this may take 15-30 minutes to complete depending on
system configuration and hardware.
.. code-block::
~(keystone_admin)]$ software deploy show
+--------------+------------------+------+---------------------+
| From Release | To Release | RR | State |
+--------------+------------------+------+---------------------+
| 10.0.0 | <new-release-id> | True | deploy-activate-done |
+--------------+------------------+------+---------------------+
.. note::
If :command:`software deploy activate` fails, that is, if the state is
``deploy-activate-failed``, review ``/var/log/software.log`` on the active
controller for failure details, address the issues, and re-execute the
:command:`software deploy activate` command.
#. Complete the software deployment.
.. code-block::
~(keystone_admin)]$ software deploy complete
Deployment has been completed
.. code-block::
~(keystone_admin)]$ software deploy show
+--------------+------------------+------+-------------------+
| From Release | To Release | RR | State |
+--------------+------------------+------+-------------------+
| 10.0.0 | <new-release-id> | True | deploy-completed |
+--------------+------------------+------+-------------------+
.. note::
After this command is executed, you can run the Kubernetes version
upgrade procedure, if desired, to upversion to the new Kubernetes versions
available in the new software release.
#. Delete the software deployment.
.. note::
For a major release deployment, after this command is executed, the
major release software deployment cannot be rolled back.
.. code-block::
~(keystone_admin)]$ software deploy delete
Deployment has been deleted
.. code-block::
~(keystone_admin)]$ software deploy show
No deploy in progress
.. note::
After the deploy delete, if there are previous release entries in the
unavailable state, the alarm 900.024 ``Obsolete release in system`` is
raised.
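
As a recap, the deployment states that appear in this procedure and the action
that follows each one can be written as a lookup. This is an illustrative
sketch: ``next_action`` is a hypothetical name, ``<hostname>`` is a
placeholder, and states that are not shown in this procedure are deliberately
left out.

.. code-block:: shell

    # Hypothetical lookup: print the next operator action for the deployment
    # state given in $1. Only states shown in this procedure are covered.
    next_action() {
        case "$1" in
            deploy-start|deploy-activate)
                echo "wait for the operation to finish" ;;
            deploy-start-failed)
                echo "review /var/log/software.log; software deploy delete; software deploy start" ;;
            deploy-start-done|deploy-host)
                echo "software deploy host <hostname> for each pending host; software deploy activate" ;;
            deploy-activate-failed)
                echo "review /var/log/software.log; software deploy activate" ;;
            deploy-activate-done)
                echo "software deploy complete" ;;
            deploy-completed)
                echo "software deploy delete" ;;
            *)
                echo "state not covered by this sketch: $1" >&2
                return 1 ;;
        esac
    }
    # Usage: next_action deploy-activate-done
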
.. rubric:: |postreq|
#. Delete the old major release.
.. note::
If the system is a System Controller, the deployment should not be deleted
until the subclouds are up to date.
In the case of software deployment of a new major release, you should remove
the old major release to reclaim disk space.
.. code-block::
~(keystone_admin)]$ software list
+--------------------------+-------+-------------+
| Release | RR | State |
+--------------------------+-------+-------------+
| starlingx-10.0.0 | True | unavailable |
| <new-major-release-id> | True | deployed |
+--------------------------+-------+-------------+
.. code-block::
~(keystone_admin)]$ software delete starlingx-10.0.0
starlingx-10.0.0 has been deleted.
~(keystone_admin)]$ software list
+--------------------------+-------+-------------+
| Release | RR | State |
+--------------------------+-------+-------------+
| <new-major-release-id> | True | deployed |
+--------------------------+-------+-------------+
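
To find the releases that are candidates for deletion, the ``unavailable``
rows can be filtered from the table. This is a sketch only:
``unavailable_releases`` is a hypothetical helper that assumes the
three-column layout shown above, and its output should be reviewed before any
release is passed to :command:`software delete`.

.. code-block:: shell

    # Hypothetical helper: read "software list" output on stdin and print the
    # name of every release whose State column is exactly "unavailable".
    unavailable_releases() {
        awk -F'|' 'NF >= 4 && $4 ~ /^[ \t]*unavailable[ \t]*$/ {
            gsub(/[ \t]/, "", $2); print $2
        }'
    }
    # Usage: software list | unavailable_releases
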