Remote Redfish Subcloud Restore

Fixed Merge conflicts
Fixed review comments for patchset 8
Fixed review comments for patchset 7
Fixed review comments for patchset 4
Moved restoring-subclouds-from-backupdata-using-dcmanager to the Distributed Cloud Guide

Story: 2008573
Task: 42332

Signed-off-by: Juanita-Balaraj <juanita.balaraj@windriver.com>
Change-Id: Ife0319125df38c54fb0baa79ac32070446a0d605
Juanita-Balaraj 2021-04-23 16:59:39 -04:00
parent 7230189e63
commit e2e42814e6
6 changed files with 163 additions and 18 deletions

.vscode/settings.json

@@ -0,0 +1,3 @@
{
    "restructuredtext.confPath": ""
}

restoring-starlingx-system-data-and-storage.rst

@@ -28,24 +28,34 @@ specific applications must be re-applied once a storage cluster is configured.
To restore the data, use the same version of the boot image \(ISO\) that
was used at the time of the original installation.
-The |prod| restore supports two modes:
+The |prod| restore supports the following optional modes:
.. _restoring-starlingx-system-data-and-storage-ol-tw4-kvc-4jb:
-#. To keep the Ceph cluster data intact \(false - default option\), use the
-   following syntax, when passing the extra arguments to the Ansible Restore
+- To keep the Ceph cluster data intact \(false - default option\), use the
+  following parameter when passing the extra arguments to the Ansible Restore
playbook command:
.. code-block:: none
wipe_ceph_osds=false
-#. To wipe the Ceph cluster entirely \(true\), where the Ceph cluster will
-   need to be recreated, use the following syntax:
+- To wipe the Ceph cluster entirely \(true\), where the Ceph cluster will
+  need to be recreated, use the following parameter:
.. code-block:: none
wipe_ceph_osds=true
+- To indicate that the backup data file is under the /opt/platform-backup
+  directory on the local machine, use the following parameter:
+  .. code-block:: none
+     on_box_data=true
+  If this parameter is set to **false**, the Ansible Restore playbook expects
+  both the **initial_backup_dir** and **backup_filename** to be specified.
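For example, the following extra-arguments string keeps the Ceph data intact
and restores from a backup file already under /opt/platform-backup (a sketch;
``<backup_filename>`` is a placeholder for your own file name):

.. code-block:: none

   wipe_ceph_osds=false on_box_data=true backup_filename=<backup_filename>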
Restoring a |prod| cluster from a backup file is done by re-installing the
ISO on controller-0, running the Ansible Restore Playbook, applying updates

running-restore-playbook-locally-on-the-controller.rst

@@ -18,22 +18,20 @@ following command to run the Ansible Restore playbook:
~(keystone_admin)]$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "initial_backup_dir=<location_of_tarball> ansible_become_pass=<admin_password> admin_password=<admin_password> backup_filename=<backup_filename> wipe_ceph_osds=<true/false>"
-The |prod| restore supports two optional modes, keeping the Ceph cluster data
-intact or wiping the Ceph cluster.
.. rubric:: |proc|
+The |prod| restore supports the following optional modes, keeping the Ceph
+cluster data intact or wiping the Ceph cluster.
.. _running-restore-playbook-locally-on-the-controller-steps-usl-2c3-pmb:
-#. To keep the Ceph cluster data intact \(false - default option\), use the
-   following command:
+- To keep the Ceph cluster data intact \(false - default option\), use the
+  following parameter:
.. code-block:: none
wipe_ceph_osds=false
-#. To wipe the Ceph cluster entirely \(true\), where the Ceph cluster will
-   need to be recreated, use the following command:
+- To wipe the Ceph cluster entirely \(true\), where the Ceph cluster will
+  need to be recreated, use the following parameter:
.. code-block:: none
@@ -50,12 +48,23 @@ intact or wiping the Ceph cluster.
the patches and prompt you to reboot the system. Then you will need to
re-run Ansible Restore playbook.
+- To indicate that the backup data file is under the /opt/platform-backup
+  directory on the local machine, use the following parameter:
+  .. code-block:: none
+     on_box_data=true
+  If this parameter is set to **false**, the Ansible Restore playbook expects
+  both the **initial_backup_dir** and **backup_filename** to be specified.
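Putting these together, a minimal sketch of an on-box restore invocation,
following the playbook command shown at the start of this section
(angle-bracket values are placeholders for your own settings):

.. code-block:: none

   ~(keystone_admin)]$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "ansible_become_pass=<admin_password> admin_password=<admin_password> backup_filename=<backup_filename> on_box_data=true wipe_ceph_osds=false"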
.. rubric:: |postreq|
After running restore\_platform.yml playbook, you can restore the local
registry images.
.. note::
The backup file of the local registry images may be large. Restore the
backed up file on the controller, where there is sufficient space.

system-backup-running-ansible-restore-playbook-remotely.rst

@@ -51,18 +51,27 @@ In this method you can run Ansible Restore playbook and point to controller-0.
where optional-extra-vars can be:
-- **Optional**: You can select one of the two restore modes:
+- **Optional**: You can select one of the following restore modes:
- To keep Ceph data intact \(false - default option\), use the
-  following syntax:
+  following parameter:
:command:`wipe_ceph_osds=false`
-- Start with an empty Ceph cluster \(true\), to recreate a new
-  Ceph cluster, use the following syntax:
+- To start with an empty Ceph cluster \(true\), where the Ceph
+  cluster will need to be recreated, use the following parameter:
:command:`wipe_ceph_osds=true`
+- To indicate that the backup data file is under the /opt/platform-backup
+  directory on the local machine, use the following parameter:
+  :command:`on_box_data=true`
+  If this parameter is set to **false**, the Ansible Restore playbook
+  expects both the **initial_backup_dir** and **backup_filename**
+  to be specified.
- The backup\_filename is the platform backup tar file. It must be
provided using the ``-e`` option on the command line, for example:
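A minimal sketch of such an invocation (the playbook path follows the examples
elsewhere in this guide; angle-bracket values are placeholders):

.. code-block:: none

   ~(keystone_admin)]$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "backup_filename=<backup_filename> initial_backup_dir=<location_of_tarball>"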

index.rst

@@ -49,6 +49,7 @@ Operation
changing-the-admin-password-on-distributed-cloud
updating-docker-registry-credentials-on-a-subcloud
migrate-an-aiosx-subcloud-to-an-aiodx-subcloud
+restoring-subclouds-from-backupdata-using-dcmanager
----------------------------------------------------------
Kubernetes Version Upgrade Distributed Cloud Orchestration

restoring-subclouds-from-backupdata-using-dcmanager.rst

@@ -0,0 +1,113 @@
.. _restoring-subclouds-from-backupdata-using-dcmanager:
=========================================================
Restoring a Subcloud From Backup Data Using DCManager CLI
=========================================================
For subclouds with servers that support Redfish Virtual Media Service
(version 1.2 or higher), you can use the Central Cloud's CLI to restore the
subcloud from data that was backed up previously.
.. rubric:: |context|
The CLI command :command:`dcmanager subcloud restore` can be used to restore a
subcloud from available system data and bring it back to the operational state
it was in when the backup procedure took place. The subcloud restore has three
phases:
- Re-install the controller-0 of the subcloud with the current active load
running in the SystemController. For subcloud servers that support
Redfish Virtual Media Service, this phase can be carried out remotely
as part of the CLI.
- Run Ansible Platform Restore to restore |prod| from a previous backup on
  the controller-0 of the subcloud. This phase is also carried out as part
  of the CLI.
- Unlock the controller-0 of the subcloud and continue with the steps to
  restore the remaining nodes of the subcloud, where applicable. This phase
  is carried out by the system administrator; see :ref:`Restoring Platform System Data and Storage <restoring-starlingx-system-data-and-storage>`.
.. rubric:: |prereq|
- The SystemController is healthy and ready to accept **dcmanager**-related
  commands.
- The subcloud is unmanaged and not in the process of installation,
  bootstrap, or deployment.
- The platform backup tar file is already on the subcloud in the
  /opt/platform-backup directory, or has been transferred to the
  SystemController.
- The subcloud install values have been saved in the **dcmanager** database,
  i.e., the subcloud has been installed remotely as part of :command:`dcmanager subcloud add`.
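For example, before starting you can confirm from the SystemController that
the subcloud is unmanaged (the output format follows the example shown later
in this procedure):

.. code-block:: none

   ~(keystone_admin)]$ dcmanager subcloud list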
.. rubric:: |proc|
#. Create the restore_values.yaml file that will be passed to the
   :command:`dcmanager subcloud restore` command using the ``--restore-values``
   option. This file contains parameters that will be used during the platform
   restore phase. At a minimum, the **backup_filename** parameter, indicating
   the file containing a previous backup of the subcloud, must be specified in
   the yaml file. For the supported restore parameters, see
   :ref:`Run Ansible Restore Playbook Remotely <system-backup-running-ansible-restore-playbook-remotely>`
   and :ref:`Run Restore Playbook Locally on the Controller <running-restore-playbook-locally-on-the-controller>`.
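A minimal sketch of such a file, assuming the backup tar file is already under
/opt/platform-backup on the subcloud (the file name is a placeholder; the
optional parameters shown are described in the references above):

.. code-block:: yaml

   # Replace with the name of your backup tar file.
   backup_filename: <backup_filename>
   # Optional: restore from the subcloud's local /opt/platform-backup directory.
   on_box_data: true
   # Optional: keep the Ceph cluster data intact.
   wipe_ceph_osds: false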
#. Restore the subcloud using the :command:`dcmanager subcloud restore`
   command, specifying the restore values, the ``--with-install`` option, and
   the subcloud's sysadmin password.
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud restore --restore-values /home/sysadmin/subcloud1-restore.yaml --with-install --sysadmin-password <sysadmin_password> subcloud-name-or-id
Where:
- ``--restore-values`` must reference the restore values yaml file
mentioned in Step 1 of this procedure.
- ``--with-install`` indicates that a re-install of controller-0 of the
subcloud should be done remotely using Redfish Virtual Media Service.
If the ``--sysadmin-password`` option is not specified, the system
administrator will be prompted for the password. The password is masked
when it is entered. Enter the sysadmin password for the subcloud.
The :command:`dcmanager subcloud restore` command can take up to 30 minutes
to reinstall and restore the platform on controller-0 of the subcloud.
#. On the Central Cloud (SystemController), monitor the progress of the
subcloud reinstall and restore via the deploy status field of the
:command:`dcmanager subcloud list` command.
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud list
+----+-----------+------------+--------------+---------------+---------+
| id | name | management | availability | deploy status | sync |
+----+-----------+------------+--------------+---------------+---------+
| 1 | subcloud1 | unmanaged | online | installing | unknown |
+----+-----------+------------+--------------+---------------+---------+
#. In case of a failure, check the Ansible log for the corresponding subcloud
   under the /var/log/dcmanager/ansible directory.
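For example, to follow the most recent log for the subcloud (the log file name
below is hypothetical; list the directory first to find the actual file):

.. code-block:: none

   ~(keystone_admin)]$ ls /var/log/dcmanager/ansible
   ~(keystone_admin)]$ tail -f /var/log/dcmanager/ansible/subcloud1_playbook_output.log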
#. When the subcloud deploy status changes to "complete", the controller-0
is ready to be unlocked. Log into the controller-0 of the subcloud using
its bootstrap IP and unlock the host using the following command.
.. code-block:: none
~(keystone_admin)]$ system host-unlock controller-0
#. For |AIO|-DX and Standard subclouds, restore the remaining subcloud nodes
   by following the procedure in
   :ref:`Restoring Platform System Data and Storage <restoring-starlingx-system-data-and-storage>`.
#. To resume the subcloud audit, use the following command.
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud manage subcloud-name-or-id