From e2e42814e6a18eb7cf61f047ec07628707a1ac33 Mon Sep 17 00:00:00 2001 From: Juanita-Balaraj Date: Fri, 23 Apr 2021 16:59:39 -0400 Subject: [PATCH] Remote Redfish Subcloud Restore Fixed Merge conflicts Fixed review comments for patchset 8 Fixed review comments for patchset 7 Fixed review comments for Patchset 4 Moved restoring-subclouds-from-backupdata-using-dcmanager to the Distributed Cloud Guide Story: 2008573 Task: 42332 Signed-off-by: Juanita-Balaraj Change-Id: Ife0319125df38c54fb0baa79ac32070446a0d605 Signed-off-by: Juanita-Balaraj --- doc/source/backup/.vscode/settings.json | 3 + ...ring-starlingx-system-data-and-storage.rst | 22 +++- ...ore-playbook-locally-on-the-controller.rst | 25 ++-- ...ning-ansible-restore-playbook-remotely.rst | 17 ++- doc/source/dist_cloud/index.rst | 1 + ...clouds-from-backupdata-using-dcmanager.rst | 113 ++++++++++++++++++ 6 files changed, 163 insertions(+), 18 deletions(-) create mode 100644 doc/source/backup/.vscode/settings.json create mode 100644 doc/source/dist_cloud/restoring-subclouds-from-backupdata-using-dcmanager.rst diff --git a/doc/source/backup/.vscode/settings.json b/doc/source/backup/.vscode/settings.json new file mode 100644 index 000000000..3cce948f6 --- /dev/null +++ b/doc/source/backup/.vscode/settings.json @@ -0,0 +1,3 @@ +{ + "restructuredtext.confPath": "" +} \ No newline at end of file diff --git a/doc/source/backup/kubernetes/restoring-starlingx-system-data-and-storage.rst b/doc/source/backup/kubernetes/restoring-starlingx-system-data-and-storage.rst index 711e8c02d..d68ba0e46 100644 --- a/doc/source/backup/kubernetes/restoring-starlingx-system-data-and-storage.rst +++ b/doc/source/backup/kubernetes/restoring-starlingx-system-data-and-storage.rst @@ -28,24 +28,34 @@ specific applications must be re-applied once a storage cluster is configured. To restore the data, use the same version of the boot image \(ISO\) that was used at the time of the original installation. -The |prod| restore supports two modes: +The |prod| restore supports the following optional modes: .. _restoring-starlingx-system-data-and-storage-ol-tw4-kvc-4jb: -#. To keep the Ceph cluster data intact \(false - default option\), use the - following syntax, when passing the extra arguments to the Ansible Restore +- To keep the Ceph cluster data intact \(false - default option\), use the + following parameter, when passing the extra arguments to the Ansible Restore playbook command: .. code-block:: none wipe_ceph_osds=false -#. To wipe the Ceph cluster entirely \(true\), where the Ceph cluster will - need to be recreated, use the following syntax: +- To wipe the Ceph cluster entirely \(true\), where the Ceph cluster will + need to be recreated, use the following parameter: .. code-block:: none - wipe_ceph_osds=true + wipe_ceph_osds=true + +- To indicate that the backup data file is under /opt/platform-backup + directory on the local machine, use the following parameter: + + .. code-block:: none + + on_box_data=true + + If this parameter is set to **false**, the Ansible Restore playbook expects + both the **initial_backup_dir** and **backup_filename** to be specified. 
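+
+  For example, if the backup tar file is not under /opt/platform-backup on
+  the local machine, the extra arguments might look similar to the
+  following, where the directory and file name are placeholders:
+
+  .. code-block:: none
+
+     on_box_data=false initial_backup_dir=<backup_dir> backup_filename=<backup_file.tgz>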
Restoring a |prod| cluster from a backup file is done by re-installing the ISO on controller-0, running the Ansible Restore Playbook, applying updates diff --git a/doc/source/backup/kubernetes/running-restore-playbook-locally-on-the-controller.rst b/doc/source/backup/kubernetes/running-restore-playbook-locally-on-the-controller.rst index 8834c3f3a..23a70aa7a 100644 --- a/doc/source/backup/kubernetes/running-restore-playbook-locally-on-the-controller.rst +++ b/doc/source/backup/kubernetes/running-restore-playbook-locally-on-the-controller.rst @@ -18,22 +18,20 @@ following command to run the Ansible Restore playbook: ~(keystone_admin)]$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "initial_backup_dir= admin_password= wipe_ceph_osds=" -The |prod| restore supports two optional modes, keeping the Ceph cluster data -intact or wiping the Ceph cluster. - -.. rubric:: |proc| +The |prod| restore supports the following optional modes, keeping the Ceph +cluster data intact or wiping the Ceph cluster. .. _running-restore-playbook-locally-on-the-controller-steps-usl-2c3-pmb: -#. To keep the Ceph cluster data intact \(false - default option\), use the - following command: +- To keep the Ceph cluster data intact \(false - default option\), use the + following parameter: .. code-block:: none wipe_ceph_osds=false -#. To wipe the Ceph cluster entirely \(true\), where the Ceph cluster will - need to be recreated, use the following command: +- To wipe the Ceph cluster entirely \(true\), where the Ceph cluster will + need to be recreated, use the following parameter: .. code-block:: none @@ -50,12 +48,23 @@ intact or wiping the Ceph cluster. the patches and prompt you to reboot the system. Then you will need to re-run Ansible Restore playbook. +- To indicate that the backup data file is under /opt/platform-backup + directory on the local machine, use the following parameter: + + .. code-block:: none + + on_box_data=true + + If this parameter is set to **false**, the Ansible Restore playbook expects + both the **initial_backup_dir** and **backup_filename** to be specified. + .. rubric:: |postreq| After running restore\_platform.yml playbook, you can restore the local registry images. .. note:: + The backup file of the local registry images may be large. Restore the backed up file on the controller, where there is sufficient space. diff --git a/doc/source/backup/kubernetes/system-backup-running-ansible-restore-playbook-remotely.rst b/doc/source/backup/kubernetes/system-backup-running-ansible-restore-playbook-remotely.rst index ca19932e6..748c85b8e 100644 --- a/doc/source/backup/kubernetes/system-backup-running-ansible-restore-playbook-remotely.rst +++ b/doc/source/backup/kubernetes/system-backup-running-ansible-restore-playbook-remotely.rst @@ -51,18 +51,27 @@ In this method you can run Ansible Restore playbook and point to controller-0. 
where optional-extra-vars can be: - - **Optional**: You can select one of the two restore modes: + - **Optional**: You can select one of the following restore modes: - To keep Ceph data intact \(false - default option\), use the - following syntax: + following parameter: :command:`wipe_ceph_osds=false` - - Start with an empty Ceph cluster \(true\), to recreate a new - Ceph cluster, use the following syntax: + - To start with an empty Ceph cluster \(true\), where the Ceph + cluster will need to be recreated, use the following parameter: :command:`wipe_ceph_osds=true` + - To indicate that the backup data file is under /opt/platform-backup + directory on the local machine, use the following parameter: + + :command:`on_box_data=true` + + If this parameter is set to **false**, the Ansible Restore playbook + expects both the **initial_backup_dir** and **backup_filename** + to be specified. + - The backup\_filename is the platform backup tar file. It must be provided using the ``-e`` option on the command line, for example: diff --git a/doc/source/dist_cloud/index.rst b/doc/source/dist_cloud/index.rst index 374b41886..661dab0da 100644 --- a/doc/source/dist_cloud/index.rst +++ b/doc/source/dist_cloud/index.rst @@ -49,6 +49,7 @@ Operation changing-the-admin-password-on-distributed-cloud updating-docker-registry-credentials-on-a-subcloud migrate-an-aiosx-subcloud-to-an-aiodx-subcloud + restoring-subclouds-from-backupdata-using-dcmanager ---------------------------------------------------------- Kubernetes Version Upgrade Distributed Cloud Orchestration diff --git a/doc/source/dist_cloud/restoring-subclouds-from-backupdata-using-dcmanager.rst b/doc/source/dist_cloud/restoring-subclouds-from-backupdata-using-dcmanager.rst new file mode 100644 index 000000000..a4d3f109f --- /dev/null +++ b/doc/source/dist_cloud/restoring-subclouds-from-backupdata-using-dcmanager.rst @@ -0,0 +1,113 @@ + +.. _restoring-subclouds-from-backupdata-using-dcmanager: + +========================================================= +Restoring a Subcloud From Backup Data Using DCManager CLI +========================================================= + +For subclouds with servers that support Redfish Virtual Media Service +(version 1.2 or higher), you can use the Central Cloud's CLI to restore the +subcloud from data that was backed up previously. + +.. rubric:: |context| + +The CLI command :command:`dcmanager subcloud restore` can be used to restore a +subcloud from available system data and bring it back to the operational state +it was in when the backup procedure took place. The subcloud restore has three +phases: + +- Re-install the controller-0 of the subcloud with the current active load + running in the SystemController. For subcloud servers that support + Redfish Virtual Media Service, this phase can be carried out remotely + as part of the CLI. + +- Run Ansible Platform Restore to restore |prod|, from a previous backup on + the controller-0 of the subcloud. This phase is also carried out as part + of the CLI. + +- Unlock the controller-0 of the subcloud and continue with the steps to + restore the remaining nodes of the subcloud where applicable. This phase + is carried out by the system administrator, see :ref:`Restoring Platform System Data and Storage `. + +.. rubric:: |prereq| + +- The SystemController is healthy, and ready to accept **dcmanager** related + commands. + +- The subcloud is unmanaged, and not in the process of installation, + bootstrap or deployment. 
+
+- The platform backup tar file is already in the /opt/platform-backup
+  directory on the subcloud, or has been transferred to the
+  SystemController.
+
+- The subcloud install values have been saved in the **dcmanager** database,
+  i.e. the subcloud has been installed remotely as part of
+  :command:`dcmanager subcloud add`.
+
+.. rubric:: |proc|
+
+#. Create the restore_values.yaml file that will be passed to the
+   :command:`dcmanager subcloud restore` command using the ``--restore-values``
+   option. This file contains parameters that will be used during the platform
+   restore phase. Minimally, the **backup_filename** parameter, indicating the
+   file containing a previous backup of the subcloud, must be specified in the
+   yaml file. See :ref:`Run Ansible Restore Playbook Remotely ` and
+   :ref:`Run Restore Playbook Locally on the Controller ` for supported
+   restore parameters. A minimal example of this file is shown at the end of
+   this procedure.
+
+#. Restore the subcloud using the dcmanager CLI command :command:`subcloud restore`,
+   specifying the restore values, the ``--with-install`` option, and the
+   subcloud's sysadmin password.
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ dcmanager subcloud restore --restore-values /home/sysadmin/subcloud1-restore.yaml --with-install --sysadmin-password subcloud-name-or-id
+
+   Where:
+
+   - ``--restore-values`` must reference the restore values yaml file
+     mentioned in Step 1 of this procedure.
+
+   - ``--with-install`` indicates that a re-install of controller-0 of the
+     subcloud should be done remotely using Redfish Virtual Media Service.
+
+   If the ``--sysadmin-password`` option is not specified, the system
+   administrator will be prompted for the subcloud's sysadmin password. The
+   password is masked when it is entered.
+
+   The :command:`dcmanager subcloud restore` command can take up to 30 minutes
+   to reinstall and restore the platform on controller-0 of the subcloud.
+
+#. On the Central Cloud (SystemController), monitor the progress of the
+   subcloud reinstall and restore via the deploy status field of the
+   :command:`dcmanager subcloud list` command.
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ dcmanager subcloud list
+
+      +----+-----------+------------+--------------+---------------+---------+
+      | id | name      | management | availability | deploy status | sync    |
+      +----+-----------+------------+--------------+---------------+---------+
+      |  1 | subcloud1 | unmanaged  | online       | installing    | unknown |
+      +----+-----------+------------+--------------+---------------+---------+
+
+#. In case of a failure, check the Ansible log for the corresponding subcloud
+   under the /var/log/dcmanager/ansible directory.
+
+#. When the subcloud deploy status changes to "complete", controller-0 is
+   ready to be unlocked. Log in to controller-0 of the subcloud using its
+   bootstrap IP and unlock the host using the following command:
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ system host-unlock controller-0
+
+#. For |AIO|-DX and Standard subclouds, follow the procedure in
+   :ref:`Restoring Platform System Data and Storage ` to restore the rest
+   of the subcloud nodes.
+
+#. To resume the subcloud audit, use the following command:
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ dcmanager subcloud manage subcloud-name-or-id
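+
+.. note::
+
+    As a reference only, a minimal restore values file, as described in Step 1
+    of this procedure, might contain nothing more than the name of the backup
+    tar file. The file name below is a placeholder; other supported restore
+    parameters, such as **on_box_data**, can be added to the same file as
+    needed.
+
+    .. code-block:: none
+
+       backup_filename: subcloud1_platform_backup.tgz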