Remote Redfish Subcloud Restore

Fixed Merge conflicts
Fixed review comments for patchset 8
Fixed review comments for patchset 7
Fixed review comments for patchset 4
Moved restoring-subclouds-from-backupdata-using-dcmanager to the Distributed Cloud Guide

Story: 2008573
Task: 42332

Signed-off-by: Juanita-Balaraj <juanita.balaraj@windriver.com>
Change-Id: Ife0319125df38c54fb0baa79ac32070446a0d605
Juanita-Balaraj 2021-04-23 16:59:39 -04:00
parent 7230189e63
commit e2e42814e6
6 changed files with 163 additions and 18 deletions

.vscode/settings.json

@@ -0,0 +1,3 @@
{
    "restructuredtext.confPath": ""
}

restoring-starlingx-system-data-and-storage.rst

@@ -28,24 +28,34 @@ specific applications must be re-applied once a storage cluster is configured.
To restore the data, use the same version of the boot image \(ISO\) that
was used at the time of the original installation.
-The |prod| restore supports two modes:
+The |prod| restore supports the following optional modes:
.. _restoring-starlingx-system-data-and-storage-ol-tw4-kvc-4jb:
-#. To keep the Ceph cluster data intact \(false - default option\), use the
-   following syntax, when passing the extra arguments to the Ansible Restore
+- To keep the Ceph cluster data intact \(false - default option\), use the
+  following parameter when passing the extra arguments to the Ansible Restore
playbook command:
.. code-block:: none
wipe_ceph_osds=false
-#. To wipe the Ceph cluster entirely \(true\), where the Ceph cluster will
-   need to be recreated, use the following syntax:
+- To wipe the Ceph cluster entirely \(true\), where the Ceph cluster will
+  need to be recreated, use the following parameter:
.. code-block:: none
wipe_ceph_osds=true
+- To indicate that the backup data file is under the /opt/platform-backup
+  directory on the local machine, use the following parameter:
+  .. code-block:: none
+     on_box_data=true
+  If this parameter is set to **false**, the Ansible Restore playbook expects
+  both the **initial_backup_dir** and **backup_filename** to be specified.
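For example, the following extra-arguments string keeps the Ceph data intact
and restores from a backup file already under /opt/platform-backup (a sketch;
``<backup_filename>`` is a placeholder for your own file name):

.. code-block:: none

   wipe_ceph_osds=false on_box_data=true backup_filename=<backup_filename>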
Restoring a |prod| cluster from a backup file is done by re-installing the
ISO on controller-0, running the Ansible Restore Playbook, applying updates

running-restore-playbook-locally-on-the-controller.rst

@@ -18,22 +18,20 @@ following command to run the Ansible Restore playbook:
~(keystone_admin)]$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "initial_backup_dir=<location_of_tarball> ansible_become_pass=<admin_password> admin_password=<admin_password> backup_filename=<backup_filename> wipe_ceph_osds=<true/false>"
-The |prod| restore supports two optional modes, keeping the Ceph cluster data
-intact or wiping the Ceph cluster.
.. rubric:: |proc|
+The |prod| restore supports the following optional modes, keeping the Ceph
+cluster data intact or wiping the Ceph cluster.
.. _running-restore-playbook-locally-on-the-controller-steps-usl-2c3-pmb:
-#. To keep the Ceph cluster data intact \(false - default option\), use the
-   following command:
+- To keep the Ceph cluster data intact \(false - default option\), use the
+  following parameter:
.. code-block:: none
wipe_ceph_osds=false
-#. To wipe the Ceph cluster entirely \(true\), where the Ceph cluster will
-   need to be recreated, use the following command:
+- To wipe the Ceph cluster entirely \(true\), where the Ceph cluster will
+  need to be recreated, use the following parameter:
.. code-block:: none
@@ -50,12 +48,23 @@ intact or wiping the Ceph cluster.
the patches and prompt you to reboot the system. Then you will need to
re-run Ansible Restore playbook.
+- To indicate that the backup data file is under the /opt/platform-backup
+  directory on the local machine, use the following parameter:
+  .. code-block:: none
+     on_box_data=true
+  If this parameter is set to **false**, the Ansible Restore playbook expects
+  both the **initial_backup_dir** and **backup_filename** to be specified.
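Putting these together, a minimal sketch of an on-box restore invocation,
following the playbook command shown at the start of this section
(angle-bracket values are placeholders for your own settings):

.. code-block:: none

   ~(keystone_admin)]$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "ansible_become_pass=<admin_password> admin_password=<admin_password> backup_filename=<backup_filename> on_box_data=true wipe_ceph_osds=false"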
.. rubric:: |postreq|
After running restore\_platform.yml playbook, you can restore the local
registry images.
.. note::
The backup file of the local registry images may be large. Restore the
backed up file on the controller, where there is sufficient space.

system-backup-running-ansible-restore-playbook-remotely.rst

@@ -51,18 +51,27 @@ In this method you can run Ansible Restore playbook and point to controller-0.
where optional-extra-vars can be:
-- **Optional**: You can select one of the two restore modes:
+- **Optional**: You can select one of the following restore modes:
- To keep Ceph data intact \(false - default option\), use the
-  following syntax:
+  following parameter:
:command:`wipe_ceph_osds=false`
-- Start with an empty Ceph cluster \(true\), to recreate a new
-  Ceph cluster, use the following syntax:
+- To start with an empty Ceph cluster \(true\), where the Ceph
+  cluster will need to be recreated, use the following parameter:
:command:`wipe_ceph_osds=true`
+- To indicate that the backup data file is under the /opt/platform-backup
+  directory on the local machine, use the following parameter:
+  :command:`on_box_data=true`
+  If this parameter is set to **false**, the Ansible Restore playbook
+  expects both the **initial_backup_dir** and **backup_filename**
+  to be specified.
- The backup\_filename is the platform backup tar file. It must be
provided using the ``-e`` option on the command line, for example:
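A minimal sketch of such an invocation (the playbook path follows the examples
elsewhere in this guide; angle-bracket values are placeholders):

.. code-block:: none

   ~(keystone_admin)]$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "backup_filename=<backup_filename> initial_backup_dir=<location_of_tarball>"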

index.rst

@@ -49,6 +49,7 @@ Operation
changing-the-admin-password-on-distributed-cloud
updating-docker-registry-credentials-on-a-subcloud
migrate-an-aiosx-subcloud-to-an-aiodx-subcloud
+restoring-subclouds-from-backupdata-using-dcmanager
----------------------------------------------------------
Kubernetes Version Upgrade Distributed Cloud Orchestration

restoring-subclouds-from-backupdata-using-dcmanager.rst

@@ -0,0 +1,113 @@
.. _restoring-subclouds-from-backupdata-using-dcmanager:
=========================================================
Restoring a Subcloud From Backup Data Using DCManager CLI
=========================================================
For subclouds with servers that support Redfish Virtual Media Service
(version 1.2 or higher), you can use the Central Cloud's CLI to restore the
subcloud from data that was backed up previously.
.. rubric:: |context|
The CLI command :command:`dcmanager subcloud restore` can be used to restore a
subcloud from available system data and bring it back to the operational state
it was in when the backup procedure took place. The subcloud restore has three
phases:
- Re-install the controller-0 of the subcloud with the current active load
running in the SystemController. For subcloud servers that support
Redfish Virtual Media Service, this phase can be carried out remotely
as part of the CLI.
- Run Ansible Platform Restore to restore |prod| from a previous backup on
  the controller-0 of the subcloud. This phase is also carried out as part
  of the CLI.
- Unlock the controller-0 of the subcloud and continue with the steps to
  restore the remaining nodes of the subcloud, where applicable. This phase
  is carried out by the system administrator; see :ref:`Restoring Platform System Data and Storage <restoring-starlingx-system-data-and-storage>`.
.. rubric:: |prereq|
- The SystemController is healthy and ready to accept **dcmanager**-related
  commands.
- The subcloud is unmanaged and not in the process of installation,
  bootstrap, or deployment.
- The platform backup tar file is already on the subcloud in the
  /opt/platform-backup directory, or has been transferred to the
  SystemController.
- The subcloud install values have been saved in the **dcmanager** database,
  i.e., the subcloud has been installed remotely as part of :command:`dcmanager subcloud add`.
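For example, before starting you can confirm from the SystemController that
the subcloud is unmanaged (the output format follows the example shown later
in this procedure):

.. code-block:: none

   ~(keystone_admin)]$ dcmanager subcloud list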
.. rubric:: |proc|
#. Create the restore_values.yaml file that will be passed to the
   :command:`dcmanager subcloud restore` command using the ``--restore-values``
   option. This file contains parameters that will be used during the platform
   restore phase. At a minimum, the **backup_filename** parameter, indicating
   the file containing a previous backup of the subcloud, must be specified in
   the yaml file. For the supported restore parameters, see
   :ref:`Run Ansible Restore Playbook Remotely <system-backup-running-ansible-restore-playbook-remotely>`
   and :ref:`Run Restore Playbook Locally on the Controller <running-restore-playbook-locally-on-the-controller>`.
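A minimal sketch of such a file, assuming the backup tar file is already under
/opt/platform-backup on the subcloud (the file name is a placeholder; the
optional parameters shown are described in the references above):

.. code-block:: yaml

   # Replace with the name of your backup tar file.
   backup_filename: <backup_filename>
   # Optional: restore from the subcloud's local /opt/platform-backup directory.
   on_box_data: true
   # Optional: keep the Ceph cluster data intact.
   wipe_ceph_osds: false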
#. Restore the subcloud using the :command:`dcmanager subcloud restore`
   command, specifying the restore values, the ``--with-install`` option, and
   the subcloud's sysadmin password.
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud restore --restore-values /home/sysadmin/subcloud1-restore.yaml --with-install --sysadmin-password <sysadmin_password> subcloud-name-or-id
Where:
- ``--restore-values`` must reference the restore values yaml file
mentioned in Step 1 of this procedure.
- ``--with-install`` indicates that a re-install of controller-0 of the
subcloud should be done remotely using Redfish Virtual Media Service.
If the ``--sysadmin-password`` option is not specified, the system
administrator will be prompted for the password. The password is masked
when it is entered. Enter the sysadmin password for the subcloud.
The :command:`dcmanager subcloud restore` command can take up to 30 minutes
to reinstall and restore the platform on controller-0 of the subcloud.
#. On the Central Cloud (SystemController), monitor the progress of the
subcloud reinstall and restore via the deploy status field of the
:command:`dcmanager subcloud list` command.
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud list
+----+-----------+------------+--------------+---------------+---------+
| id | name | management | availability | deploy status | sync |
+----+-----------+------------+--------------+---------------+---------+
| 1 | subcloud1 | unmanaged | online | installing | unknown |
+----+-----------+------------+--------------+---------------+---------+
#. In case of a failure, check the Ansible log for the corresponding subcloud
   under the /var/log/dcmanager/ansible directory.
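For example, to follow the most recent log for the subcloud (the log file name
below is hypothetical; list the directory first to find the actual file):

.. code-block:: none

   ~(keystone_admin)]$ ls /var/log/dcmanager/ansible
   ~(keystone_admin)]$ tail -f /var/log/dcmanager/ansible/subcloud1_playbook_output.log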
#. When the subcloud deploy status changes to "complete", the controller-0
is ready to be unlocked. Log into the controller-0 of the subcloud using
its bootstrap IP and unlock the host using the following command.
.. code-block:: none
~(keystone_admin)]$ system host-unlock controller-0
#. For |AIO|-DX and Standard subclouds, restore the remaining subcloud nodes
   by following the procedure in
   :ref:`Restoring Platform System Data and Storage <restoring-starlingx-system-data-and-storage>`.
#. To resume the subcloud audit, use the following command.
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud manage subcloud-name-or-id