Update Backup and Restore

Some minor changes added


Signed-off-by: Rafael Jardim <rafaeljordao.jardim@windriver.com>
Change-Id: I6eabc540b7c1ec7f73a9665401f107721808aa64
Rafael Jardim 2021-03-08 15:17:39 -03:00
parent b60579b001
commit 946f7d1f4c
6 changed files with 83 additions and 86 deletions

View File

@ -101,7 +101,7 @@ The backup contains details as listed below:
- item=/opt/extension
- dc-vault filesystem for Distributed Cloud system-controller:
- dc-vault filesystem for |prod-dc| system-controller:
- item=/opt/dc-vault

View File

@ -83,14 +83,14 @@ conditions are in place:
network when powered on. If this is not the case, you must configure each
host manually for network boot immediately after powering it on.
- If you are restoring a Distributed Cloud subcloud first, ensure it is in
- If you are restoring a |prod-dc| subcloud first, ensure it is in
an **unmanaged** state on the Central Cloud \(SystemController\) by using
the following commands:
.. code-block:: none
$ source /etc/platform/openrc
~(keystone_admin)$ dcmanager subcloud unmanage <subcloud-name>
~(keystone_admin)]$ dcmanager subcloud unmanage <subcloud-name>
where <subcloud-name> is the name of the subcloud to be unmanaged.
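To confirm that the subcloud is unmanaged before starting the restore, its
status can be checked; for example:
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud show <subcloud-name>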
@ -117,17 +117,20 @@ conditions are in place:
#. Install network connectivity required for the subcloud.
#. Ensure the backup file is available on the controller. Run the Ansible
Restore playbook. For more information on restoring the back up file, see
:ref:`Run Restore Playbook Locally on the Controller
#. Ensure that the backup files are available on the controller. Run both
Ansible Restore playbooks, restore\_platform.yml and restore\_user\_images.yml.
For more information on restoring the backup files, see :ref:`Run Restore
Playbook Locally on the Controller
<running-restore-playbook-locally-on-the-controller>`, and :ref:`Run
Ansible Restore Playbook Remotely
<system-backup-running-ansible-restore-playbook-remotely>`.
.. note::
The backup file contains the system data and updates.
The backup files contain the system data and updates.
#. Update the controller's software to the previous updating level.
#. If the backup file contains patches, the restore\_platform.yml Ansible
Restore playbook will apply the patches and prompt you to reboot the
system; you will then need to re-run the Ansible Restore playbook.
The current software version on the controller is compared against the
version available in the backup file. If the backed-up version includes
@ -146,13 +149,16 @@ conditions are in place:
LIBCUNIT_CONTROLLER_ONLY Applied 20.06 n/a
STORAGECONFIG Applied 20.06 n/a
Rerun the Ansible Restore Playbook.
Rerun the Ansible Restore playbook if patches were applied and you were
prompted to reboot the system.
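If patches were applied and the system rebooted, the patch state can be
confirmed before continuing; a query along the following lines can be used
\(the patching CLI name may vary by release\):
.. code-block:: none
$ sudo sw-patch query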
#. Unlock Controller-0.
#. Restore the local registry using the file restore\_user\_images.yml.
This must be done before unlocking controller-0.
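An invocation of the user images restore playbook can look like the
following \(the backup file name and password are illustrative; the full
syntax is described in the restore topics referenced above\):
.. code-block:: none
~(keystone_admin)]$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_user_images.yml -e "initial_backup_dir=/home/sysadmin backup_filename=localhost_docker_local_registry_backup_2020_07_15_21_24_22.tgz ansible_become_pass=St0rlingX*"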
.. code-block:: none
~(keystone_admin)$ system host-unlock controller-0
~(keystone_admin)]$ system host-unlock controller-0
After you unlock controller-0, storage nodes become available and Ceph
becomes operational.
@ -165,37 +171,22 @@ conditions are in place:
$ source /etc/platform/openrc
#. For Simplex systems only, if :command:`wipe_ceph_osds` is set to false,
wait for the apps to transition from 'restore-requested' to the 'applied'
state.
#. Apps transition from the 'restore-requested' state to the 'applying'
state, and then from 'applying' to 'applied'.
If the apps are in 'apply-failed' state, ensure access to the docker
registry, and execute the following command for all custom applications
that need to be restored:
If apps transition from 'applying' back to the 'restore-requested' state,
ensure that there is network access and access to the docker registry.
.. code-block:: none
The process is repeated once per minute until all apps have transitioned to
'applied'.
~(keystone_admin)$ system application-apply <application>
For example, execute the following to restore stx-openstack.
.. code-block:: none
~(keystone_admin)$ system application-apply stx-openstack
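The overall application status can be monitored while the apps are being
re-applied; for example:
.. code-block:: none
~(keystone_admin)]$ system application-list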
.. note::
If you have a Simplex system, this is the last step in the process.
Wait for controller-0 to be in the unlocked, enabled, and available
state.
#. If you have a Duplex system, restore the controller-1 host.
#. If you have a Duplex system, restore the **controller-1** host.
#. List the current state of the hosts.
.. code-block:: none
~(keystone_admin)$ system host-list
~(keystone_admin)]$ system host-list
+----+-------------+------------+---------------+-----------+------------+
| id | hostname | personality| administrative|operational|availability|
+----+-------------+------------+---------------+-----------+------------+
@ -220,7 +211,7 @@ conditions are in place:
.. code-block:: none
~(keystone_admin)$ system host-unlock controller-1
~(keystone_admin)]$ system host-unlock controller-1
+-----------------+--------------------------------------+
| Property | Value |
+-----------------+--------------------------------------+
@ -235,7 +226,7 @@ conditions are in place:
.. code-block:: none
~(keystone_admin)$ system host-list
~(keystone_admin)]$ system host-list
+----+-------------+------------+---------------+-----------+------------+
| id | hostname | personality| administrative|operational|availability|
+----+-------------+------------+---------------+-----------+------------+
@ -247,9 +238,9 @@ conditions are in place:
| 6 | compute-1 | worker | locked |disabled |offline |
+----+-------------+------------+---------------+-----------+------------+
#. Restore storage configuration. If :command:`wipe_ceph_osds` is set to
**True**, follow the same procedure used to restore controller-1,
beginning with host storage-0 and proceeding in sequence.
#. Restore storage configuration. If :command:`wipe\_ceph\_osds` is set to
**True**, follow the same procedure used to restore **controller-1**,
beginning with host **storage-0** and proceeding in sequence.
.. note::
This step should be performed ONLY if you are restoring storage hosts.
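As with **controller-1**, each storage host is unlocked once it has been
reinstalled or recovered; an illustrative unlock of the first storage host:
.. code-block:: none
~(keystone_admin)]$ system host-unlock storage-0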
@ -261,12 +252,12 @@ conditions are in place:
the restore procedure without interruption.
Standard with Controller Storage install or reinstall depends on the
:command:`wipe_ceph_osds` configuration:
:command:`wipe\_ceph\_osds` configuration:
#. If :command:`wipe_ceph_osds` is set to **true**, reinstall the
#. If :command:`wipe\_ceph\_osds` is set to **true**, reinstall the
storage hosts.
#. If :command:`wipe_ceph_osds` is set to **false** \(default
#. If :command:`wipe\_ceph\_osds` is set to **false** \(default
option\), do not reinstall the storage hosts.
.. caution::
@ -280,7 +271,7 @@ conditions are in place:
.. code-block:: none
~(keystone_admin)$ ceph -s
~(keystone_admin)]$ ceph -s
cluster:
id: 3361e4ef-b0b3-4f94-97c6-b384f416768d
health: HEALTH_OK
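Once the cluster reports HEALTH_OK, the recovered OSDs can also be listed
individually if desired; for example:
.. code-block:: none
~(keystone_admin)]$ ceph osd tree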
@ -316,41 +307,29 @@ conditions are in place:
Restore the compute \(worker\) hosts following the same procedure used to
restore controller-1.
#. Unlock the compute hosts. The restore is complete.
#. Allow the Calico and CoreDNS pods to be recovered by Kubernetes. They
should all be in the 'N/N Running' state.
The state of the hosts when the restore operation is complete is as
follows:
.. code-block:: none
~(keystone_admin)$ system host-list
+----+-------------+------------+---------------+-----------+------------+
| id | hostname | personality| administrative|operational|availability|
+----+-------------+------------+---------------+-----------+------------+
| 1 | controller-0| controller | unlocked |enabled |available |
| 2 | controller-1| controller | unlocked |enabled |available |
| 3 | storage-0 | storage | unlocked |enabled |available |
| 4 | storage-1 | storage | unlocked |enabled |available |
| 5 | compute-0 | worker | unlocked |enabled |available |
| 6 | compute-1 | worker | unlocked |enabled |available |
+----+-------------+------------+---------------+-----------+------------+
~(keystone_admin)]$ kubectl get pods -n kube-system | grep -e calico -e coredns
calico-kube-controllers-5cd4695574-d7zwt 1/1 Running
calico-node-6km72 1/1 Running
calico-node-c7xnd 1/1 Running
coredns-6d64d47ff4-99nhq 1/1 Running
coredns-6d64d47ff4-nhh95 1/1 Running
#. For Duplex systems only, if :command:`wipe_ceph_osds` is set to false, wait
for the apps to transition from 'restore-requested' to the 'applied' state.
If the apps are in 'apply-failed' state, ensure access to the docker
registry, and execute the following command for all custom applications
that need to be restored:
#. Run the :command:`system restore-complete` command.
.. code-block:: none
~(keystone_admin)$ system application-apply <application>
~(keystone_admin)]$ system restore-complete
For example, execute the following to restore stx-openstack.
.. code-block:: none
~(keystone_admin)$ system application-apply stx-openstack
#. The 750.006 alarms disappear one at a time as the apps are automatically
applied.
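The remaining alarms can be monitored from the command line while they
clear; for example:
.. code-block:: none
~(keystone_admin)]$ fm alarm-list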
.. rubric:: |postreq|
@ -359,14 +338,14 @@ conditions are in place:
- Passwords for local user accounts must be restored manually since they
are not included as part of the backup and restore procedures.
- After restoring a Distributed Cloud subcloud, you need to bring it back
- After restoring a |prod-dc| subcloud, you need to bring it back
to the **managed** state on the Central Cloud \(SystemController\), by
using the following commands:
.. code-block:: none
$ source /etc/platform/openrc
~(keystone_admin)$ dcmanager subcloud manage <subcloud-name>
~(keystone_admin)]$ dcmanager subcloud manage <subcloud-name>
where <subcloud-name> is the name of the subcloud to be managed.
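To confirm that the subcloud has returned to the **managed** state, the
subcloud list can be checked; for example:
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud list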

View File

@ -9,16 +9,15 @@ Run Ansible Backup Playbook Locally on the Controller
In this method the Ansible Backup playbook is run on the active controller.
Use the following command to run the Ansible Backup playbook and back up the
|prod| configuration, data, and optionally the user container images in
registry.local data:
|prod| configuration, data, and user container images in registry.local data:
.. code-block:: none
~(keystone_admin)$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/backup.yml -e "ansible_become_pass=<sysadmin password> admin_password=<sysadmin password>" [ -e "backup_user_local_registry=true" ]
~(keystone_admin)]$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/backup.yml -e "ansible_become_pass=<sysadmin password> admin_password=<sysadmin password>" -e "backup_user_local_registry=true"
The <admin\_password\> and <ansible\_become\_pass\> need to be set correctly
using the ``-e`` option on the command line, or an override file, or in the Ansible
secret file.
The <admin\_password> and <ansible\_become\_pass> need to be set correctly
using the ``-e`` option on the command line, in an override file, or in the
Ansible secret file.
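For example, a local backup that also captures the user images in
registry.local might be invoked as follows \(the password is illustrative\):
.. code-block:: none
~(keystone_admin)]$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/backup.yml -e "ansible_become_pass=St0rlingX* admin_password=St0rlingX*" -e "backup_user_local_registry=true"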
The output files will be named:

View File

@ -42,16 +42,35 @@ and target it at controller-0.
|prefix|\_Cluster:
ansible_host: 128.224.141.74
#. Create an Ansible secrets file.
.. code-block:: none
~(keystone_admin)]$ cat <<EOF > secrets.yml
vault_password_change_responses:
yes/no: 'yes'
sysadmin*: 'sysadmin'
(current) UNIX password: 'sysadmin'
New password: 'Li69nux*'
Retype new password: 'Li69nux*'
admin_password: Li69nux*
ansible_become_pass: Li69nux*
ansible_ssh_pass: Li69nux*
EOF
#. Run Ansible Backup playbook:
.. code-block:: none
~(keystone_admin)$ ansible-playbook <path-to-backup-playbook-entry-file> --limit host-name -i <inventory-file> -e <optional-extra-vars>
~(keystone_admin)]$ ansible-playbook <path-to-backup-playbook-entry-file> --limit host-name -i <inventory-file> -e "backup_user_local_registry=true"
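If the secrets file created above is used, it can be passed to the playbook
with Ansible's standard ``@file`` extra-vars syntax; for example:
.. code-block:: none
~(keystone_admin)]$ ansible-playbook <path-to-backup-playbook-entry-file> --limit host-name -i <inventory-file> -e @secrets.yml -e "backup_user_local_registry=true"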
The generated backup tar file can be found in <host\_backup\_dir>, that
is, /home/sysadmin, by default. You can overwrite it using the ``-e``
is, /home/sysadmin, by default. You can override it using the **-e**
option on the command line or in an override file.
.. warning::
If a backup of the **local registry images** file is created, the
file is not copied from the remote machine to the local machine.
If a backup of the **local registry images** file is created, the file
is not copied from the remote machine to the local machine. The
inventory\_hostname\_docker\_local\_registry\_backup\_timestamp.tgz
file needs to be copied off the host machine to be used if a restore is
needed.
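One way to retrieve the file is a plain :command:`scp` from the machine
where the backup will be stored \(address, path, and file name are
placeholders\):
.. code-block:: none
$ scp sysadmin@<controller-oam-ip>:<host_backup_dir>/<inventory_hostname>_docker_local_registry_backup_<timestamp>.tgz <local-backup-dir>/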

View File

@ -16,7 +16,7 @@ following command to run the Ansible Restore playbook:
.. code-block:: none
~(keystone_admin)$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "initial_backup_dir=<location_of_tarball> ansible_become_pass=<admin_password> admin_password=<admin_password> backup_filename=<backup_filename> wipe_ceph_osds=<true/false>"
~(keystone_admin)]$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "initial_backup_dir=<location_of_tarball> ansible_become_pass=<admin_password> admin_password=<admin_password> backup_filename=<backup_filename> wipe_ceph_osds=<true/false>"
The |prod| restore supports two optional modes, keeping the Ceph cluster data
intact or wiping the Ceph cluster.
@ -43,7 +43,7 @@ intact or wiping the Ceph cluster.
.. code-block:: none
~(keystone_admin)$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "initial_backup_dir=/home/sysadmin ansible_become_pass=St0rlingX* admin_password=St0rlingX* backup_filename=localhost_platform_backup_2020_07_27_07_48_48.tgz wipe_ceph_osds=true"
~(keystone_admin)]$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "initial_backup_dir=/home/sysadmin ansible_become_pass=St0rlingX* admin_password=St0rlingX* backup_filename=localhost_platform_backup_2020_07_27_07_48_48.tgz wipe_ceph_osds=true"
.. note::
If the backup contains patches, Ansible Restore playbook will apply
@ -63,4 +63,4 @@ For example:
.. code-block:: none
~(keystone_admin)$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_user_images.yml -e "initial_backup_dir=/home/sysadmin backup_filename=localhost_docker_local_registry_backup_2020_07_15_21_24_22.tgz ansible_become_pass=St0rlingX*"
~(keystone_admin)]$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_user_images.yml -e "initial_backup_dir=/home/sysadmin backup_filename=localhost_docker_local_registry_backup_2020_07_15_21_24_22.tgz ansible_become_pass=St0rlingX*"

View File

@ -47,7 +47,7 @@ In this method you can run Ansible Restore playbook and point to controller-0.
.. code-block:: none
~(keystone_admin)$ ansible-playbook path-to-restore-platform-playbook-entry-file --limit host-name -i inventory-file -e optional-extra-vars
~(keystone_admin)]$ ansible-playbook path-to-restore-platform-playbook-entry-file --limit host-name -i inventory-file -e optional-extra-vars
where optional-extra-vars can be:
@ -89,7 +89,7 @@ In this method you can run Ansible Restore playbook and point to controller-0.
.. parsed-literal::
~(keystone_admin)$ ansible-playbook /localdisk/designer/jenkins/tis-stx-dev/cgcs-root/stx/ansible-playbooks/playbookconfig/src/playbooks/restore_platform.yml --limit |prefix|\_Cluster -i $HOME/br_test/hosts -e "ansible_become_pass=St0rlingX* admin_password=St0rlingX* ansible_ssh_pass=St0rlingX* initial_backup_dir=$HOME/br_test backup_filename= |prefix|\_Cluster_system_backup_2019_08_08_15_25_36.tgz ansible_remote_tmp=/home/sysadmin/ansible-restore"
~(keystone_admin)]$ ansible-playbook /localdisk/designer/jenkins/tis-stx-dev/cgcs-root/stx/ansible-playbooks/playbookconfig/src/playbooks/restore_platform.yml --limit |prefix|\_Cluster -i $HOME/br_test/hosts -e "ansible_become_pass=St0rlingX* admin_password=St0rlingX* ansible_ssh_pass=St0rlingX* initial_backup_dir=$HOME/br_test backup_filename= |prefix|\_Cluster_system_backup_2019_08_08_15_25_36.tgz ansible_remote_tmp=/home/sysadmin/ansible-restore"
.. note::
If the backup contains patches, Ansible Restore playbook will apply
@ -105,7 +105,7 @@ In this method you can run Ansible Restore playbook and point to controller-0.
.. code-block:: none
~(keystone_admin)$ ansible-playbook path-to-restore-user-images-playbook-entry-file --limit host-name -i inventory-file -e optional-extra-vars
~(keystone_admin)]$ ansible-playbook path-to-restore-user-images-playbook-entry-file --limit host-name -i inventory-file -e optional-extra-vars
where optional-extra-vars can be:
@ -144,4 +144,4 @@ In this method you can run Ansible Restore playbook and point to controller-0.
.. parsed-literal::
~(keystone_admin)$ ansible-playbook /localdisk/designer/jenkins/tis-stx-dev/cgcs-root/stx/ansible-playbooks/playbookconfig/src/playbooks/restore_user_images.yml --limit |prefix|\_Cluster -i $HOME/br_test/hosts -e "ansible_become_pass=St0rlingX* ansible_ssh_pass=St0rlingX* initial_backup_dir=$HOME/br_test backup_filename= |prefix|\_Cluster_docker_local_registry_backup_2020_07_15_21_24_22.tgz ansible_remote_tmp=/sufficient/space backup_dir=/sufficient/space"
~(keystone_admin)]$ ansible-playbook /localdisk/designer/jenkins/tis-stx-dev/cgcs-root/stx/ansible-playbooks/playbookconfig/src/playbooks/restore_user_images.yml --limit |prefix|\_Cluster -i $HOME/br_test/hosts -e "ansible_become_pass=St0rlingX* ansible_ssh_pass=St0rlingX* initial_backup_dir=$HOME/br_test backup_filename= |prefix|\_Cluster_docker_local_registry_backup_2020_07_15_21_24_22.tgz ansible_remote_tmp=/sufficient/space backup_dir=/sufficient/space"