diff --git a/doc/source/backup/backing-up-starlingx-system-data.rst b/doc/source/backup/backing-up-starlingx-system-data.rst
index 8801c1e9d..8b84c7c22 100644
--- a/doc/source/backup/backing-up-starlingx-system-data.rst
+++ b/doc/source/backup/backing-up-starlingx-system-data.rst
@@ -101,7 +101,7 @@ The backup contains details as listed below:
 
     - item=/opt/extension
 
-    - dc-vault filesystem for Distributed Cloud system-controller:
+    - dc-vault filesystem for |prod-dc| system-controller:
 
     - item=/opt/dc-vault
 
diff --git a/doc/source/backup/restoring-starlingx-system-data-and-storage.rst b/doc/source/backup/restoring-starlingx-system-data-and-storage.rst
index 8bef3923d..da8ab0d42 100644
--- a/doc/source/backup/restoring-starlingx-system-data-and-storage.rst
+++ b/doc/source/backup/restoring-starlingx-system-data-and-storage.rst
@@ -83,14 +83,14 @@ conditions are in place:
   network when powered on. If this is not the case, you must configure each
   host manually for network boot immediately after powering it on.
 
-- If you are restoring a Distributed Cloud subcloud first, ensure it is in
+- If you are restoring a |prod-dc| subcloud first, ensure it is in
   an **unmanaged** state on the Central Cloud \(SystemController\) by using
   the following commands:
 
   .. code-block:: none
 
     $ source /etc/platform/openrc
-    ~(keystone_admin)$ dcmanager subcloud unmanage <subcloud-name>
+    ~(keystone_admin)]$ dcmanager subcloud unmanage <subcloud-name>
 
   where <subcloud-name> is the name of the subcloud to be unmanaged.
 
@@ -117,17 +117,20 @@ conditions are in place:
 
 #. Install network connectivity required for the subcloud.
 
-#. Ensure the backup file is available on the controller. Run the Ansible
-   Restore playbook. For more information on restoring the back up file, see
-   :ref:`Run Restore Playbook Locally on the Controller
+#. Ensure that the backup files are available on the controller. Run both
+   Ansible Restore playbooks, restore\_platform.yml and restore\_user\_images.yml.
+   For more information on restoring the backup files, see :ref:`Run Restore
+   Playbook Locally on the Controller
   <running-restore-playbook-locally-on-the-controller>`, and
   :ref:`Run Ansible Restore Playbook Remotely
   <system-backup-running-ansible-restore-playbook-remotely>`.
 
   .. note::
 
-      The backup file contains the system data and updates.
+      The backup files contain the system data and updates.
 
-#. Update the controller's software to the previous updating level.
+#. If the backup file contains patches, the Ansible Restore playbook
+   restore\_platform.yml will apply the patches and prompt you to reboot the
+   system. You will then need to re-run the Ansible Restore playbook.
 
   The current software version on the controller is compared against the
   version available in the backup file. If the backed-up version includes
@@ -146,13 +149,16 @@ conditions are in place:
    LIBCUNIT_CONTROLLER_ONLY  Applied  20.06  n/a
    STORAGECONFIG             Applied  20.06  n/a
 
-   Rerun the Ansible Restore Playbook.
+   Rerun the Ansible Restore playbook if patches were applied and you were
+   prompted to reboot the system.
 
-#. Unlock Controller-0.
+#. Restore the local registry using the restore\_user\_images.yml playbook.
+
+   This must be done before unlocking controller-0.
 
   .. code-block:: none
 
-     ~(keystone_admin)$ system host-unlock controller-0
+     ~(keystone_admin)]$ system host-unlock controller-0
 
   After you unlock controller-0, storage nodes become available and Ceph
   becomes operational.
 
@@ -165,37 +171,22 @@ conditions are in place:
 
      $ source /etc/platform/openrc
 
-#. For Simplex systems only, if :command:`wipe_ceph_osds` is set to false,
-   wait for the apps to transition from 'restore-requested' to the 'applied'
-   state.
+#. Apps transition from 'restore-requested' to 'applying' state, and
+   from 'applying' state to 'applied' state.
 
-    If the apps are in 'apply-failed' state, ensure access to the docker
-    registry, and execute the following command for all custom applications
-    that need to be restored:
+   If apps transition from 'applying' back to 'restore-requested' state,
+   ensure there is network access and access to the docker registry.
 
-    .. code-block:: none
+   The process is repeated once per minute until all apps have transitioned
+   to 'applied'.
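+
+   You can check where each application is in this cycle by listing the
+   applications and their current status; this is a suggested check only,
+   and the set of applications shown depends on what is installed:
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ system application-list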
 
-       ~(keystone_admin)$ system application-apply
-
-    For example, execute the following to restore stx-openstack.
-
-    .. code-block:: none
-
-       ~(keystone_admin)$ system application-apply stx-openstack
-
-    .. note::
-        If you have a Simplex system, this is the last step in the process.
-
-        Wait for controller-0 to be in the unlocked, enabled, and available
-        state.
-
-#. If you have a Duplex system, restore the controller-1 host.
+#. If you have a Duplex system, restore the **controller-1** host.
 
    #. List the current state of the hosts.
 
      .. code-block:: none
 
-        ~(keystone_admin)$ system host-list
+        ~(keystone_admin)]$ system host-list
        +----+-------------+------------+---------------+-----------+------------+
        | id | hostname    | personality| administrative|operational|availability|
        +----+-------------+------------+---------------+-----------+------------+
@@ -220,7 +211,7 @@ conditions are in place:
 
      .. code-block:: none
 
-        ~(keystone_admin)$ system host-unlock controller-1
+        ~(keystone_admin)]$ system host-unlock controller-1
        +-----------------+--------------------------------------+
        | Property        | Value                                |
        +-----------------+--------------------------------------+
@@ -235,7 +226,7 @@ conditions are in place:
 
      .. code-block:: none
 
-        ~(keystone_admin)$ system host-list
+        ~(keystone_admin)]$ system host-list
        +----+-------------+------------+---------------+-----------+------------+
        | id | hostname    | personality| administrative|operational|availability|
        +----+-------------+------------+---------------+-----------+------------+
        | 1  | controller-0| controller | unlocked      |enabled    |available   |
        | 2  | controller-1| controller | unlocked      |enabled    |available   |
        | 3  | storage-0   | storage    | locked        |disabled   |offline     |
        | 4  | storage-1   | storage    | locked        |disabled   |offline     |
        | 5  | compute-0   | worker     | locked        |disabled   |offline     |
        | 6  | compute-1   | worker     | locked        |disabled   |offline     |
        +----+-------------+------------+---------------+-----------+------------+
 
@@ -247,9 +238,9 @@ conditions are in place:
-#. Restore storage configuration. If :command:`wipe_ceph_osds` is set to
-   **True**, follow the same procedure used to restore controller-1,
-   beginning with host storage-0 and proceeding in sequence.
+#. Restore storage configuration. If :command:`wipe\_ceph\_osds` is set to
+   **True**, follow the same procedure used to restore **controller-1**,
+   beginning with host **storage-0** and proceeding in sequence.
 
   .. note::
      This step should be performed ONLY if you are restoring storage hosts.
 
@@ -261,12 +252,12 @@ conditions are in place:
      the restore procedure without interruption.
 
   Standard with Controller Storage install or reinstall depends on the
-  :command:`wipe_ceph_osds` configuration:
+  :command:`wipe\_ceph\_osds` configuration:
 
-  #. If :command:`wipe_ceph_osds` is set to **true**, reinstall the
+  #. If :command:`wipe\_ceph\_osds` is set to **true**, reinstall the
     storage hosts.
 
-  #. If :command:`wipe_ceph_osds` is set to **false** \(default
+  #. If :command:`wipe\_ceph\_osds` is set to **false** \(default
     option\), do not reinstall the storage hosts.
 
   .. caution::
 
@@ -280,7 +271,7 @@ conditions are in place:
 
     .. code-block:: none
 
-       ~(keystone_admin)$ ceph -s
+       ~(keystone_admin)]$ ceph -s
        cluster:
          id:     3361e4ef-b0b3-4f94-97c6-b384f416768d
          health: HEALTH_OK
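+
+    If you want a closer look at the space actually available before
+    restoring the storage hosts, :command:`ceph df` complements the status
+    output above; this is a suggested check only:
+
+    .. code-block:: none
+
+       ~(keystone_admin)]$ ceph df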
@@ -316,41 +307,29 @@ conditions are in place:
    Restore the compute \(worker\) hosts following the same procedure used to
    restore controller-1.
 
-#. Unlock the compute hosts. The restore is complete.
+#. Allow Calico and Coredns pods to be recovered by Kubernetes. They should
+   all be in 'N/N Running' state.
 
   The state of the hosts when the restore operation is complete is as follows:
 
   .. code-block:: none
 
-     ~(keystone_admin)$ system host-list
-     +----+-------------+------------+---------------+-----------+------------+
-     | id | hostname    | personality| administrative|operational|availability|
-     +----+-------------+------------+---------------+-----------+------------+
-     | 1  | controller-0| controller | unlocked      |enabled    |available   |
-     | 2  | controller-1| controller | unlocked      |enabled    |available   |
-     | 3  | storage-0   | storage    | unlocked      |enabled    |available   |
-     | 4  | storage-1   | storage    | unlocked      |enabled    |available   |
-     | 5  | compute-0   | worker     | unlocked      |enabled    |available   |
-     | 6  | compute-1   | worker     | unlocked      |enabled    |available   |
-     +----+-------------+------------+---------------+-----------+------------+
+     ~(keystone_admin)]$ kubectl get pods -n kube-system | grep -e calico -e coredns
+     calico-kube-controllers-5cd4695574-d7zwt   1/1   Running
+     calico-node-6km72                          1/1   Running
+     calico-node-c7xnd                          1/1   Running
+     coredns-6d64d47ff4-99nhq                   1/1   Running
+     coredns-6d64d47ff4-nhh95                   1/1   Running
 
-#. For Duplex systems only, if :command:`wipe_ceph_osds` is set to false, wait
-   for the apps to transition from 'restore-requested' to the 'applied' state.
-
-   If the apps are in 'apply-failed' state, ensure access to the docker
-   registry, and execute the following command for all custom applications
-   that need to be restored:
+#. Run the :command:`system restore-complete` command.
 
   .. code-block:: none
 
-     ~(keystone_admin)$ system application-apply
+     ~(keystone_admin)]$ system restore-complete
 
-   For example, execute the following to restore stx-openstack.
-
-   .. code-block:: none
-
-      ~(keystone_admin)$ system application-apply stx-openstack
+#. The 750.006 alarms disappear one at a time as the apps are automatically
+   applied.
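+
+   To follow this, you can list the active alarms and re-check until no
+   750.006 alarms remain; this is a suggested check only:
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ fm alarm-list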
 
   .. rubric:: |postreq|
 
@@ -359,14 +338,14 @@ conditions are in place:
 
 - Passwords for local user accounts must be restored manually since they are
  not included as part of the backup and restore procedures.
 
-- After restoring a Distributed Cloud subcloud, you need to bring it back
+- After restoring a |prod-dc| subcloud, you need to bring it back
  to the **managed** state on the Central Cloud \(SystemController\), by
  using the following commands:
 
  .. code-block:: none
 
     $ source /etc/platform/openrc
-    ~(keystone_admin)$ dcmanager subcloud manage <subcloud-name>
+    ~(keystone_admin)]$ dcmanager subcloud manage <subcloud-name>
 
  where <subcloud-name> is the name of the subcloud to be managed.
 
diff --git a/doc/source/backup/running-ansible-backup-playbook-locally-on-the-controller.rst b/doc/source/backup/running-ansible-backup-playbook-locally-on-the-controller.rst
index d8378c58a..7b2d65729 100644
--- a/doc/source/backup/running-ansible-backup-playbook-locally-on-the-controller.rst
+++ b/doc/source/backup/running-ansible-backup-playbook-locally-on-the-controller.rst
@@ -9,16 +9,15 @@ Run Ansible Backup Playbook Locally on the Controller
 
 In this method the Ansible Backup playbook is run on the active controller.
 
 Use the following command to run the Ansible Backup playbook and back up the
-|prod| configuration, data, and optionally the user container images in
-registry.local data:
+|prod| configuration, data, and user container images in registry.local data:
 
 .. code-block:: none
 
-   ~(keystone_admin)$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/backup.yml -e "ansible_become_pass=<sysadmin_password> admin_password=<admin_password>" [ -e "backup_user_local_registry=true" ]
+   ~(keystone_admin)]$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/backup.yml -e "ansible_become_pass=<sysadmin_password> admin_password=<admin_password>" -e "backup_user_local_registry=true"
 
-The <admin_password> and <ansible_become_pass> need to be set correctly
-using the ``-e`` option on the command line, or an override file, or in the Ansible
-secret file.
+The <admin_password> and <ansible_become_pass> need to be set correctly
+using the ``-e`` option on the command line, or an override file, or in the
+Ansible secret file.
 
 The output files will be named:
 
diff --git a/doc/source/backup/running-ansible-backup-playbook-remotely.rst b/doc/source/backup/running-ansible-backup-playbook-remotely.rst
index a0d5c619d..b243aefb9 100644
--- a/doc/source/backup/running-ansible-backup-playbook-remotely.rst
+++ b/doc/source/backup/running-ansible-backup-playbook-remotely.rst
@@ -42,16 +42,35 @@ and target it at controller-0.
       |prefix|\_Cluster:
         ansible_host: 128.224.141.74
 
+#. Create an Ansible secrets file.
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ cat <<EOF > secrets.yml
+      vault_password_change_responses:
+        yes/no: 'yes'
+        sysadmin*: 'sysadmin'
+        (current) UNIX password: 'sysadmin'
+        New password: 'Li69nux*'
+        Retype new password: 'Li69nux*'
+      admin_password: Li69nux*
+      ansible_become_pass: Li69nux*
+      ansible_ssh_pass: Li69nux*
+      EOF
+
 #. Run Ansible Backup playbook:
 
   .. code-block:: none
 
-     ~(keystone_admin)$ ansible-playbook <path-to-backup-playbook-entry-file> --limit host-name -i <inventory-file> -e
+     ~(keystone_admin)]$ ansible-playbook <path-to-backup-playbook-entry-file> --limit host-name -i <inventory-file> -e "backup_user_local_registry=true"
 
  The generated backup tar file can be found in <host_backup_dir>, that
-  is, /home/sysadmin, by default. You can overwrite it using the ``-e``
+  is, /home/sysadmin, by default. You can overwrite it using the **-e**
  option on the command line or in an override file.
 
  .. warning::
-     If a backup of the **local registry images** file is created, the
-     file is not copied from the remote machine to the local machine.
+     If a backup of the **local registry images** file is created, the file
+     is not copied from the remote machine to the local machine. The
+     inventory\_hostname\_docker\_local\_registry\_backup\_timestamp.tgz
+     file needs to be copied off the host machine to be used if a restore is
+     needed.
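+
+     For example, you could copy it to the machine you run Ansible from with
+     any file-transfer tool; a minimal sketch using scp, where the address
+     and file name are placeholders:
+
+     .. code-block:: none
+
+        $ scp sysadmin@<controller-oam-ip>:/home/sysadmin/<inventory_hostname>_docker_local_registry_backup_<timestamp>.tgz .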
diff --git a/doc/source/backup/running-restore-playbook-locally-on-the-controller.rst b/doc/source/backup/running-restore-playbook-locally-on-the-controller.rst
index aaa246f8f..8834c3f3a 100644
--- a/doc/source/backup/running-restore-playbook-locally-on-the-controller.rst
+++ b/doc/source/backup/running-restore-playbook-locally-on-the-controller.rst
@@ -16,7 +16,7 @@ following command to run the Ansible Restore playbook:
 
 .. code-block:: none
 
-   ~(keystone_admin)$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "initial_backup_dir=<location_of_backup_tarball> admin_password=<admin_password> wipe_ceph_osds=<true/false>"
+   ~(keystone_admin)]$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "initial_backup_dir=<location_of_backup_tarball> admin_password=<admin_password> wipe_ceph_osds=<true/false>"
 
 The |prod| restore supports two optional modes, keeping the Ceph cluster data
 intact or wiping the Ceph cluster.
 
@@ -43,7 +43,7 @@ intact or wiping the Ceph cluster.
 
   .. code-block:: none
 
-     ~(keystone_admin)$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "initial_backup_dir=/home/sysadmin ansible_become_pass=St0rlingX* admin_password=St0rlingX* backup_filename=localhost_platform_backup_2020_07_27_07_48_48.tgz wipe_ceph_osds=true"
+     ~(keystone_admin)]$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "initial_backup_dir=/home/sysadmin ansible_become_pass=St0rlingX* admin_password=St0rlingX* backup_filename=localhost_platform_backup_2020_07_27_07_48_48.tgz wipe_ceph_osds=true"
 
  .. note::
     If the backup contains patches, Ansible Restore playbook will apply
 
@@ -63,4 +63,4 @@ For example:
 
 .. code-block:: none
 
-   ~(keystone_admin)$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_user_images.yml -e "initial_backup_dir=/home/sysadmin backup_filename=localhost_docker_local_registry_backup_2020_07_15_21_24_22.tgz ansible_become_pass=St0rlingX*"
+   ~(keystone_admin)]$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_user_images.yml -e "initial_backup_dir=/home/sysadmin backup_filename=localhost_docker_local_registry_backup_2020_07_15_21_24_22.tgz ansible_become_pass=St0rlingX*"
 
diff --git a/doc/source/backup/system-backup-running-ansible-restore-playbook-remotely.rst b/doc/source/backup/system-backup-running-ansible-restore-playbook-remotely.rst
index 9c3762b73..ca19932e6 100644
--- a/doc/source/backup/system-backup-running-ansible-restore-playbook-remotely.rst
+++ b/doc/source/backup/system-backup-running-ansible-restore-playbook-remotely.rst
@@ -47,7 +47,7 @@ In this method you can run Ansible Restore playbook and point to controller-0.
 
   .. code-block:: none
 
-     ~(keystone_admin)$ ansible-playbook path-to-restore-platform-playbook-entry-file --limit host-name -i inventory-file -e optional-extra-vars
+     ~(keystone_admin)]$ ansible-playbook path-to-restore-platform-playbook-entry-file --limit host-name -i inventory-file -e optional-extra-vars
 
  where optional-extra-vars can be:
 
@@ -89,7 +89,7 @@ In this method you can run Ansible Restore playbook and point to controller-0.
 
   .. parsed-literal::
 
-     ~(keystone_admin)$ ansible-playbook /localdisk/designer/jenkins/tis-stx-dev/cgcs-root/stx/ansible-playbooks/playbookconfig/src/playbooks/restore_platform.yml --limit |prefix|\_Cluster -i $HOME/br_test/hosts -e "ansible_become_pass=St0rlingX* admin_password=St0rlingX* ansible_ssh_pass=St0rlingX* initial_backup_dir=$HOME/br_test backup_filename= |prefix|\_Cluster_system_backup_2019_08_08_15_25_36.tgz ansible_remote_tmp=/home/sysadmin/ansible-restore"
+     ~(keystone_admin)]$ ansible-playbook /localdisk/designer/jenkins/tis-stx-dev/cgcs-root/stx/ansible-playbooks/playbookconfig/src/playbooks/restore_platform.yml --limit |prefix|\_Cluster -i $HOME/br_test/hosts -e "ansible_become_pass=St0rlingX* admin_password=St0rlingX* ansible_ssh_pass=St0rlingX* initial_backup_dir=$HOME/br_test backup_filename= |prefix|\_Cluster_system_backup_2019_08_08_15_25_36.tgz ansible_remote_tmp=/home/sysadmin/ansible-restore"
 
  .. note::
     If the backup contains patches, Ansible Restore playbook will apply
 
@@ -105,7 +105,7 @@ In this method you can run Ansible Restore playbook and point to controller-0.
 
   .. code-block:: none
 
-     ~(keystone_admin)$ ansible-playbook path-to-restore-user-images-playbook-entry-file --limit host-name -i inventory-file -e optional-extra-vars
+     ~(keystone_admin)]$ ansible-playbook path-to-restore-user-images-playbook-entry-file --limit host-name -i inventory-file -e optional-extra-vars
 
  where optional-extra-vars can be:
 
@@ -144,4 +144,4 @@ In this method you can run Ansible Restore playbook and point to controller-0.
 
   .. parsed-literal::
 
-     ~(keystone_admin)$ ansible-playbook /localdisk/designer/jenkins/tis-stx-dev/cgcs-root/stx/ansible-playbooks/playbookconfig/src/playbooks/restore_user_images.ym --limit |prefix|\_Cluster -i $HOME/br_test/hosts -e "ansible_become_pass=St0rlingX* ansible_ssh_pass=St0rlingX* initial_backup_dir=$HOME/br_test backup_filename= |prefix|\_Cluster_docker_local_registry_backup_2020_07_15_21_24_22.tgz ansible_remote_tmp=/sufficient/space backup_dir=/sufficient/space"
+     ~(keystone_admin)]$ ansible-playbook /localdisk/designer/jenkins/tis-stx-dev/cgcs-root/stx/ansible-playbooks/playbookconfig/src/playbooks/restore_user_images.yml --limit |prefix|\_Cluster -i $HOME/br_test/hosts -e "ansible_become_pass=St0rlingX* ansible_ssh_pass=St0rlingX* initial_backup_dir=$HOME/br_test backup_filename= |prefix|\_Cluster_docker_local_registry_backup_2020_07_15_21_24_22.tgz ansible_remote_tmp=/sufficient/space backup_dir=/sufficient/space"
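+
+   If you are unsure which archive to pass as backup_filename, you can list
+   the contents of a backup tarball first with standard tar; a suggested
+   check, where <backup_filename> is the tgz file created by the backup:
+
+   .. code-block:: none
+
+      $ tar -tzf <backup_filename> | head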