docs/doc/source/backup/kubernetes/backing-up-starlingx-system-data.rst
Elisamara Aoki Goncalves 6c138a86cf Document the best practices, considerations, and recommendation for backups (r8,dsR8)
Update Back Up System Data section.
Applied editorial fixes.

Change-Id: I72dc57a185ef40f9ca98ffa5fbd841d3ecdffa49
Signed-off-by: Elisamara Aoki Goncalves <elisamaraaoki.goncalves@windriver.com>
2023-08-14 13:49:48 +00:00


Back Up System Data

A system data backup captures the core system information needed to restore a fully operational cluster.

Contents of System Backup

The following content is included in the backup:

  • All platform configuration data required to fully restore the system to a working state following the platform restore procedure.
    • Platform and Kubernetes databases.
    • Platform configuration files.
    • Platform certificates and keys.
  • Home directory for the sysadmin user and all LDAP user accounts.
  • End-user container images in registry.local; that is, any images other than system and application images. System and application images are re-pulled from their original source, and optionally from external registries, during the restore procedure.
  • Distributed Cloud Vault (Central System Controller only).

The following content is excluded from the backup:

  • Application data on Ceph clusters.
  • Manual modifications to the file systems, such as configuration changes in the /etc directory. These modifications must be reapplied after a restore operation has completed.
  • Home directories and passwords of local user accounts. They must be backed up manually by the sysadmin.
  • The /root directory. Use the sysadmin account instead when root access is needed.

Note

Ceph data may be retained when restoring to the same servers and cluster.

System Backup Size

Consider the following for backup size:

  • The base size of a platform system backup ranges from 10MB to 30MB, depending on the size of the system and deployment. Systems are typically 20MB or less.
  • Backup of user home directories can cause the backup archive to be very large and is limited to 2GB or less.
  • Total backup size should be below 100MB when using centralized backup and restore operations.
  • Container images are large and will only be backed up locally to avoid large image archives being transferred for each system. Container images that are not present on the system may be pulled as part of platform and application deployment, or restored separately to the local registry (registry.local).
  • There can also be a significant size impact when patching is included in the backup.
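The size guidance above can be turned into a quick advisory check before an archive is transferred. The following is a minimal sketch and is not part of the platform tooling: the helper name `check_backup_size` is hypothetical, and decimal units (1MB = 10**6 bytes) are an assumption, since the guidance does not say whether MB or MiB is meant.

```python
# Thresholds taken from the guidance above. Decimal units are an assumption.
MB = 10**6
TYPICAL_BASE_MAX = 30 * MB     # upper end of the typical platform backup range
CENTRALIZED_LIMIT = 100 * MB   # recommended ceiling for centralized backups


def check_backup_size(size_bytes, centralized=False):
    """Return advisory warnings for a backup archive of the given size.

    Hypothetical helper for illustration only; not a platform API.
    """
    warnings = []
    if size_bytes > TYPICAL_BASE_MAX:
        warnings.append(
            "larger than the typical 10MB to 30MB platform backup; "
            "check home directories and patch content"
        )
    if centralized and size_bytes >= CENTRALIZED_LIMIT:
        warnings.append("should be below 100MB for centralized backup")
    return warnings
```

In practice the size could come from `os.path.getsize()` on the archive, for example `check_backup_size(os.path.getsize(path), centralized=True)`.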

System Backup Filesystem Usage

The following filesystems are used during the backup operations of the system for both local and centralized backup.

Staging Storage

The host filesystem used to stage temporary files during backup operations. The filesystem may also be used to store final backup images if the filesystem is sufficiently sized to store the backup archives.

Host filesystem name: backup

System path: /opt/backups

Default size: 25GB

For more information on how to modify the host filesystem sizes see Resize Filesystems on a Host <resizing-filesystems-on-a-host>.
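Before deciding whether to resize the staging filesystem, it can help to confirm how much free space it currently has. The sketch below uses only the Python standard library; the helper name `has_room` and the idea of checking against an expected archive size are illustrative assumptions, not a documented procedure.

```python
import shutil


def has_room(path, needed_bytes):
    """Return True if the filesystem holding `path` has at least
    `needed_bytes` free. Hypothetical helper for illustration only."""
    return shutil.disk_usage(path).free >= needed_bytes
```

For example, `has_room("/opt/backups", 2 * 10**9)` checks for roughly 2GB of free space on the staging filesystem.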

Local Storage

The host filesystem used to store backup files in a protected partition that is not wiped during system reinstallation. The protected local backup partition is typically used by systems where there is no redundant filesystem storage, and it is the default for local backups.

Note

The filesystem is shared with system release pre-staging and needs to be sized for both pre-staging installation media and backup archives.

System path: /opt/platform-backup/backups

Default size: 30GB

Centralized Storage

The Distributed Cloud (DC) Vault filesystem is used to store backup archives when using centralized backup and restore. A separate backup archive is stored per subcloud and release, so the filesystem size must be increased to accommodate the backup archives of all subclouds.

System path: /opt/dc-vault/backups/<subcloud-name>/<release-version>

Default size: 15GB

Note

The filesystem is shared for subcloud deployment and management and must be sized to store subcloud deployment files (subcloud configuration, ISO images and subcloud staging files).

For more information on how to modify the controller filesystem sizes see Storage on Controller Hosts <controller-hosts-storage-on-controller-hosts>.
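Because one archive is kept per subcloud and release, the space needed for centralized backups grows with both. The following is a rough sizing sketch under stated assumptions: the function name `dc_vault_backup_gb` is hypothetical, the default of 100MB per archive is the recommended ceiling for centralized backups rather than a measured value, and decimal units are assumed. It does not account for the subcloud deployment files that share the same filesystem.

```python
def dc_vault_backup_gb(subclouds, releases, backup_mb=100):
    """Estimate the space (in GB) needed for centralized backup archives.

    One archive per subcloud and release is assumed. backup_mb defaults to
    the 100MB ceiling recommended for centralized backups.
    Hypothetical helper for illustration only.
    """
    return subclouds * releases * backup_mb / 1000


# Example: 200 subclouds, 2 releases retained, 100MB per archive.
print(dc_vault_backup_gb(200, 2))  # 40.0
```

In this example the estimate of 40GB is well above the 15GB default size, so the filesystem would need to be resized before backups for all subclouds are stored centrally.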

Distributed Cloud Centralized Backups

A subcloud's system data and, optionally, its container images (from registry.local) can be backed up using the DCManager CLI. The subcloud's system backup data can be stored either locally on the subcloud or on the System Controller. The subcloud's container image backup (from registry.local) can only be stored locally on the subcloud. This avoids overloading the central storage and the network with large data transfers, and avoids redundant storage of images in a central location.


For more information on the operation of the centralized backup capability see Backup a Subcloud/Group of Subclouds using DCManager CLI <backup-a-subcloud-group-of-subclouds-using-dcmanager-cli-f12020a8fc42>.

For more information on DCManager - Subcloud Backup API see Subcloud Backups.

Execution Time for System Backups

  • The time to execute system backups is approximately 3-4 minutes for an idle system.
  • Centralized backups may require additional time for network transfer for larger backups.
  • Subcloud backups may be initiated and monitored from the DCManager or API, including parallel backups.
  • A minor alarm (210.001) "System Backup in progress" is raised while backing up an individual system.
  • Systems with at least 4 platform cores will have much faster execution times.
  • All backups should be performed remotely and stored off the system.
  • All backups should be done during off-peak hours (that is, during a maintenance window).
    • Weekly backups should be performed under normal steady state conditions to ensure the system can be restored to a fully operational state.
    • Nightly backups are the exception and should only be performed in periods of significant reconfiguration to the system such as during large/mass rollout (addition of subclouds), upgrade cycle of multiple sites, or disaster recovery rehoming of subclouds.
  • Backups should be performed prior to performing maintenance operations or applying configuration changes to the platform or hosted applications.
  • The retention period of backups should be approximately one month.
    • Since Kubernetes is an intent-based system, the most recent backup is the most important.
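The retention guidance above (keep roughly one month of backups, and treat the most recent backup as the most important) can be sketched as a simple selection rule. This is an illustrative sketch only: the function name `select_expired`, the 30-day default, and the mapping of archive names to creation times are assumptions, and nothing here deletes files.

```python
from datetime import datetime, timedelta


def select_expired(backups, now, retention_days=30):
    """Return the names of backups older than the retention period.

    `backups` maps an archive name to its creation datetime. The most
    recent backup is never selected, regardless of its age.
    Hypothetical helper for illustration only.
    """
    if not backups:
        return []
    newest = max(backups, key=backups.get)
    cutoff = now - timedelta(days=retention_days)
    return sorted(
        name
        for name, created in backups.items()
        if created < cutoff and name != newest
    )
```

Keeping the newest archive even when it is past the cutoff means a site that has missed several backup cycles still retains one restorable backup.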

Run Ansible Backup Playbook Locally on the Controller <running-ansible-backup-playbook-locally-on-the-controller>

Run Ansible Backup Playbook Remotely <running-ansible-backup-playbook-remotely>