operations-guide/doc/openstack-ops/ch_ops_backup_recovery.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter [
<!-- Some useful entities borrowed from HTML -->
<!ENTITY ndash  "&#x2013;">
<!ENTITY mdash  "&#x2014;">
<!ENTITY hellip "&#x2026;">
<!ENTITY plusmn "&#xB1;">
]>
<chapter xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0"
    xml:id="backup_and_recovery">
    <?dbhtml stop-chunking?>
    <title>Backup and Recovery</title>
    <para>Standard backup best practices apply when creating your
        OpenStack backup policy. For example, how often to backup your
        data is closely related to how quickly you need to recover
        from data loss.</para>
        <note>
        <para>If you cannot have any data loss at all, you should also
            focus on a highly available deployment. The
                    <citetitle><link
                    xlink:href="http://docs.openstack.org/high-availability-guide/content/"
                    >OpenStack High Availability Guide offers
                    suggestions for elimination of a single point of
                    failure that could cause system downtime. While it
                    is not a completely prescriptive document, it
                    offers methods and techniques for avoiding
                    downtime and data loss.</link></citetitle>
        </para></note>
    <para>Other backup considerations include:</para>
    <itemizedlist>
        <listitem>
            <para>How many backups to keep?</para>
        </listitem>
        <listitem>
            <para>Should backups be kept off-site?</para>
        </listitem>
        <listitem>
            <para>How often should backups be tested?</para>
        </listitem>
    </itemizedlist>
    <para>Just as important as a backup policy is a recovery policy
        (or at least recovery testing).</para>


    <section xml:id="what_to_backup">
        <title>What to Backup</title>
        <para>While OpenStack is composed of many components and
            moving parts, backing up the critical data is quite
            simple.</para>
        <para>This chapter describes only how to back up configuration
            files and databases that the various OpenStack components
            need to run. This chapter does not describe how to back up
            objects inside Object Storage or data contained inside
            Block Storage. Generally these areas are left for the user
            to back up on their own.</para>
    </section>
    <section xml:id="database_backups">
        <title>Database Backups</title>
        <para>The example OpenStack architecture designates the Cloud
            Controller as the MySQL server. This MySQL server hosts
            the databases for Nova, Glance, Cinder, and Keystone. With
            all of these databases in one place, it's very easy to
            create a database backup:</para>

        <programlisting language="bash"><?db-font-size 75%?><prompt>#</prompt> mysqldump --opt --all-databases &gt;
                openstack.sql</programlisting>

        <para>If you only want to backup a single database, you can
            instead run:</para>
        <programlisting language="bash"><?db-font-size 75%?><prompt>#</prompt> mysqldump --opt nova &gt; nova.sql</programlisting>
        <para>where <code>nova</code> is the database you want to back
            up.</para>
        <para>You can easily automate this process by creating a cron
            job that runs the following script once per day:</para>
        <programlisting language="bash"><?db-font-size 65%?><prompt>#</prompt>!/bin/bash
backup_dir="/var/lib/backups/mysql"
filename="${backup_dir}/mysql-`hostname`-`eval date +%Y%m%d`.sql.gz"
# Dump the entire MySQL database
/usr/bin/mysqldump --opt --all-databases | gzip &gt; $filename
# Delete backups older than 7 days
find $backup_dir -ctime +7 -type f -delete</programlisting>
        <para>This script dumps the entire MySQL database and delete
            any backups older than 7 days.</para>
    </section>
    <section xml:id="file_system_backups">
        <title>File System Backups</title>
        <para>This section discusses which files and directories should be backed up regularly, organized by service.</para>
        <section xml:id="compute">
            <title>Compute</title>
            <para>The <code>/etc/nova</code> directory on both the
                cloud controller and compute nodes should be regularly
                backed up.</para>
            <para>
                <code>/var/log/nova</code> does not need backed up if
                you have all logs going to a central area. It is
                highly recommended to use a central logging server or
                backup the log directory.</para>
            <para>
                <code>/var/lib/nova</code> is another important
                directory to backup. The exception to this is the
                    <code>/var/lib/nova/instances</code> subdirectory
                on compute nodes. This subdirectory contains the KVM
                images of running instances. You would only want to
                back up this directory if you need to maintain backup
                copies of all instances. Under most circumstances, you
                do not need to do this, but this can vary from cloud
                to cloud and your service levels. Also be aware that
                making a backup of a live KVM instance can cause that
                instance to not boot properly if it is ever restored
                from a backup.</para>
        </section>
        <section xml:id="image_catalog_delivery">
            <title>Image Catalog and Delivery</title>
            <para>
                <code>/etc/glance</code> and
                    <code>/var/log/glance</code> follow the same rules
                at the nova counterparts.</para>
            <para>
                <code>/var/lib/glance</code> should also be backed up.
                Take special notice of
                    <code>/var/lib/glance/images</code>. If you are
                using a file-based back-end of Glance,
                    <code>/var/lib/glance/images</code> is where the
                images are stored and care should be taken.</para>
            <para>There are two ways to ensure stability with this
                directory. The first is to make sure this directory is
                run on a RAID array. If a disk fails, the directory is
                available. The second way is to use a tool such as
                rsync to replicate the images to another
                server:</para>
            <para># rsync -az --progress /var/lib/glance/images
                backup-server:/var/lib/glance/images/</para>
        </section>
        <section xml:id="identity">
            <title>Identity</title>
            <para>
                <code>/etc/keystone</code> and
                    <code>/var/log/keystone</code> follow the same
                rules as other components.</para>
            <para>
                <code>/var/lib/keystone</code>, while should not
                contain any data being used, can also be backed up
                just in case.</para>
        </section>
        <section xml:id="ops_block_storage">
            <title>Block Storage</title>
            <para>
                <code>/etc/cinder</code> and
                    <code>/var/log/cinder</code> follow the same rules
                as other components.</para>
            <para>
                <code>/var/lib/cinder</code> should also be backed
                up.</para>
        </section>
        <section xml:id="ops_object_storage">
            <title>Object Storage</title>
            <para>
                <code>/etc/swift</code> is very important to have
                backed up. This directory contains the Swift
                configuration files as well as the ring files and ring
                    <glossterm>builder file</glossterm>s, which if
                lost render the data on your cluster inaccessible. A
                best practice is to copy the builder files to all
                storage nodes along with the ring files. Multiple
                backups copies are spread throughout your storage
                cluster.</para>
        </section>
    </section>
    <section xml:id="recovering_backups">
        <title>Recovering Backups</title>
        <para>Recovering backups is a fairly simple process. To begin,
            first ensure that the service you are recovering is not
            running. For example, to do a full recovery of nova on the
            cloud controller, first stop all <code>nova</code>
            services:</para>
        <programlisting language="bash"><?db-font-size 65%?><prompt>#</prompt> stop nova-api
      # stop nova-cert
      # stop nova-consoleauth
      # stop nova-novncproxy
      # stop nova-objectstore
      # stop nova-scheduler</programlisting>
        <para>Now you can import a previously backed up
            database:</para>
        <programlisting language="bash"><?db-font-size 65%?><prompt>#</prompt> mysql nova &lt; nova.sql</programlisting>
        <para>As well as restore backed up nova directories:</para>
        <programlisting language="bash"><?db-font-size 65%?><prompt>#</prompt> mv /etc/nova{,.orig}
      # cp -a /path/to/backup/nova /etc/</programlisting>
        <para>Once the files are restored, start everything back
            up:</para>
        <programlisting language="bash"><?db-font-size 65%?><prompt>#</prompt> start mysql
      # for i in nova-api nova-cert nova-consoleauth nova-novncproxy nova-objectstore nova-scheduler
      &gt; do
      &gt; start $i
      &gt; done
      </programlisting>
        <para>Other services follow the same process, with their
            respective directories and databases.</para>
    </section>
    <section xml:id="ops-backup-recovery-summary">
        <title>Summary</title>
        <para>Backup and subsequent recovery is one of the first tasks system
            administrators learn. However, each system has different
            items that need attention. By taking care of your database, image
            service and appropriate file system locations, you can be assured
            you can handle any event requiring recovery.</para>
        </section>
</chapter>