
This patch addresses the comments made during the latest round of edits from O'Reilly. Change-Id: I7deaceacd319775c9960377074728538aa0b0314
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter [
<!-- Some useful entities borrowed from HTML -->
<!ENTITY ndash "–">
<!ENTITY mdash "—">
<!ENTITY hellip "…">
<!ENTITY plusmn "±">
<!ENTITY CHECK '<inlinemediaobject xmlns="http://docbook.org/ns/docbook">
<imageobject>
<imagedata fileref="figures/Check_mark_23x20_02.svg"
format="SVG" scale="60"/>
</imageobject>
</inlinemediaobject>'>
]>
<chapter xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0"
xml:id="storage_decision">
<?dbhtml stop-chunking?>
<title>Storage Decisions</title>
<para>Storage is found in many parts of the OpenStack stack, and
the differing types can confuse even experienced cloud
engineers. This section focuses on persistent storage
options you can configure with your cloud.</para>
<section xml:id="ephemeral_storage">
<title>Ephemeral Storage</title>
<para>If you only deploy the OpenStack Compute Service (nova),
your users do not have access to any form of persistent
storage by default. The disks associated with VMs are
"ephemeral", meaning that (from the user's point of view)
they effectively disappear when a virtual machine is
terminated. You must identify what type of persistent
storage you want to support for your users.</para>
<para>Today, OpenStack clouds explicitly support two types of
persistent storage: <emphasis>object storage</emphasis>
and <emphasis>block storage</emphasis>.</para></section>
<section xml:id="persistent_storage">
<title>Persistent Storage</title>
<para>Persistent storage means that the storage resource outlives any
other resource and is always available, regardless of the state of a
running instance.</para>
<section xml:id="object_storage">
<title>Object Storage</title>
<para>With object storage, users access binary objects
through a REST API. You may be familiar with Amazon
S3, which is a well-known example of an object storage
system. Object storage is implemented in OpenStack by
the OpenStack Object Storage (swift) project. If your
intended users need to archive or manage large
datasets, you want to provide them with object
storage. In addition, OpenStack can store your virtual
machine (VM) images inside of an object storage
system, as an alternative to storing the images on a
file system.</para>
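<para>For illustration, the following is a minimal sketch of
uploading an object through the Object Storage REST API using
the Python <code>requests</code> library. The endpoint URL,
authentication token, container, and object names are
placeholders; in a real deployment you obtain the storage URL
and token from the Identity service.</para>
<programlisting language="python"><![CDATA[
# Minimal sketch: upload an object via the Object Storage REST API.
# The endpoint, token, container, and object names are placeholders;
# in practice the storage URL and token come from Keystone.
import requests

SWIFT_URL = "https://swift.example.com/v1/AUTH_demo"   # hypothetical endpoint
HEADERS = {"X-Auth-Token": "REPLACE_WITH_AUTH_TOKEN"}  # hypothetical token

# Create (or ensure) a container, then upload an object into it.
requests.put(SWIFT_URL + "/backups", headers=HEADERS)
with open("database.dump", "rb") as f:
    resp = requests.put(SWIFT_URL + "/backups/database.dump",
                        headers=HEADERS, data=f)
print(resp.status_code)  # 201 means the object was created
]]></programlisting>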
<para>OpenStack Object Storage provides a highly scalable,
highly available storage solution by relaxing some of the
constraints of traditional file systems. In designing and
procuring for such a cluster, it is important to
understand some key concepts about its operation.
Essentially, this type of storage is built on the idea
that all storage hardware fails, at every level, at some
point. Infrequently encountered failures that would
hamstring other storage systems, such as the failure of a
RAID card or of an entire server, are handled gracefully
by OpenStack Object Storage.</para>
<para>A good document describing the Object Storage
architecture can be found in <link
xlink:title="Object Storage architecture"
xlink:href="http://docs.openstack.org/developer/swift/overview_architecture.html"
>the developer documentation</link>
(http://docs.openstack.org/developer/swift/overview_architecture.html);
read it first. Once you understand the
architecture, you should know what a proxy server does and
how zones work. However, some important points are often
missed at first glance.</para>
<para>When designing your cluster, you must consider
durability and availability. Understand that the
predominant source of these is the spread and placement of
your data, rather than the reliability of the hardware.
Consider the default value of the number of replicas,
which is 3. This means that before an object is marked as
having been written, at least two copies exist; in case
a single server fails to write, the third copy may or may
not yet exist when the write operation initially returns.
Increasing this number increases the robustness of your
data, but reduces the amount of storage you have
available. Next, look at the placement of your servers.
Consider spreading them widely throughout your data
center's network and power-failure zones. Is a zone a
rack, a server, or a disk?</para>
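<para>The trade-off between durability and usable capacity is
simple arithmetic. The following sketch illustrates it; the
raw-capacity figure is illustrative only.</para>
<programlisting language="python"><![CDATA[
# Back-of-the-envelope sketch: usable capacity for a given replica count.
# The raw capacity figure below is illustrative, not a recommendation.
def usable_capacity_tb(raw_tb, replicas=3):
    """Usable capacity once every object is stored 'replicas' times."""
    return raw_tb / replicas

raw = 20 * 24  # e.g., 20 servers with 24 TB of disk each
for replicas in (2, 3, 4):
    print(replicas, "replicas:",
          round(usable_capacity_tb(raw, replicas), 1), "TB usable")
]]></programlisting>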
<para>Object Storage's network patterns might seem unfamiliar
at first. Consider these main traffic flows: <itemizedlist>
<listitem>
<para>Among <glossterm>object</glossterm>,
<glossterm>container</glossterm>, and
<glossterm>account
server</glossterm>s</para>
</listitem>
<listitem>
<para>Between those servers and the proxies</para>
</listitem>
<listitem>
<para>Between the proxies and your users</para>
</listitem>
</itemizedlist></para>
<para>Object Storage is very "chatty" among servers hosting
data; even a small cluster generates megabytes per second of
traffic, which is predominantly "Do you have the
object?"/"Yes, I have the object!" Of course, if the
answer to the aforementioned question is negative or times
out, replication of the object begins.</para>
<para>Consider the scenario where an entire server fails and
24 TB of data needs to be transferred "immediately" to
maintain three copies; this can put significant load on
the network.</para>
<para>Another oft-forgotten fact is that when a new file is
being uploaded, the proxy server must write out as many
streams as there are replicas, multiplying the outbound
network traffic. For a 3-replica cluster, 10 Gbps in means
30 Gbps out. Combining this with the high bandwidth
demands of replication is what results in the
recommendation that your private network have
significantly more bandwidth than your public network
needs. In addition, OpenStack Object Storage communicates
internally with unencrypted, unauthenticated rsync for
performance, so you do want the private network to be
private.</para>
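<para>The following sketch works through the bandwidth
arithmetic described above: the proxy write fan-out and the
time needed to re-replicate a failed server's data. All
figures are illustrative rather than sizing
recommendations.</para>
<programlisting language="python"><![CDATA[
# Sketch of the bandwidth arithmetic described above. Both the proxy
# write fan-out (one outgoing stream per replica) and the time needed
# to re-replicate a failed server's data are simple arithmetic.
# All figures are illustrative, not sizing recommendations.

def proxy_egress_gbps(ingress_gbps, replicas=3):
    """Private-network egress for a given public ingress."""
    return ingress_gbps * replicas

def rereplication_hours(data_tb, spare_gbps):
    """Rough time to copy data_tb over spare_gbps of network headroom."""
    bits = data_tb * 8 * 10**12           # TB -> bits (decimal units)
    return bits / (spare_gbps * 10**9) / 3600

print(proxy_egress_gbps(10))              # 30: 10 Gbps in means 30 Gbps out
print(round(rereplication_hours(24, 5)))  # ~11 hours to re-copy 24 TB at 5 Gbps
]]></programlisting>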
<para>The remaining point on bandwidth is the public-facing
portion. The swift-proxy service is stateless, which means
that you can easily add more proxies and use HTTP
load-balancing methods to share bandwidth and availability
between them.</para>
<para>More proxies mean more bandwidth, if your storage can
keep up.</para>
</section>
<section xml:id="block_storage">
<title>Block Storage</title>
<para>Block storage (sometimes referred to as volume
storage) provides users with access to block storage
devices. Users interact with block storage by
attaching volumes to their running VM
instances.</para>
<para>These volumes are persistent: they can be detached
from one instance and re-attached to another, and the
data remains intact. Block storage is implemented in
OpenStack by the OpenStack Block Storage (Cinder)
project, which supports multiple back-ends in the form
of drivers. Your choice of a storage back-end must be
supported by a Block Storage driver.</para>
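<para>As an illustration of the workflow, the following is a
minimal sketch that creates a volume through the Block
Storage REST API and attaches it to a running instance
through the Compute REST API. The endpoints, token, and IDs
are placeholders; in practice they come from the Identity
service catalog, and most operators use the command-line
clients or the dashboard instead.</para>
<programlisting language="python"><![CDATA[
# Minimal sketch: create a volume via the Block Storage (Cinder) API,
# then attach it to an instance via the Compute (Nova) API.
# Endpoints, token, and IDs are placeholders.
import requests

CINDER = "https://cinder.example.com/v2/PROJECT_ID"  # hypothetical endpoint
NOVA = "https://nova.example.com/v2/PROJECT_ID"      # hypothetical endpoint
HEADERS = {"X-Auth-Token": "REPLACE_WITH_AUTH_TOKEN",
           "Content-Type": "application/json"}

# 1. Ask the Block Storage service for a 10 GB volume.
vol = requests.post(CINDER + "/volumes", headers=HEADERS,
                    json={"volume": {"size": 10, "name": "data-volume"}}).json()

# 2. Once the volume is "available", ask Compute to attach it.
server_id = "REPLACE_WITH_SERVER_ID"
attach = {"volumeAttachment": {"volumeId": vol["volume"]["id"],
                               "device": "/dev/vdb"}}
requests.post(NOVA + "/servers/" + server_id + "/os-volume_attachments",
              headers=HEADERS, json=attach)
]]></programlisting>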
<para>Most block storage drivers allow the instance to
have direct access to the underlying storage
hardware's block device. This helps increase the
overall read/write I/O performance.</para>
<para>Experimental support for utilizing files as volumes
began in the Folsom release. It started as
a reference driver for using NFS with Cinder. By
the Grizzly release, this had expanded into a full NFS
driver as well as a GlusterFS driver.</para>
<para>These drivers work a little differently than a
traditional "block" storage driver. On an NFS or
GlusterFS file system, a single file is created and
then mapped as a "virtual" volume into the instance.
This mapping/translation is similar to how OpenStack
utilizes QEMU's file-based virtual machines stored in
<code>/var/lib/nova/instances</code>.</para>
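<para>To illustrate the "one file per volume" model, the
following sketch creates a sparse backing file by hand on a
hypothetical NFS or GlusterFS mount point. This is only an
illustration of the concept; the actual drivers create and
manage these files for you.</para>
<programlisting language="python"><![CDATA[
# Illustration only: the NFS/GlusterFS drivers ultimately back each
# volume with a single (usually sparse) file on the shared mount.
# This sketch creates such a file by hand; the real drivers do this
# internally. The mount point below is a placeholder.
import os

MOUNT = "/var/lib/cinder/mnt/example"  # hypothetical NFS/GlusterFS mount
GIB = 1024 ** 3

def create_backing_file(name, size_gib):
    path = os.path.join(MOUNT, name)
    with open(path, "wb") as f:
        f.truncate(size_gib * GIB)     # sparse: no blocks written yet
    return path

print(create_backing_file("volume-0000", 10))  # a 10 GiB "virtual" volume
]]></programlisting>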
</section>
</section>
<section xml:id="storage_concepts">
<title>OpenStack Storage Concepts</title>
<table xml:id="openstack_storage" rules="all">
<caption>OpenStack Storage</caption>
<thead>
<tr>
<th/>
<th>Ephemeral storage</th>
<th>Block storage</th>
<th>Object storage</th>
</tr>
</thead>
<tbody>
<tr>
<td><para>Used to…</para></td>
<td><para>Run operating system and scratch
space</para></td>
<td><para>Add additional persistent storage to a
virtual machine (VM)</para></td>
<td><para>Store data, including VM
images</para></td>
</tr>
<tr>
<td><para>Accessed through…</para></td>
<td><para>A file system</para></td>
<td><para>A <glossterm>block device</glossterm>
that can be partitioned, formatted, and
mounted (for example, /dev/vdc)</para></td>
<td><para>REST API</para></td>
</tr>
<tr>
<td><para>Accessible from…</para></td>
<td><para>Within a VM</para></td>
<td><para>Within a VM</para></td>
<td><para>Anywhere</para></td>
</tr>
<tr>
<td><para>Managed by…</para></td>
<td><para>OpenStack Compute (Nova)</para></td>
<td><para>OpenStack Block Storage
(Cinder)</para></td>
<td><para>OpenStack Object Storage
(Swift)</para></td>
</tr>
<tr>
<td><para>Persists until…</para></td>
<td><para>VM is terminated</para></td>
<td><para>Deleted by user</para></td>
<td><para>Deleted by user</para></td>
</tr>
<tr>
<td><para>Sizing determined by…</para></td>
<td><para>Administrator configures size settings,
known as <emphasis>flavors</emphasis>
</para></td>
<td><para>Specified by user in initial
request</para></td>
<td><para>Amount of available physical
storage</para></td>
</tr>
<tr>
<td><para>Example of typical
usage…</para></td>
<td><para>10 GB first disk, 30 GB second
disk</para></td>
<td><para>1 TB disk</para></td>
<td>
<para>10s of TBs of dataset storage</para>
</td>
</tr>
</tbody>
</table>
<section xml:id="file_level_storage">
<!-- FIXME: change to an aside -->
<title>File-level Storage (for Live Migration)</title>
<para>With file-level storage, users access stored data
using the operating system's file system interface.
Most users, if they have used a network storage
solution before, have encountered this form of
networked storage. In the Unix world, the most common
form of this is NFS. In the Windows world, the most
common form is called CIFS (previously, SMB).</para>
<para>OpenStack clouds do not present file-level storage
to end users. However, it is important to consider
file-level storage for storing instances under
<code>/var/lib/nova/instances</code> when
designing your cloud, since you must have a shared
file system if you wish to support live
migration.</para>
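<para>As a quick sanity check before enabling live migration,
an operator might verify that the instances directory on a
compute node really does sit on a shared file system. The
following Linux-only sketch does this by parsing
<code>/proc/mounts</code>; the list of "shared" file system
types is an assumption you should adjust for your
environment.</para>
<programlisting language="python"><![CDATA[
# Quick check (Linux only): is /var/lib/nova/instances backed by a
# network file system? A shared instances directory is a prerequisite
# for the simple shared-storage live migration setup described above.
# The set of "shared" filesystem types below is an assumption.
import os

INSTANCES_DIR = "/var/lib/nova/instances"
SHARED_FS_TYPES = {"nfs", "nfs4", "fuse.glusterfs", "ceph"}

def mount_for(path):
    """Return (mount point, fstype) of the longest mount-point prefix of path."""
    best = ("/", "unknown")
    with open("/proc/mounts") as mounts:
        for line in mounts:
            _dev, mnt, fstype = line.split()[:3]
            if path.startswith(mnt) and len(mnt) >= len(best[0]):
                best = (mnt, fstype)
    return best

mnt, fstype = mount_for(os.path.realpath(INSTANCES_DIR))
print(INSTANCES_DIR, "is", fstype, "mounted at", mnt)
if fstype not in SHARED_FS_TYPES:
    print("Looks local: live migration needs shared storage here")
]]></programlisting>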
</section>
</section>
<?hard-pagebreak?>
<section xml:id="storage_backends">
<title>Choosing Storage Back-ends</title>
<para>Users will indicate different needs for their cloud use
cases. Some may need fast access to many objects that do
not change often, or want to set a Time To Live (TTL)
value on a file. Others may only access storage that is
mounted with the file system itself, but want it to be
replicated instantly when starting a new instance. Still
others may need ephemeral storage that is released when
the VM attached to it is shut down. When you select
<glossterm>storage back-end</glossterm>s, ask the
following questions on behalf of your users:</para>
<itemizedlist role="compact">
<listitem>
<para>Do my users need block storage?</para>
</listitem>
<listitem>
<para>Do my users need object storage?</para>
</listitem>
<listitem>
<para>Do I need to support live migration?</para>
</listitem>
<listitem>
<para>Should my persistent storage drives be contained
in my compute nodes, or should I use external
storage?</para>
</listitem>
<listitem>
<para>What is the platter count I can achieve? Do more
spindles result in better I/O despite network
access?</para>
</listitem>
<listitem>
<para>Which one results in the best cost-performance
ratio for the scenario I'm aiming for?</para>
</listitem>
<listitem>
<para>How do I manage the storage
operationally?</para>
</listitem>
<listitem>
<para>How redundant and distributed is the storage?
What happens if a storage node fails? To what
extent can it mitigate my data-loss disaster
scenarios?</para>
</listitem>
</itemizedlist>
<para>To deploy your storage by using entirely commodity
hardware, you can use a number of open-source packages, as
shown in the following table:</para>
<table xml:id="storage_solutions" rules="all">
<caption>Persistent file-based storage support</caption>
<thead>
<tr>
<th> </th>
<th>Object</th>
<th>Block</th>
<th>File-level* (live migration support)</th>
</tr>
</thead>
<tbody>
<tr>
<td><para>Swift</para></td>
<td><para>&CHECK;</para></td>
<td><para> </para></td>
<td><para> </para></td>
</tr>
<tr>
<td><para>LVM</para></td>
<td><para> </para></td>
<td><para>&CHECK;</para></td>
<td><para> </para></td>
</tr>
<tr>
<td><para>Ceph</para></td>
<td><para>&CHECK;</para></td>
<td><para>&CHECK;</para></td>
<td><para>experimental</para></td>
</tr>
<tr>
<td><para>Gluster</para></td>
<td><para>&CHECK;</para></td>
<td><para> </para></td>
<td><para>&CHECK;</para></td>
</tr>
<tr>
<td><para>NFS</para></td>
<td><para/></td>
<td><para>&CHECK;</para></td>
<td><para>&CHECK;</para></td>
</tr>
<tr>
<td><para>ZFS</para></td>
<td><para> </para></td>
<td><para>&CHECK;</para></td>
<td><para> </para></td>
</tr>
</tbody>
</table>
<para>* This list of open-source file-level shared storage
solutions is not exhaustive; other open-source solutions
exist (MooseFS, for example). Your organization may already
have deployed a file-level shared storage solution that you
can use.</para>
<para>In addition to the open-source technologies, there are a
number of proprietary solutions that are officially
supported by OpenStack Block Storage. They are offered by
the following vendors:</para>
<itemizedlist role="compact">
<listitem>
<para>IBM (Storwize family/SVC, XIV)</para>
</listitem>
<listitem>
<para>NetApp</para>
</listitem>
<listitem>
<para>Nexenta</para>
</listitem>
<listitem>
<para>SolidFire</para>
</listitem>
</itemizedlist>
<para>You can find a matrix of the functionality provided by
all of the supported Block Storage drivers on the <link
xlink:title="OpenStack wiki"
xlink:href="https://wiki.openstack.org/wiki/CinderSupportMatrix"
>OpenStack wiki</link>
(https://wiki.openstack.org/wiki/CinderSupportMatrix).</para>
<para>Also, you need to decide whether you want to support
object storage in your cloud. The two common use cases for
providing object storage in a compute cloud are:</para>
<itemizedlist role="compact">
<listitem>
<para>To provide users with a persistent storage
mechanism</para>
</listitem>
<listitem>
<para>As a scalable, reliable data store for virtual
machine images</para>
</listitem>
</itemizedlist>
<section xml:id="commodity_storage_backends">
<title>Commodity Storage Back-end Technologies</title>
<para>This section provides a high-level overview of the
differences among the commodity storage
back-end technologies. Depending on your cloud users'
needs, you can implement one or many of these
technologies in different combinations.</para>
<itemizedlist role="compact">
<listitem>
<para><emphasis role="bold">OpenStack Object
Storage (Swift)</emphasis>. The official
OpenStack Object Store implementation. It is a
mature technology that has been used for
several years in production by Rackspace as
the technology behind Rackspace Cloud Files.
As it is highly scalable, it is well-suited to
managing petabytes of storage. OpenStack
Object Storage's advantages are better
integration with OpenStack (it integrates with
OpenStack Identity and works with the
OpenStack dashboard interface), and better
support for multiple data center deployment
through support of asynchronous eventual
consistency replication.</para>
<para>Therefore, if you eventually plan on
distributing your storage cluster across
multiple data centers, if you need unified
accounts for your users for both compute and
object storage, or if you want to control your
object storage with the OpenStack dashboard,
you should consider OpenStack Object Storage.
More detail about OpenStack Object Storage can
be found in the Object Storage section earlier
in this chapter.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Ceph</emphasis>. A
scalable storage solution that replicates data
across commodity storage nodes. Ceph was
originally developed by one of the founders of
DreamHost and is currently used in production
there.</para>
<para>Ceph was designed to expose different types
of storage interfaces to the end-user: it
supports object storage, block storage, and
file system interfaces, although the file
system interface is not yet considered
production-ready. Ceph supports the same API
as Swift for object storage, can be used as a
back-end for Cinder block storage, and can
serve as back-end storage for Glance images.
Ceph supports "thin provisioning", implemented
using copy-on-write.</para>
<para>This can be useful when booting from volume
because a new volume can be provisioned very
quickly. Ceph also supports keystone-based
authentication (as of version 0.56), so it can
be a seamless swap-in for the default
OpenStack Swift implementation.</para>
<para>Ceph's advantages are that it gives the
administrator more fine-grained control over
data distribution and replication strategies,
enables you to consolidate your object and
block storage, enables very fast provisioning
of boot-from-volume instances using thin
provisioning, and supports a distributed file
system interface, though this interface is
<link xlink:title="Ceph FAQ"
xlink:href="http://ceph.com/docs/master/faq/"
>not yet recommended</link>
(http://ceph.com/docs/master/faq/) for use in
production deployment by the Ceph project.</para>
<para>If you wish to manage your object and block
storage within a single system, or if you wish
to support fast boot-from-volume, you should
consider Ceph.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Gluster</emphasis>. A
distributed, shared file system. As of Gluster
version 3.3, you can use Gluster to
consolidate your object storage and file
storage into one unified file and object
storage solution, which is called Gluster For
OpenStack (GFO). GFO uses a customized version
of Swift that enables Gluster to be used as
the back-end storage.</para>
<para>The main reason to use GFO rather than
regular Swift is if you also want to support a
distributed file system, either to support
shared-storage live migration or to provide it
as a separate service to your end-users. If
you wish to manage your object and file
storage within a single system, you should
consider GFO.</para>
</listitem>
<listitem>
<para><emphasis role="bold">LVM</emphasis>. The
Logical Volume Manager, a Linux-based system
that provides an abstraction layer on top of
physical disks to expose logical volumes to
the operating system. The LVM back-end
implements block storage as LVM logical
partitions.</para>
<para>On each host that will house block storage,
an administrator must initially create a
volume group dedicated to Block Storage
volumes (a minimal sketch of this step follows
this list). Blocks are created from LVM
logical volumes.</para>
<note>
<para>LVM does <emphasis>not</emphasis>
provide any replication. Typically,
administrators configure RAID on nodes
that use LVM as block storage to protect
against failures of individual hard
drives. However, RAID does not protect
against a failure of the entire
host.</para>
</note>
</listitem>
<listitem>
<para><emphasis role="bold">ZFS</emphasis>. The
Solaris iSCSI driver for OpenStack Block
Storage implements blocks as ZFS entities. ZFS
is a file system that also has the
functionality of a volume manager. This is
unlike on a Linux system, where there is a
separation of volume manager (LVM) and file
system (such as ext3, ext4, xfs, or btrfs).
ZFS has a number of advantages over ext4,
including improved data integrity
checking.</para>
<para>The ZFS back-end for OpenStack Block Storage
supports only Solaris-based systems such as
Illumos. While there is a Linux port of ZFS,
it is not included in any of the standard
Linux distributions, and it has not been
tested with OpenStack Block Storage. As with
LVM, ZFS does not provide replication across
hosts on its own; you need to add a
replication solution on top of ZFS if your
cloud needs to be able to handle storage node
failures.</para>
<para>We don't recommend ZFS unless you have
previous experience with deploying it, since
the ZFS back-end for Block Storage requires a
Solaris-based operating system and we assume
that your experience is primarily with
Linux-based systems.</para>
</listitem>
</itemizedlist>
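<para>As referenced in the LVM item above, the following is a
minimal sketch of preparing a host for the LVM back-end:
creating a physical volume and a dedicated volume group that
Block Storage carves logical volumes out of. The device path
is a placeholder, and the volume group name must match the
<code>volume_group</code> option in your Block Storage
configuration (it defaults to
<code>cinder-volumes</code>).</para>
<programlisting language="python"><![CDATA[
# Minimal sketch: prepare a storage host for the LVM back-end by
# creating a physical volume and a dedicated volume group.
# Run as root; the device path is a placeholder, and the volume group
# name must match the volume_group option in the Block Storage config.
import subprocess

DEVICE = "/dev/sdb"              # placeholder: disk dedicated to volumes
VOLUME_GROUP = "cinder-volumes"  # default value of the volume_group option

subprocess.check_call(["pvcreate", DEVICE])
subprocess.check_call(["vgcreate", VOLUME_GROUP, DEVICE])
]]></programlisting>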
</section>
</section>
<section xml:id="storagedecisions_conclusion">
<title>Conclusion</title>
<para>Hopefully you now have some considerations in mind and
questions to ask your future cloud users about their
storage use cases. As you can see, your storage decisions
will also influence your network design for performance
and security needs. Continue with us to make more informed
decisions about your OpenStack cloud design.</para>
</section>
</chapter>