Merge "[ops-guide] remove Arch part in favor of Arch Design Guide"

This commit is contained in:
Jenkins 2017-03-17 08:19:39 +00:00 committed by Gerrit Code Review
commit d1eaafd478
24 changed files with 8 additions and 3153 deletions

@ -1,408 +0,0 @@
====================================================
Designing for Cloud Controllers and Cloud Management
====================================================
OpenStack is designed to be massively horizontally scalable, which
allows all services to be distributed widely. However, to simplify this
guide, we have decided to discuss services of a more central nature,
using the concept of a *cloud controller*. A cloud controller is just a
conceptual simplification. In the real world, you design an architecture
for your cloud controller that enables high availability so that if any
node fails, another can take over the required tasks. In reality, cloud
controller tasks are spread out across more than a single node.
The cloud controller provides the central management system for
OpenStack deployments. Typically, the cloud controller manages
authentication and sends messaging to all the systems through a message
queue.
For many deployments, the cloud controller is a single node. However, to
have high availability, you have to take a few considerations into
account, which we'll cover in this chapter.
The cloud controller manages the following services for the cloud:
Databases
Tracks current information about users and instances, for example,
in a database, typically one database instance managed per service
Message queue services
All :term:`Advanced Message Queuing Protocol (AMQP)` messages for
services are received and sent according to the queue broker
Conductor services
Proxy requests to a database
Authentication and authorization for identity management
Indicates which users can do what actions on certain cloud
resources; quota management is spread out among services,
however
Image-management services
Stores and serves images with metadata on each, for launching in the
cloud
Scheduling services
Indicates which resources to use first; for example, spreading out
where instances are launched based on an algorithm
User dashboard
Provides a web-based front end for users to consume OpenStack cloud
services
API endpoints
Offers each service's REST API access, where the API endpoint
catalog is managed by the Identity service
For our example, the cloud controller has a collection of ``nova-*``
components that represent the global state of the cloud; talks to
services such as authentication; maintains information about the cloud
in a database; communicates to all compute nodes and storage
:term:`workers <worker>` through a queue; and provides API access.
Each service running on a designated cloud controller may be broken out
into separate nodes for scalability or availability.
As another example, you could use pairs of servers for a collective
cloud controller—one active, one standby—for redundant nodes providing a
given set of related services, such as:
- Front end web for API requests, the scheduler for choosing which
compute node to boot an instance on, Identity services, and the
dashboard
- Database and message queue server (such as MySQL, RabbitMQ)
- Image service for the image management
Now that you see the myriad designs for controlling your cloud, read
more about the further considerations to help with your design
decisions.
Hardware Considerations
~~~~~~~~~~~~~~~~~~~~~~~
A cloud controller's hardware can be the same as a compute node, though
you may want to further specify based on the size and type of cloud that
you run.
It's also possible to use virtual machines for all or some of the
services that the cloud controller manages, such as the message queuing.
In this guide, we assume that all services are running directly on the
cloud controller.
:ref:`table_controller_hardware` contains common considerations to
review when sizing hardware for the cloud controller design.
.. _table_controller_hardware:
.. list-table:: Table. Cloud controller hardware sizing considerations
:widths: 25 75
:header-rows: 1
* - Consideration
- Ramification
* - How many instances will run at once?
- Size your database server accordingly, and scale out beyond one cloud
controller if many instances will report status at the same time and
if scheduling where a new instance starts up needs significant computing
power.
* - How many compute nodes will run at once?
- Ensure that your messaging queue handles requests successfully and size
accordingly.
* - How many users will access the API?
- If many users will make multiple requests, make sure that the CPU load
for the cloud controller can handle it.
* - How many users will access the dashboard versus the REST API directly?
- The dashboard makes many requests, even more than the API access, so
add even more CPU if your dashboard is the main interface for your users.
* - How many ``nova-api`` services do you run at once for your cloud?
- You need to size the controller with a core per service.
* - How long does a single instance run?
- Starting instances and deleting instances is demanding on the compute
node but also demanding on the controller node because of all the API
queries and scheduling needs.
* - Does your authentication system also verify externally?
- External systems such as :term:`LDAP <Lightweight Directory Access
Protocol (LDAP)>` or :term:`Active Directory` require network
connectivity between the cloud controller and an external authentication
system. Also ensure that the cloud controller has the CPU power to keep
up with requests.
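
As a rough, hedged illustration of how these sizing questions can be turned
into a starting estimate, the following Python sketch adds up cores for the
services discussed above; the per-service core counts and the helper name
are assumptions for the example, not OpenStack recommendations.

.. code-block:: python

   # Rough controller-sizing sketch; every coefficient here is an
   # illustrative assumption, not an official recommendation.

   def estimate_controller_cores(nova_api_workers, other_api_services,
                                 dashboard_is_primary_interface=False):
       """Return a starting estimate of CPU cores for one cloud controller."""
       cores = nova_api_workers          # one core per nova-api service
       cores += other_api_services       # glance-api, cinder-api, keystone, ...
       cores += 2                        # headroom for database and queue
       if dashboard_is_primary_interface:
           cores += 2                    # dashboard traffic adds CPU load
       return cores

   if __name__ == "__main__":
       print(estimate_controller_cores(nova_api_workers=4,
                                       other_api_services=4,
                                       dashboard_is_primary_interface=True))
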
Separation of Services
~~~~~~~~~~~~~~~~~~~~~~
While our example contains all central services in a single location, it
is possible and indeed often a good idea to separate services onto
different physical servers. :ref:`table_deployment_scenarios` is a list
of deployment scenarios we've seen and their justifications.
.. _table_deployment_scenarios:
.. list-table:: Table. Deployment scenarios
:widths: 25 75
:header-rows: 1
* - Scenario
- Justification
* - Run ``glance-*`` servers on the ``swift-proxy`` server.
- This deployment felt that the spare I/O on the Object Storage proxy
server was sufficient and that the Image Delivery portion of glance
benefited from being on physical hardware and having good connectivity
to the Object Storage back end it was using.
* - Run a central dedicated database server.
- This deployment used a central dedicated server to provide the databases
for all services. This approach simplified operations by isolating
database server updates and allowed for the simple creation of slave
database servers for failover.
* - Run one VM per service.
- This deployment ran central services on a set of servers running KVM.
A dedicated VM was created for each service (``nova-scheduler``,
rabbitmq, database, etc). This assisted the deployment with scaling
because administrators could tune the resources given to each virtual
machine based on the load it received (something that was not well
understood during installation).
* - Use an external load balancer.
- This deployment had an expensive hardware load balancer in its
organization. It ran multiple ``nova-api`` and ``swift-proxy``
servers on different physical servers and used the load balancer
to switch between them.
One choice that always comes up is whether to virtualize. Some services,
such as ``nova-compute``, ``swift-proxy`` and ``swift-object`` servers,
should not be virtualized. However, control servers can often be happily
virtualized—the performance penalty can usually be offset by simply
running more of the service.
Database
~~~~~~~~
OpenStack Compute uses an SQL database to store and retrieve stateful
information. MySQL is the popular database choice in the OpenStack
community.
Loss of the database leads to errors. As a result, we recommend that you
cluster your database to make it failure tolerant. Configuring and
maintaining a database cluster is done outside OpenStack and is
determined by the database software you choose to use in your cloud
environment. MySQL/Galera is a popular option for MySQL-based databases.
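
As a minimal sketch of what monitoring a clustered database can look like,
the following Python snippet reads the Galera ``wsrep_cluster_size`` status
variable to confirm how many nodes are joined; the PyMySQL client library,
hostname, and credentials are assumptions for the example.

.. code-block:: python

   # Minimal Galera health-check sketch; the hostname and credentials are
   # placeholders, and PyMySQL is an assumed client library.
   import pymysql

   def galera_cluster_size(host, user, password):
       """Return the number of nodes currently joined to the Galera cluster."""
       conn = pymysql.connect(host=host, user=user, password=password)
       try:
           with conn.cursor() as cursor:
               cursor.execute("SHOW STATUS LIKE 'wsrep_cluster_size'")
               _name, value = cursor.fetchone()
               return int(value)
       finally:
           conn.close()

   if __name__ == "__main__":
       print("Galera cluster size:",
             galera_cluster_size("db1.example.com", "monitor", "secret"))
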
Message Queue
~~~~~~~~~~~~~
Most OpenStack services communicate with each other using the *message
queue*. For example, Compute communicates to block storage services and
networking services through the message queue. Also, you can optionally
enable notifications for any service. RabbitMQ, Qpid, and Zeromq are all
popular choices for a message-queue service. In general, if the message
queue fails or becomes inaccessible, the cluster grinds to a halt and
ends up in a read-only state, with information stuck at the point where
the last message was sent. Accordingly, we recommend that you cluster
the message queue. Be aware that clustered message queues can be a pain
point for many OpenStack deployments. While RabbitMQ has native
clustering support, there have been reports of issues when running it at
a large scale. Other queuing solutions are available, such as Zeromq
and Qpid. Zeromq does not offer stateful queues, and Qpid, the messaging
system of choice for Red Hat and its derivatives, does not have native
clustering capabilities and requires a supplemental service, such as
Pacemaker or Corosync. For your message queue, you need to determine
what level of data loss you are comfortable with and whether to use an
OpenStack project's ability to retry multiple MQ hosts in the event of a
failure, such as using Compute's ability to do so.
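
The snippet below is a small sketch of the kind of multi-host retry logic
mentioned above, assuming the ``pika`` AMQP client library and two RabbitMQ
brokers; the broker hostnames are placeholders.

.. code-block:: python

   # Sketch of retrying across multiple message-queue hosts; the broker
   # hostnames are placeholders and pika is an assumed AMQP client.
   import pika
   from pika.exceptions import AMQPConnectionError

   BROKERS = ["rabbit1.example.com", "rabbit2.example.com"]

   def connect_to_first_available(brokers):
       """Try each broker in turn and return the first working connection."""
       last_error = None
       for host in brokers:
           try:
               return pika.BlockingConnection(pika.ConnectionParameters(host=host))
           except AMQPConnectionError as exc:
               last_error = exc          # broker unreachable; try the next one
       raise RuntimeError("no message queue host is reachable") from last_error

   if __name__ == "__main__":
       connection = connect_to_first_available(BROKERS)
       print("connected")
       connection.close()
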
Conductor Services
~~~~~~~~~~~~~~~~~~
In the previous version of OpenStack, all ``nova-compute`` services
required direct access to the database hosted on the cloud controller.
This was problematic for two reasons: security and performance. With
regard to security, if a compute node is compromised, the attacker
inherently has access to the database. With regard to performance,
``nova-compute`` calls to the database are single-threaded and blocking.
This creates a performance bottleneck because database requests are
fulfilled serially rather than in parallel.
The conductor service resolves both of these issues by acting as a proxy
for the ``nova-compute`` service. Now, instead of ``nova-compute``
directly accessing the database, it contacts the ``nova-conductor``
service, and ``nova-conductor`` accesses the database on
``nova-compute``'s behalf. Since ``nova-compute`` no longer has direct
access to the database, the security issue is resolved. Additionally,
``nova-conductor`` is a nonblocking service, so requests from all
compute nodes are fulfilled in parallel.
.. note::
If you are using ``nova-network`` and multi-host networking in your
cloud environment, ``nova-compute`` still requires direct access to
the database.
The ``nova-conductor`` service is horizontally scalable. To make
``nova-conductor`` highly available and fault tolerant, just launch more
instances of the ``nova-conductor`` process, either on the same server
or across multiple servers.
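
The toy Python sketch below illustrates the proxy idea described in this
section: compute-style workers never touch the database directly, and a
conductor-like intermediary services their requests concurrently. It is a
conceptual sketch only, not the actual ``nova-conductor`` implementation.

.. code-block:: python

   # Toy illustration of the conductor pattern: callers submit database
   # lookups to a proxy that services them in parallel. Conceptual only.
   from concurrent.futures import ThreadPoolExecutor

   class Conductor:
       """Proxies database lookups so callers need no direct DB access."""

       def __init__(self, database):
           self._database = database
           self._pool = ThreadPoolExecutor(max_workers=8)

       def get_instance(self, instance_id):
           # Each request runs on a worker thread, so callers are serviced
           # in parallel rather than serially.
           return self._pool.submit(self._database.get, instance_id)

   if __name__ == "__main__":
       fake_db = {"inst-1": {"state": "ACTIVE"}, "inst-2": {"state": "SHUTOFF"}}
       conductor = Conductor(fake_db)
       futures = [conductor.get_instance(i) for i in ("inst-1", "inst-2")]
       print([f.result() for f in futures])
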
Application Programming Interface (API)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All public access, whether direct, through a command-line client, or
through the web-based dashboard, uses the API service. Find the API
reference at `Development resources for OpenStack clouds
<https://developer.openstack.org/>`_.
You must choose whether you want to support the Amazon EC2 compatibility
APIs, or just the OpenStack APIs. One issue you might encounter when
running both APIs is an inconsistent experience when referring to images
and instances.
For example, the EC2 API refers to instances using IDs that contain
hexadecimal, whereas the OpenStack API uses names and digits. Similarly,
the EC2 API tends to rely on DNS aliases for contacting virtual
machines, as opposed to OpenStack, which typically lists IP
addresses.
If OpenStack is not set up correctly, it is easy to end up with
scenarios in which users are unable to contact their instances because
only an incorrect DNS alias is available. Despite this, EC2
compatibility can assist users migrating to your cloud.
As with databases and message queues, having more than one :term:`API server`
is a good thing. Traditional HTTP load-balancing techniques can be used to
achieve a highly available ``nova-api`` service.
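
As a sketch of how you might watch a load-balanced set of API servers, the
following Python snippet polls each endpoint over HTTP; the ``requests``
library and the endpoint URLs are assumptions for the example.

.. code-block:: python

   # Sketch of a health check for load-balanced API servers; the endpoint
   # URLs are placeholders and the requests library is an assumption.
   import requests

   API_ENDPOINTS = [
       "http://controller1.example.com:8774/",
       "http://controller2.example.com:8774/",
   ]

   def check_endpoints(endpoints, timeout=5):
       """Return a mapping of endpoint URL to a simple up/down flag."""
       status = {}
       for url in endpoints:
           try:
               response = requests.get(url, timeout=timeout)
               status[url] = response.status_code < 500
           except requests.RequestException:
               status[url] = False
       return status

   if __name__ == "__main__":
       for url, healthy in check_endpoints(API_ENDPOINTS).items():
           print(url, "OK" if healthy else "DOWN")
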
Extensions
~~~~~~~~~~
The `API
Specifications <https://developer.openstack.org/api-guide/quick-start/index.html>`_ define
the core actions, capabilities, and mediatypes of the OpenStack API. A
client can always depend on the availability of this core API, and
implementers are always required to support it in its entirety.
Requiring strict adherence to the core API allows clients to rely upon a
minimal level of functionality when interacting with multiple
implementations of the same API.
The OpenStack Compute API is extensible. An extension adds capabilities
to an API beyond those defined in the core. The introduction of new
features, MIME types, actions, states, headers, parameters, and
resources can all be accomplished by means of extensions to the core
API. This allows the introduction of new features in the API without
requiring a version change and allows the introduction of
vendor-specific niche functionality.
Scheduling
~~~~~~~~~~
The scheduling services are responsible for determining the compute or
storage node where a virtual machine or block storage volume should be
created. The scheduling services receive creation requests for these
resources from the message queue and then begin the process of
determining the appropriate node where the resource should reside. This
process is done by applying a series of user-configurable filters
against the available collection of nodes.
There are currently two schedulers: ``nova-scheduler`` for virtual
machines and ``cinder-scheduler`` for block storage volumes. Both
schedulers are able to scale horizontally, so for high-availability
purposes, or for very large or high-schedule-frequency installations,
you should consider running multiple instances of each scheduler. The
schedulers all listen to the shared message queue, so no special load
balancing is required.
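
The filtering process can be pictured with the short Python sketch below;
the filter functions and host attributes are simplified stand-ins for the
real scheduler filters, not the actual nova or cinder code.

.. code-block:: python

   # Simplified picture of filter-based scheduling: each filter removes
   # unsuitable hosts, and a host is chosen from whatever remains.

   HOSTS = [
       {"name": "compute1", "free_ram_mb": 4096, "free_vcpus": 2},
       {"name": "compute2", "free_ram_mb": 16384, "free_vcpus": 8},
   ]

   def ram_filter(host, request):
       return host["free_ram_mb"] >= request["ram_mb"]

   def vcpu_filter(host, request):
       return host["free_vcpus"] >= request["vcpus"]

   def schedule(hosts, request, filters):
       candidates = [h for h in hosts if all(f(h, request) for f in filters)]
       if not candidates:
           raise RuntimeError("no valid host found")
       # Pick the candidate with the most free RAM (a "spread" strategy).
       return max(candidates, key=lambda h: h["free_ram_mb"])

   if __name__ == "__main__":
       request = {"ram_mb": 8192, "vcpus": 4}
       print(schedule(HOSTS, request, [ram_filter, vcpu_filter])["name"])
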
Images
~~~~~~
The OpenStack Image service consists of two parts: ``glance-api`` and
``glance-registry``. The former is responsible for the delivery of
images; the compute node uses it to download images from the back end.
The latter maintains the metadata information associated with virtual
machine images and requires a database.
The ``glance-api`` part is an abstraction layer that allows a choice of
back end. Currently, it supports:
OpenStack Object Storage
Allows you to store images as objects.
File system
Uses any traditional file system to store the images as files.
S3
Allows you to fetch images from Amazon S3.
HTTP
Allows you to fetch images from a web server. You cannot write
images by using this mode.
If you have an OpenStack Object Storage service, we recommend using it
as a scalable place to store your images. You can also use a file system
with sufficient performance or, if you do not need the ability to upload
new images through OpenStack, Amazon S3.
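
Whichever back end you choose, images are consumed through the Image service
API. As a hedged sketch, assuming the ``requests`` library, a valid token,
and a placeholder endpoint, listing images through the Glance v2 API looks
roughly like this:

.. code-block:: python

   # Sketch of listing images through the Image service (glance) v2 API;
   # the endpoint URL and token are placeholders.
   import requests

   GLANCE_ENDPOINT = "http://controller.example.com:9292"

   def list_image_names(token):
       """Return the names of images known to the Image service."""
       response = requests.get(
           GLANCE_ENDPOINT + "/v2/images",
           headers={"X-Auth-Token": token},
           timeout=10,
       )
       response.raise_for_status()
       return [image["name"] for image in response.json()["images"]]

   if __name__ == "__main__":
       print(list_image_names(token="REPLACE_WITH_A_VALID_TOKEN"))
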
Dashboard
~~~~~~~~~
The OpenStack dashboard (horizon) provides a web-based user interface to
the various OpenStack components. The dashboard includes an end-user
area for users to manage their virtual infrastructure and an admin area
for cloud operators to manage the OpenStack environment as a
whole.
The dashboard is implemented as a Python web application that normally
runs in :term:`Apache` ``httpd``. Therefore, you may treat it the same as any
other web application, provided it can reach the API servers (including
their admin endpoints) over the network.
Authentication and Authorization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The concepts supporting OpenStack's authentication and authorization are
derived from well-understood and widely used systems of a similar
nature. Users have credentials they can use to authenticate, and they
can be a member of one or more groups (known as projects or tenants,
interchangeably).
For example, a cloud administrator might be able to list all instances
in the cloud, whereas a user can see only those in their current group.
Resource quotas, such as the number of cores that can be used, disk
space, and so on, are associated with a project.
OpenStack Identity provides authentication decisions and user attribute
information, which is then used by the other OpenStack services to
perform authorization. The policy is set in the ``policy.json`` file.
For information on how to configure these, see :doc:`ops-projects-users`.
OpenStack Identity supports different plug-ins for authentication
decisions and identity storage. Examples of these plug-ins include:
- In-memory key-value Store (a simplified internal storage structure)
- SQL database (such as MySQL or PostgreSQL)
- Memcached (a distributed memory object caching system)
- LDAP (such as OpenLDAP or Microsoft's Active Directory)
Many deployments use the SQL database; however, LDAP is also a popular
choice for those with existing authentication infrastructure that needs
to be integrated.
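
To make the authentication flow concrete, the following Python sketch
requests a project-scoped token from the Identity v3 API with password
credentials; the ``requests`` library, endpoint, user, project, and password
are assumptions and placeholders for the example.

.. code-block:: python

   # Sketch of password authentication against the Identity (keystone) v3
   # API; the endpoint, user, project, and password are placeholders.
   import requests

   KEYSTONE_ENDPOINT = "http://controller.example.com:5000/v3"

   def get_token(username, password, project_name, domain_name="Default"):
       """Authenticate and return a scoped token ID plus its details."""
       body = {
           "auth": {
               "identity": {
                   "methods": ["password"],
                   "password": {
                       "user": {
                           "name": username,
                           "domain": {"name": domain_name},
                           "password": password,
                       }
                   },
               },
               "scope": {
                   "project": {
                       "name": project_name,
                       "domain": {"name": domain_name},
                   }
               },
           }
       }
       response = requests.post(KEYSTONE_ENDPOINT + "/auth/tokens",
                                json=body, timeout=10)
       response.raise_for_status()
       return response.headers["X-Subject-Token"], response.json()["token"]

   if __name__ == "__main__":
       token_id, token = get_token("demo", "secret", "demo-project")
       print("token issued, expires at", token["expires_at"])
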
Network Considerations
~~~~~~~~~~~~~~~~~~~~~~
Because the cloud controller handles so many different services, it must
be able to handle the amount of traffic that hits it. For example, if
you choose to host the OpenStack Image service on the cloud controller,
the cloud controller should be able to support the transferring of the
images at an acceptable speed.
As another example, if you choose to use single-host networking where
the cloud controller is the network gateway for all instances, then the
cloud controller must support the total amount of traffic that travels
between your cloud and the public Internet.
We recommend that you use a fast NIC, such as 10 GbE. You can also
choose to use two 10 GbE NICs and bond them together. While you might
not be able to get a full bonded 20 Gbps speed, different transmission
streams use different NICs. For example, if the cloud controller
transfers two images, each image uses a different NIC and gets a full
10 Gbps of bandwidth.

@ -1,305 +0,0 @@
=============
Compute Nodes
=============
In this chapter, we discuss some of the choices you need to consider
when building out your compute nodes. Compute nodes form the resource
core of the OpenStack Compute cloud, providing the processing, memory,
network and storage resources to run instances.
Choosing a CPU
~~~~~~~~~~~~~~
The type of CPU in your compute node is a very important choice. First,
ensure that the CPU supports virtualization by way of *VT-x* for Intel
chips and *AMD-v* for AMD chips.
.. tip::
Consult the vendor documentation to check for virtualization
support. For Intel, read `“Does my processor support Intel® Virtualization
Technology?” <http://www.intel.com/support/processors/sb/cs-030729.htm>`_.
For AMD, read `AMD Virtualization
<http://www.amd.com/en-us/innovations/software-technologies/server-solution/virtualization>`_.
Note that your CPU may support virtualization but it may be
disabled. Consult your BIOS documentation for how to enable CPU
features.
The number of cores that the CPU has also affects the decision. It's
common for current CPUs to have up to 12 cores. Additionally, if an
Intel CPU supports hyperthreading, those 12 cores are doubled to 24
cores. If you purchase a server that supports multiple CPUs, the number
of cores is further multiplied.
.. note::
**Multithread Considerations**
Hyper-Threading is Intel's proprietary simultaneous multithreading
implementation used to improve parallelization on their CPUs. You might
consider enabling Hyper-Threading to improve the performance of
multithreaded applications.
Whether you should enable Hyper-Threading on your CPUs depends upon your
use case. For example, disabling Hyper-Threading can be beneficial in
intense computing environments. We recommend that you do performance
testing with your local workload with both Hyper-Threading on and off to
determine what is more appropriate in your case.
Choosing a Hypervisor
~~~~~~~~~~~~~~~~~~~~~
A hypervisor provides software to manage virtual machine access to the
underlying hardware. The hypervisor creates, manages, and monitors
virtual machines. OpenStack Compute supports many hypervisors to various
degrees, including:
* `KVM <http://www.linux-kvm.org/page/Main_Page>`_
* `LXC <https://linuxcontainers.org/>`_
* `QEMU <http://wiki.qemu.org/Main_Page>`_
* `VMware ESX/ESXi <https://www.vmware.com/support/vsphere-hypervisor>`_
* `Xen <http://www.xenproject.org/>`_
* `Hyper-V <http://technet.microsoft.com/en-us/library/hh831531.aspx>`_
* `Docker <https://www.docker.com/>`_
Probably the most important factor in your choice of hypervisor is your
current usage or experience. Aside from that, there are practical
concerns to do with feature parity, documentation, and the level of
community experience.
For example, KVM is the most widely adopted hypervisor in the OpenStack
community. Besides KVM, more deployments run Xen, LXC, VMware, and
Hyper-V than the others listed. However, each of these either lacks some
feature support or has out-of-date documentation on how to use it with
OpenStack.
The best information available to support your choice is found on the
`Hypervisor Support Matrix
<https://docs.openstack.org/developer/nova/support-matrix.html>`_
and in the `configuration reference
<https://docs.openstack.org/ocata/config-reference/compute/hypervisors.html>`_.
.. note::
It is also possible to run multiple hypervisors in a single
deployment using host aggregates or cells. However, an individual
compute node can run only a single hypervisor at a time.
Instance Storage Solutions
~~~~~~~~~~~~~~~~~~~~~~~~~~
As part of the procurement for a compute cluster, you must specify some
storage for the disk on which the instantiated instance runs. There are
three main approaches to providing this temporary-style storage, and it
is important to understand the implications of the choice.
They are:
* Off compute node storage—shared file system
* On compute node storage—shared file system
* On compute node storage—nonshared file system
In general, the questions you should ask when selecting storage are as
follows:
* What is the platter count you can achieve?
* Do more spindles result in better I/O despite network access?
* Which one results in the best cost-performance scenario you are aiming for?
* How do you manage the storage operationally?
Many operators use separate compute and storage hosts. Compute services
and storage services have different requirements, and compute hosts
typically require more CPU and RAM than storage hosts. Therefore, for a
fixed budget, it makes sense to have different configurations for your
compute nodes and your storage nodes: invest the compute node budget in
CPU and RAM, and the storage node budget in block storage.
However, if you are more restricted in the number of physical hosts you
have available for creating your cloud and you want to be able to
dedicate as many of your hosts as possible to running instances, it
makes sense to run compute and storage on the same machines.
We'll discuss the three main approaches to instance storage in the next
few sections.
Off Compute Node Storage—Shared File System
-------------------------------------------
In this option, the disks storing the running instances are hosted in
servers outside of the compute nodes.
If you use separate compute and storage hosts, you can treat your
compute hosts as "stateless." As long as you don't have any instances
currently running on a compute host, you can take it offline or wipe it
completely without having any effect on the rest of your cloud. This
simplifies maintenance for the compute hosts.
There are several advantages to this approach:
* If a compute node fails, instances are usually easily recoverable.
* Running a dedicated storage system can be operationally simpler.
* You can scale to any number of spindles.
* It may be possible to share the external storage for other purposes.
The main downsides to this approach are:
* Depending on design, heavy I/O usage from some instances can affect
unrelated instances.
* Use of the network can decrease performance.
On Compute Node Storage—Shared File System
------------------------------------------
In this option, each compute node is specified with a significant amount
of disk space, but a distributed file system ties the disks from each
compute node into a single mount.
The main advantage of this option is that it scales to external storage
when you require additional storage.
However, this option has several downsides:
* Running a distributed file system means you lose the data locality
that nonshared storage provides.
* Recovery of instances is complicated by depending on multiple hosts.
* The chassis size of the compute node can limit the number of spindles
able to be used in a compute node.
* Use of the network can decrease performance.
On Compute Node Storage—Nonshared File System
---------------------------------------------
In this option, each compute node is specified with enough disks to
store the instances it hosts.
There are two main reasons why this is a good idea:
* Heavy I/O usage on one compute node does not affect instances on
other compute nodes.
* Direct I/O access can increase performance.
This has several downsides:
* If a compute node fails, the instances running on that node are lost.
* The chassis size of the compute node can limit the number of spindles
able to be used in a compute node.
* Migrations of instances from one node to another are more complicated
and rely on features that may not continue to be developed.
* If additional storage is required, this option does not scale.
Running a shared file system on a storage system apart from the compute
nodes is ideal for clouds where reliability and scalability are the most
important factors. Running a shared file system on the compute nodes
themselves may be best in a scenario where you have to deploy to
preexisting servers for which you have little to no control over their
specifications. Running a nonshared file system on the compute nodes
themselves is a good option for clouds with high I/O requirements and
low concern for reliability.
Issues with Live Migration
--------------------------
We consider live migration an integral part of the operations of the
cloud. This feature provides the ability to seamlessly move instances
from one physical host to another, a necessity for performing upgrades
that require reboots of the compute hosts, but only works well with
shared storage.
Live migration can also be done with nonshared storage, using a feature
known as *KVM live block migration*. While an earlier implementation of
block-based migration in KVM and QEMU was considered unreliable, there
is a newer, more reliable implementation of block-based live migration
as of QEMU 1.4 and libvirt 1.0.2 that is also compatible with OpenStack.
However, none of the authors of this guide have first-hand experience
using live block migration.
Choice of File System
---------------------
If you want to support shared-storage live migration, you need to
configure a distributed file system.
Possible options include:
* NFS (default for Linux)
* GlusterFS
* MooseFS
* Lustre
We've seen deployments with all, and recommend that you choose the one
you are most familiar with operating. If you are not familiar with any
of these, choose NFS, as it is the easiest to set up and there is
extensive community knowledge about it.
Overcommitting
~~~~~~~~~~~~~~
OpenStack allows you to overcommit CPU and RAM on compute nodes. This
allows you to increase the number of instances you can have running on
your cloud, at the cost of reducing the performance of the instances.
OpenStack Compute uses the following ratios by default:
* CPU allocation ratio: 16:1
* RAM allocation ratio: 1.5:1
The default CPU allocation ratio of 16:1 means that the scheduler
allocates up to 16 virtual cores per physical core. For example, if a
physical node has 12 cores, the scheduler sees 192 available virtual
cores. With typical flavor definitions of 4 virtual cores per instance,
this ratio would provide 48 instances on a physical node.
The formula for the number of virtual instances on a compute node is
``(OR*PC)/VC``, where:
OR
CPU overcommit ratio (virtual cores per physical core)
PC
Number of physical cores
VC
Number of virtual cores per instance
Similarly, the default RAM allocation ratio of 1.5:1 means that the
scheduler allocates instances to a physical node as long as the total
amount of RAM associated with the instances is less than 1.5 times the
amount of RAM available on the physical node.
For example, if a physical node has 48 GB of RAM, the scheduler
allocates instances to that node until the sum of the RAM associated
with the instances reaches 72 GB (such as nine instances, in the case
where each instance has 8 GB of RAM).
.. note::
Regardless of the overcommit ratio, an instance cannot be placed
on any physical node with fewer raw (pre-overcommit) resources than
the instance flavor requires.
You must select the appropriate CPU and RAM allocation ratio for your
particular use case.
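
To make the arithmetic concrete, here is a short Python sketch that applies
the default ratios to the 12-core and 48 GB examples above.

.. code-block:: python

   # Overcommit arithmetic from the text: CPU capacity is (OR * PC) / VC,
   # and RAM capacity is the RAM ratio times the physical RAM.

   def cpu_capacity(physical_cores, vcpus_per_instance, cpu_ratio=16.0):
       """Instances a node can host based on the CPU allocation ratio."""
       return int((cpu_ratio * physical_cores) // vcpus_per_instance)

   def ram_capacity(physical_ram_gb, ram_per_instance_gb, ram_ratio=1.5):
       """Instances a node can host based on the RAM allocation ratio."""
       return int((ram_ratio * physical_ram_gb) // ram_per_instance_gb)

   if __name__ == "__main__":
       # 12 physical cores, 4 vCPUs per instance -> 48 instances (CPU bound)
       print("CPU-bound capacity:", cpu_capacity(12, 4))
       # 48 GB RAM, 8 GB per instance -> 9 instances (RAM bound)
       print("RAM-bound capacity:", ram_capacity(48, 8))
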
Logging
~~~~~~~
Logging is detailed more fully in :doc:`ops-logging-monitoring`. However,
it is an important design consideration to take into account before
commencing operations of your cloud.
OpenStack produces a great deal of useful logging information; however,
for the information to be useful for operations purposes, you should
consider having a central logging server to send logs to, and a log
parsing/analysis system (such as logstash).
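
As a minimal sketch of shipping logs to a central server, the following
Python snippet uses the standard library's ``SysLogHandler``; the log host
is a placeholder, and production deployments typically configure rsyslog or
a dedicated log-shipping agent instead.

.. code-block:: python

   # Minimal sketch of sending application logs to a central log server
   # over syslog; the host name and port are placeholders.
   import logging
   import logging.handlers

   def build_logger(log_host="logs.example.com", log_port=514):
       logger = logging.getLogger("cloud-ops")
       logger.setLevel(logging.INFO)
       handler = logging.handlers.SysLogHandler(address=(log_host, log_port))
       handler.setFormatter(
           logging.Formatter("%(name)s %(levelname)s %(message)s"))
       logger.addHandler(handler)
       return logger

   if __name__ == "__main__":
       log = build_logger()
       log.info("central logging sketch: compute node booted an instance")
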
Networking
~~~~~~~~~~
Networking in OpenStack is a complex, multifaceted challenge. See
:doc:`arch-network-design`.
Conclusion
~~~~~~~~~~
Compute nodes are the workhorse of your cloud and the place where your
users' applications will run. They are likely to be affected by your
decisions on what to deploy and how you deploy it. Their requirements
should be reflected in the choices you make.

@ -1,568 +0,0 @@
===========================================
Example Architecture — OpenStack Networking
===========================================
This chapter provides an example architecture using OpenStack
Networking, also known as the Neutron project, in a highly available
environment.
Overview
~~~~~~~~
A highly available environment can be put into place if you require an
environment that can scale horizontally, or want your cloud to continue
to be operational in case of node failure. This example architecture has
been written based on the current default feature set of OpenStack
Havana, with an emphasis on high availability.
Components
----------
.. list-table::
:widths: 50 50
:header-rows: 1
* - Component
- Details
* - OpenStack release
- Havana
* - Host operating system
- Red Hat Enterprise Linux 6.5
* - OpenStack package repository
- `Red Hat Distributed OpenStack (RDO) <https://repos.fedorapeople.org/repos/openstack/>`_
* - Hypervisor
- KVM
* - Database
- MySQL
* - Message queue
- Qpid
* - Networking service
- OpenStack Networking
* - Tenant Network Separation
- VLAN
* - Image service back end
- GlusterFS
* - Identity driver
- SQL
* - Block Storage back end
- GlusterFS
Rationale
---------
This example architecture has been selected based on the current default
feature set of OpenStack Havana, with an emphasis on high availability.
This architecture is currently being deployed in an internal Red Hat
OpenStack cloud and used to run hosted and shared services, which by
their nature must be highly available.
This architecture's components have been selected for the following
reasons:
Red Hat Enterprise Linux
You must choose an operating system that can run on all of the
physical nodes. This example architecture is based on Red Hat
Enterprise Linux, which offers reliability, long-term support,
certified testing, and hardening. Enterprise customers, now moving
into OpenStack usage, typically require these advantages.
RDO
The Red Hat Distributed OpenStack package offers an easy way to
download the most current OpenStack release that is built for the
Red Hat Enterprise Linux platform.
KVM
KVM is the supported hypervisor of choice for Red Hat Enterprise
Linux (and included in distribution). It is feature complete and
free from licensing charges and restrictions.
MySQL
MySQL is used as the database back end for all databases in the
OpenStack environment. MySQL is the supported database of choice for
Red Hat Enterprise Linux (and included in distribution); the
database is open source, scalable, and handles memory well.
Qpid
Apache Qpid offers 100 percent compatibility with the
:term:`Advanced Message Queuing Protocol (AMQP)` Standard, and its
broker is available for both C++ and Java.
OpenStack Networking
OpenStack Networking offers sophisticated networking functionality,
including Layer 2 (L2) network segregation and provider networks.
VLAN
Using a virtual local area network offers broadcast control,
security, and physical layer transparency. If needed, use VXLAN to
extend your address space.
GlusterFS
GlusterFS offers scalable storage. As your environment grows, you
can continue to add more storage nodes (instead of being restricted,
for example, by an expensive storage array).
Detailed Description
~~~~~~~~~~~~~~~~~~~~
Node types
----------
This section gives you a breakdown of the different nodes that make up
the OpenStack environment. A node is a physical machine that is
provisioned with an operating system, and running a defined software
stack on top of it. :ref:`table_node_types` provides node descriptions and
specifications.
.. _table_node_types:
.. list-table:: Table. Node types
:widths: 20 50 30
:header-rows: 1
* - Type
- Description
- Example hardware
* - Controller
- Controller nodes are responsible for running the management software
services needed for the OpenStack environment to function.
These nodes:
* Provide the front door that people access as well as the API
services that all other components in the environment talk to.
* Run a number of services in a highly available fashion,
utilizing Pacemaker and HAProxy to provide a virtual IP and
load-balancing functions so all controller nodes are being used.
* Supply highly available "infrastructure" services,
such as MySQL and Qpid, that underpin all the services.
* Provide what is known as "persistent storage" through services
run on the host as well. This persistent storage is backed onto
the storage nodes for reliability.
See :ref:`controller_node`.
- Model: Dell R620
CPU: 2x Intel® Xeon® CPU E5-2620 0 @ 2.00 GHz
Memory: 32 GB
Disk: two 300 GB 10000 RPM SAS Disks
Network: two 10G network ports
* - Compute
- Compute nodes run the virtual machine instances in OpenStack. They:
* Run the bare minimum of services needed to facilitate these
instances.
* Use local storage on the node for the virtual machines, which means
that no VM migration or instance recovery at node failure is possible.
See :ref:`compute_node`.
- Model: Dell R620
CPU: 2x Intel® Xeon® CPU E5-2650 0 @ 2.00 GHz
Memory: 128 GB
Disk: two 600 GB 10000 RPM SAS Disks
Network: four 10G network ports (for future-proofing and expansion)
* - Storage
- Storage nodes store all the data required for the environment,
including disk images in the Image service library, and the
persistent storage volumes created by the Block Storage service.
Storage nodes use GlusterFS technology to keep the data highly
available and scalable.
See :ref:`storage_node`.
- Model: Dell R720xd
CPU: 2x Intel® Xeon® CPU E5-2620 0 @ 2.00 GHz
Memory: 64 GB
Disk: two 500 GB 7200 RPM SAS Disks and twenty-four 600 GB
10000 RPM SAS Disks
Raid Controller: PERC H710P Integrated RAID Controller, 1 GB NV Cache
Network: two 10G network ports
* - Network
- Network nodes are responsible for doing all the virtual networking
needed for people to create public or private networks and uplink
their virtual machines into external networks. Network nodes:
* Form the only ingress and egress point for instances running
on top of OpenStack.
* Run all of the environment's networking services, with the
exception of the networking API service (which runs on the
controller node).
See :ref:`network_node`.
- Model: Dell R620
CPU: 1x Intel® Xeon® CPU E5-2620 0 @ 2.00 GHz
Memory: 32 GB
Disk: two 300 GB 10000 RPM SAS Disks
Network: five 10G network ports
* - Utility
- Utility nodes are used by internal administration staff only to
provide a number of basic system administration functions needed
to get the environment up and running and to maintain the hardware,
OS, and software on which it runs.
These nodes run services such as provisioning, configuration
management, monitoring, or GlusterFS management software.
They are not required to scale, although these machines are
usually backed up.
- Model: Dell R620
CPU: 2x Intel® Xeon® CPU E5-2620 0 @ 2.00 GHz
Memory: 32 GB
Disk: two 500 GB 7200 RPM SAS Disks
Network: two 10G network ports
.. _networking_layout:
Networking layout
-----------------
The network contains all the management devices for all hardware in the
environment (for example, Dell iDRAC7 devices for the
hardware nodes, and management interfaces for network switches). The
network is accessed by internal staff only when diagnosing or recovering
a hardware issue.
OpenStack internal network
--------------------------
This network is used for OpenStack management functions and traffic,
including services needed for the provisioning of physical nodes
(``pxe``, ``tftp``, ``kickstart``), traffic between various OpenStack
node types using OpenStack APIs and messages (for example,
``nova-compute`` talking to ``keystone`` or ``cinder-volume`` talking to
``nova-api``), and all traffic for storage data to the storage layer
underneath by the Gluster protocol. All physical nodes have at least one
network interface (typically ``eth0``) in this network. This network is
only accessible from other VLANs on port 22 (for ``ssh`` access to
manage machines).
Public Network
--------------
This network is a combination of:
- IP addresses for public-facing interfaces on the controller nodes
(through which end users will access the OpenStack services)
- A range of publicly routable, IPv4 network addresses to be used by
OpenStack Networking for floating IPs. You may be restricted in your
access to IPv4 addresses; a large range of IPv4 addresses is not
necessary.
- Routers for private networks created within OpenStack.
This network is connected to the controller nodes so users can access
the OpenStack interfaces, and connected to the network nodes to provide
VMs with publicly routable traffic functionality. The network is also
connected to the utility machines so that any utility services that need
to be made public (such as system monitoring) can be accessed.
VM traffic network
------------------
This is a closed network that is not publicly routable and is simply
used as a private, internal network for traffic between virtual machines
in OpenStack, and between the virtual machines and the network nodes
that provide l3 routes out to the public network (and floating IPs for
connections back in to the VMs). Because this is a closed network, we
are using a different address space to the others to clearly define the
separation. Only Compute and OpenStack Networking nodes need to be
connected to this network.
Node connectivity
~~~~~~~~~~~~~~~~~
The following section details how the nodes are connected to the
different networks (see :ref:`networking_layout`) and
what other considerations need to take place (for example, bonding) when
connecting nodes to the networks.
Initial deployment
------------------
Initially, the connection setup should revolve around keeping the
connectivity simple and straightforward in order to minimize deployment
complexity and time to deploy.
The deployment shown in :ref:`figure_basic_node_deployment` aims to
have 1 × 10G connectivity available to all compute nodes, while still
leveraging bonding on appropriate nodes for maximum performance.
.. _figure_basic_node_deployment:
.. figure:: figures/osog_0101.png
:alt: Basic node deployment
:width: 100%
Figure. Basic node deployment
Connectivity for maximum performance
------------------------------------
If the networking performance of the basic layout is not enough, you can
move to :ref:`figure_performance_node_deployment`, which provides 2 × 10G
network links to all instances in the environment as well as providing more
network bandwidth to the storage layer.
.. _figure_performance_node_deployment:
.. figure:: figures/osog_0102.png
:alt: Performance node deployment
:width: 100%
Figure. Performance node deployment
Node diagrams
~~~~~~~~~~~~~
The following diagrams, :ref:`controller_node` through :ref:`storage_node`,
include logical information about the different types of nodes, indicating
what services will be running on top of them and how they interact with
each other. The diagrams also illustrate how the availability and
scalability of services are achieved.
.. _controller_node:
.. figure:: figures/osog_0103.png
:alt: Controller node
:width: 100%
Figure. Controller node
.. _compute_node:
.. figure:: figures/osog_0104.png
:alt: Compute node
:width: 100%
Figure. Compute node
.. _network_node:
.. figure:: figures/osog_0105.png
:alt: Network node
:width: 100%
Figure. Network node
.. _storage_node:
.. figure:: figures/osog_0106.png
:alt: Storage node
:width: 100%
Figure. Storage node
Example Component Configuration
-------------------------------
:ref:`third_party_component_configuration` and
:ref:`openstack_component_configuration` include example configuration
and considerations for both third-party and OpenStack components:
.. _third_party_component_configuration:
.. list-table:: Table. Third-party component configuration
:widths: 10 30 30 30
:header-rows: 1
* - Component
- Tuning
- Availability
- Scalability
* - MySQL
- ``binlog-format = row``
- Master/master replication. However, both nodes are not used at the
same time. Replication keeps all nodes as close to being up to date
as possible (although the asynchronous nature of the replication means
a fully consistent state is not possible). Connections to the database
only happen through a Pacemaker virtual IP, ensuring that most problems
that occur with master-master replication can be avoided.
- Not heavily considered. Once load on the MySQL server increases enough
that scalability needs to be considered, multiple masters or a
master/slave setup can be used.
* - Qpid
- ``max-connections=1000`` ``worker-threads=20`` ``connection-backlog=10``,
sasl security enabled with SASL-BASIC authentication
- Qpid is added as a resource to the Pacemaker software that runs on
Controller nodes where Qpid is situated. This ensures only one Qpid
instance is running at one time, and the node with the Pacemaker
virtual IP will always be the node running Qpid.
- Not heavily considered. However, Qpid can be changed to run on all
controller nodes for scalability and availability purposes,
and removed from Pacemaker.
* - HAProxy
- ``maxconn 3000``
- HAProxy is a software layer-7 load balancer used to front all
clustered OpenStack API components and to perform SSL termination.
HAProxy can be added as a resource to the Pacemaker software that
runs on the Controller nodes where HAProxy is situated.
This ensures that only one HAProxy instance is running at one time,
and the node with the Pacemaker virtual IP will always be the node
running HAProxy.
- Not considered. HAProxy has small enough performance overheads that
a single instance should scale enough for this level of workload.
If extra scalability is needed, ``keepalived`` or other Layer-4
load balancing can be introduced to be placed in front of multiple
copies of HAProxy.
* - Memcached
- ``MAXCONN="8192" CACHESIZE="30457"``
- Memcached is a fast in-memory key-value cache software that is used
by OpenStack components for caching data and increasing performance.
Memcached runs on all controller nodes, ensuring that should one go
down, another instance of Memcached is available.
- Not considered. A single instance of Memcached should be able to
scale to the desired workloads. If scalability is desired, HAProxy
can be placed in front of Memcached (in raw ``tcp`` mode) to utilize
multiple Memcached instances for scalability. However, this might
cause cache consistency issues.
* - Pacemaker
- Configured to use ``corosync`` and ``cman`` as a cluster communication
stack/quorum manager, and as a two-node cluster.
- Pacemaker is the clustering software used to ensure the availability
of services running on the controller and network nodes:
* Because Pacemaker is cluster software, the software itself handles
its own availability, leveraging ``corosync`` and ``cman``
underneath.
* If you use the GlusterFS native client, no virtual IP is needed,
since the client knows all about nodes after initial connection
and automatically routes around failures on the client side.
* If you use the NFS or SMB adaptor, you will need a virtual IP on
which to mount the GlusterFS volumes.
- If more nodes need to be made cluster aware, Pacemaker can scale to
64 nodes.
* - GlusterFS
- ``glusterfs`` performance profile "virt" enabled on all volumes.
Volumes are set up in two-node replication.
- Glusterfs is a clustered file system that is run on the storage
nodes to provide persistent scalable data storage in the environment.
Because all connections to gluster use the ``gluster`` native mount
points, the ``gluster`` instances themselves provide availability
and failover functionality.
- The scalability of GlusterFS storage can be achieved by adding in
more storage volumes.
|
.. _openstack_component_configuration:
.. list-table:: Table. OpenStack component configuration
:widths: 10 10 20 30 30
:header-rows: 1
* - Component
- Node type
- Tuning
- Availability
- Scalability
* - Dashboard (horizon)
- Controller
- Configured to use Memcached as a session store, ``neutron``
support is enabled, ``can_set_mount_point = False``
- The dashboard is run on all controller nodes, ensuring at least one
instance will be available in case of node failure.
It also sits behind HAProxy, which detects when the software fails
and routes requests around the failing instance.
- The dashboard is run on all controller nodes, so scalability can be
achieved with additional controller nodes. HAProxy allows scalability
for the dashboard as more nodes are added.
* - Identity (keystone)
- Controller
- Configured to use Memcached for caching and PKI for tokens.
- Identity is run on all controller nodes, ensuring at least one
instance will be available in case of node failure.
Identity also sits behind HAProxy, which detects when the software
fails and routes requests around the failing instance.
- Identity is run on all controller nodes, so scalability can be
achieved with additional controller nodes.
HAProxy allows scalability for Identity as more nodes are added.
* - Image service (glance)
- Controller
- ``/var/lib/glance/images`` is a GlusterFS native mount to a Gluster
volume off the storage layer.
- The Image service is run on all controller nodes, ensuring at least
one instance will be available in case of node failure.
It also sits behind HAProxy, which detects when the software fails
and routes requests around the failing instance.
- The Image service is run on all controller nodes, so scalability
can be achieved with additional controller nodes. HAProxy allows
scalability for the Image service as more nodes are added.
* - Compute (nova)
- Controller, Compute
- Configured to use Qpid with ``qpid_heartbeat = 10``, configured to
use Memcached for caching, configured to use ``libvirt``, and
configured to use ``neutron``.
Configured ``nova-consoleauth`` to use Memcached for session
management (so that it can have multiple copies and run in a
load balancer).
- The nova API, scheduler, objectstore, cert, consoleauth, conductor,
and vncproxy services are run on all controller nodes, ensuring at
least one instance will be available in case of node failure.
Compute is also behind HAProxy, which detects when the software
fails and routes requests around the failing instance.
Nova-compute and nova-conductor services, which run on the compute
nodes, are only needed to run services on that node, so availability
of those services is coupled tightly to the nodes that are available.
As long as a compute node is up, it will have the needed services
running on top of it.
- The nova API, scheduler, objectstore, cert, consoleauth, conductor,
and vncproxy services are run on all controller nodes, so scalability
can be achieved with additional controller nodes. HAProxy allows
scalability for Compute as more nodes are added. The scalability
of services running on the compute nodes (compute, conductor) is
achieved linearly by adding in more compute nodes.
* - Block Storage (cinder)
- Controller
- Configured to use Qpid with ``qpid_heartbeat = 10``, and configured to
use a Gluster volume from the storage layer as the back end for
Block Storage, using the Gluster native client.
- Block Storage API, scheduler, and volume services are run on all
controller nodes, ensuring at least one instance will be available
in case of node failure. Block Storage also sits behind HAProxy,
which detects if the software fails and routes requests around the
failing instance.
- Block Storage API, scheduler and volume services are run on all
controller nodes, so scalability can be achieved with additional
controller nodes. HAProxy allows scalability for Block Storage as
more nodes are added.
* - OpenStack Networking (neutron)
- Controller, Compute, Network
- Configured to use Qpid with ``qpid_heartbeat = 10``, kernel namespace
support enabled, ``tenant_network_type = vlan``,
``allow_overlapping_ips = true``,
``bridge_uplinks = br-ex:em2``, ``bridge_mappings = physnet1:br-ex``
- The OpenStack Networking service is run on all controller nodes,
ensuring at least one instance will be available in case of node
failure. It also sits behind HAProxy, which detects if the software
fails and routes requests around the failing instance.
- The OpenStack Networking server service is run on all controller
nodes, so scalability can be achieved with additional controller
nodes. HAProxy allows scalability for OpenStack Networking as more
nodes are added. Scalability of services running on the network
nodes is not currently supported by OpenStack Networking, so they
are not considered. One copy of the services should be sufficient
to handle the workload. Scalability of the ``ovs-agent`` running on
compute nodes is achieved by adding in more compute nodes as
necessary.

@ -1,261 +0,0 @@
===============================================
Example Architecture — Legacy Networking (nova)
===============================================
This particular example architecture has been upgraded from :term:`Grizzly` to
:term:`Havana` and tested in production environments where many public IP
addresses are available for assignment to multiple instances. You can
find a second example architecture that uses OpenStack Networking
(neutron) after this section. Each example offers high availability,
meaning that if a particular node goes down, another node with the same
configuration can take over the tasks so that the services continue to
be available.
Overview
~~~~~~~~
The simplest architecture you can build upon for Compute has a single
cloud controller and multiple compute nodes. The simplest architecture
for Object Storage has five nodes: one for identifying users and
proxying requests to the API, then four for storage itself to provide
enough replication for eventual consistency. This example architecture
does not dictate a particular number of nodes, but shows the thinking
and considerations that went into choosing this architecture including
the features offered.
Components
~~~~~~~~~~
.. list-table::
:widths: 50 50
:header-rows: 1
* - Component
- Details
* - OpenStack release
- Havana
* - Host operating system
- Ubuntu 12.04 LTS or Red Hat Enterprise Linux 6.5,
including derivatives such as CentOS and Scientific Linux
* - OpenStack package repository
- `Ubuntu Cloud Archive <https://wiki.ubuntu.com/ServerTeam/CloudArchive>`_
or `RDO <http://openstack.redhat.com/Frequently_Asked_Questions>`_
* - Hypervisor
- KVM
* - Database
- MySQL\*
* - Message queue
- RabbitMQ for Ubuntu; Qpid for Red Hat Enterprise Linux and derivatives
* - Networking service
- ``nova-network``
* - Network manager
- FlatDHCP
* - Single ``nova-network`` or multi-host?
- multi-host\*
* - Image service (glance) back end
- file
* - Identity (keystone) driver
- SQL
* - Block Storage (cinder) back end
- LVM/iSCSI
* - Live Migration back end
- Shared storage using NFS\*
* - Object storage
- OpenStack Object Storage (swift)
An asterisk (\*) indicates when the example architecture deviates from
the settings of a default installation. We'll offer explanations for
those deviations next.
.. note::
The following features of OpenStack are supported by the example
architecture documented in this guide, but are optional:
- :term:`Dashboard <Dashboard (horizon)>`: You probably want to offer
a dashboard, but your users may be more interested in API access only.
- :term:`Block storage <Block Storage service (cinder)>`:
You don't have to offer users block storage if their use case only
needs ephemeral storage on compute nodes, for example.
- :term:`Floating IP address <floating IP address>`:
Floating IP addresses are public IP addresses that you allocate
from a predefined pool to assign to virtual machines at launch.
Floating IP addresses ensure that the public IP address is available
whenever an instance is booted. Not every organization can offer
thousands of public floating IP addresses for thousands of
instances, so this feature is considered optional.
- :term:`Live migration <live migration>`: If you need to move
running virtual machine instances from one host to another with
little or no service interruption, you would enable live migration,
but it is considered optional.
- :term:`Object storage <Object Storage service (swift)>`: You may
choose to store machine images on a file system rather than in
object storage if you do not have the extra hardware for the
required replication and redundancy that OpenStack Object Storage
offers.
Rationale
~~~~~~~~~
This example architecture has been selected based on the current default
feature set of OpenStack Havana, with an emphasis on stability. We
believe that many clouds that currently run OpenStack in production have
made similar choices.
You must first choose the operating system that runs on all of the
physical nodes. While OpenStack is supported on several distributions of
Linux, we used *Ubuntu 12.04 LTS (Long Term Support)*, which is used by
the majority of the development community, has feature completeness
compared with other distributions, and has clear future support plans.
We recommend that you do not use the default Ubuntu OpenStack install
packages and instead use the `Ubuntu Cloud
Archive <https://wiki.ubuntu.com/ServerTeam/CloudArchive>`__. The Cloud
Archive is a package repository supported by Canonical that allows you
to upgrade to future OpenStack releases while remaining on Ubuntu 12.04.
*KVM* as a :term:`hypervisor` complements the choice of Ubuntu—being a
matched pair in terms of support, and also because of the significant degree
of attention it garners from the OpenStack development community (including
the authors, who mostly use KVM). It is also feature complete, free from
licensing charges and restrictions.
*MySQL* follows a similar trend. Despite its recent change of ownership,
this database is the most tested for use with OpenStack and is heavily
documented. We deviate from the default database, *SQLite*, because
SQLite is not an appropriate database for production usage.
The choice of *RabbitMQ* over other
:term:`AMQP <Advanced Message Queuing Protocol (AMQP)>` compatible options
that are gaining support in OpenStack, such as ZeroMQ and Qpid, is due to its
ease of use and significant testing in production. It also is the only
option that supports features such as Compute cells. We recommend
clustering with RabbitMQ, as it is an integral component of the system
and fairly simple to implement due to its inbuilt nature.
As discussed in previous chapters, there are several options for
networking in OpenStack Compute. We recommend *FlatDHCP* with
*multi-host* networking mode for high availability, running one
``nova-network`` daemon per OpenStack compute host. This provides a
robust mechanism for ensuring network interruptions are isolated to
individual compute hosts, and allows for the direct use of hardware
network gateways.
*Live Migration* is supported by way of shared storage, with *NFS* as
the distributed file system.
Acknowledging that many small-scale deployments see running Object
Storage just for the storage of virtual machine images as too costly, we
opted for the file back end in the OpenStack :term:`Image service (Glance)`.
If your cloud will include Object Storage, you can easily add it as a back
end.
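As an illustrative sketch only, selecting the file back end is a matter of
a couple of options in ``glance-api.conf`` (the data directory shown is a
common default, not a requirement):

.. code-block:: ini

   # glance-api.conf
   default_store = file
   filesystem_store_datadir = /var/lib/glance/images/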
We chose the *SQL back end for Identity* over others, such as LDAP. This
back end is simple to install and is robust. The authors acknowledge
that many installations want to bind with existing directory services
and advise a careful review of the `array of options available
<https://docs.openstack.org/ocata/config-reference/identity/options.html#keystone-ldap>`_.
Block Storage (cinder) is installed natively on external storage nodes
and uses the *LVM/iSCSI plug-in*. Most Block Storage plug-ins are tied
to particular vendor products and implementations limiting their use to
consumers of those hardware platforms, but LVM/iSCSI is robust and
stable on commodity hardware.
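A minimal, illustrative ``cinder.conf`` fragment for this plug-in might
look like the following; it assumes the conventional ``cinder-volumes``
LVM volume group already exists on the storage node:

.. code-block:: ini

   # cinder.conf
   volume_driver = cinder.volume.drivers.lvm.LVMISCSIDriver
   volume_group = cinder-volumes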
While the cloud can be run without the *OpenStack Dashboard*, we
consider it to be indispensable, not just for user interaction with the
cloud, but also as a tool for operators. Additionally, the dashboard's
use of Django makes it a flexible framework for extension.
Why not use OpenStack Networking?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This example architecture does not use OpenStack Networking, because it
does not yet support multi-host networking and our organizations
(university, government) have access to a large range of
publicly-accessible IPv4 addresses.
Why use multi-host networking?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In a default OpenStack deployment, there is a single ``nova-network``
service that runs within the cloud (usually on the cloud controller)
that provides services such as
:term:`Network Address Translation (NAT)`, :term:`DHCP <Dynamic Host
Configuration Protocol (DHCP)>`, and :term:`DNS <Domain Name System (DNS)>`
to the guest instances. If the single node that runs the ``nova-network``
service goes down, you cannot access your instances, and the instances
cannot access the Internet. The single node that runs the ``nova-network``
service can become a bottleneck if excessive network traffic comes in and
goes out of the cloud.
.. tip::
`Multi-host <https://docs.openstack.org/havana/install-guide/install/apt/content/nova-network.html>`_
is a high-availability option for the network configuration, where
the ``nova-network`` service is run on every compute node instead of
running on only a single node.
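To give a flavor of what multi-host mode looks like in practice, here is a
hedged sketch of the relevant ``nova.conf`` options on each compute node;
the interface names are assumptions and will differ per deployment:

.. code-block:: ini

   # nova.conf on each compute node (illustrative)
   network_manager = nova.network.manager.FlatDHCPManager
   multi_host = True
   flat_interface = eth0
   public_interface = eth1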
Detailed Description
--------------------
The reference architecture consists of multiple compute nodes, a cloud
controller, an external NFS storage server for instance storage, and an
OpenStack Block Storage server for volume storage.
A network time service (:term:`Network Time Protocol (NTP)`)
synchronizes time on all the nodes. FlatDHCPManager in
multi-host mode is used for the networking. A logical diagram for this
example architecture shows which services are running on each node:
.. image:: figures/osog_01in01.png
:width: 100%
|
The cloud controller runs the dashboard, the API services, the database
(MySQL), a message queue server (RabbitMQ), the scheduler for choosing
compute resources (``nova-scheduler``), Identity services (keystone,
``nova-consoleauth``), Image services (``glance-api``,
``glance-registry``), services for console access of guests, and Block
Storage services, including the scheduler for storage resources
(``cinder-api`` and ``cinder-scheduler``).
Compute nodes are where the computing resources are held, and in our
example architecture, they run the hypervisor (KVM), libvirt (the driver
for the hypervisor, which enables live migration from node to node),
``nova-compute``, ``nova-api-metadata`` (generally only used when
running in multi-host mode, it retrieves instance-specific metadata),
``nova-vncproxy``, and ``nova-network``.
The network consists of two switches, one for the management or private
traffic, and one that covers public access, including floating IPs. To
support this, the cloud controller and the compute nodes have two
network cards. The OpenStack Block Storage and NFS storage servers only
need to access the private network and therefore only need one network
card, but multiple cards run in a bonded configuration are recommended
if possible. Floating IP access is direct to the Internet, whereas Flat
IP access goes through a NAT. To envision the network traffic, use this
diagram:
.. image:: figures/osog_01in02.png
:width: 100%
|
Optional Extensions
-------------------
You can extend this reference architecture as follows:
- Add additional cloud controllers (see :doc:`ops-maintenance`).
- Add an OpenStack Storage service (see the Object Storage chapter in
the `Installation Tutorials and Guides
<https://docs.openstack.org/project-install-guide/ocata/>`_ for your distribution).
- Add additional OpenStack Block Storage hosts (see
:doc:`ops-maintenance`).

@ -1,12 +0,0 @@
=========================================
Parting Thoughts on Architecture Examples
=========================================
With so many considerations and options available, our hope is to
provide a few clearly-marked and tested paths for your OpenStack
exploration. If you're looking for additional ideas, check out
:doc:`app-usecases`, the
`Installation Tutorials and Guides
<https://docs.openstack.org/project-install-guide/ocata/>`_, or the
`OpenStack User Stories
page <https://www.openstack.org/user-stories/>`_.

@ -1,30 +0,0 @@
=====================
Architecture Examples
=====================
To understand the possibilities that OpenStack offers, it's best to
start with basic architecture that has been tested in production
environments. We offer two examples with basic pivots on the base
operating system (Ubuntu and Red Hat Enterprise Linux) and the
networking architecture. There are other differences between these two
examples and this guide provides reasons for each choice made.
Because OpenStack is highly configurable, with many different back ends
and network configuration options, it is difficult to write
documentation that covers all possible OpenStack deployments. Therefore,
this guide defines examples of architecture to simplify the task of
documenting, as well as to provide the scope for this guide. Both of the
offered architecture examples are currently running in production and
serving users.
.. tip::
As always, refer to the :doc:`common/glossary` if you are unclear
about any of the terminology mentioned in architecture examples.
.. toctree::
:maxdepth: 2
arch-example-nova-network.rst
arch-example-neutron.rst
arch-example-thoughts.rst

@ -1,293 +0,0 @@
==============
Network Design
==============
OpenStack provides a rich networking environment, and this chapter
details the requirements and options to deliberate when designing your
cloud.
.. warning::
If this is the first time you are deploying a cloud infrastructure
in your organization, after reading this section, your first
conversations should be with your networking team. Network usage in
a running cloud is vastly different from traditional network
deployments and has the potential to be disruptive at both a
connectivity and a policy level.
For example, you must plan the number of IP addresses that you need for
both your guest instances as well as management infrastructure.
Additionally, you must research and discuss cloud network connectivity
through proxy servers and firewalls.
In this chapter, we'll give some examples of network implementations to
consider and provide information about some of the network layouts that
OpenStack uses. Finally, we have some brief notes on the networking
services that are essential for stable operation.
Management Network
~~~~~~~~~~~~~~~~~~
A :term:`management network` (a separate network for use by your cloud
operators) typically consists of a separate switch and separate NICs
(network interface cards), and is a recommended option. This segregation
prevents system administration and the monitoring of system access from
being disrupted by traffic generated by guests.
Consider creating other private networks for communication between
internal components of OpenStack, such as the message queue and
OpenStack Compute. Using a virtual local area network (VLAN) works well
for these scenarios because it provides a method for creating multiple
virtual networks on a physical network.
Public Addressing Options
~~~~~~~~~~~~~~~~~~~~~~~~~
There are two main types of IP addresses for guest virtual machines:
fixed IPs and floating IPs. Fixed IPs are assigned to instances on boot,
whereas floating IP addresses can change their association between
instances by action of the user. Both types of IP addresses can be
either public or private, depending on your use case.
Fixed IP addresses are required, whereas it is possible to run OpenStack
without floating IPs. One of the most common use cases for floating IPs
is to provide public IP addresses to a private cloud, where there are a
limited number of IP addresses available. Another is for a public cloud
user to have a "static" IP address that can be reassigned when an
instance is upgraded or moved.
Fixed IP addresses can be private for private clouds, or public for
public clouds. When an instance terminates, its fixed IP is lost. It is
worth noting that newer users of cloud computing may find their
ephemeral nature frustrating.
IP Address Planning
~~~~~~~~~~~~~~~~~~~
An OpenStack installation can potentially have many subnets (ranges of
IP addresses) and different types of services in each. An IP address
plan can assist with a shared understanding of network partition
purposes and scalability. Control services can have public and private
IP addresses, and as noted above, there are a couple of options for an
instance's public addresses.
An IP address plan might be broken down into the following sections:
Subnet router
Packets leaving the subnet go via this address, which could be a
dedicated router or a ``nova-network`` service.
Control services public interfaces
Public access to ``swift-proxy``, ``nova-api``, ``glance-api``, and
horizon comes to these addresses, which could be on one side of a
load balancer or pointing at individual machines.
Object Storage cluster internal communications
Traffic among object/account/container servers and between these and
the proxy server's internal interface uses this private network.
Compute and storage communications
If ephemeral or block storage is external to the compute node, this
network is used.
Out-of-band remote management
If a dedicated remote access controller chip is included in servers,
often these are on a separate network.
In-band remote management
Often, an extra (such as 1 GB) interface on compute or storage nodes
is used for system administrators or monitoring tools to access the
host instead of going through the public interface.
Spare space for future growth
Adding more public-facing control services or guest instance IPs
should always be part of your plan.
For example, take a deployment that has both OpenStack Compute and
Object Storage, with private ranges 172.22.42.0/24 and 172.22.87.0/26
available. One way to segregate the space might be as follows:
.. code-block:: none
172.22.42.0/24:
172.22.42.1 - 172.22.42.3 - subnet routers
172.22.42.4 - 172.22.42.20 - spare for networks
172.22.42.21 - 172.22.42.104 - Compute node remote access controllers
(inc spare)
172.22.42.105 - 172.22.42.188 - Compute node management interfaces (inc spare)
172.22.42.189 - 172.22.42.208 - Swift proxy remote access controllers
(inc spare)
172.22.42.209 - 172.22.42.228 - Swift proxy management interfaces (inc spare)
172.22.42.229 - 172.22.42.252 - Swift storage servers remote access controllers
(inc spare)
172.22.42.253 - 172.22.42.254 - spare
172.22.87.0/26:
172.22.87.1 - 172.22.87.3 - subnet routers
172.22.87.4 - 172.22.87.24 - Swift proxy server internal interfaces
(inc spare)
172.22.87.25 - 172.22.87.63 - Swift object server internal interfaces
(inc spare)
A similar approach can be taken with public IP addresses, taking note
that large, flat ranges are preferred for use with guest instance IPs.
Take into account that for some OpenStack networking options, a public
IP address from the guest instances' public range is assigned to the
``nova-compute`` host.
Network Topology
~~~~~~~~~~~~~~~~
OpenStack Compute with ``nova-network`` provides predefined network
deployment models, each with its own strengths and weaknesses. The
selection of a network manager changes your network topology, so the
choice should be made carefully. You also have a choice between the
tried-and-true legacy ``nova-network`` settings or the neutron project
for OpenStack Networking. Both offer networking for launched instances
with different implementations and requirements.
For OpenStack Networking with the neutron project, typical
configurations are documented with the idea that any setup you can
configure with real hardware you can re-create with a software-defined
equivalent. Each tenant can contain typical network elements such as
routers, and services such as :term:`DHCP <Dynamic Host Configuration
Protocol (DHCP)>`.
:ref:`table_networking_deployment` describes the networking deployment
options for both legacy ``nova-network`` options and an equivalent
neutron configuration.
.. _table_networking_deployment:
.. list-table:: Networking deployment options
:widths: 10 30 30 30
:header-rows: 1
* - Network deployment model
- Strengths
- Weaknesses
- Neutron equivalent
* - Flat
- Extremely simple topology. No DHCP overhead.
- Requires file injection into the instance to configure network
interfaces.
- Configure a single bridge as the integration bridge (br-int) and
connect it to a physical network interface with the Modular Layer 2
(ML2) plug-in, which uses Open vSwitch by default.
* - FlatDHCP
- Relatively simple to deploy. Standard networking. Works with all guest
operating systems.
- Requires its own DHCP broadcast domain.
- Configure DHCP agents and routing agents. Network Address Translation
(NAT) performed outside of compute nodes, typically on one or more
network nodes.
* - VlanManager
- Each tenant is isolated to its own VLANs.
- More complex to set up. Requires its own DHCP broadcast domain.
Requires many VLANs to be trunked onto a single port. Standard VLAN
number limitation. Switches must support 802.1q VLAN tagging.
- Isolated tenant networks implement some form of isolation of layer 2
traffic between distinct networks. VLAN tagging is a key concept, where
traffic is “tagged” with an ordinal identifier for the VLAN. Isolated
network implementations may or may not include additional services like
DHCP, NAT, and routing.
* - FlatDHCP Multi-host with high availability (HA)
- Networking failure is isolated to the VMs running on the affected
hypervisor. DHCP traffic can be isolated within an individual host.
Network traffic is distributed to the compute nodes.
- More complex to set up. Compute nodes typically need IP addresses
accessible by external networks. Options must be carefully configured
for live migration to work with networking services.
- Configure neutron with multiple DHCP and layer-3 agents. Network nodes
are not able to failover to each other, so the controller runs
networking services, such as DHCP. Compute nodes run the ML2 plug-in
with support for agents such as Open vSwitch or Linux Bridge.
Both ``nova-network`` and neutron services provide similar capabilities,
such as VLAN between VMs. You also can provide multiple NICs on VMs with
either service. Further discussion follows.
VLAN Configuration Within OpenStack VMs
---------------------------------------
VLAN configuration can be as simple or as complicated as desired. The
use of VLANs has the benefit of allowing each project its own subnet and
broadcast segregation from other projects. To allow OpenStack to
efficiently use VLANs, you must allocate a VLAN range (one for each
project) and turn each compute node switch port into a trunk
port.
For example, if you estimate that your cloud must support a maximum of
100 projects, pick a free VLAN range that your network infrastructure is
currently not using (such as VLAN 200–299). You must configure OpenStack
with this range and also configure your switch ports to allow VLAN
traffic from that range.
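For illustration only, with the legacy ``nova-network`` VlanManager this
range is expressed through options such as these in ``nova.conf`` (the
interface name and starting VLAN are assumptions):

.. code-block:: ini

   # nova.conf (legacy nova-network VlanManager)
   network_manager = nova.network.manager.VlanManager
   vlan_interface = eth1
   vlan_start = 200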
Multi-NIC Provisioning
----------------------
OpenStack Networking with ``neutron`` and OpenStack Compute with
``nova-network`` have the ability to assign multiple NICs to instances. For
``nova-network`` this can be done on a per-request basis, with each
additional NIC using up an entire subnet or VLAN, reducing the total
number of supported projects.
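As a hedged example (the image, flavor, and network IDs below are
placeholders), requesting two NICs at boot time looks like this:

.. code-block:: console

   $ nova boot --image cirros --flavor m1.small \
     --nic net-id=NETWORK_UUID_1 --nic net-id=NETWORK_UUID_2 multi-nic-vm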
Multi-Host and Single-Host Networking
-------------------------------------
The ``nova-network`` service has the ability to operate in a multi-host
or single-host mode. Multi-host is when each compute node runs a copy of
``nova-network`` and the instances on that compute node use the compute
node as a gateway to the Internet. The compute nodes also host the
floating IPs and security groups for instances on that node. Single-host
is when a central server—for example, the cloud controller—runs the
``nova-network`` service. All compute nodes forward traffic from the
instances to the cloud controller. The cloud controller then forwards
traffic to the Internet. The cloud controller hosts the floating IPs and
security groups for all instances on all compute nodes in the
cloud.
There are benefits to both modes. Single-node has the downside of a
single point of failure. If the cloud controller is not available,
instances cannot communicate on the network. This is not true with
multi-host, but multi-host requires that each compute node has a public
IP address to communicate on the Internet. If you are not able to obtain
a significant block of public IP addresses, multi-host might not be an
option.
Services for Networking
~~~~~~~~~~~~~~~~~~~~~~~
OpenStack, like any network application, has a number of standard
considerations to apply, such as NTP and DNS.
NTP
---
Time synchronization is a critical element to ensure continued operation
of OpenStack components. Correct time is necessary to avoid errors in
instance scheduling, replication of objects in the object store, and
even matching log timestamps for debugging.
All servers running OpenStack components should be able to access an
appropriate NTP server. You may decide to set up one locally or use the
public pools available from the `Network Time Protocol
project <http://www.pool.ntp.org/>`_.
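A minimal sketch of pointing a node at the public pool follows; the
package name and configuration path assume a Debian/Ubuntu-style ``ntp``
daemon:

.. code-block:: console

   # apt-get install ntp
   # grep ^server /etc/ntp.conf
   server 0.ubuntu.pool.ntp.org
   server 1.ubuntu.pool.ntp.org
   server 2.ubuntu.pool.ntp.org
   server 3.ubuntu.pool.ntp.org
   # service ntp restart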
DNS
---
OpenStack does not currently provide DNS services, aside from the
dnsmasq daemon, which resides on ``nova-network`` hosts. You could
consider providing a dynamic DNS service to allow instances to update a
DNS entry with new IP addresses. You can also consider making a generic
forward and reverse DNS mapping for instances' IP addresses, such as
vm-203-0-113-123.example.com.
Conclusion
~~~~~~~~~~
Armed with your IP address layout and numbers and knowledge about the
topologies and services you can use, it's now time to prepare the
network for your installation. Be sure to also check out the `OpenStack
Security Guide <https://docs.openstack.org/security-guide/>`_ for tips on securing
your network. We wish you a good relationship with your networking team!

@ -1,251 +0,0 @@
===========================
Provisioning and Deployment
===========================
A critical part of a cloud's scalability is the amount of effort that it
takes to run your cloud. To minimize the operational cost of running
your cloud, set up and use an automated deployment and configuration
infrastructure with a configuration management system, such as :term:`Puppet`
or :term:`Chef`. Combined, these systems greatly reduce manual effort and the
chance for operator error.
This infrastructure includes systems to automatically install the
operating system's initial configuration and later coordinate the
configuration of all services automatically and centrally, which reduces
both manual effort and the chance for error. Examples include Ansible,
CFEngine, Chef, Puppet, and Salt. You can even use OpenStack to deploy
OpenStack, named TripleO (OpenStack On OpenStack).
Automated Deployment
~~~~~~~~~~~~~~~~~~~~
An automated deployment system installs and configures operating systems
on new servers, without intervention, after the absolute minimum amount
of manual work, including physical racking, MAC-to-IP assignment, and
power configuration. Typically, solutions rely on wrappers around PXE
boot and TFTP servers for the basic operating system install and then
hand off to an automated configuration management system.
Both Ubuntu and Red Hat Enterprise Linux include mechanisms for
configuring the operating system, including preseed and kickstart, that
you can use after a network boot. Typically, these are used to bootstrap
an automated configuration system. Alternatively, you can use an
image-based approach for deploying the operating system, such as
systemimager. You can use both approaches with a virtualized
infrastructure, such as when you run VMs to separate your control
services and physical infrastructure.
When you create a deployment plan, focus on a few vital areas because
they are very hard to modify post deployment. The next two sections talk
about configurations for:
- Disk partitioning and disk array setup for scalability
- Networking configuration just for PXE booting
Disk Partitioning and RAID
--------------------------
At the very base of any operating system are the hard drives on which
the operating system (OS) is installed.
You must complete the following configurations on the server's hard
drives:
- Partitioning, which provides greater flexibility for layout of
operating system and swap space, as described below.
- Adding to a RAID array (RAID stands for redundant array of
independent disks), based on the number of disks you have available,
so that you can add capacity as your cloud grows. Some options are
described in more detail below.
The simplest option to get started is to use one hard drive with two
partitions:
- File system to store files and directories, where all the data lives,
including the root partition that starts and runs the system.
- Swap space to free up memory for processes, as an independent area of
the physical disk used only for swapping and nothing else.
RAID is not used in this simplistic one-drive setup because generally
for production clouds, you want to ensure that if one disk fails,
another can take its place. Instead, for production, use more than one
disk. The number of disks determines what types of RAID arrays to build.
We recommend that you choose one of the following multiple disk options:
Option 1
Partition all drives in the same way in a horizontal fashion, as
shown in :ref:`partition_setup`.
With this option, you can assign different partitions to different
RAID arrays. You can allocate partition 1 of disk one and two to the
``/boot`` partition mirror. You can make partition 2 of all disks
the root partition mirror. You can use partition 3 of all disks for
a ``cinder-volumes`` LVM partition running on a RAID 10 array.
.. _partition_setup:
.. figure:: figures/osog_0201.png
Figure. Partition setup of drives
While you might end up with unused partitions, such as partition 1
in disk three and four of this example, this option allows for
maximum utilization of disk space. I/O performance might be an issue
as a result of all disks being used for all tasks.
Option 2
Add all raw disks to one large RAID array, either hardware or
software based. You can partition this large array with the boot,
root, swap, and LVM areas. This option is simple to implement and
uses all partitions. However, disk I/O might suffer.
Option 3
Dedicate entire disks to certain partitions. For example, you could
allocate disk one and two entirely to the boot, root, and swap
partitions under a RAID 1 mirror. Then, allocate disk three and four
entirely to the LVM partition, also under a RAID 1 mirror. Disk I/O
should be better because I/O is focused on dedicated tasks. However,
the LVM partition is much smaller.
.. tip::
You may find that you can automate the partitioning itself. For
example, MIT uses `Fully Automatic Installation
(FAI) <http://fai-project.org/>`_ to do the initial PXE-based
partition and then install using a combination of min/max and
percentage-based partitioning.
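Whichever option you choose, the ``cinder-volumes`` LVM volume group
referenced above has to be created on whatever device you dedicate to it.
A hedged sketch (the RAID device name is an assumption):

.. code-block:: console

   # pvcreate /dev/md3
   # vgcreate cinder-volumes /dev/md3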
As with most architecture choices, the right answer depends on your
environment. If you are using existing hardware, you know the disk
density of your servers and can determine some decisions based on the
options above. If you are going through a procurement process, your
user's requirements also help you determine hardware purchases. Here are
some examples from a private cloud providing web developers custom
environments at AT&T. This example is from a specific deployment, so
your existing hardware or procurement opportunity may vary from this.
AT&T uses three types of hardware in its deployment:
- Hardware for controller nodes, used for all stateless OpenStack API
services. About 32–64 GB memory, small attached disk, one processor,
varied number of cores, such as 6–12.
- Hardware for compute nodes. Typically 256 or 144 GB memory, two
processors, 24 cores. 4–6 TB direct attached storage, typically in a
RAID 5 configuration.
- Hardware for storage nodes. Typically for these, the disk space is
optimized for the lowest cost per GB of storage while maintaining
rack-space efficiency.
Again, the right answer depends on your environment. You have to make
your decision based on the trade-offs between space utilization,
simplicity, and I/O performance.
Network Configuration
---------------------
Network configuration is a very large topic that spans multiple areas of
this book. For now, make sure that your servers can PXE boot and
successfully communicate with the deployment server.
For example, you usually cannot configure NICs for VLANs when PXE
booting. Additionally, you usually cannot PXE boot with bonded NICs. If
you run into this scenario, consider using a simple 1 GB switch in a
private network on which only your cloud communicates.
Automated Configuration
~~~~~~~~~~~~~~~~~~~~~~~
The purpose of automatic configuration management is to establish and
maintain the consistency of a system without using human intervention.
You want to maintain consistency in your deployments so that you can
have the same cloud every time, repeatably. Proper use of automatic
configuration-management tools ensures that components of the cloud
systems are in particular states, in addition to simplifying deployment
and configuration change propagation.
These tools also make it possible to test and roll back changes, as they
are fully repeatable. Conveniently, a large body of work has been done
by the OpenStack community in this space. Puppet, a configuration
management tool, even provides official modules for OpenStack projects
in an OpenStack infrastructure system known as `Puppet
OpenStack <https://wiki.openstack.org/wiki/Puppet>`_. Chef
configuration management is provided within `openstack/openstack-chef-repo
<https://git.openstack.org/cgit/openstack/openstack-chef-repo>`_. Additional
configuration management systems include Juju, Ansible, and Salt. Also,
PackStack is a command-line utility for Red Hat Enterprise Linux and
derivatives that uses Puppet modules to support rapid deployment of
OpenStack on existing servers over an SSH connection.
An integral part of a configuration-management system is the item that
it controls. You should carefully consider all of the items that you
want, or do not want, to be automatically managed. For example, you may
not want to automatically format hard drives with user data.
Remote Management
~~~~~~~~~~~~~~~~~
In our experience, most operators don't sit right next to the servers
running the cloud, and many don't necessarily enjoy visiting the data
center. OpenStack should be entirely remotely configurable, but
sometimes not everything goes according to plan.
In this instance, having an out-of-band access into nodes running
OpenStack components is a boon. The IPMI protocol is the de facto
standard here, and acquiring hardware that supports it is highly
recommended to achieve that lights-out data center aim.
In addition, consider remote power control as well. While IPMI usually
controls the server's power state, having remote access to the PDU that
the server is plugged into can really be useful for situations when
everything seems wedged.
Parting Thoughts for Provisioning and Deploying OpenStack
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can save time by understanding the use cases for the cloud you want
to create. Use cases for OpenStack are varied. Some include object
storage only; others require preconfigured compute resources to speed
development-environment set up; and others need fast provisioning of
compute resources that are already secured per tenant with private
networks. Your users may have need for highly redundant servers to make
sure their legacy applications continue to run. Perhaps a goal would be
to architect these legacy applications so that they run on multiple
instances in a cloudy, fault-tolerant way, but not make it a goal to add
to those clusters over time. Your users may indicate that they need
scaling considerations because of heavy Windows server use.
You can save resources by looking at the best fit for the hardware you
have in place already. You might have some high-density storage hardware
available. You could format and repurpose those servers for OpenStack
Object Storage. All of these considerations and input from users help
you build your use case and your deployment plan.
.. tip::
For further research about OpenStack deployment, investigate the
supported and documented preconfigured, prepackaged installers for
OpenStack from companies such as
`Canonical <http://www.ubuntu.com/cloud/openstack>`_,
`Cisco <http://www.cisco.com/web/solutions/openstack/index.html>`_,
`Cloudscaling <http://www.cloudscaling.com/>`_,
`IBM <http://www-03.ibm.com/software/products/en/ibm-cloud-orchestrator>`_,
`Metacloud <http://www.metacloud.com/>`_,
`Mirantis <https://www.mirantis.com/>`_,
`Rackspace <http://www.rackspace.com/cloud/private>`_,
`Red Hat <http://www.redhat.com/openstack/>`_,
`SUSE <https://www.suse.com/products/suse-openstack-cloud/>`_,
and `SwiftStack <https://www.swiftstack.com/>`_.
Conclusion
~~~~~~~~~~
The decisions you make with respect to provisioning and deployment will
affect your day-to-day, week-to-week, and month-to-month maintenance of
the cloud. Your configuration management will be able to evolve over
time. However, more thought and design need to be done for upfront
choices about deployment, disk partitioning, and network configuration.

@ -1,430 +0,0 @@
=======
Scaling
=======
Whereas traditional applications required larger hardware to scale
("vertical scaling"), cloud-based applications typically request more,
discrete hardware ("horizontal scaling"). If your cloud is successful,
eventually you must add resources to meet the increasing demand.
To suit the cloud paradigm, OpenStack itself is designed to be
horizontally scalable. Rather than switching to larger servers, you
procure more servers and simply install identically configured services.
Ideally, you scale out and load balance among groups of functionally
identical services (for example, compute nodes or ``nova-api`` nodes)
that communicate on a message bus.
The Starting Point
~~~~~~~~~~~~~~~~~~
Determining the scalability of your cloud and how to improve it is an
exercise with many variables to balance. No one solution meets
everyone's scalability goals. However, it is helpful to track a number
of metrics. Since you can define virtual hardware templates, called
"flavors" in OpenStack, you can start to make scaling decisions based on
the flavors you'll provide. These templates define sizes for memory in
RAM, root disk size, amount of ephemeral data disk space available, and
number of cores for starters.
The default OpenStack flavors are shown in :ref:`table_default_flavors`.
.. _table_default_flavors:
.. list-table:: Table. OpenStack default flavors
:widths: 20 20 20 20 20
:header-rows: 1
* - Name
- Virtual cores
- Memory
- Disk
- Ephemeral
* - m1.tiny
- 1
- 512 MB
- 1 GB
- 0 GB
* - m1.small
- 1
- 2 GB
- 10 GB
- 20 GB
* - m1.medium
- 2
- 4 GB
- 10 GB
- 40 GB
* - m1.large
- 4
- 8 GB
- 10 GB
- 80 GB
* - m1.xlarge
- 8
- 16 GB
- 10 GB
- 160 GB
The starting point for most is the core count of your cloud. By applying
some ratios, you can gather information about:
- The number of virtual machines (VMs) you expect to run,
``((overcommit fraction × cores) / virtual cores per instance)``
- How much storage is required ``(flavor disk size × number of instances)``
You can use these ratios to determine how much additional infrastructure
you need to support your cloud.
Here is an example using the ratios for gathering scalability
information for the number of VMs expected as well as the storage
needed. The following numbers support (200 / 2) × 16 = 1600 VM instances
and require 80 TB of storage for ``/var/lib/nova/instances``:
- 200 physical cores.
- Most instances are size m1.medium (two virtual cores, 50 GB of
storage).
- Default CPU overcommit ratio (``cpu_allocation_ratio`` in nova.conf)
of 16:1.
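Putting those numbers into the ratios gives:

.. code-block:: none

   VM capacity = (16 overcommit × 200 cores) / 2 vCPUs per instance = 1600 instances
   Storage     = 1600 instances × 50 GB per instance = 80,000 GB ≈ 80 TB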
.. note::
Regardless of the overcommit ratio, an instance cannot be placed
on any physical node with fewer raw (pre-overcommit) resources than
the instance flavor requires.
However, you need more than the core count alone to estimate the load
that the API services, database servers, and queue servers are likely to
encounter. You must also consider the usage patterns of your cloud.
As a specific example, compare a cloud that supports a managed
web-hosting platform with one running integration tests for a
development project that creates one VM per code commit. In the former,
the heavy work of creating a VM happens only every few months, whereas
the latter puts constant heavy load on the cloud controller. You must
consider your average VM lifetime, as a larger number generally means
less load on the cloud controller.
Aside from the creation and termination of VMs, you must consider the
impact of users accessing the service—particularly on ``nova-api`` and
its associated database. Listing instances garners a great deal of
information and, given the frequency with which users run this
operation, a cloud with a large number of users can increase the load
significantly. This can occur even without their knowledge—leaving the
OpenStack dashboard instances tab open in the browser refreshes the list
of VMs every 30 seconds.
After you consider these factors, you can determine how many cloud
controller cores you require. A typical eight-core, 8 GB RAM server
is sufficient for up to a rack of compute nodes — given the above
caveats.
You must also consider key hardware specifications for the performance
of user VMs, as well as budget and performance needs, including storage
performance (spindles/core), memory availability (RAM/core), network
bandwidth (Gbps/core), and overall CPU performance (CPU/core).
.. tip::
For a discussion of metric tracking, including how to extract
metrics from your cloud, see :doc:`ops-logging-monitoring`.
Adding Cloud Controller Nodes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can facilitate the horizontal expansion of your cloud by adding
nodes. Adding compute nodes is straightforward—they are easily picked up
by the existing installation. However, you must consider some important
points when you design your cluster to be highly available.
Recall that a cloud controller node runs several different services. You
can install services that communicate only using the message queue
internally—\ ``nova-scheduler`` and ``nova-console``—on a new server for
expansion. However, other integral parts require more care.
You should load balance user-facing services such as dashboard,
``nova-api``, or the Object Storage proxy. Use any standard HTTP
load-balancing method (DNS round robin, hardware load balancer, or
software such as Pound or HAProxy). One caveat with dashboard is the VNC
proxy, which uses the WebSocket protocol—something that an L7 load
balancer might struggle with. See also `Horizon session storage
<https://docs.openstack.org/developer/horizon/topics/deployment.html#session-storage>`_.
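For instance, a hedged HAProxy fragment for balancing ``nova-api`` across
two controllers might look like this (addresses and ports are illustrative
only):

.. code-block:: none

   frontend nova-api
       bind 203.0.113.10:8774
       default_backend nova-api-nodes

   backend nova-api-nodes
       balance roundrobin
       server controller1 192.0.2.11:8774 check
       server controller2 192.0.2.12:8774 check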
You can configure some services, such as ``nova-api`` and
``glance-api``, to use multiple processes by changing a flag in their
configuration file—allowing them to share work between multiple cores on
the one machine.
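As an illustrative example (the worker counts are arbitrary), the relevant
options are ``osapi_compute_workers`` in ``nova.conf`` and ``workers`` in
``glance-api.conf``:

.. code-block:: ini

   # nova.conf
   osapi_compute_workers = 8

   # glance-api.conf
   workers = 8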
.. tip::
Several options are available for MySQL load balancing, and the
supported AMQP brokers have built-in clustering support. Information
on how to configure these and many of the other services can be
found in :doc:`operations`.
Segregating Your Cloud
~~~~~~~~~~~~~~~~~~~~~~
When you want to offer users different regions to provide legal
considerations for data storage, redundancy across earthquake fault
lines, or for low-latency API calls, you segregate your cloud. Use one
of the following OpenStack methods to segregate your cloud: *cells*,
*regions*, *availability zones*, or *host aggregates*.
Each method provides different functionality and can be best divided
into two groups:
- Cells and regions, which segregate an entire cloud and result in
running separate Compute deployments.
- :term:`Availability zones <availability zone>` and host aggregates,
which merely divide a single Compute deployment.
:ref:`table_segregation_methods` provides a comparison view of each
segregation method currently provided by OpenStack Compute.
.. _table_segregation_methods:
.. list-table:: Table. OpenStack segregation methods
:widths: 20 20 20 20 20
:header-rows: 1
* -
- Cells
- Regions
- Availability zones
- Host aggregates
* - **Use when you need**
- A single :term:`API endpoint` for compute, or you require a second
level of scheduling.
- Discrete regions with separate API endpoints and no coordination
between regions.
- Logical separation within your nova deployment for physical isolation
or redundancy.
- To schedule a group of hosts with common features.
* - **Example**
- A cloud with multiple sites where you can schedule VMs "anywhere" or on
a particular site.
- A cloud with multiple sites, where you schedule VMs to a particular
site and you want a shared infrastructure.
- A single-site cloud with equipment fed by separate power supplies.
- Scheduling to hosts with trusted hardware support.
* - **Overhead**
- Considered experimental. A new service, nova-cells. Each cell has a full
nova installation except nova-api.
- A different API endpoint for every region. Each region has a full nova
installation.
- Configuration changes to ``nova.conf``.
- Configuration changes to ``nova.conf``.
* - **Shared services**
- Keystone, ``nova-api``
- Keystone
- Keystone, All nova services
- Keystone, All nova services
Cells and Regions
-----------------
OpenStack Compute cells are designed to allow running the cloud in a
distributed fashion without having to use more complicated technologies,
or be invasive to existing nova installations. Hosts in a cloud are
partitioned into groups called *cells*. Cells are configured in a tree.
The top-level cell ("API cell") has a host that runs the ``nova-api``
service, but no ``nova-compute`` services. Each child cell runs all of
the other typical ``nova-*`` services found in a regular installation,
except for the ``nova-api`` service. Each cell has its own message queue
and database service and also runs ``nova-cells``, which manages the
communication between the API cell and child cells.
This allows for a single API server being used to control access to
multiple cloud installations. Introducing a second level of scheduling
(the cell selection), in addition to the regular ``nova-scheduler``
selection of hosts, provides greater flexibility to control where
virtual machines are run.
Unlike having a single API endpoint, regions have a separate API
endpoint per installation, allowing for a more discrete separation.
Users wanting to run instances across sites have to explicitly select a
region. However, the additional complexity of running a new service is
not required.
The OpenStack dashboard (horizon) can be configured to use multiple
regions. This can be configured through the ``AVAILABLE_REGIONS``
parameter.
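As a sketch (the endpoints and region names are placeholders), the setting
lives in the dashboard's ``local_settings.py``:

.. code-block:: python

   AVAILABLE_REGIONS = [
       ('https://identity.region-one.example.com:5000/v2.0', 'RegionOne'),
       ('https://identity.region-two.example.com:5000/v2.0', 'RegionTwo'),
   ]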
Availability Zones and Host Aggregates
--------------------------------------
You can use availability zones, host aggregates, or both to partition a
nova deployment.
Availability zones are implemented through and configured in a similar
way to host aggregates.
However, you use them for different reasons.
Availability zone
^^^^^^^^^^^^^^^^^
This enables you to arrange OpenStack compute hosts into logical groups
and provides a form of physical isolation and redundancy from other
availability zones, such as by using a separate power supply or network
equipment.
You define the availability zone in which a specified compute host
resides locally on each server. An availability zone is commonly used to
identify a set of servers that have a common attribute. For instance, if
some of the racks in your data center are on a separate power source,
you can put servers in those racks in their own availability zone.
Availability zones can also help separate different classes of hardware.
When users provision resources, they can specify from which availability
zone they want their instance to be built. This allows cloud consumers
to ensure that their application resources are spread across disparate
machines to achieve high availability in the event of hardware failure.
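A hedged example of defining such a zone with the ``openstack`` client
follows; the aggregate, zone, and host names are invented for
illustration:

.. code-block:: console

   $ openstack aggregate create --zone power-feed-a rack-a
   $ openstack aggregate add host rack-a compute-01
   $ openstack aggregate add host rack-a compute-02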
Host aggregates zone
^^^^^^^^^^^^^^^^^^^^
This enables you to partition OpenStack Compute deployments into logical
groups for load balancing and instance distribution. You can use host
aggregates to further partition an availability zone. For example, you
might use host aggregates to partition an availability zone into groups
of hosts that either share common resources, such as storage and
network, or have a special property, such as trusted computing
hardware.
A common use of host aggregates is to provide information for use with
the ``nova-scheduler``. For example, you might use a host aggregate to
group a set of hosts that share specific flavors or images.
The general case for this is setting key-value pairs in the aggregate
metadata and matching key-value pairs in the flavor's ``extra_specs``
metadata. The ``AggregateInstanceExtraSpecsFilter`` in the filter
scheduler will enforce that instances be scheduled only on hosts in
aggregates that define the same key to the same value.
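For example, here is a hedged sketch of matching a flavor to an aggregate
of SSD-backed hosts (all names and the ``ssd`` key are assumptions):

.. code-block:: console

   $ openstack aggregate create fast-io
   $ openstack aggregate set --property ssd=true fast-io
   $ openstack aggregate add host fast-io compute-07
   $ openstack flavor set --property aggregate_instance_extra_specs:ssd=true m1.ssd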
An advanced use of this general concept allows different flavor types to
run with different CPU and RAM allocation ratios so that high-intensity
computing loads and low-intensity development and testing systems can
share the same cloud without either starving the high-use systems or
wasting resources on low-utilization systems. This works by setting
``metadata`` in your host aggregates and matching ``extra_specs`` in
your flavor types.
The first step is setting the aggregate metadata keys
``cpu_allocation_ratio`` and ``ram_allocation_ratio`` to a
floating-point value. The filter scheduler's ``AggregateCoreFilter`` and
``AggregateRamFilter`` will use those values rather than the global
defaults in ``nova.conf`` when scheduling to hosts in the aggregate. It
is important to be cautious when using this feature, since each host can
be in multiple aggregates but should have only one allocation ratio for
each resource. It is up to you to avoid putting a host in multiple
aggregates that define different values for the same resource.
This is the first half of the equation. To get flavor types that are
guaranteed a particular ratio, you must set the ``extra_specs`` in the
flavor type to the key-value pair you want to match in the aggregate.
For example, if you define ``extra_specs`` ``cpu_allocation_ratio`` to
"1.0", then instances of that type will run in aggregates only where the
metadata key ``cpu_allocation_ratio`` is also defined as "1.0." In
practice, it is better to define an additional key-value pair in the
aggregate metadata to match on rather than match directly on
``cpu_allocation_ratio`` or ``core_allocation_ratio``. This allows
better abstraction. For example, by defining a key ``overcommit`` and
setting a value of "high," "medium," or "low," you could then tune the
numeric allocation ratios in the aggregates without also needing to
change all flavor types relating to them.
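Continuing the hedged example, the aggregate metadata and flavor extra
specs for a low-overcommit pool might be set like this (names and values
are illustrative):

.. code-block:: console

   $ openstack aggregate set --property cpu_allocation_ratio=1.0 \
     --property overcommit=low dedicated-hosts
   $ openstack flavor set \
     --property aggregate_instance_extra_specs:overcommit=low m1.dedicated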
.. note::
Previously, all services had an availability zone. Currently, only
the ``nova-compute`` service has its own availability zone. Services
such as ``nova-scheduler`` and ``nova-conductor`` span all
availability zones.
When you run any of the following operations, the services appear in
their own internal availability zone
(CONF.internal_service_availability_zone):
- :command:`openstack host list` (os-hosts)
- :command:`euca-describe-availability-zones verbose`
- :command:`openstack compute service list`
The internal availability zone is hidden in
euca-describe-availability_zones (nonverbose).
CONF.node_availability_zone has been renamed to
CONF.default_availability_zone and is used only by the
``nova-api`` and ``nova-scheduler`` services.
CONF.node_availability_zone still works but is deprecated.
Scalable Hardware
~~~~~~~~~~~~~~~~~
While several resources already exist to help with deploying and
installing OpenStack, it's very important to make sure that you have
your deployment planned out ahead of time. This guide presumes that you
have at least set aside a rack for the OpenStack cloud but also offers
suggestions for when and what to scale.
Hardware Procurement
--------------------
“The Cloud” has been described as a volatile environment where servers
can be created and terminated at will. While this may be true, it does
not mean that your servers must be volatile. Ensuring that your cloud's
hardware is stable and configured correctly means that your cloud
environment remains up and running. Basically, put effort into creating
a stable hardware environment so that you can host a cloud that users
may treat as unstable and volatile.
OpenStack can be deployed on any hardware supported by an
OpenStack-compatible Linux distribution.
Hardware does not have to be consistent, but it should at least have the
same type of CPU to support instance migration.
The typical hardware recommended for use with OpenStack is the standard
value-for-money offerings that most hardware vendors stock. It should be
straightforward to divide your procurement into building blocks such as
"compute," "object storage," and "cloud controller," and request as many
of these as you need. Alternatively, if you are unable to spend more and
you have existing servers, they are quite likely to be able to support
OpenStack, provided they meet your performance requirements and support
your chosen virtualization technology.
Capacity Planning
-----------------
OpenStack is designed to increase in size in a straightforward manner.
Taking into account the considerations that we've mentioned in this
chapter—particularly on the sizing of the cloud controller—it should be
possible to procure additional compute or object storage nodes as
needed. New nodes do not need to be the same specification, or even
vendor, as existing nodes.
For compute nodes, ``nova-scheduler`` will take care of differences in
sizing having to do with core count and RAM amounts; however, you should
consider that the user experience changes with differing CPU speeds.
When adding object storage nodes, a :term:`weight` should be specified
that reflects the :term:`capability` of the node.
Monitoring the resource usage and user growth will enable you to know
when to procure. :doc:`ops-logging-monitoring` details some useful metrics.
Burn-in Testing
---------------
The chances of failure for the server's hardware are high at the start
and the end of its life. As a result, dealing with hardware failures
while in production can be avoided by appropriate burn-in testing to
attempt to trigger the early-stage failures. The general principle is to
stress the hardware to its limits. Examples of burn-in tests include
running a CPU or disk benchmark for several days.
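As a hedged illustration, tools such as ``stress-ng`` and ``fio`` (neither
is required; any comparable benchmark works) can be left running against
CPU, memory, and disks for a few days:

.. code-block:: console

   $ stress-ng --cpu 0 --vm 4 --vm-bytes 75% --timeout 72h
   $ fio --name=burnin --filename=/dev/sdb --ioengine=libaio --rw=randrw \
     --direct=1 --bs=4k --iodepth=32 --time_based --runtime=259200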

@ -1,498 +0,0 @@
=================
Storage Decisions
=================
Storage is found in many parts of the OpenStack stack, and the differing
types can cause confusion to even experienced cloud engineers. This
section focuses on persistent storage options you can configure with
your cloud. It's important to understand the distinction between
:term:`ephemeral <ephemeral volume>` storage and
:term:`persistent <persistent volume>` storage.
Ephemeral Storage
~~~~~~~~~~~~~~~~~
If you deploy only the OpenStack :term:`Compute service (nova)`,
your users do not have access to any form of persistent storage by default.
The disks associated with VMs are "ephemeral," meaning that (from the user's
point of view) they effectively disappear when a virtual machine is
terminated.
Persistent Storage
~~~~~~~~~~~~~~~~~~
Persistent storage means that the storage resource outlives any other
resource and is always available, regardless of the state of a running
instance.
Today, OpenStack clouds explicitly support three types of persistent
storage: *object storage*, *block storage*, and *file system storage*.
Object Storage
--------------
With object storage, users access binary objects through a REST API. You
may be familiar with Amazon S3, which is a well-known example of an
object storage system. Object storage is implemented in OpenStack by the
OpenStack Object Storage (swift) project. If your intended users need to
archive or manage large datasets, you want to provide them with object
storage. In addition, OpenStack can store your virtual machine (VM)
images inside of an object storage system, as an alternative to storing
the images on a file system.
OpenStack Object Storage provides a highly scalable, highly available
storage solution by relaxing some of the constraints of traditional file
systems. In designing and procuring for such a cluster, it is important
to understand some key concepts about its operation. Essentially, this
type of storage is built on the idea that all storage hardware fails, at
every level, at some point. Infrequently encountered failures that would
hamstring other storage systems, such as issues taking down RAID cards
or entire servers, are handled gracefully with OpenStack Object
Storage.
A good document describing the Object Storage architecture is found
within the `developer
documentation <https://docs.openstack.org/developer/swift/overview_architecture.html>`_
— read this first. Once you understand the architecture, you should know what a
proxy server does and how zones work. However, some important points are
often missed at first glance.
When designing your cluster, you must consider durability and
availability. Understand that the predominant source of these is the
spread and placement of your data, rather than the reliability of the
hardware. Consider the default value of the number of replicas, which is
three. This means that before an object is marked as having been
written, at least two copies exist—in case a single server fails to
write, the third copy may or may not yet exist when the write operation
initially returns. Altering this number increases the robustness of your
data, but reduces the amount of storage you have available. Next, look
at the placement of your servers. Consider spreading them widely
throughout your data center's network and power-failure zones. Is a zone
a rack, a server, or a disk?
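For context, the replica count and the zone layout are fixed when the
rings are built. A hedged sketch of building an object ring with three
replicas (the partition power, device, and weight values are
illustrative):

.. code-block:: console

   $ swift-ring-builder object.builder create 10 3 1
   $ swift-ring-builder object.builder add z1-192.0.2.21:6000/sdb1 100
   $ swift-ring-builder object.builder rebalance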
Object Storage's network patterns might seem unfamiliar at first.
Consider these main traffic flows:
* Among :term:`object`, :term:`container`, and
:term:`account servers <account server>`
* Between those servers and the proxies
* Between the proxies and your users
Object Storage is very "chatty" among servers hosting data—even a small
cluster does megabytes/second of traffic, which is predominantly, “Do
you have the object?”/“Yes I have the object!” Of course, if the answer
to the aforementioned question is negative or the request times out,
replication of the object begins.
Consider the scenario where an entire server fails and 24 TB of data
needs to be transferred "immediately" to remain at three copies—this can
put significant load on the network.
Another fact that's often forgotten is that when a new file is being
uploaded, the proxy server must write out as many streams as there are
replicas—giving a multiple of network traffic. For a three-replica
cluster, 10 Gbps in means 30 Gbps out. Combining this with the previous
high bandwidth demands of replication is what results in the
recommendation that your private network be of significantly higher
bandwidth than your public network needs to be. Oh, and OpenStack Object Storage
communicates internally with unencrypted, unauthenticated rsync for
performance—you do want the private network to be private.
The remaining point on bandwidth is the public-facing portion. The
``swift-proxy`` service is stateless, which means that you can easily
add more and use HTTP load-balancing methods to share bandwidth and
availability between them.
More proxies means more bandwidth, if your storage can keep up.
Block Storage
-------------
Block storage (sometimes referred to as volume storage) provides users
with access to block-storage devices. Users interact with block storage
by attaching volumes to their running VM instances.
These volumes are persistent: they can be detached from one instance and
re-attached to another, and the data remains intact. Block storage is
implemented in OpenStack by the OpenStack Block Storage (cinder)
project, which supports multiple back ends in the form of drivers. Your
choice of a storage back end must be supported by a Block Storage
driver.
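A quick illustrative example of the user workflow with the ``openstack``
client (the volume size, volume name, and server name are placeholders):

.. code-block:: console

   $ openstack volume create --size 10 data-vol
   $ openstack server add volume my-instance data-vol
   $ openstack server remove volume my-instance data-vol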
Most block storage drivers allow the instance to have direct access to
the underlying storage hardware's block device. This helps increase the
overall read/write IO. However, support for utilizing files as volumes
is also well established, with full support for NFS and other protocols.
These drivers work a little differently than a traditional "block"
storage driver. On an NFS file system, a single file is
created and then mapped as a "virtual" volume into the instance. This
mapping/translation is similar to how OpenStack utilizes QEMU's
file-based virtual machines stored in ``/var/lib/nova/instances``.
Shared File Systems Service
---------------------------
The Shared File Systems service provides a set of services for
management of Shared File Systems in a multi-tenant cloud environment.
Users interact with the Shared File Systems service by mounting remote file
systems on their instances and then using those file systems for file
storage and exchange. The Shared File Systems service provides you with
shares. A share is a remote, mountable file system that several users can
mount and access from several hosts at a time. With shares, users can
also:
* Create a share specifying its size, shared file system protocol,
visibility level
* Create a share on either a share server or standalone, depending on
the selected back-end mode, with or without using a share network.
* Specify access rules and security services for existing shares.
* Combine several shares in groups to keep data consistent within the
  group, enabling safe operations on the group as a whole.
* Create a snapshot of a selected share or a share group for storing
the existing shares consistently or creating new shares from that
snapshot in a consistent way
* Create a share from a snapshot.
* Set rate limits and quotas for specific shares and snapshots
* View usage of share resources
* Remove shares.
Like Block Storage, the Shared File Systems service is persistent. It
can be:
* Mounted to any number of client machines.
* Detached from one instance and attached to another without data loss.
During this process the data are safe unless the Shared File Systems
service itself is changed or removed.
Shares are provided by the Shared File Systems service. In OpenStack, the
Shared File Systems service is implemented by the Shared File Systems
(manila) project, which supports multiple back ends in the form of
drivers. The Shared File Systems service can be configured to provision
shares from one or more back ends. Share servers are mostly virtual
machines that export file shares via protocols such as NFS, CIFS,
GlusterFS, or HDFS.
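As a hedged illustration (the share protocol, size, names, and client
network are placeholders), the basic user workflow with the ``manila``
client looks like this:

.. code-block:: console

   $ manila create NFS 1 --name share1
   $ manila access-allow share1 ip 203.0.113.0/24
   $ manila list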
OpenStack Storage Concepts
~~~~~~~~~~~~~~~~~~~~~~~~~~
:ref:`table_openstack_storage` explains the different storage concepts
provided by OpenStack.
.. _table_openstack_storage:
.. list-table:: Table. OpenStack storage
:widths: 20 20 20 20 20
:header-rows: 1
* -
- Ephemeral storage
- Block storage
- Object storage
- Shared File System storage
* - Used to…
- Run operating system and scratch space
- Add additional persistent storage to a virtual machine (VM)
- Store data, including VM images
- Add additional persistent storage to a virtual machine
* - Accessed through…
- A file system
- A block device that can be partitioned, formatted, and mounted
(such as /dev/vdc)
- The REST API
- A Shared File Systems service share (either manila managed or an
external one registered in manila) that can be partitioned, formatted
and mounted (such as /dev/vdc)
* - Accessible from…
- Within a VM
- Within a VM
- Anywhere
- Within a VM
* - Managed by…
- OpenStack Compute (nova)
- OpenStack Block Storage (cinder)
- OpenStack Object Storage (swift)
- OpenStack Shared File System Storage (manila)
* - Persists until…
- VM is terminated
- Deleted by user
- Deleted by user
- Deleted by user
* - Sizing determined by…
- Administrator configuration of size settings, known as *flavors*
- User specification in initial request
- Amount of available physical storage
- * User specification in initial request
* Requests for extension
* Available user-level quotas
* Limitations applied by Administrator
* - Encryption set by…
- Parameter in nova.conf
- Admin establishing `encrypted volume type
<https://docs.openstack.org/admin-guide/dashboard-manage-volumes.html>`_,
then user selecting encrypted volume
- Not yet available
- Shared File Systems service does not apply any additional encryption
above what the shares back-end storage provides
* - Example of typical usage…
- 10 GB first disk, 30 GB second disk
- 1 TB disk
- 10s of TBs of dataset storage
- Depends completely on the size of back-end storage specified when
a share was being created. In the case of thin provisioning, it can be
a partial space reservation (for more details, see
`Capabilities and Extra-Specs
<https://docs.openstack.org/developer/manila/devref/capabilities_and_extra_specs.html?highlight=extra%20specs#common-capabilities>`_
specification)
.. note::
**File-level Storage (for Live Migration)**
With file-level storage, users access stored data using the operating
system's file system interface. Most users, if they have used a network
storage solution before, have encountered this form of networked
storage. In the Unix world, the most common form of this is NFS. In the
Windows world, the most common form is CIFS, a dialect of SMB.
OpenStack clouds do not present file-level storage to end users.
However, it is important to consider file-level storage for storing
instances under ``/var/lib/nova/instances`` when designing your cloud,
since you must have a shared file system if you want to support live
migration.
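As an illustrative sketch only, a shared ``/var/lib/nova/instances`` is often
provided by mounting an NFS export on every compute node; the server name and
export path here are hypothetical, and any shared file system that all compute
nodes can reach will work:

.. code-block:: console

   # On each compute node, mount the shared export over the instances path.
   # nfs.example.com:/export/nova-instances is a placeholder export.
   $ sudo mount -t nfs nfs.example.com:/export/nova-instances /var/lib/nova/instances

Add the equivalent entry to ``/etc/fstab`` so the mount survives a reboot, and
make sure the ``nova`` user owns the mounted directory.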
Choosing Storage Back Ends
~~~~~~~~~~~~~~~~~~~~~~~~~~
Users will indicate different needs for their cloud use cases. Some may
need fast access to many objects that do not change often, or want to
set a time-to-live (TTL) value on a file. Others may access only storage
that is mounted with the file system itself, but want it to be
replicated instantly when starting a new instance. For other systems,
ephemeral storage—storage that is released when a VM attached to it is
shut down—is the preferred way. When you select
:term:`storage back ends <storage back end>`,
ask the following questions on behalf of your users:
* Do my users need block storage?
* Do my users need object storage?
* Do I need to support live migration?
* Should my persistent storage drives be contained in my compute nodes,
or should I use external storage?
* What is the platter count I can achieve? Do more spindles result in
better I/O despite network access?
* Which one results in the best cost-performance scenario I'm aiming for?
* How do I manage the storage operationally?
* How redundant and distributed is the storage? What happens if a
storage node fails? To what extent can it mitigate my data-loss
disaster scenarios?
To deploy your storage by using only commodity hardware, you can use a number
of open-source packages, as shown in :ref:`table_persistent_file_storage`.
.. _table_persistent_file_storage:
.. list-table:: Table. Persistent file-based storage support
:widths: 25 25 25 25
:header-rows: 1
* -
- Object
- Block
- File-level
* - Swift
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
-
-
* - LVM
-
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
-
* - Ceph
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
- Experimental
* - Gluster
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
-
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
* - NFS
-
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
* - ZFS
-
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
-
* - Sheepdog
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
-
This list of open source file-level shared storage solutions is not
exhaustive; other open source solutions exist (such as MooseFS). Your
organization may already have deployed a file-level shared storage
solution that you can use.
.. note::
**Storage Driver Support**
In addition to the open source technologies, there are a number of
proprietary solutions that are officially supported by OpenStack Block
Storage. The full list of options can be found in the
`Available Drivers <https://docs.openstack.org/developer/cinder/drivers.html>`_
list.
You can find a matrix of the functionality provided by all of the
supported Block Storage drivers on the `OpenStack
wiki <https://wiki.openstack.org/wiki/CinderSupportMatrix>`_.
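Once a back end is configured, a quick operational sanity check is to list the
Block Storage services and the volume types exposed to users; this is only a
sketch, and the exact columns vary by release:

.. code-block:: console

   $ openstack volume service list
   $ openstack volume type list

The ``Host`` column of the service listing shows each ``cinder-volume``
service together with the back-end section it is bound to (the part after
the ``@``).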
Also, you need to decide whether you want to support object storage in
your cloud. The two common use cases for providing object storage in a
compute cloud are:
* To provide users with a persistent storage mechanism
* As a scalable, reliable data store for virtual machine images
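For the object storage case, the end-user interaction is typically as simple
as the following sketch (the container and file names are placeholders),
whether the back end is swift itself or a swift-compatible API such as Ceph's:

.. code-block:: console

   # Create a container, upload a local file, and list the container contents.
   $ openstack container create backups
   $ openstack object create backups db-dump.sql.gz
   $ openstack object list backups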
Commodity Storage Back-end Technologies
---------------------------------------
This section provides a high-level overview of the differences among
commodity storage back-end technologies. Depending on your cloud
users' needs, you can implement one or many of these technologies
in different combinations:
OpenStack Object Storage (swift)
The official OpenStack Object Store implementation. It is a mature
technology that has been used for several years in production by
Rackspace as the technology behind Rackspace Cloud Files. As it is
highly scalable, it is well-suited to managing petabytes of storage.
OpenStack Object Storage's advantages are better integration with
OpenStack (integrates with OpenStack Identity, works with the
OpenStack dashboard interface) and better support for multiple data
center deployment through support of asynchronous eventual
consistency replication.
Therefore, if you eventually plan on distributing your storage
cluster across multiple data centers, if you need unified accounts
for your users for both compute and object storage, or if you want
to control your object storage with the OpenStack dashboard, you
should consider OpenStack Object Storage. More detail about OpenStack
Object Storage can be found in the section below.
Ceph
A scalable storage solution that replicates data across commodity
storage nodes. Ceph was originally developed by one of the founders
of DreamHost and is currently used in production there.
Ceph was designed to expose different types of storage interfaces to
the end user: it supports object storage, block storage, and
file-system interfaces, although the file-system interface is not
yet considered production-ready. Ceph supports the same API as swift
for object storage and can be used as a back end for cinder block
storage as well as back-end storage for glance images. Ceph supports
"thin provisioning," implemented using copy-on-write.
This can be useful when booting from volume because a new volume can
be provisioned very quickly. Ceph also supports keystone-based
authentication (as of version 0.56), so it can be a seamless swap-in
for the default OpenStack swift implementation.
Ceph's advantages are that it gives the administrator more
fine-grained control over data distribution and replication
strategies, enables you to consolidate your object and block
storage, enables very fast provisioning of boot-from-volume
instances using thin provisioning, and supports a distributed
file-system interface, though this interface is `not yet
recommended <http://ceph.com/docs/master/cephfs/>`_ for use in
production deployment by the Ceph project.
If you want to manage your object and block storage within a single
system, or if you want to support fast boot-from-volume, you should
consider Ceph.
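If you go down this path, the Ceph side of the preparation is typically a
matter of creating dedicated pools and a client key for cinder; the pool
names, placement group counts, and capabilities below are illustrative
assumptions, and the RBD driver options still have to be set in the cinder
and glance configuration files:

.. code-block:: console

   # Illustrative pool names and PG counts; size these for your cluster.
   $ ceph osd pool create volumes 128
   $ ceph osd pool create images 128
   $ ceph auth get-or-create client.cinder mon 'allow r' \
     osd 'allow rwx pool=volumes, allow rx pool=images'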
Gluster
A distributed, shared file system. As of Gluster version 3.3, you
can use Gluster to consolidate your object storage and file storage
into one unified file and object storage solution, which is called
Gluster For OpenStack (GFO). GFO uses a customized version of swift
that enables Gluster to be used as the back-end storage.
The main reason to use GFO rather than regular swift is if you also
want to support a distributed file system, either to support shared
storage live migration or to provide it as a separate service to
your end users. If you want to manage your object and file storage
within a single system, you should consider GFO.
LVM
The Logical Volume Manager is a Linux-based system that provides an
abstraction layer on top of physical disks to expose logical volumes
to the operating system. The LVM back end implements block storage
volumes as LVM logical volumes.
On each host that will house block storage, an administrator must
first create a volume group dedicated to Block Storage volumes.
Volumes are then carved out of that group as LVM logical volumes.
.. note::
LVM does *not* provide any replication. Typically,
administrators configure RAID on nodes that use LVM as block
storage to protect against failures of individual hard drives.
However, RAID does not protect against a failure of the entire
host.
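Preparing a host for the LVM back end is usually just a matter of creating
that dedicated volume group; ``/dev/sdb`` is a placeholder device, and
``cinder-volumes`` is the volume group name the default configuration expects:

.. code-block:: console

   # Turn a spare disk into a physical volume and group it for Block Storage.
   $ sudo pvcreate /dev/sdb
   $ sudo vgcreate cinder-volumes /dev/sdb
   $ sudo vgs cinder-volumes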
ZFS
The Solaris iSCSI driver for OpenStack Block Storage implements
blocks as ZFS entities. ZFS is a file system that also has the
functionality of a volume manager. This is unlike on a Linux system,
where there is a separation of volume manager (LVM) and file system
(such as ext3, ext4, xfs, and btrfs). ZFS has a number of
advantages over ext4, including improved data-integrity checking.
The ZFS back end for OpenStack Block Storage supports only
Solaris-based systems, such as Illumos. While there is a Linux port
of ZFS, it is not included in any of the standard Linux
distributions, and it has not been tested with OpenStack Block
Storage. As with LVM, ZFS does not provide replication across hosts
on its own; you need to add a replication solution on top of ZFS if
your cloud needs to be able to handle storage-node failures.
We don't recommend ZFS unless you have previous experience with
deploying it, since the ZFS back end for Block Storage requires a
Solaris-based operating system, and we assume that your experience
is primarily with Linux-based systems.
Sheepdog
Sheepdog is a userspace distributed storage system. Sheepdog scales
to several hundred nodes and has powerful virtual disk management
features such as snapshots, cloning, rollback, and thin provisioning.
It is essentially an object storage system that manages disks and
aggregates their space and performance linearly at hyper scale on
commodity hardware. On top of its object store, Sheepdog provides an
elastic volume service and an HTTP service. Sheepdog does not depend
on a particular kernel version and works with any file system that
supports extended attributes (xattrs).
Conclusion
~~~~~~~~~~
We hope that you now have some considerations in mind and questions to
ask your future cloud users about their storage use cases. As you can
see, your storage decisions will also influence your network design for
performance and security needs. Continue with us to make more informed
decisions about your OpenStack cloud design.
@ -1,52 +0,0 @@
============
Architecture
============
Designing an OpenStack cloud is a great achievement. It requires a
robust understanding of the requirements and needs of the cloud's users
to determine the best possible configuration to meet them. OpenStack
provides a great deal of flexibility to achieve your needs, and this
part of the book aims to shine light on many of the decisions you need
to make during the process.
To design, deploy, and configure OpenStack, administrators must
understand the logical architecture. A diagram can help you envision all
the integrated services within OpenStack and how they interact with each
other.
OpenStack modules are one of the following types:
Daemon
Runs as a background process. On Linux platforms, a daemon is usually
installed as a service.
Script
Installs a virtual environment and runs tests.
Command-line interface (CLI)
Enables users to submit API calls to OpenStack services through commands.
As shown, end users can interact through the dashboard, CLIs, and APIs.
All services authenticate through a common Identity service, and
individual services interact with each other through public APIs, except
where privileged administrator commands are necessary.
:ref:`logical_architecture` shows the most common, but not the only logical
architecture for an OpenStack cloud.
.. _logical_architecture:
.. figure:: figures/osog_0001.png
:width: 100%
OpenStack Logical Architecture
.. toctree::
:maxdepth: 2
arch-examples.rst
arch-provision.rst
arch-cloud-controller.rst
arch-compute-nodes.rst
arch-scaling.rst
arch-storage.rst
arch-network-design.rst
Binary files not shown.
@ -16,7 +16,6 @@ Contents
acknowledgements.rst
preface.rst
common/conventions.rst
architecture.rst
operations.rst
Appendix
@ -151,6 +151,9 @@ Installation Tutorials and Guides
Contains a reference listing of all configuration options for core
and integrated OpenStack services by release version
`OpenStack Architecture Design Guide <https://docs.openstack.org/arch-design/>`_
Contains guidelines for designing an OpenStack cloud
`OpenStack Administrator Guide <https://docs.openstack.org/admin-guide/>`_
Contains how-to information for managing an OpenStack cloud as
needed for your use cases, such as storage, computing, or
@ -184,50 +187,8 @@ Installation Tutorials and Guides
How This Book Is Organized
~~~~~~~~~~~~~~~~~~~~~~~~~~
This book is organized into two parts: the architecture decisions for
designing OpenStack clouds and the repeated operations for running
OpenStack clouds.
**Part I:**
:doc:`arch-examples`
Because of all the decisions the other chapters discuss, this
chapter describes the decisions made for this particular book and
much of the justification for the example architecture.
:doc:`arch-provision`
While this book doesn't describe installation, we do recommend
automation for deployment and configuration, discussed in this
chapter.
:doc:`arch-cloud-controller`
The cloud controller is an invention for the sake of consolidating
and describing which services run on which nodes. This chapter
discusses hardware and network considerations as well as how to
design the cloud controller for performance and separation of
services.
:doc:`arch-compute-nodes`
This chapter describes the compute nodes, which are dedicated to
running virtual machines. Some hardware choices come into play here,
as well as logging and networking descriptions.
:doc:`arch-scaling`
This chapter discusses the growth of your cloud resources through
scaling and segregation considerations.
:doc:`arch-storage`
As with other architecture decisions, storage concepts within
OpenStack offer many options. This chapter lays out the choices for
you.
:doc:`arch-network-design`
Your OpenStack cloud networking needs to fit into your existing
networks while also enabling the best design for your users and
administrators, and this chapter gives you in-depth information
about networking decisions.
**Part II:**
This book contains several parts to show best practices and tips for
the repeated operations for running OpenStack clouds.
:doc:`ops-lay-of-the-land`
This chapter is written to let you get your hands wrapped around
@ -87,6 +87,9 @@ redirect 301 /trunk/openstack-ops/oreilly-openstack-ops-guide.pdf /openstack-ops
redirectmatch 301 /trunk/openstack-ops/.*$ /ops-guide/
redirect 301 /ops/index.html /ops-guide/index.html
# Redirect Operations Guide architecture part to Architecture Guide
redirectmatch 301 /ops-guide/arch.*$ /arch-design/index.html
# Redirect Architecture Guide to /arch-design/
redirect 301 /arch/index.html /arch-design/index.html