Merge "[ops-guide] remove Arch part in favor of Arch Design Guide"
commit d1eaafd478

Changed paths (doc/ops-guide/source):

- arch-cloud-controller.rst
- arch-compute-nodes.rst
- arch-example-neutron.rst
- arch-example-nova-network.rst
- arch-example-thoughts.rst
- arch-examples.rst
- arch-network-design.rst
- arch-provision.rst
- arch-scaling.rst
- arch-storage.rst
- architecture.rst
- figures/osog_0001.png, osog_0101.png, osog_0102.png, osog_0103.png,
  osog_0104.png, osog_0105.png, osog_0106.png, osog_01in01.png,
  osog_01in02.png, osog_0201.png
- index.rst
- preface.rst

Also changed: www/static

doc/ops-guide/source/arch-cloud-controller.rst (deleted): @@ -1,408 +0,0 @@
====================================================
Designing for Cloud Controllers and Cloud Management
====================================================

OpenStack is designed to be massively horizontally scalable, which
allows all services to be distributed widely. However, to simplify this
guide, we have decided to discuss services of a more central nature,
using the concept of a *cloud controller*. A cloud controller is just a
conceptual simplification. In the real world, you design an architecture
for your cloud controller that enables high availability so that if any
node fails, another can take over the required tasks. In reality, cloud
controller tasks are spread out across more than a single node.

The cloud controller provides the central management system for
OpenStack deployments. Typically, the cloud controller manages
authentication and sends messaging to all the systems through a message
queue.

For many deployments, the cloud controller is a single node. However, to
have high availability, you have to take a few considerations into
account, which we'll cover in this chapter.

The cloud controller manages the following services for the cloud:

Databases
  Tracks current information about users and instances, for example,
  in a database, typically one database instance managed per service

Message queue services
  All :term:`Advanced Message Queuing Protocol (AMQP)` messages for
  services are received and sent according to the queue broker

Conductor services
  Proxy requests to a database

Authentication and authorization for identity management
  Indicates which users can do what actions on certain cloud
  resources; quota management is spread out among services,
  however

Image-management services
  Stores and serves images with metadata on each, for launching in the
  cloud

Scheduling services
  Indicates which resources to use first; for example, spreading out
  where instances are launched based on an algorithm

User dashboard
  Provides a web-based front end for users to consume OpenStack cloud
  services

API endpoints
  Offers each service's REST API access, where the API endpoint
  catalog is managed by the Identity service

For our example, the cloud controller has a collection of ``nova-*``
components that represent the global state of the cloud; talks to
services such as authentication; maintains information about the cloud
in a database; communicates to all compute nodes and storage
:term:`workers <worker>` through a queue; and provides API access.
Each service running on a designated cloud controller may be broken out
into separate nodes for scalability or availability.

As another example, you could use pairs of servers for a collective
cloud controller—one active, one standby—for redundant nodes providing a
given set of related services, such as:

- Front end web for API requests, the scheduler for choosing which
  compute node to boot an instance on, Identity services, and the
  dashboard

- Database and message queue server (such as MySQL, RabbitMQ)

- Image service for the image management

Now that you see the myriad designs for controlling your cloud, read
more about the further considerations to help with your design
decisions.

Hardware Considerations
~~~~~~~~~~~~~~~~~~~~~~~

A cloud controller's hardware can be the same as a compute node, though
you may want to further specify based on the size and type of cloud that
you run.

It's also possible to use virtual machines for all or some of the
services that the cloud controller manages, such as the message queuing.
In this guide, we assume that all services are running directly on the
cloud controller.

:ref:`table_controller_hardware` contains common considerations to
review when sizing hardware for the cloud controller design.

.. _table_controller_hardware:

.. list-table:: Table. Cloud controller hardware sizing considerations
   :widths: 25 75
   :header-rows: 1

   * - Consideration
     - Ramification
   * - How many instances will run at once?
     - Size your database server accordingly, and scale out beyond one cloud
       controller if many instances will report status at the same time and
       scheduling where a new instance starts up needs computing power.
   * - How many compute nodes will run at once?
     - Ensure that your messaging queue handles requests successfully and size
       accordingly.
   * - How many users will access the API?
     - If many users will make multiple requests, make sure that the CPU load
       for the cloud controller can handle it.
   * - How many users will access the dashboard versus the REST API directly?
     - The dashboard makes many requests, even more than the API access, so
       add even more CPU if your dashboard is the main interface for your users.
   * - How many ``nova-api`` services do you run at once for your cloud?
     - You need to size the controller with a core per service.
   * - How long does a single instance run?
     - Starting instances and deleting instances is demanding on the compute
       node but also demanding on the controller node because of all the API
       queries and scheduling needs.
   * - Does your authentication system also verify externally?
     - External systems such as :term:`LDAP <Lightweight Directory Access
       Protocol (LDAP)>` or :term:`Active Directory` require network
       connectivity between the cloud controller and an external authentication
       system. Also ensure that the cloud controller has the CPU power to keep
       up with requests.

Separation of Services
~~~~~~~~~~~~~~~~~~~~~~

While our example contains all central services in a single location, it
is possible and indeed often a good idea to separate services onto
different physical servers. :ref:`table_deployment_scenarios` is a list
of deployment scenarios we've seen and their justifications.

.. _table_deployment_scenarios:

.. list-table:: Table. Deployment scenarios
   :widths: 25 75
   :header-rows: 1

   * - Scenario
     - Justification
   * - Run ``glance-*`` servers on the ``swift-proxy`` server.
     - This deployment felt that the spare I/O on the Object Storage proxy
       server was sufficient and that the Image Delivery portion of glance
       benefited from being on physical hardware and having good connectivity
       to the Object Storage back end it was using.
   * - Run a central dedicated database server.
     - This deployment used a central dedicated server to provide the databases
       for all services. This approach simplified operations by isolating
       database server updates and allowed for the simple creation of slave
       database servers for failover.
   * - Run one VM per service.
     - This deployment ran central services on a set of servers running KVM.
       A dedicated VM was created for each service (``nova-scheduler``,
       rabbitmq, database, etc). This assisted the deployment with scaling
       because administrators could tune the resources given to each virtual
       machine based on the load it received (something that was not well
       understood during installation).
   * - Use an external load balancer.
     - This deployment had an expensive hardware load balancer in its
       organization. It ran multiple ``nova-api`` and ``swift-proxy``
       servers on different physical servers and used the load balancer
       to switch between them.

One choice that always comes up is whether to virtualize. Some services,
such as ``nova-compute``, ``swift-proxy`` and ``swift-object`` servers,
should not be virtualized. However, control servers can often be happily
virtualized—the performance penalty can usually be offset by simply
running more of the service.

Database
~~~~~~~~

OpenStack Compute uses an SQL database to store and retrieve stateful
information. MySQL is the popular database choice in the OpenStack
community.

Loss of the database leads to errors. As a result, we recommend that you
cluster your database to make it failure tolerant. Configuring and
maintaining a database cluster is done outside OpenStack and is
determined by the database software you choose to use in your cloud
environment. MySQL/Galera is a popular option for MySQL-based databases.
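As a sketch of how an individual service points at a clustered database,
the example below shows a hypothetical ``[database]`` section in
``nova.conf`` that connects through a single virtual IP fronting a Galera
cluster. The host name, credentials, and database name are placeholders,
not values from this guide, and the driver prefix (``mysql://`` versus
``mysql+pymysql://``) depends on your release:

.. code-block:: ini

   [database]
   # Hypothetical example: "db-vip.example.com" is a virtual IP that fails
   # over between Galera cluster members; replace the credentials and the
   # database name with your own.
   connection = mysql+pymysql://nova:NOVA_DBPASS@db-vip.example.com/nova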
Message Queue
~~~~~~~~~~~~~

Most OpenStack services communicate with each other using the *message
queue*. For example, Compute communicates to block storage services and
networking services through the message queue. Also, you can optionally
enable notifications for any service. RabbitMQ, Qpid, and ZeroMQ are all
popular choices for a message-queue service. In general, if the message
queue fails or becomes inaccessible, the cluster grinds to a halt and
ends up in a read-only state, with information stuck at the point where
the last message was sent. Accordingly, we recommend that you cluster
the message queue. Be aware that clustered message queues can be a pain
point for many OpenStack deployments. While RabbitMQ has native
clustering support, there have been reports of issues when running it at
a large scale. Other queuing solutions, such as ZeroMQ and Qpid, are
also available. ZeroMQ does not offer stateful queues. Qpid is the
messaging system of choice for Red Hat and its derivatives, but it does
not have native clustering capabilities and requires a supplemental
service, such as Pacemaker or Corosync. For your message queue, you need
to determine what level of data loss you are comfortable with and
whether to use an OpenStack project's ability to retry multiple MQ hosts
in the event of a failure, such as using Compute's ability to do so.
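To illustrate the retry behaviour mentioned above, the following sketch
shows RabbitMQ-related options that existed in ``nova.conf`` for releases
of this vintage. The host names are placeholders, and these options have
since moved into the ``oslo.messaging`` library, so check the
configuration reference for your release:

.. code-block:: ini

   [DEFAULT]
   # Hypothetical example: a comma-separated list of RabbitMQ cluster
   # members that the service may fail over between.
   rabbit_hosts = rabbit1.example.com:5672,rabbit2.example.com:5672
   # Seconds to wait between reconnection attempts; a retry count of 0
   # means keep retrying forever.
   rabbit_retry_interval = 1
   rabbit_max_retries = 0
   # Use mirrored (highly available) queues on the RabbitMQ side.
   rabbit_ha_queues = true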
Conductor Services
~~~~~~~~~~~~~~~~~~

In previous versions of OpenStack, all ``nova-compute`` services
required direct access to the database hosted on the cloud controller.
This was problematic for two reasons: security and performance. With
regard to security, if a compute node is compromised, the attacker
inherently has access to the database. With regard to performance,
``nova-compute`` calls to the database are single-threaded and blocking.
This creates a performance bottleneck because database requests are
fulfilled serially rather than in parallel.

The conductor service resolves both of these issues by acting as a proxy
for the ``nova-compute`` service. Now, instead of ``nova-compute``
directly accessing the database, it contacts the ``nova-conductor``
service, and ``nova-conductor`` accesses the database on
``nova-compute``'s behalf. Since ``nova-compute`` no longer has direct
access to the database, the security issue is resolved. Additionally,
``nova-conductor`` is a nonblocking service, so requests from all
compute nodes are fulfilled in parallel.

.. note::

   If you are using ``nova-network`` and multi-host networking in your
   cloud environment, ``nova-compute`` still requires direct access to
   the database.

The ``nova-conductor`` service is horizontally scalable. To make
``nova-conductor`` highly available and fault tolerant, just launch more
instances of the ``nova-conductor`` process, either on the same server
or across multiple servers.
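The number of conductor worker processes can also be tuned per host. A
minimal sketch of the relevant ``nova.conf`` option follows; the worker
count shown is only an assumed example, not a recommendation from this
guide:

.. code-block:: ini

   [conductor]
   # Number of nova-conductor worker processes to fork on this host.
   # A common starting point is the number of CPU cores available;
   # the value here is only an example.
   workers = 8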
Application Programming Interface (API)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All public access, whether direct, through a command-line client, or
through the web-based dashboard, uses the API service. Find the API
reference at `Development resources for OpenStack clouds
<https://developer.openstack.org/>`_.

You must choose whether you want to support the Amazon EC2 compatibility
APIs, or just the OpenStack APIs. One issue you might encounter when
running both APIs is an inconsistent experience when referring to images
and instances.

For example, the EC2 API refers to instances using IDs that contain
hexadecimal, whereas the OpenStack API uses names and digits. Similarly,
the EC2 API tends to rely on DNS aliases for contacting virtual
machines, as opposed to OpenStack, which typically lists IP
addresses.

If OpenStack is not set up in the right way, it is simple to have
scenarios in which users are unable to contact their instances due to
having only an incorrect DNS alias. Despite this, EC2 compatibility can
assist users migrating to your cloud.

As with databases and message queues, having more than one :term:`API server`
is a good thing. Traditional HTTP load-balancing techniques can be used to
achieve a highly available ``nova-api`` service.

Extensions
~~~~~~~~~~

The `API
Specifications <https://developer.openstack.org/api-guide/quick-start/index.html>`_ define
the core actions, capabilities, and media types of the OpenStack API. A
client can always depend on the availability of this core API, and
implementers are always required to support it in its entirety.
Requiring strict adherence to the core API allows clients to rely upon a
minimal level of functionality when interacting with multiple
implementations of the same API.

The OpenStack Compute API is extensible. An extension adds capabilities
to an API beyond those defined in the core. The introduction of new
features, MIME types, actions, states, headers, parameters, and
resources can all be accomplished by means of extensions to the core
API. This allows the introduction of new features in the API without
requiring a version change and allows the introduction of
vendor-specific niche functionality.

Scheduling
~~~~~~~~~~

The scheduling services are responsible for determining the compute or
storage node where a virtual machine or block storage volume should be
created. The scheduling services receive creation requests for these
resources from the message queue and then begin the process of
determining the appropriate node where the resource should reside. This
process is done by applying a series of user-configurable filters
against the available collection of nodes.

There are currently two schedulers: ``nova-scheduler`` for virtual
machines and ``cinder-scheduler`` for block storage volumes. Both
schedulers are able to scale horizontally, so for high-availability
purposes, or for very large or high-schedule-frequency installations,
you should consider running multiple instances of each scheduler. The
schedulers all listen to the shared message queue, so no special load
balancing is required.
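To make the filter idea concrete, the sketch below shows how the Compute
scheduler's filter chain was typically selected in ``nova.conf`` at the
time. The exact filter list is an assumed example rather than a
recommendation, and newer releases moved these options into a dedicated
scheduler section, so consult the configuration reference for your
release:

.. code-block:: ini

   [DEFAULT]
   # Assumed example of a filter chain for nova-scheduler; each filter
   # removes hosts that cannot satisfy the request before one of the
   # remaining hosts is chosen.
   scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,RamFilter,CoreFilter,ComputeFilter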
Images
~~~~~~

The OpenStack Image service consists of two parts: ``glance-api`` and
``glance-registry``. The former is responsible for the delivery of
images; the compute node uses it to download images from the back end.
The latter maintains the metadata information associated with virtual
machine images and requires a database.

The ``glance-api`` part is an abstraction layer that allows a choice of
back end. Currently, it supports:

OpenStack Object Storage
  Allows you to store images as objects.

File system
  Uses any traditional file system to store the images as files.

S3
  Allows you to fetch images from Amazon S3.

HTTP
  Allows you to fetch images from a web server. You cannot write
  images by using this mode.

If you have an OpenStack Object Storage service, we recommend using this
as a scalable place to store your images. You can also use a file system
with sufficient performance or, if you do not need the ability to upload
new images through OpenStack, Amazon S3.
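As a minimal sketch of the back-end choice, these are the kinds of
``glance-api.conf`` options that selected a store in releases of this
era; the data directory is a placeholder, and newer releases configure
stores in a ``[glance_store]`` section instead:

.. code-block:: ini

   [DEFAULT]
   # Store images as objects in OpenStack Object Storage ...
   default_store = swift
   # ... or keep them on a local or shared file system instead:
   # default_store = file
   # filesystem_store_datadir = /var/lib/glance/images/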
Dashboard
~~~~~~~~~

The OpenStack dashboard (horizon) provides a web-based user interface to
the various OpenStack components. The dashboard includes an end-user
area for users to manage their virtual infrastructure and an admin area
for cloud operators to manage the OpenStack environment as a
whole.

The dashboard is implemented as a Python web application that normally
runs in :term:`Apache` ``httpd``. Therefore, you may treat it the same as any
other web application, provided it can reach the API servers (including
their admin endpoints) over the network.

Authentication and Authorization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The concepts supporting OpenStack's authentication and authorization are
derived from well-understood and widely used systems of a similar
nature. Users have credentials they can use to authenticate, and they
can be a member of one or more groups (known as projects or tenants,
interchangeably).

For example, a cloud administrator might be able to list all instances
in the cloud, whereas a user can see only those in his current group.
Resource quotas, such as the number of cores that can be used, disk
space, and so on, are associated with a project.

OpenStack Identity provides authentication decisions and user attribute
information, which is then used by the other OpenStack services to
perform authorization. The policy is set in the ``policy.json`` file.
For information on how to configure these, see :doc:`ops-projects-users`.

OpenStack Identity supports different plug-ins for authentication
decisions and identity storage. Examples of these plug-ins include:

- In-memory key-value Store (a simplified internal storage structure)

- SQL database (such as MySQL or PostgreSQL)

- Memcached (a distributed memory object caching system)

- LDAP (such as OpenLDAP or Microsoft's Active Directory)

Many deployments use the SQL database; however, LDAP is also a popular
choice for those with existing authentication infrastructure that needs
to be integrated.
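For the LDAP case, a rough sketch of the relevant ``keystone.conf``
settings from this era is shown below. Every value is a placeholder for
your own directory, and the exact option set differs between releases,
so treat this only as an indication of where the integration is
configured:

.. code-block:: ini

   [identity]
   # Use the LDAP identity back end instead of SQL.
   driver = keystone.identity.backends.ldap.Identity

   [ldap]
   # Placeholder values; point these at your own directory server.
   url = ldap://ldap.example.com
   suffix = dc=example,dc=com
   user_tree_dn = ou=Users,dc=example,dc=com
   user = cn=admin,dc=example,dc=com
   password = LDAP_BIND_PASSWORD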
Network Considerations
~~~~~~~~~~~~~~~~~~~~~~

Because the cloud controller handles so many different services, it must
be able to handle the amount of traffic that hits it. For example, if
you choose to host the OpenStack Image service on the cloud controller,
the cloud controller should be able to support the transferring of the
images at an acceptable speed.

As another example, if you choose to use single-host networking where
the cloud controller is the network gateway for all instances, then the
cloud controller must support the total amount of traffic that travels
between your cloud and the public Internet.

We recommend that you use a fast NIC, such as 10 GbE. You can also
choose to use two 10 GbE NICs and bond them together. While you might
not be able to get a full bonded 20 Gb speed, different transmission
streams use different NICs. For example, if the cloud controller
transfers two images, each image uses a different NIC and gets a full
10 Gb of bandwidth.
doc/ops-guide/source/arch-compute-nodes.rst (deleted): @@ -1,305 +0,0 @@
=============
Compute Nodes
=============

In this chapter, we discuss some of the choices you need to consider
when building out your compute nodes. Compute nodes form the resource
core of the OpenStack Compute cloud, providing the processing, memory,
network and storage resources to run instances.

Choosing a CPU
~~~~~~~~~~~~~~

The type of CPU in your compute node is a very important choice. First,
ensure that the CPU supports virtualization by way of *VT-x* for Intel
chips and *AMD-v* for AMD chips.

.. tip::

   Consult the vendor documentation to check for virtualization
   support. For Intel, read `“Does my processor support Intel® Virtualization
   Technology?” <http://www.intel.com/support/processors/sb/cs-030729.htm>`_.
   For AMD, read `AMD Virtualization
   <http://www.amd.com/en-us/innovations/software-technologies/server-solution/virtualization>`_.
   Note that your CPU may support virtualization but it may be
   disabled. Consult your BIOS documentation for how to enable CPU
   features.

The number of cores that the CPU has also affects the decision. It's
common for current CPUs to have up to 12 cores. Additionally, if an
Intel CPU supports hyperthreading, those 12 cores are doubled to 24
cores. If you purchase a server that supports multiple CPUs, the number
of cores is further multiplied.

.. note::

   **Multithread Considerations**

   Hyper-Threading is Intel's proprietary simultaneous multithreading
   implementation used to improve parallelization on their CPUs. You might
   consider enabling Hyper-Threading to improve the performance of
   multithreaded applications.

   Whether you should enable Hyper-Threading on your CPUs depends upon your
   use case. For example, disabling Hyper-Threading can be beneficial in
   intense computing environments. We recommend that you do performance
   testing with your local workload with both Hyper-Threading on and off to
   determine what is more appropriate in your case.

Choosing a Hypervisor
~~~~~~~~~~~~~~~~~~~~~

A hypervisor provides software to manage virtual machine access to the
underlying hardware. The hypervisor creates, manages, and monitors
virtual machines. OpenStack Compute supports many hypervisors to various
degrees, including:

* `KVM <http://www.linux-kvm.org/page/Main_Page>`_
* `LXC <https://linuxcontainers.org/>`_
* `QEMU <http://wiki.qemu.org/Main_Page>`_
* `VMware ESX/ESXi <https://www.vmware.com/support/vsphere-hypervisor>`_
* `Xen <http://www.xenproject.org/>`_
* `Hyper-V <http://technet.microsoft.com/en-us/library/hh831531.aspx>`_
* `Docker <https://www.docker.com/>`_

Probably the most important factor in your choice of hypervisor is your
current usage or experience. Aside from that, there are practical
concerns to do with feature parity, documentation, and the level of
community experience.

For example, KVM is the most widely adopted hypervisor in the OpenStack
community. Besides KVM, more deployments run Xen, LXC, VMware, and
Hyper-V than the others listed. However, each of these lacks some
feature support, or the documentation on how to use it with OpenStack is
out of date.

The best information available to support your choice is found on the
`Hypervisor Support Matrix
<https://docs.openstack.org/developer/nova/support-matrix.html>`_
and in the `configuration reference
<https://docs.openstack.org/ocata/config-reference/compute/hypervisors.html>`_.

.. note::

   It is also possible to run multiple hypervisors in a single
   deployment using host aggregates or cells. However, an individual
   compute node can run only a single hypervisor at a time.

Instance Storage Solutions
~~~~~~~~~~~~~~~~~~~~~~~~~~

As part of the procurement for a compute cluster, you must specify some
storage for the disk on which the instantiated instance runs. There are
three main approaches to providing this temporary-style storage, and it
is important to understand the implications of the choice.

They are:

* Off compute node storage—shared file system
* On compute node storage—shared file system
* On compute node storage—nonshared file system

In general, the questions you should ask when selecting storage are as
follows:

* What is the platter count you can achieve?
* Do more spindles result in better I/O despite network access?
* Which one results in the best cost-performance scenario you are aiming for?
* How do you manage the storage operationally?

Many operators use separate compute and storage hosts. Compute services
and storage services have different requirements, and compute hosts
typically require more CPU and RAM than storage hosts. Therefore, for a
fixed budget, it makes sense to have different configurations for your
compute nodes and your storage nodes. Compute nodes will be invested in
CPU and RAM, and storage nodes will be invested in block storage.

However, if you are more restricted in the number of physical hosts you
have available for creating your cloud and you want to be able to
dedicate as many of your hosts as possible to running instances, it
makes sense to run compute and storage on the same machines.

We'll discuss the three main approaches to instance storage in the next
few sections.

Off Compute Node Storage—Shared File System
-------------------------------------------

In this option, the disks storing the running instances are hosted in
servers outside of the compute nodes.

If you use separate compute and storage hosts, you can treat your
compute hosts as "stateless." As long as you don't have any instances
currently running on a compute host, you can take it offline or wipe it
completely without having any effect on the rest of your cloud. This
simplifies maintenance for the compute hosts.

There are several advantages to this approach:

* If a compute node fails, instances are usually easily recoverable.
* Running a dedicated storage system can be operationally simpler.
* You can scale to any number of spindles.
* It may be possible to share the external storage for other purposes.

The main downsides to this approach are:

* Depending on design, heavy I/O usage from some instances can affect
  unrelated instances.
* Use of the network can decrease performance.

On Compute Node Storage—Shared File System
------------------------------------------

In this option, each compute node is specified with a significant amount
of disk space, but a distributed file system ties the disks from each
compute node into a single mount.

The main advantage of this option is that it scales to external storage
when you require additional storage.

However, this option has several downsides:

* Running a distributed file system can make you lose your data
  locality compared with nonshared storage.
* Recovery of instances is complicated by depending on multiple hosts.
* The chassis size of the compute node can limit the number of spindles
  able to be used in a compute node.
* Use of the network can decrease performance.

On Compute Node Storage—Nonshared File System
---------------------------------------------

In this option, each compute node is specified with enough disks to
store the instances it hosts.

There are two main reasons why this is a good idea:

* Heavy I/O usage on one compute node does not affect instances on
  other compute nodes.
* Direct I/O access can increase performance.

This has several downsides:

* If a compute node fails, the instances running on that node are lost.
* The chassis size of the compute node can limit the number of spindles
  able to be used in a compute node.
* Migrations of instances from one node to another are more complicated
  and rely on features that may not continue to be developed.
* If additional storage is required, this option does not scale.

Running a shared file system on a storage system apart from the compute
nodes is ideal for clouds where reliability and scalability are the most
important factors. Running a shared file system on the compute nodes
themselves may be best in a scenario where you have to deploy to
preexisting servers for which you have little to no control over their
specifications. Running a nonshared file system on the compute nodes
themselves is a good option for clouds with high I/O requirements and
low concern for reliability.

Issues with Live Migration
--------------------------

We consider live migration an integral part of the operations of the
cloud. This feature provides the ability to seamlessly move instances
from one physical host to another, a necessity for performing upgrades
that require reboots of the compute hosts, but only works well with
shared storage.

Live migration can also be done with nonshared storage, using a feature
known as *KVM live block migration*. While an earlier implementation of
block-based migration in KVM and QEMU was considered unreliable, there
is a newer, more reliable implementation of block-based live migration
as of QEMU 1.4 and libvirt 1.0.2 that is also compatible with OpenStack.
However, none of the authors of this guide have first-hand experience
using live block migration.

Choice of File System
---------------------

If you want to support shared-storage live migration, you need to
configure a distributed file system.

Possible options include:

* NFS (default for Linux)
* GlusterFS
* MooseFS
* Lustre

We've seen deployments with all, and recommend that you choose the one
you are most familiar with operating. If you are not familiar with any
of these, choose NFS, as it is the easiest to set up and there is
extensive community knowledge about it.
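To show where a shared file system plugs in, the sketch below assumes
the simplest case: an NFS export mounted at the Compute instances
directory on every compute node. The export path and server name are
placeholders, and ``instances_path`` is shown only to make the location
explicit, since it already defaults to this directory:

.. code-block:: ini

   [DEFAULT]
   # All compute nodes mount the same shared export at this path, for
   # example nfs.example.com:/srv/nova_instances -> /var/lib/nova/instances.
   # This is already the default location; it is shown here only to make
   # the shared mount point explicit.
   instances_path = /var/lib/nova/instances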
Overcommitting
~~~~~~~~~~~~~~

OpenStack allows you to overcommit CPU and RAM on compute nodes. This
allows you to increase the number of instances you can have running on
your cloud, at the cost of reducing the performance of the instances.
OpenStack Compute uses the following ratios by default:

* CPU allocation ratio: 16:1
* RAM allocation ratio: 1.5:1

The default CPU allocation ratio of 16:1 means that the scheduler
allocates up to 16 virtual cores per physical core. For example, if a
physical node has 12 cores, the scheduler sees 192 available virtual
cores. With typical flavor definitions of 4 virtual cores per instance,
this ratio would provide 48 instances on a physical node.

The formula for the number of virtual instances on a compute node is
``(OR*PC)/VC``, where:

OR
  CPU overcommit ratio (virtual cores per physical core)

PC
  Number of physical cores

VC
  Number of virtual cores per instance

Similarly, the default RAM allocation ratio of 1.5:1 means that the
scheduler allocates instances to a physical node as long as the total
amount of RAM associated with the instances is less than 1.5 times the
amount of RAM available on the physical node.

For example, if a physical node has 48 GB of RAM, the scheduler
allocates instances to that node until the sum of the RAM associated
with the instances reaches 72 GB (such as nine instances, in the case
where each instance has 8 GB of RAM).

.. note::

   Regardless of the overcommit ratio, an instance cannot be placed
   on any physical node with fewer raw (pre-overcommit) resources than
   the instance flavor requires.

You must select the appropriate CPU and RAM allocation ratio for your
particular use case.
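The ratios above are ordinary Compute configuration options; a minimal
sketch of overriding them in ``nova.conf`` follows (the values shown
simply restate the defaults discussed above):

.. code-block:: ini

   [DEFAULT]
   # Virtual cores handed out per physical core (the 16:1 default).
   cpu_allocation_ratio = 16.0
   # RAM handed out relative to physical RAM (the 1.5:1 default).
   ram_allocation_ratio = 1.5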
Logging
~~~~~~~

Logging is detailed more fully in :doc:`ops-logging-monitoring`. However,
it is an important design consideration to take into account before
commencing operations of your cloud.

OpenStack produces a great deal of useful logging information; however,
for the information to be useful for operations purposes, you should
consider having a central logging server to send logs to, and a log
parsing/analysis system (such as logstash).

Networking
~~~~~~~~~~

Networking in OpenStack is a complex, multifaceted challenge. See
:doc:`arch-network-design`.

Conclusion
~~~~~~~~~~

Compute nodes are the workhorse of your cloud and the place where your
users' applications will run. They are likely to be affected by your
decisions on what to deploy and how you deploy it. Their requirements
should be reflected in the choices you make.
doc/ops-guide/source/arch-example-neutron.rst (deleted): @@ -1,568 +0,0 @@
===========================================
Example Architecture — OpenStack Networking
===========================================

This chapter provides an example architecture using OpenStack
Networking, also known as the Neutron project, in a highly available
environment.

Overview
~~~~~~~~

A highly available environment can be put into place if you require an
environment that can scale horizontally, or want your cloud to continue
to be operational in case of node failure. This example architecture has
been written based on the current default feature set of OpenStack
Havana, with an emphasis on high availability.

Components
----------

.. list-table::
   :widths: 50 50
   :header-rows: 1

   * - Component
     - Details
   * - OpenStack release
     - Havana
   * - Host operating system
     - Red Hat Enterprise Linux 6.5
   * - OpenStack package repository
     - `Red Hat Distributed OpenStack (RDO) <https://repos.fedorapeople.org/repos/openstack/>`_
   * - Hypervisor
     - KVM
   * - Database
     - MySQL
   * - Message queue
     - Qpid
   * - Networking service
     - OpenStack Networking
   * - Tenant Network Separation
     - VLAN
   * - Image service back end
     - GlusterFS
   * - Identity driver
     - SQL
   * - Block Storage back end
     - GlusterFS

Rationale
---------

This example architecture has been selected based on the current default
feature set of OpenStack Havana, with an emphasis on high availability.
This architecture is currently being deployed in an internal Red Hat
OpenStack cloud and used to run hosted and shared services, which by
their nature must be highly available.

This architecture's components have been selected for the following
reasons:

Red Hat Enterprise Linux
  You must choose an operating system that can run on all of the
  physical nodes. This example architecture is based on Red Hat
  Enterprise Linux, which offers reliability, long-term support,
  certified testing, and is hardened. Enterprise customers, now moving
  into OpenStack usage, typically require these advantages.

RDO
  The Red Hat Distributed OpenStack package offers an easy way to
  download the most current OpenStack release that is built for the
  Red Hat Enterprise Linux platform.

KVM
  KVM is the supported hypervisor of choice for Red Hat Enterprise
  Linux (and included in distribution). It is feature complete and
  free from licensing charges and restrictions.

MySQL
  MySQL is used as the database back end for all databases in the
  OpenStack environment. MySQL is the supported database of choice for
  Red Hat Enterprise Linux (and included in distribution); the
  database is open source, scalable, and handles memory well.

Qpid
  Apache Qpid offers 100 percent compatibility with the
  :term:`Advanced Message Queuing Protocol (AMQP)` Standard, and its
  broker is available for both C++ and Java.

OpenStack Networking
  OpenStack Networking offers sophisticated networking functionality,
  including Layer 2 (L2) network segregation and provider networks.

VLAN
  Using a virtual local area network offers broadcast control,
  security, and physical layer transparency. If needed, use VXLAN to
  extend your address space.

GlusterFS
  GlusterFS offers scalable storage. As your environment grows, you
  can continue to add more storage nodes (instead of being restricted,
  for example, by an expensive storage array).

Detailed Description
~~~~~~~~~~~~~~~~~~~~

Node types
----------

This section gives you a breakdown of the different nodes that make up
the OpenStack environment. A node is a physical machine that is
provisioned with an operating system, and running a defined software
stack on top of it. :ref:`table_node_types` provides node descriptions and
specifications.

.. _table_node_types:

.. list-table:: Table. Node types
   :widths: 20 50 30
   :header-rows: 1

   * - Type
     - Description
     - Example hardware
   * - Controller
     - Controller nodes are responsible for running the management software
       services needed for the OpenStack environment to function.
       These nodes:

       * Provide the front door that people access as well as the API
         services that all other components in the environment talk to.
       * Run a number of services in a highly available fashion,
         utilizing Pacemaker and HAProxy to provide a virtual IP and
         load-balancing functions so all controller nodes are being used.
       * Supply highly available "infrastructure" services,
         such as MySQL and Qpid, that underpin all the services.
       * Provide what is known as "persistent storage" through services
         run on the host as well. This persistent storage is backed onto
         the storage nodes for reliability.

       See :ref:`controller_node`.
     - Model: Dell R620

       CPU: 2x Intel® Xeon® CPU E5-2620 0 @ 2.00 GHz

       Memory: 32 GB

       Disk: two 300 GB 10000 RPM SAS Disks

       Network: two 10G network ports
   * - Compute
     - Compute nodes run the virtual machine instances in OpenStack. They:

       * Run the bare minimum of services needed to facilitate these
         instances.
       * Use local storage on the node for the virtual machines so that
         no VM migration or instance recovery at node failure is possible.

       See :ref:`compute_node`.
     - Model: Dell R620

       CPU: 2x Intel® Xeon® CPU E5-2650 0 @ 2.00 GHz

       Memory: 128 GB

       Disk: two 600 GB 10000 RPM SAS Disks

       Network: four 10G network ports (for future-proofing expansion)
   * - Storage
     - Storage nodes store all the data required for the environment,
       including disk images in the Image service library, and the
       persistent storage volumes created by the Block Storage service.
       Storage nodes use GlusterFS technology to keep the data highly
       available and scalable.

       See :ref:`storage_node`.
     - Model: Dell R720xd

       CPU: 2x Intel® Xeon® CPU E5-2620 0 @ 2.00 GHz

       Memory: 64 GB

       Disk: two 500 GB 7200 RPM SAS Disks and twenty-four 600 GB
       10000 RPM SAS Disks

       Raid Controller: PERC H710P Integrated RAID Controller, 1 GB NV Cache

       Network: two 10G network ports
   * - Network
     - Network nodes are responsible for doing all the virtual networking
       needed for people to create public or private networks and uplink
       their virtual machines into external networks. Network nodes:

       * Form the only ingress and egress point for instances running
         on top of OpenStack.
       * Run all of the environment's networking services, with the
         exception of the networking API service (which runs on the
         controller node).

       See :ref:`network_node`.
     - Model: Dell R620

       CPU: 1x Intel® Xeon® CPU E5-2620 0 @ 2.00 GHz

       Memory: 32 GB

       Disk: two 300 GB 10000 RPM SAS Disks

       Network: five 10G network ports
   * - Utility
     - Utility nodes are used by internal administration staff only to
       provide a number of basic system administration functions needed
       to get the environment up and running and to maintain the hardware,
       OS, and software on which it runs.

       These nodes run services such as provisioning, configuration
       management, monitoring, or GlusterFS management software.
       They are not required to scale, although these machines are
       usually backed up.
     - Model: Dell R620

       CPU: 2x Intel® Xeon® CPU E5-2620 0 @ 2.00 GHz

       Memory: 32 GB

       Disk: two 500 GB 7200 RPM SAS Disks

       Network: two 10G network ports

.. _networking_layout:

Networking layout
-----------------

The network contains all the management devices for all hardware in the
environment (for example, by including Dell iDrac7 devices for the
hardware nodes, and management interfaces for network switches). The
network is accessed by internal staff only when diagnosing or recovering
a hardware issue.

OpenStack internal network
--------------------------

This network is used for OpenStack management functions and traffic,
including services needed for the provisioning of physical nodes
(``pxe``, ``tftp``, ``kickstart``), traffic between various OpenStack
node types using OpenStack APIs and messages (for example,
``nova-compute`` talking to ``keystone`` or ``cinder-volume`` talking to
``nova-api``), and all traffic for storage data to the storage layer
underneath by the Gluster protocol. All physical nodes have at least one
network interface (typically ``eth0``) in this network. This network is
only accessible from other VLANs on port 22 (for ``ssh`` access to
manage machines).

Public Network
--------------

This network is a combination of:

- IP addresses for public-facing interfaces on the controller nodes
  (through which end users will access the OpenStack services)

- A range of publicly routable, IPv4 network addresses to be used by
  OpenStack Networking for floating IPs. You may be restricted in your
  access to IPv4 addresses; a large range of IPv4 addresses is not
  necessary.

- Routers for private networks created within OpenStack.

This network is connected to the controller nodes so users can access
the OpenStack interfaces, and connected to the network nodes to provide
VMs with publicly routable traffic functionality. The network is also
connected to the utility machines so that any utility services that need
to be made public (such as system monitoring) can be accessed.

VM traffic network
------------------

This is a closed network that is not publicly routable and is simply
used as a private, internal network for traffic between virtual machines
in OpenStack, and between the virtual machines and the network nodes
that provide L3 routes out to the public network (and floating IPs for
connections back in to the VMs). Because this is a closed network, we
are using a different address space from the others to clearly define
the separation. Only Compute and OpenStack Networking nodes need to be
connected to this network.

Node connectivity
~~~~~~~~~~~~~~~~~

The following section details how the nodes are connected to the
different networks (see :ref:`networking_layout`) and
what other considerations need to take place (for example, bonding) when
connecting nodes to the networks.

Initial deployment
------------------

Initially, the connection setup should revolve around keeping the
connectivity simple and straightforward in order to minimize deployment
complexity and time to deploy.
The deployment shown in :ref:`figure_basic_node_deployment` aims to
have 1 × 10G connectivity available to all compute nodes, while still
leveraging bonding on appropriate nodes for maximum performance.

.. _figure_basic_node_deployment:

.. figure:: figures/osog_0101.png
   :alt: Basic node deployment
   :width: 100%

   Figure. Basic node deployment

Connectivity for maximum performance
------------------------------------

If the networking performance of the basic layout is not enough, you can
move to :ref:`figure_performance_node_deployment`, which provides 2 × 10G
network links to all instances in the environment as well as providing more
network bandwidth to the storage layer.

.. _figure_performance_node_deployment:

.. figure:: figures/osog_0102.png
   :alt: Performance node deployment
   :width: 100%

   Figure. Performance node deployment

Node diagrams
~~~~~~~~~~~~~

The following diagrams, :ref:`controller_node` through :ref:`storage_node`,
include logical information about the different types of nodes, indicating
what services will be running on top of them and how they interact with
each other. The diagrams also illustrate how the availability and
scalability of services are achieved.

.. _controller_node:

.. figure:: figures/osog_0103.png
   :alt: Controller node
   :width: 100%

   Figure. Controller node

.. _compute_node:

.. figure:: figures/osog_0104.png
   :alt: Compute node
   :width: 100%

   Figure. Compute node

.. _network_node:

.. figure:: figures/osog_0105.png
   :alt: Network node
   :width: 100%

   Figure. Network node

.. _storage_node:

.. figure:: figures/osog_0106.png
   :alt: Storage node
   :width: 100%

   Figure. Storage node

Example Component Configuration
-------------------------------

:ref:`third_party_component_configuration` and
:ref:`openstack_component_configuration` include example configuration
and considerations for both third-party and OpenStack components:

.. _third_party_component_configuration:

.. list-table:: Table. Third-party component configuration
   :widths: 10 30 30 30
   :header-rows: 1

   * - Component
     - Tuning
     - Availability
     - Scalability
   * - MySQL
     - ``binlog-format = row``
     - Master/master replication. However, both nodes are not used at the
       same time. Replication keeps all nodes as close to being up to date
       as possible (although the asynchronous nature of the replication means
       a fully consistent state is not possible). Connections to the database
       only happen through a Pacemaker virtual IP, ensuring that most problems
       that occur with master-master replication can be avoided.
     - Not heavily considered. Once load on the MySQL server increases enough
       that scalability needs to be considered, multiple masters or a
       master/slave setup can be used.
   * - Qpid
     - ``max-connections=1000`` ``worker-threads=20`` ``connection-backlog=10``,
       SASL security enabled with SASL-BASIC authentication
     - Qpid is added as a resource to the Pacemaker software that runs on
       Controller nodes where Qpid is situated. This ensures only one Qpid
       instance is running at one time, and the node with the Pacemaker
       virtual IP will always be the node running Qpid.
     - Not heavily considered. However, Qpid can be changed to run on all
       controller nodes for scalability and availability purposes,
       and removed from Pacemaker.
   * - HAProxy
     - ``maxconn 3000``
     - HAProxy is a software layer-7 load balancer used to front all
       clustered OpenStack API components and do SSL termination.
       HAProxy can be added as a resource to the Pacemaker software that
       runs on the Controller nodes where HAProxy is situated.
       This ensures that only one HAProxy instance is running at one time,
       and the node with the Pacemaker virtual IP will always be the node
       running HAProxy.
     - Not considered. HAProxy has small enough performance overheads that
       a single instance should scale enough for this level of workload.
       If extra scalability is needed, ``keepalived`` or other Layer-4
       load balancing can be introduced to be placed in front of multiple
       copies of HAProxy.
   * - Memcached
     - ``MAXCONN="8192" CACHESIZE="30457"``
     - Memcached is a fast in-memory key-value cache software that is used
       by OpenStack components for caching data and increasing performance.
       Memcached runs on all controller nodes, ensuring that should one go
       down, another instance of Memcached is available.
     - Not considered. A single instance of Memcached should be able to
       scale to the desired workloads. If scalability is desired, HAProxy
       can be placed in front of Memcached (in raw ``tcp`` mode) to utilize
       multiple Memcached instances for scalability. However, this might
       cause cache consistency issues.
   * - Pacemaker
     - Configured to use ``corosync`` and ``cman`` as a cluster communication
       stack/quorum manager, and as a two-node cluster.
     - Pacemaker is the clustering software used to ensure the availability
       of services running on the controller and network nodes:

       * Because Pacemaker is cluster software, the software itself handles
         its own availability, leveraging ``corosync`` and ``cman``
         underneath.
       * If you use the GlusterFS native client, no virtual IP is needed,
         since the client knows all about nodes after initial connection
         and automatically routes around failures on the client side.
       * If you use the NFS or SMB adaptor, you will need a virtual IP on
         which to mount the GlusterFS volumes.
     - If more nodes need to be made cluster aware, Pacemaker can scale to
       64 nodes.
   * - GlusterFS
     - ``glusterfs`` performance profile "virt" enabled on all volumes.
       Volumes are set up in two-node replication.
     - GlusterFS is a clustered file system that is run on the storage
       nodes to provide persistent scalable data storage in the environment.
       Because all connections to gluster use the ``gluster`` native mount
       points, the ``gluster`` instances themselves provide availability
       and failover functionality.
     - The scalability of GlusterFS storage can be achieved by adding in
       more storage volumes.

.. _openstack_component_configuration:

.. list-table:: Table. OpenStack component configuration
   :widths: 10 10 20 30 30
   :header-rows: 1

   * - Component
     - Node type
     - Tuning
     - Availability
     - Scalability
   * - Dashboard (horizon)
     - Controller
     - Configured to use Memcached as a session store, ``neutron``
       support is enabled, ``can_set_mount_point = False``
     - The dashboard is run on all controller nodes, ensuring at least one
       instance will be available in case of node failure.
       It also sits behind HAProxy, which detects when the software fails
       and routes requests around the failing instance.
     - The dashboard is run on all controller nodes, so scalability can be
       achieved with additional controller nodes. HAProxy allows scalability
       for the dashboard as more nodes are added.
   * - Identity (keystone)
     - Controller
     - Configured to use Memcached for caching and PKI for tokens.
     - Identity is run on all controller nodes, ensuring at least one
       instance will be available in case of node failure.
       Identity also sits behind HAProxy, which detects when the software
       fails and routes requests around the failing instance.
     - Identity is run on all controller nodes, so scalability can be
       achieved with additional controller nodes.
       HAProxy allows scalability for Identity as more nodes are added.
   * - Image service (glance)
     - Controller
     - ``/var/lib/glance/images`` is a GlusterFS native mount to a Gluster
       volume off the storage layer.
     - The Image service is run on all controller nodes, ensuring at least
       one instance will be available in case of node failure.
       It also sits behind HAProxy, which detects when the software fails
       and routes requests around the failing instance.
     - The Image service is run on all controller nodes, so scalability
       can be achieved with additional controller nodes. HAProxy allows
       scalability for the Image service as more nodes are added.
   * - Compute (nova)
     - Controller, Compute
     - Configured to use Qpid with ``qpid_heartbeat = 10``, configured to
       use Memcached for caching, configured to use ``libvirt``, configured
       to use ``neutron``.

       Configured ``nova-consoleauth`` to use Memcached for session
       management (so that it can have multiple copies and run in a
       load balancer).
     - The nova API, scheduler, objectstore, cert, consoleauth, conductor,
       and vncproxy services are run on all controller nodes, ensuring at
       least one instance will be available in case of node failure.
       Compute is also behind HAProxy, which detects when the software
       fails and routes requests around the failing instance.

       Nova-compute and nova-conductor services, which run on the compute
       nodes, are only needed to run services on that node, so availability
       of those services is coupled tightly to the nodes that are available.
       As long as a compute node is up, it will have the needed services
       running on top of it.
     - The nova API, scheduler, objectstore, cert, consoleauth, conductor,
       and vncproxy services are run on all controller nodes, so scalability
       can be achieved with additional controller nodes. HAProxy allows
       scalability for Compute as more nodes are added. The scalability
       of services running on the compute nodes (compute, conductor) is
       achieved linearly by adding in more compute nodes.
   * - Block Storage (cinder)
     - Controller
     - Configured to use Qpid with ``qpid_heartbeat = 10``, configured to
       use a Gluster volume from the storage layer as the back end for
       Block Storage, using the Gluster native client.
     - Block Storage API, scheduler, and volume services are run on all
       controller nodes, ensuring at least one instance will be available
       in case of node failure. Block Storage also sits behind HAProxy,
       which detects if the software fails and routes requests around the
       failing instance.
     - Block Storage API, scheduler, and volume services are run on all
       controller nodes, so scalability can be achieved with additional
       controller nodes. HAProxy allows scalability for Block Storage as
       more nodes are added.
   * - OpenStack Networking (neutron)
     - Controller, Compute, Network
     - Configured to use Qpid with ``qpid_heartbeat = 10``, kernel namespace
       support enabled, ``tenant_network_type = vlan``,
       ``allow_overlapping_ips = true``,
       ``bridge_uplinks = br-ex:em2``, ``bridge_mappings = physnet1:br-ex``
     - The OpenStack Networking service is run on all controller nodes,
       ensuring at least one instance will be available in case of node
       failure. It also sits behind HAProxy, which detects if the software
       fails and routes requests around the failing instance.
     - The OpenStack Networking server service is run on all controller
       nodes, so scalability can be achieved with additional controller
       nodes. HAProxy allows scalability for OpenStack Networking as more
       nodes are added. Scalability of services running on the network
       nodes is not currently supported by OpenStack Networking, so they
       are not considered. One copy of the services should be sufficient
       to handle the workload. Scalability of the ``ovs-agent`` running on
       compute nodes is achieved by adding in more compute nodes as
       necessary.
@ -1,261 +0,0 @@
|
||||
===============================================
|
||||
Example Architecture — Legacy Networking (nova)
|
||||
===============================================
|
||||
|
||||
This particular example architecture has been upgraded from :term:`Grizzly` to
|
||||
:term:`Havana` and tested in production environments where many public IP
|
||||
addresses are available for assignment to multiple instances. You can
|
||||
find a second example architecture that uses OpenStack Networking
|
||||
(neutron) after this section. Each example offers high availability,
|
||||
meaning that if a particular node goes down, another node with the same
|
||||
configuration can take over the tasks so that the services continue to
|
||||
be available.
|
||||
|
||||
Overview
|
||||
~~~~~~~~
|
||||
|
||||
The simplest architecture you can build upon for Compute has a single
|
||||
cloud controller and multiple compute nodes. The simplest architecture
|
||||
for Object Storage has five nodes: one for identifying users and
|
||||
proxying requests to the API, then four for storage itself to provide
|
||||
enough replication for eventual consistency. This example architecture
|
||||
does not dictate a particular number of nodes, but shows the thinking
|
||||
and considerations that went into choosing this architecture including
|
||||
the features offered.
|
||||
|
||||
Components
|
||||
~~~~~~~~~~
|
||||
|
||||
.. list-table::
|
||||
:widths: 50 50
|
||||
:header-rows: 1
|
||||
|
||||
* - Component
|
||||
- Details
|
||||
* - OpenStack release
|
||||
- Havana
|
||||
* - Host operating system
|
||||
- Ubuntu 12.04 LTS or Red Hat Enterprise Linux 6.5,
|
||||
including derivatives such as CentOS and Scientific Linux
|
||||
* - OpenStack package repository
|
||||
- `Ubuntu Cloud Archive <https://wiki.ubuntu.com/ServerTeam/CloudArchive>`_
|
||||
or `RDO <http://openstack.redhat.com/Frequently_Asked_Questions>`_
|
||||
* - Hypervisor
|
||||
- KVM
|
||||
* - Database
|
||||
- MySQL\*
|
||||
* - Message queue
|
||||
- RabbitMQ for Ubuntu; Qpid for Red Hat Enterprise Linux and derivatives
|
||||
* - Networking service
|
||||
- ``nova-network``
|
||||
* - Network manager
|
||||
- FlatDHCP
|
||||
* - Single ``nova-network`` or multi-host?
|
||||
- multi-host\*
|
||||
* - Image service (glance) back end
|
||||
- file
|
||||
* - Identity (keystone) driver
|
||||
- SQL
|
||||
* - Block Storage (cinder) back end
|
||||
- LVM/iSCSI
|
||||
* - Live Migration back end
|
||||
- Shared storage using NFS\*
|
||||
* - Object storage
|
||||
- OpenStack Object Storage (swift)
|
||||
|
||||
An asterisk (\*) indicates when the example architecture deviates from
|
||||
the settings of a default installation. We'll offer explanations for
|
||||
those deviations next.
|
||||
|
||||
.. note::
|
||||
|
||||
The following features of OpenStack are supported by the example
|
||||
architecture documented in this guide, but are optional:
|
||||
|
||||
- :term:`Dashboard <Dashboard (horizon)>`: You probably want to offer
|
||||
a dashboard, but your users may be more interested in API access only.
|
||||
|
||||
- :term:`Block storage <Block Storage service (cinder)>`:
|
||||
You don't have to offer users block storage if their use case only
|
||||
needs ephemeral storage on compute nodes, for example.
|
||||
|
||||
- :term:`Floating IP address <floating IP address>`:
|
||||
Floating IP addresses are public IP addresses that you allocate
|
||||
from a predefined pool to assign to virtual machines at launch.
|
||||
Floating IP address ensure that the public IP address is available
|
||||
whenever an instance is booted. Not every organization can offer
|
||||
thousands of public floating IP addresses for thousands of
|
||||
instances, so this feature is considered optional.
|
||||
|
||||
- :term:`Live migration <live migration>`: If you need to move
|
||||
running virtual machine instances from one host to another with
|
||||
little or no service interruption, you would enable live migration,
|
||||
but it is considered optional.
|
||||
|
||||
- :term:`Object storage <Object Storage service (swift)>`: You may
|
||||
choose to store machine images on a file system rather than in
|
||||
object storage if you do not have the extra hardware for the
|
||||
required replication and redundancy that OpenStack Object Storage
|
||||
offers.
|
||||
|
||||
Rationale
|
||||
~~~~~~~~~
|
||||
|
||||
This example architecture has been selected based on the current default
|
||||
feature set of OpenStack Havana, with an emphasis on stability. We
|
||||
believe that many clouds that currently run OpenStack in production have
|
||||
made similar choices.
|
||||
|
||||
You must first choose the operating system that runs on all of the
|
||||
physical nodes. While OpenStack is supported on several distributions of
|
||||
Linux, we used *Ubuntu 12.04 LTS (Long Term Support)*, which is used by
|
||||
the majority of the development community, has feature completeness
|
||||
compared with other distributions and has clear future support plans.
|
||||
|
||||
We recommend that you do not use the default Ubuntu OpenStack install
|
||||
packages and instead use the `Ubuntu Cloud
|
||||
Archive <https://wiki.ubuntu.com/ServerTeam/CloudArchive>`__. The Cloud
|
||||
Archive is a package repository supported by Canonical that allows you
|
||||
to upgrade to future OpenStack releases while remaining on Ubuntu 12.04.
|
||||
|
||||
*KVM* as a :term:`hypervisor` complements the choice of Ubuntu—being a
|
||||
matched pair in terms of support, and also because of the significant degree
|
||||
of attention it garners from the OpenStack development community (including
|
||||
the authors, who mostly use KVM). It is also feature complete, free from
|
||||
licensing charges and restrictions.
|
||||
|
||||
*MySQL* follows a similar trend. Despite its recent change of ownership,
|
||||
this database is the most tested for use with OpenStack and is heavily
|
||||
documented. We deviate from the default database, *SQLite*, because
|
||||
SQLite is not an appropriate database for production usage.
|
||||
|
||||
The choice of *RabbitMQ* over other
|
||||
:term:`AMQP <Advanced Message Queuing Protocol (AMQP)>` compatible options
|
||||
that are gaining support in OpenStack, such as ZeroMQ and Qpid, is due to its
|
||||
ease of use and significant testing in production. It also is the only
|
||||
option that supports features such as Compute cells. We recommend
|
||||
clustering with RabbitMQ, as it is an integral component of the system
|
||||
and fairly simple to implement due to its inbuilt nature.
|
||||
|
||||
As discussed in previous chapters, there are several options for
|
||||
networking in OpenStack Compute. We recommend *FlatDHCP* and to use
|
||||
*Multi-Host* networking mode for high availability, running one
|
||||
``nova-network`` daemon per OpenStack compute host. This provides a
|
||||
robust mechanism for ensuring network interruptions are isolated to
|
||||
individual compute hosts, and allows for the direct use of hardware
|
||||
network gateways.
|
||||
|
||||
*Live Migration* is supported by way of shared storage, with *NFS* as
|
||||
the distributed file system.
|
||||
|
||||
Acknowledging that many small-scale deployments see running Object
|
||||
Storage just for the storage of virtual machine images as too costly, we
|
||||
opted for the file back end in the OpenStack :term:`Image service (Glance)`.
|
||||
If your cloud will include Object Storage, you can easily add it as a back
|
||||
end.
|
||||
|
||||
We chose the *SQL back end for Identity* over others, such as LDAP. This
|
||||
back end is simple to install and is robust. The authors acknowledge
|
||||
that many installations want to bind with existing directory services
|
||||
and caution careful understanding of the `array of options available
|
||||
<https://docs.openstack.org/ocata/config-reference/identity/options.html#keystone-ldap>`_.
|
||||
|
||||
Block Storage (cinder) is installed natively on external storage nodes
|
||||
and uses the *LVM/iSCSI plug-in*. Most Block Storage plug-ins are tied
|
||||
to particular vendor products and implementations limiting their use to
|
||||
consumers of those hardware platforms, but LVM/iSCSI is robust and
|
||||
stable on commodity hardware.
|
||||
|
||||
While the cloud can be run without the *OpenStack Dashboard*, we
|
||||
consider it to be indispensable, not just for user interaction with the
|
||||
cloud, but also as a tool for operators. Additionally, the dashboard's
|
||||
use of Django makes it a flexible framework for extension.
|
||||
|
||||
Why not use OpenStack Networking?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This example architecture does not use OpenStack Networking, because it
|
||||
does not yet support multi-host networking and our organizations
|
||||
(university, government) have access to a large range of
|
||||
publicly-accessible IPv4 addresses.
|
||||
|
||||
Why use multi-host networking?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
In a default OpenStack deployment, there is a single ``nova-network``
|
||||
service that runs within the cloud (usually on the cloud controller)
|
||||
that provides services such as
|
||||
:term:`Network Address Translation (NAT)`, :term:`DHCP <Dynamic Host
|
||||
Configuration Protocol (DHCP)>`, and :term:`DNS <Domain Name System (DNS)>`
|
||||
to the guest instances. If the single node that runs the ``nova-network``
|
||||
service goes down, you cannot access your instances, and the instances
|
||||
cannot access the Internet. The single node that runs the ``nova-network``
|
||||
service can become a bottleneck if excessive network traffic comes in and
|
||||
goes out of the cloud.
|
||||
|
||||
.. tip::
|
||||
|
||||
`Multi-host <https://docs.openstack.org/havana/install-guide/install/apt/content/nova-network.html>`_
|
||||
is a high-availability option for the network configuration, where
|
||||
the ``nova-network`` service is run on every compute node instead of
|
||||
running on only a single node.
|
||||
|
||||
Detailed Description
|
||||
--------------------
|
||||
|
||||
The reference architecture consists of multiple compute nodes, a cloud
|
||||
controller, an external NFS storage server for instance storage, and an
|
||||
OpenStack Block Storage server for volume storage.
|
||||
A network time service (:term:`Network Time Protocol (NTP)`)
|
||||
synchronizes time on all the nodes. FlatDHCPManager in
|
||||
multi-host mode is used for the networking. A logical diagram for this
|
||||
example architecture shows which services are running on each node:
|
||||
|
||||
.. image:: figures/osog_01in01.png
|
||||
:width: 100%
|
||||
|
||||
|
|
||||
|
||||
The cloud controller runs the dashboard, the API services, the database
|
||||
(MySQL), a message queue server (RabbitMQ), the scheduler for choosing
|
||||
compute resources (``nova-scheduler``), Identity services (keystone,
|
||||
``nova-consoleauth``), Image services (``glance-api``,
|
||||
``glance-registry``), services for console access of guests, and Block
|
||||
Storage services, including the scheduler for storage resources
|
||||
(``cinder-api`` and ``cinder-scheduler``).
|
||||
|
||||
Compute nodes are where the computing resources are held, and in our
|
||||
example architecture, they run the hypervisor (KVM), libvirt (the driver
|
||||
for the hypervisor, which enables live migration from node to node),
|
||||
``nova-compute``, ``nova-api-metadata`` (generally only used when
|
||||
running in multi-host mode, it retrieves instance-specific metadata),
|
||||
``nova-vncproxy``, and ``nova-network``.
|
||||
|
||||
The network consists of two switches, one for the management or private
|
||||
traffic, and one that covers public access, including floating IPs. To
|
||||
support this, the cloud controller and the compute nodes have two
|
||||
network cards. The OpenStack Block Storage and NFS storage servers only
|
||||
need to access the private network and therefore only need one network
|
||||
card, but multiple cards run in a bonded configuration are recommended
|
||||
if possible. Floating IP access is direct to the Internet, whereas Flat
|
||||
IP access goes through a NAT. To envision the network traffic, use this
|
||||
diagram:
|
||||
|
||||
.. image:: figures/osog_01in02.png
|
||||
:width: 100%
|
||||
|
||||
|
|
||||
|
||||
Optional Extensions
|
||||
-------------------
|
||||
|
||||
You can extend this reference architecture as follows:
|
||||
|
||||
- Add additional cloud controllers (see :doc:`ops-maintenance`).
|
||||
|
||||
- Add an OpenStack Storage service (see the Object Storage chapter in
|
||||
the `Installation Tutorials and Guides
|
||||
<https://docs.openstack.org/project-install-guide/ocata/>`_ for your distribution).
|
||||
|
||||
- Add additional OpenStack Block Storage hosts (see
|
||||
:doc:`ops-maintenance`).
|
@ -1,12 +0,0 @@
|
||||
=========================================
|
||||
Parting Thoughts on Architecture Examples
|
||||
=========================================
|
||||
|
||||
With so many considerations and options available, our hope is to
|
||||
provide a few clearly-marked and tested paths for your OpenStack
|
||||
exploration. If you're looking for additional ideas, check out
|
||||
:doc:`app-usecases`, the
|
||||
`Installation Tutorials and Guides
|
||||
<https://docs.openstack.org/project-install-guide/ocata/>`_, or the
|
||||
`OpenStack User Stories
|
||||
page <https://www.openstack.org/user-stories/>`_.
|
@ -1,30 +0,0 @@
|
||||
=====================
|
||||
Architecture Examples
|
||||
=====================
|
||||
|
||||
To understand the possibilities that OpenStack offers, it's best to
|
||||
start with basic architecture that has been tested in production
|
||||
environments. We offer two examples with basic pivots on the base
|
||||
operating system (Ubuntu and Red Hat Enterprise Linux) and the
|
||||
networking architecture. There are other differences between these two
|
||||
examples and this guide provides reasons for each choice made.
|
||||
|
||||
Because OpenStack is highly configurable, with many different back ends
|
||||
and network configuration options, it is difficult to write
|
||||
documentation that covers all possible OpenStack deployments. Therefore,
|
||||
this guide defines examples of architecture to simplify the task of
|
||||
documenting, as well as to provide the scope for this guide. Both of the
|
||||
offered architecture examples are currently running in production and
|
||||
serving users.
|
||||
|
||||
.. tip::
|
||||
|
||||
As always, refer to the :doc:`common/glossary` if you are unclear
|
||||
about any of the terminology mentioned in architecture examples.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
arch-example-nova-network.rst
|
||||
arch-example-neutron.rst
|
||||
arch-example-thoughts.rst
|
@ -1,293 +0,0 @@
|
||||
==============
|
||||
Network Design
|
||||
==============
|
||||
|
||||
OpenStack provides a rich networking environment, and this chapter
|
||||
details the requirements and options to deliberate when designing your
|
||||
cloud.
|
||||
|
||||
.. warning::
|
||||
|
||||
If this is the first time you are deploying a cloud infrastructure
|
||||
in your organization, after reading this section, your first
|
||||
conversations should be with your networking team. Network usage in
|
||||
a running cloud is vastly different from traditional network
|
||||
deployments and has the potential to be disruptive at both a
|
||||
connectivity and a policy level.
|
||||
|
||||
For example, you must plan the number of IP addresses that you need for
|
||||
both your guest instances as well as management infrastructure.
|
||||
Additionally, you must research and discuss cloud network connectivity
|
||||
through proxy servers and firewalls.
|
||||
|
||||
In this chapter, we'll give some examples of network implementations to
|
||||
consider and provide information about some of the network layouts that
|
||||
OpenStack uses. Finally, we have some brief notes on the networking
|
||||
services that are essential for stable operation.
|
||||
|
||||
Management Network
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
A :term:`management network` (a separate network for use by your cloud
|
||||
operators) typically consists of a separate switch and separate NICs
|
||||
(network interface cards), and is a recommended option. This segregation
|
||||
prevents system administration and the monitoring of system access from
|
||||
being disrupted by traffic generated by guests.
|
||||
|
||||
Consider creating other private networks for communication between
|
||||
internal components of OpenStack, such as the message queue and
|
||||
OpenStack Compute. Using a virtual local area network (VLAN) works well
|
||||
for these scenarios because it provides a method for creating multiple
|
||||
virtual networks on a physical network.
|
||||
|
||||
Public Addressing Options
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
There are two main types of IP addresses for guest virtual machines:
|
||||
fixed IPs and floating IPs. Fixed IPs are assigned to instances on boot,
|
||||
whereas floating IP addresses can change their association between
|
||||
instances by action of the user. Both types of IP addresses can be
|
||||
either public or private, depending on your use case.
|
||||
|
||||
Fixed IP addresses are required, whereas it is possible to run OpenStack
|
||||
without floating IPs. One of the most common use cases for floating IPs
|
||||
is to provide public IP addresses to a private cloud, where there are a
|
||||
limited number of IP addresses available. Another is for a public cloud
|
||||
user to have a "static" IP address that can be reassigned when an
|
||||
instance is upgraded or moved.
|
||||
|
||||
Fixed IP addresses can be private for private clouds, or public for
|
||||
public clouds. When an instance terminates, its fixed IP is lost. It is
|
||||
worth noting that newer users of cloud computing may find their
|
||||
ephemeral nature frustrating.
|
||||
|
||||
IP Address Planning
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
An OpenStack installation can potentially have many subnets (ranges of
|
||||
IP addresses) and different types of services in each. An IP address
|
||||
plan can assist with a shared understanding of network partition
|
||||
purposes and scalability. Control services can have public and private
|
||||
IP addresses, and as noted above, there are a couple of options for an
|
||||
instance's public addresses.
|
||||
|
||||
An IP address plan might be broken down into the following sections:
|
||||
|
||||
Subnet router
|
||||
Packets leaving the subnet go via this address, which could be a
|
||||
dedicated router or a ``nova-network`` service.
|
||||
|
||||
Control services public interfaces
|
||||
Public access to ``swift-proxy``, ``nova-api``, ``glance-api``, and
|
||||
horizon come to these addresses, which could be on one side of a
|
||||
load balancer or pointing at individual machines.
|
||||
|
||||
Object Storage cluster internal communications
|
||||
Traffic among object/account/container servers and between these and
|
||||
the proxy server's internal interface uses this private network.
|
||||
|
||||
Compute and storage communications
|
||||
If ephemeral or block storage is external to the compute node, this
|
||||
network is used.
|
||||
|
||||
Out-of-band remote management
|
||||
If a dedicated remote access controller chip is included in servers,
|
||||
often these are on a separate network.
|
||||
|
||||
In-band remote management
|
||||
Often, an extra (such as 1 GB) interface on compute or storage nodes
|
||||
is used for system administrators or monitoring tools to access the
|
||||
host instead of going through the public interface.
|
||||
|
||||
Spare space for future growth
|
||||
Adding more public-facing control services or guest instance IPs
|
||||
should always be part of your plan.
|
||||
|
||||
For example, take a deployment that has both OpenStack Compute and
|
||||
Object Storage, with private ranges 172.22.42.0/24 and 172.22.87.0/26
|
||||
available. One way to segregate the space might be as follows:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
172.22.42.0/24:
|
||||
172.22.42.1 - 172.22.42.3 - subnet routers
|
||||
172.22.42.4 - 172.22.42.20 - spare for networks
|
||||
172.22.42.21 - 172.22.42.104 - Compute node remote access controllers
|
||||
(inc spare)
|
||||
172.22.42.105 - 172.22.42.188 - Compute node management interfaces (inc spare)
|
||||
172.22.42.189 - 172.22.42.208 - Swift proxy remote access controllers
|
||||
(inc spare)
|
||||
172.22.42.209 - 172.22.42.228 - Swift proxy management interfaces (inc spare)
|
||||
172.22.42.229 - 172.22.42.252 - Swift storage servers remote access controllers
|
||||
(inc spare)
|
||||
172.22.42.253 - 172.22.42.254 - spare
|
||||
172.22.87.0/26:
|
||||
172.22.87.1 - 172.22.87.3 - subnet routers
|
||||
172.22.87.4 - 172.22.87.24 - Swift proxy server internal interfaces
|
||||
(inc spare)
|
||||
172.22.87.25 - 172.22.87.63 - Swift object server internal interfaces
|
||||
(inc spare)
|
||||
|
||||
A similar approach can be taken with public IP addresses, taking note
|
||||
that large, flat ranges are preferred for use with guest instance IPs.
|
||||
Take into account that for some OpenStack networking options, a public
|
||||
IP address in the range of a guest instance public IP address is
|
||||
assigned to the ``nova-compute`` host.
|
||||
|
||||
Network Topology
|
||||
~~~~~~~~~~~~~~~~
|
||||
|
||||
OpenStack Compute with ``nova-network`` provides predefined network
|
||||
deployment models, each with its own strengths and weaknesses. The
|
||||
selection of a network manager changes your network topology, so the
|
||||
choice should be made carefully. You also have a choice between the
|
||||
tried-and-true legacy ``nova-network`` settings or the neutron project
|
||||
for OpenStack Networking. Both offer networking for launched instances
|
||||
with different implementations and requirements.
|
||||
|
||||
For OpenStack Networking with the neutron project, typical
|
||||
configurations are documented with the idea that any setup you can
|
||||
configure with real hardware you can re-create with a software-defined
|
||||
equivalent. Each tenant can contain typical network elements such as
|
||||
routers, and services such as :term:`DHCP <Dynamic Host Configuration
|
||||
Protocol (DHCP)>`.
|
||||
|
||||
:ref:`table_networking_deployment` describes the networking deployment
|
||||
options for both legacy ``nova-network`` options and an equivalent
|
||||
neutron configuration.
|
||||
|
||||
.. _table_networking_deployment:
|
||||
|
||||
.. list-table:: Networking deployment options
|
||||
:widths: 10 30 30 30
|
||||
:header-rows: 1
|
||||
|
||||
* - Network deployment model
|
||||
- Strengths
|
||||
- Weaknesses
|
||||
- Neutron equivalent
|
||||
* - Flat
|
||||
- Extremely simple topology. No DHCP overhead.
|
||||
- Requires file injection into the instance to configure network
|
||||
interfaces.
|
||||
- Configure a single bridge as the integration bridge (br-int) and
|
||||
connect it to a physical network interface with the Modular Layer 2
|
||||
(ML2) plug-in, which uses Open vSwitch by default.
|
||||
* - FlatDHCP
|
||||
- Relatively simple to deploy. Standard networking. Works with all guest
|
||||
operating systems.
|
||||
- Requires its own DHCP broadcast domain.
|
||||
- Configure DHCP agents and routing agents. Network Address Translation
|
||||
(NAT) performed outside of compute nodes, typically on one or more
|
||||
network nodes.
|
||||
* - VlanManager
|
||||
- Each tenant is isolated to its own VLANs.
|
||||
- More complex to set up. Requires its own DHCP broadcast domain.
|
||||
Requires many VLANs to be trunked onto a single port. Standard VLAN
|
||||
number limitation. Switches must support 802.1q VLAN tagging.
|
||||
- Isolated tenant networks implement some form of isolation of layer 2
|
||||
traffic between distinct networks. VLAN tagging is key concept, where
|
||||
traffic is “tagged” with an ordinal identifier for the VLAN. Isolated
|
||||
network implementations may or may not include additional services like
|
||||
DHCP, NAT, and routing.
|
||||
* - FlatDHCP Multi-host with high availability (HA)
|
||||
- Networking failure is isolated to the VMs running on the affected
|
||||
hypervisor. DHCP traffic can be isolated within an individual host.
|
||||
Network traffic is distributed to the compute nodes.
|
||||
- More complex to set up. Compute nodes typically need IP addresses
|
||||
accessible by external networks. Options must be carefully configured
|
||||
for live migration to work with networking services.
|
||||
- Configure neutron with multiple DHCP and layer-3 agents. Network nodes
|
||||
are not able to failover to each other, so the controller runs
|
||||
networking services, such as DHCP. Compute nodes run the ML2 plug-in
|
||||
with support for agents such as Open vSwitch or Linux Bridge.
|
||||
|
||||
Both ``nova-network`` and neutron services provide similar capabilities,
|
||||
such as VLAN between VMs. You also can provide multiple NICs on VMs with
|
||||
either service. Further discussion follows.
|
||||
|
||||
VLAN Configuration Within OpenStack VMs
|
||||
---------------------------------------
|
||||
|
||||
VLAN configuration can be as simple or as complicated as desired. The
|
||||
use of VLANs has the benefit of allowing each project its own subnet and
|
||||
broadcast segregation from other projects. To allow OpenStack to
|
||||
efficiently use VLANs, you must allocate a VLAN range (one for each
|
||||
project) and turn each compute node switch port into a trunk
|
||||
port.
|
||||
|
||||
For example, if you estimate that your cloud must support a maximum of
|
||||
100 projects, pick a free VLAN range that your network infrastructure is
|
||||
currently not using (such as VLAN 200–299). You must configure OpenStack
|
||||
with this range and also configure your switch ports to allow VLAN
|
||||
traffic from that range.
|
||||
|
||||
Multi-NIC Provisioning
|
||||
----------------------
|
||||
|
||||
OpenStack Networking with ``neutron`` and OpenStack Compute with
|
||||
``nova-network`` have the ability to assign multiple NICs to instances. For
|
||||
``nova-network`` this can be done on a per-request basis, with each
|
||||
additional NIC using up an entire subnet or VLAN, reducing the total
|
||||
number of supported projects.
|
||||
|
||||
Multi-Host and Single-Host Networking
|
||||
-------------------------------------
|
||||
|
||||
The ``nova-network`` service has the ability to operate in a multi-host
|
||||
or single-host mode. Multi-host is when each compute node runs a copy of
|
||||
``nova-network`` and the instances on that compute node use the compute
|
||||
node as a gateway to the Internet. The compute nodes also host the
|
||||
floating IPs and security groups for instances on that node. Single-host
|
||||
is when a central server—for example, the cloud controller—runs the
|
||||
``nova-network`` service. All compute nodes forward traffic from the
|
||||
instances to the cloud controller. The cloud controller then forwards
|
||||
traffic to the Internet. The cloud controller hosts the floating IPs and
|
||||
security groups for all instances on all compute nodes in the
|
||||
cloud.
|
||||
|
||||
There are benefits to both modes. Single-node has the downside of a
|
||||
single point of failure. If the cloud controller is not available,
|
||||
instances cannot communicate on the network. This is not true with
|
||||
multi-host, but multi-host requires that each compute node has a public
|
||||
IP address to communicate on the Internet. If you are not able to obtain
|
||||
a significant block of public IP addresses, multi-host might not be an
|
||||
option.
|
||||
|
||||
Services for Networking
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
OpenStack, like any network application, has a number of standard
|
||||
considerations to apply, such as NTP and DNS.
|
||||
|
||||
NTP
|
||||
---
|
||||
|
||||
Time synchronization is a critical element to ensure continued operation
|
||||
of OpenStack components. Correct time is necessary to avoid errors in
|
||||
instance scheduling, replication of objects in the object store, and
|
||||
even matching log timestamps for debugging.
|
||||
|
||||
All servers running OpenStack components should be able to access an
|
||||
appropriate NTP server. You may decide to set up one locally or use the
|
||||
public pools available from the `Network Time Protocol
|
||||
project <http://www.pool.ntp.org/>`_.
|
||||
|
||||
DNS
|
||||
---
|
||||
|
||||
OpenStack does not currently provide DNS services, aside from the
|
||||
dnsmasq daemon, which resides on ``nova-network`` hosts. You could
|
||||
consider providing a dynamic DNS service to allow instances to update a
|
||||
DNS entry with new IP addresses. You can also consider making a generic
|
||||
forward and reverse DNS mapping for instances' IP addresses, such as
|
||||
vm-203-0-113-123.example.com.
|
||||
|
||||
Conclusion
|
||||
~~~~~~~~~~
|
||||
|
||||
Armed with your IP address layout and numbers and knowledge about the
|
||||
topologies and services you can use, it's now time to prepare the
|
||||
network for your installation. Be sure to also check out the `OpenStack
|
||||
Security Guide <https://docs.openstack.org/security-guide/>`_ for tips on securing
|
||||
your network. We wish you a good relationship with your networking team!
|
@ -1,251 +0,0 @@
|
||||
===========================
|
||||
Provisioning and Deployment
|
||||
===========================
|
||||
|
||||
A critical part of a cloud's scalability is the amount of effort that it
|
||||
takes to run your cloud. To minimize the operational cost of running
|
||||
your cloud, set up and use an automated deployment and configuration
|
||||
infrastructure with a configuration management system, such as :term:`Puppet`
|
||||
or :term:`Chef`. Combined, these systems greatly reduce manual effort and the
|
||||
chance for operator error.
|
||||
|
||||
This infrastructure includes systems to automatically install the
|
||||
operating system's initial configuration and later coordinate the
|
||||
configuration of all services automatically and centrally, which reduces
|
||||
both manual effort and the chance for error. Examples include Ansible,
|
||||
CFEngine, Chef, Puppet, and Salt. You can even use OpenStack to deploy
|
||||
OpenStack, named TripleO (OpenStack On OpenStack).
|
||||
|
||||
Automated Deployment
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
An automated deployment system installs and configures operating systems
|
||||
on new servers, without intervention, after the absolute minimum amount
|
||||
of manual work, including physical racking, MAC-to-IP assignment, and
|
||||
power configuration. Typically, solutions rely on wrappers around PXE
|
||||
boot and TFTP servers for the basic operating system install and then
|
||||
hand off to an automated configuration management system.
|
||||
|
||||
Both Ubuntu and Red Hat Enterprise Linux include mechanisms for
|
||||
configuring the operating system, including preseed and kickstart, that
|
||||
you can use after a network boot. Typically, these are used to bootstrap
|
||||
an automated configuration system. Alternatively, you can use an
|
||||
image-based approach for deploying the operating system, such as
|
||||
systemimager. You can use both approaches with a virtualized
|
||||
infrastructure, such as when you run VMs to separate your control
|
||||
services and physical infrastructure.
|
||||
|
||||
When you create a deployment plan, focus on a few vital areas because
|
||||
they are very hard to modify post deployment. The next two sections talk
|
||||
about configurations for:
|
||||
|
||||
- Disk partitioning and disk array setup for scalability
|
||||
|
||||
- Networking configuration just for PXE booting
|
||||
|
||||
Disk Partitioning and RAID
|
||||
--------------------------
|
||||
|
||||
At the very base of any operating system are the hard drives on which
|
||||
the operating system (OS) is installed.
|
||||
|
||||
You must complete the following configurations on the server's hard
|
||||
drives:
|
||||
|
||||
- Partitioning, which provides greater flexibility for layout of
|
||||
operating system and swap space, as described below.
|
||||
|
||||
- Adding to a RAID array (RAID stands for redundant array of
|
||||
independent disks), based on the number of disks you have available,
|
||||
so that you can add capacity as your cloud grows. Some options are
|
||||
described in more detail below.
|
||||
|
||||
The simplest option to get started is to use one hard drive with two
|
||||
partitions:
|
||||
|
||||
- File system to store files and directories, where all the data lives,
|
||||
including the root partition that starts and runs the system.
|
||||
|
||||
- Swap space to free up memory for processes, as an independent area of
|
||||
the physical disk used only for swapping and nothing else.
|
||||
|
||||
RAID is not used in this simplistic one-drive setup because generally
|
||||
for production clouds, you want to ensure that if one disk fails,
|
||||
another can take its place. Instead, for production, use more than one
|
||||
disk. The number of disks determine what types of RAID arrays to build.
|
||||
|
||||
We recommend that you choose one of the following multiple disk options:
|
||||
|
||||
Option 1
|
||||
Partition all drives in the same way in a horizontal fashion, as
|
||||
shown in :ref:`partition_setup`.
|
||||
|
||||
With this option, you can assign different partitions to different
|
||||
RAID arrays. You can allocate partition 1 of disk one and two to the
|
||||
``/boot`` partition mirror. You can make partition 2 of all disks
|
||||
the root partition mirror. You can use partition 3 of all disks for
|
||||
a ``cinder-volumes`` LVM partition running on a RAID 10 array.
|
||||
|
||||
.. _partition_setup:
|
||||
|
||||
.. figure:: figures/osog_0201.png
|
||||
|
||||
Figure. Partition setup of drives
|
||||
|
||||
While you might end up with unused partitions, such as partition 1
|
||||
in disk three and four of this example, this option allows for
|
||||
maximum utilization of disk space. I/O performance might be an issue
|
||||
as a result of all disks being used for all tasks.
|
||||
|
||||
Option 2
|
||||
Add all raw disks to one large RAID array, either hardware or
|
||||
software based. You can partition this large array with the boot,
|
||||
root, swap, and LVM areas. This option is simple to implement and
|
||||
uses all partitions. However, disk I/O might suffer.
|
||||
|
||||
Option 3
|
||||
Dedicate entire disks to certain partitions. For example, you could
|
||||
allocate disk one and two entirely to the boot, root, and swap
|
||||
partitions under a RAID 1 mirror. Then, allocate disk three and four
|
||||
entirely to the LVM partition, also under a RAID 1 mirror. Disk I/O
|
||||
should be better because I/O is focused on dedicated tasks. However,
|
||||
the LVM partition is much smaller.
|
||||
|
||||
.. tip::
|
||||
|
||||
You may find that you can automate the partitioning itself. For
|
||||
example, MIT uses `Fully Automatic Installation
|
||||
(FAI) <http://fai-project.org/>`_ to do the initial PXE-based
|
||||
partition and then install using a combination of min/max and
|
||||
percentage-based partitioning.
|
||||
|
||||
As with most architecture choices, the right answer depends on your
|
||||
environment. If you are using existing hardware, you know the disk
|
||||
density of your servers and can determine some decisions based on the
|
||||
options above. If you are going through a procurement process, your
|
||||
user's requirements also help you determine hardware purchases. Here are
|
||||
some examples from a private cloud providing web developers custom
|
||||
environments at AT&T. This example is from a specific deployment, so
|
||||
your existing hardware or procurement opportunity may vary from this.
|
||||
AT&T uses three types of hardware in its deployment:
|
||||
|
||||
- Hardware for controller nodes, used for all stateless OpenStack API
|
||||
services. About 32–64 GB memory, small attached disk, one processor,
|
||||
varied number of cores, such as 6–12.
|
||||
|
||||
- Hardware for compute nodes. Typically 256 or 144 GB memory, two
|
||||
processors, 24 cores. 4–6 TB direct attached storage, typically in a
|
||||
RAID 5 configuration.
|
||||
|
||||
- Hardware for storage nodes. Typically for these, the disk space is
|
||||
optimized for the lowest cost per GB of storage while maintaining
|
||||
rack-space efficiency.
|
||||
|
||||
Again, the right answer depends on your environment. You have to make
|
||||
your decision based on the trade-offs between space utilization,
|
||||
simplicity, and I/O performance.
|
||||
|
||||
Network Configuration
|
||||
---------------------
|
||||
|
||||
Network configuration is a very large topic that spans multiple areas of
|
||||
this book. For now, make sure that your servers can PXE boot and
|
||||
successfully communicate with the deployment server.
|
||||
|
||||
For example, you usually cannot configure NICs for VLANs when PXE
|
||||
booting. Additionally, you usually cannot PXE boot with bonded NICs. If
|
||||
you run into this scenario, consider using a simple 1 GB switch in a
|
||||
private network on which only your cloud communicates.
|
||||
|
||||
Automated Configuration
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The purpose of automatic configuration management is to establish and
|
||||
maintain the consistency of a system without using human intervention.
|
||||
You want to maintain consistency in your deployments so that you can
|
||||
have the same cloud every time, repeatably. Proper use of automatic
|
||||
configuration-management tools ensures that components of the cloud
|
||||
systems are in particular states, in addition to simplifying deployment,
|
||||
and configuration change propagation.
|
||||
|
||||
These tools also make it possible to test and roll back changes, as they
|
||||
are fully repeatable. Conveniently, a large body of work has been done
|
||||
by the OpenStack community in this space. Puppet, a configuration
|
||||
management tool, even provides official modules for OpenStack projects
|
||||
in an OpenStack infrastructure system known as `Puppet
|
||||
OpenStack <https://wiki.openstack.org/wiki/Puppet>`_. Chef
|
||||
configuration management is provided within `openstack/openstack-chef-repo
|
||||
<https://git.openstack.org/cgit/openstack/openstack-chef-repo>`_. Additional
|
||||
configuration management systems include Juju, Ansible, and Salt. Also,
|
||||
PackStack is a command-line utility for Red Hat Enterprise Linux and
|
||||
derivatives that uses Puppet modules to support rapid deployment of
|
||||
OpenStack on existing servers over an SSH connection.
|
||||
|
||||
An integral part of a configuration-management system is the item that
|
||||
it controls. You should carefully consider all of the items that you
|
||||
want, or do not want, to be automatically managed. For example, you may
|
||||
not want to automatically format hard drives with user data.
|
||||
|
||||
Remote Management
|
||||
~~~~~~~~~~~~~~~~~
|
||||
|
||||
In our experience, most operators don't sit right next to the servers
|
||||
running the cloud, and many don't necessarily enjoy visiting the data
|
||||
center. OpenStack should be entirely remotely configurable, but
|
||||
sometimes not everything goes according to plan.
|
||||
|
||||
In this instance, having an out-of-band access into nodes running
|
||||
OpenStack components is a boon. The IPMI protocol is the de facto
|
||||
standard here, and acquiring hardware that supports it is highly
|
||||
recommended to achieve that lights-out data center aim.
|
||||
|
||||
In addition, consider remote power control as well. While IPMI usually
|
||||
controls the server's power state, having remote access to the PDU that
|
||||
the server is plugged into can really be useful for situations when
|
||||
everything seems wedged.
|
||||
|
||||
Parting Thoughts for Provisioning and Deploying OpenStack
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
You can save time by understanding the use cases for the cloud you want
|
||||
to create. Use cases for OpenStack are varied. Some include object
|
||||
storage only; others require preconfigured compute resources to speed
|
||||
development-environment set up; and others need fast provisioning of
|
||||
compute resources that are already secured per tenant with private
|
||||
networks. Your users may have need for highly redundant servers to make
|
||||
sure their legacy applications continue to run. Perhaps a goal would be
|
||||
to architect these legacy applications so that they run on multiple
|
||||
instances in a cloudy, fault-tolerant way, but not make it a goal to add
|
||||
to those clusters over time. Your users may indicate that they need
|
||||
scaling considerations because of heavy Windows server use.
|
||||
|
||||
You can save resources by looking at the best fit for the hardware you
|
||||
have in place already. You might have some high-density storage hardware
|
||||
available. You could format and repurpose those servers for OpenStack
|
||||
Object Storage. All of these considerations and input from users help
|
||||
you build your use case and your deployment plan.
|
||||
|
||||
.. tip::
|
||||
|
||||
For further research about OpenStack deployment, investigate the
|
||||
supported and documented preconfigured, prepackaged installers for
|
||||
OpenStack from companies such as
|
||||
`Canonical <http://www.ubuntu.com/cloud/openstack>`_,
|
||||
`Cisco <http://www.cisco.com/web/solutions/openstack/index.html>`_,
|
||||
`Cloudscaling <http://www.cloudscaling.com/>`_,
|
||||
`IBM <http://www-03.ibm.com/software/products/en/ibm-cloud-orchestrator>`_,
|
||||
`Metacloud <http://www.metacloud.com/>`_,
|
||||
`Mirantis <https://www.mirantis.com/>`_,
|
||||
`Rackspace <http://www.rackspace.com/cloud/private>`_,
|
||||
`Red Hat <http://www.redhat.com/openstack/>`_,
|
||||
`SUSE <https://www.suse.com/products/suse-openstack-cloud/>`_,
|
||||
and `SwiftStack <https://www.swiftstack.com/>`_.
|
||||
|
||||
Conclusion
|
||||
~~~~~~~~~~
|
||||
|
||||
The decisions you make with respect to provisioning and deployment will
|
||||
affect your day-to-day, week-to-week, and month-to-month maintenance of
|
||||
the cloud. Your configuration management will be able to evolve over
|
||||
time. However, more thought and design need to be done for upfront
|
||||
choices about deployment, disk partitioning, and network configuration.
|
@ -1,430 +0,0 @@
|
||||
=======
|
||||
Scaling
|
||||
=======
|
||||
|
||||
Whereas traditional applications required larger hardware to scale
|
||||
("vertical scaling"), cloud-based applications typically request more,
|
||||
discrete hardware ("horizontal scaling"). If your cloud is successful,
|
||||
eventually you must add resources to meet the increasing demand.
|
||||
|
||||
To suit the cloud paradigm, OpenStack itself is designed to be
|
||||
horizontally scalable. Rather than switching to larger servers, you
|
||||
procure more servers and simply install identically configured services.
|
||||
Ideally, you scale out and load balance among groups of functionally
|
||||
identical services (for example, compute nodes or ``nova-api`` nodes),
|
||||
that communicate on a message bus.
|
||||
|
||||
The Starting Point
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Determining the scalability of your cloud and how to improve it is an
|
||||
exercise with many variables to balance. No one solution meets
|
||||
everyone's scalability goals. However, it is helpful to track a number
|
||||
of metrics. Since you can define virtual hardware templates, called
|
||||
"flavors" in OpenStack, you can start to make scaling decisions based on
|
||||
the flavors you'll provide. These templates define sizes for memory in
|
||||
RAM, root disk size, amount of ephemeral data disk space available, and
|
||||
number of cores for starters.
|
||||
|
||||
The default OpenStack flavors are shown in :ref:`table_default_flavors`.
|
||||
|
||||
.. _table_default_flavors:
|
||||
|
||||
.. list-table:: Table. OpenStack default flavors
|
||||
:widths: 20 20 20 20 20
|
||||
:header-rows: 1
|
||||
|
||||
* - Name
|
||||
- Virtual cores
|
||||
- Memory
|
||||
- Disk
|
||||
- Ephemeral
|
||||
* - m1.tiny
|
||||
- 1
|
||||
- 512 MB
|
||||
- 1 GB
|
||||
- 0 GB
|
||||
* - m1.small
|
||||
- 1
|
||||
- 2 GB
|
||||
- 10 GB
|
||||
- 20 GB
|
||||
* - m1.medium
|
||||
- 2
|
||||
- 4 GB
|
||||
- 10 GB
|
||||
- 40 GB
|
||||
* - m1.large
|
||||
- 4
|
||||
- 8 GB
|
||||
- 10 GB
|
||||
- 80 GB
|
||||
* - m1.xlarge
|
||||
- 8
|
||||
- 16 GB
|
||||
- 10 GB
|
||||
- 160 GB
|
||||
|
||||
The starting point for most is the core count of your cloud. By applying
|
||||
some ratios, you can gather information about:
|
||||
|
||||
- The number of virtual machines (VMs) you expect to run,
|
||||
``((overcommit fraction × cores) / virtual cores per instance)``
|
||||
|
||||
- How much storage is required ``(flavor disk size × number of instances)``
|
||||
|
||||
You can use these ratios to determine how much additional infrastructure
|
||||
you need to support your cloud.
|
||||
|
||||
Here is an example using the ratios for gathering scalability
|
||||
information for the number of VMs expected as well as the storage
|
||||
needed. The following numbers support (200 / 2) × 16 = 1600 VM instances
|
||||
and require 80 TB of storage for ``/var/lib/nova/instances``:
|
||||
|
||||
- 200 physical cores.
|
||||
|
||||
- Most instances are size m1.medium (two virtual cores, 50 GB of
|
||||
storage).
|
||||
|
||||
- Default CPU overcommit ratio (``cpu_allocation_ratio`` in nova.conf)
|
||||
of 16:1.
|
||||
|
||||
.. note::
|
||||
Regardless of the overcommit ratio, an instance can not be placed
|
||||
on any physical node with fewer raw (pre-overcommit) resources than
|
||||
instance flavor requires.
|
||||
|
||||
However, you need more than the core count alone to estimate the load
|
||||
that the API services, database servers, and queue servers are likely to
|
||||
encounter. You must also consider the usage patterns of your cloud.
|
||||
|
||||
As a specific example, compare a cloud that supports a managed
|
||||
web-hosting platform with one running integration tests for a
|
||||
development project that creates one VM per code commit. In the former,
|
||||
the heavy work of creating a VM happens only every few months, whereas
|
||||
the latter puts constant heavy load on the cloud controller. You must
|
||||
consider your average VM lifetime, as a larger number generally means
|
||||
less load on the cloud controller.
|
||||
|
||||
Aside from the creation and termination of VMs, you must consider the
|
||||
impact of users accessing the service—particularly on ``nova-api`` and
|
||||
its associated database. Listing instances garners a great deal of
|
||||
information and, given the frequency with which users run this
|
||||
operation, a cloud with a large number of users can increase the load
|
||||
significantly. This can occur even without their knowledge—leaving the
|
||||
OpenStack dashboard instances tab open in the browser refreshes the list
|
||||
of VMs every 30 seconds.
|
||||
|
||||
After you consider these factors, you can determine how many cloud
|
||||
controller cores you require. A typical eight core, 8 GB of RAM server
|
||||
is sufficient for up to a rack of compute nodes — given the above
|
||||
caveats.
|
||||
|
||||
You must also consider key hardware specifications for the performance
|
||||
of user VMs, as well as budget and performance needs, including storage
|
||||
performance (spindles/core), memory availability (RAM/core), network
|
||||
bandwidth (Gbps/core), and overall CPU performance (CPU/core).
|
||||
|
||||
.. tip::
|
||||
|
||||
For a discussion of metric tracking, including how to extract
|
||||
metrics from your cloud, see :doc:`ops-logging-monitoring`.
|
||||
|
||||
Adding Cloud Controller Nodes
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
You can facilitate the horizontal expansion of your cloud by adding
|
||||
nodes. Adding compute nodes is straightforward—they are easily picked up
|
||||
by the existing installation. However, you must consider some important
|
||||
points when you design your cluster to be highly available.
|
||||
|
||||
Recall that a cloud controller node runs several different services. You
|
||||
can install services that communicate only using the message queue
|
||||
internally—\ ``nova-scheduler`` and ``nova-console``—on a new server for
|
||||
expansion. However, other integral parts require more care.
|
||||
|
||||
You should load balance user-facing services such as dashboard,
|
||||
``nova-api``, or the Object Storage proxy. Use any standard HTTP
|
||||
load-balancing method (DNS round robin, hardware load balancer, or
|
||||
software such as Pound or HAProxy). One caveat with dashboard is the VNC
|
||||
proxy, which uses the WebSocket protocol—something that an L7 load
|
||||
balancer might struggle with. See also `Horizon session storage
|
||||
<https://docs.openstack.org/developer/horizon/topics/deployment.html#session-storage>`_.
|
||||
|
||||
You can configure some services, such as ``nova-api`` and
|
||||
``glance-api``, to use multiple processes by changing a flag in their
|
||||
configuration file—allowing them to share work between multiple cores on
|
||||
the one machine.
|
||||
|
||||
.. tip::
|
||||
|
||||
Several options are available for MySQL load balancing, and the
|
||||
supported AMQP brokers have built-in clustering support. Information
|
||||
on how to configure these and many of the other services can be
|
||||
found in :doc:`operations`.
|
||||
|
||||
Segregating Your Cloud
|
||||
~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
When you want to offer users different regions to provide legal
|
||||
considerations for data storage, redundancy across earthquake fault
|
||||
lines, or for low-latency API calls, you segregate your cloud. Use one
|
||||
of the following OpenStack methods to segregate your cloud: *cells*,
|
||||
*regions*, *availability zones*, or *host aggregates*.
|
||||
|
||||
Each method provides different functionality and can be best divided
|
||||
into two groups:
|
||||
|
||||
- Cells and regions, which segregate an entire cloud and result in
|
||||
running separate Compute deployments.
|
||||
|
||||
- :term:`Availability zones <availability zone>` and host aggregates,
|
||||
which merely divide a single Compute deployment.
|
||||
|
||||
:ref:`table_segregation_methods` provides a comparison view of each
|
||||
segregation method currently provided by OpenStack Compute.
|
||||
|
||||
.. _table_segregation_methods:
|
||||
|
||||
.. list-table:: Table. OpenStack segregation methods
|
||||
:widths: 20 20 20 20 20
|
||||
:header-rows: 1
|
||||
|
||||
* -
|
||||
- Cells
|
||||
- Regions
|
||||
- Availability zones
|
||||
- Host aggregates
|
||||
* - **Use when you need**
|
||||
- A single :term:`API endpoint` for compute, or you require a second
|
||||
level of scheduling.
|
||||
- Discrete regions with separate API endpoints and no coordination
|
||||
between regions.
|
||||
- Logical separation within your nova deployment for physical isolation
|
||||
or redundancy.
|
||||
- To schedule a group of hosts with common features.
|
||||
* - **Example**
|
||||
- A cloud with multiple sites where you can schedule VMs "anywhere" or on
|
||||
a particular site.
|
||||
- A cloud with multiple sites, where you schedule VMs to a particular
|
||||
site and you want a shared infrastructure.
|
||||
- A single-site cloud with equipment fed by separate power supplies.
|
||||
- Scheduling to hosts with trusted hardware support.
|
||||
* - **Overhead**
|
||||
- Considered experimental. A new service, nova-cells. Each cell has a full
|
||||
nova installation except nova-api.
|
||||
- A different API endpoint for every region. Each region has a full nova
|
||||
installation.
|
||||
- Configuration changes to ``nova.conf``.
|
||||
- Configuration changes to ``nova.conf``.
|
||||
* - **Shared services**
|
||||
- Keystone, ``nova-api``
|
||||
- Keystone
|
||||
- Keystone, All nova services
|
||||
- Keystone, All nova services
|
||||
|
||||
|
||||

Cells and Regions
-----------------

OpenStack Compute cells are designed to allow running the cloud in a
distributed fashion without having to use more complicated technologies,
or be invasive to existing nova installations. Hosts in a cloud are
partitioned into groups called *cells*. Cells are configured in a tree.
The top-level cell ("API cell") has a host that runs the ``nova-api``
service, but no ``nova-compute`` services. Each child cell runs all of
the other typical ``nova-*`` services found in a regular installation,
except for the ``nova-api`` service. Each cell has its own message queue
and database service and also runs ``nova-cells``, which manages the
communication between the API cell and child cells.

This allows a single API server to be used to control access to
multiple cloud installations. Introducing a second level of scheduling
(the cell selection), in addition to the regular ``nova-scheduler``
selection of hosts, provides greater flexibility to control where
virtual machines are run.

Unlike having a single API endpoint, regions have a separate API
endpoint per installation, allowing for a more discrete separation.
Users wanting to run instances across sites have to explicitly select a
region. However, the additional complexity of running a new service is
not required.

The OpenStack dashboard (horizon) can be configured to use multiple
regions. This can be configured through the ``AVAILABLE_REGIONS``
parameter.

Availability Zones and Host Aggregates
--------------------------------------

You can use availability zones, host aggregates, or both to partition a
nova deployment.

Availability zones are implemented through and configured in a similar
way to host aggregates. However, you use them for different reasons.

Availability zone
^^^^^^^^^^^^^^^^^

This enables you to arrange OpenStack compute hosts into logical groups
and provides a form of physical isolation and redundancy from other
availability zones, such as by using a separate power supply or network
equipment.

You define the availability zone in which a specified compute host
resides locally on each server. An availability zone is commonly used to
identify a set of servers that have a common attribute. For instance, if
some of the racks in your data center are on a separate power source,
you can put servers in those racks in their own availability zone.
Availability zones can also help separate different classes of hardware.

When users provision resources, they can specify from which availability
zone they want their instance to be built. This allows cloud consumers
to ensure that their application resources are spread across disparate
machines to achieve high availability in the event of hardware failure.
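
As a minimal sketch, a user can list the zones the cloud exposes and then
request an instance in a specific zone. The zone, image, flavor, and
instance names below are placeholders for illustration:

.. code-block:: console

   $ openstack availability zone list
   $ openstack server create --image cirros --flavor m1.small \
     --availability-zone az-rack1 test-instance
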

Host aggregates
^^^^^^^^^^^^^^^

This enables you to partition OpenStack Compute deployments into logical
groups for load balancing and instance distribution. You can use host
aggregates to further partition an availability zone. For example, you
might use host aggregates to partition an availability zone into groups
of hosts that either share common resources, such as storage and
network, or have a special property, such as trusted computing
hardware.

A common use of host aggregates is to provide information for use with
the ``nova-scheduler``. For example, you might use a host aggregate to
group a set of hosts that share specific flavors or images.

The general case for this is setting key-value pairs in the aggregate
metadata and matching key-value pairs in the flavor's ``extra_specs``
metadata. The ``AggregateInstanceExtraSpecsFilter`` in the filter
scheduler will enforce that instances be scheduled only on hosts in
aggregates that define the same key to the same value.
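
As a sketch of that pattern, you might tag an aggregate and a flavor with
a matching property. The aggregate name, host name, flavor name, and the
``ssd`` key are illustrative only, and this assumes
``AggregateInstanceExtraSpecsFilter`` is enabled in your scheduler filter
list:

.. code-block:: console

   $ openstack aggregate create --property ssd=true fast-storage
   $ openstack aggregate add host fast-storage compute01
   $ openstack flavor set \
     --property aggregate_instance_extra_specs:ssd=true m1.ssd
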

An advanced use of this general concept allows different flavor types to
run with different CPU and RAM allocation ratios so that high-intensity
computing loads and low-intensity development and testing systems can
share the same cloud without either starving the high-use systems or
wasting resources on low-utilization systems. This works by setting
``metadata`` in your host aggregates and matching ``extra_specs`` in
your flavor types.

The first step is setting the aggregate metadata keys
``cpu_allocation_ratio`` and ``ram_allocation_ratio`` to a
floating-point value. The scheduler filters ``AggregateCoreFilter`` and
``AggregateRamFilter`` will use those values rather than the global
defaults in ``nova.conf`` when scheduling to hosts in the aggregate. It
is important to be cautious when using this feature, since each host can
be in multiple aggregates but should have only one allocation ratio for
each resource. It is up to you to avoid putting a host in multiple
aggregates that define different values for the same resource.

This is the first half of the equation. To get flavor types that are
guaranteed a particular ratio, you must set the ``extra_specs`` in the
flavor type to the key-value pair you want to match in the aggregate.
For example, if you define ``extra_specs`` ``cpu_allocation_ratio`` to
"1.0", then instances of that type will run in aggregates only where the
metadata key ``cpu_allocation_ratio`` is also defined as "1.0." In
practice, it is better to define an additional key-value pair in the
aggregate metadata to match on rather than match directly on
``cpu_allocation_ratio`` or ``ram_allocation_ratio``. This allows
better abstraction. For example, by defining a key ``overcommit`` and
setting a value of "high," "medium," or "low," you could then tune the
numeric allocation ratios in the aggregates without also needing to
change all flavor types relating to them.
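
A sketch of that approach follows. The aggregate and flavor names are
placeholders, and the commands assume ``AggregateCoreFilter``,
``AggregateRamFilter``, and ``AggregateInstanceExtraSpecsFilter`` are
enabled in your scheduler configuration:

.. code-block:: console

   $ openstack aggregate create --property cpu_allocation_ratio=1.0 \
     --property ram_allocation_ratio=1.0 --property overcommit=low \
     dedicated-hosts
   $ openstack flavor set \
     --property aggregate_instance_extra_specs:overcommit=low m1.dedicated
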

.. note::

   Previously, all services had an availability zone. Currently, only
   the ``nova-compute`` service has its own availability zone. Services
   such as ``nova-scheduler`` and ``nova-conductor`` span all
   availability zones.

   When you run any of the following operations, the services appear in
   their own internal availability zone
   (CONF.internal_service_availability_zone):

   - :command:`openstack host list` (os-hosts)

   - :command:`euca-describe-availability-zones verbose`

   - :command:`openstack compute service list`

   The internal availability zone is hidden in
   euca-describe-availability-zones (nonverbose).

   CONF.node_availability_zone has been renamed to
   CONF.default_availability_zone and is used only by the
   ``nova-api`` and ``nova-scheduler`` services.

   CONF.node_availability_zone still works but is deprecated.

Scalable Hardware
~~~~~~~~~~~~~~~~~

While several resources already exist to help with deploying and
installing OpenStack, it's very important to make sure that you have
your deployment planned out ahead of time. This guide presumes that you
have at least set aside a rack for the OpenStack cloud but also offers
suggestions for when and what to scale.

Hardware Procurement
--------------------

“The Cloud” has been described as a volatile environment where servers
can be created and terminated at will. While this may be true, it does
not mean that your servers must be volatile. Ensuring that your cloud's
hardware is stable and configured correctly means that your cloud
environment remains up and running. Basically, put effort into creating
a stable hardware environment so that you can host a cloud that users
may treat as unstable and volatile.

OpenStack can be deployed on any hardware supported by an
OpenStack-compatible Linux distribution.

Hardware does not have to be consistent, but it should at least have the
same type of CPU to support instance migration.

The typical hardware recommended for use with OpenStack is the standard
value-for-money offerings that most hardware vendors stock. It should be
straightforward to divide your procurement into building blocks such as
"compute," "object storage," and "cloud controller," and request as many
of these as you need. Alternatively, if you cannot procure new hardware,
existing servers are quite likely to be able to support OpenStack,
provided they meet your performance and virtualization requirements.

Capacity Planning
-----------------

OpenStack is designed to increase in size in a straightforward manner.
Taking into account the considerations that we've mentioned in this
chapter—particularly on the sizing of the cloud controller—it should be
possible to procure additional compute or object storage nodes as
needed. New nodes do not need to be the same specification, or even
vendor, as existing nodes.

For compute nodes, ``nova-scheduler`` will take care of differences in
sizing having to do with core count and RAM amounts; however, you should
consider that the user experience changes with differing CPU speeds.
When adding object storage nodes, a :term:`weight` should be specified
that reflects the :term:`capability` of the node.
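
For instance, when adding a device to an existing object ring, the weight
is the final argument to the ``add`` command. The builder file, device
path, IP address, and port below are placeholders; adjust them to your
deployment:

.. code-block:: console

   $ swift-ring-builder object.builder add r1z1-10.0.0.11:6200/sdb1 100
   $ swift-ring-builder object.builder rebalance
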

Monitoring the resource usage and user growth will enable you to know
when to procure. :doc:`ops-logging-monitoring` details some useful metrics.

Burn-in Testing
---------------

The chances of failure for the server's hardware are high at the start
and the end of its life. As a result, dealing with hardware failures
while in production can be avoided by appropriate burn-in testing to
attempt to trigger the early-stage failures. The general principle is to
stress the hardware to its limits. Examples of burn-in tests include
running a CPU or disk benchmark for several days.
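
A rough sketch using common open source tools follows. Neither tool is
mandated by this guide, and the device path and durations are examples
only; the disk test is destructive to the target device:

.. code-block:: console

   # Stress all CPU cores for 72 hours
   $ stress-ng --cpu 0 --timeout 72h
   # Exercise a spare disk with random I/O for 24 hours
   $ fio --name=burnin --filename=/dev/sdX --rw=randrw --bs=4k \
     --ioengine=libaio --iodepth=32 --time_based --runtime=86400
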
@ -1,498 +0,0 @@

=================
Storage Decisions
=================

Storage is found in many parts of the OpenStack stack, and the differing
types can cause confusion to even experienced cloud engineers. This
section focuses on persistent storage options you can configure with
your cloud. It's important to understand the distinction between
:term:`ephemeral <ephemeral volume>` storage and
:term:`persistent <persistent volume>` storage.

Ephemeral Storage
~~~~~~~~~~~~~~~~~

If you deploy only the OpenStack :term:`Compute service (nova)`,
your users do not have access to any form of persistent storage by default.
The disks associated with VMs are "ephemeral," meaning that (from the user's
point of view) they effectively disappear when a virtual machine is
terminated.

Persistent Storage
~~~~~~~~~~~~~~~~~~

Persistent storage means that the storage resource outlives any other
resource and is always available, regardless of the state of a running
instance.

Today, OpenStack clouds explicitly support three types of persistent
storage: *object storage*, *block storage*, and *file system storage*.

Object Storage
--------------

With object storage, users access binary objects through a REST API. You
may be familiar with Amazon S3, which is a well-known example of an
object storage system. Object storage is implemented in OpenStack by the
OpenStack Object Storage (swift) project. If your intended users need to
archive or manage large datasets, you want to provide them with object
storage. In addition, OpenStack can store your virtual machine (VM)
images inside of an object storage system, as an alternative to storing
the images on a file system.

OpenStack Object Storage provides a highly scalable, highly available
storage solution by relaxing some of the constraints of traditional file
systems. In designing and procuring for such a cluster, it is important
to understand some key concepts about its operation. Essentially, this
type of storage is built on the idea that all storage hardware fails, at
every level, at some point. Infrequently encountered failures that would
hamstring other storage systems, such as issues taking down RAID cards
or entire servers, are handled gracefully with OpenStack Object
Storage.

A good document describing the Object Storage architecture is found
within the `developer
documentation <https://docs.openstack.org/developer/swift/overview_architecture.html>`_
— read this first. Once you understand the architecture, you should know what a
proxy server does and how zones work. However, some important points are
often missed at first glance.

When designing your cluster, you must consider durability and
availability. Understand that the predominant source of these is the
spread and placement of your data, rather than the reliability of the
hardware. Consider the default value of the number of replicas, which is
three. This means that before an object is marked as having been
written, at least two copies exist—in case a single server fails to
write, the third copy may or may not yet exist when the write operation
initially returns. Altering this number increases the robustness of your
data, but reduces the amount of storage you have available. Next, look
at the placement of your servers. Consider spreading them widely
throughout your data center's network and power-failure zones. Is a zone
a rack, a server, or a disk?
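
The replica count is fixed when a ring is built. As an illustrative
sketch only, a ring with three replicas might be created as follows; the
part power of 10 and the one-hour minimum part move interval are example
values, not recommendations:

.. code-block:: console

   $ swift-ring-builder object.builder create 10 3 1
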

Object Storage's network patterns might seem unfamiliar at first.
Consider these main traffic flows:

* Among :term:`object`, :term:`container`, and
  :term:`account servers <account server>`
* Between those servers and the proxies
* Between the proxies and your users

Object Storage is very "chatty" among servers hosting data—even a small
cluster does megabytes/second of traffic, which is predominantly, “Do
you have the object?”/“Yes I have the object!” Of course, if the answer
to the aforementioned question is negative or the request times out,
replication of the object begins.

Consider the scenario where an entire server fails and 24 TB of data
needs to be transferred "immediately" to remain at three copies—this can
put significant load on the network.

Another fact that's often forgotten is that when a new file is being
uploaded, the proxy server must write out as many streams as there are
replicas—giving a multiple of network traffic. For a three-replica
cluster, 10 Gbps in means 30 Gbps out. Combining this with the high
bandwidth demands of replication results in the recommendation that your
private network have significantly higher bandwidth than your public
network needs. Also, OpenStack Object Storage communicates internally
with unencrypted, unauthenticated rsync for performance—you do want the
private network to be private.

The remaining point on bandwidth is the public-facing portion. The
``swift-proxy`` service is stateless, which means that you can easily
add more and use HTTP load-balancing methods to share bandwidth and
availability between them.

More proxies means more bandwidth, if your storage can keep up.

Block Storage
-------------

Block storage (sometimes referred to as volume storage) provides users
with access to block-storage devices. Users interact with block storage
by attaching volumes to their running VM instances.

These volumes are persistent: they can be detached from one instance and
re-attached to another, and the data remains intact. Block storage is
implemented in OpenStack by the OpenStack Block Storage (cinder)
project, which supports multiple back ends in the form of drivers. Your
choice of a storage back end must be supported by a Block Storage
driver.
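
For example, a user might create a volume and attach it to a running
instance. The size and the volume and instance names are illustrative
only:

.. code-block:: console

   $ openstack volume create --size 10 data-volume
   $ openstack server add volume my-instance data-volume
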

Most block storage drivers allow the instance to have direct access to
the underlying storage hardware's block device. This helps increase the
overall read/write I/O performance. However, support for utilizing files
as volumes is also well established, with full support for NFS and other
protocols.

These drivers work a little differently than a traditional "block"
storage driver. On an NFS file system, a single file is
created and then mapped as a "virtual" volume into the instance. This
mapping/translation is similar to how OpenStack utilizes QEMU's
file-based virtual machines stored in ``/var/lib/nova/instances``.

Shared File Systems Service
---------------------------

The Shared File Systems service provides a set of services for
management of shared file systems in a multi-tenant cloud environment.
Users interact with the Shared File Systems service by mounting remote
file systems on their instances and then using those file systems to
store and exchange files. The Shared File Systems service provides you
with shares. A share is a remote, mountable file system. You can mount a
share to and access a share from several hosts by several users at a
time. With shares, users can also:

* Create a share specifying its size, shared file system protocol, and
  visibility level.
* Create a share on either a share server or standalone, depending on
  the selected back-end mode, with or without using a share network.
* Specify access rules and security services for existing shares.
* Combine several shares in groups to keep data consistent inside the
  groups for safe group operations.
* Create a snapshot of a selected share or a share group for storing
  the existing shares consistently or creating new shares from that
  snapshot in a consistent way.
* Create a share from a snapshot.
* Set rate limits and quotas for specific shares and snapshots.
* View usage of share resources.
* Remove shares.

Like Block Storage, the Shared File Systems service is persistent. It
can be:

* Mounted to any number of client machines.
* Detached from one instance and attached to another without data loss.
  During this process the data are safe unless the Shared File Systems
  service itself is changed or removed.

Shares are provided by the Shared File Systems service. In OpenStack,
the Shared File Systems service is implemented by the Shared File
Systems (manila) project, which supports multiple back ends in the form
of drivers. The Shared File Systems service can be configured to
provision shares from one or more back ends. Share servers are, mostly,
virtual machines that export file shares via different protocols such as
NFS, CIFS, GlusterFS, or HDFS.
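
As an illustration only, a user could create and grant access to an NFS
share with the manila client; the protocol, size, share name, and client
subnet are placeholders:

.. code-block:: console

   $ manila create NFS 1 --name my-share
   $ manila access-allow my-share ip 10.0.0.0/24
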

OpenStack Storage Concepts
~~~~~~~~~~~~~~~~~~~~~~~~~~

:ref:`table_openstack_storage` explains the different storage concepts
provided by OpenStack.

.. _table_openstack_storage:

.. list-table:: Table. OpenStack storage
   :widths: 20 20 20 20 20
   :header-rows: 1

   * -
     - Ephemeral storage
     - Block storage
     - Object storage
     - Shared File System storage
   * - Used to…
     - Run operating system and scratch space
     - Add additional persistent storage to a virtual machine (VM)
     - Store data, including VM images
     - Add additional persistent storage to a virtual machine
   * - Accessed through…
     - A file system
     - A block device that can be partitioned, formatted, and mounted
       (such as /dev/vdc)
     - The REST API
     - A Shared File Systems service share (either manila managed or an
       external one registered in manila) that can be partitioned, formatted,
       and mounted (such as /dev/vdc)
   * - Accessible from…
     - Within a VM
     - Within a VM
     - Anywhere
     - Within a VM
   * - Managed by…
     - OpenStack Compute (nova)
     - OpenStack Block Storage (cinder)
     - OpenStack Object Storage (swift)
     - OpenStack Shared File System Storage (manila)
   * - Persists until…
     - VM is terminated
     - Deleted by user
     - Deleted by user
     - Deleted by user
   * - Sizing determined by…
     - Administrator configuration of size settings, known as *flavors*
     - User specification in initial request
     - Amount of available physical storage
     - * User specification in initial request
       * Requests for extension
       * Available user-level quotas
       * Limitations applied by Administrator
   * - Encryption set by…
     - Parameter in nova.conf
     - Admin establishing `encrypted volume type
       <https://docs.openstack.org/admin-guide/dashboard-manage-volumes.html>`_,
       then user selecting encrypted volume
     - Not yet available
     - Shared File Systems service does not apply any additional encryption
       above what the share’s back-end storage provides
   * - Example of typical usage…
     - 10 GB first disk, 30 GB second disk
     - 1 TB disk
     - 10s of TBs of dataset storage
     - Depends completely on the size of back-end storage specified when
       a share was being created. In case of thin provisioning it can be
       partial space reservation (for more details see the
       `Capabilities and Extra-Specs
       <https://docs.openstack.org/developer/manila/devref/capabilities_and_extra_specs.html?highlight=extra%20specs#common-capabilities>`_
       specification)

.. note::

   **File-level Storage (for Live Migration)**

   With file-level storage, users access stored data using the operating
   system's file system interface. Most users, if they have used a network
   storage solution before, have encountered this form of networked
   storage. In the Unix world, the most common form of this is NFS. In the
   Windows world, the most common form is called CIFS (previously, SMB).

   OpenStack clouds do not present file-level storage to end users.
   However, it is important to consider file-level storage for storing
   instances under ``/var/lib/nova/instances`` when designing your cloud,
   since you must have a shared file system if you want to support live
   migration.
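
One way to provide such a shared file system, sketched here with a
placeholder server name and export path, is to mount an NFS export at the
instances path on every compute node:

.. code-block:: console

   # Run on each compute node; nfs-server and the export are examples
   $ mount -t nfs nfs-server:/export/nova-instances /var/lib/nova/instances
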

Choosing Storage Back Ends
~~~~~~~~~~~~~~~~~~~~~~~~~~

Users will indicate different needs for their cloud use cases. Some may
need fast access to many objects that do not change often, or want to
set a time-to-live (TTL) value on a file. Others may access only storage
that is mounted with the file system itself, but want it to be
replicated instantly when starting a new instance. For other systems,
ephemeral storage—storage that is released when a VM attached to it is
shut down—is the preferred way. When you select
:term:`storage back ends <storage back end>`,
ask the following questions on behalf of your users:

* Do my users need block storage?
* Do my users need object storage?
* Do I need to support live migration?
* Should my persistent storage drives be contained in my compute nodes,
  or should I use external storage?
* What is the platter count I can achieve? Do more spindles result in
  better I/O despite network access?
* Which one results in the best cost-performance scenario I'm aiming for?
* How do I manage the storage operationally?
* How redundant and distributed is the storage? What happens if a
  storage node fails? To what extent can it mitigate my data-loss
  disaster scenarios?

To deploy your storage by using only commodity hardware, you can use a number
of open-source packages, as shown in :ref:`table_persistent_file_storage`.

.. _table_persistent_file_storage:

.. list-table:: Table. Persistent file-based storage support
   :widths: 25 25 25 25
   :header-rows: 1

   * -
     - Object
     - Block
     - File-level
   * - Swift
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     -
     -
   * - LVM
     -
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     -
   * - Ceph
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     - Experimental
   * - Gluster
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     -
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
   * - NFS
     -
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
   * - ZFS
     -
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     -
   * - Sheepdog
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     -

This list of open source file-level shared storage solutions is not
exhaustive; other open source solutions exist (for example, MooseFS).
Your organization may already have deployed a file-level shared storage
solution that you can use.

.. note::

   **Storage Driver Support**

   In addition to the open source technologies, there are a number of
   proprietary solutions that are officially supported by OpenStack Block
   Storage. The full list of options can be found in the
   `Available Drivers <https://docs.openstack.org/developer/cinder/drivers.html>`_
   list.

   You can find a matrix of the functionality provided by all of the
   supported Block Storage drivers on the `OpenStack
   wiki <https://wiki.openstack.org/wiki/CinderSupportMatrix>`_.

Also, you need to decide whether you want to support object storage in
your cloud. The two common use cases for providing object storage in a
compute cloud are:

* To provide users with a persistent storage mechanism
* As a scalable, reliable data store for virtual machine images

Commodity Storage Back-end Technologies
---------------------------------------

This section provides a high-level overview of the differences among the
different commodity storage back end technologies. Depending on your
cloud user's needs, you can implement one or many of these technologies
in different combinations:

OpenStack Object Storage (swift)
   The official OpenStack Object Store implementation. It is a mature
   technology that has been used for several years in production by
   Rackspace as the technology behind Rackspace Cloud Files. As it is
   highly scalable, it is well-suited to managing petabytes of storage.
   OpenStack Object Storage's advantages are better integration with
   OpenStack (integrates with OpenStack Identity, works with the
   OpenStack dashboard interface) and better support for multiple data
   center deployment through support of asynchronous eventual
   consistency replication.

   Therefore, if you eventually plan on distributing your storage
   cluster across multiple data centers, if you need unified accounts
   for your users for both compute and object storage, or if you want
   to control your object storage with the OpenStack dashboard, you
   should consider OpenStack Object Storage. More detail can be found
   about OpenStack Object Storage in the section below.

Ceph
   A scalable storage solution that replicates data across commodity
   storage nodes. Ceph was originally developed by one of the founders
   of DreamHost and is currently used in production there.

   Ceph was designed to expose different types of storage interfaces to
   the end user: it supports object storage, block storage, and
   file-system interfaces, although the file-system interface is not
   yet considered production-ready. Ceph supports the same API as swift
   for object storage and can be used as a back end for cinder block
   storage as well as back-end storage for glance images. Ceph supports
   "thin provisioning," implemented using copy-on-write.

   This can be useful when booting from volume because a new volume can
   be provisioned very quickly. Ceph also supports keystone-based
   authentication (as of version 0.56), so it can be a seamless swap in
   for the default OpenStack swift implementation.

   Ceph's advantages are that it gives the administrator more
   fine-grained control over data distribution and replication
   strategies, enables you to consolidate your object and block
   storage, enables very fast provisioning of boot-from-volume
   instances using thin provisioning, and supports a distributed
   file-system interface, though this interface is `not yet
   recommended <http://ceph.com/docs/master/cephfs/>`_ for use in
   production deployment by the Ceph project.

   If you want to manage your object and block storage within a single
   system, or if you want to support fast boot-from-volume, you should
   consider Ceph.

Gluster
   A distributed, shared file system. As of Gluster version 3.3, you
   can use Gluster to consolidate your object storage and file storage
   into one unified file and object storage solution, which is called
   Gluster For OpenStack (GFO). GFO uses a customized version of swift
   that enables Gluster to be used as the back-end storage.

   The main reason to use GFO rather than regular swift is if you also
   want to support a distributed file system, either to support shared
   storage live migration or to provide it as a separate service to
   your end users. If you want to manage your object and file storage
   within a single system, you should consider GFO.

LVM
   The Logical Volume Manager is a Linux-based system that provides an
   abstraction layer on top of physical disks to expose logical volumes
   to the operating system. The LVM back-end implements block storage
   as LVM logical partitions.

   On each host that will house block storage, an administrator must
   initially create a volume group dedicated to Block Storage volumes.
   Blocks are created from LVM logical volumes.
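
   For example, a dedicated volume group (conventionally named
   ``cinder-volumes``, though the name is configurable) might be created
   on a spare disk; the device path here is a placeholder:

   .. code-block:: console

      # pvcreate /dev/sdb
      # vgcreate cinder-volumes /dev/sdb
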
   .. note::

      LVM does *not* provide any replication. Typically,
      administrators configure RAID on nodes that use LVM as block
      storage to protect against failures of individual hard drives.
      However, RAID does not protect against a failure of the entire
      host.

ZFS
   The Solaris iSCSI driver for OpenStack Block Storage implements
   blocks as ZFS entities. ZFS is a file system that also has the
   functionality of a volume manager. This is unlike on a Linux system,
   where there is a separation of volume manager (LVM) and file system
   (such as ext3, ext4, XFS, and Btrfs). ZFS has a number of
   advantages over ext4, including improved data-integrity checking.

   The ZFS back end for OpenStack Block Storage supports only
   Solaris-based systems, such as Illumos. While there is a Linux port
   of ZFS, it is not included in any of the standard Linux
   distributions, and it has not been tested with OpenStack Block
   Storage. As with LVM, ZFS does not provide replication across hosts
   on its own; you need to add a replication solution on top of ZFS if
   your cloud needs to be able to handle storage-node failures.

   We don't recommend ZFS unless you have previous experience with
   deploying it, since the ZFS back end for Block Storage requires a
   Solaris-based operating system, and we assume that your experience
   is primarily with Linux-based systems.

Sheepdog
   Sheepdog is a userspace distributed storage system. Sheepdog scales
   to several hundred nodes, and has powerful virtual disk management
   features such as snapshots, cloning, rollback, and thin provisioning.

   It is essentially an object storage system that manages disks and
   aggregates the space and performance of disks linearly in hyper
   scale on commodity hardware in a smart way. On top of its object
   store, Sheepdog provides an elastic volume service and an HTTP
   service. Sheepdog does not assume anything about kernel version and
   can work nicely with xattr-supported file systems.

Conclusion
~~~~~~~~~~

We hope that you now have some considerations in mind and questions to
ask your future cloud users about their storage use cases. As you can
see, your storage decisions will also influence your network design for
performance and security needs. Continue with us to make more informed
decisions about your OpenStack cloud design.

@ -1,52 +0,0 @@

============
Architecture
============

Designing an OpenStack cloud is a great achievement. It requires a
robust understanding of the requirements and needs of the cloud's users
to determine the best possible configuration to meet them. OpenStack
provides a great deal of flexibility to achieve your needs, and this
part of the book aims to shine light on many of the decisions you need
to make during the process.

To design, deploy, and configure OpenStack, administrators must
understand the logical architecture. A diagram can help you envision all
the integrated services within OpenStack and how they interact with each
other.

OpenStack modules are one of the following types:

Daemon
   Runs as a background process. On Linux platforms, a daemon is usually
   installed as a service.

Script
   Installs a virtual environment and runs tests.

Command-line interface (CLI)
   Enables users to submit API calls to OpenStack services through commands.

As shown, end users can interact through the dashboard, CLIs, and APIs.
All services authenticate through a common Identity service, and
individual services interact with each other through public APIs, except
where privileged administrator commands are necessary.
:ref:`logical_architecture` shows the most common, but not the only, logical
architecture for an OpenStack cloud.

.. _logical_architecture:

.. figure:: figures/osog_0001.png
   :width: 100%

   OpenStack Logical Architecture

.. toctree::
   :maxdepth: 2

   arch-examples.rst
   arch-provision.rst
   arch-cloud-controller.rst
   arch-compute-nodes.rst
   arch-scaling.rst
   arch-storage.rst
   arch-network-design.rst

(Deleted binary figure files are not shown.)
@ -16,7 +16,6 @@ Contents

   acknowledgements.rst
   preface.rst
   common/conventions.rst
   architecture.rst
   operations.rst

Appendix

@ -151,6 +151,9 @@ Installation Tutorials and Guides

   Contains a reference listing of all configuration options for core
   and integrated OpenStack services by release version

`OpenStack Architecture Design Guide <https://docs.openstack.org/arch-design/>`_
   Contains guidelines for designing an OpenStack cloud

`OpenStack Administrator Guide <https://docs.openstack.org/admin-guide/>`_
   Contains how-to information for managing an OpenStack cloud as
   needed for your use cases, such as storage, computing, or

@ -184,50 +187,8 @@ Installation Tutorials and Guides

How This Book Is Organized
~~~~~~~~~~~~~~~~~~~~~~~~~~

This book is organized into two parts: the architecture decisions for
designing OpenStack clouds and the repeated operations for running
OpenStack clouds.

**Part I:**

:doc:`arch-examples`
   Because of all the decisions the other chapters discuss, this
   chapter describes the decisions made for this particular book and
   much of the justification for the example architecture.

:doc:`arch-provision`
   While this book doesn't describe installation, we do recommend
   automation for deployment and configuration, discussed in this
   chapter.

:doc:`arch-cloud-controller`
   The cloud controller is an invention for the sake of consolidating
   and describing which services run on which nodes. This chapter
   discusses hardware and network considerations as well as how to
   design the cloud controller for performance and separation of
   services.

:doc:`arch-compute-nodes`
   This chapter describes the compute nodes, which are dedicated to
   running virtual machines. Some hardware choices come into play here,
   as well as logging and networking descriptions.

:doc:`arch-scaling`
   This chapter discusses the growth of your cloud resources through
   scaling and segregation considerations.

:doc:`arch-storage`
   As with other architecture decisions, storage concepts within
   OpenStack offer many options. This chapter lays out the choices for
   you.

:doc:`arch-network-design`
   Your OpenStack cloud networking needs to fit into your existing
   networks while also enabling the best design for your users and
   administrators, and this chapter gives you in-depth information
   about networking decisions.

**Part II:**
This book contains several parts to show best practices and tips for
the repeated operations for running OpenStack clouds.

:doc:`ops-lay-of-the-land`
   This chapter is written to let you get your hands wrapped around

@ -87,6 +87,9 @@ redirect 301 /trunk/openstack-ops/oreilly-openstack-ops-guide.pdf /openstack-ops

redirectmatch 301 /trunk/openstack-ops/.*$ /ops-guide/
redirect 301 /ops/index.html /ops-guide/index.html

# Redirect Operations Guide architecture part to Architecture Guide
redirectmatch 301 /ops-guide/arch.*$ /arch-design/index.html

# Redirect Architecture Guide to /arch-design/
redirect 301 /arch/index.html /arch-design/index.html
