Merge "[arch-design] Convert multi-site sections"

Operational considerations
==========================

Multi-site OpenStack cloud deployment using regions requires that the
service catalog contain per-region entries for each service deployed
other than the Identity service. Most off-the-shelf OpenStack deployment
tools have limited support for defining multiple regions in this
fashion.

Deployers should be aware of this and provide the appropriate
customization of the service catalog for their site, either manually or
by customizing the deployment tools in use.

.. note::

   As of the Kilo release, documentation for implementing this feature
   is in progress. See this bug for more information:
   https://bugs.launchpad.net/openstack-manuals/+bug/1340509.
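
A minimal sketch of creating such per-region catalog entries with the
OpenStack SDK follows. The cloud name, region name, service types, and
endpoint URLs are illustrative assumptions; adapt them to your own site
and deployment tooling.

.. code-block:: python

   # Sketch: add a second region and per-region public endpoints to the
   # service catalog. Cloud name, region name, and URLs are assumptions.
   import openstack

   conn = openstack.connect(cloud='mycloud')

   # Define the new region in the Identity service.
   conn.identity.create_region(id='RegionTwo', description='Second site')

   # Per-region public endpoints for every service except Identity,
   # which remains shared across regions.
   urls = {
       'compute': 'https://compute.site2.example.com:8774/v2.1',
       'image': 'https://image.site2.example.com:9292',
       'network': 'https://network.site2.example.com:9696',
       'volumev2': 'https://volume.site2.example.com:8776/v2/%(tenant_id)s',
   }

   for service in conn.identity.services():
       if service.type in urls:
           conn.identity.create_endpoint(service_id=service.id,
                                         interface='public',
                                         region_id='RegionTwo',
                                         url=urls[service.type])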

Licensing
~~~~~~~~~

Multi-site OpenStack deployments present additional licensing
considerations over and above regular OpenStack clouds, particularly
where site licenses are in use to provide cost-efficient access to
software licenses. The licensing for host operating systems, guest
operating systems, OpenStack distributions (if applicable),
software-defined infrastructure including network controllers and
storage systems, and even individual applications needs to be evaluated.

Topics to consider include:

* The definition of what constitutes a site in the relevant licenses,
  as the term does not necessarily denote a geographic or otherwise
  physically isolated location.

* Differentiation between "hot" (active) and "cold" (inactive) sites,
  where significant savings may be made in situations where one site is
  a cold standby for disaster recovery purposes only.

* Certain locations might require local vendors to provide support and
  services for each site, which may vary with the licensing agreement in
  place.

Logging and monitoring
~~~~~~~~~~~~~~~~~~~~~~

Logging and monitoring do not significantly differ for a multi-site
OpenStack cloud. The tools described in the `Logging and monitoring
chapter <http://docs.openstack.org/openstack-ops/content/logging_monitoring.html>`__
of the Operations Guide remain applicable. Logging and monitoring can be
provided on a per-site basis, in a common centralized location, or both.

When attempting to deploy logging and monitoring facilities to a
centralized location, care must be taken with the load placed on the
inter-site networking links.
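
As a minimal illustration of the centralized approach, the Python
standard library can forward log records from a site-local tool to a
central syslog collector; the collector hostname and port are
assumptions, and a dedicated shipper such as rsyslog or fluentd serves
the same purpose in practice.

.. code-block:: python

   # Sketch: forward site-local log records to a central syslog
   # collector. The collector hostname and port are assumptions.
   import logging
   import logging.handlers

   handler = logging.handlers.SysLogHandler(
       address=('logs.central.example.com', 514))
   handler.setFormatter(logging.Formatter(
       'site1 %(name)s %(levelname)s %(message)s'))

   logger = logging.getLogger('site1.monitoring')
   logger.setLevel(logging.INFO)
   logger.addHandler(handler)

   logger.info('inter-site link check completed')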

Upgrades
~~~~~~~~

In multi-site OpenStack clouds deployed using regions, sites are
independent OpenStack installations which are linked together using
shared centralized services such as OpenStack Identity. At a high level,
the recommended order of operations to upgrade an individual OpenStack
environment is (see the `Upgrades
chapter <http://docs.openstack.org/openstack-ops/content/ops_upgrades-general-steps.html>`__
of the Operations Guide for details):

#. Upgrade the OpenStack Identity service (keystone).

#. Upgrade the OpenStack Image service (glance).

#. Upgrade OpenStack Compute (nova), including networking components.

#. Upgrade OpenStack Block Storage (cinder).

#. Upgrade the OpenStack dashboard (horizon).

The process for upgrading a multi-site environment is not significantly
different:

#. Upgrade the shared OpenStack Identity service (keystone) deployment.

#. Upgrade the OpenStack Image service (glance) at each site.

#. Upgrade OpenStack Compute (nova), including networking components, at
   each site.

#. Upgrade OpenStack Block Storage (cinder) at each site.

#. Upgrade the OpenStack dashboard (horizon) at each site, or in the
   single central location if it is shared.

Compute upgrades within each site can also be performed in a rolling
fashion. Compute controller services (API, Scheduler, and Conductor) can
be upgraded prior to upgrading individual compute nodes. This allows
operations staff to keep a site operational for users of Compute
services while performing an upgrade.
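
A hedged sketch of one way to confirm, after each step, that every
region is still answering on its public endpoints: read the catalog
from the shared Identity service and request each endpoint's version
document. The cloud name is an assumption.

.. code-block:: python

   # Sketch: after upgrading a site, verify that each region's public
   # endpoints still respond. Cloud name and catalog layout are
   # assumptions.
   import openstack
   import requests

   conn = openstack.connect(cloud='mycloud')

   regions = [r.id for r in conn.identity.regions()]
   endpoints = [e for e in conn.identity.endpoints()
                if e.interface == 'public']

   for region in regions:
       for endpoint in endpoints:
           if endpoint.region_id != region:
               continue
           try:
               # Most services publish a version document at the root.
               result = requests.get(endpoint.url, timeout=10).status_code
           except requests.RequestException as exc:
               result = exc
           print(region, endpoint.url, result)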

Quota management
~~~~~~~~~~~~~~~~

Quotas are used to set operational limits to prevent system capacities
from being exhausted without notification. They are currently enforced
at the tenant (or project) level rather than at the user level.

Quotas are defined on a per-region basis. Operators can define identical
quotas for tenants in each region of the cloud to provide a consistent
experience, or even create a process for synchronizing allocated quotas
across regions. It is important to note that only the operational limits
imposed by the quotas will be aligned; consumption of quotas by users
will not be reflected between regions.

For example, given a cloud with two regions, if the operator grants a
user a quota of 25 instances in each region, then that user may launch a
total of 50 instances spread across both regions. They may not, however,
launch more than 25 instances in any single region.
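
For operators who choose to keep the limits aligned, a minimal sketch
of copying a project's Compute quotas from a reference region to the
other regions with the OpenStack SDK follows; the cloud name, region
names, and project name are assumptions.

.. code-block:: python

   # Sketch: copy one project's Compute quota limits from a reference
   # region to the others. Cloud, region, and project names are
   # assumptions.
   import openstack

   PROJECT = 'demo'
   REFERENCE_REGION = 'RegionOne'
   OTHER_REGIONS = ['RegionTwo']

   reference = openstack.connect(cloud='mycloud',
                                 region_name=REFERENCE_REGION)
   quotas = reference.get_compute_quotas(PROJECT)

   for region in OTHER_REGIONS:
       conn = openstack.connect(cloud='mycloud', region_name=region)
       # Only the operational limits are synchronized; consumption in
       # one region is never reflected in another.
       conn.set_compute_quotas(PROJECT,
                               instances=quotas.instances,
                               cores=quotas.cores,
                               ram=quotas.ram)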

For more information on managing quotas, refer to the `Managing projects
and users
chapter <http://docs.openstack.org/openstack-ops/content/projects_users.html>`__
of the Operations Guide.

Policy management
~~~~~~~~~~~~~~~~~

OpenStack provides a default set of Role Based Access Control (RBAC)
policies, defined in a ``policy.json`` file, for each service. Operators
edit these files to customize the policies for their OpenStack
installation. If the application of consistent RBAC policies across
sites is a requirement, then it is necessary to ensure proper
synchronization of the ``policy.json`` files to all installations.

This must be done using system administration tools such as rsync, as
functionality for synchronizing policies across regions is not currently
provided within OpenStack.
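
A small sketch of pushing a reference ``policy.json`` to each site with
rsync over SSH follows; the controller hostnames and file path are
assumptions, and any configuration management tool can be substituted.

.. code-block:: python

   # Sketch: distribute a reference policy.json to each site's
   # controllers with rsync over SSH. Hostnames and the path are
   # assumptions.
   import subprocess

   POLICY_FILE = '/etc/keystone/policy.json'
   SITE_CONTROLLERS = ['controller.site1.example.com',
                       'controller.site2.example.com']

   for host in SITE_CONTROLLERS:
       # --checksum skips files that are already identical.
       subprocess.check_call([
           'rsync', '--archive', '--checksum',
           POLICY_FILE,
           '{}:{}'.format(host, POLICY_FILE),
       ])
       print('policy synchronized to', host)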

Documentation
~~~~~~~~~~~~~

Users must be able to leverage cloud infrastructure and provision new
resources in the environment. It is important that user documentation is
accessible to users to ensure they are given sufficient information to
help them leverage the cloud. As an example, by default OpenStack
schedules instances on a compute node automatically. However, when
multiple regions are available, the end user needs to decide in which
region to schedule the new instance. The dashboard presents the user
with the first region in your configuration. The API and CLI tools do
not execute commands unless a valid region is specified. It is therefore
important to provide documentation to your users describing the region
layout as well as calling out that quotas are region-specific. If a user
reaches their quota in one region, OpenStack does not automatically
build new instances in another. Documenting specific examples helps
users understand how to operate the cloud, thereby reducing calls and
tickets filed with the help desk.
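
The following is the kind of specific example worth documenting for
users: launching an instance in an explicitly chosen region with the
OpenStack SDK. The cloud, region, image, flavor, and network names are
assumptions.

.. code-block:: python

   # Sketch: a user explicitly selects the region when launching an
   # instance. Cloud, region, image, flavor, and network names are
   # assumptions.
   import openstack

   conn = openstack.connect(cloud='mycloud', region_name='RegionTwo')

   server = conn.create_server('docs-example',
                               image='cirros',
                               flavor='m1.tiny',
                               network='private',
                               wait=True)

   print('Launched', server.name, 'in RegionTwo, status', server.status)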

Technical considerations
========================

There are many technical considerations to take into account when
designing a multi-site OpenStack implementation. An OpenStack cloud
can be designed in a variety of ways to handle individual application
needs. A multi-site deployment has additional challenges compared to
single-site installations and therefore is a more complex solution.

When determining capacity options, be sure to take into account not just
the technical issues, but also the economic or operational issues that
might arise from specific decisions.

Inter-site link capacity describes the capabilities of the connectivity
between the different OpenStack sites. This includes parameters such as
bandwidth, latency, whether or not a link is dedicated, and any business
policies applied to the connection. The capability and number of the
links between sites determine what kind of options are available for
deployment. For example, if two sites have a pair of high-bandwidth
links available between them, it may be wise to configure a separate
storage replication network between the two sites to support a single
Swift endpoint and a shared Object Storage capability between them. An
example of this technique, as well as a configuration walk-through, is
available at
http://docs.openstack.org/developer/swift/replication_network.html#dedicated-replication-network.
Another option in this scenario is to build a dedicated set of tenant
private networks across the secondary link, using overlay networks with
a third party mapping the site overlays to each other.

The capacity requirements of the links between sites are driven by
application behavior. If the link latency is too high, certain
applications that use a large number of small packets, for example RPC
calls, may encounter issues communicating with each other or operating
properly. Additionally, OpenStack may encounter similar types of issues.
To mitigate this, Identity service call timeouts can be tuned to prevent
issues authenticating against a central Identity service.
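
A rough sketch of measuring the round-trip cost of reaching a central
Identity service from a remote site, which can inform how such timeouts
are tuned; the Identity URL is an assumption.

.. code-block:: python

   # Sketch: sample round trips from a remote site to a central
   # Identity endpoint to guide timeout tuning. The URL is an
   # assumption.
   import time

   import requests

   IDENTITY_URL = 'https://identity.central.example.com:5000/v3'

   samples = []
   for _ in range(10):
       start = time.monotonic()
       requests.get(IDENTITY_URL, timeout=5)
       samples.append(time.monotonic() - start)

   print('min %.3fs max %.3fs avg %.3fs'
         % (min(samples), max(samples), sum(samples) / len(samples)))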

Another network capacity consideration for a multi-site deployment is
the amount and performance of overlay networks available for tenant
networks. If using shared tenant networks across zones, it is imperative
that an external overlay manager or controller be used to map these
overlays together. It is also necessary to ensure that the number of
possible IDs in each zone is identical.

.. note::

   As of the Kilo release, OpenStack Networking was not capable of
   managing tunnel IDs across installations. So if one site runs out of
   IDs, but another does not, that tenant's network is unable to reach
   the other site.

Capacity can take other forms as well. The ability for a region to grow
depends on scaling out the number of available compute nodes. This topic
is covered in greater detail in the section for compute-focused
deployments. However, it may be necessary to grow cells in an individual
region, depending on the size of your cluster and the ratio of virtual
machines per hypervisor.

A third form of capacity comes in the multi-region-capable components of
OpenStack. Centralized Object Storage is capable of serving objects
through a single namespace across multiple regions. Since this works by
accessing the object store through the swift proxy, it is possible to
overload the proxies. There are two options available to mitigate this
issue:

* Deploy a large number of swift proxies. The drawback is that the
  proxies are not load-balanced and a large file request could
  continually hit the same proxy.

* Add a caching HTTP proxy and load balancer in front of the swift
  proxies. Since swift objects are returned to the requester via HTTP,
  this load balancer would alleviate the load on the swift proxies.

Utilization
~~~~~~~~~~~

While constructing a multi-site OpenStack environment is the goal of
this guide, the real test is whether an application can utilize it.

The Identity service is normally the first interface for OpenStack users
and is required for almost all major operations within OpenStack.
Therefore, it is important that you provide users with a single URL for
Identity service authentication, and document the configuration of
regions within the Identity service. Each of the sites defined in your
installation is considered to be a region in Identity nomenclature. This
is important for users, as they must specify the region name when
directing actions at an API endpoint or selecting a region in the
dashboard.

Load balancing is another common issue with multi-site installations.
While it is still possible to run HAProxy instances with
Load-Balancer-as-a-Service, these are defined for a specific region.
Some applications can manage this using internal mechanisms. Other
applications may require the implementation of an external system,
including global services load balancers or anycast-advertised DNS.

Depending on the storage model chosen during site design, storage
replication and availability are also a concern for end-users. If an
application can support regions, then it is possible to keep the object
storage system separated by region. In this case, users who want to have
an object available to more than one region need to perform cross-site
replication. However, with a centralized swift proxy, the operator may
need to benchmark the replication timing of the Object Storage back end.
Benchmarking allows the operational staff to provide users with an
understanding of the amount of time required for a stored or modified
object to become available to the entire environment.
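
A rough sketch of such a benchmark with the OpenStack SDK: upload an
object through one region's endpoint and poll the other region until it
becomes visible. The cloud, region, and container names are assumptions.

.. code-block:: python

   # Sketch: time how long a new object takes to become visible in a
   # second region. Cloud, region, and container names are assumptions.
   import time

   import openstack
   from openstack import exceptions

   site_a = openstack.connect(cloud='mycloud', region_name='RegionOne')
   site_b = openstack.connect(cloud='mycloud', region_name='RegionTwo')

   site_a.object_store.create_container(name='replication-bench')
   site_a.object_store.upload_object(container='replication-bench',
                                     name='probe',
                                     data=b'replication probe')

   start = time.monotonic()
   while True:
       try:
           site_b.object_store.get_object_metadata(
               'probe', container='replication-bench')
           break
       except exceptions.SDKException:
           time.sleep(1)

   print('Object visible in RegionTwo after %.1f seconds'
         % (time.monotonic() - start))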

Performance
~~~~~~~~~~~

Determining the performance of a multi-site installation involves
considerations that do not come into play in a single-site deployment.
Because a multi-site deployment is distributed, performance may be
affected whenever requests or data must traverse the links between
sites.

Since multi-site systems can be geographically separated, there may be
greater latency or jitter when communicating across regions. This can
especially impact systems like the OpenStack Identity service when
making authentication attempts from regions that do not contain the
centralized Identity implementation. It can also affect applications
which rely on Remote Procedure Call (RPC) for normal operation. An
example of this can be seen in high performance computing workloads.

Storage availability can also be impacted by the architecture of a
multi-site deployment. A centralized Object Storage service requires
more time for an object to be available to instances locally in regions
where the object was not created. Some applications may need to be tuned
to account for this effect. Block Storage does not currently have a
method for replicating data across multiple regions, so applications
that depend on available block storage need to manually cope with this
limitation by creating duplicate block storage entries in each region.
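
A minimal sketch of coping with that limitation by creating a volume of
the same size and name in every region; the cloud name, region names,
and volume parameters are assumptions, and copying data into each volume
remains an application-level task.

.. code-block:: python

   # Sketch: create a same-sized volume in every region, since Block
   # Storage does not replicate across regions. Names and sizes are
   # assumptions.
   import openstack

   REGIONS = ['RegionOne', 'RegionTwo']

   for region in REGIONS:
       conn = openstack.connect(cloud='mycloud', region_name=region)
       volume = conn.create_volume(size=10, name='app-data', wait=True)
       print('Created volume', volume.id, 'in', region)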

OpenStack components
~~~~~~~~~~~~~~~~~~~~

Most OpenStack installations require a bare minimum set of pieces to
function. These include the OpenStack Identity service (keystone) for
authentication, OpenStack Compute (nova) for compute, the OpenStack
Image service (glance) for image storage, OpenStack Networking (neutron)
for networking, and potentially an object store in the form of OpenStack
Object Storage (swift). Deploying a multi-site installation also demands
extra components in order to coordinate between regions. A centralized
Identity service is necessary to provide the single authentication
point. A centralized dashboard is also recommended to provide a single
login point and a mapping to the API and CLI options available. A
centralized Object Storage service may also be used, but will require
the installation of the swift proxy service.

It may also be helpful to install a few optional services in order to
facilitate certain use cases. For example, installing Designate may
assist in automatically generating DNS domains for each region, with an
automatically populated zone full of resource records for each instance.
This facilitates using DNS as a mechanism for determining which region
will be selected for certain applications.

Another useful tool for managing a multi-site installation is
Orchestration (heat). The Orchestration service allows the use of
templates to define a set of instances to be launched together or for
scaling existing sets. It can also be used to set up matching or
differentiated groupings based on regions. For instance, if an
application requires an equally balanced number of nodes across sites,
the same heat template can be used to cover each site with small
alterations to only the region name.
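
As a rough sketch under assumed names, the same template file can be
launched in each region by pointing the Orchestration client at one
region at a time; the cloud name, region names, template path, and
parameters are assumptions.

.. code-block:: python

   # Sketch: launch the same heat template in every region. Cloud name,
   # region names, template path, and parameters are assumptions.
   import openstack

   REGIONS = ['RegionOne', 'RegionTwo']

   for region in REGIONS:
       conn = openstack.connect(cloud='mycloud', region_name=region)
       stack = conn.create_stack(
           'app-{}'.format(region.lower()),
           template_file='app-stack.yaml',
           wait=True,
           # Template parameters; node_count is an assumed parameter.
           node_count=4)
       print('Stack', stack.id, 'created in', region)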