Merge "[arch-design] Convert multi-site sections"

This commit is contained in:
Jenkins 2015-11-17 08:09:22 +00:00 committed by Gerrit Code Review
commit 55b172c189
2 changed files with 312 additions and 0 deletions


@@ -2,3 +2,155 @@
Operational considerations
==========================
Multi-site OpenStack cloud deployment using regions requires that the
service catalog contain per-region entries for each service deployed
other than the Identity service. Most off-the-shelf OpenStack deployment
tools have limited support for defining multiple regions in this
fashion.
Deployers should be aware of this and provide the appropriate
customization of the service catalog for their site, either manually or
by customizing the deployment tools in use.
.. note::

   As of the Kilo release, documentation for implementing this feature
   is in progress. See this bug for more information:
   https://bugs.launchpad.net/openstack-manuals/+bug/1340509.
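To make the per-region customization concrete, the following is a
minimal sketch that registers a public endpoint for the same service in
two regions through the Identity v3 REST API. The endpoint URLs, admin
token, and service ID are placeholder values, not part of any real
deployment.

.. code-block:: python

   import requests

   # Placeholder values; substitute your Identity endpoint, a valid
   # admin token, and the ID of the service being registered.
   KEYSTONE = "http://identity.example.com:5000/v3"
   TOKEN = "ADMIN_TOKEN"
   SERVICE_ID = "SERVICE_UUID"

   # One public endpoint entry per region for the same service.
   regions = {
       "RegionOne": "http://site1.example.com:8774/v2.1",
       "RegionTwo": "http://site2.example.com:8774/v2.1",
   }

   for region_id, url in regions.items():
       body = {"endpoint": {
           "interface": "public",
           "region_id": region_id,
           "service_id": SERVICE_ID,
           "url": url,
       }}
       resp = requests.post(KEYSTONE + "/endpoints", json=body,
                            headers={"X-Auth-Token": TOKEN})
       resp.raise_for_status()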
Licensing
~~~~~~~~~
Multi-site OpenStack deployments present additional licensing
considerations over and above regular OpenStack clouds, particularly
where site licenses are in use to provide cost efficient access to
software licenses. The licensing for host operating systems, guest
operating systems, OpenStack distributions (if applicable),
software-defined infrastructure including network controllers and
storage systems, and even individual applications needs to be evaluated.
Topics to consider include:
* The definition of what constitutes a site in the relevant licenses,
  as the term does not necessarily denote a geographic or otherwise
  physically isolated location.
* Differentiations between "hot" (active) and "cold" (inactive) sites,
  where significant savings may be made when one site is a cold standby
  for disaster recovery purposes only.
* Certain locations might require local vendors to provide support and
  services for each site, which may vary depending on the licensing
  agreement in place.
Logging and monitoring
~~~~~~~~~~~~~~~~~~~~~~
Logging and monitoring do not significantly differ for a multi-site
OpenStack cloud. The tools described in the `Logging and monitoring
chapter <http://docs.openstack.org/openstack-ops/content/logging_monitoring.html>`__
of the Operations Guide remain applicable. Logging and monitoring can be
provided on a per-site basis, in a common centralized location, or both.
When attempting to deploy logging and monitoring facilities to a
centralized location, care must be taken with the load placed on the
inter-site networking links.
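As an illustration of combining both approaches, the sketch below uses
Python's standard logging module to keep a per-site log file while also
forwarding records to an assumed central syslog collector; the hostnames
and paths are placeholders.

.. code-block:: python

   import logging
   import logging.handlers

   log = logging.getLogger("site1.nova")
   log.setLevel(logging.INFO)

   # Local, per-site log file.
   log.addHandler(logging.FileHandler("/var/log/site1-nova.log"))

   # Forward a copy to the central collector over the inter-site link;
   # note that this traffic adds load to that link.
   log.addHandler(logging.handlers.SysLogHandler(
       address=("logs.central.example.com", 514)))

   log.info("compute node nova-1 started")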
Upgrades
~~~~~~~~
In multi-site OpenStack clouds deployed using regions, sites are
independent OpenStack installations which are linked together using
shared centralized services such as OpenStack Identity. At a high level,
the recommended order of operations to upgrade an individual OpenStack
environment is (see the `Upgrades
chapter <http://docs.openstack.org/openstack-ops/content/ops_upgrades-general-steps.html>`__
of the Operations Guide for details):
#. Upgrade the OpenStack Identity service (keystone).
#. Upgrade the OpenStack Image service (glance).
#. Upgrade OpenStack Compute (nova), including networking components.
#. Upgrade OpenStack Block Storage (cinder).
#. Upgrade the OpenStack dashboard (horizon).
The process for upgrading a multi-site environment is not significantly
different:
#. Upgrade the shared OpenStack Identity service (keystone) deployment.
#. Upgrade the OpenStack Image service (glance) at each site.
#. Upgrade OpenStack Compute (nova), including networking components, at
   each site.
#. Upgrade OpenStack Block Storage (cinder) at each site.
#. Upgrade the OpenStack dashboard (horizon) at each site, or in the
   single central location if it is shared.
Compute upgrades within each site can also be performed in a rolling
fashion. Compute controller services (API, Scheduler, and Conductor) can
be upgraded prior to upgrading individual compute nodes. This allows
operations staff to keep a site operational for users of Compute
services while performing an upgrade.
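The ordering above can be captured in a simple driver script. The
following sketch only prints the plan; upgrade_site_service() is a
hypothetical hook that would wrap your actual deployment tooling.

.. code-block:: python

   # Sites and the per-site service upgrade order from the steps above.
   SITES = ["site1", "site2"]
   SERVICE_ORDER = ["glance", "nova", "cinder", "horizon"]

   def upgrade_site_service(site, service):
       # Placeholder: call your configuration management tool here.
       print(f"upgrading {service} at {site}")

   # Shared Identity service first, once for the whole deployment.
   upgrade_site_service("central", "keystone")

   # Then the remaining services, in order, at each site.
   for service in SERVICE_ORDER:
       for site in SITES:
           upgrade_site_service(site, service)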
Quota management
~~~~~~~~~~~~~~~~
Quotas are used to set operational limits to prevent system capacities
from being exhausted without notification. They are currently enforced
at the tenant (or project) level rather than at the user level.
Quotas are defined on a per-region basis. Operators can define identical
quotas for tenants in each region of the cloud to provide a consistent
experience, or even create a process for synchronizing allocated quotas
across regions. It is important to note that only the operational limits
imposed by the quotas will be aligned; consumption of quotas by users
will not be reflected between regions.
For example, given a cloud with two regions, if the operator grants a
user a quota of 25 instances in each region then that user may launch a
total of 50 instances spread across both regions. They may not, however,
launch more than 25 instances in any single region.
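A process for synchronizing allocated quotas across regions can be as
simple as applying the same limits in each region. A minimal sketch,
assuming the openstack command-line client is installed and credentials
are set in the environment; the project name, regions, and limit are
illustrative.

.. code-block:: python

   import subprocess

   PROJECT = "demo"
   REGIONS = ["RegionOne", "RegionTwo"]

   # Apply an identical instance quota in every region.
   for region in REGIONS:
       subprocess.run(
           ["openstack", "--os-region-name", region,
            "quota", "set", "--instances", "25", PROJECT],
           check=True)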
For more information on managing quotas, refer to the `Managing projects
and users
chapter <http://docs.openstack.org/openstack-ops/content/projects_users.html>`__
of the OpenStack Operations Guide.
Policy management
~~~~~~~~~~~~~~~~~
OpenStack provides a default set of Role Based Access Control (RBAC)
policies, defined in a ``policy.json`` file, for each service. Operators
edit these files to customize the policies for their OpenStack
installation. If the application of consistent RBAC policies across
sites is a requirement, then it is necessary to ensure proper
synchronization of the ``policy.json`` files to all installations.
This must be done using system administration tools such as rsync, as
functionality for synchronizing policies across regions is not currently
provided within OpenStack.
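A minimal sketch of such synchronization, assuming SSH access from a
management host and illustrative hostnames and paths:

.. code-block:: python

   import subprocess

   # Push the canonical policy file for one service to every site.
   SITES = ["site1.example.com", "site2.example.com"]
   POLICY = "/etc/nova/policy.json"

   for site in SITES:
       subprocess.run(
           ["rsync", "-av", POLICY, f"root@{site}:{POLICY}"],
           check=True)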
Documentation
~~~~~~~~~~~~~
Users must be able to leverage the cloud infrastructure and provision
new resources in the environment. It is important that user
documentation is accessible and gives users sufficient information to
help them leverage the cloud. As an example, by default OpenStack
schedules instances on a compute node automatically. However, when
multiple regions are available, the end user needs to decide in which
region to schedule the new instance. By default, the dashboard presents
the user with the first region in the configuration. The API and CLI
tools do
not execute commands unless a valid region is specified. It is therefore
important to provide documentation to your users describing the region
layout as well as calling out that quotas are region-specific. If a user
reaches their quota in one region, OpenStack does not automatically
build new instances in another. Documenting specific examples helps
users understand how to operate the cloud, thereby reducing calls and
tickets filed with the help desk.
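For example, user documentation might include a snippet like the
following, which picks the region-specific compute endpoint out of an
Identity v3 service catalog; the abbreviated catalog shown is made up
for illustration.

.. code-block:: python

   # An abbreviated Identity v3 service catalog, one entry per region.
   catalog = [
       {"type": "compute", "endpoints": [
           {"interface": "public", "region": "RegionOne",
            "url": "http://site1.example.com:8774/v2.1"},
           {"interface": "public", "region": "RegionTwo",
            "url": "http://site2.example.com:8774/v2.1"},
       ]},
   ]

   def endpoint_for(catalog, service_type, region):
       # Return the public URL for the given service in the given region.
       for service in catalog:
           if service["type"] != service_type:
               continue
           for ep in service["endpoints"]:
               if ep["region"] == region and ep["interface"] == "public":
                   return ep["url"]
       raise LookupError(f"no {service_type} endpoint in {region}")

   print(endpoint_for(catalog, "compute", "RegionTwo"))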


@@ -2,3 +2,163 @@
Technical considerations
========================
There are many technical considerations to take into account when
designing a multi-site OpenStack implementation. An OpenStack cloud
can be designed in a variety of ways to handle individual application
needs. A multi-site deployment has additional challenges compared to
single site installations and therefore is a more complex solution.
When determining capacity options be sure to take into account not just
the technical issues, but also the economic or operational issues that
might arise from specific decisions.
Inter-site link capacity describes the capabilities of the connectivity
between the different OpenStack sites. This includes parameters such as
bandwidth, latency, whether or not a link is dedicated, and any business
policies applied to the connection. The capability and number of the
links between sites determine what kind of options are available for
deployment. For example, if two sites have a pair of high-bandwidth
links available between them, it may be wise to configure a separate
storage replication network between the two sites to support a single
Swift endpoint and a shared Object Storage capability between them. An
example of this technique, as well as a configuration walk-through, is
available at
http://docs.openstack.org/developer/swift/replication_network.html#dedicated-replication-network.
Another option in this scenario is to build a dedicated set of tenant
private networks across the secondary link, using overlay networks with
a third-party system mapping the site overlays to each other.
The capacity requirements of the links between sites are driven by
application behavior. If the link latency is too high, certain
applications that use a large number of small packets, for example RPC
calls, may encounter issues communicating with each other or operating
properly. Additionally, OpenStack may encounter similar types of issues.
To mitigate this, Identity service call timeouts can be tuned to prevent
issues authenticating against a central Identity service.
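As a rough illustration of such tuning, a client talking to a remote
central Identity service can apply an explicit timeout sized for the
inter-site latency; the endpoint and credentials below are placeholders.

.. code-block:: python

   import requests

   KEYSTONE = "http://identity.central.example.com:5000/v3"
   auth = {"auth": {"identity": {"methods": ["password"], "password": {
       "user": {"name": "demo", "domain": {"id": "default"},
                "password": "secret"}}}}}

   # Explicit (connect, read) timeouts; library defaults may be too
   # aggressive over a high-latency inter-site link.
   resp = requests.post(KEYSTONE + "/auth/tokens", json=auth,
                        timeout=(5, 30))
   print(resp.status_code)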
Another network capacity consideration for a multi-site deployment is
the amount and performance of overlay networks available for tenant
networks. If using shared tenant networks across zones, it is imperative
that an external overlay manager or controller be used to map these
overlays together. It is necessary to ensure that the number of possible
IDs is identical between the zones.
.. note::

   As of the Kilo release, OpenStack Networking was not capable of
   managing tunnel IDs across installations. If one site runs out of
   IDs, but another does not, that tenant's network is unable to reach
   the other site.
Capacity can take other forms as well. The ability for a region to grow
depends on scaling out the number of available compute nodes. This topic
is covered in greater detail in the section for compute-focused
deployments. However, it may be necessary to grow cells in an individual
region, depending on the size of your cluster and the ratio of virtual
machines per hypervisor.
A third form of capacity comes in the multi-region-capable components of
OpenStack. Centralized Object Storage is capable of serving objects
through a single namespace across multiple regions. Since this works by
accessing the object store through the swift proxy, it is possible to
overload the proxies. There are two options available to mitigate this
issue:
* Deploy a large number of swift proxies. The drawback is that the
  proxies are not load-balanced and a large file request could
  continually hit the same proxy.
* Add a caching HTTP proxy and load balancer in front of the swift
  proxies. Since swift objects are returned to the requester via HTTP,
  this load balancer would alleviate the load required on the swift
  proxies.
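A dedicated load balancer is the more robust approach, but as a toy
illustration of spreading requests across several swift proxies from the
client side, consider the sketch below; the proxy URLs and token are
made up.

.. code-block:: python

   import itertools

   import requests

   # Rotate object requests across the available swift proxies.
   PROXIES = itertools.cycle([
       "http://proxy1.example.com:8080",
       "http://proxy2.example.com:8080",
   ])

   def get_object(account, container, obj, token):
       proxy = next(PROXIES)
       return requests.get(
           f"{proxy}/v1/{account}/{container}/{obj}",
           headers={"X-Auth-Token": token})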
Utilization
~~~~~~~~~~~
While constructing a multi-site OpenStack environment is the goal of
this guide, the real test is whether an application can utilize it.
The Identity service is normally the first interface for OpenStack users
and is required for almost all major operations within OpenStack.
Therefore, it is important that you provide users with a single URL for
Identity service authentication, and document the configuration of
regions within the Identity service. Each of the sites defined in your
installation is considered to be a region in Identity nomenclature. This
is important for users, as they must specify the region name when
directing actions to an API endpoint or working in the dashboard.
Load balancing is another common issue with multi-site installations.
While it is still possible to run HAProxy instances with
Load-Balancer-as-a-Service, these are restricted to a specific region. Some
applications can manage this using internal mechanisms. Other
applications may require the implementation of an external system,
including global services load balancers or anycast-advertised DNS.
Depending on the storage model chosen during site design, storage
replication and availability are also a concern for end-users. If an
application can support regions, then it is possible to keep the object
storage system separated by region. In this case, users who want to have
an object available to more than one region need to perform cross-site
replication. However, with a centralized swift proxy, the user may need
to benchmark the replication timing of the Object Storage back end.
Benchmarking allows the operational staff to provide users with an
understanding of the amount of time required for a stored or modified
object to become available to the entire environment.
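A minimal sketch of such a benchmark: write an object through one
region's proxy, then poll another region's proxy until the object
appears. The URLs and token are placeholders.

.. code-block:: python

   import time

   import requests

   SRC = "http://site1.example.com:8080/v1/AUTH_demo/bench/probe"
   DST = "http://site2.example.com:8080/v1/AUTH_demo/bench/probe"
   HEADERS = {"X-Auth-Token": "TOKEN"}

   # Store the probe object at the first site.
   requests.put(SRC, data=b"probe", headers=HEADERS).raise_for_status()

   # Poll the second site until the object becomes visible there.
   start = time.time()
   while requests.head(DST, headers=HEADERS).status_code != 200:
       time.sleep(1)
   print(f"object visible after {time.time() - start:.1f}s")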
Performance
~~~~~~~~~~~
Determining the performance of a multi-site installation involves
considerations that do not come into play in a single-site deployment.
Because a multi-site deployment is distributed, performance may be
affected in certain situations.
Since multi-site systems can be geographically separated, there may be
greater latency or jitter when communicating across regions. This can
especially impact systems like the OpenStack Identity service when
making authentication attempts from regions that do not contain the
centralized Identity implementation. It can also affect applications
which rely on Remote Procedure Call (RPC) for normal operation. An
example of this can be seen in high performance computing workloads.
Storage availability can also be impacted by the architecture of a
multi-site deployment. A centralized Object Storage service requires
more time for an object to be available to instances locally in regions
where the object was not created. Some applications may need to be tuned
to account for this effect. Block Storage does not currently have a
method for replicating data across multiple regions, so applications
that depend on available block storage need to manually cope with this
limitation by creating duplicate block storage entries in each region.
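A sketch of that manual workaround, assuming the openstack client and
illustrative region and volume names; keeping the contents of the copies
in sync remains the application's responsibility.

.. code-block:: python

   import subprocess

   REGIONS = ["RegionOne", "RegionTwo"]

   # Create a same-sized volume in every region.
   for region in REGIONS:
       subprocess.run(
           ["openstack", "--os-region-name", region,
            "volume", "create", "--size", "10", "app-data"],
           check=True)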
OpenStack components
~~~~~~~~~~~~~~~~~~~~
Most OpenStack installations require a bare minimum set of components to
function. These include OpenStack Identity (keystone) for
authentication, OpenStack Compute (nova) for compute, OpenStack Image
service (glance) for image storage, OpenStack Networking (neutron) for
networking, and potentially an object store in the form of OpenStack
Object Storage (swift). Deploying a multi-site installation also demands
extra components in order to coordinate between regions. A centralized
Identity service is necessary to provide the single authentication
point. A centralized dashboard is also recommended to provide a single
login point and a mapping to the API and CLI options available. A
centralized Object Storage service may also be used, but will require
the installation of the swift proxy service.
It may also be helpful to install a few optional services in order to
facilitate certain use cases. For example, installing Designate may
assist in automatically generating DNS domains for each region with an
automatically-populated zone full of resource records for each instance.
This facilitates using DNS as a mechanism for determining which region
will be selected for certain applications.
Another useful tool for managing a multi-site installation is
Orchestration (heat). The Orchestration service allows the use of
templates to define a set of instances to be launched together or for
scaling existing sets. It can also be used to set up matching or
differentiated groupings based on regions. For instance, if an
application requires an equally balanced number of nodes across sites,
the same heat template can be used to cover each site with small
alterations to only the region name.
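A minimal sketch of that pattern, assuming the openstack client with the
Orchestration plugin and an illustrative template name:

.. code-block:: python

   import subprocess

   REGIONS = ["RegionOne", "RegionTwo"]

   # Launch the same template in every region, varying only the region
   # name and the resulting stack name.
   for region in REGIONS:
       subprocess.run(
           ["openstack", "--os-region-name", region,
            "stack", "create", "-t", "app.yaml",
            f"app-{region.lower()}"],
           check=True)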