diff --git a/doc/arch-design-rst/source/multi-site-operational-considerations.rst b/doc/arch-design-rst/source/multi-site-operational-considerations.rst
index 3e27954f03..63ec050698 100644
--- a/doc/arch-design-rst/source/multi-site-operational-considerations.rst
+++ b/doc/arch-design-rst/source/multi-site-operational-considerations.rst
@@ -2,3 +2,155 @@
Operational considerations
==========================

Multi-site OpenStack cloud deployment using regions requires that the
service catalog contain per-region entries for each service deployed
other than the Identity service. Most off-the-shelf OpenStack deployment
tools have limited support for defining multiple regions in this
fashion.

Deployers should be aware of this and provide the appropriate
customization of the service catalog for their site, either manually or
by customizing the deployment tools in use.

.. note::

   As of the Kilo release, documentation for implementing this feature
   is in progress. See this bug for more information:
   https://bugs.launchpad.net/openstack-manuals/+bug/1340509.
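As an illustration of this customization, per-region catalog entries can
be created programmatically once the shared Identity service is running.
The following is a minimal sketch using the python-keystoneclient v3
API; the endpoint URLs, credentials, and region names are hypothetical
and must be adapted to the actual deployment:

.. code-block:: python

   from keystoneclient.auth.identity import v3
   from keystoneclient import session
   from keystoneclient.v3 import client

   # Authenticate against the shared Identity service (hypothetical
   # URL and credentials).
   auth = v3.Password(auth_url='http://identity.example.com:5000/v3',
                      username='admin', password='secret',
                      project_name='admin',
                      user_domain_id='default',
                      project_domain_id='default')
   sess = session.Session(auth=auth)
   keystone = client.Client(session=sess)

   # Register per-region endpoints for an already-defined service so
   # that the service catalog contains an entry for every region.
   # Assumes exactly one service of type 'compute' is defined.
   compute = keystone.services.find(type='compute')
   for region in ('RegionOne', 'RegionTwo'):
       for interface in ('public', 'internal', 'admin'):
           keystone.endpoints.create(
               service=compute,
               interface=interface,
               region=region,
               url='http://compute-%s.example.com:8774/v2'
                   % region.lower())

When scripting is not desired, equivalent results can usually be
achieved with the ``openstack endpoint create --region`` command.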
Licensing
~~~~~~~~~

Multi-site OpenStack deployments present additional licensing
considerations over and above regular OpenStack clouds, particularly
where site licenses are in use to provide cost-efficient access to
software licenses. The licensing for host operating systems, guest
operating systems, OpenStack distributions (if applicable),
software-defined infrastructure including network controllers and
storage systems, and even individual applications needs to be evaluated.

Topics to consider include:

* The definition of what constitutes a site in the relevant licenses,
  as the term does not necessarily denote a geographic or otherwise
  physically isolated location.

* Differentiations between "hot" (active) and "cold" (inactive) sites,
  where significant savings may be made in situations where one site is
  a cold standby for disaster recovery purposes only.

* Certain locations might require local vendors to provide support and
  services for each site, which may vary with the licensing agreement in
  place.

Logging and monitoring
~~~~~~~~~~~~~~~~~~~~~~

Logging and monitoring do not significantly differ for a multi-site
OpenStack cloud. The tools described in the `Logging and monitoring
chapter <http://docs.openstack.org/openstack-ops/content/logging_monitoring.html>`__
of the Operations Guide remain applicable. Logging and monitoring can be
provided on a per-site basis, and in a common centralized location.

When attempting to deploy logging and monitoring facilities to a
centralized location, care must be taken with the load placed on the
inter-site networking links.

Upgrades
~~~~~~~~

In multi-site OpenStack clouds deployed using regions, sites are
independent OpenStack installations which are linked together using
shared centralized services such as OpenStack Identity. At a high level,
the recommended order of operations to upgrade an individual OpenStack
environment is (see the `Upgrades
chapter <http://docs.openstack.org/openstack-ops/content/ops_upgrades.html>`__
of the Operations Guide for details):

#. Upgrade the OpenStack Identity service (keystone).

#. Upgrade the OpenStack Image service (glance).

#. Upgrade OpenStack Compute (nova), including networking components.

#. Upgrade OpenStack Block Storage (cinder).

#. Upgrade the OpenStack dashboard (horizon).

The process for upgrading a multi-site environment is not significantly
different:

#. Upgrade the shared OpenStack Identity service (keystone) deployment.

#. Upgrade the OpenStack Image service (glance) at each site.

#. Upgrade OpenStack Compute (nova), including networking components, at
   each site.

#. Upgrade OpenStack Block Storage (cinder) at each site.

#. Upgrade the OpenStack dashboard (horizon) at each site, or in the
   single central location if it is shared.

Compute upgrades within each site can also be performed in a rolling
fashion. Compute controller services (API, Scheduler, and Conductor) can
be upgraded prior to upgrading individual compute nodes. This allows
operations staff to keep a site operational for users of Compute
services while performing an upgrade.

Quota management
~~~~~~~~~~~~~~~~

Quotas are used to set operational limits to prevent system capacities
from being exhausted without notification. They are currently enforced
at the tenant (or project) level rather than at the user level.

Quotas are defined on a per-region basis. Operators can define identical
quotas for tenants in each region of the cloud to provide a consistent
experience, or even create a process for synchronizing allocated quotas
across regions. It is important to note that only the operational limits
imposed by the quotas will be aligned; consumption of quotas by users
will not be reflected between regions.

For example, given a cloud with two regions, if the operator grants a
user a quota of 25 instances in each region then that user may launch a
total of 50 instances spread across both regions. They may not, however,
launch more than 25 instances in any single region.

For more information on managing quotas refer to the `Managing projects
and users
chapter <http://docs.openstack.org/openstack-ops/content/projects_users.html>`__
of the OpenStack Operators Guide.
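A synchronization process such as the one described above can be as
simple as a script that applies the same limits in every region. The
sketch below assumes the ``sess`` Identity session from the earlier
example, hypothetical region names and tenant ID, and python-novaclient
for the Compute quotas:

.. code-block:: python

   from novaclient import client as nova_client

   REGIONS = ('RegionOne', 'RegionTwo')  # hypothetical region names
   tenant_id = '0c29e8c70a234c70a40b'    # hypothetical tenant to synchronize

   # Apply identical operational limits in every region. Consumption
   # is still tracked per region, so a tenant with a 25-instance quota
   # in each of two regions may run up to 50 instances in total.
   for region in REGIONS:
       nova = nova_client.Client('2', session=sess, region_name=region)
       nova.quotas.update(tenant_id, instances=25, cores=50, ram=51200)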
Policy management
~~~~~~~~~~~~~~~~~

OpenStack provides a default set of Role Based Access Control (RBAC)
policies, defined in a ``policy.json`` file, for each service. Operators
edit these files to customize the policies for their OpenStack
installation. If the application of consistent RBAC policies across
sites is a requirement, then it is necessary to ensure proper
synchronization of the ``policy.json`` files to all installations.

This must be done using system administration tools such as rsync, as
functionality for synchronizing policies across regions is not currently
provided within OpenStack.
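One possible approach is sketched below: a short script that pushes a
service's ``policy.json`` from a canonical copy to each site using rsync
over SSH. The host names and paths are hypothetical, and a configuration
management system already in use (for example, Puppet or Ansible) would
serve equally well:

.. code-block:: python

   import subprocess

   # Hypothetical controller hosts, one per region.
   CONTROLLERS = ('controller.regionone.example.com',
                  'controller.regiontwo.example.com')
   POLICY = '/etc/nova/policy.json'

   # Copy the canonical policy file to every site. The -a flag
   # preserves permissions and timestamps; -z compresses the transfer
   # over the inter-site link.
   for host in CONTROLLERS:
       subprocess.check_call(
           ['rsync', '-az', POLICY, 'root@%s:%s' % (host, POLICY)])

Depending on the service and release, the updated policy file may be
re-read automatically or may require a service restart to take effect.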
Documentation
~~~~~~~~~~~~~

Users must be able to leverage cloud infrastructure and provision new
resources in the environment. It is important that user documentation is
readily accessible and gives users sufficient information to help them
leverage the cloud. As an example, by default OpenStack schedules
instances on a compute node automatically. However, when multiple
regions are available, the end user needs to decide in which region to
schedule the new instance. The dashboard presents the user with the
first region in the configuration. The API and CLI tools do not execute
commands unless a valid region is specified. It is therefore important
to provide documentation to your users describing the region layout as
well as calling out that quotas are region-specific. If a user reaches
their quota in one region, OpenStack does not automatically build new
instances in another. Documenting specific examples helps users
understand how to operate the cloud, thereby reducing calls and tickets
filed with the help desk.

diff --git a/doc/arch-design-rst/source/multi-site-technical-considerations.rst b/doc/arch-design-rst/source/multi-site-technical-considerations.rst
index ced93841fe..ce135a0906 100644
--- a/doc/arch-design-rst/source/multi-site-technical-considerations.rst
+++ b/doc/arch-design-rst/source/multi-site-technical-considerations.rst
@@ -2,3 +2,163 @@
Technical considerations
========================

There are many technical considerations to take into account when
designing a multi-site OpenStack implementation. An OpenStack cloud
can be designed in a variety of ways to handle individual application
needs. A multi-site deployment has additional challenges compared to
single-site installations and is therefore a more complex solution.

When determining capacity options, be sure to take into account not just
the technical issues, but also the economic or operational issues that
might arise from specific decisions.

Inter-site link capacity describes the capabilities of the connectivity
between the different OpenStack sites. This includes parameters such as
bandwidth, latency, whether or not a link is dedicated, and any business
policies applied to the connection. The capability and number of the
links between sites determine what kind of options are available for
deployment. For example, if two sites have a pair of high-bandwidth
links available between them, it may be wise to configure a separate
storage replication network between the two sites to support a single
Swift endpoint and a shared Object Storage capability between them. An
example of this technique, as well as a configuration walk-through, is
available at
http://docs.openstack.org/developer/swift/replication_network.html#dedicated-replication-network.
Another option in this scenario is to build a dedicated set of tenant
private networks across the secondary link, using overlay networks with
a third party mapping the site overlays to each other.

The capacity requirements of the links between sites are driven by
application behavior. If the link latency is too high, certain
applications that use a large number of small packets, for example RPC
calls, may encounter issues communicating with each other or operating
properly. Additionally, OpenStack may encounter similar types of issues.
To mitigate this, Identity service call timeouts can be tuned to prevent
issues authenticating against a central Identity service.
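On the client side, such a timeout can be expressed directly on the
Identity session object. The sketch below uses python-keystoneclient;
the 30-second value is purely illustrative and should be derived from
the measured inter-site latency:

.. code-block:: python

   from keystoneclient.auth.identity import v3
   from keystoneclient import session

   auth = v3.Password(auth_url='http://identity.example.com:5000/v3',
                      username='demo', password='secret',
                      project_name='demo',
                      user_domain_id='default',
                      project_domain_id='default')

   # Cap every request to the central Identity service (hypothetical
   # value) so that calls fail fast rather than hang when the
   # inter-site link is congested or lossy.
   sess = session.Session(auth=auth, timeout=30)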
Another network capacity consideration for a multi-site deployment is
the amount and performance of overlay networks available for tenant
networks. If using shared tenant networks across zones, it is imperative
that an external overlay manager or controller be used to map these
overlays together. It is also necessary to ensure that the number of
possible IDs is identical between the zones.

.. note::

   As of the Kilo release, OpenStack Networking is not capable of
   managing tunnel IDs across installations. If one site runs out of
   IDs, but another does not, that tenant's network is unable to reach
   the other site.

Capacity can take other forms as well. The ability for a region to grow
depends on scaling out the number of available compute nodes. This topic
is covered in greater detail in the section for compute-focused
deployments. However, it may be necessary to grow cells in an individual
region, depending on the size of your cluster and the ratio of virtual
machines per hypervisor.

A third form of capacity comes in the multi-region-capable components of
OpenStack. Centralized Object Storage is capable of serving objects
through a single namespace across multiple regions. Since this works by
accessing the object store through the swift proxy, it is possible to
overload the proxies. There are two options available to mitigate this
issue:

* Deploy a large number of swift proxies. The drawback is that the
  proxies are not load-balanced and a large file request could
  continually hit the same proxy.

* Add a caching HTTP proxy and load balancer in front of the swift
  proxies. Since swift objects are returned to the requester via HTTP,
  this load balancer would alleviate the load on the swift proxies.

Utilization
~~~~~~~~~~~

While constructing a multi-site OpenStack environment is the goal of
this guide, the real test is whether an application can utilize it.

The Identity service is normally the first interface for OpenStack users
and is required for almost all major operations within OpenStack.
Therefore, it is important that you provide users with a single URL for
Identity service authentication, and document the configuration of
regions within the Identity service. Each of the sites defined in your
installation is considered to be a region in Identity nomenclature. This
is important for users, as the region name must be specified when
directing actions to an API endpoint or in the dashboard.

Load balancing is another common issue with multi-site installations.
While it is still possible to run HAProxy instances with
Load-Balancer-as-a-Service, these are defined within a specific region.
Some applications can manage this using internal mechanisms. Other
applications may require the implementation of an external system,
including global services load balancers or anycast-advertised DNS.

Depending on the storage model chosen during site design, storage
replication and availability are also a concern for end-users. If an
application can support regions, then it is possible to keep the object
storage system separated by region. In this case, users who want to have
an object available to more than one region need to perform cross-site
replication. However, with a centralized swift proxy, the user may need
to benchmark the replication timing of the Object Storage back end.
Benchmarking allows the operational staff to provide users with an
understanding of the amount of time required for a stored or modified
object to become available to the entire environment.

Performance
~~~~~~~~~~~

Determining the performance of a multi-site installation involves
considerations that do not come into play in a single-site deployment.
Because a multi-site deployment is distributed by nature, its
performance may be affected in certain situations.

Since multi-site systems can be geographically separated, there may be
greater latency or jitter when communicating across regions. This can
especially impact systems like the OpenStack Identity service when
making authentication attempts from regions that do not contain the
centralized Identity implementation. It can also affect applications
which rely on Remote Procedure Call (RPC) for normal operation. An
example of this can be seen in high performance computing workloads.

Storage availability can also be impacted by the architecture of a
multi-site deployment. A centralized Object Storage service requires
more time for an object to be available to instances locally in regions
where the object was not created. Some applications may need to be tuned
to account for this effect. Block Storage does not currently have a
method for replicating data across multiple regions, so applications
that depend on available block storage need to manually cope with this
limitation by creating duplicate block storage entries in each region.

OpenStack components
~~~~~~~~~~~~~~~~~~~~

Most OpenStack installations require a bare minimum set of pieces to
function. These include the OpenStack Identity service (keystone) for
authentication, OpenStack Compute (nova) for compute, OpenStack Image
service (glance) for image storage, OpenStack Networking (neutron) for
networking, and potentially an object store in the form of OpenStack
Object Storage (swift). Deploying a multi-site installation also demands
extra components in order to coordinate between regions. A centralized
Identity service is necessary to provide the single authentication
point. A centralized dashboard is also recommended to provide a single
login point and a mapping to the API and CLI options available. A
centralized Object Storage service may also be used, but will require
the installation of the swift proxy service.

It may also be helpful to install a few extra options in order to
facilitate certain use cases. For example, installing Designate may
assist in automatically generating DNS domains for each region with an
automatically populated zone full of resource records for each instance.
This facilitates using DNS as a mechanism for determining which region
will be selected for certain applications.

Another useful tool for managing a multi-site installation is
Orchestration (heat). The Orchestration service allows the use of
templates to define a set of instances to be launched together or for
scaling existing sets. It can also be used to set up matching or
differentiated groupings based on regions. For instance, if an
application requires an equally balanced number of nodes across sites,
the same heat template can be used to cover each site with only small
alterations to the region name.
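As a closing sketch, the loop below launches the same stack in every
region using python-heatclient, varying only the region name. The
template file, stack names, and regions are hypothetical, and whether
``region_name`` is accepted directly by the client constructor may
depend on the client version in use:

.. code-block:: python

   from heatclient import client as heat_client

   # Hypothetical HOT template describing the application's instances.
   with open('app_stack.yaml') as f:
       template = f.read()

   # Launch one identically shaped stack per region, reusing the
   # ``sess`` Identity session from the earlier examples.
   for region in ('RegionOne', 'RegionTwo'):
       heat = heat_client.Client('1', session=sess, region_name=region)
       heat.stacks.create(stack_name='app-%s' % region.lower(),
                          template=template)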