Adding 'running slowly' troubleshooting section

Add Steve Deaton's content about troubleshooting a slow cloud, and fix a broken link.

Change-Id: Iadf7d2df62e9d4d77e0c36cb33467af3546bb2cb
Closes-Bug: #1251088
Co-Authored-By: Steven Deaton <sdeaton2@gmail.com>
Date: 2016-03-09 12:38:52 +10:00
commit a15d78f652
parent b08db66706
1 changed file with 122 additions and 1 deletion
--- a/doc/openstack-ops/ch_ops_maintenance.xml
+++ b/doc/openstack-ops/ch_ops_maintenance.xml
@@ -899,7 +899,7 @@ inner join nova.instances on cinder.volumes.instance_uuid=nova.instances.uuid
      xlink:href="https://github.com/opscode/openstack-chef-repo">OpenStack Chef recipes</link>.
      Other newer configuration tools include <link
      xlink:href="https://juju.ubuntu.com/">Juju</link>, <link
-      xlink:href="http://www.ansible.com/home">Ansible</link>, and <link
+      xlink:href="https://www.ansible.com/">Ansible</link>, and <link
      xlink:href="http://www.saltstack.com/">Salt</link>; and more mature
      configuration management tools include <link
      xlink:href="http://cfengine.com/">CFEngine</link> and <link
@@ -1330,6 +1330,127 @@ sql_connection = mysql+pymysql://cinder:password@cloud.example.com/cinder
  <?hard-pagebreak ?>
  <section xml:id="runningslow">
    <?dbhtml stop-chunking?>
    <title>What to do when things are running slowly</title>
    <para>
      When you are getting slow responses from various services, it can be
      hard to know where to start looking. The first thing to check is the
      extent of the slowness: is it specific to a single service, or does it
      affect several services? If your problem is isolated to a specific
      service, restarting that service can temporarily fix it, but that often
      treats only the symptom and not the underlying problem.
    </para>
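    <para>
      As a minimal sketch of that first check, assuming you have already
      timed a few API calls per service and recorded a normal baseline for
      each, the following Python compares the two. The service names, the
      timings, and the factor of two are hypothetical examples, not values
      taken from any OpenStack tool.
    </para>
```python
from typing import Dict, List, Tuple


def slow_services(latencies: Dict[str, float],
                  baselines: Dict[str, float],
                  factor: float = 2.0) -> List[str]:
    """Return the services whose latency exceeds factor * their baseline."""
    return sorted(
        name for name, latency in latencies.items()
        if name in baselines and latency > factor * baselines[name]
    )


def classify(latencies: Dict[str, float],
             baselines: Dict[str, float],
             factor: float = 2.0) -> Tuple[str, List[str]]:
    """Label the slowness as "none", "isolated", or "widespread"."""
    slow = slow_services(latencies, baselines, factor)
    if not slow:
        return "none", slow
    if len(slow) == 1:
        # Only one service is slow: start with that service itself.
        return "isolated", slow
    # Several services are slow: suspect something they share,
    # such as Identity, the AMQP broker, or the SQL back end.
    return "widespread", slow


if __name__ == "__main__":
    # Hypothetical timings in seconds.
    baseline = {"keystone": 0.05, "glance": 0.20, "nova": 0.30}
    measured = {"keystone": 0.04, "glance": 0.90, "nova": 0.35}
    print(classify(measured, baseline))  # ('isolated', ['glance'])
```
    <para>
      A "widespread" result is a hint to look at shared dependencies before
      restarting individual services.
    </para>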
    <para>
      The following sections collect ideas from experienced operators about
      common causes of slowness. They are not, however, intended to be an
      exhaustive list.
    </para>
    <section xml:id="runningslow_keystone">
      <?dbhtml stop-chunking?>
      <title>OpenStack Identity service</title>
      <para>
        If OpenStack Identity is responding slowly, the cause could be a token
        table that has grown large. Running the
        <command>keystone-manage token_flush</command> command removes
        expired tokens from the table and can fix this.
      </para>
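      <para>
        To judge whether flushing is likely to help, you can compare the
        number of expired tokens with the total. This Python sketch assumes
        you have exported the expiry timestamps of the rows in the token
        table (the input shown is hypothetical) and that all timestamps are
        in the same form, for example all timezone-aware UTC values.
        <command>keystone-manage token_flush</command> purges the tokens
        whose expiry time has passed, so a large expired share suggests the
        table will shrink noticeably.
      </para>
```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Tuple


def expired_token_stats(expiry_times: Iterable[datetime],
                        now: datetime) -> Tuple[int, int]:
    """Return (expired, total) for a collection of token expiry times.

    A token counts as expired once `now` is later than its expiry time.
    Mixing naive and timezone-aware datetimes raises TypeError, so keep
    the exported timestamps and `now` in the same form.
    """
    expired = total = 0
    for expires in expiry_times:
        total += 1
        if now > expires:
            expired += 1
    return expired, total


if __name__ == "__main__":
    # Hypothetical export: one token expired a day ago, one still valid.
    now = datetime.now(timezone.utc)
    sample = [now - timedelta(days=1), now + timedelta(hours=1)]
    expired, total = expired_token_stats(sample, now)
    print(f"{expired} of {total} tokens are expired")  # 1 of 2 tokens are expired
```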
      <para>
        Additionally, for Identity-related issues, try the tips in
        <xref linkend="runningslow_sql" />.
      </para>
    </section>
    <section xml:id="runningslow_glance">
      <?dbhtml stop-chunking?>
      <title>OpenStack Image service</title>
      <para>
        The OpenStack Image service can be slowed down by problems in the
        Identity service, but it can also slow down on its own if
        connectivity to its back-end storage is slow or otherwise
        problematic. For example, your back-end NFS server might have gone
        down.
      </para>
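      <para>
        One quick way to rule the back end in or out is to time a TCP
        connection to it. This is a minimal Python sketch that assumes the
        back end is reachable over TCP; a successful connect shows only that
        the port answers, not that the storage behind it is healthy.
      </para>
```python
import socket
import time
from typing import Optional


def probe(host: str, port: int, timeout: float = 3.0) -> Optional[float]:
    """Return the TCP connect time in seconds, or None if it fails.

    Name-resolution errors, refused connections, and timeouts all raise
    subclasses of OSError, so they are reported the same way.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        return None
    return time.monotonic() - start
```
      <para>
        For example, with a hypothetical NFS server, calling
        probe("nfs.example.com", 2049) (2049 is the standard NFS port) and
        getting None, or a connect time far above what you normally see,
        points at the storage network rather than at the Image service
        itself.
      </para>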
    </section>
    <section xml:id="runningslow_cinder">
      <?dbhtml stop-chunking?>
      <title>OpenStack Block Storage service</title>
      <para>
        The OpenStack Block Storage service is similar to the Image service,
        so start by checking Identity-related services and the back-end
        storage. Additionally, both the Block Storage and Image services rely
        on AMQP and SQL functionality, so consider these when debugging.
      </para>
    </section>
    <section xml:id="runningslow_nova">
      <?dbhtml stop-chunking?>
      <title>OpenStack Compute service</title>
      <para>
        Services related to OpenStack Compute are normally fairly fast and
        rely on a couple of back-end services: Identity for authentication
        and authorization, and AMQP for interoperability. Any slowness in the
        Compute services is normally related to one of these. Also, as with
        all other services, SQL is used extensively.
      </para>
    </section>
    <section xml:id="runningslow_neutron">
      <?dbhtml stop-chunking?>
      <title>OpenStack Networking service</title>
      <para>
        Slowness in the OpenStack Networking service can be caused by services
        that it relies upon, but it can also be related to either physical or
        virtual networking. For example: network namespaces that do not exist
        or are not tied to interfaces correctly; DHCP daemons that have hung
        or are not running; a cable being physically disconnected; a switch
        not being configured correctly. When debugging Networking service
        problems, begin by verifying all physical networking functionality
        (switch configuration, physical cabling, etc.). After verifying the
        physical networking, check that all of the Networking services are
        running (neutron-server, neutron-dhcp-agent, etc.), and then check
        the AMQP and SQL back ends.
      </para>
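      <para>
        The "are all services running" step can be sketched in Python as a
        comparison between an expected list and the process command lines
        on a node, for example as collected with <command>ps -eo args</command>.
        The service list is an example only; adjust it to the agents your
        deployment actually runs.
      </para>
```python
import os
from typing import Iterable, List, Set

# Example names only; list the services your Networking nodes actually run.
EXPECTED = ["neutron-server", "neutron-dhcp-agent",
            "neutron-l3-agent", "neutron-metadata-agent"]


def running_names(cmdlines: Iterable[str]) -> Set[str]:
    """Collect the base name of every argument of every command line.

    Naive matching: a "grep neutron-dhcp-agent" process would also count,
    so leave such helper processes out of the input.
    """
    names = set()
    for line in cmdlines:
        for arg in line.split():
            names.add(os.path.basename(arg))
    return names


def missing_services(cmdlines: Iterable[str],
                     expected: Iterable[str] = EXPECTED) -> List[str]:
    """Return the expected service names that no process is running."""
    running = running_names(cmdlines)
    return [name for name in expected if name not in running]


if __name__ == "__main__":
    # Hypothetical "ps -eo args" output from a network node.
    ps_output = [
        "/usr/bin/python2 /usr/bin/neutron-server --config-file "
        "/etc/neutron/neutron.conf",
        "/usr/bin/python2 /usr/bin/neutron-l3-agent --config-file "
        "/etc/neutron/l3_agent.ini",
    ]
    print(missing_services(ps_output))
    # ['neutron-dhcp-agent', 'neutron-metadata-agent']
```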
    </section>
    <section xml:id="runningslow_amqp">
      <?dbhtml stop-chunking?>
      <title>AMQP broker</title>
      <para>
        Regardless of which AMQP broker you use, such as RabbitMQ, there are
        common issues that not only slow down operations but can also cause
        real problems. Sometimes messages queued for services stay on the
        queues and are not consumed. This can be due to dead or stagnant
        services, and can commonly be cleared up by restarting either the
        AMQP-related services or the OpenStack service in question.
      </para>
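      <para>
        With RabbitMQ, one way to spot unconsumed messages is to list each
        queue's message and consumer counts with
        <command>rabbitmqctl list_queues name messages consumers</command>
        and look for queues that hold messages but have no consumers. This
        Python sketch parses that output under the assumption that rows are
        tab-separated; the sample queue names are illustrative only.
      </para>
```python
from typing import List, Tuple


def stuck_queues(output: str) -> List[Tuple[str, int]]:
    """Return (queue, messages) for queues with messages but no consumers.

    Rows are expected as tab-separated name, messages, consumers. Header
    and informational lines differ between RabbitMQ versions, so any line
    that does not parse that way is skipped.
    """
    stuck = []
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue
        name, messages, consumers = fields
        try:
            messages_n, consumers_n = int(messages), int(consumers)
        except ValueError:
            continue  # for example a "name messages consumers" header row
        if messages_n > 0 and consumers_n == 0:
            stuck.append((name, messages_n))
    # Largest backlog first.
    return sorted(stuck, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical output; queue names are illustrative only.
    sample = ("Listing queues ...\n"
              "conductor\t0\t3\n"
              "compute.node1\t42\t0\n")
    print(stuck_queues(sample))  # [('compute.node1', 42)]
```
      <para>
        A queue that keeps growing with zero consumers usually means the
        service that should consume it is dead or hung, which matches the
        restart advice above.
      </para>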
    </section>
    <section xml:id="runningslow_sql">
      <?dbhtml stop-chunking?>
      <title>SQL back end</title>
      <para>
        Whether you use SQLite or an RDBMS (such as MySQL), SQL
        interoperability is essential to a functioning OpenStack environment.
        A large or fragmented SQLite file can cause slowness when SQLite is
        used as the back end. A locked or long-running query can cause delays
        for most RDBMS services. In this case, do not kill the query
        immediately; instead, investigate whether it is hung or is simply
        taking a long time and needs to finish on its own. The administration
        of an RDBMS is outside the scope of this document, but note that a
        properly functioning RDBMS is essential to most OpenStack services.
      </para>
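      <para>
        Both checks can be sketched in Python. For SQLite, the share of free
        pages reported by the <literal>page_count</literal> and
        <literal>freelist_count</literal> pragmas gives a rough fragmentation
        estimate. For an RDBMS such as MySQL, the sketch assumes you have
        fetched rows in the column order of
        <literal>SHOW FULL PROCESSLIST</literal> (Id, User, Host, db,
        Command, Time, State, Info) and only reports long-running statements
        so you can investigate them; it never kills anything. The threshold
        of 60 seconds is an arbitrary example.
      </para>
```python
import sqlite3
from typing import Iterable, List, Sequence


def sqlite_free_page_ratio(conn: sqlite3.Connection) -> float:
    """Return free pages / total pages (0.0 for an empty database)."""
    pages = conn.execute("PRAGMA page_count").fetchone()[0]
    free = conn.execute("PRAGMA freelist_count").fetchone()[0]
    return free / pages if pages else 0.0


def long_running(rows: Iterable[Sequence],
                 min_seconds: int = 60) -> List[Sequence]:
    """Return non-idle processlist rows running for at least min_seconds.

    Column 4 is Command and column 5 is Time (seconds); idle connections
    have the Command "Sleep" and are ignored. Longest-running rows first.
    """
    busy = [row for row in rows
            if row[4] != "Sleep" and row[5] is not None
            and row[5] >= min_seconds]
    return sorted(busy, key=lambda row: row[5], reverse=True)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x TEXT)")
    print(sqlite_free_page_ratio(conn))  # 0.0 for a freshly created table
    # Hypothetical processlist row.
    rows = [(7, "nova", "ctl1", "nova", "Query", 420, "Locked", "UPDATE ...")]
    print([row[0] for row in long_running(rows)])  # [7]
```
      <para>
        A high free-page ratio suggests running <literal>VACUUM</literal> on
        the SQLite file during a maintenance window.
      </para>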
    </section>
  </section>
  <?hard-pagebreak ?>
  <section xml:id="uninstalling">
    <?dbhtml stop-chunking?>