operations-guide/doc/openstack-ops/ch_ops_network_troubleshooting.xml

<?xml version="1.0" encoding="UTF-8"?>
<chapter version="5.0" xml:id="network_troubleshooting"
         xmlns="http://docbook.org/ns/docbook"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         xmlns:xi="http://www.w3.org/2001/XInclude"
         xmlns:ns5="http://www.w3.org/2000/svg"
         xmlns:ns4="http://www.w3.org/1998/Math/MathML"
         xmlns:ns3="http://www.w3.org/1999/xhtml"
         xmlns:ns="http://docbook.org/ns/docbook">
  <?dbhtml stop-chunking?>

  <title>Network Troubleshooting</title>

  <para>Network troubleshooting can unfortunately be a very difficult and
  confusing procedure. A network issue can cause a problem at several points
  in the cloud. Using a logical troubleshooting procedure can help mitigate
  the confusion and more quickly isolate where exactly the network issue is.
  This chapter aims to give you the information you need to identify any
  issues for either <literal>nova-network</literal> or OpenStack Networking
  (neutron) with Linux Bridge or Open vSwitch.<indexterm class="singular">
      <primary>OpenStack Networking (neutron)</primary>

      <secondary>troubleshooting</secondary>
    </indexterm><indexterm class="singular">
      <primary>Linux Bridge</primary>

      <secondary>troubleshooting</secondary>
    </indexterm><indexterm class="singular">
      <primary>network troubleshooting</primary>

      <see>troubleshooting</see>
    </indexterm></para>

  <section xml:id="check_interface_states">
    <title>Using "ip a" to Check Interface States</title>

    <para>On compute nodes and nodes running <literal>nova-network</literal>,
    use the following command to see information about interfaces, including
    information about IPs, VLANs, and whether your interfaces are
    up:<indexterm class="singular">
        <primary>ip a command</primary>
      </indexterm><indexterm class="singular">
        <primary>interface states, checking</primary>
      </indexterm><indexterm class="singular">
        <primary>troubleshooting</primary>

        <secondary>checking interface states</secondary>
      </indexterm></para>

    <screen><prompt>#</prompt> <userinput>ip a</userinput></screen>

    <para>If you're encountering any sort of networking difficulty, one good
    initial sanity check is to make sure that your interfaces are up. For
    example:</para>

    <screen><prompt>$</prompt> <userinput>ip a | grep state</userinput>
<computeroutput>1: lo: &lt;LOOPBACK,UP,LOWER_UP&gt; mtu 16436 qdisc noqueue state UNKNOWN
2: eth0: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc pfifo_fast state UP
   qlen 1000
3: eth1: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc pfifo_fast
   master br100 state UP qlen 1000
4: virbr0: &lt;NO-CARRIER,BROADCAST,MULTICAST,UP&gt; mtu 1500 qdisc noqueue state DOWN
5: br100: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc noqueue state UP
</computeroutput></screen>

    <para>You can safely ignore the state of <literal>virbr0</literal>, which
    is a default bridge created by libvirt and not used by OpenStack.</para>
  </section>

  <section xml:id="nova_network_traffic_in_cloud">
    <title>Visualizing nova-network Traffic in the Cloud</title>

    <para>If you are logged in to an instance and ping an external host—for
    example, Google—the ping packet takes the route shown in <xref
    linkend="traffic-12-1" />.<indexterm class="singular">
        <primary>ping packets</primary>
      </indexterm><indexterm class="singular">
        <primary>troubleshooting</primary>

        <secondary>nova-network traffic</secondary>
      </indexterm></para>

    <figure xml:id="traffic-12-1">
      <title>Traffic route for ping packet</title>

      <mediaobject>
        <imageobject>
          <imagedata fileref="http://git.openstack.org/cgit/openstack/operations-guide/plain/doc/openstack-ops/figures/osog_1201.png"></imagedata>
        </imageobject>
      </mediaobject>
    </figure>

    <orderedlist>
      <listitem>
        <para>The instance generates a packet and places it on the virtual
        Network Interface Card (NIC) inside the instance, such as
        <literal>eth0</literal>.</para>
      </listitem>

      <listitem>
        <para>The packet transfers to the virtual NIC of the compute host,
        such as, <literal>vnet1</literal>. You can find out what vnet NIC is
        being used by looking at the
        <filename>/etc/libvirt/qemu/instance-xxxxxxxx.xml</filename>
        file.</para>
      </listitem>

      <listitem>
        <para>From the vnet NIC, the packet transfers to a bridge on the
        compute node, such as <code>br100</code>.</para>

        <para>If you run FlatDHCPManager, one bridge is on the compute node.
        If you run VlanManager, one bridge exists for each VLAN.</para>

        <para>To see which bridge the packet will use, run the command:
        <screen><prompt>$</prompt> <userinput>brctl show</userinput></screen></para>

        <para>Look for the vnet NIC. You can also reference
        <filename>nova.conf</filename> and look for the
        <code>flat_interface_bridge</code> option.</para>
      </listitem>

      <listitem>
        <para>The packet transfers to the main NIC of the compute node. You
        can also see this NIC in the <literal>brctl</literal> output, or you
        can find it by referencing the <literal>flat_interface</literal>
        option in <filename>nova.conf</filename>.</para>
      </listitem>

      <listitem>
        <para>After the packet is on this NIC, it transfers to the compute
        node's default gateway. The packet is now most likely out of your
        control at this point. The diagram depicts an external gateway.
        However, in the default configuration with multi-host, the compute
        host is the gateway.</para>
      </listitem>
    </orderedlist>

    <para>Reverse the direction to see the path of a ping reply. From this
    path, you can see that a single packet travels across four different NICs.
    If a problem occurs with any of these NICs, a network issue occurs.</para>
  </section>

  <section xml:id="neutron_network_traffic_in_cloud">
    <title>Visualizing OpenStack Networking Service Traffic in the
    Cloud</title>

    <para>The OpenStack Networking Service, neutron, has many more degrees of
    freedom than <literal>nova-network</literal> does because of its pluggable
    backend. It can be configured with open source or vendor proprietary
    plug-ins that control software defined networking (SDN) hardware or
    plug-ins that use Linux native facilities on your hosts, such as Open
    vSwitch or Linux Bridge.<indexterm class="startofrange" xml:id="Topen">
        <primary>troubleshooting</primary>

        <secondary>OpenStack traffic</secondary>
      </indexterm></para>

    <para>The networking chapter of the OpenStack <link
    xlink:href="http://docs.openstack.org/admin-guide-cloud/content/ch_networking.html"
    xlink:title="Cloud Administrator Guide">Cloud Administrator Guide</link>
    shows a variety of networking scenarios and their connection paths. The
    purpose of this section is to give you the tools to troubleshoot the
    various components involved however they are plumbed together in your
    environment.</para>

    <para>For this example, we will use the Open vSwitch (OVS) backend. Other
    backend plug-ins will have very different flow paths. OVS is the most
    popularly deployed network driver, according to the October 2013 OpenStack
    User Survey, with 50 percent more sites using it than the second place
    Linux Bridge driver. We'll describe each step in turn, with <xref
    linkend="neutron-packet-ping" /> for reference.</para>

    <orderedlist>
      <listitem>
        <para>The instance generates a packet and places it on the virtual NIC
        inside the instance, such as eth0.</para>
      </listitem>

      <listitem>
        <para>The packet transfers to a Test Access Point (TAP) device on the
        compute host, such as tap690466bc-92. You can find out what TAP is
        being used by looking at the
        <filename>/etc/libvirt/qemu/instance-xxxxxxxx.xml</filename>
        file.</para>

        <para>The TAP device name is constructed using the first 11 characters
        of the port ID (10 hex digits plus an included '-'), so another means
        of finding the device name is to use the <literal>neutron</literal>
        command. This returns a pipe-delimited list, the first item of which
        is the port ID. For example, to get the port ID associated with IP
        address 10.0.0.10, do this:</para>

        <screen><prompt>#</prompt> <userinput>neutron port-list | grep 10.0.0.10 | cut -d \| -f 2</userinput>
<computeroutput> ff387e54-9e54-442b-94a3-aa4481764f1d</computeroutput></screen>

        <para>Taking the first 11 characters, we can construct a device name
        of tapff387e54-9e from this output.</para>
      </listitem>
    </orderedlist>

    <figure xml:id="neutron-packet-ping">
      <title>Neutron network paths</title>

      <mediaobject>
        <imageobject>
          <imagedata fileref="http://git.openstack.org/cgit/openstack/operations-guide/plain/doc/openstack-ops/figures/osog_1202.png"></imagedata>
        </imageobject>
      </mediaobject>
    </figure>

    <orderedlist continuation="continues" inheritnum="inherit">
      <listitem>
        <para>The TAP device is connected to the integration bridge,
        <code>br-int</code>. This bridge connects all the instance TAP devices
        and any other bridges on the system. In this example, we have
        <code>int-br-eth1</code> and <code>patch-tun</code>.
        <code>int-br-eth1</code> is one half of a veth pair connecting to the
        bridge <code>br-eth1</code>, which handles VLAN networks trunked over
        the physical Ethernet device <code>eth1</code>. <code>patch-tun</code>
        is an Open vSwitch internal port that connects to the
        <code>br-tun</code> bridge for GRE networks.</para>

        <para>The TAP devices and veth devices are normal Linux network
        devices and may be inspected with the usual tools, such as
        <literal>ip</literal> and <literal>tcpdump</literal>. Open vSwitch
        internal devices, such as <code>patch-tun</code>, are only visible
        within the Open vSwitch environment. If you try to run
        <literal>tcpdump -i patch-tun</literal>, it will raise an error,
        saying that the device does not exist.</para>

        <para>It is possible to watch packets on internal interfaces, but it
        does take a little bit of networking gymnastics. First you need to
        create a dummy network device that normal Linux tools can see. Then
        you need to add it to the bridge containing the internal interface you
        want to snoop on. Finally, you need to tell Open vSwitch to mirror all
        traffic to or from the internal port onto this dummy port. After all
        this, you can then run <literal>tcpdump</literal> on the dummy
        interface and see the traffic on the internal port.</para>

        <procedure>
          <title>To capture packets from the <code>patch-tun</code> internal
          interface on integration bridge, <code>br-int</code>:</title>

          <step>
            <para>Create and bring up a dummy interface,
            <code>snooper0</code>:</para>

            <screen><prompt>#</prompt> <userinput>ip link add name snooper0 type dummy</userinput></screen>

            <screen><prompt>#</prompt> <userinput>ip link set dev snooper0 up</userinput>
</screen>
          </step>

          <step>
            <para>Add device <code>snooper0</code> to bridge
            <code>br-int</code>:</para>

            <screen><prompt>#</prompt> <userinput>ovs-vsctl add-port br-int snooper0</userinput>
</screen>
          </step>

          <step>
            <para>Create mirror of <code>patch-tun</code> to
            <code>snooper0</code> (returns UUID of mirror port):</para>

            <screen><prompt>#</prompt> <userinput>ovs-vsctl -- set Bridge br-int mirrors=@m  -- --id=@snooper0 \
get Port snooper0  -- --id=@patch-tun get Port patch-tun  \
-- --id=@m create Mirror name=mymirror select-dst-port=@patch-tun \
select-src-port=@patch-tun output-port=@snooper0</userinput></screen>
          </step>

          <step>
            <para>Profit. You can now see traffic on <code>patch-tun</code> by
            running <literal>tcpdump -i snooper0</literal>.</para>
          </step>

          <step>
            <para>Clean up by clearing all mirrors on <code>br-int</code> and
            deleting the dummy interface:</para>

            <screen><prompt>#</prompt> <userinput>ovs-vsctl clear Bridge br-int mirrors</userinput>
</screen>
<screen><prompt>#</prompt> <userinput>ovs-vsctl del-port br-int snooper0</userinput>
</screen>
            <screen><prompt>#</prompt> <userinput>ip link delete dev snooper0</userinput>
</screen>
          </step>
        </procedure>

        <para>On the integration bridge, networks are distinguished using
        internal VLANs regardless of how the networking service defines them.
        This allows instances on the same host to communicate directly without
        transiting the rest of the virtual, or physical, network. These
        internal VLAN IDs are based on the order they are created on the node
        and may vary between nodes. These IDs are in no way related to the
        segmentation IDs used in the network definition and on the physical
        wire.</para>

        <para>VLAN tags are translated between the external tag defined in the
        network settings, and internal tags in several places. On the
        <code>br-int</code>, incoming packets from the
        <code>int-br-eth1</code> are translated from external tags to internal
        tags. Other translations also happen on the other bridges and will be
        discussed in those sections.</para>

        <?hard-pagebreak ?>

        <procedure>
          <title>To discover which internal VLAN tag is in use for a given
          external VLAN by using the <literal>ovs-ofctl</literal>
          command:</title>

          <step>
            <para>Find the external VLAN tag of the network you're interested
            in. This is the <code>provider:segmentation_id</code> as returned
            by the networking service:</para>

            <screen><prompt>#</prompt> <userinput>neutron net-show --fields provider:segmentation_id &lt;network name&gt;</userinput>
<computeroutput>+---------------------------+--------------------------------------+
| Field                     | Value                                |
+---------------------------+--------------------------------------+
| provider:network_type     | vlan                                 |
| provider:segmentation_id  | 2113                                 |
+---------------------------+--------------------------------------+
</computeroutput></screen>
          </step>

          <step>
            <para>Grep for the <code>provider:segmentation_id</code>, 2113 in
            this case, in the output of <literal>ovs-ofctl dump-flows
            br-int</literal>:</para>

            <screen><prompt>#</prompt> <userinput>ovs-ofctl dump-flows br-int|grep vlan=2113</userinput>
<computeroutput>cookie=0x0, duration=173615.481s, table=0, n_packets=7676140, \
n_bytes=444818637, idle_age=0, hard_age=65534, priority=3, \
in_port=1,dl_vlan=2113 actions=mod_vlan_vid:7,NORMAL
</computeroutput></screen>

            <para>Here you can see packets received on port ID 1 with the VLAN
            tag 2113 are modified to have the internal VLAN tag 7. Digging a
            little deeper, you can confirm that port 1 is in fact
            <code>int-br-eth1</code>:</para>

            <screen><prompt>#</prompt> <userinput>ovs-ofctl show br-int</userinput>
<computeroutput>OFPT_FEATURES_REPLY (xid=0x2): dpid:000022bc45e1914b
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS \
ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC \
SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC \
SET_TP_DST ENQUEUE
 1(int-br-eth1): addr:c2:72:74:7f:86:08
     config:     0
     state:      0
     current:    10GB-FD COPPER
     speed: 10000 Mbps now, 0 Mbps max
 2(patch-tun): addr:fa:24:73:75:ad:cd
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 3(tap9be586e6-79): addr:fe:16:3e:e6:98:56
     config:     0
     state:      0
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 LOCAL(br-int): addr:22:bc:45:e1:91:4b
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
</computeroutput></screen>
          </step>
        </procedure>
      </listitem>

      <listitem>
        <para>The next step depends on whether the virtual network is
        configured to use 802.1q VLAN tags or GRE:</para>

        <orderedlist>
          <listitem>
            <para>VLAN-based networks exit the integration bridge via veth
            interface <code>int-br-eth1</code> and arrive on the bridge
            <code>br-eth1</code> on the other member of the veth pair
            <code>phy-br-eth1</code>. Packets on this interface arrive with
            internal VLAN tags and are translated to external tags in the
            reverse of the process described above:</para>

            <screen><prompt>#</prompt> <userinput>ovs-ofctl dump-flows br-eth1|grep 2113</userinput>
<computeroutput>cookie=0x0, duration=184168.225s, table=0, n_packets=0, n_bytes=0, \
idle_age=65534, hard_age=65534, priority=4,in_port=1,dl_vlan=7 \
actions=mod_vlan_vid:2113,NORMAL</computeroutput></screen>

            <para>Packets, now tagged with the external VLAN tag, then exit
            onto the physical network via <code>eth1</code>. The Layer2 switch
            this interface is connected to must be configured to accept
            traffic with the VLAN ID used. The next hop for this packet must
            also be on the same layer-2 network.</para>
          </listitem>

          <listitem>
            <para>GRE-based networks are passed with <code>patch-tun</code> to
            the tunnel bridge <code>br-tun</code> on interface
            <code>patch-int</code>. This bridge also contains one port for
            each GRE tunnel peer, so one for each compute node and network
            node in your network. The ports are named sequentially from
            <code>gre-1</code> onward.</para>

            <para>Matching <code>gre-&lt;n&gt;</code> interfaces to tunnel
            endpoints is possible by looking at the Open vSwitch state:</para>

            <screen><prompt>#</prompt> <userinput>ovs-vsctl show |grep -A 3 -e Port\ \"gre-</userinput>
<computeroutput>        Port "gre-1"
            Interface "gre-1"
                type: gre
                options: {in_key=flow, local_ip="10.10.128.21", \
                out_key=flow, remote_ip="10.10.128.16"}
</computeroutput></screen>

            <para>In this case, <code>gre-1</code> is a tunnel from IP
            10.10.128.21, which should match a local interface on this node,
            to IP 10.10.128.16 on the remote side.</para>

            <para>These tunnels use the regular routing tables on the host to
            route the resulting GRE packet, so there is no requirement that
            GRE endpoints are all on the same layer-2 network, unlike VLAN
            encapsulation.</para>

            <para>All interfaces on the <code>br-tun</code> are internal to
            Open vSwitch. To monitor traffic on them, you need to set up a
            mirror port as described above for <code>patch-tun</code> in the
            <code>br-int</code> bridge.</para>

            <para>All translation of GRE tunnels to and from internal VLANs
            happens on this bridge.</para>
          </listitem>
        </orderedlist>

        <procedure>
          <title>To discover which internal VLAN tag is in use for a GRE
          tunnel by using the <literal>ovs-ofctl</literal> command:</title>

          <step>
            <para>Find the <code>provider:segmentation_id</code> of the
            network you're interested in. This is the same field used for the
            VLAN ID in VLAN-based networks:</para>

            <screen><prompt>#</prompt> <userinput>neutron net-show --fields provider:segmentation_id &lt;network name&gt;</userinput>
<computeroutput>+--------------------------+-------+
| Field                    | Value |
+--------------------------+-------+
| provider:network_type    | gre   |
| provider:segmentation_id | 3     |
+--------------------------+-------+
</computeroutput></screen>
          </step>

          <step>
            <para>Grep for 0x&lt;<code>provider:segmentation_id</code>&gt;,
            0x3 in this case, in the output of <literal>ovs-ofctl dump-flows
            br-tun</literal>:</para>

            <screen><prompt>#</prompt> <userinput>ovs-ofctl dump-flows br-tun|grep 0x3</userinput>
<computeroutput>cookie=0x0, duration=380575.724s, table=2, n_packets=1800, \
n_bytes=286104, priority=1,tun_id=0x3 \
actions=mod_vlan_vid:1,resubmit(,10)
 cookie=0x0, duration=715.529s, table=20, n_packets=5, \
n_bytes=830, hard_timeout=300,priority=1, \
vlan_tci=0x0001/0x0fff,dl_dst=fa:16:3e:a6:48:24 \
actions=load:0-&gt;NXM_OF_VLAN_TCI[], \
load:0x3-&gt;NXM_NX_TUN_ID[],output:53
 cookie=0x0, duration=193729.242s, table=21, n_packets=58761, \
n_bytes=2618498, dl_vlan=1 actions=strip_vlan,set_tunnel:0x3, \
output:4,output:58,output:56,output:11,output:12,output:47, \
output:13,output:48,output:49,output:44,output:43,output:45, \
output:46,output:30,output:31,output:29,output:28,output:26, \
output:27,output:24,output:25,output:32,output:19,output:21, \
output:59,output:60,output:57,output:6,output:5,output:20, \
output:18,output:17,output:16,output:15,output:14,output:7, \
output:9,output:8,output:53,output:10,output:3,output:2, \
output:38,output:37,output:39,output:40,output:34,output:23, \
output:36,output:35,output:22,output:42,output:41,output:54, \
output:52,output:51,output:50,output:55,output:33
</computeroutput></screen>

            <para>Here, you see three flows related to this GRE tunnel. The
            first is the translation from inbound packets with this tunnel ID
            to internal VLAN ID 1. The second shows a unicast flow to output
            port 53 for packets destined for MAC address fa:16:3e:a6:48:24.
            The third shows the translation from the internal VLAN
            representation to the GRE tunnel ID flooded to all output ports.
            For further details of the flow descriptions, see the man page for
            <literal>ovs-ofctl</literal>. As in the previous VLAN example,
            numeric port IDs can be matched with their named representations
            by examining the output of <literal>ovs-ofctl show
            br-tun</literal>.</para>
          </step>
        </procedure>
      </listitem>

      <listitem>
        <para>The packet is then received on the network node. Note that any
          traffic to the l3-agent or dhcp-agent will be visible only within
          their network namespace. Watching any interfaces outside those
          namespaces, even those that carry the network traffic, will only show
          broadcast packets like Address Resolution Protocols (ARPs), but
          unicast traffic to the router or DHCP address will not be seen. See
            <link
            xlink:href="http://docs.openstack.org/openstack-ops/content/network_troubleshooting.html#dealing_with_netns"
            >Dealing with Network Namespaces</link> for detail on how to run
          commands within these namespaces.</para>

        <para>Alternatively, it is possible to configure VLAN-based networks to
          use external routers rather than the l3-agent shown here, so long as
          the external router is on the same VLAN:</para>

        <orderedlist>
          <listitem>
            <para>VLAN-based networks are received as tagged packets on a
              physical network interface, <code>eth1</code> in this example.
              Just as on the compute node, this interface is a member of the
                <code>br-eth1</code> bridge.</para>
          </listitem>

          <listitem>
            <para>GRE-based networks will be passed to the tunnel bridge
                <code>br-tun</code>, which behaves just like the GRE interfaces
              on the compute node.</para>
          </listitem>
        </orderedlist>
      </listitem>

      <listitem>
        <para>Next, the packets from either input go through the integration
        bridge, again just as on the compute node.</para>
      </listitem>

      <listitem>
        <para>The packet then makes it to the l3-agent. This is actually
        another TAP device within the router's network namespace. Router
        namespaces are named in the form
        <code>qrouter-&lt;router-uuid&gt;</code>. Running <literal>ip
        a</literal> within the namespace will show the TAP device name,
        qr-e6256f7d-31 in this example:</para>

        <screen><prompt>#</prompt> <userinput>ip netns exec qrouter-e521f9d0-a1bd-4ff4-bc81-78a60dd88fe5 ip a|grep state</userinput>
<computeroutput>10: qr-e6256f7d-31: &lt;BROADCAST,UP,LOWER_UP&gt; mtu 1500 qdisc noqueue \
    state UNKNOWN
11: qg-35916e1f-36: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 \
    qdisc pfifo_fast state UNKNOWN qlen 500
28: lo: &lt;LOOPBACK,UP,LOWER_UP&gt; mtu 16436 qdisc noqueue state UNKNOWN
</computeroutput></screen>
      </listitem>

      <listitem>
        <para>The <code>qg-&lt;n&gt;</code> interface in the l3-agent router
        namespace sends the packet on to its next hop through device
        <code>eth2</code> on the external bridge <code>br-ex</code>. This
        bridge is constructed similarly to <code>br-eth1</code> and may be
        inspected in the same way.</para>
      </listitem>

      <listitem>
        <para>This external bridge also includes a physical network interface,
        <code>eth2</code> in this example, which finally lands the packet on
        the external network destined for an external router or
        destination.</para>
      </listitem>

      <listitem>
        <para>DHCP agents running on OpenStack networks run in namespaces
        similar to the l3-agents. DHCP namespaces are named
        <code>qdhcp-&lt;uuid&gt;</code> and have a TAP device on the
        integration bridge. Debugging of DHCP issues usually involves working
        inside this network namespace.<indexterm class="endofrange"
        startref="Topen" /></para>
      </listitem>
    </orderedlist>
  </section>

  <section xml:id="failure_in_path">
    <title>Finding a Failure in the Path</title>

    <para>Use ping to quickly find where a failure exists in the network path.
    In an instance, first see whether you can ping an external host, such as
    google.com. If you can, then there shouldn't be a network problem at
    all.</para>

    <para>If you can't, try pinging the IP address of the compute node where
    the instance is hosted. If you can ping this IP, then the problem is
    somewhere between the compute node and that compute node's gateway.</para>

    <para>If you can't ping the IP address of the compute node, the problem is
    between the instance and the compute node. This includes the bridge
    connecting the compute node's main NIC with the vnet NIC of the
    instance.</para>

    <para>One last test is to launch a second instance and see whether the two
    instances can ping each other. If they can, the issue might be related to
    the firewall on the compute node.<indexterm class="singular">
        <primary>path failures</primary>
      </indexterm><indexterm class="singular">
        <primary>troubleshooting</primary>

        <secondary>detecting path failures</secondary>
      </indexterm></para>
  </section>

  <section xml:id="tcpdump">
    <title>tcpdump</title>

    <para>One great, although very in-depth, way of troubleshooting network
    issues is to use <literal>tcpdump</literal>. We recommended using
    <literal>tcpdump</literal> at several points along the network path to
    correlate where a problem might be. If you prefer working with a GUI,
    either live or by using a <literal>tcpdump</literal> capture, do also
    check out <link xlink:href="http://www.wireshark.org/"
    xlink:title="Wireshark">Wireshark</link>.<indexterm class="singular">
        <primary>tcpdump</primary>
      </indexterm></para>

    <para>For example, run the following command:</para>

    <screen>tcpdump -i any -n -v \ 'icmp[icmptype] = icmp-echoreply or icmp[icmptype] =
icmp-echo'</screen>

    <para>Run this on the command line of the following areas:</para>

    <orderedlist>
      <listitem>
        <para>An external server outside of the cloud</para>
      </listitem>

      <listitem>
        <para>A compute node</para>
      </listitem>

      <listitem>
        <para>An instance running on that compute node</para>
      </listitem>
    </orderedlist>

    <para>In this example, these locations have the following IP
    addresses:</para>

    <screen><computeroutput>Instance
                          10.0.2.24
                          203.0.113.30
                          Compute Node
                          10.0.0.42
                          203.0.113.34
                          External Server
                          1.2.3.4</computeroutput></screen>

    <para>Next, open a new shell to the instance and then ping the external
    host where <literal>tcpdump</literal> is running. If the network path to
    the external server and back is fully functional, you see something like
    the following:</para>

    <para>On the external server:</para>

    <screen><computeroutput>12:51:42.020227 IP (tos 0x0, ttl 61, id 0, offset 0, flags [DF], \
proto ICMP (1), length 84)
    203.0.113.30 &gt; 1.2.3.4: ICMP echo request, id 24895, seq 1, length 64
12:51:42.020255 IP (tos 0x0, ttl 64, id 8137, offset 0, flags [none], \
proto ICMP (1), length 84)
    1.2.3.4 &gt; 203.0.113.30: ICMP echo reply, id 24895, seq 1, \
    length 64</computeroutput></screen>

    <para>On the compute node:</para>

    <screen><computeroutput>12:51:42.019519 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], \
proto ICMP (1), length 84)
    10.0.2.24 &gt; 1.2.3.4: ICMP echo request, id 24895, seq 1, length 64
12:51:42.019519 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], \
proto ICMP (1), length 84)
    10.0.2.24 &gt; 1.2.3.4: ICMP echo request, id 24895, seq 1, length 64
12:51:42.019545 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], \
proto ICMP (1), length 84)
    203.0.113.30 &gt; 1.2.3.4: ICMP echo request, id 24895, seq 1, length 64
12:51:42.019780 IP (tos 0x0, ttl 62, id 8137, offset 0, flags [none], \
proto ICMP (1), length 84)
    1.2.3.4 &gt; 203.0.113.30: ICMP echo reply, id 24895, seq 1, length 64
12:51:42.019801 IP (tos 0x0, ttl 61, id 8137, offset 0, flags [none], \
proto ICMP (1), length 84)
    1.2.3.4 &gt; 10.0.2.24: ICMP echo reply, id 24895, seq 1, length 64
12:51:42.019807 IP (tos 0x0, ttl 61, id 8137, offset 0, flags [none], \
proto ICMP (1), length 84)
    1.2.3.4 &gt; 10.0.2.24: ICMP echo reply, id 24895, seq 1, length 64</computeroutput></screen>

    <para>On the instance:</para>

    <screen><computeroutput>12:51:42.020974 IP (tos 0x0, ttl 61, id 8137, offset 0, flags [none], \
proto ICMP (1), length 84)
 1.2.3.4 &gt; 10.0.2.24: ICMP echo reply, id 24895, seq 1, length 64</computeroutput></screen>

    <para>Here, the external server received the ping request and sent a ping
    reply. On the compute node, you can see that both the ping and ping reply
    successfully passed through. You might also see duplicate packets on the
    compute node, as seen above, because <literal>tcpdump</literal> captured
    the packet on both the bridge and outgoing interface.</para>
  </section>

  <section xml:id="iptables">
    <title>iptables</title>

    <para>Through <literal>nova-network</literal>, OpenStack Compute
    automatically manages iptables, including forwarding packets to and from
    instances on a compute node, forwarding floating IP traffic, and managing
    security group rules.<indexterm class="singular">
        <primary>iptables</primary>
      </indexterm><indexterm class="singular">
        <primary>troubleshooting</primary>

        <secondary>iptables</secondary>
      </indexterm></para>

    <para>Run the following command to view the current iptables
    configuration:</para>

    <screen><prompt>#</prompt> <userinput>iptables-save</userinput></screen>

    <note>
      <para>If you modify the configuration, it reverts the next time you
      restart <literal>nova-network</literal>. You must use OpenStack to
      manage iptables.</para>
    </note>
  </section>

  <section xml:id="network_config_database">
    <title>Network Configuration in the Database for nova-network</title>

    <para>With <literal>nova-network</literal>, the nova database table
    contains a few tables with networking information:<indexterm
        class="singular">
        <primary>databases</primary>

        <secondary>nova-network troubleshooting</secondary>
      </indexterm><indexterm class="singular">
        <primary>troubleshooting</primary>

        <secondary>nova-network database</secondary>
      </indexterm></para>

    <variablelist>
      <varlistentry>
        <term><literal>fixed_ips</literal></term>

        <listitem>
          <para>Contains each possible IP address for the subnet(s) added to
          Compute. This table is related to the <literal>instances</literal>
          table by way of the <literal>fixed_ips.instance_uuid</literal>
          column.</para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><literal>floating_ips</literal></term>

        <listitem>
          <para>Contains each floating IP address that was added to Compute.
          This table is related to the <literal>fixed_ips</literal> table by
          way of the <literal>floating_ips.fixed_ip_id</literal>
          column.</para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><literal>instances</literal></term>

        <listitem>
          <para>Not entirely network specific, but it contains information
          about the instance that is utilizing the <literal>fixed_ip</literal>
          and optional <literal>floating_ip</literal>.</para>
        </listitem>
      </varlistentry>
    </variablelist>

    <para>From these tables, you can see that a floating IP is technically
    never directly related to an instance; it must always go through a fixed
    IP.</para>

    <section xml:id="dissasociate_floating_ip">
      <title>Manually Disassociating a Floating IP</title>

      <para>Sometimes an instance is terminated but the floating IP was not
      correctly de-associated from that instance. Because the database is in
      an inconsistent state, the usual tools to dissaociate the IP no longer
      work. To fix this, you must manually update the database.<indexterm
          class="singular">
          <primary>IP addresses</primary>

          <secondary>floating</secondary>
        </indexterm><indexterm class="singular">
          <primary>floating IP address</primary>
        </indexterm></para>

      <para>First, find the UUID of the instance in question:</para>

      <screen><prompt>mysql&gt;</prompt> <userinput>select uuid from instances where hostname = 'hostname';</userinput></screen>

      <para>Next, find the fixed IP entry for that UUID:</para>

      <screen><prompt>mysql&gt;</prompt> <userinput>select * from fixed_ips where instance_uuid = '&lt;uuid&gt;';</userinput></screen>

      <para>You can now get the related floating IP entry:</para>

      <screen><prompt>mysql&gt;</prompt> <userinput>select * from floating_ips where fixed_ip_id = '&lt;fixed_ip_id&gt;';</userinput></screen>

      <para>And finally, you can disassociate the floating IP:</para>

      <screen><prompt>mysql&gt;</prompt> <userinput>update floating_ips set fixed_ip_id = NULL, host = NULL where
       fixed_ip_id = '&lt;fixed_ip_id&gt;';</userinput></screen>

      <para>You can optionally also deallocate the IP from the user's
      pool:</para>

      <screen><prompt>mysql&gt;</prompt> <userinput>update floating_ips set project_id = NULL where
       fixed_ip_id = '&lt;fixed_ip_id&gt;';</userinput></screen>
    </section>
  </section>

  <section xml:id="debug_dhcp_issues">
    <title>Debugging DHCP Issues with nova-network</title>

    <para>One common networking problem is that an instance boots successfully
    but is not reachable because it failed to obtain an IP address from
    dnsmasq, which is the DHCP server that is launched by the
    <literal>nova-network</literal> service.<indexterm class="singular">
        <primary>DHCP (Dynamic Host Configuration Protocol)</primary>

        <secondary>debugging</secondary>
      </indexterm><indexterm class="singular">
        <primary>troubleshooting</primary>

        <secondary>nova-network DHCP</secondary>
      </indexterm></para>

    <para>The simplest way to identify that this is the problem with your
    instance is to look at the console output of your instance. If DHCP
    failed, you can retrieve the console log by doing:</para>

    <screen><prompt>$</prompt> <userinput>nova console-log &lt;instance name or uuid&gt;</userinput></screen>

    <para>If your instance failed to obtain an IP through DHCP, some messages
    should appear in the console. For example, for the Cirros image, you see
    output that looks like the following:</para>

    <screen><computeroutput>udhcpc (v1.17.2) started
Sending discover...
Sending discover...
Sending discover...
No lease, forking to background
starting DHCP forEthernet interface eth0 [ [1;32mOK[0;39m ]
cloud-setup: checking http://169.254.169.254/2009-04-04/meta-data/instance-id
wget: can't connect to remote host (169.254.169.254): Network is
unreachable</computeroutput></screen>

    <para>After you establish that the instance booted properly, the task is
    to figure out where the failure is.</para>

    <para>A DHCP problem might be caused by a misbehaving dnsmasq process.
    First, debug by checking logs and then restart the dnsmasq processes only
    for that project (tenant). In VLAN mode, there is a dnsmasq process for
    each tenant. Once you have restarted targeted dnsmasq processes, the
    simplest way to rule out dnsmasq causes is to kill all of the dnsmasq
    processes on the machine and restart <literal>nova-network</literal>. As a
    last resort, do this as root:</para>

    <screen><prompt>#</prompt> <userinput>killall dnsmasq
# restart nova-network</userinput></screen>

    <note>
      <para>Use <literal>openstack-nova-network</literal> on
      RHEL/CentOS/Fedora but <literal>nova-network</literal> on
      Ubuntu/Debian.</para>
    </note>

    <para>Several minutes after <literal>nova-network</literal> is restarted,
    you should see new dnsmasq processes running:</para>

    <?hard-pagebreak ?>

    <screen><prompt>#</prompt> <userinput>ps aux | grep dnsmasq</userinput></screen>

    <screen><computeroutput>nobody 3735 0.0 0.0 27540 1044 ? S 15:40 0:00 /usr/sbin/dnsmasq --strict-order \
    --bind-interfaces --conf-file= \
    --domain=novalocal --pid-file=/var/lib/nova/networks/nova-br100.pid \
    --listen-address=192.168.100.1 --except-interface=lo \
    --dhcp-range=set:'novanetwork',192.168.100.2,static,120s \
    --dhcp-lease-max=256 \
    --dhcp-hostsfile=/var/lib/nova/networks/nova-br100.conf \
    --dhcp-script=/usr/bin/nova-dhcpbridge --leasefile-ro
root 3736 0.0 0.0 27512 444 ? S 15:40 0:00 /usr/sbin/dnsmasq --strict-order \
     --bind-interfaces --conf-file= \
     --domain=novalocal --pid-file=/var/lib/nova/networks/nova-br100.pid \
     --listen-address=192.168.100.1 --except-interface=lo \
     --dhcp-range=set:'novanetwork',192.168.100.2,static,120s \
     --dhcp-lease-max=256
     --dhcp-hostsfile=/var/lib/nova/networks/nova-br100.conf
     --dhcp-script=/usr/bin/nova-dhcpbridge --leasefile-ro</computeroutput></screen>

    <para>If your instances are still not able to obtain IP addresses, the
    next thing to check is whether dnsmasq is seeing the DHCP requests from
    the instance. On the machine that is running the dnsmasq process, which is
    the compute host if running in multi-host mode, look at
    <literal>/var/log/syslog</literal> to see the dnsmasq output. If dnsmasq
    is seeing the request properly and handing out an IP, the output looks
    like this:</para>

    <screen><computeroutput>Feb 27 22:01:36 mynode dnsmasq-dhcp[2438]: DHCPDISCOVER(br100) fa:16:3e:56:0b:6f
Feb 27 22:01:36 mynode dnsmasq-dhcp[2438]: DHCPOFFER(br100) 192.168.100.3
                                           fa:16:3e:56:0b:6f
Feb 27 22:01:36 mynode dnsmasq-dhcp[2438]: DHCPREQUEST(br100) 192.168.100.3
                                           fa:16:3e:56:0b:6f
Feb 27 22:01:36 mynode dnsmasq-dhcp[2438]: DHCPACK(br100) 192.168.100.3
fa:16:3e:56:0b:6f test</computeroutput></screen>

    <para>If you do not see the <literal>DHCPDISCOVER</literal>, a problem
    exists with the packet getting from the instance to the machine running
    dnsmasq. If you see all of the preceding output and your instances are
    still not able to obtain IP addresses, then the packet is able to get from
    the instance to the host running dnsmasq, but it is not able to make the
    return trip.</para>

    <para>You might also see a message such as this:</para>

    <screen><computeroutput>Feb 27 22:01:36 mynode dnsmasq-dhcp[25435]: DHCPDISCOVER(br100)
            fa:16:3e:78:44:84 no address available</computeroutput></screen>

    <para>This may be a dnsmasq and/or <literal>nova-network</literal> related
    issue. (For the preceding example, the problem happened to be that dnsmasq
    did not have any more IP addresses to give away because there were no more
    fixed IPs available in the OpenStack Compute database.)</para>

    <para>If there's a suspicious-looking dnsmasq log message, take a look at
    the command-line arguments to the dnsmasq processes to see if they look
    correct:</para>

    <screen><prompt>$</prompt> <userinput>ps aux | grep dnsmasq</userinput></screen>

    <para>The output looks something like the following:</para>

    <screen><computeroutput>108 1695 0.0 0.0 25972 1000 ? S Feb26 0:00 /usr/sbin/dnsmasq
-u libvirt-dnsmasq \
--strict-order --bind-interfaces
 --pid-file=/var/run/libvirt/network/default.pid --conf-file=
 --except-interface lo --listen-address 192.168.122.1
 --dhcp-range 192.168.122.2,192.168.122.254
 --dhcp-leasefile=/var/lib/libvirt/dnsmasq/default.leases
 --dhcp-lease-max=253 --dhcp-no-override
nobody 2438 0.0 0.0 27540 1096 ? S Feb26 0:00 /usr/sbin/dnsmasq --strict-order
--bind-interfaces --conf-file=
 --domain=novalocal --pid-file=/var/lib/nova/networks/nova-br100.pid
 --listen-address=192.168.100.1
 --except-interface=lo \
 --dhcp-range=set:'novanetwork',192.168.100.2,static,120s
 --dhcp-lease-max=256
 --dhcp-hostsfile=/var/lib/nova/networks/nova-br100.conf
 --dhcp-script=/usr/bin/nova-dhcpbridge --leasefile-ro
  root 2439 0.0 0.0 27512 472 ? S Feb26 0:00 /usr/sbin/dnsmasq --strict-order
--bind-interfaces --conf-file=
 --domain=novalocal --pid-file=/var/lib/nova/networks/nova-br100.pid
 --listen-address=192.168.100.1
 --except-interface=lo
 --dhcp-range=set:'novanetwork',192.168.100.2,static,120s
 --dhcp-lease-max=256
 --dhcp-hostsfile=/var/lib/nova/networks/nova-br100.conf
 --dhcp-script=/usr/bin/nova-dhcpbridge --leasefile-ro</computeroutput></screen>

    <para>The output shows three different dnsmasq processes. The dnsmasq
    process that has the DHCP subnet range of 192.168.122.0 belongs to libvirt
    and can be ignored. The other two dnsmasq processes belong to
    <literal>nova-network</literal>. The two processes are actually
    related—one is simply the parent process of the other. The arguments of
    the dnsmasq processes should correspond to the details you configured
    <literal>nova-network</literal> with.</para>

    <para>If the problem does not seem to be related to dnsmasq itself, at
    this point use <code>tcpdump</code> on the interfaces to determine where
    the packets are getting lost.</para>

    <para>DHCP traffic uses UDP. The client sends from port 68 to port 67 on
    the server. Try to boot a new instance and then systematically listen on
    the NICs until you identify the one that isn't seeing the traffic. To use
    <code>tcpdump</code> to listen to ports 67 and 68 on br100, you would
    do:</para>

    <screen><prompt>#</prompt> <userinput>tcpdump -i br100 -n port 67 or port 68</userinput></screen>

    <para>You should be doing sanity checks on the interfaces using command
    such as <code>ip a</code> and <code>brctl show</code> to ensure that the
    interfaces are actually up and configured the way that you think that they
    are.</para>
  </section>

  <section xml:id="debugging_dns_issues">
    <title>Debugging DNS Issues</title>

    <para>If you are able to use SSH to log into an instance, but it takes a
    very long time (on the order of a minute) to get a prompt, then you might
    have a DNS issue. The reason a DNS issue can cause this problem is that
    the SSH server does a reverse DNS lookup on the IP address that you are
    connecting from. If DNS lookup isn't working on your instances, then you
    must wait for the DNS reverse lookup timeout to occur for the SSH login
    process to complete.<indexterm class="singular">
        <primary>DNS (Domain Name Server, Service or System)</primary>

        <secondary>debugging</secondary>
      </indexterm><indexterm class="singular">
        <primary>troubleshooting</primary>

        <secondary>DNS issues</secondary>
      </indexterm></para>

    <para>When debugging DNS issues, start by making sure that the host where
    the dnsmasq process for that instance runs is able to correctly resolve.
    If the host cannot resolve, then the instances won't be able to
    either.</para>

    <para>A quick way to check whether DNS is working is to resolve a hostname
    inside your instance by using the <code>host</code> command. If DNS is
    working, you should see:</para>

    <screen><prompt>$</prompt> <userinput>host openstack.org</userinput>
<computeroutput>openstack.org has address 174.143.194.225
openstack.org mail is handled by 10 mx1.emailsrvr.com.
openstack.org mail is handled by 20 mx2.emailsrvr.com.</computeroutput></screen>

    <para>If you're running the Cirros image, it doesn't have the "host"
    program installed, in which case you can use ping to try to access a
    machine by hostname to see whether it resolves. If DNS is working, the
    first line of ping would be:</para>

    <screen><prompt>$</prompt> <userinput>ping openstack.org</userinput>
<computeroutput>PING openstack.org (174.143.194.225): 56 data bytes</computeroutput></screen>

    <para>If the instance fails to resolve the hostname, you have a DNS
    problem. For example:</para>

    <screen><prompt>$</prompt> <userinput>ping openstack.org</userinput>
<computeroutput>ping: bad address 'openstack.org'</computeroutput></screen>

    <para>In an OpenStack cloud, the dnsmasq process acts as the DNS server
    for the instances in addition to acting as the DHCP server. A misbehaving
    dnsmasq process may be the source of DNS-related issues inside the
    instance. As mentioned in the previous section, the simplest way to rule
    out a misbehaving dnsmasq process is to kill all the dnsmasq processes on
    the machine and restart <literal>nova-network</literal>. However, be aware
    that this command affects everyone running instances on this node,
    including tenants that have not seen the issue. As a last resort, as
    root:</para>

    <screen><prompt>#</prompt> <userinput>killall dnsmasq</userinput>
<prompt>#</prompt> <userinput>restart nova-network</userinput></screen>

    <para>After the dnsmasq processes start again, check whether DNS is
    working.</para>

    <para>If restarting the dnsmasq process doesn't fix the issue, you might
    need to use <code>tcpdump</code> to look at the packets to trace where the
    failure is. The DNS server listens on UDP port 53. You should see the DNS
    request on the bridge (such as, br100) of your compute node. Let's say you
    start listening with <code>tcpdump</code> on the compute node:</para>

    <screen><prompt>#</prompt> <userinput>tcpdump -i br100 -n -v udp port 53
tcpdump: listening on br100, link-type EN10MB (Ethernet), capture size 65535
bytes</userinput></screen>

    <para>Then, if you use SSH to log into your instance and try <code>ping
    openstack.org</code>, you should see something like:</para>

    <screen><computeroutput>16:36:18.807518 IP (tos 0x0, ttl 64, id 56057, offset 0, flags [DF],
proto UDP (17), length 59)
 192.168.100.4.54244 &gt; 192.168.100.1.53: 2+ A? openstack.org. (31)
16:36:18.808285 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF],
proto UDP (17), length 75)
 192.168.100.1.53 &gt; 192.168.100.4.54244: 2 1/0/0 openstack.org. A
 174.143.194.225 (47)</computeroutput></screen>
  </section>

  <section xml:id="trouble_shooting_ovs">
    <title>Troubleshooting Open vSwitch</title>

    <para>Open vSwitch as used in the previous OpenStack Networking Service
    examples is a full-featured multilayer virtual switch licensed under the
    open source Apache 2.0 license. Full documentation can be found at <link
    xlink:href="http://openvswitch.org/">the project's website</link>. In
    practice, given the preceding configuration, the most common issues are
    being sure that the required bridges (<code>br-int</code>,
    <code>br-tun</code>, <code>br-ex</code>, etc.) exist and have the proper
    ports connected to them.<indexterm class="singular">
        <primary>Open vSwitch</primary>

        <secondary>troubleshooting</secondary>
      </indexterm><indexterm class="singular">
        <primary>troubleshooting</primary>

        <secondary>Open vSwitch</secondary>
      </indexterm></para>

    <para>The Open vSwitch driver should and usually does manage this
    automatically, but it is useful to know how to do this by hand with the
    <literal>ovs-vsctl</literal> command. This command has many more
    subcommands than we will use here; see the man page or use
    <literal>ovs-vsctl --help</literal> for the full listing.</para>

    <para>To list the bridges on a system, use <literal>ovs-vsctl
    list-br</literal>. This example shows a compute node that has an internal
    bridge and a tunnel bridge. VLAN networks are trunked through the
    <code>eth1</code> network interface:</para>

    <screen><prompt>#</prompt> <userinput>ovs-vsctl list-br</userinput>
<computeroutput>br-int
br-tun
eth1-br
      </computeroutput></screen>

    <para>Working from the physical interface inwards, we can see the chain of
    ports and bridges. First, the bridge <code>eth1-br</code>, which contains
    the physical network interface <literal>eth1</literal> and the virtual
    interface <code>phy-eth1-br</code>:</para>

    <screen><prompt>#</prompt> <userinput>ovs-vsctl list-ports eth1-br</userinput>
<computeroutput>eth1
phy-eth1-br
      </computeroutput></screen>

    <para>Next, the internal bridge, <code>br-int</code>, contains
    <code>int-eth1-br</code>, which pairs with <code>phy-eth1-br</code> to
    connect to the physical network shown in the previous bridge,
    <code>patch-tun</code>, which is used to connect to the GRE tunnel bridge
    and the TAP devices that connect to the instances currently running on the
    system:</para>

    <screen><prompt>#</prompt> <userinput>ovs-vsctl list-ports br-int</userinput>
<computeroutput>int-eth1-br
patch-tun
tap2d782834-d1
tap690466bc-92
tap8a864970-2d
      </computeroutput></screen>

    <para>The tunnel bridge, <code>br-tun</code>, contains the
    <code>patch-int</code> interface and <code>gre-&lt;N&gt;</code> interfaces
    for each peer it connects to via GRE, one for each compute and network
    node in your cluster:</para>

    <screen><prompt>#</prompt> <userinput>ovs-vsctl list-ports br-tun</userinput>
<computeroutput>patch-int
gre-1
.
.
.
gre-&lt;N&gt;
      </computeroutput></screen>

    <para>If any of these links is missing or incorrect, it suggests a
    configuration error. Bridges can be added with <literal>ovs-vsctl
    add-br</literal>, and ports can be added to bridges with
    <literal>ovs-vsctl add-port</literal>. While running these by hand can be
    useful debugging, it is imperative that manual changes that you intend to
    keep be reflected back into your configuration files.</para>
  </section>

  <section xml:id="dealing_with_netns">
    <title>Dealing with Network Namespaces</title>

    <para>Linux network namespaces are a kernel feature the networking service
    uses to support multiple isolated layer-2 networks with overlapping IP
    address ranges. The support may be disabled, but it is on by default. If
    it is enabled in your environment, your network nodes will run their
    dhcp-agents and l3-agents in isolated namespaces. Network interfaces and
    traffic on those interfaces will not be visible in the default
    namespace.<indexterm class="singular">
        <primary>network namespaces, troubleshooting</primary>
      </indexterm><indexterm class="singular">
        <primary>namespaces, troubleshooting</primary>
      </indexterm><indexterm class="singular">
        <primary>troubleshooting</primary>

        <secondary>network namespaces</secondary>
      </indexterm></para>

    <para>To see whether you are using namespaces, run <literal>ip
    netns</literal>:</para>

    <screen><prompt>#</prompt> <userinput>ip netns</userinput>
<computeroutput>qdhcp-e521f9d0-a1bd-4ff4-bc81-78a60dd88fe5
qdhcp-a4d00c60-f005-400e-a24c-1bf8b8308f98
qdhcp-fe178706-9942-4600-9224-b2ae7c61db71
qdhcp-0a1d0a27-cffa-4de3-92c5-9d3fd3f2e74d
qrouter-8a4ce760-ab55-4f2f-8ec5-a2e858ce0d39
      </computeroutput></screen>

    <para>L3-agent router namespaces are named
    <literal>qrouter-<replaceable>&lt;router_uuid&gt;</replaceable></literal>,
    and dhcp-agent name spaces are named
    <literal>qdhcp-</literal><literal><replaceable>&lt;net_uuid&gt;</replaceable></literal>.
    This output shows a network node with four networks running dhcp-agents,
    one of which is also running an l3-agent router. It's important to know
    which network you need to be working in. A list of existing networks and
    their UUIDs can be obtained by running <literal>neutron
    net-list</literal> with administrative credentials.</para>

    <para>Once you've determined which namespace you need to work in, you can
    use any of the debugging tools mention earlier by prefixing the command
    with <literal>ip netns exec &lt;namespace&gt;</literal>. For example, to
    see what network interfaces exist in the first qdhcp namespace returned
    above, do this:</para>

    <screen><prompt>#</prompt> <userinput>ip netns exec qdhcp-e521f9d0-a1bd-4ff4-bc81-78a60dd88fe5 ip a</userinput>
<computeroutput>10: tape6256f7d-31: &lt;BROADCAST,UP,LOWER_UP&gt; mtu 1500 qdisc noqueue state UNKNOWN
    link/ether fa:16:3e:aa:f7:a1 brd ff:ff:ff:ff:ff:ff
    inet 10.0.1.100/24 brd 10.0.1.255 scope global tape6256f7d-31
    inet 169.254.169.254/16 brd 169.254.255.255 scope global tape6256f7d-31
    inet6 fe80::f816:3eff:feaa:f7a1/64 scope link
       valid_lft forever preferred_lft forever
28: lo: &lt;LOOPBACK,UP,LOWER_UP&gt; mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
      </computeroutput></screen>

    <para>From this you see that the DHCP server on that network is using the
    tape6256f7d-31 device and has an IP address of 10.0.1.100. Seeing the
    address 169.254.169.254, you can also see that the dhcp-agent is running a
    metadata-proxy service. Any of the commands mentioned previously in this
    chapter can be run in the same way. It is also possible to run a shell,
    such as <literal>bash</literal>, and have an interactive session within
    the namespace. In the latter case, exiting the shell returns you to the
    top-level default namespace.</para>
  </section>

  <section xml:id="ops-network-troubleshooting-summary">
    <title>Summary</title>

    <para>The authors have spent too much time looking at packet dumps in
    order to distill this information for you. We trust that, following the
    methods outlined in this chapter, you will have an easier time! Aside from
    working with the tools and steps above, don't forget that sometimes an
    extra pair of eyes goes a long way to assist.</para>
  </section>
</chapter>