=========================
800 Series Alarm Messages
=========================

The system inventory and maintenance service reports system changes with different degrees of severity. Use the reported alarms to monitor the overall health of the system. In the entries below, Severity is abbreviated as C (critical), M (major), or m (minor); an asterisk (*) marks a severity that is management affecting.

Alarm ID: 800.001

Storage Alarm Condition:
1 mons down, quorum 1,2 controller-1,storage-0

Entity Instance: cluster=<dist-fs-uuid>
Degrade Affecting Severity: None
Severity: C/M*
Proposed Repair Action: If problem persists, contact next level of support.
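When this alarm is raised, the state of the Ceph monitors can be checked directly from a controller. The following is a minimal sketch; output varies by release and cluster layout.

.. code-block:: none

   # Check overall cluster health, including how many monitors are in quorum
   $ ceph -s

   # List the monitors currently in quorum
   $ ceph quorum_status --format json-pretty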

Alarm ID: 800.003

Storage Alarm Condition:
Quota/Space mismatch for the <tiername> tier. The sum of Ceph pool quotas does not match the tier size.

Entity Instance: cluster=<dist-fs-uuid>.tier=<tiername>
Degrade Affecting Severity: None
Severity: m
Proposed Repair Action: Update Ceph storage pool quotas to use all available tier space.
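Pool quotas can be reviewed and adjusted with the standard Ceph CLI, as sketched below. The pool name kube-rbd and the quota value are illustrative placeholders; use the pools and tier size of your installation.

.. code-block:: none

   # Show the current quota for a pool
   $ ceph osd pool get-quota kube-rbd

   # Adjust the quota so the sum across all pools matches the tier size
   # (10737418240 bytes = 10 GiB, an illustrative value)
   $ ceph osd pool set-quota kube-rbd max_bytes 10737418240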

Alarm ID: 800.010

Potential data loss. No available OSDs in storage replication group.

Entity Instance: cluster=<dist-fs-uuid>.peergroup=<group-x>
Degrade Affecting Severity: None
Severity: C*
Proposed Repair Action: Ensure storage hosts from the replication group are unlocked and available. Check if the OSDs of each storage host are up and running. If the problem persists, contact the next level of support. (See the sketch after alarm 800.011 below.)

Alarm ID: 800.011

Loss of replication in peergroup.

Entity Instance: cluster=<dist-fs-uuid>.peergroup=<group-x>
Degrade Affecting Severity: None
Severity: M*
Proposed Repair Action: Ensure storage hosts from the replication group are unlocked and available. Check if the OSDs of each storage host are up and running. If the problem persists, contact the next level of support.
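For both 800.010 and 800.011, host availability and OSD state can be verified as follows. This is a minimal sketch; hostnames and OSD numbering depend on the deployment.

.. code-block:: none

   # Verify the storage hosts in the replication group are unlocked and available
   $ system host-list

   # Confirm each OSD is up and in
   $ ceph osd tree
   $ ceph osd stat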

Alarm ID: 800.102

Storage Alarm Condition:
PV configuration <error/failed to apply> on <hostname>. Reason: <detailed reason>.

Entity Instance: pv=<pv_uuid>
Degrade Affecting Severity: None
Severity: C/M*
Proposed Repair Action: Remove the failed PV and the associated storage device, then recreate them.
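One way to carry out this repair from the system CLI is sketched below. The subcommands shown (host-pv-list, host-pv-delete, host-disk-list, host-pv-add) are assumed to be available in your release; the UUIDs come from the listing output.

.. code-block:: none

   # Identify the failed physical volume on the affected host
   $ system host-pv-list <hostname>

   # Remove the failed PV, then recreate it against the underlying disk
   $ system host-pv-delete <pv_uuid>
   $ system host-disk-list <hostname>
   $ system host-pv-add <hostname> <lvg_name> <disk_uuid>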

Alarm ID: 800.103

Storage Alarm Condition (one of the two bracketed variants appears):

[ Metadata usage for LVM thin pool <VG name>/<Pool name> exceeded threshold and automatic extension failed /
Metadata usage for LVM thin pool <VG name>/<Pool name> exceeded threshold ]; threshold x%, actual y%.

Entity Instance: <hostname>.lvmthinpool=<VG name>/<Pool name>
Degrade Affecting Severity: None
Severity: C*
Proposed Repair Action: Increase the storage space allotment for Cinder on the 'lvm' backend. Consult the user documentation for more details. If the problem persists, contact the next level of support.
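Current data and metadata usage for the affected thin pool can be inspected on the host with standard LVM tools; a minimal sketch follows.

.. code-block:: none

   # Report data and metadata usage of LVM thin pools
   $ lvs -a -o lv_name,vg_name,data_percent,metadata_percent

   # Show free space remaining in the volume group for extension
   $ vgs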

Alarm ID: 800.104

Storage Alarm Condition:
<storage-backend-name> configuration failed to apply on host: <host-uuid>.

Degrade Affecting Severity: None
Severity: C*
Proposed Repair Action: Update the backend settings to reapply the configuration. Consult the user documentation for more details. If the problem persists, contact the next level of support.
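A hedged sketch of re-applying a backend configuration via the system CLI follows; the parameter/value pair is a placeholder for whichever setting failed to apply.

.. code-block:: none

   # Confirm which storage backend failed to apply
   $ system storage-backend-list

   # Modify the backend settings to trigger re-application
   # (<parameter>=<value> is a placeholder for the setting being corrected)
   $ system storage-backend-modify <backend-name> <parameter>=<value>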