Nicholas Kuechler 48b2d856a5 docs: adds link to oslo_messaging_notifications documentation
Change-Id: I1bf907247559c9945def34dc1b7fccb634324636
2024-10-23 09:51:43 -05:00

204 lines
8.6 KiB
ReStructuredText

Redfish hardware metrics
========================
The ``redfish`` hardware type supports sending hardware metrics via the
:doc:`notification system </admin/notifications>`. The ``event_type`` field of
a notification will be set to ``hardware.redfish.metrics`` (where ``redfish``
may be replaced by a different driver name for hardware types derived from it).
Enabling redfish hardware metrics requires some ironic.conf configuration file
updates:
.. code-block:: ini
[oslo_messaging_notifications]
# The Drivers(s) to handle sending notifications. Possible
# values are messaging, messagingv2, routing, log, test, noop,
# prometheus_exporter (multi valued)
# Example using the messagingv2 driver:
driver = messagingv2
[sensor_data]
send_sensor_data = true
[metrics]
backend = collector
A full list of ``[oslo_messaging_notifications]`` configuration options can be found in the
`oslo.messaging documentation <https://docs.openstack.org/oslo.messaging/latest/configuration/opts.html#oslo-messaging-notifications>`_
The payload of each notification is a mapping where keys are sensor types
(``Fan``, ``Temperature``, ``Power`` or ``Drive``) and values are also mappings
from sensor identifiers to the sensor data.
Each ``Fan`` payload contains the following fields:
* ``max_reading_range``, ``min_reading_range`` - the range of reading values.
* ``reading``, ``reading_units`` - the current reading and its units.
* ``serial_number`` - the serial number of the fan sensor.
* ``physical_context`` - the context of the sensor, such as ``SystemBoard``.
Can also be ``null`` or just ``Fan``.
Each ``Temperature`` payload contains the following fields:
* ``max_reading_range_temp``, ``min_reading_range_temp`` - the range of reading
values.
* ``reading_celsius`` - the current reading in degrees Celsius.
* ``sensor_number`` - the number of the temperature sensor.
* ``physical_context`` - the context of the sensor, usually reflecting its
location, such as ``CPU``, ``Memory``, ``Intake``, ``PowerSupply`` or
``SystemBoard``. Can also be ``null``.
Each ``Power`` payload contains the following fields:
* ``power_capacity_watts``, ``line_input_voltage``, ``last_power_output_watts``
* ``serial_number`` - the serial number of the power source.
* ``state`` - the power source state: ``enabled``, ``absent`` (``null`` if
unknown).
* ``health`` - the power source health status: ``ok``, ``warning``,
``critical`` (``null`` if unknown).
Each ``Drive`` payload contains the following fields:
* ``name`` - the drive name in the BMC (this is **not** a Linux device name
like ``/dev/sda``).
* ``model`` - the drive model (if known).
* ``capacity_bytes`` - the drive capacity in bytes.
* ``state`` - the drive state: ``enabled``, ``absent`` (``null`` if unknown).
* ``health`` - the drive health status: ``ok``, ``warning``, ``critical``
(``null`` if unknown).
.. note::
Drive payloads are often not available on real hardware.
.. warning::
Metrics collection works by polling several Redfish endpoints on the target
BMC. Some older BMC implementations may have hard rate limits or misbehave
under load. If this is the case for you, you need to reduce the metrics
collection frequency or completely disable it.
Example (Dell)
--------------
.. code-block:: json
{
"message_id": "578628d2-9967-4d33-97ca-7e7c27a76abc",
"publisher_id": "conductor-1.example.com",
"event_type": "hardware.redfish.metrics",
"priority": "INFO",
"payload": {
"message_id": "60653d54-87aa-43b8-a4ed-96d568dd4e96",
"instance_uuid": null,
"node_uuid": "aea161dc-2e96-4535-b003-ca70a4a7bb6d",
"timestamp": "2023-10-22T15:50:26.841964",
"node_name": "dell-430",
"event_type": "hardware.redfish.metrics.update",
"payload": {
"Fan": {
"0x17||Fan.Embedded.1A@System.Embedded.1": {
"identity": "0x17||Fan.Embedded.1A",
"max_reading_range": null,
"min_reading_range": 720,
"reading": 1680,
"reading_units": "RPM",
"serial_number": null,
"physical_context": "SystemBoard",
"state": "enabled",
"health": "ok"
},
"0x17||Fan.Embedded.2A@System.Embedded.1": {
"identity": "0x17||Fan.Embedded.2A",
"max_reading_range": null,
"min_reading_range": 720,
"reading": 3120,
"reading_units": "RPM",
"serial_number": null,
"physical_context": "SystemBoard",
"state": "enabled",
"health": "ok"
},
"0x17||Fan.Embedded.2B@System.Embedded.1": {
"identity": "0x17||Fan.Embedded.2B",
"max_reading_range": null,
"min_reading_range": 720,
"reading": 3000,
"reading_units": "RPM",
"serial_number": null,
"physical_context": "SystemBoard",
"state": "enabled",
"health": "ok"
}
},
"Temperature": {
"iDRAC.Embedded.1#SystemBoardInletTemp@System.Embedded.1": {
"identity": "iDRAC.Embedded.1#SystemBoardInletTemp",
"max_reading_range_temp": 47,
"min_reading_range_temp": -7,
"reading_celsius": 28,
"physical_context": "SystemBoard",
"sensor_number": 4,
"state": "enabled",
"health": "ok"
},
"iDRAC.Embedded.1#CPU1Temp@System.Embedded.1": {
"identity": "iDRAC.Embedded.1#CPU1Temp",
"max_reading_range_temp": 90,
"min_reading_range_temp": 3,
"reading_celsius": 63,
"physical_context": "CPU",
"sensor_number": 14,
"state": "enabled",
"health": "ok"
}
},
"Power": {
"PSU.Slot.1:Power@System.Embedded.1": {
"power_capacity_watts": null,
"line_input_voltage": 206,
"last_power_output_watts": null,
"serial_number": "CNLOD0075324D7",
"state": "enabled",
"health": "ok"
},
"PSU.Slot.2:Power@System.Embedded.1": {
"power_capacity_watts": null,
"line_input_voltage": null,
"last_power_output_watts": null,
"serial_number": "CNLOD0075324E5",
"state": null,
"health": "critical"
}
},
"Drive": {
"Solid State Disk 0:1:0:RAID.Integrated.1-1@System.Embedded.1": {
"name": "Solid State Disk 0:1:0",
"capacity_bytes": 479559942144,
"state": "enabled",
"health": "ok"
},
"Physical Disk 0:1:1:RAID.Integrated.1-1@System.Embedded.1": {
"name": "Physical Disk 0:1:1",
"capacity_bytes": 1799725514752,
"state": "enabled",
"health": "ok"
},
"Physical Disk 0:1:2:RAID.Integrated.1-1@System.Embedded.1": {
"name": "Physical Disk 0:1:2",
"capacity_bytes": 1799725514752,
"state": "enabled",
"health": "ok"
},
"Backplane 1 on Connector 0 of Integrated RAID Controller 1:RAID.Integrated.1-1@System.Embedded.1": {
"name": "Backplane 1 on Connector 0 of Integrated RAID Controller 1",
"capacity_bytes": null,
"state": "enabled",
"health": "ok"
}
}
}
},
"timestamp": "2023-10-22 15:50:36.700458"
}