
Steps
What are steps?
Steps are exactly that, steps to achieve a goal, and in most cases, they are what an operator requested.
Originally, however, they were the internal list of actions used
to perform automated cleaning. The conductor would build this
list of steps from data that the conductor (via drivers), the
ironic-python-agent, and any loaded hardware managers determined
to be needed.
As time passed and Ironic's capabilities grew, this was extended to manual cleaning, and later to deploy steps and deploy templates, allowing an operator to request that firmware be updated by a driver, or that RAID be configured by the agent, before the machine is released to the end user.
Reserved Functional Steps
Within the cleaning and deployment step frameworks, some step names are reserved for specific functions which a user can invoke to perform specific actions.
Step Name | Description |
hold | Pauses the execution of the steps by moving the node from the current state to a hold state, such as clean hold, until an unhold provision state command is issued. This step cannot be used against a child node in the context of being requested when executing against a parent node. The use case for this verb is external automation or processes which need to run as part of the overall process to achieve the overall goal. |
power_on | Powers on the node, which may be useful if a node's power must be toggled multiple times to enable embedded behavior such as to boot from network. This step can be executed against child nodes. |
power_off | Turns the node power off via the conductor. This step can be used against child nodes. When used outside of the context of a child node, any agent token metadata is also removed so that the machine can reboot back to the agent, if applicable. |
reboot | Reboots the node via the conductor. This generally signals for power to be turned off and back on, however driver-specific code may request a CPU interrupt based reset. This step can be executed on child nodes. |
wait | Pauses the overall step execution until the next heartbeat operation, unless a seconds argument is provided. If a seconds argument is provided, the step execution pauses for the requested amount of time instead. |
In these cases, the interface upon which the method is expected is ignored, and the step is acted upon based solely upon the step's name.
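As a minimal sketch of how the reserved steps might be combined, the following Python builds a manual cleaning request body that powers the node off, waits, and powers it back on. The helper name `reserved_step` and the 30-second value are illustrative, and passing `seconds` through an `args` mapping is an assumption. The `deploy` interface is only a placeholder, since the interface is ignored for reserved step names.

```python
import json

# Step names this document lists as reserved.
RESERVED_STEPS = {"hold", "power_on", "power_off", "reboot", "wait"}


def reserved_step(name, interface="deploy", **args):
    """Build one step entry for a reserved step name.

    Hypothetical helper for illustration; not part of Ironic.
    """
    if name not in RESERVED_STEPS:
        raise ValueError(f"{name!r} is not a reserved step name")
    step = {"interface": interface, "step": name}
    if args:
        # Assumption: step arguments are carried in an "args" mapping.
        step["args"] = args
    return step


body = {
    "target": "clean",
    "clean_steps": [
        reserved_step("power_off"),
        # With the power off no heartbeat will arrive, so wait a fixed time.
        reserved_step("wait", seconds=30),
        reserved_step("power_on"),
    ],
}

print(json.dumps(body, indent=2))
```

Supplying `seconds` here matters because, per the table above, a bare `wait` pauses until the next heartbeat.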
Example
In this example, we utilize the cleaning step erase_devices and then
trigger a hold of the node. In this specific case, the node will enter
a clean hold state.
{
    "target": "clean",
    "clean_steps": [
        {
            "interface": "deploy",
            "step": "erase_devices"
        },
        {
            "interface": "deploy",
            "step": "hold"
        }
    ]
}
Once you have completed whatever action needed to be performed while the node was held, issue an unhold provision state command, via the API or the command line, to tell the node to proceed.
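A minimal sketch of what the unhold request might look like when issued through the API. The helper name `unhold_request` and the node name `node-0` are hypothetical, and the path is assumed to be the same provision state endpoint used for other target changes. The sketch only builds the request and does not send it.

```python
import json


def unhold_request(node_ident):
    """Return (method, path, body) for an unhold provision state change.

    Hypothetical helper for illustration; the endpoint path is an
    assumption based on the provision state API.
    """
    path = f"/v1/nodes/{node_ident}/states/provision"
    body = json.dumps({"target": "unhold"})
    return "PUT", path, body


# "node-0" is a placeholder node name.
method, path, body = unhold_request("node-0")
print(method, path, body)
```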
Set the environment
When using steps with the functionality to execute on child nodes,
i.e. nodes which have a populated parent_node field, you always
want to ensure you have set the environment appropriately for your next
action.
For example, if you are executing steps against a parent node which
then execute against a child node via the execute_on_child_nodes
step option, and those steps require power to be on, you will want to
explicitly ensure the power is on for the parent node, unless the child
node can operate independently, as signaled through the driver_info
option has_dedicated_power_supply on the child node. Power is an
obvious case because Ironic has internal guarding logic which attempts
to power on the parent node, but it cannot be an afterthought due to
internal task locking.
Power specifically aside, the general principle applies to the execution of all steps. You always want to build upon the prior step or the existing known state of the system.
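The following sketch illustrates the principle with a step list that powers on the parent before running a step on its child nodes, plus a small check that flags child node steps which would run before any parent power_on. It assumes `execute_on_child_nodes` is set as a key on the step entry. The function `child_steps_without_power` is hypothetical and not part of Ironic, and `erase_devices` is only a placeholder step name.

```python
def child_steps_without_power(steps):
    """Return names of child-node steps that run before a parent power_on.

    Hypothetical pre-flight check for illustration. It ignores the
    node's initial power state and has_dedicated_power_supply.
    """
    powered = False
    flagged = []
    for step in steps:
        on_children = step.get("execute_on_child_nodes", False)
        if step["step"] == "power_on" and not on_children:
            powered = True
        elif step["step"] == "power_off" and not on_children:
            powered = False
        elif on_children and not powered:
            flagged.append(step["step"])
    return flagged


body = {
    "target": "clean",
    "clean_steps": [
        # Set the environment first: power on the parent node.
        {"interface": "deploy", "step": "power_on"},
        # Then run a step against the child nodes.
        {
            "interface": "deploy",
            "step": "erase_devices",
            "execute_on_child_nodes": True,
        },
    ],
}

print(child_steps_without_power(body["clean_steps"]))
```

Dropping the first step from the list would cause the check to flag `erase_devices`, which is the kind of ordering mistake this section warns about.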
Note
Ironic will attempt to ensure power is active for a parent_node
when powering on a child node. Conversely, Ironic will also attempt to
power down child nodes if a parent node is requested to be turned off,
unless the has_dedicated_power_supply option is set for the child
node. This pattern of behavior prevents parent nodes from being
automatically powered back on should a child node be left online.
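As a sketch of how the option might be set on a child node, the following builds a JSON Patch document targeting the node's driver_info. The helper name `dedicated_power_patch` is hypothetical, and treating the option's value as a boolean is an assumption. The patch would be sent as the body of a node update request.

```python
import json


def dedicated_power_patch(enabled=True):
    """Build a JSON Patch document setting the driver_info option.

    Hypothetical helper for illustration; the boolean value type
    is an assumption.
    """
    return [
        {
            "op": "add",
            "path": "/driver_info/has_dedicated_power_supply",
            "value": enabled,
        }
    ]


print(json.dumps(dedicated_power_patch()))
```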