bootc deploy interface - for bootable containers

Adds a ``bootc`` deployment interface which can be enabled to perform
deployment of bootable containers. This enables a streamlined workflow
where an operator/user can push container updates and does not need to
build intermediate disk images and then post those disk images to
facilitate the deployment of a bare metal node.

Closes-Bug: 2085801
Change-Id: Iedb93fe47162abe0bd9391921792203301bfc456

parent db4412d570
commit c7fa447ab6

@@ -190,3 +190,116 @@ completely orchestrate writing the instance image using
 responsible to provide all necessary deploy steps with priorities between
 61 and 99 (see :ref:`node-deployment-core-steps` for information on
 priorities).
+
+Bootc Agent Deploy
+==================
+
+The ``bootc`` deploy interface is designed to enable operators to deploy
+containers directly from a container image registry without intermediate
+conversion steps, such as creating custom disk images for modifications.
+This deployment interface utilizes the
+`bootc project <https://containers.github.io/bootc/>`_.
+
+Ultimately this enables a streamlined flow, where a user of the deployment
+interface *can* create updated containers rapidly and the deployment interface
+will deploy that container image in a streamlined fashion without the need
+to create intermediate disk images and post the disk images in a location
+where they can be accessed for deployment.
+
+This interface favors a streamlined flow, and in exchange offers
+limited flexibility in the model of deployment. As a result, this
+interface consumes the entire target disk on the host being deployed
+and offers no customization in terms of partitioning. This is largely
+because the overall security model of a bootc deployment, which leverages
+ostree, is also fundamentally different from a model which leverages
+partition separation.
+
+.. NOTE::
+   This interface should be considered experimental and may evolve
+   to include additional features as the Ironic project maintainers
+   receive additional feedback.
+
+.. NOTE::
+   This interface is dependent upon the existence of ``bootc`` within a
+   container image along with sufficient memory on the baremetal
+   node being deployed to enable a complete download and extraction of image
+   contents within system memory. This memory constraint is
+   why this interface is not actively tested in upstream CI.
+   The possible failure modes of this interface are mainly focused upon
+   the ability of the ramdisk to download, launch, and
+   run bootc to trigger the installation, which also isolates most risk
+   to the actual bootc process execution.
+
+Features
+--------
+
+While this ``deploy_interface`` supports deploying configuration drives
+like most other Ironic-supplied deploy interfaces, some additional
+parameters can be supplied via ``instance_info`` to enable
+tuning of deploy-time behavior by the user which cannot be modified
+post-deployment.
+
+* ``bootc_authorized_keys`` - This option allows injection of a
+  root user authorized keys file which is preserved inside of the deployed
+  container on the host. This option is for actual key file content and can
+  be one or more keys separated by a newline character.
+* ``bootc_tpm2_luks`` - A boolean option, default False, enabling bootc
+  to attempt to utilize auto-encryption of the deployed host filesystem
+  upon which the container is deployed. This is not enabled by default
+  due to a lack of software TPMs in Ironic CI. If operators would like
+  this default changed, please discuss with Ironic developers.
+
+Additionally, this interface also supports the passing of a pull secret
+to enable download from the remote image registry, which is part of the
+support for retrieval of artifacts from OCI Container registries.
+This parameter is ``image_pull_secret``.
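
As a rough illustration of the parameters described above, the sketch below assembles an ``instance_info`` dictionary for a node that uses this interface. The helper name ``build_bootc_instance_info``, the registry URL, and the key material are hypothetical. Only the field names (``image_source``, ``image_pull_secret``, ``bootc_authorized_keys``, ``bootc_tpm2_luks``) and the ``oci://`` requirement are taken from this change.

```python
def build_bootc_instance_info(image_source, authorized_keys=None,
                              tpm2_luks=False, pull_secret=None):
    """Assemble instance_info for the bootc deploy interface (sketch only)."""
    # The interface's validate() rejects anything that is not an oci:// URL.
    if not image_source or not image_source.startswith('oci://'):
        raise ValueError('image_source must be an oci:// URL')
    info = {'image_source': image_source}
    if authorized_keys:
        # Actual key file content: one or more keys, newline separated.
        info['bootc_authorized_keys'] = '\n'.join(authorized_keys)
    if tpm2_luks:
        # Defaults to False when omitted.
        info['bootc_tpm2_luks'] = True
    if pull_secret:
        info['image_pull_secret'] = pull_secret
    return info


# Hypothetical values, for illustration only.
example = build_bootc_instance_info(
    'oci://registry.example.com/team/os-image:latest',
    authorized_keys=['ssh-ed25519 AAAA... admin@example',
                     'ssh-ed25519 BBBB... ops@example'],
    pull_secret='not-a-real-secret')
```

Because these values cannot be changed post-deployment, building and checking them in one place before a deploy is requested may save a redeploy.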
+
+Caveats
+-------
+
+* This deployment interface was not designed to be compatible with the
+  OpenStack Compute service. This is because OpenStack focuses on
+  disk images from Glance as to what to deploy, whereas this interface
+  is modeled to utilize a container image registry.
+* Performance-wise, this deployment interface performs many smaller actions,
+  which sometimes need to be performed in a specific sequence, such as
+  when unpacking layers. As a result, when comparing similar size
+  containers to disk images, this interface is slower than the ``direct``
+  deploy interface.
+* Container Images *must* have the bootc command present along with
+  the applicable bootloader and artifacts required for whatever platform
+  is being deployed.
+* Because of how `bootc <https://containers.github.io/bootc/>`_ works,
+  there is no concept of "image streaming" directly to disk. This is because
+  of the way this interface works: `podman <https://podman.io/>`_ is used to
+  download all container image layer artifacts, along with extracting the
+  layers. At that point ``bootc`` is executed and it begins to set up the
+  disk for the host. As a result, most of the time a deploy is in progress
+  will be observable as ``deploy wait`` while ``bootc`` executes.
+* The memory requirements of the ramdisk, due to the way this interface
+  works, include the ability to download a container image, copy, and
+  ultimately extract all layers into the in-memory filesystem. Due to the way
+  the kernel launches and allocates ramdisk memory for filesystem usage,
+  a 600MB container image may require upwards of 10GB of RAM to be available
+  on the overall host.
+* This deployment interface explicitly signals to ``bootc`` that it should
+  not execute its internal post-deployment "fetch check" to ensure upgrades
+  are working. This is because this action may require authentication
+  to succeed, **and** thus require credentials in the container to
+  work. Configuration of credentials for **day-2** operations,
+  such as the execution of ``bootc upgrade``, must be addressed
+  post-deployment.
+* If you intend SELinux to be enabled on the deployed host, it must also
+  be enabled inside of the ironic-python-agent ramdisk. This is a design
+  limitation of bootc outside of Ironic's control.
+
+Limitations
+-----------
+
+* At present, this interface does not support use of caching proxies. This
+  may be addressed in the future.
+* This deployment interface directly downloads artifacts from the requested
+  Container Registry. Caching the container artifacts on the
+  ``ironic-conductor`` host is not available. If you need the container
+  content localized to the conductor, consider utilizing your own container
+  registry.

@@ -51,7 +51,7 @@ class GenericHardware(hardware_type.AbstractHardwareType):
         """List of supported deploy interfaces."""
         return [agent.AgentDeploy, ansible_deploy.AnsibleDeploy,
                 ramdisk.RamdiskDeploy, pxe.PXEAnacondaDeploy,
-                agent.CustomAgentDeploy]
+                agent.BootcAgentDeploy, agent.CustomAgentDeploy]
 
     @property
     def supported_inspect_interfaces(self):

@@ -12,6 +12,7 @@
 #    See the License for the specific language governing permissions and
 #    limitations under the License.
 
+import base64
 from urllib import parse as urlparse
 
 from oslo_log import log
@@ -21,12 +22,14 @@ import tenacity
 
 from ironic.common import async_steps
 from ironic.common import boot_devices
+from ironic.common import boot_modes
 from ironic.common import exception
 from ironic.common.glance_service import service_utils
 from ironic.common.i18n import _
 from ironic.common import image_service
 from ironic.common import images
 from ironic.common import metrics_utils
+from ironic.common import oci_registry as oci
 from ironic.common import raid
 from ironic.common import states
 from ironic.common import utils
@@ -248,6 +251,51 @@ def soft_power_off(task, client=None):
             manager_utils.node_power_action(task, states.POWER_OFF)
 
 
+def set_boot_to_disk(task, target_boot_mode=None):
+    """Boot a node to disk.
+
+    This is a helper method to reduce duplication of code around
+    handling vendor specifics for setting boot modes between multiple
+    deployment interfaces inside of Ironic.
+
+    :param task: A Taskmanager object.
+    :param target_boot_mode: The target boot_mode, defaults to UEFI.
+    """
+    if not target_boot_mode:
+        target_boot_mode = boot_modes.UEFI
+    node = task.node
+    try:
+        persistent = True
+        # NOTE(TheJulia): We *really* only should be doing this in bios
+        # boot mode. In UEFI this might just get disregarded, or cause
+        # issues/failures.
+        if node.driver_info.get('force_persistent_boot_device',
+                                'Default') == 'Never':
+            persistent = False
+
+        vendor = task.node.properties.get('vendor', None)
+        if not (vendor and vendor.lower() == 'lenovo'
+                and target_boot_mode == 'uefi'):
+            # Lenovo hardware is modeled on a "just update"
+            # UEFI nvram model of use, and if multiple actions
+            # get requested, you can end up in cases where NVRAM
+            # changes are deleted as the host "restores" to the
+            # backup. For more information see
+            # https://bugs.launchpad.net/ironic/+bug/2053064
+            # NOTE(TheJulia): We likely just need to do this with
+            # all hosts in uefi mode, but libvirt VMs don't handle
+            # nvram only changes *and* this pattern is known to generally
+            # work for Ironic operators.
+            deploy_utils.try_set_boot_device(task, boot_devices.DISK,
+                                             persistent=persistent)
+    except Exception as e:
+        msg = (_("Failed to change the boot device to %(boot_dev)s "
+                 "when deploying node %(node)s: %(error)s") %
+               {'boot_dev': boot_devices.DISK, 'node': node.uuid,
+                'error': e})
+        agent_base.log_and_raise_deployment_error(task, msg, exc=e)
+
+
 class CustomAgentDeploy(agent_base.AgentBaseMixin,
                         agent_base.HeartbeatMixin,
                         agent_base.AgentOobStepsMixin,
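
The branching in the ``set_boot_to_disk`` helper above can be hard to follow through the comments, so here is a minimal sketch of just the decision it makes, written as a pure function with no BMC calls. The function name ``plan_boot_device_change`` and its plain-dictionary arguments are hypothetical, and the sketch assumes ``boot_modes.UEFI`` is the string ``'uefi'``, as the comparison in the helper suggests.

```python
def plan_boot_device_change(driver_info, properties, target_boot_mode=None):
    """Return (set_boot_device, persistent) for a node (illustrative only).

    driver_info and properties stand in for the node fields of the same
    names; nothing here talks to real hardware.
    """
    # Assumption: the helper's default, boot_modes.UEFI, equals 'uefi'.
    target_boot_mode = target_boot_mode or 'uefi'

    # Persistent unless the operator set force_persistent_boot_device=Never.
    persistent = driver_info.get(
        'force_persistent_boot_device', 'Default') != 'Never'

    # Lenovo hardware in UEFI mode is skipped entirely.
    vendor = properties.get('vendor')
    skip = bool(vendor) and vendor.lower() == 'lenovo' and (
        target_boot_mode == 'uefi')
    return (not skip, persistent)
```

Note that ``BootcAgentDeploy.set_boot_to_disk`` calls the helper without a boot mode, so under this reading a Lenovo node always takes the skip path for that interface.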
@@ -910,40 +958,94 @@ class AgentDeploy(CustomAgentDeploy):
                         'error': agent_client.get_command_error(result)})
                 agent_base.log_and_raise_deployment_error(task, msg)
 
-        try:
-            persistent = True
-            # NOTE(TheJulia): We *really* only should be doing this in bios
-            # boot mode. In UEFI this might just get disregarded, or cause
-            # issues/failures.
-            if node.driver_info.get('force_persistent_boot_device',
-                                    'Default') == 'Never':
-                persistent = False
-
-            vendor = task.node.properties.get('vendor', None)
-            if not (vendor and vendor.lower() == 'lenovo'
-                    and target_boot_mode == 'uefi'):
-                # Lenovo hardware is modeled on a "just update"
-                # UEFI nvram model of use, and if multiple actions
-                # get requested, you can end up in cases where NVRAM
-                # changes are deleted as the host "restores" to the
-                # backup. For more information see
-                # https://bugs.launchpad.net/ironic/+bug/2053064
-                # NOTE(TheJulia): We likely just need to do this with
-                # all hosts in uefi mode, but libvirt VMs don't handle
-                # nvram only changes *and* this pattern is known to generally
-                # work for Ironic operators.
-                deploy_utils.try_set_boot_device(task, boot_devices.DISK,
-                                                 persistent=persistent)
-        except Exception as e:
-            msg = (_("Failed to change the boot device to %(boot_dev)s "
-                     "when deploying node %(node)s: %(error)s") %
-                   {'boot_dev': boot_devices.DISK, 'node': node.uuid,
-                    'error': e})
-            agent_base.log_and_raise_deployment_error(task, msg, exc=e)
-
+        set_boot_to_disk(task, target_boot_mode)
         LOG.info('Local boot successfully configured for node %s', node.uuid)
 
 
+class BootcAgentDeploy(CustomAgentDeploy):
+    """Interface for deploy-related actions."""
+
+    @METRICS.timer('AgentBootcDeploy.validate')
+    def validate(self, task):
+        """Validate the driver-specific Node deployment info.
+
+        This method validates whether the properties of the supplied node
+        contain the required information for this driver to deploy images to
+        the node.
+
+        :param task: a TaskManager instance
+        :raises: MissingParameterValue, if any of the required parameters are
+            missing.
+        :raises: InvalidParameterValue, if any of the parameters have invalid
+            value.
+        """
+        super().validate(task)
+
+        node = task.node
+
+        image_source = node.instance_info.get('image_source')
+        if not image_source or not image_source.startswith('oci://'):
+            raise exception.InvalidImageRef(image_href=image_source)
+
+    @METRICS.timer('AgentBootcDeploy.execute_bootc_install')
+    @base.deploy_step(priority=80)
+    @task_manager.require_exclusive_lock
+    def execute_bootc_install(self, task):
+        node = task.node
+        image_source = node.instance_info.get('image_source')
+        # FIXME(TheJulia): We likely, either need to grab/collect creds
+        # and pass them along in the step call, or initialize the client.
+        # bootc runs in the target container as well, so ... hmmm
+        configdrive = manager_utils.get_configdrive_image(node)
+
+        img_auth = image_service.get_image_service_auth_override(task.node)
+
+        if not img_auth:
+            fqdn = urlparse.urlparse(image_source).netloc
+            img_auth = oci.RegistrySessionHelper.get_token_from_config(
+                fqdn)
+        else:
+            # Internally, image data is a username and password, and we
+            # only currently support pull secrets which are just transmitted
+            # via the password value.
+            img_auth = img_auth.get('password')
+        if img_auth:
+            # This is not encryption, but obfuscation.
+            img_auth = base64.standard_b64encode(img_auth.encode())
+        # Now switch into the corresponding in-band deploy step and let the
+        # result be polled normally.
+        new_step = {'interface': 'deploy',
+                    'step': 'execute_bootc_install',
+                    'args': {'image_source': image_source,
+                             'configdrive': configdrive,
+                             'oci_pull_secret': img_auth}}
+        client = agent_client.get_client(task)
+        return agent_base.execute_step(task, new_step, 'deploy',
+                                       client=client)
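
To make the payload handed to the agent easier to see, here is a standalone sketch of the two small transformations in ``execute_bootc_install``: taking the registry host from the ``oci://`` URL and base64-encoding the pull secret. The function names ``registry_host``, ``obfuscate_pull_secret`` and ``build_bootc_step`` are hypothetical. The real method also consults Ironic's image service and registry configuration for credentials, which is not reproduced here.

```python
import base64
from urllib import parse as urlparse


def registry_host(image_source):
    """Return the host portion of an oci:// image reference."""
    return urlparse.urlparse(image_source).netloc


def obfuscate_pull_secret(secret):
    """Base64-encode a pull secret. This is obfuscation, not encryption."""
    if not secret:
        return None
    # standard_b64encode works on bytes and returns bytes.
    return base64.standard_b64encode(secret.encode())


def build_bootc_step(image_source, configdrive=None, pull_secret=None):
    """Build the in-band step dictionary in the shape used by the change."""
    return {'interface': 'deploy',
            'step': 'execute_bootc_install',
            'args': {'image_source': image_source,
                     'configdrive': configdrive,
                     'oci_pull_secret': obfuscate_pull_secret(pull_secret)}}


# Same inputs as the unit test in this change.
step = build_bootc_step('oci://localhost/user/container:tag',
                        pull_secret='f00')
```

The encoded secret is a ``bytes`` value, which matches the ``b'ZjAw'`` expectation in the accompanying unit test.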
+
+    @METRICS.timer('AgentBootcDeploy.set_boot_to_disk')
+    @base.deploy_step(priority=60)
+    @task_manager.require_exclusive_lock
+    def set_boot_to_disk(self, task):
+        """Sets the node to boot from disk.
+
+        In some cases, other steps may handle aspects like bootloaders
+        and UEFI NVRAM entries required to boot. That leaves one last
+        aspect, resetting the node to boot from disk.
+
+        This primarily exists for compatibility reasons of flow
+        for Ironic, but we know some BMCs *really* need to be
+        still told to boot from disk. The exception to this is
+        Lenovo hardware, where we skip the action because it
+        can create a UEFI NVRAM update failure case, which
+        reverts the NVRAM state to "last known good configuration".
+
+        :param task: A Taskmanager object.
+        """
+        # Call the helper to de-duplicate code.
+        set_boot_to_disk(task)
+
+
 class AgentRAID(base.RAIDInterface):
     """Implementation of RAIDInterface which uses agent ramdisk."""
 

@@ -548,6 +548,100 @@ class TestCustomAgentDeploy(CommonTestsMixin, db_base.DbTestCase):
             node_power_action_mock.assert_not_called()
 
 
+class TestBootcAgentDeploy(db_base.DbTestCase):
+
+    def setUp(self):
+        super().setUp()
+        self.deploy = agent.BootcAgentDeploy()
+        self.node = object_utils.create_test_node(
+            self.context,
+            instance_info={
+                'image_source': 'oci://localhost/user/container:tag',
+                'image_pull_secret': 'f00'})
+
+    def test_validate(self):
+        with task_manager.acquire(self.context, self.node['uuid'],
+                                  shared=False) as task:
+            self.deploy.validate(task)
+
+    def test_validate_fails_with_non_oci(self):
+        i_info = self.node.instance_info
+        i_info['image_source'] = 'http://foo/bar'
+        self.node.instance_info = i_info
+        self.node.save()
+        with task_manager.acquire(self.context, self.node['uuid'],
+                                  shared=False) as task:
+            self.assertRaises(exception.InvalidImageRef,
+                              self.deploy.validate, task)
+
+    def test_validate_fails_image_source_not_set(self):
+        i_info = self.node.instance_info
+        i_info.pop('image_source')
+        self.node.instance_info = i_info
+        self.node.save()
+        with task_manager.acquire(self.context, self.node['uuid'],
+                                  shared=False) as task:
+            self.assertRaises(exception.InvalidImageRef,
+                              self.deploy.validate, task)
+
+    @mock.patch.object(agent_base, 'execute_step', autospec=True)
+    def test_execute_bootc_install(self, execute_mock):
+        src = self.node.instance_info.get('image_source')
+        expected_step = {
+            'interface': 'deploy',
+            'step': 'execute_bootc_install',
+            'args': {'image_source': src,
+                     'configdrive': None,
+                     'oci_pull_secret': b'ZjAw'}
+        }
+
+        with task_manager.acquire(self.context, self.node.uuid) as task:
+            execute_mock.return_value = states.DEPLOYWAIT
+            res = self.deploy.execute_bootc_install(task)
+            self.assertEqual(states.DEPLOYWAIT, res)
+            execute_mock.assert_called_once_with(task, expected_step,
+                                                 'deploy', client=mock.ANY)
+
+    @mock.patch.object(agent_client.AgentClient, 'install_bootloader',
+                       autospec=True)
+    @mock.patch.object(deploy_utils, 'try_set_boot_device', autospec=True)
+    @mock.patch.object(boot_mode_utils, 'get_boot_mode', autospec=True,
+                       return_value='whatever')
+    def test_set_boot_to_disk(self, boot_mode_mock,
+                              try_set_boot_device_mock,
+                              install_bootloader_mock):
+        with task_manager.acquire(self.context, self.node['uuid'],
+                                  shared=False) as task:
+            self.deploy.set_boot_to_disk(task)
+            try_set_boot_device_mock.assert_called_once_with(
+                task, boot_devices.DISK, persistent=True)
+            boot_mode_mock.assert_not_called()
+            # While not referenced, just want to make sure somehow
+            # we don't again wire this together, since it is not needed
+            # in the bootc case as it does it for us as part of deploy.
+            install_bootloader_mock.assert_not_called()
+
+    @mock.patch.object(agent_client.AgentClient, 'install_bootloader',
+                       autospec=True)
+    @mock.patch.object(deploy_utils, 'try_set_boot_device', autospec=True)
+    @mock.patch.object(boot_mode_utils, 'get_boot_mode', autospec=True,
+                       return_value='uefi')
+    def test_set_boot_to_disk_lenovo(self, boot_mode_mock,
+                                     try_set_boot_device_mock,
+                                     install_bootloader_mock):
+        props = self.node.properties
+        props['vendor'] = 'Lenovo'
+        props['capabilities'] = 'boot_mode:uefi'
+        self.node.properties = props
+        self.node.save()
+        with task_manager.acquire(self.context, self.node['uuid'],
+                                  shared=False) as task:
+            self.deploy.set_boot_to_disk(task)
+            try_set_boot_device_mock.assert_not_called()
+            boot_mode_mock.assert_not_called()
+            install_bootloader_mock.assert_not_called()
+
+
 class TestAgentDeploy(CommonTestsMixin, db_base.DbTestCase):
     def setUp(self):
         super(TestAgentDeploy, self).setUp()

@@ -0,0 +1,10 @@
+---
+features:
+  - |
+    Adds a ``bootc`` deploy interface which can be enabled by an Ironic
+    deployment administrator, which can then enable users of the ``bootc``
+    deploy interface to have a streamlined path for the deployment of
+    bootc supporting container images to a host directly,
+    without additional intermediate steps. More information about
+    bootc can be found on the
+    `bootc website <https://containers.github.io/bootc/>`_.

@@ -94,6 +94,7 @@ ironic.hardware.interfaces.console =
 ironic.hardware.interfaces.deploy =
     anaconda = ironic.drivers.modules.pxe:PXEAnacondaDeploy
     ansible = ironic.drivers.modules.ansible.deploy:AnsibleDeploy
+    bootc = ironic.drivers.modules.agent:BootcAgentDeploy
     custom-agent = ironic.drivers.modules.agent:CustomAgentDeploy
     direct = ironic.drivers.modules.agent:AgentDeploy
     fake = ironic.drivers.modules.fake:FakeDeploy
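
The ``setup.cfg`` line above registers the new class under the name ``bootc`` in the ``ironic.hardware.interfaces.deploy`` entry point group. As a small sketch of how such an entry maps a name to a module and class, the snippet below parses the same string with the standard library. It does not show Ironic's own loading machinery, and it only constructs the entry point locally rather than looking it up from an installed package. Registering the entry point does not by itself turn the interface on: as the commit message says, it still has to be enabled by the operator (the ``enabled_deploy_interfaces`` option is assumed here to be the relevant setting).

```python
from importlib.metadata import EntryPoint

# Construct the entry point exactly as written in setup.cfg.
bootc_entry = EntryPoint(
    name='bootc',
    value='ironic.drivers.modules.agent:BootcAgentDeploy',
    group='ironic.hardware.interfaces.deploy')

# The value splits into the module to import and the attribute to load.
module_path = bootc_entry.module
class_name = bootc_entry.attr
```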