From 0a64d51c3d39d197c33e30dbb4f94d2a5db4dc6f Mon Sep 17 00:00:00 2001
From: Ian Wienand <iwienand@redhat.com>
Date: Thu, 30 Mar 2023 09:14:26 +1100
Subject: [PATCH] promote-container-image: add promote_container_image_method

After recent conversations, we've come to the conclusion it will be
good to have two models of promotion:

 - using tags, where gate directly uploads to the final repository and
   promote retags the image.

 - from an intermediate-registry, where upload stores the built image
   in an i-r and the promote step uploads to the final registry.

To facilitate this, we add a "promote_container_image_method" flag to
the promote roles.

The documentation is expanded to explain how all this is intended to
work together.

These roles haven't been publicised yet, but this should be a no-op as
it defaults to tags, which is the current operation.

c.f. Ia24bbd101e01ab371ceacfed006b5ff806418a97

Change-Id: I1c25f60f835b1cab983bcdd169eeffc0e250a56c
---
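
Notes (illustrative only, not part of the commit): a minimal sketch of
how a project might wire up the tag-based promotion flow summarised in
the README below.  The registry host, repository name and tag values
are hypothetical, and referencing the stock jobs directly is an
assumption; a real deployment would normally define child jobs that
carry the registry credentials as secrets.

    - project:
        check:
          jobs:
            - build-container-image:
                vars:
                  # Anchor the image list so every job sees the same data.
                  container_images: &container_images
                    - context: .
                      registry: registry.example.org   # hypothetical
                      repository: example/app          # hypothetical
                      tags: ['latest']
        gate:
          jobs:
            # Speculative upload under a temporary change-ID tag.
            - upload-container-image:
                vars:
                  container_images: *container_images
        promote:
          jobs:
            # Re-tag the gate upload with the final tags.
            - promote-container-image:
                vars:
                  container_images: *container_images
                  promote_container_image_method: tag
        release:
          jobs:
            # Committed code: build and push the final tags directly.
            - upload-container-image:
                vars:
                  container_images: *container_images
                  upload_container_image_promote: false

Since ``tag`` is the default method, the explicit setting in the
promote job is only there to make the chosen mode visible.  A `tag`
pipeline entry would presumably mirror the `release` one.
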
 playbooks/container-image/README.rst          | 135 +++++++++++++++---
 roles/build-container-image/common.rst        |  41 ++++--
 roles/promote-container-image/README.rst      |   9 ++
 roles/promote-container-image/tasks/main.yaml |  38 +----
 .../tasks/promote-from-tag.yaml               |  32 +++++
 5 files changed, 193 insertions(+), 62 deletions(-)
 create mode 100644 roles/promote-container-image/tasks/promote-from-tag.yaml

diff --git a/playbooks/container-image/README.rst b/playbooks/container-image/README.rst
index 5eeea6444..b788e0da2 100644
--- a/playbooks/container-image/README.rst
+++ b/playbooks/container-image/README.rst
@@ -6,29 +6,126 @@ context:
   * :zuul:job:`upload-container-image`: Build and stage the images in a registry.
   * :zuul:job:`promote-container-image`: Promote previously uploaded images.
 
-The :zuul:job:`build-container-image` job is designed to be used in
-a `check` pipeline and simply builds the images to verify that
-the build functions.
-
-The :zuul:job:`upload-container-image` job builds and uploads the
-images to a registry, but only with a single tag corresponding to the
-change ID.  This job is designed in a `gate` pipeline so that the
-build produced by the gate is staged and can later be promoted to
-production if the change is successful.
-
-The :zuul:job:`promote-container-image` job is designed to be
-used in a `promote` pipeline.  It requires no nodes and runs very
-quickly on the Zuul executor.  It simply re-tags a previously uploaded
-image for a change with whatever tags are supplied by
-:zuul:jobvar:`build-container-image.container_images.tags`.
-It also removes the change ID tag from the repository in the registry.
-If any changes fail to merge, this cleanup will not run and those tags
-will need to be deleted manually.
-
+The jobs can work in multiple modes depending on your requirements.
 They all accept the same input data, principally a list of
 dictionaries representing the images to build.  YAML anchors_ can be
 used to supply the same data to all three jobs.
 
+*Promotion via tags*
+
+The :zuul:job:`build-container-image` job runs in the `check` pipeline
+to validate the change.
+
+The :zuul:job:`upload-container-image` job runs in the `gate` pipeline
+and builds and uploads the images to a remote registry, but only with
+a single temporary tag corresponding to the change ID.  This is a
+*speculative* upload; the change is not "live" (the main tag is not
+updated) and other gate jobs may fail and the change may not merge,
+effectively invalidating the upload.
+
+The :zuul:job:`promote-container-image` job runs in a post-merge
+`promote` pipeline.  It requires no nodes and runs very quickly on the
+Zuul executor.  It simply re-tags a previously uploaded image for a
+change with whatever tags are supplied by
+:zuul:jobvar:`build-container-image.container_images.tags` after the
+code has merged.  It also cleans up and removes the change ID tag from
+the repository in the registry.  If any changes fail to merge, this
+cleanup will not run and those tags will need to be deleted manually.
+
+The advantage of this method is that it minimises the window in which
+the published image differs from the merged code.  There are some
+caveats to be aware of. `gate` failures may mean that unused layers
+and tags are present in the remote repository, which need to be
+cleaned up.  Removing registry tags is not a generic option; you will
+need to check the promote role documentation to ensure you are passing
+the right registry details so tags can be cleaned up.
+
+In the `tag` and `release` pipelines there is no need for a
+speculative upload (the tagged/released change is committed code and
+has already passed gate tests).  In this case, the
+:zuul:job:`upload-container-image` job is run with the flag
+``upload_container_image_promote: false`` to directly build and push
+with the final tags.
+
+Summary:
+
+* :zuul:job:`build-container-image` in `check`
+* :zuul:job:`upload-container-image` in `gate`
+* :zuul:job:`promote-container-image` in `promote` with
+  ``promote_container_image_method: tag``
+* :zuul:job:`upload-container-image` with
+  ``upload_container_image_promote: false`` in `tag` and `release`
+
+*Promotion via intermediate registry*
+
+Note that as of 2023-03, this path is not fully implemented.  It is
+documented here for completeness.
+
+The :zuul:job:`build-container-image` job runs in the `check`
+pipeline, but also in the `gate` pipeline.  Usually in both cases the
+job builds and uploads the images to an intermediate registry; at
+least the `gate` pipeline job must do so.
+
+The :zuul:job:`promote-container-image` job is designed to be used in
+a post-merge `promote` pipeline.  It requires no nodes and runs on the
+Zuul executor.  It inspects the artifacts of the gate job to find the
+correct tags to pull from the intermediate registry.  It then uploads
+this image from the intermediate registry to the remote registry with
+the final tags supplied by
+:zuul:jobvar:`build-container-image.container_images.tags`.
+
+In the `tag` and `release` pipelines the
+:zuul:job:`upload-container-image` job is run with the flag
+``upload_container_image_promote: false`` to directly build and push
+with the final tags.
+
+The advantage of this method is that no partial or unused images will
+ever be present in the final repository.  Copying from the
+intermediate registry effectively caches the expensive build process.
+This means that although the window that the production tags are
+out-of-sync with the merged code is larger than when using speculative
+uploads, it is smaller than having to rebuild *and* upload the image.
+Copying is a generic operation, so it should work with any registry.
+The layer upload has more exposure to transient errors than the
+``tag`` promotion step, so needs to be monitored more carefully.  You
+also must manage an external intermediate registry to hold the image
+between upload and promote steps in this model.
+
+Summary:
+
+* :zuul:job:`build-container-image` in `check`
+* :zuul:job:`build-container-image` in `gate`.  This must push to an
+  intermediate registry.
+* :zuul:job:`promote-container-image` in `promote` with
+  ``promote_container_image_method: intermediate-registry``
+* :zuul:job:`upload-container-image` with
+  ``upload_container_image_promote: false`` in `tag` and `release`
+
+*Publish via full release*
+
+The :zuul:job:`build-container-image` job runs in the `check` pipeline
+to validate the change.
+
+The :zuul:job:`build-container-image` job also runs in the `gate`
+pipeline to validate the change before merge.
+
+Once the change has merged, the :zuul:job:`upload-container-image` job
+is run with the flag ``upload_container_image_promote: false`` to
+directly build and push with the final tags.  This is also run in the
+`tag` and `release` pipelines in the same way.
+
+The advantage of this mode is that it requires no external
+dependencies or management of speculative uploads.  The disadvantage
+is that it has the longest window where the published image is
+out-of-sync with the merged code, as the post-merge release process
+must rebuild the entire container and upload it.
+
+* :zuul:job:`build-container-image` in `check`
+* :zuul:job:`build-container-image` in `gate`
+* :zuul:job:`upload-container-image` with
+  ``upload_container_image_promote: false`` after code merge, and
+  `tag` and `release` pipelines.
+
 **Job Variables**
 
 .. zuul:jobvar:: zuul_work_dir
diff --git a/roles/build-container-image/common.rst b/roles/build-container-image/common.rst
index 52583594e..737887001 100644
--- a/roles/build-container-image/common.rst
+++ b/roles/build-container-image/common.rst
@@ -22,9 +22,9 @@ use of subsequent roles to upload the images to a registry.
 The :zuul:role:`upload-container-image` role uploads the images to a
 registry.  It can be used in one of two modes:
 
-1. The default mode is as part of a two-step `promote` pipeline.  This
-   mode is designed to minimize the time the published registry tag is
-   out of sync with the changes Zuul has merged to the underlying code
+1. Using tags as part of a two-step `promote` pipeline.  This mode is
+   designed to minimize the time the published registry tag is out of
+   sync with the changes Zuul has merged to the underlying code
    repository.
 
    In this mode, the role is intended to run in the `gate` pipeline.
@@ -45,13 +45,23 @@ registry.  It can be used in one of two modes:
    to by ``<tag>`` will now reflect the underlying code closing the
    out-of-sync window.
 
-2. The other mode allows for use of this job in a `release` pipeline
-   to directly upload a release build with the final set of tags.
+2. The second mode allows for use of this job in `release` and `tag`
+   pipelines to directly upload a release build with the final set of
+   tags.
 
-   In this mode, the completion of the `gate` jobs will have merged
-   the code changes, and the role will now have to build and upload
-   the resulting image to the remote repository.  Once uploaded, the
-   tags will be updated.
+   In this mode, ``upload_container_image_promote: false`` should be
+   set.  The role will build and upload the resulting image to the
+   remote repository with the final tags.
+
+   This should be used with `tag` and `release` pipelines, where
+   committed code has been tagged for publishing.  The tagged commit
+   is "known good" thanks to gating, so the build and upload process
+   is expected to work unconditionally.
+
+   This can be used in a post-commit pipeline, with the caveat that it
+   has a much longer window where merged code is out of sync with
+   the published image, as the image must be completely rebuilt and
+   uploaded after code merge in the `gate` job.
 
    The alternative `promote` method can be thought of as a
    "speculative" upload.  There is a possibility the `gate` job
@@ -77,9 +87,11 @@ registry.  It can be used in one of two modes:
 *Promoting*
 
 As discussed above, the :zuul:role:`promote-container-image` role is
-designed to be used in a `promote` pipeline.  It re-tags a previously
-uploaded image by copying the temporary change-id based tags made
-during upload to the final production tags supplied by
+designed to be used in a `promote` pipeline.
+
+In ``tag`` mode, it re-tags a previously uploaded image by copying the
+temporary change-id based tags made during upload to the final
+production tags supplied by
 :zuul:rolevar:`build-container-image.container_images.tags`.  It is
 intended to run very quickly and with no dependencies, so it can run
 directly on the Zuul executor.
@@ -90,6 +102,11 @@ the registry, and removes any similar change-ids tags.  This keeps the
 repository tidy in the case that gated changes fail to merge after
 uploading their staged images.
 
+In ``intermediate-registry`` mode, this role queries Zuul to find the
+build performed by the build role in the ``gate``.  It then copies
+this image from the intermediate-registry to the final location in the
+remote registry.
+
 *Dependencies*
 
 Use the :zuul:role:`ensure-skopeo` role as well as the
diff --git a/roles/promote-container-image/README.rst b/roles/promote-container-image/README.rst
index de99c32e8..dc97afd8d 100644
--- a/roles/promote-container-image/README.rst
+++ b/roles/promote-container-image/README.rst
@@ -1,3 +1,12 @@
 Promote one or more previously uploaded container images.
 
 .. include:: ../../roles/build-container-image/common.rst
+
+.. zuul:rolevar:: promote_container_image_method
+   :type: string
+   :default: tag
+
+   If ``tag`` (the default), this role will update tags created by the
+   upload-container-image role.  Set to ``intermediate-registry`` to
+   have this role copy an image that the build-container-image role
+   created and pushed to an intermediate registry.
diff --git a/roles/promote-container-image/tasks/main.yaml b/roles/promote-container-image/tasks/main.yaml
index 2a14f2e15..aea1b0a22 100644
--- a/roles/promote-container-image/tasks/main.yaml
+++ b/roles/promote-container-image/tasks/main.yaml
@@ -1,32 +1,8 @@
-- name: Verify repository names
-  when: |
-    container_registry_credentials is defined
-    and zj_image.registry not in container_registry_credentials
-  loop: "{{ container_images }}"
-  loop_control:
-    loop_var: zj_image
+- name: Promote container image with tags
+  when: promote_container_image_method|default('tag') == 'tag'
+  include_tasks: promote-from-tag.yaml
+
+- name: Promote container image with intermediate registry
+  when: promote_container_image_method|default('tag') == 'intermediate-registry'
   fail:
-    msg: "{{ zj_image.registry }} credentials not found"
-
-- name: Verify repository permission
-  when: |
-    container_registry_credentials[zj_image.registry].repository is defined and
-    not zj_image.repository | regex_search(container_registry_credentials[zj_image.registry].repository)
-  loop: "{{ container_images }}"
-  loop_control:
-    loop_var: zj_image
-  fail:
-    msg: "{{ zj_image.repository }} not permitted by {{ container_registry_credentials[zj_image.registry].repository }}"
-
-- name: Promote image
-  loop: "{{ container_images }}"
-  loop_control:
-    loop_var: zj_image
-  include_tasks: promote-retag.yaml
-
-# The docker roles prune obsolete tags here, but that relies on a
-# timestamp to make sure we're not deleting in-progress tags (that the
-# gate pipeline may be uploading at the same time we're promoting).
-# That timestamp is not available with skopeo list-tags, so some other
-# mechanism will need to be devised to clean them up.  In the
-# meantime, we hope that the cleanup in promote-retag succeeds.
+    msg: 'The intermediate-registry promote role is not yet complete'
diff --git a/roles/promote-container-image/tasks/promote-from-tag.yaml b/roles/promote-container-image/tasks/promote-from-tag.yaml
new file mode 100644
index 000000000..2a14f2e15
--- /dev/null
+++ b/roles/promote-container-image/tasks/promote-from-tag.yaml
@@ -0,0 +1,32 @@
+- name: Verify repository names
+  when: |
+    container_registry_credentials is defined
+    and zj_image.registry not in container_registry_credentials
+  loop: "{{ container_images }}"
+  loop_control:
+    loop_var: zj_image
+  fail:
+    msg: "{{ zj_image.registry }} credentials not found"
+
+- name: Verify repository permission
+  when: |
+    container_registry_credentials[zj_image.registry].repository is defined and
+    not zj_image.repository | regex_search(container_registry_credentials[zj_image.registry].repository)
+  loop: "{{ container_images }}"
+  loop_control:
+    loop_var: zj_image
+  fail:
+    msg: "{{ zj_image.repository }} not permitted by {{ container_registry_credentials[zj_image.registry].repository }}"
+
+- name: Promote image
+  loop: "{{ container_images }}"
+  loop_control:
+    loop_var: zj_image
+  include_tasks: promote-retag.yaml
+
+# The docker roles prune obsolete tags here, but that relies on a
+# timestamp to make sure we're not deleting in-progress tags (that the
+# gate pipeline may be uploading at the same time we're promoting).
+# That timestamp is not available with skopeo list-tags, so some other
+# mechanism will need to be devised to clean them up.  In the
+# meantime, we hope that the cleanup in promote-retag succeeds.