Several things have bit rotted in here that we need to take care of.
First is that we updated the default nodeset to Noble, which breaks our
ability to install Pillow<10 for blockdiag. To fix this we need to
install libjpeg-dev so that we can build a Pillow wheel locally during
testing.
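As a rough sketch, the build dependency could be installed with a task
along these lines (the task name and placement here are illustrative,
not the exact change):

  - name: Install Pillow build dependency
    become: true
    package:
      name: libjpeg-dev
      state: present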
Next, the old ansible-lint doesn't run on Noble's python3.12. We bump
ansible-lint to a modern version that matches Zuul's current default
Ansible. We also stop installing zuul to get zuul_console and
zuul_return and instead simply mock them in the linter. To make this
work we have to drop the ansible-playbook syntax check run, which is
fine because ansible-lint runs this check too, and when done via
ansible-lint the mocked modules are respected [0].
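Mocking the modules amounts to listing them in the lint configuration,
roughly like this (file name and surrounding options are illustrative):

  # .ansible-lint (sketch)
  mock_modules:
    - zuul_return
    - zuul_console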
Finally, we clean up or ignore some of the new linter warnings and
errors.
[0] https://ansible.readthedocs.io/projects/lint/rules/syntax-check/
Change-Id: Ia0e936fefc9e2b0f2fa614c93a2f168e14b2825b
We move the disk utilization debugging into the post-run playbook as it
isn't really a cleanup item. We may adjust this in the future if we find
that zuul isn't running this aggressively enough when test nodes run out
of disk space.
This leaves the ssh key cleanup as the dedicated task in the cleanup
playbook, which is what we want: a small, self-contained cleanup
playbook that actually cleans things up for subsequent runs.
Change-Id: I0bfbda8e04f43fb5df3475d52d59b1b9ba651037
Zuul 11.0.0 deprecated the cleanup-run attribute for jobs and folded it
into post-run with a special flag on the playbook. This change converts
to the new format to clean up a deprecation warning. We also take
advantage of the new behavior to rearrange when things run relative to
log collection so that we can log the stats collection in the cleanup
run.
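The new format flags a post-run playbook as a cleanup playbook, roughly
like this (job name and playbook paths are illustrative):

  - job:
      name: base-test
      post-run:
        - playbooks/base/post.yaml
        - name: playbooks/base/cleanup.yaml
          cleanup: true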
Finally, to have the cleanup playbook execute successfully, we move the
ssh key removal into the cleanup playbook instead of the post playbook.
Otherwise the keys won't be available for ssh during the cleanup
playbook. This matches the zuul-base-jobs examples more closely.
Note that we only make these changes on base-test to start in order to
check things before we affect all jobs. The follow-up change modifies
the other base jobs and will be landed after this one checks out ok.
Change-Id: I639691f12adc8a5dcebdbe23693765e42548aaaf
This adds timeouts to cleanup playbook tasks so that the entire job
does not stall if they have problems running. We noticed that df in
particular can hang on broken NFS mounts, and if that happens the job's
cleanup playbook does not end (even after a couple of days).
This change adds the timeouts to the testing playbook. If testing shows
this works well we will add it to the production playbooks too.
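The timeouts use Ansible's per-task timeout keyword, roughly like this
(the task shown is illustrative and assumes ansible-core >= 2.10):

  - name: Collect disk usage
    command: df -h
    timeout: 60        # seconds; fail the task instead of hanging
    failed_when: false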
Change-Id: Ibc875bf99e6da29e2fffbddae8590ecde06b5c3b
To reduce noise in logs and job overhead, only collect last effort
debug info when jobs fail. This should make jobs run more quickly and
keep logs cleaner.
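In practice this means gating the debug tasks on the job result,
something like the following sketch (exact variable handling may
differ):

  - name: Collect last effort debug info
    raw: df -h; ip addr
    failed_when: false
    when: not (zuul_success | default(true) | bool)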
Change-Id: Id62986e5801ddbb46665985766f93a34b462c321
The way zuul logs things, these prints are not necessary to see the
return values. They were necessary in my local testing, but zuul is
smarter than me.
Remove the prints to avoid duplicate logging.
Change-Id: I845c273f4c1c618bfe61488d416d82bf8177f41c
This is the first change to add a cleanup phase to our jobs. This
cleanup phase will use the raw module to get disk and networking info
from hosts. This info can be useful if the tests fail due to
"networking" problems when full ansible modules can't be bootstrapped.
I tested this usage of the raw module against a host with a full root
disk so it should work just fine.
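The tasks boil down to raw command invocations along these lines (the
commands shown are illustrative of the sort of info collected):

  - name: Collect disk and network state without full Ansible
    raw: df -h && ip addr && ip route
    failed_when: false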
Adding to base-test to test it in general usage before adding to base
and base-minimal.
Change-Id: I0c1569d3ca699cbd6be035f987e90805863eb6b2