This change removes review02 from our inventory and configuration
management. This should be landed after we're confident we're unlikely
to need to roll back to that server. That said, if we do roll back before the server is cleaned up, reverting this change isn't too bad.
Change-Id: Ica14ae92c4c1ef6db76acef93d6d65977aab4def
This is a new Noble mirror that will replace the old mirror. We update the inventory test cases to stop matching the old mirror, since it will eventually be removed from the inventory. Otherwise, this is a pretty standard mirror replacement.
Depends-On: https://review.opendev.org/c/opendev/zone-opendev.org/+/945230
Change-Id: Ib18d834e16ebeec75fb7f16e1dc83b357efb646c
Currently jobs have several tracebacks like [1]. Treat tzdata as if it were listed as an ARA requirement.
[1] Grabbed from:
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_22b/928656/1/gate/system-config-run-etherpad/22bc1fd/job-output.txt
Using /etc/ansible/ansible.cfg as config file
Operations to perform:
Apply all migrations: admin, api, auth, contenttypes, db, sessions
Running migrations:
No migrations to apply.
Traceback (most recent call last):
File "/usr/lib/python3.10/zoneinfo/_common.py", line 12, in load_tzdata
return importlib.resources.open_binary(package_name, resource_name)
File "/usr/lib/python3.10/importlib/resources.py", line 43, in open_binary
package = _common.get_package(package)
File "/usr/lib/python3.10/importlib/_common.py", line 66, in get_package
resolved = resolve(package)
File "/usr/lib/python3.10/importlib/_common.py", line 57, in resolve
return cand if isinstance(cand, types.ModuleType) else importlib.import_module(cand)
File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'tzdata'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/ansible-venv/lib/python3.10/site-packages/django/core/handlers/exception.py", line 55, in inner
response = get_response(request)
File "/usr/ansible-venv/lib/python3.10/site-packages/django/core/handlers/base.py", line 197, in _get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/usr/ansible-venv/lib/python3.10/site-packages/django/views/decorators/csrf.py", line 56, in wrapper_view
return view_func(*args, **kwargs)
File "/usr/ansible-venv/lib/python3.10/site-packages/rest_framework/viewsets.py", line 124, in view
return self.dispatch(request, *args, **kwargs)
File "/usr/ansible-venv/lib/python3.10/site-packages/rest_framework/views.py", line 509, in dispatch
response = self.handle_exception(exc)
File "/usr/ansible-venv/lib/python3.10/site-packages/rest_framework/views.py", line 469, in handle_exception
self.raise_uncaught_exception(exc)
File "/usr/ansible-venv/lib/python3.10/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
raise exc
File "/usr/ansible-venv/lib/python3.10/site-packages/rest_framework/views.py", line 506, in dispatch
response = handler(request, *args, **kwargs)
File "/usr/ansible-venv/lib/python3.10/site-packages/rest_framework/mixins.py", line 18, in create
serializer.is_valid(raise_exception=True)
File "/usr/ansible-venv/lib/python3.10/site-packages/rest_framework/serializers.py", line 223, in is_valid
self._validated_data = self.run_validation(self.initial_data)
File "/usr/ansible-venv/lib/python3.10/site-packages/rest_framework/serializers.py", line 442, in run_validation
value = self.to_internal_value(data)
File "/usr/ansible-venv/lib/python3.10/site-packages/rest_framework/serializers.py", line 499, in to_internal_value
validated_value = field.run_validation(primitive_value)
File "/usr/ansible-venv/lib/python3.10/site-packages/rest_framework/fields.py", line 538, in run_validation
value = self.to_internal_value(data)
File "/usr/ansible-venv/lib/python3.10/site-packages/rest_framework/fields.py", line 1190, in to_internal_value
return self.enforce_timezone(parsed)
File "/usr/ansible-venv/lib/python3.10/site-packages/rest_framework/fields.py", line 1150, in enforce_timezone
field_timezone = self.timezone if hasattr(self, 'timezone') else self.default_timezone()
File "/usr/ansible-venv/lib/python3.10/site-packages/rest_framework/fields.py", line 1174, in default_timezone
return timezone.get_current_timezone() if settings.USE_TZ else None
File "/usr/ansible-venv/lib/python3.10/site-packages/django/utils/timezone.py", line 96, in get_current_timezone
return getattr(_active, "value", get_default_timezone())
File "/usr/ansible-venv/lib/python3.10/site-packages/django/utils/timezone.py", line 82, in get_default_timezone
return zoneinfo.ZoneInfo(settings.TIME_ZONE)
File "/usr/lib/python3.10/zoneinfo/_common.py", line 24, in load_tzdata
raise ZoneInfoNotFoundError(f"No time zone found with key {key}")
zoneinfo._common.ZoneInfoNotFoundError: 'No time zone found with key UTC'
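For context on the fix: installing tzdata into the same venv that ARA runs from is enough for Python's zoneinfo to resolve "UTC". A minimal sketch, using the venv path from the traceback above; the task placement is illustrative, not the literal diff:
  # Sketch: treat tzdata as an ARA requirement so zoneinfo can find
  # time zone data inside the venv; path and placement illustrative.
  - name: Install ARA and the tzdata it effectively requires
    pip:
      name:
        - ara
        - tzdata
      virtualenv: /usr/ansible-venv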
Change-Id: Ib8923a306b7e15d7ae4a6f4001f46906a454abd0
This removes ansible configuration for the linaro cloud itself and the
linaro cloud mirror. This cloud is in the process of going away and
having these nodes in our inventory is creating base job failures due to unreachable nodes. That then dominoes into the LE refresh job not running, and now some certs are not getting renewed. Clean this all up so
that the rest of our systems are happy.
Note that we don't fully clean up the idea of an unmanaged group as
there may be other locations we want to do something similar (OpenMetal
perhaps?). We also don't remove the openstack clouds.yaml entries for
the linaro cloud yet. It isn't entirely clear when things will go
offline, but it may be as late as August 10 so we keep those credentials
around as they may be useful until then.
Change-Id: Idd6b455de8da2aa9901bf989b1d131f1f4533420
Rackspace is requiring multi-factor authentication for all users
beginning 2024-03-26. Enabling MFA on our accounts will immediately
render password-based authentication inoperable for the API. In
preparation for this switch, add new cloud entries for the provider
which authenticate by API key so that we can test and move more
smoothly between the two while we work out any unanticipated kinks.
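For reference, the new entries are along the lines of the sketch below; the cloud name, regions and auth plugin name are assumptions, and the real credentials live in our private configuration:
  # Hypothetical clouds.yaml addition alongside the existing
  # password-based entries; values are placeholders.
  clouds:
    openstackci-rax-apikey:
      profile: rackspace
      auth_type: rackspace_apikey
      auth:
        username: <account-user>
        api_key: <account-api-key>
      regions:
        - DFW
        - ORD
        - IAD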
Change-Id: I787df458aa048ad80e246128085b252bb5888285
There is an Ansible bug where, if successive tasks are separated in time by the ssh controlpersist timeout, Ansible will race ssh's updates to the connection, causing the second task to fail with an rc of -13 [0]. Statistically, we believe that longer gaps between tasks are less likely, which means larger ssh controlpersist timeout values make Ansible less likely to hit this bug. Increase the value from the default of 60s to 180s to take advantage of this probability.
[0] https://github.com/ansible/ansible/issues/81777
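For illustration only, if this were carried as an Ansible ssh option rather than wherever our ansible.cfg actually sets it, the change amounts to something like:
  # Hypothetical group_vars sketch: raise ControlPersist from the 60s
  # default to 180s for Ansible-managed ssh connections.
  ansible_ssh_common_args: >-
    -o ControlMaster=auto
    -o ControlPersist=180s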
Change-Id: Ic40730c3e0bd814e6a5c739e4415657594362032
This reverts commit a77eebe911b9651575c32dec8cb5ac84e4057192.
Ruamel.yaml 0.18.2 converted the error associated with the use of this deprecated method from a sys.exit(1) to a raised exception. It is believed that this will allow ARA to run in some capacity and that we don't need to pin this dependency anymore.
More details in the upstream bug here:
https://github.com/ansible-community/ara/issues/524
Change-Id: I694b8a016755d828490f0bcf4c6ceb812edf43d9
ARA is not compatible with the latest ruamel.yaml, which leads to errors running Ansible. Fix this by capping the ruamel.yaml version we install.
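The cap itself is just a version specifier wherever we install ARA's dependencies; a sketch, with the upper bound and venv path being assumptions rather than the verified values:
  # Sketch: keep ruamel.yaml below the release that breaks ARA.
  - name: Cap ruamel.yaml until ARA supports newer releases
    pip:
      name: ruamel.yaml<0.18
      virtualenv: /usr/ansible-venv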
Change-Id: Ia5db3ba8579e7e5c1fe375b156323b94f341ad3e
Zuul has already made the move; we should catch up. Part of this is motivated by the weird failures we've seen when creating the LE certcheck domains list in an Ansible loop, though I've no real evidence that upgrading would fix this. Python on bridge is 3.10, which should be compatible with Ansible 8.
Full (and probably far too dense) changelogs can be found here:
https://github.com/ansible-community/ansible-build-data/blob/main/8/CHANGELOG-v8.rst
A prior patchset temporarily updated zuul configs to run most of our system-config-run-* jobs using Ansible 8. They all passed, implying that our playbooks and roles will function under the newer version of Ansible.
Change-Id: Ie1b4e5363c56c0dcd61721fb0ea061d5198ecfed
This uncomments the list additions for the lists.airshipit.org and
lists.katacontainers.io sites on the new mailman server, removing
the configuration for them from the lists.opendev.org server and, in
the case of the latter, removing all our configuration management
for the server as it was the only site hosted there.
Change-Id: Ic1c735469583e922313797f709182f960e691efc
Switch the DNS testing names to "99", which helps disambiguate testing from production and makes you think harder about ensuring references are abstracted properly.
The LE zone gets installed on the hidden primary, so it should just
use the inventory_hostname rather than hard-coding. Instead of
hard-coding the secondaries, we grab them from the secondary DNS
group. This should allow us to start up replacement DNS servers which
will be inactive until they are enabled for the domain.
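In template terms that looks roughly like the following; the hostvar name is an assumption, and the real zone/named configuration is more involved:
  # Sketch: derive the secondary nameserver addresses from the
  # inventory group instead of a hard-coded host list.
  adns_secondary_addresses: "{{ groups['adns-secondary'] | map('extract', hostvars, 'public_v4') | list }}"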
This requires an update to the LE job, as it currently doesn't have a
secondary nameserver as part of the nodes. This means the
"adns-secondary" group is blank there. Even though this node isn't
doing anything, I think it's worth adding to cover this path (I did
consider some sort of dummy host add type thing, but that just makes
things hard to follow). We also use the 99 suffix in that job just
for consistency.
Change-Id: I1a4be41b70180deab51a3cc8a2b3e83ffd0ff1dc
Firstly, my understanding of "adns" is that it's short for
authoritative-dns; i.e. things related to our main non-recursive DNS
servers for the zones we manage. The "a" is useful to distinguish
this from any sort of other dns services we might run for CI, etc.
The way we do this is with a "hidden" server that applies updates from
config management, which then notifies the secondary public servers, which do a zone transfer from the primary. They're all "authoritative" in
the sense they're not for general recursive queries.
As mentioned in Ibd8063e92ad7ff9ee683dcc7dfcc115a0b19dcaa, we
currently have three groups:
adns : the hidden primary bind server
ns : the secondary public authoritative servers
dns : both of the above
This proposes a refactor into the following three groups:
adns-primary : hidden primary bind server
adns-secondary : the secondary public authoritative servers
adns : both of the above
This is meant to be a no-op; I just feel like this makes the "lay of the land" with these servers a bit clearer. It will need some consideration of the hiera variables on bridge if we merge.
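For illustration, expressed as a plain YAML inventory (our real group definitions use our own inventory plugin and host patterns, so treat the hosts here as examples only):
  adns-primary:
    hosts:
      adns1.opendev.org:
  adns-secondary:
    hosts:
      ns1.opendev.org:
      ns2.opendev.org:
  adns:
    children:
      adns-primary:
      adns-secondary: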
Change-Id: I9ffef52f27bd23ceeec07fe0f45f9fee08b5559a
All references to this cloud have been removed from nodepool, so we
can now remove nb03 and the mirror node.
Change-Id: I4d97f7bbb6392656017b1774b413b58bdb797323
I'm not sure if this is clearer or not (which is why I proposed it
separately here).
From inspection of the code, adding "state: latest" just means Ansible runs "install -U" ... which is pretty much the same thing as adding --upgrade. Which is clearer? I'm not sure.
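Side by side, the two spellings being weighed are roughly as follows (package name and venv path are illustrative):
  # Option 1: rely on the pip module's state handling.
  - name: Install Ansible into the venv
    pip:
      name: ansible
      virtualenv: /usr/ansible-venv
      state: latest

  # Option 2: pass pip's own flag explicitly.
  - name: Install Ansible into the venv
    pip:
      name: ansible
      virtualenv: /usr/ansible-venv
      extra_args: --upgrade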
Change-Id: I6e31523686555e33d062f3b05f2385d7e21e2620
In reviews for I3696740112fa691d1700040b557f53f6721393e7 clarkb
correctly pointed out that a constraint like ansible<8 will never
result in the production venv being updated.
The point of having the requirements.txt was to avoid a full update
run on the venv on every one of its frequent runs.
A good in-between seems to be writing out the current day's timestamp into the requirements file. Since the template: return value is based on comparing the hash of the old/new content (we suspected this, but I also double-checked with a local test), this results in the template being updated just once a day. Ergo we will run an upgrade (--upgrade) pass on the ansible-venv just once a day.
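Sketched below with copy so the file body is visible inline; the real change just adds a line to the existing requirements template, and this assumes facts are gathered so ansible_date_time is available:
  # Hypothetical sketch: a date stamp makes the rendered file's hash
  # change once per day, so the task reports "changed" (and the
  # follow-on upgrade runs) at most daily.
  - name: Write ansible-venv requirements
    copy:
      dest: /usr/ansible-venv/requirements.txt
      content: |
        # Regenerated on {{ ansible_date_time.date }}
        ansible<8
        openstacksdk
        ara
    register: venv_requirements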
Change-Id: I78a914f71cef687f09fcfee0f3f498b79d810f5d
Change I4789fe99651597b073e35066ec3be312e18659b8 made me realise that
with the extant code, nothing will update the /usr/ansible-env
environment when we bump the versions.
The installation of the Ansible, openstacksdk and ARA packages as part
of the "install-ansible" role was done this way to facilitate being
able to install all three of these from their main/master/devel
branches for the "-devel" job, which is our basic canary for upstream
things that might affect us. Because of the way the pip: module works with "state: latest" and mixing on-disk paths with PyPI package names, this became a bit of a complex swizzling operation.
Some things have changed since then; in particular we now use a separate venv, and upstream Ansible has changed to use "collections", so pulling in a bug-fix for Ansible is not as simple as just cloning github.com/ansible/ansible at a particular tag any more. This means we should reconsider how we're specifying the packages here.
This simplifies things by listing the required packages in a requirements.txt file, which we install into the venv root. The nice thing about this is that creating requirements.txt with the template: module is idempotent, so we can essentially monitor the file for changes and only (re-)run the pip install into /usr/ansible-env when we change versions (forcing upgrades so we get the versions we want, and fixing the original issue mentioned above).
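The shape of the role change is roughly the following; file paths are illustrative and the actual versions live in the rendered requirements.txt:
  # Sketch: render requirements.txt idempotently and only touch the
  # venv when that file actually changes.
  - name: Write ansible-env requirements
    template:
      src: requirements.txt.j2
      dest: /usr/ansible-env/requirements.txt
    register: venv_requirements

  - name: Install/upgrade packages in the Ansible venv
    pip:
      requirements: /usr/ansible-env/requirements.txt
      virtualenv: /usr/ansible-env
      extra_args: --upgrade
    when: venv_requirements is changed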
Change-Id: I3696740112fa691d1700040b557f53f6721393e7
As a follow-on to Iaf0dd577cf1bdc0c9464b7413d22eec9db37a640, also install the python dev packages so that python things can build.
Change-Id: I99cde1a93671da500d3013b5eb6ba4f3509e646f
Deployment to the new Jammy bridge host is failing because it can't
build netifaces for Python 3.10. Upstream doesn't have a wheel --
this must not fail in the gate because we set up the testing bridge node to use our wheel cache.
We should unconditionally install this for maximum flexibility when
deploying fresh hosts.
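A sketch of the kind of task involved, assuming Debian/Ubuntu package names; the exact list is whatever the role settles on:
  # Make sure we can build sdist-only Python packages such as
  # netifaces on fresh hosts.
  - name: Install Python build dependencies
    apt:
      name:
        - python3-dev
        - build-essential
      state: present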
Change-Id: Iaf0dd577cf1bdc0c9464b7413d22eec9db37a640
The pip3 role installs the latest upstream pip, overwriting the
packaged versions. We would prefer to install things in
venv/virtualenvs moving forward to keep better isolation.
Unfortunately, thanks to the passage of time, the Bionic-era packaged pip is so old that it can't install anything modern like Ansible. Thus we have to squash installing Ansible into a separate venv into this change as well.
Although the venv created by default on the Bionic host also has an
old pip, luckily we already worked around that in
I81fd268a9354685496a75e33a6f038a32b686352 which provides a create-venv
role that creates a fully updated venv for us.
To minimise other changes, this symlinks ansible/ansible-playbook into
/usr/local/bin. On our current production bastion host this will make
a bit of a mess -- but we are looking at replacing that with a fresh
system soon. The idea is that this new system will not be
bootstrapped with a globally installed Ansible, so we won't have
things lying around in multiple places.
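In rough terms the new arrangement is the sketch below; the create-venv role is real, but its variable name, the venv path and the link list are assumptions:
  - name: Create the Ansible venv
    include_role:
      name: create-venv
    vars:
      create_venv_path: /usr/ansible-venv  # variable name assumed

  - name: Install Ansible into the venv
    pip:
      name: ansible
      virtualenv: /usr/ansible-venv

  - name: Symlink the ansible commands into /usr/local/bin
    file:
      src: '/usr/ansible-venv/bin/{{ item }}'
      dest: '/usr/local/bin/{{ item }}'
      state: link
    loop:
      - ansible
      - ansible-playbook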
Change-Id: I7551eb92bb6dc5918c367cc347f046ff562eab0c
The current version of ARA doesn't depend on the "ansible" package any
more, so we don't need this fake stub package installed to fool it.
Remove this workaround.
Change-Id: I9330a22650204464b0b677fca06deec494731641
This was added with 3cd8cd0765b447049fe57e75c6aa5d0d5c980873 but
ansible-core is a package now. We can remove this workaround.
Change-Id: Ide1f3bbfe8887315a9f574bb1c19bf3234f58686
We indicated to the OpenStack TC that this service would be going away
after the Yoga cycle if no one stepped up to start maintaining it. That
help didn't arrive in the form of OpenDev assistance (there is effort
to use OpenSearch external to OpenDev) and Yoga has released. This means we are now clear to retire and shut down this service.
This change attempts to remove our configuration management for these services so that we can shut down the servers afterwards. It was a good
run. Sad to see it go but it wasn't sustainable anymore.
Note a follow-up will clean up elastic-recheck which runs on the status
server.
Depends-On: https://review.opendev.org/c/opendev/base-jobs/+/837619
Change-Id: I5f7f73affe7b97c74680d182e68eb4bfebbe23e1
We're going to want Mailman 3 served over HTTPS for security
reasons, so start by generating certificates for each of the sites
we have in v2. Also collect the acme.sh logs for verification.
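In practice this means adding per-site entries to the new server's host vars, roughly as below; the variable name and cert grouping are assumptions about how our letsencrypt roles are driven, and the site list is only an example:
  letsencrypt_certs:
    lists-opendev-org:
      - lists.opendev.org
    lists-airshipit-org:
      - lists.airshipit.org
    lists-katacontainers-io:
      - lists.katacontainers.io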
Change-Id: I261ae55c6bc0a414beb473abcb30f9a86c63db85
Having two groups here was confusing. We seem to use the review group
for most ansible stuff so we prefer that one. We move contents of the
gerrit group_vars into the review group_vars and then clean up the use
of the old group vars file.
Change-Id: I7fa7467f703f5cec075e8e60472868c60ac031f7
Start backing up the new review server. Stop backing up the old
server. Fix the group matching test for the new server.
Change-Id: I8d84b80099d5c4ff7630aca9df312eb388665b86
This moves review02 out of the review-staging group and into the main
review group. At this point, review01.openstack.org is inactive so we
can remove all references to openstack.org from the groups. We update
the system-config job to run against a focal production server, and
remove the unneeded rsync setup used to move data.
This additionally enables replication; this should be a no-op when
applied as part of the transition process is to manually apply this,
so that DNS setup can pull zone changes from opendev.org.
It also switches to the mysql connector; as noted inline, we found some issues with mariadb.
Note backups follow in a separate step to avoid doing too much at
once, hence dropping the backup group from the testing list.
Change-Id: I7ee3e3051ea8f3237fd5f6bf1dcc3e5996c16d10
ARA's master branch now has static site generation, so we can move
away from the stable branch and get the new reports.
In the meantime ARA upstream has moved to GitHub, so this updates the references for the -devel job.
Depends-On: https://review.opendev.org/c/openstack/project-config/+/793530
Change-Id: I008b35562994f1205a4f66e53f93b9885a6b8754
This converts our existing puppeted mailman configuration into a set of
ansible roles and a new playbook. We don't try to do anything new and
instead do our best to map from puppet to ansible as closely as
possible. This helps reduce churn and will help us find problems more
quickly if they happen.
Followups will further cleanup the puppetry.
Change-Id: If8cdb1164c9000438d1977d8965a92ca8eebe4df
We will be rotating zk01-03.openstack.org out and replacing them with
zk04-06.opendev.org. This is the first change in that process which puts
zk04 into the rotation. This should only be landed when operators are
ready to manually stop zookeeper on zk03 (which is being replaced by
zk04 in this change).
Change-Id: Iea69130f6b3b2c8e54e3938c60e4a3295601c46f
Once we are satisfied that we have disabled the inputs to firehose we
can land this change to stop managing it in config management. Once that
is complete the server can be removed.
Change-Id: I7ebd54f566f8d6f940a921b38139b54a9c4569d8
We duplicate the KDC settings over all our kerberos clients. Add
clients to a "kerberos-client" group and set the variables in a group
file.
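That is, something along the lines of the following group file; the variable names, realm and KDC hosts here are illustrative, not the literal values:
  # group_vars/kerberos-client.yaml (sketch)
  kerberos_realm: OPENSTACK.ORG
  kerberos_admin_server: kdc01.openstack.org
  kerberos_kdcs:
    - kdc01.openstack.org
    - kdc04.openstack.org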
Change-Id: I25ed5f8c68065060205dfbb634c6558488003a38
These are new focal replacement servers. Because this is the last set of
replacements for the executors, we also clean up the testing of the old
servers in the system-config-run-zuul job and the inventory group
checker job.
Change-Id: I111d42c9dfd6488ef69ff1a7f76062a73d1f37bf
We have identified an issue with stevedore < 3.3.0 where the cloud-launcher, running under Ansible, makes stevedore hash a /tmp path into an entry-point cache file it creates, causing a never-ending expansion of cache files.
This appears to be fixed by [1] which is available in 3.3.0. Ensure
we install this on bridge. For good measure, add a ".disable" file as
we don't really need caches here.
There are currently 491,089 leaked files, so I didn't think it wise to delete these in an Ansible loop as it would probably time out the job. We can do this manually once we stop creating them :)
[1] d7cfadbb7d
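Roughly, the bridge-side pieces look like the sketch below; the cache directory location is an assumption about where stevedore keeps its entry-point cache:
  - name: Ensure the fixed stevedore is installed on bridge
    pip:
      name: stevedore>=3.3.0

  # Belt and braces: stevedore skips writing cache files when a
  # ".disable" flag file exists in its cache directory.
  - name: Create the stevedore cache directory
    file:
      path: /root/.cache/python-entrypoints
      state: directory
      mode: '0755'

  - name: Disable stevedore entry-point caching
    file:
      path: /root/.cache/python-entrypoints/.disable
      state: touch
      modification_time: preserve
      access_time: preserve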
Change-Id: If5773613f953f64941a1d8cc779e893e0b2dd516
This server has been replaced by ze01.opendev.org running Focal. Let's remove the old ze01.openstack.org from inventory so that we can delete
the server. We will follow this up with a rotation of new focal servers
being put in place.
This also renames the xenial executor in testing to ze12.openstack.org
as that will be the last one to be rotated out in production. We will
remove it from testing at that point as well.
We also remove a completely unused zuul-executor-opendev.yaml group_vars
file to avoid confusion.
Change-Id: Ida9c9a5a11578d32a6de2434a41b5d3c54fb7e0c
This is a focal replacement for ze01.openstack.org. Cleanup for
ze01.openstack.org will happen in a followup when we are happy with the
results of running zuul-executor on focal.
Change-Id: If1fef88e2f4778c6e6fbae6b4a5e7621694b64c5
All hosts are now running their backups via borg to servers in vexxhost and rax.ord.
For reference, the servers being backed up at this time are:
borg-ask01
borg-ethercalc02
borg-etherpad01
borg-gitea01
borg-lists
borg-review-dev01
borg-review01
borg-storyboard01
borg-translate01
borg-wiki-update-test
borg-zuul01
This removes the old bup backup hosts, the no-longer-used ansible roles for the bup backup server and client, and any remaining bup-related configuration.
For simplicity, we will remove any remaining bup cron jobs on the
above servers manually after this merges.
Change-Id: I32554ca857a81ae8a250ce082421a7ede460ea3c
Both the fileservers and db servers have common key material deployed
by the openafs-server-config role. Put both types of server in a new
group "afs-server-common" so we can define this key material in just
one group file on bridge.
Then separate out the two into afs-<file|db>-server groups for
consistent naming.
Rename afs-admin for consistent naming.
The service file is updated to reflect the new groups.
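Expressed as a plain YAML inventory, the intended relationships are roughly as follows; host names are examples only and the real definitions live in our groups file:
  afs-file-server:
    hosts:
      afs01.dfw.openstack.org:
  afs-db-server:
    hosts:
      afsdb01.openstack.org:
      afsdb02.openstack.org:
  afs-server-common:
    children:
      afs-file-server:
      afs-db-server: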
Change-Id: Ifa5f251fdfb8de737ad2ed96491d45294ce23a0c
With all AFS file-servers upgraded to 1.8, we can move afs01.dfw back
and rename the group to just "afs".
Change-Id: Ib31bde124e01cd07d6ff7eb31679c55728b95222
As described inline, installing ansible from source now installs the
"ansible-core" package, instead of "ansible-base". Since they can't
live together nicely, we have to do a manual override for the devel
job.
Change-Id: I1299ea330e6de048b661fc087f016491758631c7
Backups have been going well on ethercalc02, so add borg backup runs
to all backed-up servers. Port in some additional excludes for Zuul
and slightly modify the /var/ matching.
Change-Id: Ic3adfd162fa9bedd84402e3c25b5c1bebb21f3cb
This wasn't quite fixed right when these were moved into
project-config. Get the projects and install them.
Change-Id: I0f854609fc9aebffc1fa2a2e14d5231cce9b71d0