Ubuntu Noble ships with an enforcing rsyslogd apparmor profile. This
profile prevents our haproxy container from opening the syslog socket we
bind mount into the container. I discussed this in #ubuntu-security
which resulted in this issue:
https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/2098148
which includes many details on what is going on. This change implements
the suggested workaround for our haproxy nodes. I believe this is the
only place we are currently attempting to directly access rsyslog
sockets from within containers.
The tl;dr on the fix is that we have to set the attach_disconnected
flag on the rsyslogd profile because the container runs in a different
filesystem namespace, which disconnects the path for the socket.
Unfortunately sarnold indicates that we have to edit the primary profile
configuration file, as this flag applies to the top level of the
profile; we cannot use one of the files this profile #includes.
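For reference, the workaround amounts to adding the attach_disconnected
flag at the top level of the rsyslogd profile. A sketch (the exact path
and any preexisting flags on Noble may differ; see the launchpad bug for
authoritative details):

```
# /etc/apparmor.d/usr.sbin.rsyslogd (sketch)
profile rsyslogd /usr/sbin/rsyslogd flags=(attach_disconnected) {
  # ... existing rules and #include lines unchanged ...
}
```

After editing, the profile can be reloaded with
`apparmor_parser -r /etc/apparmor.d/usr.sbin.rsyslogd`.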
Change-Id: I4e09211a1bdc4dfbf3012a66e79c181c6fb957a4
The old install-docker upstream.yaml tasks installed apparmor for
docker (it was originally a dependency, but docker later dropped the
explicit dependency while still relying on apparmor, so we manually
installed it). When we started deploying Noble nodes with
podman via the install-docker role we didn't get apparmor because podman
doesn't appear to depend on it. However when we got to production the
production images already come with apparmor which includes profiles for
things like podman and rsyslog which have caused problems for us
deploying services with podman.
Attempt to catch these issues in CI by explicitly installing apparmor.
This should be a noop for production because apparmor is already
installed. This should help us catch problems with podman in CI before
we ever get to production.
To ensure that apparmor is working properly we capture apparmor_status
output as part of our system-config-run job log collection.
Note we remove the zuul lb test for haproxy.log being present as current
apparmor problems with the rsyslogd profile prevent that from occurring
on noble. The next change will correct that issue and reinstate the
test case.
Change-Id: Iea5966dbb2dcfbe1e51d9c00bad67a9d37e1b7e1
Rsyslog on Noble has apparmor rules that restrict rsyslog socket
creation to /var/lib/*/dev/log. Previously we were configuring haproxy
hosts to create an rsyslog socket for haproxy at /var/haproxy/dev/log
which doesn't match the apparmor rule so gets denied.
To address this we move all the host side haproxy config from
/var/haproxy to /var/lib/haproxy. This allows rsyslog to create the
socket. To avoid needing to update docker images (for haproxy statsd)
and to keep the haproxy container itself happy, we don't adjust paths
on the target side of our bind mounts. This means some
things still refer to /var/haproxy but they should all be within
containers.
I don't believe this will impact existing load balancer servers. We
should deploy new content to /var/lib/haproxy then
automatically restart services (rsyslog and haproxy container) because
their configs are updating. One potential problem with this is rsyslog
will restart before the containers do and its log path will have moved.
If we are concerned about this we can configure rsyslog to continue to
attempt to create the old path in addition to the new path (this will
fail on Noble).
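A sketch of the host-side rsyslog input after the move (the exact
statement lives in our role templates; parameter values here are
illustrative):

```
# rsyslog imuxsock input creating the socket that gets bind mounted
# into the container as /var/haproxy/dev/log
module(load="imuxsock")
input(type="imuxsock" Socket="/var/lib/haproxy/dev/log" CreatePath="on")
```

The container-side path stays /var/haproxy/dev/log; only the host side
of the bind mount moves, which is what satisfies the
/var/lib/*/dev/log apparmor rule.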
Change-Id: I4582e6b2dda188583f76265ab78bcb00a302e375
Podman on Ubuntu Noble has apparmor config that prevents SIGHUP from
being delivered via `podman kill -s HUP` or `docker compose kill -s
HUP`. Attempting to do so results in:
kernel: audit: type=1400 audit(1739232042.996:129): apparmor="DENIED" operation="signal" class="signal" profile="containers-default-0.57.4-apparmor1" pid=17067 comm="runc" requested_mask="receive" denied_mask="receive" signal=hup peer="podman"
This appears to be due to issues with the apparmor configuration that
was edited to make other signals work:
https://bugs.launchpad.net/ubuntu/+source/libpod/+bug/2040483
We work around that by using kill to issue the signal instead which
seems to work based on some manual testing.
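The workaround is roughly the following (the container name and the use
of podman inspect to resolve the pid are illustrative; the actual tasks
may do this differently):

```shell
# Deliver SIGHUP directly to the container's init process, bypassing
# `podman kill -s HUP` which apparmor denies on Noble.
# "haproxy" is a hypothetical container name.
pid=$(podman inspect --format '{{.State.Pid}}' haproxy)
kill -HUP "$pid"
```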
Change-Id: I49435fdda662e25c7192faf24e0ae4b527e943b9
This is a new Noble server to replace the existing zuul-lb01 server. As
part of this transition we switch to podman as the container runtime
and docker compose replaces docker-compose. This requires a
small update to testing to check the new container name.
The Depends-On isn't strictly necessary but seems like good hygiene to
deploy a server with DNS records in place.
Depends-On: https://review.opendev.org/c/opendev/zone-opendev.org/+/941146
Change-Id: I2bb74809b00d4a554a26601c46a2aa4c3c75d4f1
Currently codesearch uses syslog logging with docker but podman
doesn't support syslog. Podman does support journald which is basically
equivalent for us since we have journald log to syslog too. Update for
podman compatibility in preparation for upgrades to Noble.
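In docker-compose terms the change is just the logging driver (service
layout here is illustrative):

```yaml
services:
  codesearch:
    logging:
      # journald works under both docker and podman; the syslog driver
      # is not supported by podman
      driver: journald
```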
Change-Id: Id7da6b70faad9521da6a39eaa9543b97c0136d58
We are upgrading from 1.23.1 to 1.23.3. Both 1.23.2 and 1.23.3 are
bugfix releases but 1.23.2 includes a breaking change to webhooks. We
don't use webhooks so this shouldn't affect us. Complete changelog can
be found here:
https://github.com/go-gitea/gitea/blob/v1.23.3/CHANGELOG.md
There is also a minor update to one of the templates we override which I
have synced over.
Change-Id: I97ba30309da63ecb4fb4fc301209c60ea8dc8504
This handler used an incorrect path to the docker-compose file and
failed with no such file or directory errors. Update the handler to use
the correct path to the docker-compose file.
I also add a note that the check to avoid restarts when we just
restarted containers may not be working, as we did restart at least the
mariadb container, which is how I discovered this issue.
Change-Id: If004b72e3efc0d0d4665c6fd56e514a5cb6191c5
This sphinx internal ref was missing ``'s surrounding the token
identifier. Add them which should fix the reference.
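For illustration, the fix is of this shape (the label name here is
hypothetical):

```rst
Broken:  :ref:some-token-label
Fixed:   :ref:`some-token-label`
```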
Change-Id: I6261ab3a96cecbf63d0934441650d9d91baac798
Now that we have grafana02 up and running we need to remove grafana01
from management so that it can be deleted. A followup change will clean
up DNS for us.
Change-Id: Ib90dadf404eb24aed5673d2611584bd00a278d45
We just deployed grafana02 with 10.4.14 which was latest when I started
poking at this. Since then 10.4.15 has been released. Update to this
latest release.
Changelog can be seen here:
58a279e109/CHANGELOG.md (10415-2025-01-28)
Change-Id: I7a8fd7bc273e628475df8bfc492e8a0fdf480457
We have a testinfra test case for checking the launch tooling is
installed properly. Unfortunately, we weren't running that test case
when we made updates to the launch tooling. Fix that.
Change-Id: Ie497d60aaf1842a7478a8550d45608daeec4625a
This adds a grafana02 server to our inventory with associated LE host
vars. This should deploy grafana on our newly created noble grafana02
server.
Note we switch the system-config-run-grafana job over to interact with
02 to match production. To simplify this effort in the future we convert
the old grafana01 testing host var to a group var file. This change was
already done on bridge.
We will need to followup with at least one change to clean out grafana01
when we are happy with the new server.
Depends-On: https://review.opendev.org/c/opendev/zone-opendev.org/+/940653
Change-Id: Ifd7f83185fbd59935a63973642e9d165bd8105a2
This is what I get for not testing it before pushing. I've made this
minor edit in place in the venv contents on bridge and launching
grafana02 appears to have worked. This should be the only fixup needed.
Change-Id: Ief32094fb0b216dac99879a285a7dbd0fd005b49
This is the latest release of the 10.x series and we're currently stuck
on 10.2.2. We can update to 11.x after we're up to date on 10.x.
We bundle this change up with an update to run on Noble. The plan is
we'll put the old 10.2 focal server in the emergency file, land this
change, then add a new Noble server to inventory. This should allow us
to easily roll back to 10.2 if Noble with Grafana 10.4.14 doesn't work
for some reason. Basically we're killing two birds with one stone here
and getting a safer upgrade process out of it.
Depends-On: https://review.opendev.org/c/openstack/project-config/+/940276
Change-Id: Icc5e02d4b80cb1f8524ab3dde888aba7db430ffe
We've seen Noble nodes booting in rax legacy come up with a single vcpu
even when we've requested 8. Avoid unexpected reduction in CPU counts
when booting new noble nodes by explicitly checking for at least 2 CPUs.
We don't want to discover a month after replacing a server that we need
to replace it again because it booted on the wrong hypervisor.
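The check is conceptually a comparison against the online CPU count; a
minimal sketch (the real check lives in the launch tooling and may
gather the count differently):

```shell
# Refuse to proceed if the new node came up with fewer CPUs than
# requested; rax legacy has handed us 1-vcpu Noble nodes when we
# asked for 8.
check_min_cpus() {
    min=$1
    actual=$(nproc)
    if [ "$actual" -lt "$min" ]; then
        echo "ERROR: expected at least $min CPUs, found $actual" >&2
        return 1
    fi
    echo "CPU count OK: $actual"
}

# For new noble nodes we require at least 2 CPUs:
check_min_cpus 2 || echo "refusing this node"
```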
Change-Id: I043dc8d6eb1131d0fec49734c7959e6c123f8f8f
This will pull the haproxy:lts image from the mirror we have at
quay.io/opendevmirror/haproxy rather than docker hub directly. This
should improve reliability in CI in particular when pulling that image.
One fewer image to pull from docker hub also means more rate limit to
spend where we are still pulling from docker hub.
Note this will affect the gitea and zuul web front ends as they are both
fronted by haproxy. Expect a minor blip while the container "updates"
(hashes should match) and is restarted.
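The change itself is just the image reference in the compose file (the
surrounding structure is illustrative):

```yaml
services:
  haproxy:
    # mirrored copy of docker.io/library/haproxy:lts; the image content
    # is identical, so hashes should match
    image: quay.io/opendevmirror/haproxy:lts
```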
Change-Id: Ic242ea3975ada1c7a698be8e41b9c5c8f8d07ed3
This adds a new daily job to mirror haproxy to our quay.io hosted
opendevmirror set of images. We'll be able to use this to update the
location we pull haproxy from for zuul and gitea once the image is
mirrored.
Change-Id: Iba17aacdfbfede00ac09aea7c57325a09c7da9f2
One fewer image to pull from docker hub eating into our rate limits.
Note that Gerrit and its db container are not automatically updated by
ansible. This change will need manual intervention to get reflected in
production.
Change-Id: Ibbfbf2ecfb7f972720bfc0f7b97831231d217633
One fewer image to pull from docker hub eating into our rate limits.
Note that this will restart the standalone mariadb container when
deployed. This may impact Zuul's ability to record jobs temporarily.
Change-Id: I4f46c63f3002740c2246f11d1ad69bd43e61036c
One fewer image to pull from docker hub eating into our rate limits.
Note that this update will restart at least the mariadb container on the
mailman list server.
Change-Id: I8f90956d945baa1826783ed8a6de6b1ce24a84d2
One fewer image to pull from docker hub that eats into our rate limits.
Note that deployment of this change will restart at least the mariadb
container on the server.
Change-Id: I21e7f707f0876aeb348af14efe57fe327ab594a9
One fewer image to pull from docker hub that eats into our rate limits.
Note this will restart at least the mariadb service on the refstack
server when it deploys.
Change-Id: I15eb36bc570fe22e2e2b85b3bf321bb254636410
This appears to be a very minor update from 2.2.6 (as far as I can tell
the dockerfile and settings haven't changed). The changelog indicates
that the important changes were a rewrite to use React 19 and React
Router v7. Other than that, only dependency updates were made.
https://github.com/ether/etherpad-lite/blob/v2.2.7/CHANGELOG.md
Change-Id: I48e8914ffa7026e35b6341628a709301c6a61c26
This is all in an effort to reduce our total dependency on docker hub as
rate limits there are quite low. Every image we can pull from somewhere
else is more rate limit bandwidth we can use for images still on docker
hub.
Change-Id: I3566383acf43e556fcd5854f6dfb70af8ffa1ba2
Newer grafana sends an OPTIONS request that graphite responds to with a
400 response. This response did not include allowed origin headers
because it is a failure case. Update this header and the allowed methods
header to always be included, even on 400 or other error responses.
This should ideally address the CORS errors we see with updated grafana.
An alternative is to update grafana to proxy the requests for us, but
this is less flexible as other tools may not have built in proxies.
The suggestion comes from this stackoverflow question and answer:
https://stackoverflow.com/questions/20414669/nginx-add-headers-when-returning-400-codes
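Assuming an nginx front end as in the referenced answer, the fix hinges
on the `always` parameter, since plain add_header directives are skipped
on error responses (location and header values here are illustrative):

```nginx
location /render {
    # "always" makes nginx emit these headers on 4xx/5xx responses too;
    # without it, the 400 reply to the OPTIONS preflight lacks them.
    add_header Access-Control-Allow-Origin "*" always;
    add_header Access-Control-Allow-Methods "GET, POST, OPTIONS" always;
}
```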
Change-Id: Icf1179d35e420384da72af839ca329548226ee63
Exim 4.95 on Ubuntu Jammy started enforcing an outbound line length
limit of 998 bytes, easily exceeded by some badly-behaved MUAs.
Unfortunately, because Exim only checks this in its remote_smtp
transport, it results in mass bounces back for Mailman mailing
lists, incrementing all subscribers' bounce scores on lists where
bounce processing is enabled. The telltale indicator is that the
messages are returned to Mailman citing a delivery error of "message
has lines too long for transport".
Ubuntu added a workaround in later versions of their packages, but
did not backport that to Jammy. Regardless, it's overridden by a
config option and we replace the default Exim config entirely, so
need to incorporate it into ours directly anyway. Because this
message_linelength_limit option to the remote_smtp transport is only
supported by exim versions on Jammy and newer, exclude it for our
older platforms so that it won't result in a configuration loading
error.
This copies the override value used in Ubuntu Noble's
exim4.conf.template file.
Change-Id: I38e169dc14e7fc3c5c1d43b5f147e6b35b718bb2
The zuul-web component now needs to read the cloud config in order
to fully parse the cloud provider information.
Change-Id: I4b1356bb118afa317e49898b5cf40191e5f0955d