Gerrit is now running on a Noble node, which uses docker compose rather
than docker-compose. The newer tool warns about the version key in our
docker-compose.yaml file because it ignores that key. Drop it to clean
up the warning.
Change-Id: Idebf6bb40309e4e8a50a0ed39e23e67e37510af8
This change removes review02 from our inventory and configuration
management. This should be landed after we're confident we're unlikely
to need to roll back to that server. That said, if we do roll back before
the server is cleaned up, reverting this change isn't too bad.
Change-Id: Ica14ae92c4c1ef6db76acef93d6d65977aab4def
The old gerrit init script uses sighup to request a graceful shutdown
of the service, which is why we configured docker-compose to also use
sighup when we ported to it. Unfortunately, on noble with podman, the
podman container apparmor profiles don't allow podman to issue a sighup
to the container. This means that when we try to stop the service, we
wait until the 5 minute timeout expires, and then docker compose +
podman issue a sigkill.
This is less graceful than we want. To address this, we switch to sigint
instead, because the podman container apparmor profiles do allow sigint
and the jvm appears to treat sigint, sigterm, and sighup as equivalent
triggers for the shutdown hook.
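The stop sequence described above can be modeled with a small, hypothetical Python sketch: send a stop signal, wait up to a grace period (300 seconds by default here, mirroring the 5 minute timeout), and only then escalate to SIGKILL. This is not docker compose or podman code. A child that ignores SIGINT is only an analog for a signal that apparmor prevents from being delivered, and the helper names are made up for illustration.

```python
import signal
import subprocess
import sys

# Hypothetical demo child: sets its SIGINT disposition explicitly, reports
# readiness, then sleeps so there is something to stop.
CHILD_TEMPLATE = (
    "import signal, time\n"
    "signal.signal(signal.SIGINT, signal.{disposition})\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def stop_with_grace_period(proc, stop_signal=signal.SIGINT, grace_seconds=300.0):
    """Send stop_signal, wait up to grace_seconds, then fall back to SIGKILL.

    Returns True if the process exited on its own within the grace period.
    """
    if proc.poll() is not None:
        return True  # already gone, nothing to stop
    proc.send_signal(stop_signal)
    try:
        proc.wait(timeout=grace_seconds)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: the ungraceful path
        proc.wait()
        return False


def start_child(disposition):
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_TEMPLATE.format(disposition=disposition)],
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # wait until the SIGINT disposition is in place
    return proc


if __name__ == "__main__":
    graceful = start_child("SIG_DFL")  # SIGINT terminates this child
    print(stop_with_grace_period(graceful, grace_seconds=5), graceful.returncode)
    stuck = start_child("SIG_IGN")  # stands in for a signal that never lands
    print(stop_with_grace_period(stuck, grace_seconds=1), stuck.returncode)
```

Run directly on a POSIX system, it should print True -2 for the child that honors SIGINT and False -9 for the one that has to be killed.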
Change-Id: Iacfc70713d63443d58bb563b895fdc5dfb0642e2
This came up as something that was missing while we bootstrapped a new
gerrit server. The RSA hostkey is managed, but the three ECDSA keys and
the ed25519 key are not. Fix that by managing these keys in the same
manner we manage the RSA key.
Change-Id: Iaf58543b6833273ca45fa5c359dc88eaf64d7a03
One fewer image to pull from Docker Hub and count against our rate limits.
Note that Gerrit and its db container are not automatically updated by
ansible. This change will need manual intervention to get reflected in
production.
Change-Id: Ibbfbf2ecfb7f972720bfc0f7b97831231d217633
By default h2.maxCompactTime is set to 200 milliseconds. This means that
when h2 databases are shut down, Gerrit will only spend 200 milliseconds
compacting the on disk file to reduce total disk consumption.
Unfortunately, this is insufficient to keep these files in check (we had
one grow to 222GB and another to 61GB).
Hashar suggests that we set the h2.maxCompactTime java command line
option to 15000 to give h2 up to 15 seconds to compact things, which is
what Wikimedia has done. It sounds like this has led to good disk usage
improvements in their Gerrit installation.
Note that compaction only happens when the databases are shut down, i.e.
when we restart Gerrit, so we may also want to consider semi-regular
Gerrit restarts.
More info can be found in this phabricator document [0] that captures
hashar's investigation, debugging, and fixing process.
[0] https://phabricator.wikimedia.org/phame/post/view/300/shrinking_h2_database_files/
Change-Id: Iffb8b37e0539f7d148c47a5aad79f03e3b9a8f79
This replaces syslog logging for containers with journald. Our syslog
rules for /var/log/containers/ log files should continue working because
journald forwards to syslog. This is in preparation for eventually
running docker compose backed by podman on newer platforms.
Note the main Gerrit container doesn't configure syslog or journald
logging as Gerrit manages its own logging setup.
Change-Id: Idb426262d78591da4b74b390b31b933edfe08fbf
Now that Ansible has removed the cron job from the review node we can
remove the management of the cronjob from Ansible. This is step 2 in a
two step process. This should only merge after step 1 has landed and
been applied to the review server.
Change-Id: If8fd5bf83394a9caf2c3878311995437d241cf6d
Gerrit should be cleaning up logfiles on its own now. That means we
don't need a cronjob to do it. This is step 1 of a two step process
where we first have ansible remove the cronjob then step 2 will remove
the cronjob from ansible entirely.
Change-Id: I82825de65c28cea43e9a472884b880d6f01efabe
Note this will not perform the upgrade for us. Gerrit upgrades are still
manual. But we will land this change after the upgrade is completed to
reflect the new state.
Change-Id: I3fc3aca62d9226d86fb0decd5db6e3596e5516b0
Prior to Gerrit 3.10 Gerrit would automatically rotate and compress log
files; however, it would not delete them. With Gerrit 3.10 you can now
configure a time to keep value for rotated log files and Gerrit will
delete files older than the keep value.
Explicitly configure log compression and rotation to true (these are
the defaults, but being explicit here makes sense to me when also
setting time to keep) and add a 30 day timeToKeep value. This matches
our current Gerrit log pruning cronjob retention period of 30 days.
Landing this under Gerrit 3.9 should be safe; then, when we upgrade to
3.10, we can remove our cronjob and rely on Gerrit to do this for us.
Note that cleanup happens in Gerrit at midnight and our cronjob runs at
0600. These should be separated by enough time that we can safely have
both running after the upgrade.
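The 30 day rule that both the cronjob and timeToKeep enforce boils down to deleting rotated files whose modification time is older than a cutoff. Below is a minimal Python sketch of that rule, assuming rotated logs are gzip files in a single directory. The function name, file pattern, and file names are hypothetical; this is neither our cronjob nor Gerrit's implementation.

```python
import os
import tempfile
import time
from pathlib import Path

SECONDS_PER_DAY = 24 * 60 * 60


def prune_rotated_logs(log_dir, keep_days=30, pattern="*.gz", now=None):
    """Delete files in log_dir matching pattern that are older than keep_days.

    Age is judged by modification time. Returns the sorted deleted names.
    """
    now = time.time() if now is None else now
    cutoff = now - keep_days * SECONDS_PER_DAY
    deleted = []
    for path in Path(log_dir).glob(pattern):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path.name)
    return sorted(deleted)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as log_dir:
        now = time.time()
        # Hypothetical rotated log names with their ages in days.
        for name, age_days in [("error_log.old.gz", 45), ("error_log.new.gz", 5)]:
            path = Path(log_dir, name)
            path.write_bytes(b"")
            mtime = now - age_days * SECONDS_PER_DAY
            os.utime(path, (mtime, mtime))
        print(prune_rotated_logs(log_dir, now=now))  # ['error_log.old.gz']
```

Because two pruners with the same retention only ever delete files that are already past the cutoff, running both for a while after the upgrade is harmless, which matches the reasoning above.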
Change-Id: I4219e48c1fab5da97f80130d45badb759af680a1
When we restarted Gerrit recently, a number of caches were over their
default max sizes and so were pruned. Gerrit prunes daily at 0100 or
when restarted. Because we restarted several hours before 0100, this
gives us a good indication of which caches are currently configured too
small for typical operation.
All of the logged cache pruning can be found in this paste [0]. Many of
these caches were floating around their configured maximum, and we leave
them alone. However, four related caches are well above their default
max, which is a good indication that we need to increase their sizes. The four are
identified below with their documented purpose/function from the
upstream docs [1]:
* cache "git_modified_files"
Each item caches the list of git modified files between two git trees
corresponding to two different commits. This cache does not read the
actual file contents nor does it include the edits (modified regions)
of the files.
* cache "modified_files"
Each item caches the list of modified files between two commits. This
cache is similar to the git_modified_files cache but performs extra
logic including filtering out files that are untouched by both
commits because they were purely modified between the parent commits.
* cache "git_file_diff"
Each item caches the pure git diff between two git trees for a
specific file path. The diff includes all the file attributes
(old/new paths, change/patch types) as well as the list of edits
corresponding to the modified regions in the file.
* cache "gerrit_file_diff"
Each item caches the diff between two git commits for a specific file
path. This cache is similar to the git_file_diff cache but performs
extra logic including identifying the edits that are due to rebase.
The diff for the "commit message" and "merge list" can also be
requested from this cache.
Entries in this cache are relatively large, so memoryLimit is an
estimate in bytes of memory used. Administrators should try to target
cache.diff.memoryLimit to fit all changes users will view in a 1 or 2
day span. The same applies for other diff caches:
"git_modified_files", "modified_files" and "git_file_diff".
The note at the end of cache "gerrit_file_diff" is what we use to
determine these new sizes, though we're more conservative with the
memory limits (the default for each of these caches is 10m) because
memory is scarcer than disk.
[0] https://paste.opendev.org/show/bk4pTIuQLCsWaF3dVVF7/
[1] https://gerrit-review.googlesource.com/Documentation/config-gerrit.html#cache
Change-Id: I521b53c130892fc2152586da1c4858ea4099479f
Note this will not perform the upgrade for us. Gerrit upgrades are still
manual. But we will land this change after the upgrade is completed to
reflect the new state.
Change-Id: I439d5588e05a15b2d2fad4bafad8d59babf9d468
This change won't automatically perform the upgrade because we don't let
ansible manage Gerrit's containers directly. But it will update the
docker-compose file for us so that we can manually pull images, then
down and up the containers again, which performs the upgrade.
This SQL database behind Gerrit only keeps track of which files users
have reviewed, so its importance is minimal. Getting it updated so we
have to worry about it even less is a good thing.
Change-Id: I78b683770496bb3d8e97464ddedaf813780a2a4e
This reverts commit d346d5375ffb70c3cea37def33f4d52887d8d276.
We make small edits to the .ssh/config file to keep the MINA ssh client
happy. In particular, we need to use the path to the ssh key within the
Gerrit container and not on the host side.
This exact .ssh/config file has been tested on held nodes, where it
appears to replicate properly from a test gerrit99 to a test gitea99
after adding the pubkey to gerrit and accepting the hostkey for gitea on
the gerrit side.
Change-Id: I41caac08f6713ad385c98eea46fb004a414fab5d
Gerrit is unable to load the key; further testing is required to
figure out why.
This reverts commit 3ea2ca4bab1dc273d72ab3b0008d892f1fcd9407.
Change-Id: Ic169b2d0bf16c25caf7e61d824f5d6500147767c
This change is related to a similar change [0] in gitea that
adds/rotates public keys for the gerrit user in gitea. We should be
happy with the approach on both sides of the gitea and gerrit
replication interaction before proceeding.
This is motivated by changes in gitea that make it more picky about the
keys it will accept by default. Rather than disable those checks we're
switching keys to be more acceptable.
The end result is the use of 4096-bit RSA keys. We did consider ed25519
keys, but there is concern that the Gerrit replication plugin may not be
able to handle them, as they only come in the new openssh key file
format. The replication plugin docs indicate PEM format should be used
instead. It is possible that newer MINA in Gerrit handles this fine, but
we stick with what we know works to avoid problems.
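Because the concern above hinges on key file format, it helps to check which format a private key is in before handing it to the replication plugin. Recent ssh-keygen releases write the new OpenSSH format (armor line -----BEGIN OPENSSH PRIVATE KEY-----) by default, while something like ssh-keygen -t rsa -b 4096 -m PEM produces a classic PEM file starting with -----BEGIN RSA PRIVATE KEY-----. The hypothetical Python helper below classifies a key by that armor line only; it does not parse or validate the key.

```python
import sys

# PEM-style armor labels mapped to a short, descriptive format name.
KEY_FORMATS = {
    "OPENSSH PRIVATE KEY": "openssh",  # new OpenSSH format (used for ssh-keygen ed25519 keys)
    "RSA PRIVATE KEY": "pem-pkcs1",  # classic PEM RSA key
    "PRIVATE KEY": "pem-pkcs8",
    "ENCRYPTED PRIVATE KEY": "pem-pkcs8-encrypted",
    "EC PRIVATE KEY": "pem-sec1",
}


def private_key_format(key_text):
    """Classify a private key by its first -----BEGIN ...----- line only."""
    for raw_line in key_text.splitlines():
        line = raw_line.strip()
        if line.startswith("-----BEGIN ") and line.endswith("-----"):
            label = line[len("-----BEGIN "):-len("-----")]
            return KEY_FORMATS.get(label, "unknown")
    return "unknown"


if __name__ == "__main__":
    # Usage: python key_format.py /path/to/key [...]
    for name in sys.argv[1:]:
        with open(name, encoding="ascii", errors="replace") as handle:
            print(name, private_key_format(handle.read()))
```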
[0] https://review.opendev.org/c/opendev/system-config/+/901082
Change-Id: I36704b7f8c0710fb5142153f99418eb200860bee
Note this should only be merged after the manual upgrade process is
completed. We still don't have that automated yet, but do eventually
need our config management to match what we've updated by hand.
Change-Id: I721228637ceaab47263afbae6522da0166d6ed27
Gerrit 3.8 drops support for html in commentlinks entirely. Gerrit 3.7
supports both html and the new non-html system. Update our 3.7
installation to the new system now so that we are ready for the later
Gerrit 3.8 upgrade.
Most of our comment links did not use html entries so we drop the html
lines entirely. A single commentlink does use html and there we convert
it to the new prefix, link, text, suffix system. More details can be
found here:
https://gerrit.googlesource.com/gerrit/+/refs/tags/v3.8.2/tools/migration/html_to_link_commentlink.md
This should be a 1:1 mapping for our config and not change any behavior.
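To make the new system concrete, the Python sketch below roughly models how a commentlink built from match, link, prefix, suffix, and text fields could turn a match into HTML, following our reading of the migration doc linked above. It is not Gerrit's renderer: the function names, the $N group substitution helper, and the example pattern and URL are hypothetical.

```python
import html
import re


def _expand(template, m):
    # Replace $1, $2, ... with the matching capture group ("" if it is unset).
    return re.sub(r"\$(\d+)", lambda g: m.group(int(g.group(1))) or "", template)


def render_commentlink(text, match, link, prefix="", suffix="", link_text=None):
    """Wrap each regex match as prefix + <a href=link>text</a> + suffix."""
    out = []
    pos = 0
    for m in re.finditer(match, text):
        out.append(html.escape(text[pos:m.start()]))  # untouched text between matches
        label = m.group(0) if link_text is None else _expand(link_text, m)
        out.append(html.escape(_expand(prefix, m)))
        out.append('<a href="{}">{}</a>'.format(
            html.escape(_expand(link, m)), html.escape(label)))
        out.append(html.escape(_expand(suffix, m)))
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)


if __name__ == "__main__":
    print(render_commentlink(
        "Closes-Bug: #123",
        match=r"(Closes-Bug:\s*)#(\d+)",
        link="https://bugs.example.org/$2",
        prefix="$1",
        link_text="#$2",
    ))
    # prints: Closes-Bug: <a href="https://bugs.example.org/123">#123</a>
```

The point of the split is that only the link text sits inside the anchor, while surrounding literal text moves to prefix and suffix, which is what an old html entry with text around its anchor maps onto.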
Change-Id: I0b87aac7b90814d242338be8fd03cfc9a76200f7
The default value [0] is 1024, which causes issues for users who have
starred more than that number of changes. Bump it by 50% (to 1536) in
the hope that the performance impact will be moderate.
[0] https://gerrit-review.googlesource.com/Documentation/config-gerrit.html#index
Change-Id: I0c00110cfd6ba6d235821f6a5db7e1b91e2a8945
There were two problems with our gerrit upgrade config diff checking.
The first is that we were comparing using command exit codes after
piping diff to tee without setting pipefail. This meant that even if
the diff failed, we got an exit code of 0 from tee and everything passed
happily.
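The first problem is easy to reproduce. This Python sketch, assuming bash and diff are installed, runs the same diff | tee shape with and without pipefail and returns the pipeline's exit status; the helper name and temporary file names are made up for illustration.

```python
import subprocess
import tempfile
from pathlib import Path

# diff exits 1 when its inputs differ; tee exits 0 as long as it can write.
PIPELINE = 'diff -u "$1" "$2" | tee "$3" > /dev/null'


def pipeline_status(old, new, log, pipefail):
    """Run diff | tee under bash and return the pipeline's exit status."""
    script = ("set -o pipefail; " if pipefail else "") + PIPELINE
    # Arguments after the script become $0 ("bash"), $1, $2 and $3.
    args = ["bash", "-c", script, "bash", str(old), str(new), str(log)]
    return subprocess.run(args).returncode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        old, new, log = Path(tmp, "old"), Path(tmp, "new"), Path(tmp, "diff.log")
        old.write_text("a\n")
        new.write_text("b\n")
        print(pipeline_status(old, new, log, pipefail=False))  # 0: failure hidden
        print(pipeline_status(old, new, log, pipefail=True))   # 1: diff's status kept
```

Without pipefail a pipeline's status is that of its last command, so tee's success masks diff's failure; with pipefail the rightmost non-zero status wins.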
Second, we were not checking the config diff state before the upgrade.
If the old gerrit version also updated the config on disk, we wouldn't
get a diff after upgrading to the new version. I think that is
ultimately what broke for us here: the 3.6 and 3.7 config diffs are
empty, but they differ from what we initially write to disk. As for why
this might happen, I can only assume some update to 3.6 made the
changes we saw after we had deployed 3.6.
As a result of checking things more thoroughly, we need to update our
config to remove any delta. This change does that by removing some
extraneous quoting around gitweb config entries.
Change-Id: I9c7e73d5f64546fb57a21a249b29e2aca9229ac7
This updates our base config to 3.7. This should only be merged as
part of the update process described at
https://etherpad.opendev.org/p/gerrit-upgrade-3.7
Change-Id: I9a1fc4a9f35ed0f60b9899cb9d08aa81995e640b
We changed review01.openstack.org to review02.openstack.org in the host
var file matchers for this job thinking that was the issue previously.
Unfortunately the actual file is review02.opendev.org. Update the
matcher again to actually trigger the job.
We also make a small edit to the gerrit role's README to ensure we
trigger the job when this change lands.
Change-Id: I1f235d0ddbb2d7f400ea2e99ffabdf5db35671a1
The replication, manage-plugins, and delete-project plugins all seem to
want to write content out to /var/gerrit/data within the Gerrit
container. At /home/gerrit2/review_site/data we've got an old carried
over dir from previous installations but this does not appear to be bind
mounted.
Best I can tell, the replication plugin may use this disk location to
keep track of tasks that are queued, running, etc., and this may work
around the issues with autoreloading gerrit replication configs.
However, we don't get those benefits when we delete the container (as
with docker-compose down/up -d) because the content is ephemeral within
the container. Address this by bind mounting the location along with
the other bind mounts.
Note I have excluded this from backups as I think we don't need backups
of things like replication queues. That said depending on what the other
plugins use this for we may need to refine our backup rules in the
future.
Change-Id: If3a91aeb1bd86c8514179b8ecfde17e98c29af6a
Enable Gerrit replication autoreload to simplify the process of adding
new Gitea backend servers and removing old ones. Without this, we would
need to enable remote Gerrit plugin administration (which is global for
all plugins, including plugin installations) or restart Gerrit every time
we want to change the replication config file.
Note we did have this setting set at one time, and it was removed in
e7c6b7602609d14bc49eaca958bcdef788e861cf. This was due to replication
events being dropped and gitea servers not being kept in sync when the
plugin updated its config. I think we can toggle this setting to true
while we do gitea server work and plan for the occasional manual full
sync to ensure nothing gets missed, then go back to having this set to
false long term when we are done.
Change-Id: I8cf37f6b84516e36deb143a36697874c640c0635
This updates the Gerrit role readme to be a bit more explicit that the
role is deploying both Gerrit and MariaDB.
Change-Id: Ibd39781f0560179d40c3d3d723eec2286dec8583
In order to limit impact to Gerrit's embedded sshd from runaway
automated systems, we employ a concurrent connection limit. Having
the ability to diagnose that limit when users may be encountering it
is necessary. To that end, add a logging rule matching the
connection limit rule, and install an additional administrative tool
capable of interfacing with the kernel's connection tracking
feature.
Change-Id: If5e61bb34cbe2f9fe0c2db9b923842428771c5f0
This is done for a number of reasons. First, it will allow us to update
the python version used in the images, as we can have python 3.10
builder and base images (but not a python 3.10 openjdk:11 image).
Second, it will allow us to easily switch to openjdk 17 by simply
updating the package we install and some paths for the jdk location.
The goal here is to have more control over the images so that we can do
things like change python and java versions when we want to.
Depends-On: https://review.opendev.org/c/opendev/jeepyb/+/870873
Change-Id: I7ea2658caf71336d582c01be17a91759e9ac2043
This updates the gerrit upgrade testing job to upgrade from 3.6 to 3.7.
This upgrade requires an offline reindex which is new for us since we've
been on Gerrit 3.x. In order to support this offline reindex requirement
the gerrit role is modified to trigger an offline reindex in the role's
start tasks if the flag to do so is set. I expect this will really only
be used in testing, but it allows us to reuse most everything else in
testing and in production which is nice.
Change-Id: Ibe68176970394cbe71c3126ff3fe7a1b0601b09a
This should only be landed as part of our upgrade process. This change
will not upgrade Gerrit properly on its own.
Note, we keep Gerrit 3.5 image builds and 3.5 -> 3.6 upgrade jobs in
place until we are certain we won't roll back. Once we've crossed that
threshold we can drop 3.5 image builds, add 3.7 image builds, and update
the upgrade testing to perform a 3.6 -> 3.7 upgrade.
Change-Id: I40c4f96cc40edc5caeb32a1af80069ef784967fd
We've seen CI systems consume all of our threads, which causes the web
UI to become unresponsive. To address this, increase the number of
httpd threads from 100 to 150. Note that we do not modify sshd.threads
because sshd.threads determines the max number of git requests across
both ssh and http.
In theory, what this means is that httpd has an additional 50 threads to
process non-git requests (for example, web UI requests), which will
hopefully keep it responsive even if git requests are maxed out.
It is possible that we also need to increase the sshd.threads value to
handle those git requests, but we will start by modifying one config
value at a time. If we do bump sshd.threads, we should also increase
httpd.maxThreads to preserve that additional headroom.
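The headroom argument above is simple arithmetic. Under the assumption stated in this change that sshd.threads caps concurrent git requests across both ssh and http, the worst case for the web UI is every one of those slots being held by an http git client. The Python sketch below models just that; the function name is hypothetical, and the sshd.threads value in the example is made up for illustration rather than our production setting.

```python
def non_git_http_threads(httpd_max_threads: int, sshd_threads: int) -> int:
    """Worst-case httpd threads left for non-git work (web UI, REST).

    Assumes, per this change's reasoning, that at most sshd_threads git
    requests run at once and that all of them arrived over http.
    """
    return max(0, httpd_max_threads - sshd_threads)


if __name__ == "__main__":
    hypothetical_sshd_threads = 100  # illustrative only
    before = non_git_http_threads(100, hypothetical_sshd_threads)
    after = non_git_http_threads(150, hypothetical_sshd_threads)
    print(before, after, after - before)  # 0 50 50
```

The same function shows why the two settings should move together: raising sshd_threads without raising httpd_max_threads shrinks the margin by the same amount.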
Finally, I believe this is likely to be safe as we doubled the size of
our Gerrit server when we moved it to vexxhost. The old server was
pretty well maxed out, though, so we increase these values on the new
server slowly and monitor the results.
Details on the configuration can be found at:
https://gerrit-review.googlesource.com/Documentation/config-gerrit.html#httpd
Change-Id: I57a1e248c3c01597bb29c7afc304688e834a64cc
This is a new config option for Gerrit 3.5. While it defaults to true we
set it explicitly to true to avoid any changes in behavior should that
default change eventually with newer Gerrit. They note this is expensive
to calculate, but our users rely on it and it hasn't caused us problems
yet. We can always explicitly disable it in the future if that becomes
necessary.
Change-Id: Idc002810de2d848af043978894ef9dc194ac5b6a
This updates the gerrit configuration to deploy 3.5 in production.
For details of the upgrade process see:
https://etherpad.opendev.org/p/gerrit-upgrade-3.5
Change-Id: I50c9c444ef9f798c97e5ba3dd426cc4d1f9446c1
As part of the Gerrit 3.5 upgrade we are also upgrading the reviewdb
to the latest mariadb LTS. This should be merged after the update
process; see
https://etherpad.opendev.org/p/gerrit-upgrade-3.5
Change-Id: Ie30c84eeb003ee86a7a66e0c1c5fd7f95ddf3f5f
According to the docs [0] this shouldn't be necessary as performance
logging only happens if a performance tracing plugin is installed.
However, according to this repo-discuss thread [1], there is always a
dummy performanceLogging instance installed. The same thread identifies
this as a likely source of the large increase in memory utilization by
Gerrit when upgrading to 3.5.
Let's explicitly disable this tracing due to the memory overhead in prep
for our 3.5 upgrade. We can always flip the setting if we install a
performance tracing plugin in our Gerrit.
[0] https://gerrit-review.googlesource.com/Documentation/config-gerrit.html#tracing
[1] https://groups.google.com/g/repo-discuss/c/QUD7_LsEVks/m/kBDEeam4AgAJ
Change-Id: Iff438695aa6488fb5886120121946494b1edf003
Because we proxy to Gerrit and set listenUrl with a proxy-http:// prefix,
httpd.requestLog is disabled by default. We choose to explicitly enable
it here to add more logging to the Gerrit system even if this logging is
slightly less useful when behind a proxy. In particular this logging
will track memory utilization per request which we can use to benchmark
change query memory cost between 3.4 and 3.5.
Change-Id: Ia3ccf820ee0e5ca7d68bcc37da7004dea2ad7128
These were added when we faced significant memory pressure on the old
server. That is no longer a problem, and there is an issue with the
specification that breaks file compression because destination files
already exist. It seems the log specification is only able to rotate
once; after that it cannot keep moving files aside because they already
exist (e.g. as jvm_gc.log.0.gz). This results in annoying errors in the
Gerrit error_log.
Note that removing this log specification does not appear to be
sufficient; we also need to move the existing jvm_gc.log* files aside or
delete them. I tested this on a held zuul node: I stopped gerrit, updated
the docker-compose file, moved the files aside, then started gerrit, and
that got rid of the startup errors in error_log. Merely updating
docker-compose resulted in the same errors on startup.
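The manual cleanup step can be sketched in Python as below. This is a hypothetical helper, not something our tooling runs: it assumes Gerrit is already stopped and that the operator supplies both the log directory and the destination directory.

```python
import shutil
import tempfile
from pathlib import Path


def move_gc_logs_aside(log_dir, dest_dir, pattern="jvm_gc.log*"):
    """Move files matching pattern from log_dir into dest_dir.

    Intended to run only while Gerrit is stopped. Returns the moved names.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(Path(log_dir).glob(pattern)):
        if path.is_file():
            shutil.move(str(path), str(dest / path.name))
            moved.append(path.name)
    return moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        logs, aside = Path(tmp, "logs"), Path(tmp, "aside")
        logs.mkdir()
        for name in ("jvm_gc.log", "jvm_gc.log.0.gz", "error_log"):
            Path(logs, name).write_text("")
        print(move_gc_logs_aside(logs, aside))  # ['jvm_gc.log', 'jvm_gc.log.0.gz']
```

Moving rather than deleting keeps the old GC logs around in case they are wanted later, while still clearing the names that collide on startup.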
Change-Id: Ied1464c57b2e8331b9bdf7cbc9ad74f92dea2dfd