52 Commits

Author SHA1 Message Date
Jenkins
0d63741e11 Merge "ensure all the required files are there" 2014-02-08 01:09:44 +00:00
Sean Dague
2224bfbe98 ensure all the required files are there
add in neutron, glance, and n-net logs as required files when
appropriate. This will help ensure that we don't miss a pattern
because we searched before the log was in the system.

Change-Id: Ia8f2cdedfc9964f1d9589fda253174e972fcc770
2014-02-07 09:13:41 -05:00
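A minimal sketch of a per-job required-files check. It assumes the classifier can list which log file names are already indexed for a build. The function names and the file names (`console.html`, `logs/screen-g-api.txt`, `logs/screen-q-svc.txt`, `logs/screen-n-net.txt`) are hypothetical illustrations, not the actual REQUIRED_FILES contents.

```python
def required_files(job_name):
    """Return the log files that must be indexed before classifying a job.

    Hypothetical: the file names and job-name check are illustrative.
    """
    files = ["console.html", "logs/screen-g-api.txt"]  # glance log
    if "neutron" in job_name:
        files.append("logs/screen-q-svc.txt")  # neutron server log
    else:
        files.append("logs/screen-n-net.txt")  # nova-network log
    return files


def missing_files(job_name, indexed_files):
    """List required files that are not yet present in the index."""
    indexed = set(indexed_files)
    return [f for f in required_files(job_name) if f not in indexed]
```

Searching only once `missing_files` returns an empty list avoids running a query before the log that would match it has arrived.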
Joe Gordon
61190329f7 Map failed jobs to bugs in gerrit comment
Instead of just listing which bugs were seen in an entire gerrit event
(multiple jenkins/zuul jobs), list which bugs were seen in which job.
If one of the jobs has an unrecognized error, don't display the comment
about running recheck; just list which bugs were seen on which jobs (and
which jobs had an unrecognized error).

Change-Id: I55b2eb8f0efe43ab22540294150d4bc9f5885510
2014-02-06 15:58:27 -08:00
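A hypothetical sketch of the per-job comment described in 61190329f7. `format_comment` and its wording are illustrative, not the bot's actual message. The input is assumed to map each job name to its matched bug numbers, with an empty list marking an unrecognized failure. Any such job suppresses the recheck hint.

```python
LP_URL = "https://bugs.launchpad.net/bugs/%s"


def format_comment(job_bugs):
    """Build a gerrit comment from {job name: [bug numbers]}.

    Hypothetical sketch: an empty bug list marks an unrecognized error.
    """
    lines = []
    unrecognized = False
    for job, bugs in sorted(job_bugs.items()):
        if bugs:
            urls = ", ".join(LP_URL % b for b in bugs)
            lines.append("- %s: %s" % (job, urls))
        else:
            unrecognized = True
            lines.append("- %s: unrecognized error" % job)
    if not unrecognized:
        # Only suggest a recheck when every failure maps to a known bug.
        lines.append("These failures match known bugs; consider a recheck.")
    return "\n".join(lines)
```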
Joe Gordon
2308b4a947 Convert failed_job into an object
We are starting to track a decent amount of data per zuul/jenkins job,
so track data in an object instead of assorted variables and
dictionaries. For example, bugs are now tracked by job and not by
gerrit event, so we can now report which bug caused which
specific job to fail. This also does some assorted object-related
cleanups. This patch contains internal changes only; a future patch will
make the gerrit and IRC comments take advantage of this.

Change-Id: I2116cd0e10b45617a8d572b27f1672f695fa91d0
2014-02-06 15:56:27 -08:00
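A hypothetical sketch of the kind of per-job object 2308b4a947 describes. The class and attribute names (`FailedJob`, `bugs`, `unrecognized`, `Event.bug_numbers`) are assumptions. Tracking bugs on each job still lets event-level code compute the union when it needs one.

```python
class FailedJob(object):
    """Data tracked for one failed zuul/jenkins job (hypothetical)."""

    def __init__(self, name, url):
        self.name = name
        self.url = url
        self.bugs = set()  # bugs whose queries matched this job's logs

    @property
    def unrecognized(self):
        """True when no known bug explains this failure."""
        return not self.bugs


class Event(object):
    """A gerrit event holding several failed jobs (hypothetical)."""

    def __init__(self, failed_jobs):
        self.failed_jobs = list(failed_jobs)

    def bug_numbers(self):
        """Union of bugs across all jobs, sorted for stable output."""
        found = set()
        for job in self.failed_jobs:
            found |= job.bugs
        return sorted(found)

    def unrecognized_jobs(self):
        """Jobs that failed without matching any known bug."""
        return [job for job in self.failed_jobs if job.unrecognized]
```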
Michael Still
9bfff2bbd6 Don't encourage reverify
Reverify makes the gate cry. Let's not encourage it.

Change-Id: I601a31a8899c6752cb8ba69a8fe87a2d8d0742a6
2014-02-07 09:47:31 +11:00
Joe Gordon
3dd6f1a74d Remove unused main in elasticRecheck.py
main() in elasticRecheck was originally used for testing before the bot
was ready, but now that the bot is working and supports noirc
and no-gerrit-comment modes (tox -erun), there is no need to include a
main() here.

Change-Id: I6e1d790b78d2f2eafacd8efcaf132cf4479fe8ca
2014-02-05 11:20:43 -08:00
Joe Gordon
5c6dc37552 Log gerrit comment
Always log the gerrit comment, and when running in nocomment mode just don't
send it to gerrit. This makes testing changes to the gerrit comment
easier.

Change-Id: Ie26b86ed374d284154389b4bd5a86b9d2f365800
2014-02-05 11:20:38 -08:00
Jenkins
87ace80de0 Merge "Add option to elastic-recheck-graph to pick a single queue" 2014-01-31 00:20:21 +00:00
Joe Gordon
c9602ca826 Add option to elastic-recheck-graph to pick a single queue
In preparation for providing a web page that will just show hits on the
gate queue, add a '-q queue' option to elastic-recheck-graph.

Change-Id: I9217a2ceedf86ffe04851084df78238384fccd51
2014-01-30 13:40:52 -08:00
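A minimal sketch of a `-q queue` option using argparse. The logstash field name `build_queue` and the `add_queue_filter` helper are assumptions, not elastic-recheck-graph's actual code.

```python
import argparse


def get_options(argv=None):
    parser = argparse.ArgumentParser(
        description="Graph elastic-recheck hits (hypothetical sketch).")
    parser.add_argument("-q", dest="queue", default=None,
                        help="only count hits from this queue, e.g. gate")
    return parser.parse_args(argv)


def add_queue_filter(query, queue=None):
    """AND an assumed build_queue field onto a query string."""
    if queue:
        return '(%s) AND build_queue:"%s"' % (query, queue)
    return query
```

Leaving `-q` unset keeps the original behaviour of counting hits from every queue.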
Joe Gordon
73053f997e Make IRC messages provide more context
Now that we are running this on all jobs (not just tempest) we are
getting significantly more IRC messages. Add the failed job name to logs to
provide more context about which job is failing. For unclassified failures,
also include the queue (an unclassified unit test failure in the check queue is
much less important than one in the gate).

Change-Id: I485bf06721fa5afd102b99b26e38f12449deec7b
2014-01-27 15:29:15 -08:00
Joe Gordon
fc468e4d4c Fix gerrit leave comment capability
When adding support for short build_uuids in
I6356a971ca250ddf5f01a9734f13d0b080a62c89, event.bugs was converted to a
set, since we can now run classify multiple times on a single
event and don't want duplicate bugs. That patch didn't update the gerrit
leave-comment capability to understand event.bugs as a set (instead of
a list).

Change-Id: I9032e23e0e53426a57bebf42f4c4d4167624280e
2014-01-25 18:32:09 -08:00
Sean Dague
0c8a97b19b stop being ridiculous with our time formats
Change-Id: Ic70fe49a60d642f230e26dcca8ab5e390b2a1f9a
2014-01-25 08:40:21 -05:00
Joe Gordon
8c57a4f639 Use short build_uuids in elasticSearch queries
In addition to searching by change and patch, also search by the short build_uuid.
This prevents us from accidentally classifying multiple builds when we classify
a failure on gerrit. This can happen in the gate queue if there is a
gate reset, or if there are multiple 'recheck bug x' comments on a single patch
revision in the check queue.

Change-Id: I6356a971ca250ddf5f01a9734f13d0b080a62c89
2014-01-23 12:45:46 -08:00
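A hedged sketch of narrowing a query to a single build. It assumes the short build_uuid is the first 7 characters of the full UUID and that the index exposes `build_change`, `build_patchset`, and `build_short_uuid` fields. Treat those names and `single_build_query` as illustrative.

```python
def short_uuid(build_uuid, length=7):
    """Assumed: the short uuid is a prefix of the full build uuid."""
    return build_uuid[:length]


def single_build_query(query, change, patchset, build_uuid):
    """Restrict a bug query to one build (field names are assumptions)."""
    return ('(%s) AND build_change:"%s" AND build_patchset:"%s" '
            'AND build_short_uuid:"%s"'
            % (query, change, patchset, short_uuid(build_uuid)))
```

Matching on change and patch alone can hit several builds after a gate reset. The extra uuid clause pins the query to the build being classified.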
Sean Dague
8ef26cbed2 objectify the gerrit event for our purposes
instead of passing around complex data structures, create an
event object that carries just the payload relevant to us. This
simplifies some things, and will make adding build_uuid tighter.

Change-Id: I8172b25ae3c60e38d63cf7f4d8a0f6c854bae766
2014-01-20 16:25:07 -05:00
Sean Dague
359525be40 expose on channel when we timeout on logs
we have been timing out on logs a lot, and not noticing. Redo this
logic to be exception based so we can tell the IRC channel when we
timeout on logs, to get to the bottom of reliability issues with
indexing logstash data.

Change-Id: Ia63d801235c6959eb7b97c334291a6d2f06411b6
2014-01-17 06:27:12 -05:00
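A minimal sketch of the exception-based approach in 359525be40. It assumes a readiness callback and a `report` callable that posts to the IRC channel. `ResultTimedOut`, `wait_for_logs`, and `classify_with_reporting` are hypothetical names. The loop is bounded by a deadline rather than `while True`, in the spirit of f1e542c848.

```python
import time


class ResultTimedOut(Exception):
    """Logs did not appear in the index before the deadline."""


def wait_for_logs(is_ready, timeout=60.0, interval=5.0):
    """Poll is_ready() until it is true, or raise after timeout seconds."""
    deadline = time.monotonic() + timeout
    while not is_ready():
        if time.monotonic() >= deadline:
            raise ResultTimedOut("logs not indexed after %ss" % timeout)
        time.sleep(interval)


def classify_with_reporting(is_ready, report, timeout=60.0):
    """Wait for logs; on timeout tell the channel instead of hanging."""
    try:
        wait_for_logs(is_ready, timeout=timeout)
    except ResultTimedOut as e:
        report(str(e))  # e.g. post to the IRC channel
        return False
    return True
```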
Sean Dague
e75b996e60 move to static LOG
there is no reason this was an instance level variable, it should
instead be static.

Change-Id: I47276856b8a0504ce6bf5483b251e48145329f8b
2014-01-17 06:27:12 -05:00
Sean Dague
9567161544 create more sane logging for the er bot
this gives the er bot a saner set of default logs,
and also tells us how often we end up timing out.

it also makes the logs actually include timestamps.

Change-Id: I29877c4158a84bd46b0a437a12c14450a049b49d
2014-01-17 06:27:12 -05:00
Sean Dague
b3249f3dd0 only run on openstack gate projects
we only want to run on things we consider the "integrated" gate,
however, that's kind of a nebulous definition. Today a reasonable
heuristic is if we are running the tempest full job, so use that.

This check could be enhanced in the future.

Change-Id: Iad36d330f8f6db3bbaa0c54a0c8e70b0e01a17b6
2014-01-17 06:27:12 -05:00
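A minimal sketch of the "are we running tempest full?" heuristic. It assumes the Jenkins comment lists one job per line as `- <job-name> <url> : <RESULT>` and that the full job's name contains `tempest-dsvm-full`. Both the line format and the marker are assumptions.

```python
import re

# Assumed comment format: one "- <job-name> <url> : <RESULT> ..." per line.
TEMPEST_FULL = re.compile(r"^- \S*tempest-dsvm-full\S* ", re.MULTILINE)


def is_integrated_gate(jenkins_comment):
    """Heuristic: treat a change as integrated if tempest full ran."""
    return TEMPEST_FULL.search(jenkins_comment) is not None
```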
Sean Dague
7f42043155 moving readiness checks into stream
this changes the interface to move the readiness check out of
the classifier and into the stream object. This massively
simplifies the logic connecting these pieces, as classifier is
now just a thin wrapper to elastic search.

This also adds unit testing for the stream processing through the
creation of a fake_gerrit mock class. That lets us run gerrit
event interactions in a sane way.

It also drops all the unit testing for the classifier, which is now
largely useless, because all it tests is that we can execute a for loop.

Change-Id: I1971c121276412e31f01eb5680b9c41fc7e442d3
2014-01-13 20:00:23 -05:00
Sean Dague
7624203006 parse the failed jobs in stream
one of the big issues today with er is the amount of coupling
between the bot and the classifier about knowing when
jobs are ready. The impact of this is that we often
incorrectly determine when jobs are ready, because we have a
small set of files we test for, which aren't right for various
jobs.

This is the beginning of decoupling that. By parsing the job names
that have failed in the jenkins failure message we can move all
the readiness checking into the Stream.

This commit adds the parsing and the unit tests, though it doesn't
actually change behavior to use it yet (next patch).

Change-Id: I54ffa3495a36c2d61b1824794a672c8f5552df54
2014-01-13 14:41:55 -05:00
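A hedged sketch of parsing failed job names out of a Jenkins failure message. It assumes the line format `- <job-name> <url> : <RESULT> in <duration>`. `failed_jobs` is a hypothetical helper, not the parser this commit adds.

```python
import re

# Assumed line format: "- <job-name> <log-url> : <RESULT> in <duration>"
JOB_LINE = re.compile(
    r"^- (?P<name>\S+) (?P<url>\S+) : (?P<result>[A-Z_]+)", re.MULTILINE)


def failed_jobs(comment):
    """Return (job name, log url) pairs for jobs that reported FAILURE."""
    return [(m.group("name"), m.group("url"))
            for m in JOB_LINE.finditer(comment)
            if m.group("result") == "FAILURE"]
```

With the failed job names known, the readiness check can ask for the right files per job instead of one fixed list.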
Davanum Srinivas
d14cde9dec Ability to Soft Delete Stale Bugs from elastic-recheck
Add a resolved_at attribute in the query yaml files
that can be used to mark when a bug has been
fixed or no longer occurs. This can help us
re-enable bugs quickly when we see them again.

Change-Id: I7af7ce9417eec5ff9ecc2487a920ff9d1286a714
2013-12-10 16:52:16 -05:00
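A minimal sketch of soft deletion. It assumes each query file loads into a dict with `bug` and `query` keys plus an optional `resolved_at`. `split_queries` is hypothetical. Re-enabling a bug then only means removing its `resolved_at` key.

```python
def split_queries(queries):
    """Separate active queries from soft-deleted ones.

    Each query is assumed to be a dict loaded from one yaml file.
    """
    active, resolved = [], []
    for q in queries:
        (resolved if q.get("resolved_at") else active).append(q)
    return active, resolved
```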
Jenkins
dc569ca482 Merge "Update job names" 2013-12-03 10:04:58 +00:00
James E. Blair
8ffc5facd5 Update job names
Job names are about to change in infra/config.  Be a little more
robust (but still, this is fragile).

Change-Id: I882de80dbb02aad68ef7b41095f36db2c7ebec49
2013-12-02 15:57:34 -08:00
Sean Dague
a0be1593f5 Fix E122,E126,E128 items in codebase
In the land of random cleanups, let more of the whitespace rules
back in. Also explicitly exclude E125 because of the overreach,
and leave E123 excluded because it creates some kind of odd
artifacts in the current code (possibly clean it up later).

tox.ini adjusted with comments about the fact that what we are
ignoring is there for a reason.

Change-Id: I5636cb646d7898df71b715aa0e32a68ce279ee80
2013-12-02 11:43:51 -05:00
Sean Dague
3a8721bb51 method extraction for readability
extract out methods for readability, so the code has logical flow
and the details about each conditional can be encapsulated in its
own method.

Change-Id: I5b62842346e0e3774d8e0586ff6b2c6969602a07
2013-12-02 11:43:51 -05:00
Sean Dague
a176ec483b time to grow up and fit in 80 columns
elastic_recheck started off life ignoring the 80 column boundary.
We should stop that, as it's bad form. Also, I use multi-column
emacs and it blows out my column widths.

So fix all the E501 issues and start enforcing the rules in tox.

Change-Id: Ib0a1d48d085d9b21fbc1bab75e93e9cc40d36988
2013-12-02 11:43:51 -05:00
Sean Dague
932986a876 move queries.yaml into a queries subdir
this handles the piece of work we've been talking about for a while
in moving the queries.yaml file into a directory with a bunch of
files. These remain yaml so that they can be tagged with additional
metadata. This would support the concept of soft deleting as well
as other useful metadata to gauge the evolution of the bugs we
track over time.

This should see some real review as it's extensive enough of a
change that the existing tests might not be sufficient. However it
should be enough to move this forward quite a bit.

This also makes forward-looking statements about doing soft deletes
with a resolved_at keyword. That implementation will
come later.

Change-Id: I86317fcf6f1886ab5b6c0ee154b29e71865c52b7
2013-12-02 11:43:00 -05:00
Michael Still
b4f70ab8d3 Tell people to do a recheck
I was confused by the code review message, as I thought a recheck
was automatically kicked off. Make it clearer that I need to do
this manually.

Change-Id: I21497c6ae54c44b746375e6473b8501c99776451
2013-11-25 15:52:33 +11:00
Sean Dague
42e3402806 refactor templates into query_builder
as part of trying to simplify the core elasticRecheck, refactor
the query creation into a separate set of query_builder routines.
This takes away some of the duplication between the queries, and
attempts to add documentation to the uses for each of them.

add elasticRecheck fake pyelasticsearch testing

build basic fixtures for unit testing that let us fake out the
interaction to pyelasticsearch. This uses the json samples added
for previous testing as the return results should an inbound
query match one of the queries we know about.

If the query is unknown to us, return an empty result set. Unit
testing for both cases is included, going all the way from the top-level
Classifier class.

Change-Id: I0d23b649274b31e8f281aaac588c4c6113a11a47
2013-10-21 13:46:57 -04:00
Sean Dague
4915ebb1a7 add SearchResultSet and Hit objects
in an attempt at long-term simplification of the source tree, this
is the beginning of a ResultSet and Hit object type. The ResultSet
is constructed from the JSON structure returned by ElasticSearch, and
it builds hits internally.

ResultSet is an iterator, and indexable, so that you can easily loop
through them. Both ResultSet and Hit objects have dynamic attributes
to make accessing the deep data structures easier (and without having
to make everything explicit), and also handling the multiline collapse
correctly.

A basic set of tests is included, as well as sample json dumps for all
the current bugs in the system for additional unit testing. Fortunately
this includes bugs which have hits, and those that don't.

In order to use ResultSet we need to pass everything through
our own SearchEngine object, so we get results back as expected.

We also need to teach ResultSet about facets, as those get used
when attempting to find specific files.

Lastly, we need __len__ implementation for ResultSet to support
the wait loop correctly.

ResultSet lets us simplify a bit of the code in elasticRecheck,
so port it over.

There is a short term fix in the test_classifier test to get us
working here until real stub data can be applied.

Change-Id: I7b0d47a8802dcf6e6c052f137b5f9494b1b99501
2013-10-21 13:45:55 -04:00
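A minimal sketch of iterable, indexable ResultSet and Hit wrappers with dynamic attribute lookup. It assumes ElasticSearch returns `{"hits": {"hits": [{"_source": {...}}]}}`. Collapsing list-valued fields into newline-joined strings is an assumption about what "multiline collapse" means here, and facets are omitted.

```python
class Hit(object):
    """One search hit; unknown attributes are looked up in _source."""

    def __init__(self, hit):
        self._hit = hit

    def __getattr__(self, name):
        # Only called when normal lookup fails, so self._hit is safe here.
        source = self._hit.get("_source", {})
        if name not in source:
            raise AttributeError(name)
        value = source[name]
        if isinstance(value, list):
            # Assumed multiline collapse: join list-valued fields.
            value = "\n".join(value)
        return value


class ResultSet(object):
    """Wraps the raw response; iterable, indexable and sized."""

    def __init__(self, results):
        self._results = results
        self._hits = [Hit(h) for h in results["hits"]["hits"]]

    def __iter__(self):
        return iter(self._hits)

    def __getitem__(self, index):
        return self._hits[index]

    def __len__(self):
        # Lets a wait loop check "have any hits arrived yet?"
        return len(self._hits)

    def __getattr__(self, name):
        # Top-level response keys, e.g. rs.took
        if name in self._results:
            return self._results[name]
        raise AttributeError(name)
```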
Clark Boylan
222949b717 Make e-r compatible with old/new logstash schemas.
* elastic_recheck/elasticRecheck.py: Update templated queries to use non
'@' prefixed fields and flatten the old '@fields' field. This is
possible because a query for foo_field will find both foo_field and
@fields.foo_field. Also, handle the case where @fields may not be
present in the query results.

* queries.yaml: Update queries using the same rules as in
elasticRecheck.py

Change-Id: I48672912d05c7ad557e948cfef0402c7c89582f6
2013-10-17 14:53:07 -07:00
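A minimal sketch of reading a field from documents in either logstash schema. It assumes new documents carry fields at the top level and old ones nest them under `@fields`, which may be absent. `get_field` is a hypothetical helper.

```python
def get_field(source, name):
    """Read a field from a logstash document in either schema.

    New schema: fields at top level. Old schema: under "@fields",
    which may be missing entirely.
    """
    if name in source:
        return source[name]
    return source.get("@fields", {}).get(name)
```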
Clark Boylan
492142a5dd Add missing comma to REQUIRED_FILES list.
* elastic_recheck/elasticRecheck.py: There was a comma missing in the
REQUIRED_FILES list that caused the cinder volume log file and syslog log
file names to be appended together. Add the comma to fix the list.

Change-Id: I6aaf745f996e725c529ccd9f8b7444d8b9a5648f
2013-10-16 18:38:53 -07:00
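The bug fixed in 492142a5dd comes from Python's implicit concatenation of adjacent string literals. The file names below are illustrative stand-ins for the cinder volume and syslog entries, not the real list.

```python
# Buggy: no comma after the first entry, so Python's implicit
# string-literal concatenation merges the two names into one item.
BROKEN = [
    "logs/screen-c-vol.txt"
    "logs/syslog.txt",
]

# Fixed: two separate entries.
FIXED = [
    "logs/screen-c-vol.txt",
    "logs/syslog.txt",
]
```

Because the merged name never matches a real file, the readiness check silently waits on a file that cannot exist.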
Joe Gordon
3326bc7c90 Add query for bug 1240256
First syslog-based query, using syslog to get to the swift proxy-server logs.

Add log/syslog.txt to required files list as well.

Change-Id: I6f3090efe4945efcd67b53b89c1b64bc1db3afa7
2013-10-15 17:49:36 -07:00
Sean Dague
2f3f3ecd39 use join to list multiple bugs
previously when we had multiple bugs we appended strings in a loop,
but that meant we had a trailing "and", which was ugly. We can
do better by transforming bugs to bug_urls, then using join.

Change-Id: Iaf28dbe9909c60b1e2206a79faaf5190f792252d
2013-10-09 09:40:02 -04:00
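A minimal sketch of the join approach, assuming Launchpad bug URLs of the form `https://bugs.launchpad.net/bugs/<number>`. `bug_sentence` is hypothetical. Sorting first also makes the output stable when bugs are held in a set, and a single bug yields just one URL with no conjunction.

```python
LP_URL = "https://bugs.launchpad.net/bugs/%s"


def bug_urls(bugs):
    """Map bug numbers to URLs in a stable order."""
    return [LP_URL % b for b in sorted(bugs)]


def bug_sentence(bugs):
    """Join bug URLs with " and " so no trailing conjunction is left."""
    return " and ".join(bug_urls(bugs))
```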
Clark Boylan
e062345560 Leave comment when single bug is found.
* elastic_recheck/elasticRecheck.py: When a single bug is found be sure
to pass that single bug to the string formatter rather than an undefined
variable. This fixes a bug that caused elastic-recheck's Stream to die
previously.

Change-Id: Ie62abde1b571fa2b42b95519fc5c23e0199f732d
2013-10-03 17:39:47 -07:00
Jenkins
cfefe90f71 Merge "Add even more debug logs." 2013-10-03 22:27:27 +00:00
Joe Gordon
77cb913628 Add even more debug logs.
There is an issue somewhere in Stream.

Change-Id: I6c79f3408138f13576c3ba219b39ef4f52037d84
2013-10-03 15:05:05 -07:00
Jenkins
87ec23d976 Merge "Make test_required_files.py real unit tests" 2013-10-03 20:42:11 +00:00
Jenkins
ab7871a63b Merge "Cleanup tests" 2013-10-03 20:42:11 +00:00
Joe Gordon
955bbbd095 Add more debug logs to classify
Something is hanging in the bot; add more debug logs to help sort out
what is happening.

Change-Id: Iabc05ec98567557d9a48988499ed6ab30246bd4b
2013-10-02 14:08:26 -07:00
Joe Gordon
d209b83ee3 Cleanup tests
Move test code into tests.
Remove last_failures test, as it's replaced by other tests now.
Remove dead code.

Change-Id: I3514f62e003b1140fbe597cc91aea3089c268ac7
2013-10-02 13:57:18 -07:00
Jenkins
39b7ea7908 Merge "add check_success tool" 2013-10-02 20:24:38 +00:00
Sean Dague
362a7e47a4 add check_success tool
this adds a tool that runs through the query list and checks
whether the queries also match successful runs in logstash. This
helps us decide whether queries need to be looked at for
narrowing.

make elastic-recheck-success the entry point when installed

Change-Id: I3eaa822af35146935b22100ffb1e3a4f18dc8d0e
2013-10-02 16:00:20 -04:00
Jenkins
c7e0f0b894 Merge "Stop using While True to wait for ElasticSearch" 2013-10-02 18:47:49 +00:00
Joe Gordon
f1e542c848 Stop using While True to wait for ElasticSearch
Now that ElasticSearch isn't backed way up, using a while True is
dangerous, because if something breaks for an individual tempest
failure the entire system will hang.

Even if something breaks in ElasticSearch we want elastic-recheck to
recover without needing to be restarted.

Update test_classifier; unfortunately it uses logstash.o.o, which removes
results every two weeks, so the test needs updating to work.

Change-Id: I119bb3d1ef814aabd393e65af97f851a54895985
2013-10-02 09:36:53 -07:00
James E. Blair
c35742c4be Add hits_by_query method
Change-Id: If50ce091f9dac5813d4c3de0212dbda5f77784b8
Co-Authored-By: Sean Dague <sdague@linux.vnet.ibm.com>
2013-10-02 09:13:34 -07:00
Matthew Treinish
633396bd0d Make test_required_files.py real unit tests
Change-Id: I0af14c6d9147d388a579d9586eefccd73fd7aab8
2013-10-02 10:29:21 -04:00
Matthew Treinish
d10e2f1596 Add support for multiple bug matches in a failure
This commit adds support for a test failure having more than one
bug match. Since there is normally more than one tempest run for
each commit, there is the potential for multiple failures.

Change-Id: Ibd0a5e3c7ec64732b41186400da2af6cd4658fdd
2013-10-01 15:09:55 -04:00
James E. Blair
338c2135b4 Make pid file configurable
And some other fixups around starting the daemon:
* read config file before forking
* add '-d' option to avoid forking
* default pidfile to /var/run/elastic-recheck/elastic-recheck.pid
* add pidfile option to config file
* switch to python-daemon library (which is the version of the
  lib that the code was expecting anyway)
* use expanduser in the query file path (to match the rest of the
  paths)

Change-Id: I674778ef189cd216a80f74bd449cdc3b12b57a7d
2013-09-30 10:29:32 -07:00
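A hedged sketch of reading the pidfile option with a default and expanding `~` in the query file path, using Python 3's configparser. The section and option names (`ircbot`, `pidfile`, `gerrit`, `query_file`) are assumptions about the config layout.

```python
import configparser
import os

DEFAULT_PIDFILE = "/var/run/elastic-recheck/elastic-recheck.pid"


def load_paths(config_text):
    """Return (pidfile, query file path) from config text (sketch)."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    # Fall back to the default pidfile when the option is not set.
    pidfile = config.get("ircbot", "pidfile", fallback=DEFAULT_PIDFILE)
    # Expand "~" like the other configured paths.
    queries = os.path.expanduser(config.get("gerrit", "query_file"))
    return pidfile, queries
```

Reading the config before forking means a bad path fails in the foreground, where the error is visible.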
James E. Blair
f173f8b8b3 Switch queries file to YAML
It is easier for a human to read, and by virtue of not requiring
escaped quotes, easier to copy/paste into a logstash field.

When copy/pasting, the newlines won't show up in the input field.
The '>' syntax in YAML indicates folding, which causes the newline
and indentation to be turned into a single space.

Change-Id: Ibd172fd4859c055096609f31ef09222147c34cf3
2013-09-30 08:48:41 -07:00