add neutron, glance, and n-net logs as required files when
appropriate. This will help ensure that we don't miss a pattern
because we searched before the log was in the system.
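For illustration, the list might grow along these lines (the exact
file names here are assumptions, not the real set in the tree):

    # extra logs required before we consider a job's logs indexed
    REQUIRED_FILES = [
        'console.html',
        'logs/screen-n-net.txt',   # nova-network
        'logs/screen-g-api.txt',   # glance
        'logs/screen-q-svc.txt',   # neutron
    ]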
Change-Id: Ia8f2cdedfc9964f1d9589fda253174e972fcc770
Instead of just listing which bugs were seen in an entire gerrit event
(multiple jenkins/zuul jobs), list which bugs were seen in which job.
If one of the jobs has an unrecognized error, don't display the comment
about running recheck; just list which bugs were seen on which jobs (and
which jobs have an unrecognized error).
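Roughly, the rendering this enables looks like (a sketch, names
hypothetical):

    def format_job_lines(bugs_by_job):
        # bugs_by_job: dict of job name -> set of bug numbers;
        # an empty set means the job failed with an unrecognized error
        lines = []
        for job, bugs in bugs_by_job.items():
            if bugs:
                lines.append('- %s: bug(s) %s' % (
                    job, ', '.join(str(b) for b in sorted(bugs))))
            else:
                lines.append('- %s: unrecognized error' % job)
        return lines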
Change-Id: I55b2eb8f0efe43ab22540294150d4bc9f5885510
We are starting to track a decent amount of data per zuul/jenkins job,
so track data in an object instead of assorted variables and
dictionaries. For example, bugs are now tracked per job rather than per
gerrit event. Now we can support reporting which bug caused which
specific job to fail. This also does some assorted object related
cleanups. This consists of internal changes only; a future patch will
make the gerrit and irc comments take advantage of this.
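A minimal sketch of the shape of such an object (not the exact class
in the tree):

    class FailJob(object):
        # everything we know about a single failed zuul/jenkins job
        def __init__(self, name, url):
            self.name = name
            self.url = url
            self.bugs = set()   # bugs now live on the job, not the event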
Change-Id: I2116cd0e10b45617a8d572b27f1672f695fa91d0
main in elasticRecheck was originally used for testing before the bot
was ready, but now that we have the bot working and it supports noirc
and no-gerrit-comment modes (tox -erun), there is no need to include a
main() here.
Change-Id: I6e1d790b78d2f2eafacd8efcaf132cf4479fe8ca
Always log the gerrit comment, and when running in nocomment just don't
send it to gerrit. This helps make testing changes to the gerrit comment
easier.
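In sketch form (helper names are hypothetical):

    msg = generate_comment(event)   # hypothetical helper
    log.info(msg)                   # always log the full comment
    if not nocomment:
        gerrit.review(event, msg)   # only send when commenting is on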
Change-Id: Ie26b86ed374d284154389b4bd5a86b9d2f365800
In preparation for providing a web page that will just show hits on the
gate queue, add a '-q queue' option to elastic-recheck-graph.
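The option might be wired up roughly like this:

    # hypothetical argparse wiring for the new flag
    parser.add_argument('-q', '--queue', default=None,
                        help='limit results to a single queue '
                             '(e.g. gate)')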
Change-Id: I9217a2ceedf86ffe04851084df78238384fccd51
Now that we are running this on all jobs (not just tempest) we are
getting significantly more IRC messages. Add the failed job name to logs
to provide more context about which job is failing. For unclassified
failures also include the queue (as an unclassified unit test failure in
the check queue is much less important than one in the gate).
Change-Id: I485bf06721fa5afd102b99b26e38f12449deec7b
When adding support for short build_uuid's in
I6356a971ca250ddf5f01a9734f13d0b080a62c89, event.bugs was converted to a
set, since we can now run classify multiple times on a single event and
don't want duplicate bugs. That patch didn't update the gerrit
comment-leaving capabilities to understand event.bugs as a set (instead
of a list).
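With a set, a sorted join keeps the comment output stable; a minimal
sketch:

    bug_urls = ['https://bugs.launchpad.net/bugs/%s' % b
                for b in sorted(event.bugs)]   # event.bugs is a set
    message = ' and '.join(bug_urls)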
Change-Id: I9032e23e0e53426a57bebf42f4c4d4167624280e
In addition to searching by change and patch, search by the short build_uuid.
This prevents us accidentally classifying multiple builds when we classify
a failure on gerrit. This can happen in the gate queue if there is a
gate reset, or if there are multiple 'recheck bug x' on a single patch
revision in the check queue.
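The effect is to narrow the search from a whole patch revision down to
one build, along these lines (the exact field name is an assumption):

    query = ('change:"%s" AND patchset:"%s" AND build_short_uuid:"%s"'
             % (change, patchset, short_uuid))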
Change-Id: I6356a971ca250ddf5f01a9734f13d0b080a62c89
instead of passing around complex data structures, create an
event object for our purposes that means we can pass around the
payload relevant to us. This simplifies some things, and will make
adding build_uuid support cleaner.
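A rough sketch of the idea (attribute names assumed, payload layout per
the gerrit event stream json):

    class Event(object):
        # thin wrapper exposing just the fields we care about
        def __init__(self, payload):
            self.payload = payload
            self.change = payload['change']['number']
            self.rev = payload['patchSet']['number']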
Change-Id: I8172b25ae3c60e38d63cf7f4d8a0f6c854bae766
we have been timing out on logs a lot, and not noticing. Redo this
logic to be exception based so we can tell the IRC channel when we
timeout on logs, to get to the bottom of reliability issues with
indexing logstash data.
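Sketch of the shape of the new logic (exception and helper names are
hypothetical):

    class ResultTimedOut(Exception):
        pass

    try:
        results = wait_for_logs(event)
    except ResultTimedOut:
        # surface the timeout instead of silently moving on
        ircbot.send(channel, 'timed out waiting for logs: %s' % event)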
Change-Id: Ia63d801235c6959eb7b97c334291a6d2f06411b6
this makes the er bot work with a saner set of default logs, and also
tells us how often we end up timing out.
It also makes the logs actually include timestamps.
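For the timestamps, something like the stdlib logging setup:

    import logging
    logging.basicConfig(
        format='%(asctime)s %(levelname)s %(name)s: %(message)s',
        level=logging.INFO)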
Change-Id: I29877c4158a84bd46b0a437a12c14450a049b49d
we only want to run on things we consider the "integrated" gate,
however, that's kind of a nebulous definition. Today a reasonable
heuristic is whether we are running the tempest full job, so use that.
This check could be enhanced in the future.
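In code the heuristic is roughly (the exact job name matching is an
assumption):

    def is_integrated_gate(job_names):
        # if tempest full ran, treat this as the integrated gate
        return any('tempest' in name and 'full' in name
                   for name in job_names)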
Change-Id: Iad36d330f8f6db3bbaa0c54a0c8e70b0e01a17b6
this changes the interface to move the readiness check out of
the classifier and into the stream object. This massively
simplifies the logic connecting these pieces, as classifier is
now just a thin wrapper to elastic search.
This also adds unit testing for the stream processing through the
creation of a fake_gerrit mock class. That lets us run gerrit
event interactions in a sane way.
It also drops all the unit testing for the classifier, which is now
largely useless because all it tests is that we can execute a for loop.
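The fake can be as small as (a sketch, not the exact mock class):

    class FakeGerrit(object):
        # feeds canned events to the Stream instead of a live server
        def __init__(self, events):
            self._events = list(events)

        def getEvent(self):
            return self._events.pop(0)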
Change-Id: I1971c121276412e31f01eb5680b9c41fc7e442d3
one of the big issues today with er is the amount of coupling between
the bot and the classifier around knowing when jobs are ready. The
impact of this is that we often incorrectly determine when jobs are
ready, because the small set of files we test for isn't right for
various jobs.
This is the beginning of decoupling that. By parsing the job names
that have failed in the jenkins failure message we can move all
the readiness checking into the Stream.
This commit adds the parsing and the unit tests, though it doesn't
actually change behavior to use it yet (next patch).
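The parsing can be sketched with a regex over the comment body (the
exact comment format shown is an assumption):

    import re

    # matches lines like:
    #   - gate-tempest-devstack-vm-full http://... : FAILURE
    FAILED_JOB = re.compile(
        r'^- (?P<job>\S+) (?P<url>\S+) : FAILURE', re.MULTILINE)

    def failed_jobs(comment):
        return [m.group('job') for m in FAILED_JOB.finditer(comment)]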
Change-Id: I54ffa3495a36c2d61b1824794a672c8f5552df54
Add a resolved_at attribute in the query yaml files
that can be used to mark when a bug has been
fixed or does not occur any more. This can help us
re-enable bugs quickly when we see them again.
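For example, a query file might carry the marker like this
(illustrative content):

    query: >
      message:"some error signature"
    resolved_at: 2013-11-20  # fixed; keep the query for quick re-enable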
Change-Id: I7af7ce9417eec5ff9ecc2487a920ff9d1286a714
Job names are about to change in infra/config. Be a little more
robust (but still, this is fragile).
Change-Id: I882de80dbb02aad68ef7b41095f36db2c7ebec49
In the land of random cleanups, let more of the whitespace rules
back in. Also explicitly exclude E125 because of the overreach,
and leave E123 excluded because it creates some kind of odd
artifacts in the current code (possibly clean it up later).
tox.ini is adjusted with comments noting that what we are ignoring is
ignored for a reason.
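Roughly, the resulting tox.ini section (illustrative):

    [flake8]
    # E123 excluded: creates odd artifacts in the current code
    # E125 excluded: overreaches
    ignore = E123,E125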
Change-Id: I5636cb646d7898df71b715aa0e32a68ce279ee80
extract out methods for readability, so the code has logical flow
and the details about each conditional can be encapsulated in its
own method.
Change-Id: I5b62842346e0e3774d8e0586ff6b2c6969602a07
elastic_recheck started off life ignoring the 80 column boundary.
We should stop that, as it's bad form. Also, I do multi column
emacs and it blows my column widths.
So fix all the E501 issues and start enforcing the rules in tox.
Change-Id: Ib0a1d48d085d9b21fbc1bab75e93e9cc40d36988
this handles the piece of work we've been talking about for a while
in moving the queries.yaml file into a directory with a bunch of
files. These remain yaml so that they can be tagged with additional
metadata. This would support the concept of soft deleting, as well as
other useful metadata for gauging the evolution of the bugs we track
over time.
This should see some real review as it's extensive enough of a
change that the existing tests might not be sufficient. However it
should be enough to move this forward quite a bit.
This also makes forward-looking statements about doing soft deletes
with a resolved_at keyword; that implementation will come later.
Change-Id: I86317fcf6f1886ab5b6c0ee154b29e71865c52b7
I was confused by the code review message, as I thought a recheck
was automatically kicked off. Make it clearer that I need to do
this manually.
Change-Id: I21497c6ae54c44b746375e6473b8501c99776451
as part of trying to simplify the core elasticRecheck, refactor
the query creation into a separate set of query_builder routines.
This takes away some of the duplication between the queries, and
documents the intended use of each of them.
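In the spirit of (function names and query strings illustrative):

    def result_ready(change, patchset):
        # query used to decide if a job's logs are indexed yet
        return 'change:"%s" AND patchset:"%s"' % (change, patchset)

    def single_patch(query, change, patchset):
        # scope a bug query down to one patch revision
        return '%s AND change:"%s" AND patchset:"%s"' % (
            query, change, patchset)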
add elasticRecheck fake pyelasticsearch testing
build basic fixtures for unit testing that let us fake out the
interaction to pyelasticsearch. This uses the json samples added
for previous testing as the return results should an inbound
query match one of the queries we know about.
If the query is unknown to us, return an empty result set. Unit tests
for both cases are included, going all the way from the top-level
Classifier class.
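The fake can be sketched as (a sketch, not the exact fixture):

    class FakePyElasticSearch(object):
        # return a canned json sample for known queries,
        # an empty result set for everything else
        def __init__(self, samples):
            self._samples = samples  # dict: query string -> loaded json

        def search(self, query, **kwargs):
            return self._samples.get(
                query, {'hits': {'hits': [], 'total': 0}, 'took': 1})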
Change-Id: I0d23b649274b31e8f281aaac588c4c6113a11a47
in an attempt for long term simplification of the source tree, this
is the beginning of a ResultSet and Hit object type. The ResultSet
is constructed from the ElasticSearch returned json structure, and
it builds hits internally.
ResultSet is an iterator, and indexable, so that you can easily loop
through them. Both ResultSet and Hit objects have dynamic attributes
to make accessing the deep data structures easier (and without having
to make everything explicit), and also to handle the multiline collapse
correctly.
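A minimal sketch of the shape (not the full implementation):

    class Hit(object):
        def __init__(self, hit):
            self._hit = hit

        def __getattr__(self, attr):
            # dynamic attributes reach into the nested _source data
            return self._hit['_source'][attr]

    class ResultSet(object):
        # iterable and indexable wrapper over the raw ES json
        def __init__(self, results):
            self._results = results
            self.hits = [Hit(h) for h in results['hits']['hits']]

        def __getitem__(self, key):
            return self.hits[key]

        def __len__(self):
            return len(self.hits)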
A basic set of tests is included, as well as sample json dumps for all
the current bugs in the system for additional unit testing. Fortunately
this includes bugs which have hits, and those that don't.
In order to use ResultSet we need to pass everything through
our own SearchEngine object, so we get results back as expected.
We also need to teach ResultSet about facets, as those get used
when attempting to find specific files.
Lastly, we need __len__ implementation for ResultSet to support
the wait loop correctly.
ResultSet lets us simplify a bit of the code in elasticRecheck,
port it over.
There is a short term fix in the test_classifier test to get us
working here until real stub data can be applied.
Change-Id: I7b0d47a8802dcf6e6c052f137b5f9494b1b99501
* elastic_recheck/elasticRecheck.py: Update templated queries to use non
'@' prefixed fields and flatten the old '@fields' field. This is
possible because a query for foo_field will find foo_field and
@fields.foo_field. Also, handle the case where @fields may not be
present in the query results.
* queries.yaml: Update queries using the same rules as in
elasticRecheck.py (an example of the rewrite is shown below).
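An illustrative before/after:

    before: @fields.build_status:"FAILURE" AND @fields.build_name:"foo"
    after:  build_status:"FAILURE" AND build_name:"foo"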
Change-Id: I48672912d05c7ad557e948cfef0402c7c89582f6
* elastic_recheck/elasticRecheck.py: There was a comma missing in the
REQUIRED_FILES list that caused the cinder volume log file and syslog
log file names to be appended together. Add the comma to fix the list.
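The failure mode is Python's implicit string literal concatenation; a
minimal illustration (paths assumed):

    REQUIRED_FILES = [
        'logs/screen-c-vol.txt'  # missing comma silently produces
        'logs/syslog.txt',       # 'logs/screen-c-vol.txtlogs/syslog.txt'
    ]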
Change-Id: I6aaf745f996e725c529ccd9f8b7444d8b9a5648f
First syslog based query, using it to get to the swift proxy-server logs.
Add log/syslog.txt to required files list as well.
Change-Id: I6f3090efe4945efcd67b53b89c1b64bc1db3afa7
previously when we had multiple bugs we did looped string appends,
but that meant we had a trailing "and", which was ugly. We can
do better by transforming bugs to bug_urls, then using join.
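The pattern, in short:

    bug_urls = ['https://bugs.launchpad.net/bugs/%s' % b for b in bugs]
    message = ' and '.join(bug_urls)   # no trailing "and"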
Change-Id: Iaf28dbe9909c60b1e2206a79faaf5190f792252d
* elastic_recheck/elasticRecheck.py: When a single bug is found be sure
to pass that single bug to the string formatter rather than an undefined
variable. This fixes a bug that caused elastic-recheck's Stream to die
previously.
Change-Id: Ie62abde1b571fa2b42b95519fc5c23e0199f732d
Move test code into tests.
Remove the last_failures test, as it's replaced by other tests now.
Remove dead code.
Change-Id: I3514f62e003b1140fbe597cc91aea3089c268ac7
this adds a tool that runs through the query list and checks whether
the queries also match successful runs in logstash. This helps us see
which queries need to be looked at for narrowing.
make elastic-recheck-success the entry point when installed
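Roughly, in pbr's setup.cfg (the module path here is a guess):

    [entry_points]
    console_scripts =
        elastic-recheck-success = elastic_recheck.cmd.check_success:main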
Change-Id: I3eaa822af35146935b22100ffb1e3a4f18dc8d0e
Now that ElasticSearch isn't backed way up, using a while True is
dangerous, because if something breaks for an individual tempest
failure the entire system will hang.
Even if something breaks in ElasticSearch we want elastic-recheck to
recover without needing to be restarted.
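Sketch of the bounded loop (constants illustrative):

    import time

    MAX_ATTEMPTS = 30
    SLEEP_TIME = 40

    for attempt in range(MAX_ATTEMPTS):   # was: while True
        results = es.search(query)
        if len(results):
            break
        time.sleep(SLEEP_TIME)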
Update test_classifier; unfortunately it uses logstash.o.o, which
removes results every two weeks, so the test needs updating to work.
Change-Id: I119bb3d1ef814aabd393e65af97f851a54895985
This commit adds support for a test failure having more than one
bug match. Since there is normally more than one tempest run for each
commit, there is the potential for multiple failures.
Change-Id: Ibd0a5e3c7ec64732b41186400da2af6cd4658fdd
And some other fixups around starting the daemon (see the sketch after
this list):
* read config file before forking
* add '-d' option to avoid forking
* default pidfile to /var/run/elastic-recheck/elastic-recheck.pid
* add pidfile option to config file
* switch to python-daemon library (which is the version of the
lib that the code was expecting anyway)
* use expanduser in the query file path (to match the rest of the
paths)
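A sketch of the startup with python-daemon (the module path for the pid
lockfile varies across python-daemon versions):

    import daemon
    import daemon.pidfile

    pid = daemon.pidfile.TimeoutPIDLockFile(
        '/var/run/elastic-recheck/elastic-recheck.pid', 10)
    with daemon.DaemonContext(pidfile=pid):
        main()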
Change-Id: I674778ef189cd216a80f74bd449cdc3b12b57a7d
It is easier for a human to read, and by virtue of not requiring
escaped quotes, easier to copy/paste into a logstash field.
When copy/pasting, the newlines won't show up in the input field.
The '>' syntax in YAML indicates folding, which causes the newline
and indentation to be turned into a single space.
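For example (query content illustrative):

    query: >
      message:"some error" AND
      filename:"console.html"

    # loads as the single line:
    #   message:"some error" AND filename:"console.html"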
Change-Id: Ibd172fd4859c055096609f31ef09222147c34cf3