Subcloud collect bundles have an extra level of directory hierarchy.
This update refactors the report.py bundle search and extraction
handling to support single- and multi-host collect bundles as well as
single- and multi-subcloud collect bundles.
Typical usage is:
report.py <bundle pointer option> /path/to/bundle
Bundle pointer options:
--bundle Use this option to point to a 'directory' that 'contains'
host tarball files.
--directory Use this option when a collect bundle 'tar file' is in a
specific 'directory'.
--file Use this option to point to a specific collect bundle
tar file to analyze.
The following additional changes / improvements were made:
- improved report.py code structure
- improved management of the input and output dirs
- improved debug and error logging (new --state option)
- removed --clean option that can fail due to bundle file permissions
- added --bundle option to support pointing to a directory
containing a set of host tarballs.
- modified collect to use the new --bundle option when --report
option is used.
- implemented tool logfile migration from /tmp to bundle output_dir
- created the report_analysis dir in the final output_dir only
- fixed file permissions to allow execution from git
- ordered plugin analysis output by size
- added additional error checking and handling
Test Plan:
PASS: Verify collect --report (std system, AIO and subcloud)
PASS: Verify report analysis
PASS: Verify report run on-system, git and cached copy
PASS: Verify on and off system analysis of
PASS: ... single-host collect bundle with --file option
PASS: ... multi-host collect bundle with --file option
PASS: ... single-subcloud collect bundle with --file option
PASS: ... multi-subcloud collect bundle with --file option
PASS: ... single-host collect bundle with --directory option
PASS: ... multi-host collect bundle with --directory option
PASS: ... single-subcloud collect bundle with --directory option
PASS: ... multi-subcloud collect bundle with --directory option
PASS: ... single-host collect bundle with --bundle option
PASS: ... multi-host collect bundle with --bundle option
PASS: Verify --directory option handling when
PASS: ... there are multiple bundles to select from (pass)
PASS: ... there is a bundle without the date_time (prompt)
PASS: ... there are extra non-bundle files in target dir (ignore)
PASS: ... the target dir only contains host tarballs (fail)
PASS: ... the target dir has no tar files or extracted bundle (fail)
PASS: ... the target dir does not exist (fail)
PASS: Verify --bundle option handling when
PASS: ... there are host tarballs in the target directory (pass)
PASS: ... there are only extracted host dirs in target dir (pass)
PASS: ... there are no host tarballs or dirs in target dir (fail)
PASS: ... the target dir does not have a dated host dir (fail)
PASS: ... the target dir does not exist (fail)
PASS: ... the target is a file rather than a dir (fail)
PASS: Verify --file option handling when
PASS: ... the target tar file is found (pass)
PASS: ... the target tar file is not date_time named (prompt)
PASS: ... the target tar file does not exist (fail)
PASS: ... the target tar is not a collect bundle (fail)
PASS: Verify that the tar file(s) in single- and multi-subcloud collects
with the --report option each include a report analysis.
PASS: Verify logging with and without --debug and --state options
PASS: Verify error handling when no -b, -f or -d option is specified
Story: 2010533
Task: 48187
Change-Id: I4924034aa27577f94e97928265c752c204a447c7
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
The Report tool is used to gather relevant logs, events
and information about the system from a collect bundle
and present that data for quick and easy issue analysis.
Report can be run directly from a cloned StarlingX utilities git repository:
${MY_REPO}/stx/utilities/tools/collector/debian-scripts/report/report.py {options}
Report is installed on, and can be run from, any 22.12 POSTGA system node:
/usr/local/bin/report/report.py --directory /scratch
Report can also be run automatically during a collect operation:
collect all --report
See Report's --help option for additional command line arguments:
report.py --help
Selecting the right command option for your collect bundle:
Report is designed to analyze a host or subcloud 'collect bundle'.
Report needs to be told where to find the collect bundle to analyze
using one of three options:
Analyze Host Bundle: --bundle or -b option
-------------------
Use this option to point to a 'directory' that 'contains'
host tarball files.
report.py --bundle /scratch/ALL_NODES_YYYYMMDD.hhmmss
Such a directory contains one tarball per host, each ending
in .tgz:
/scratch/ALL_NODES_YYYYMMDD.hhmmss
├── controller-0_YYYYMMDD.hhmmss.tgz
└── controller-1_YYYYMMDD.hhmmss.tgz
This is the option collect uses to automatically analyze a
just-collected bundle when the collect --report option is used.
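The --bundle option expects a directory of dated host tarballs or already-extracted host directories, and should fail cleanly when the target is missing, is a file, or holds nothing dated. The following is a minimal Python sketch of such a check under stated assumptions, not the report.py implementation: the helper `find_host_entries` and the `<host>_YYYYMMDD.hhmmss[.tgz]` naming pattern in `HOST_ENTRY_RE` are hypothetical, inferred from the file names shown in this README.

```python
import re
from pathlib import Path

# Hypothetical naming pattern: "<host>_YYYYMMDD.hhmmss.tgz" for a host tarball
# or "<host>_YYYYMMDD.hhmmss" for an already-extracted host directory.
HOST_ENTRY_RE = re.compile(r"^[\w-]+_\d{8}\.\d{6}(?P<tgz>\.tgz)?$")


def find_host_entries(bundle_dir):
    """Return the sorted host tarballs and extracted host dirs in bundle_dir.

    Raises ValueError if the target does not exist, is not a directory,
    or contains no dated host tarball or host directory.
    """
    path = Path(bundle_dir)
    if not path.exists():
        raise ValueError(f"{bundle_dir} does not exist")
    if not path.is_dir():
        raise ValueError(f"{bundle_dir} is not a directory")

    entries = []
    for entry in sorted(path.iterdir()):
        match = HOST_ENTRY_RE.match(entry.name)
        if match is None:
            continue  # skip non-host entries such as report_analysis
        is_tarball = match.group("tgz") is not None
        # A dated .tgz name must be a file; a dated name without .tgz must be a dir.
        if (is_tarball and entry.is_file()) or (not is_tarball and entry.is_dir()):
            entries.append(entry)

    if not entries:
        raise ValueError(f"no dated host tarballs or host dirs in {bundle_dir}")
    return entries
```

Raising one exception type for every failure case keeps the caller's error handling to a single except clause that logs the message and exits.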
Analyze Directory: --directory or -d option
-----------------
Use this option when a collect bundle 'tar file' is in a
specific 'directory'. If there are multiple collect
bundles in that directory then the tool will prompt the
user to select one from a list.
report.py --directory /scratch
0 - exit
1 - ALL_NODES_20230608.235225
2 - ALL_NODES_20230609.004604
Please select bundle to analyze:
Analysis proceeds automatically if there is only a
single collect bundle found.
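The --directory selection behaviour described above (proceed automatically with a single bundle, otherwise show a numbered list where 0 exits) can be sketched as follows. Both functions and the `<NAME>_YYYYMMDD.hhmmss.tar` pattern are hypothetical, and this simplified sketch ignores tar files whose names lack a date_time stamp. The prompt is passed in through the `ask` parameter so the selection logic can be exercised without a terminal.

```python
import re
from pathlib import Path

# Hypothetical bundle naming pattern: <NAME>_YYYYMMDD.hhmmss.tar
BUNDLE_RE = re.compile(r"^(?P<name>[\w-]+_\d{8}\.\d{6})\.tar$")


def list_bundles(directory):
    """Return the sorted bundle names (without .tar) found in directory."""
    names = []
    for entry in Path(directory).iterdir():
        match = BUNDLE_RE.match(entry.name)
        if match and entry.is_file():
            names.append(match.group("name"))
    return sorted(names)


def select_bundle(bundles, ask=input):
    """Return the chosen bundle name, or None if there is none or the user exits.

    A single bundle is selected without prompting; otherwise a numbered
    menu is printed and the user is asked until a valid number is entered.
    """
    if not bundles:
        return None
    if len(bundles) == 1:
        return bundles[0]
    print("0 - exit")
    for index, name in enumerate(bundles, start=1):
        print(f"{index} - {name}")
    while True:
        reply = ask("Please select bundle to analyze: ").strip()
        if reply.isdecimal() and int(reply) <= len(bundles):
            choice = int(reply)
            return None if choice == 0 else bundles[choice - 1]
```

For example, `select_bundle(list_bundles("/scratch"))` prints a menu like the one above only when more than one dated bundle tar file is present.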
Analyze Specific Collect Bundle tar file: --file or -f option
----------------------------------------
Use this option to point to a specific collect bundle
tar file to analyze.
report.py --file /scratch/ALL_NODES_YYYYMMDD.hhmmss.tar
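Exactly one bundle pointer option has to be supplied, which maps naturally onto an argparse mutually exclusive group. This is a hedged sketch of that idea only: `build_parser` is a hypothetical helper, and the real report.py parser defines more options (for example --debug and --state) and may be structured differently.

```python
import argparse


def build_parser():
    """Sketch of a CLI exposing the three bundle pointer options.

    The group is required, so exactly one of -b, -d or -f must be given.
    """
    parser = argparse.ArgumentParser(prog="report.py")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-b", "--bundle",
                       help="directory that contains host tarball files")
    group.add_argument("-d", "--directory",
                       help="directory that contains collect bundle tar file(s)")
    group.add_argument("-f", "--file",
                       help="specific collect bundle tar file to analyze")
    return parser
```

For example, `build_parser().parse_args(["-f", "/scratch/ALL_NODES_20230307.183540.tar"]).file` returns the tar path, while passing none of the options, or two of them, makes argparse print a usage error and exit.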
Host vs Subcloud Collect Bundles:
Expected Host Bundle Format:
├── SELECT_NODES_YYYYMMDD.hhmmss.tar
└── SELECT_NODES_YYYYMMDD.hhmmss
    ├── controller-0_YYYYMMDD.hhmmss
    ├── controller-0_YYYYMMDD.hhmmss.tgz
    ├── controller-1_YYYYMMDD.hhmmss
    ├── controller-1_YYYYMMDD.hhmmss.tgz
    ├── worker-0_YYYYMMDD.hhmmss
    └── worker-1_YYYYMMDD.hhmmss.tgz
Expected Subcloud Bundle Format:
├── SELECT_SUBCLOUDS_YYYYMMDD.hhmmss.tar
└── SELECT_SUBCLOUDS_YYYYMMDD.hhmmss
    ├── subcloudX_YYYYMMDD.hhmmss.tar
    ├── subcloudX_YYYYMMDD.hhmmss
    │   ├── controller-0_YYYYMMDD.hhmmss
    │   ├── controller-0_YYYYMMDD.hhmmss.tgz
    │   ├── report_analysis
    │   └── report_tool.tgz
    ├── subcloudY_YYYYMMDD.hhmmss.tar
    ├── subcloudY_YYYYMMDD.hhmmss
    │   ├── controller-0_YYYYMMDD.hhmmss
    │   ├── controller-0_YYYYMMDD.hhmmss.tgz
    │   ├── report_analysis
    │   └── report_tool.tgz
    ├── subcloudZ_YYYYMMDD.hhmmss.tar
    └── subcloudZ_YYYYMMDD.hhmmss
        ├── controller-0_YYYYMMDD.hhmmss
        └── controller-0_YYYYMMDD.hhmmss.tgz
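Comparing the two layouts, the top-level directory of a subcloud bundle holds per-subcloud .tar files, while that of a host bundle holds per-host .tgz files. A minimal sketch of telling them apart once the outer tar has been extracted, assuming those name patterns; `classify_bundle` and both regular expressions are hypothetical and not taken from report.py.

```python
import re
from pathlib import Path

# Hypothetical patterns based on the expected bundle formats above.
HOST_TGZ_RE = re.compile(r"^[\w-]+_\d{8}\.\d{6}\.tgz$")
SUBCLOUD_TAR_RE = re.compile(r"^[\w-]+_\d{8}\.\d{6}\.tar$")


def classify_bundle(extracted_dir):
    """Return ("subcloud", tar_names) or ("host", tgz_names) for extracted_dir.

    Dated .tar files mark a subcloud bundle; dated .tgz files mark a host
    bundle. Raises ValueError if neither kind of file is present.
    """
    names = sorted(p.name for p in Path(extracted_dir).iterdir() if p.is_file())
    subclouds = [n for n in names if SUBCLOUD_TAR_RE.match(n)]
    if subclouds:
        return "subcloud", subclouds
    hosts = [n for n in names if HOST_TGZ_RE.match(n)]
    if hosts:
        return "host", hosts
    raise ValueError(f"{extracted_dir} is not a recognized collect bundle")
```

Undated files such as report_tool.tgz do not match either pattern, so they do not affect the classification.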
If multiple bundles are found at the specified --directory,
the list is displayed and the user is prompted to select a
bundle from it.
This is typical when analyzing a selected subcloud collect
bundle, as in the example below:
$ report -d /localdisk/issues/SELECT_SUBCLOUDS_YYYYMMDD.hhmmss.tar
Report extracts the subcloud tar file and, if it finds more
than one subcloud tar file inside, prompts the user to select
which one to analyze:
0 - exit
1 - subcloudX_YYYYMMDD.hhmmss
2 - subcloudY_YYYYMMDD.hhmmss
3 - subcloudZ_YYYYMMDD.hhmmss
Please select the bundle to analyze:
Refer to the report.py file header for a description of the tool.
Report places the report analysis in the bundle itself.
Consider the following collect bundle structure and notice
the 'report_analysis' folder, which contains the Report analysis.
SELECT_NODES_20220527.193605
├── controller-0_20220527.193605
│   ├── etc
│   ├── root
│   └── var
├── controller-1_20220527.193605
│   ├── etc
│   ├── root
│   └── var
└── report_analysis (where the output files will be placed)
Report processes the collect bundle passed on its command line in two phases:
Phase 1: Run algorithm-specific plugins to gather plugin-specific
         'report logs', mainly fault, event, alarm and state
         change logs.
Phase 2: Run the correlator against the 'report logs' found by the
         plugins to produce descriptive strings that represent
         failures found in the collect bundle, and to summarize
         the events, alarms and state change data.
Report then produces a report analysis that is stored with
the original bundle.
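The plugin counts in the example output below appear in descending order, in line with the change to order plugin analysis output by size. A minimal sketch of that ordering, assuming each plugin writes one text file into the plugins directory and that "size" means the file's line count (both assumptions); `summarize_plugins` and `print_summary` are hypothetical helpers, not report.py functions.

```python
from pathlib import Path


def summarize_plugins(plugin_dir):
    """Return (results, empty) for the plugin output files in plugin_dir.

    results is a list of (line_count, path) tuples ordered largest first,
    with ties broken by file name; empty names the plugins that found nothing.
    """
    results, empty = [], []
    for path in Path(plugin_dir).iterdir():
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            count = sum(1 for _ in handle)
        if count:
            results.append((count, path))
        else:
            empty.append(path.name)
    results.sort(key=lambda item: (-item[0], item[1].name))
    return results, sorted(empty)


def print_summary(plugin_dir):
    """Print a summary laid out like the Plugin Results example."""
    results, empty = summarize_plugins(plugin_dir)
    print("Plugin Results:")
    for count, path in results:
        print(f"{count:>6} {path}")
    if empty:
        print("... nothing found by plugins: " + ", ".join(empty))
```

Sorting on the negated count together with the name keeps the output order stable even when two plugins report the same number of lines.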
Example Analysis:
$ report -d /localdisk/CGTS-44887
extracting /localdisk/CGTS-44887/ALL_NODES_20230307.183540.tar
Report: /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis
extracting : /localdisk/CGTS-44887/ALL_NODES_20230307.183540/controller-1_20230307.183540.tgz
extracting : /localdisk/CGTS-44887/ALL_NODES_20230307.183540/compute-0_20230307.183540.tgz
extracting : /localdisk/CGTS-44887/ALL_NODES_20230307.183540/controller-0_20230307.183540.tgz
extracting : /localdisk/CGTS-44887/ALL_NODES_20230307.183540/compute-1_20230307.183540.tgz
Active Ctrl: controller-1
System Type: All-in-one
S/W Version: 22.12
System Mode: duplex
DC Role : systemcontroller
Node Type : controller
subfunction: controller,worker
Mgmt Iface : vlan809
Clstr Iface: vlan909
OAM Iface : eno8403
OS Release : Debian GNU/Linux 11 (bullseye)
Build Type : Formal
Build Date : 2023-03-01 23:00:06 +0000
controllers: controller-1,controller-0
workers : compute-1,compute-0
Plugin Results:
621 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/log
221 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/swact_activity
132 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/alarm
85 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/substring_controller-0
60 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/system_info
54 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/maintenance_errors
36 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/heartbeat_loss
26 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/process_failures
16 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/state_changes
13 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/substring_controller-1
2 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/puppet_errors
... nothing found by plugins: daemon_failures
Correlated Results:
Events : 8 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/events
Alarms : 26 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/alarms
State Changes: 16 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/state_changes
Failures : 4 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/failures
2023-03-07T05:00:11 controller-0 uncontrolled swact
2023-03-07T05:01:52 controller-0 heartbeat loss failure
2023-03-07T17:42:35 controller-0 configuration failure
2023-03-07T17:58:06 controller-0 goenabled failure
Inspect the Correlated and Plugin results files for failures,
alarms, events and state changes.
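The failure lines printed under Correlated Results follow a simple timestamp, host, description layout. Assuming the failures file uses the same layout as that console output (an assumption, not confirmed here), a sketch like the following could group the failures per host for follow-up. `Failure`, `parse_failure` and `failures_by_host` are hypothetical names.

```python
from datetime import datetime
from typing import NamedTuple


class Failure(NamedTuple):
    when: datetime
    host: str
    description: str


def parse_failure(line):
    """Split '2023-03-07T05:00:11 controller-0 uncontrolled swact' into fields."""
    stamp, host, description = line.split(maxsplit=2)
    return Failure(datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S"), host, description.strip())


def failures_by_host(path):
    """Group the non-blank failure lines of a results file by host name."""
    grouped = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                failure = parse_failure(line)
                grouped.setdefault(failure.host, []).append(failure)
    return grouped
```

Keeping the timestamp as a datetime makes it straightforward to sort or window the failures when lining them up against the alarm and state change files.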