Documentation configuring grafana datasource
Documentation for administrators on how to configure the Grafana datasource. Change-Id: I5d1d3129b5d225f0f2fc86d149c046f9aab94d47
This commit is contained in:
parent
46a36d1ad7
commit
fa1642e323
@ -10,3 +10,4 @@ Administrator Guide
|
|||||||
policy
|
policy
|
||||||
ways-to-install
|
ways-to-install
|
||||||
../strategies/index
|
../strategies/index
|
||||||
|
../datasources/index
|
||||||
|
426
doc/source/datasources/grafana.rst
Normal file
426
doc/source/datasources/grafana.rst
Normal file
@ -0,0 +1,426 @@
|
|||||||
|
==================
|
||||||
|
Grafana datasource
|
||||||
|
==================
|
||||||
|
|
||||||
|
Synopsis
|
||||||
|
--------
|
||||||
|
|
||||||
|
Grafana can interface with many different types of storage backends that
|
||||||
|
Grafana calls datasources_. Since the term datasources causes significant
|
||||||
|
confusion by overlapping definitions used in Watcher these **datasources are
|
||||||
|
called projects instead**. Some examples of supported projects are InfluxDB
|
||||||
|
or Elasticsearch while others might be more familiar such as Monasca or
|
||||||
|
Gnocchi. The Grafana datasource provides the functionality to retrieve metrics
|
||||||
|
from Grafana for different projects. This functionality is achieved by using
|
||||||
|
the proxy interface exposed in Grafana to communicate with Grafana projects
|
||||||
|
directly.
|
||||||
|
|
||||||
|
Background
|
||||||
|
**********
|
||||||
|
|
||||||
|
Since queries to retrieve metrics from Grafana are proxied to the project the
|
||||||
|
format of these queries will change significantly depending on the type of
|
||||||
|
project. The structure of the projects themselves will also change
|
||||||
|
significantly as they are structured by users and administrators. For instance,
|
||||||
|
some developers might decide to store metrics about compute_nodes in MySQL and
|
||||||
|
use the UUID as primary key while others use InfluxDB and use the hostname as
|
||||||
|
primary key. Furthermore, datasources in Watcher should return metrics in
|
||||||
|
specific units strictly defined in the baseclass_ depending on how the units
|
||||||
|
are stored in the projects they might require conversion before being returned.
|
||||||
|
The flexible configuration parameters of the Grafana datasource allow to
|
||||||
|
specify exactly how the deployment is configured and this will enable to
|
||||||
|
correct retrieval of metrics and with the correct units.
|
||||||
|
|
||||||
|
.. _datasources: https://grafana.com/plugins?type=datasource
|
||||||
|
.. _baseclass: https://github.com/openstack/watcher/blob/584eeefdc8/watcher/datasources/base.py
|
||||||
|
|
||||||
|
Requirements
|
||||||
|
------------
|
||||||
|
|
||||||
|
The use of the Grafana datasource requires a reachable Grafana endpoint and an
|
||||||
|
authentication token for access to the desired projects. The projects behind
|
||||||
|
Grafana will need to contain the metrics for compute_nodes_ or instances_ and
|
||||||
|
these need to be identifiable by an attribute of the Watcher datamodel_ for
|
||||||
|
instance hostname or UUID.
|
||||||
|
|
||||||
|
.. _compute_nodes: https://opendev.org/openstack/watcher/src/branch/master/watcher/decision_engine/model/element/node.py
|
||||||
|
.. _instances: https://opendev.org/openstack/watcher/src/branch/master/watcher/decision_engine/model/element/instance.py
|
||||||
|
.. _datamodel: https://opendev.org/openstack/watcher/src/branch/master/watcher/decision_engine/model/element
|
||||||
|
|
||||||
|
Limitations
|
||||||
|
***********
|
||||||
|
|
||||||
|
* Only the InfluxDB project is currently supported [#f1]_.
|
||||||
|
* All metrics must be retrieved from the same Grafana endpoint (same URL).
|
||||||
|
* All metrics must be retrieved with the same authentication token.
|
||||||
|
|
||||||
|
.. [#f1] A base class for projects is available_ and easily extensible.
|
||||||
|
.. _available: https://review.opendev.org/#/c/649341/24/watcher/datasources/grafana_translator/base.py
|
||||||
|
|
||||||
|
Configuration
|
||||||
|
-------------
|
||||||
|
|
||||||
|
Several steps are required in order to use the Grafana datasource, Most steps
|
||||||
|
are related configuring Watcher to match the deployed Grafana setup such as
|
||||||
|
queries proxied to the project or the type of project for any given metric.
|
||||||
|
Most of the configuration can either be supplied via the traditional
|
||||||
|
configuration file or in a `special yaml`_ file.
|
||||||
|
|
||||||
|
.. _special yaml: https://specs.openstack.org/openstack/watcher-specs/specs/train/approved/file-based-metricmap.html
|
||||||
|
|
||||||
|
token
|
||||||
|
*****
|
||||||
|
|
||||||
|
First step is to generate an access token with access to the required projects.
|
||||||
|
This can be done from the api_ or from the web interface_. Tokens generated
|
||||||
|
from the web interface will have the same access to projects as the user that
|
||||||
|
created them while using the cli allows to generate a key for a specific
|
||||||
|
role.The token will only be displayed once so store it well. This token will go
|
||||||
|
into the configuration file later and this parameter can not be placed in the
|
||||||
|
yaml.
|
||||||
|
|
||||||
|
.. _api: https://grafana.com/docs/http_api/auth/#create-api-key
|
||||||
|
.. _interface: https://grafana.com/docs/http_api/auth/#create-api-token
|
||||||
|
|
||||||
|
base_url
|
||||||
|
********
|
||||||
|
|
||||||
|
Next step is supplying the base url of the Grafana endpoint. The base url
|
||||||
|
parameter will need to specify the type of http protocol and the use of
|
||||||
|
plain text http is strongly discouraged due to the transmission of the access
|
||||||
|
token. Additionally the path to the proxy interface needs to be supplied as
|
||||||
|
well in case Grafana is placed in a sub directory of the web server. An example
|
||||||
|
would be: `https://mygrafana.org/api/datasource/proxy/` were
|
||||||
|
`/api/datasource/proxy` is the default path without any subdirectories.
|
||||||
|
Likewise, this parameter can not be placed in the yaml.
|
||||||
|
|
||||||
|
To prevent many errors from occurring and potentially filing the logs files it
|
||||||
|
is advised to specify the desired datasource in the configuration as it would
|
||||||
|
prevent the datasource manager from having to iterate and try possible
|
||||||
|
datasources with the launch of each audit. To do this specify `datasources` in
|
||||||
|
the `[watcher_datasources]` group.
|
||||||
|
|
||||||
|
The current configuration that is required to be placed in the traditional
|
||||||
|
configuration file would look like the following:
|
||||||
|
|
||||||
|
.. code-block:: shell
|
||||||
|
|
||||||
|
[grafana_client]
|
||||||
|
token = 0JLbF0oB4R3Q2Fl337Gh4Df5VN12D3adBE3f==
|
||||||
|
base_url = https://mygranfa.org/api/datasource/proxy
|
||||||
|
|
||||||
|
[watcher_datasources]
|
||||||
|
datasources = grafana
|
||||||
|
|
||||||
|
metric parameters
|
||||||
|
*****************
|
||||||
|
|
||||||
|
The last five remaining configuration parameters can all be placed both in the
|
||||||
|
traditional configuration file or in the yaml, however, it is not advised to
|
||||||
|
mix and match but in the case it does occur the yaml would override the
|
||||||
|
settings from the traditional configuration file. All five of these parameters
|
||||||
|
are dictionaries mapping specific metrics to a configuration parameter. For
|
||||||
|
instance the `project_id_map` will specify the specific project id in Grafana
|
||||||
|
to be used. The parameters are named as follow:
|
||||||
|
|
||||||
|
* project_id_map
|
||||||
|
* database_map
|
||||||
|
* translator_map
|
||||||
|
* attribute_map
|
||||||
|
* query_map
|
||||||
|
|
||||||
|
These five parameters are named differently if configured using the yaml
|
||||||
|
configuration file. The parameters are named as follows and are in
|
||||||
|
identical order as to the list of the traditional configuration file:
|
||||||
|
|
||||||
|
* project
|
||||||
|
* db
|
||||||
|
* translator
|
||||||
|
* attribute
|
||||||
|
* query
|
||||||
|
|
||||||
|
When specified in the yaml the parameters are no longer dictionaries instead
|
||||||
|
each parameter needs to be defined per metric as sub-parameters. Examples of
|
||||||
|
these parameters configured for both the yaml and traditional configuration
|
||||||
|
are described at the end of this document.
|
||||||
|
|
||||||
|
project_id
|
||||||
|
**********
|
||||||
|
|
||||||
|
The project id's can only be determined by someone with the admin role in
|
||||||
|
Grafana as that role is required to open the list of projects. The list of
|
||||||
|
projects can be found on `/datasources` in the web interface but
|
||||||
|
unfortunately it does not immediately display the project id. To display
|
||||||
|
the id one can best hover the mouse over the projects and the url will show the
|
||||||
|
project id's for example `/datasources/edit/7563`. Alternatively the entire
|
||||||
|
list of projects can be retrieved using the `REST api`_. To easily make
|
||||||
|
requests to the REST api a tool such as Postman can be used.
|
||||||
|
|
||||||
|
.. _REST api: https://grafana.com/docs/http_api/data_source/#get-all-datasources
|
||||||
|
|
||||||
|
database
|
||||||
|
********
|
||||||
|
|
||||||
|
The database is the parameter for the schema / database that is actually
|
||||||
|
defined in the project. For instance, if the project would be based on MySQL
|
||||||
|
this is were the name of schema used within the MySQL server would be
|
||||||
|
specified. For many different projects it is possible to list all the databases
|
||||||
|
currently available. Tools like Postman can be used to list all the available
|
||||||
|
databases per project. For InfluxDB based projects this would be with the
|
||||||
|
following path and query, however be sure to construct these request in Postman
|
||||||
|
as the header needs to contain the authorization token:
|
||||||
|
|
||||||
|
.. code-block:: shell
|
||||||
|
|
||||||
|
https://URL.DOMAIN/api/datasources/proxy/PROJECT_ID/query?q=SHOW%20DATABASES
|
||||||
|
|
||||||
|
translator
|
||||||
|
**********
|
||||||
|
|
||||||
|
Each translator is for a specific type of project will have a uniquely
|
||||||
|
identifiable name and the baseclass allows to easily support new types of
|
||||||
|
projects such as elasticsearch or prometheus. Currently only InfluxDB based
|
||||||
|
projects are supported as a result the only valid value for this parameter is `
|
||||||
|
influxdb`.
|
||||||
|
|
||||||
|
attribute
|
||||||
|
*********
|
||||||
|
|
||||||
|
The attribute parameter specifies which attribute to use from Watcher's
|
||||||
|
data model in order to construct the query. The available attributes differ
|
||||||
|
per type of object in the data model but the following table shows the
|
||||||
|
attributes for ComputeNodes, Instances and IronicNodes.
|
||||||
|
|
||||||
|
+-----------------+-----------------+--------------------+
|
||||||
|
| ComputeNode | Instance | IronicNode |
|
||||||
|
+=================+=================+====================+
|
||||||
|
| uuid | uuid | uuid |
|
||||||
|
+-----------------+-----------------+--------------------+
|
||||||
|
| human_id | human_id | human_id |
|
||||||
|
+-----------------+-----------------+--------------------+
|
||||||
|
| id | project_id | power_state |
|
||||||
|
+-----------------+-----------------+--------------------+
|
||||||
|
| hostname | watcher_exclude | maintenance |
|
||||||
|
+-----------------+-----------------+--------------------+
|
||||||
|
| status | locked | maintenance_reason |
|
||||||
|
+-----------------+-----------------+--------------------+
|
||||||
|
| disabled_reason | metadata | extra |
|
||||||
|
+-----------------+-----------------+--------------------+
|
||||||
|
| state | state | |
|
||||||
|
+-----------------+-----------------+--------------------+
|
||||||
|
| memory | memory | |
|
||||||
|
+-----------------+-----------------+--------------------+
|
||||||
|
| disk | disk | |
|
||||||
|
+-----------------+-----------------+--------------------+
|
||||||
|
| disk_capacity | disk_capacity | |
|
||||||
|
+-----------------+-----------------+--------------------+
|
||||||
|
| vcpus | vcpus | |
|
||||||
|
+-----------------+-----------------+--------------------+
|
||||||
|
|
||||||
|
Many if not all of these attributes map to attributes of the objects that are
|
||||||
|
fetched from clients such as Nova. To see how these attributes are put into the
|
||||||
|
data model the following source files can be analyzed for Nova_ and Ironic_.
|
||||||
|
|
||||||
|
.. _Nova: https://opendev.org/openstack/watcher/src/branch/master/watcher/decision_engine/model/collector/nova.py#L304
|
||||||
|
.. _Ironic: https://opendev.org/openstack/watcher/src/branch/master/watcher/decision_engine/model/collector/ironic.py#L85
|
||||||
|
|
||||||
|
query
|
||||||
|
*****
|
||||||
|
|
||||||
|
The query is the single most important parameter it will be passed to the
|
||||||
|
project and should return the desired metric for the specific host and return
|
||||||
|
the value in the correct unit. The units for all available metrics are
|
||||||
|
documented in the `datasource baseclass`_. This might mean the query specified
|
||||||
|
in this parameter is responsible for converting the unit. The following query
|
||||||
|
demonstrates how such a conversion could be achieved and demonstrates the
|
||||||
|
conversion from bytes to megabytes.
|
||||||
|
|
||||||
|
.. code-block:: shell
|
||||||
|
|
||||||
|
SELECT value/1000000 FROM memory...
|
||||||
|
|
||||||
|
Queries will be formatted using the .format string method within Python. This
|
||||||
|
format will currently have give attributes exposed to it labeled `{0}` to
|
||||||
|
`{4}`. Every occurrence of these characters within the string will be replaced
|
||||||
|
with the specific attribute.
|
||||||
|
|
||||||
|
- {0} is the aggregate typically `mean`, `min`, `max` but `count` is also
|
||||||
|
supported.
|
||||||
|
- {1} is the attribute as specified in the attribute parameter.
|
||||||
|
- {2} is the period of time to aggregate data over in seconds.
|
||||||
|
- {3} is the granularity or the interval between data points in seconds.
|
||||||
|
- {4} is translator specific and in the case of InfluxDB it will be used for
|
||||||
|
retention_periods.
|
||||||
|
|
||||||
|
**InfluxDB**
|
||||||
|
|
||||||
|
Constructing the queries or rather anticipating how the results should look to
|
||||||
|
be correctly interpreted by Watcher can be a challenge. The following json
|
||||||
|
example demonstrates how what the result should look like and the query used to
|
||||||
|
get this result.
|
||||||
|
|
||||||
|
.. code-block:: json
|
||||||
|
|
||||||
|
{
|
||||||
|
"results": [
|
||||||
|
{
|
||||||
|
"statement_id": 0,
|
||||||
|
"series": [
|
||||||
|
{
|
||||||
|
"name": "vmstats",
|
||||||
|
"tags": {
|
||||||
|
"host": "autoserver01"
|
||||||
|
},
|
||||||
|
"columns": [
|
||||||
|
"time",
|
||||||
|
"mean"
|
||||||
|
],
|
||||||
|
"values": [
|
||||||
|
[
|
||||||
|
1560848284284,
|
||||||
|
7680000
|
||||||
|
]
|
||||||
|
]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
|
||||||
|
.. code-block:: shell
|
||||||
|
|
||||||
|
SELECT {0}("{0}_value") FROM "vmstats" WHERE host =~ /^{1}$/ AND
|
||||||
|
"type_instance" =~ /^mem$/ AND time >= now() - {2}s GROUP BY host
|
||||||
|
|
||||||
|
.. _datasource baseclass: https://opendev.org/openstack/watcher/src/branch/master/watcher/datasources/base.py
|
||||||
|
|
||||||
|
Example configuration
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
The example configurations will show both how to achieve the entire
|
||||||
|
configuration in the config file or use a combination of the regular file and
|
||||||
|
yaml. Using yaml to define all the parameters for each metric is recommended
|
||||||
|
since it has better human readability and supports mutli-line option
|
||||||
|
definitions.
|
||||||
|
|
||||||
|
Configuration file
|
||||||
|
******************
|
||||||
|
|
||||||
|
**It is important to note that the line breaks shown in between assignments of
|
||||||
|
parameters can not be used in the actual configuration and these are simply here
|
||||||
|
for readability reasons.**
|
||||||
|
|
||||||
|
.. code-block:: shell
|
||||||
|
|
||||||
|
[grafana_client]
|
||||||
|
# Authentication token to gain access (string value)
|
||||||
|
# Note: This option can be changed without restarting.
|
||||||
|
token = eyJrIjoiT0tTcG1pUlY2RnVKZTFVaDFsNFZXdE9ZWmNrMkZYbk==
|
||||||
|
|
||||||
|
# first part of the url (including https:// or http://) up until project id
|
||||||
|
# part. Example: https://secure.org/api/datasource/proxy/ (string value)
|
||||||
|
# Note: This option can be changed without restarting.
|
||||||
|
base_url = https://monitoring-grafana.com/api/datasources/proxy/
|
||||||
|
|
||||||
|
# Project id as in url (integer value)
|
||||||
|
# Note: This option can be changed without restarting.
|
||||||
|
project_id_map = host_cpu_usage:1337,host_ram_usage:6969,
|
||||||
|
instance_cpu_usage:1337,instance_ram_usage:9696
|
||||||
|
|
||||||
|
# Mapping of grafana databases to datasource metrics. (dict value)
|
||||||
|
# Note: This option can be changed without restarting.
|
||||||
|
database_map = host_cpu_usage:monit_production,
|
||||||
|
host_ram_usage:monit_production,instance_cpu_usage:prod_cloud,
|
||||||
|
instance_ram_usage:prod_cloud
|
||||||
|
|
||||||
|
translator_map = host_cpu_usage:influxdb,host_ram_usage:influxdb,
|
||||||
|
instance_cpu_usage:influxdb,instance_ram_usage:influxdb
|
||||||
|
|
||||||
|
attribute_map = host_cpu_usage:hostname,host_ram_usage:hostname,
|
||||||
|
instance_cpu_usage:human_id,instance_ram_usage:human_id
|
||||||
|
|
||||||
|
query_map = host_cpu_usage:SELECT 100-{0}("{0}_value") FROM {4}.cpu WHERE
|
||||||
|
("host" =~ /^{1}$/ AND "type_instance" =~/^idle$/ AND time > now()-{2}s),
|
||||||
|
host_ram_usage:SELECT {0}("{0}_value")/1000000 FROM {4}.memory WHERE
|
||||||
|
("host" =~ /^{1}$/) AND "type_instance" =~ /^used$/ AND time >= now()-{2}s
|
||||||
|
GROUP BY "type_instance",instance_cpu_usage:SELECT {0}("{0}_value") FROM
|
||||||
|
"vmstats" WHERE host =~ /^{1}$/ AND "type_instance" =~ /^cpu$/ AND time >=
|
||||||
|
now() - {2}s GROUP BY host,instance_ram_usage:SELECT {0}("{0}_value") FROM
|
||||||
|
"vmstats" WHERE host =~ /^{1}$/ AND "type_instance" =~ /^mem$/ AND time >=
|
||||||
|
now() - {2}s GROUP BY host
|
||||||
|
|
||||||
|
[grafana_translators]
|
||||||
|
|
||||||
|
retention_periods = one_week:10080,one_month:302400,five_years:525600
|
||||||
|
|
||||||
|
[watcher_datasources]
|
||||||
|
datasources = grafana
|
||||||
|
|
||||||
|
yaml
|
||||||
|
****
|
||||||
|
|
||||||
|
When using the yaml configuration file some parameters still need to be defined
|
||||||
|
using the regular configuration such as the path for the yaml file these
|
||||||
|
parameters are detailed below:
|
||||||
|
|
||||||
|
.. code-block:: shell
|
||||||
|
|
||||||
|
[grafana_client]
|
||||||
|
token = eyJrIjoiT0tTcG1pUlY2RnVKZTFVaDFsNFZXdE9ZWmNrMkZYbk==
|
||||||
|
|
||||||
|
base_url = https://monitoring-grafana.com/api/datasources/proxy/
|
||||||
|
|
||||||
|
[watcher_datasources]
|
||||||
|
datasources = grafana
|
||||||
|
|
||||||
|
[watcher_decision_engine]
|
||||||
|
metric_map_path = /etc/watcher/metric_map.yaml
|
||||||
|
|
||||||
|
Using the yaml allows to more effectively define the parameters per metric with
|
||||||
|
greater human readability due to the availability of multi line options. These
|
||||||
|
multi line options are demonstrated in the query parameters.
|
||||||
|
|
||||||
|
.. code-block:: yaml
|
||||||
|
|
||||||
|
grafana:
|
||||||
|
host_cpu_usage:
|
||||||
|
project: 1337
|
||||||
|
db: monit_production
|
||||||
|
translator: influxdb
|
||||||
|
attribute: hostname
|
||||||
|
query: >
|
||||||
|
SELECT 100-{0}("{0}_value") FROM {4}.cpu
|
||||||
|
WHERE ("host" =~ /^{1}$/ AND "type_instance" =~/^idle$/ AND
|
||||||
|
time > now()-{2}s)
|
||||||
|
host_ram_usage:
|
||||||
|
project: 6969
|
||||||
|
db: monit_production
|
||||||
|
translator: influxdb
|
||||||
|
attribute: hostname
|
||||||
|
query: >
|
||||||
|
SELECT {0}("{0}_value")/1000000 FROM {4}.memory WHERE
|
||||||
|
("host" =~ /^{1}$/) AND "type_instance" =~ /^used$/ AND time >=
|
||||||
|
now()-{2}s GROUP BY "type_instance"
|
||||||
|
instance_cpu_usage:
|
||||||
|
project: 1337
|
||||||
|
db: prod_cloud
|
||||||
|
translator: influxdb
|
||||||
|
attribute: human_id
|
||||||
|
query: >
|
||||||
|
SELECT {0}("{0}_value") FROM
|
||||||
|
"vmstats" WHERE host =~ /^{1}$/ AND "type_instance" =~ /^cpu$/ AND
|
||||||
|
time >= now() - {2}s GROUP BY host
|
||||||
|
instance_ram_usage:
|
||||||
|
project: 9696
|
||||||
|
db: prod_cloud
|
||||||
|
translator: influxdb
|
||||||
|
attribute: human_id
|
||||||
|
query: >
|
||||||
|
SELECT {0}("{0}_value") FROM
|
||||||
|
"vmstats" WHERE host =~ /^{1}$/ AND "type_instance" =~ /^mem$/ AND
|
||||||
|
time >= now() - {2}s GROUP BY host
|
||||||
|
|
||||||
|
External Links
|
||||||
|
--------------
|
||||||
|
|
||||||
|
- `List of Grafana datasources <https://grafana.com/plugins?type=datasource>`_
|
8
doc/source/datasources/index.rst
Normal file
8
doc/source/datasources/index.rst
Normal file
@ -0,0 +1,8 @@
|
|||||||
|
Datasources
|
||||||
|
===========
|
||||||
|
|
||||||
|
.. toctree::
|
||||||
|
:glob:
|
||||||
|
:maxdepth: 1
|
||||||
|
|
||||||
|
./*
|
Loading…
x
Reference in New Issue
Block a user