Monitor telemetry with Prometheus & Grafana
Challenge
Operational and usage insight into a running Vault cluster is important for understanding performance, assisting with proactive incident response, and understanding business workloads and use cases.
Operators and security practitioners need to be aware of conditions that can indicate potential performance problems for production users, or security issues that require immediate attention.
Solution
Vault provides rich operational telemetry metrics that can be consumed by popular solutions for monitoring and alerting on key operational conditions.
One of the many ways to monitor Vault telemetry is with the Prometheus monitoring and alerting toolkit, visualizing the metric data with the Grafana observability platform.
Vault returns telemetry metrics from the /sys/metrics endpoint, and adding the format=prometheus parameter results in Prometheus formatted metrics.
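For example, once Vault is running and unsealed, you could retrieve the Prometheus formatted metrics directly with curl. This is only an illustration; it assumes VAULT_ADDR points at your Vault server and VAULT_TOKEN holds a token with read capability on the endpoint.
$ curl --silent \
    --header "X-Vault-Token: $VAULT_TOKEN" \
    "$VAULT_ADDR/v1/sys/metrics?format=prometheus"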
Scenario introduction
In this scenario, you will use Docker containers to deploy a Vault server, Prometheus monitoring, and a Grafana dashboard.
You will configure Vault to enable Prometheus metrics, and deploy the containers using the command line in a terminal session. You will also use the Grafana web interface to create a dashboard for visualizing metrics.
Begin the scenario by preparing your environment.
Prerequisites
To perform the steps in this hands-on scenario, you need:
Docker installed and running on your system.
Vault 1.8 or later binary installed in your system path; the Community Edition can be used for this tutorial.
Prepare host environment
Create a temporary directory and some subdirectories to contain all of the work you will do in this scenario, and assign its path to the environment variable LEARN_VAULT.
$ mkdir -p /tmp/learn-vault-monitoring/{vault-config,vault-data} \
/tmp/learn-vault-monitoring/grafana-config \
/tmp/learn-vault-monitoring/prometheus-config && \
export LEARN_VAULT=/tmp/learn-vault-monitoring
Create a Docker network named learn-vault; this network will be used by all containers in the scenario.
$ docker network create --attachable --subnet 10.42.74.0/24 learn-vault
With the environment preparation complete, you are ready to start the Vault container.
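Optionally, you can confirm that the network was created with the expected subnet by inspecting it; this check is not required for the scenario.
$ docker network inspect learn-vault \
    --format '{{ (index .IPAM.Config 0).Subnet }}'
The command should output 10.42.74.0/24.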
Vault container
You will start a minimally configured Vault server using the filesystem storage backend and some initial configuration contained in the vault-config directory.
Begin by pulling the latest Vault image version.
$ docker pull hashicorp/vault:latest
Vault configuration
Prometheus metrics are not enabled by default; setting the prometheus_retention_time to a non-zero value enables them.
The example configuration includes a telemetry stanza to set a 12-hour retention time for metrics stored in memory. It also specifies that Vault should not emit Prometheus metrics prefixed with hostnames, as this is not desirable in most use cases.
server.hcl
api_addr = "http://127.0.0.1:8200"
listener "tcp" {
address = "0.0.0.0:8200"
tls_disable = "true"
}
storage "file" {
path = "/vault/data"
}
telemetry {
disable_hostname = true
prometheus_retention_time = "12h"
}
More telemetry configuration details are available in the telemetry parameters documentation.
TLS Note
Although the listener stanza disables TLS for this tutorial, Vault should always be used with TLS enabled in production to provide secure communication between clients and the Vault server. Enabling TLS requires a certificate file and key file on each Vault server.
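For reference only (do not add this to the scenario configuration), a TLS-enabled listener stanza would resemble the following sketch; the certificate and key file paths are placeholder values.
listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/path/to/vault-cert.pem"
  tls_key_file  = "/path/to/vault-key.pem"
}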
Create the Vault server configuration.
$ cat > $LEARN_VAULT/vault-config/server.hcl << EOF
api_addr = "http://127.0.0.1:8200"
listener "tcp" {
address = "0.0.0.0:8200"
tls_disable = "true"
}
storage "file" {
path = "/vault/data"
}
telemetry {
disable_hostname = true
prometheus_retention_time = "12h"
}
EOF
Start Vault container
The Vault container specifies the IPC_LOCK capability for memory locking, a static IP address, and some volume mounts in the project directory for configuration and data.
Start the Vault container running detached in the background.
$ docker run \
--cap-add=IPC_LOCK \
--detach \
--ip 10.42.74.100 \
--name learn-vault \
--network learn-vault \
-p 8200:8200 \
--rm \
--volume $LEARN_VAULT/vault-config:/vault/config \
--volume $LEARN_VAULT/vault-data:/vault/data \
hashicorp/vault server
Check the Vault server logs to ensure that the container is ready.
$ docker logs learn-vault 2>&1 | grep "Vault server started!"
When the Vault container is running, your output should resemble this example.
==> Vault server started! Log data will stream in below:
With the Vault container started and ready, proceed to preparing Vault for use.
Initialize, unseal & authenticate
The running Vault container publishes TCP port 8200 to the Docker host, so the Vault API address is http://127.0.0.1:8200.
Export the VAULT_ADDR environment variable so that the Vault CLI correctly addresses the Vault container.
$ export VAULT_ADDR=http://127.0.0.1:8200
For simplicity in this tutorial, initialize Vault with 1 key share and a key threshold of 1, and write the output to the file .vault-init in the project directory.
$ vault operator init \
-key-shares=1 \
-key-threshold=1 \
| head -n3 \
| cat > $LEARN_VAULT/.vault-init
Successful execution of this command should produce no output.
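If you want to confirm that the unseal key and initial root token were captured, you can optionally view the file contents. Keep in mind that this file holds sensitive values and is used here only for tutorial convenience.
$ cat $LEARN_VAULT/.vault-init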
Unseal Vault with the Unseal Key 1 value from the .vault-init file.
$ vault operator unseal \
$(grep 'Unseal Key 1' $LEARN_VAULT/.vault-init | awk '{print $NF}')
Successful output from unsealing Vault should resemble this example:
Key             Value
---             -----
Seal Type       shamir
Initialized     true
Sealed          false
Total Shares    1
Threshold       1
Version         1.8.4
Storage Type    file
Cluster Name    vault-cluster-5af38c1e
Cluster ID      5463b6ac-a3b8-01fa-fc6a-ab09981d7aa7
HA Enabled      false
If your status output also shows Vault to be initialized and unsealed, you can log in with vault login by passing the Initial Root Token value from the .vault-init file.
$ vault login -no-print \
$(grep 'Initial Root Token' $LEARN_VAULT/.vault-init | awk '{print $NF}')
This command should produce no output when successful. If you want to confirm that the login was successful, try a token lookup and confirm that your token policies contain root.
Note
You will use a root token in this scenario for simplicity. However, in actual production environments, root tokens should be closely guarded and used only for tightly controlled purposes. Review the documentation on root tokens for more details.
$ vault token lookup | grep policies
Successful output should contain the following.
policies [root]
Define Prometheus ACL Policy
The Vault /sys/metrics endpoint is authenticated. Prometheus requires a Vault token with sufficient capabilities to successfully consume metrics from the endpoint.
Define a prometheus-metrics ACL policy that grants read capabilities to the metrics endpoint.
$ vault policy write prometheus-metrics - << EOF
path "/sys/metrics" {
capabilities = ["read"]
}
EOF
Create an example token with the prometheus-metrics policy attached that Prometheus will use for authentication to access the Vault telemetry metrics endpoint.
Write the token ID to the file prometheus-token in the Prometheus configuration directory.
$ vault token create \
-field=token \
-policy prometheus-metrics \
> $LEARN_VAULT/prometheus-config/prometheus-token
This command is expected to produce no output.
Note
Production Vault installations typically use auth methods to issue tokens, but for the sake of simplicity this scenario issues the token directly from the token store.
The Vault server is now prepared to properly expose telemetry metrics for Prometheus consumption, and you have created the token that Prometheus will use to access the metrics.
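Before continuing, you can optionally confirm that the new token can read the metrics endpoint by making the same request Prometheus will make; this assumes VAULT_ADDR is still set from the earlier step.
$ curl --silent \
    --header "X-Vault-Token: $(cat $LEARN_VAULT/prometheus-config/prometheus-token)" \
    "$VAULT_ADDR/v1/sys/metrics?format=prometheus" | head
The first few lines of Prometheus formatted metrics should be returned.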
Prometheus container
Before you can start the Prometheus container, you must first create the configuration file prometheus.yml.
$ cat > $LEARN_VAULT/prometheus-config/prometheus.yml << EOF
scrape_configs:
  - job_name: vault
    metrics_path: /v1/sys/metrics
    params:
      format: ['prometheus']
    scheme: http
    authorization:
      credentials_file: /etc/prometheus/prometheus-token
    static_configs:
      - targets: ['10.42.74.100:8200']
EOF
The configuration is minimal: it specifies a scrape job named vault, the Vault metrics API endpoint as the metrics path, the path to the Vault token file used for authorization, and the Vault server IP address and port as the scrape target.
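Optionally, you can validate the configuration syntax with promtool, which ships inside the Prometheus image. This check is an extra precaution and not required for the scenario.
$ docker run --rm \
    --volume $LEARN_VAULT/prometheus-config/prometheus.yml:/etc/prometheus/prometheus.yml \
    --volume $LEARN_VAULT/prometheus-config/prometheus-token:/etc/prometheus/prometheus-token \
    --entrypoint promtool \
    prom/prometheus check config /etc/prometheus/prometheus.yml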
Pull the Prometheus image.
$ docker pull prom/prometheus
Start the Prometheus container using volume mounts that point to the previously created configuration and Vault token file.
$ docker run \
--detach \
--ip 10.42.74.110 \
--name learn-prometheus \
--network learn-vault \
-p 9090:9090 \
--rm \
--volume $LEARN_VAULT/prometheus-config/prometheus.yml:/etc/prometheus/prometheus.yml \
--volume $LEARN_VAULT/prometheus-config/prometheus-token:/etc/prometheus/prometheus-token \
prom/prometheus
Verify that Prometheus is ready to receive requests.
$ docker logs learn-prometheus 2>&1 | grep -i "server is ready"
The log should contain an entry like this one.
ts=2024-06-06T15:14:55.143Z caller=main.go:1114 level=info msg="Server is ready to receive web requests."
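You can also optionally confirm that Prometheus has discovered the Vault target by querying the Prometheus targets API; jq is assumed to be installed, and you can omit it to view the raw JSON. The health value may report unknown until the first scrape completes.
$ curl --silent http://localhost:9090/api/v1/targets | \
    jq -r '.data.activeTargets[] | "\(.labels.job): \(.health)"'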
Prometheus is ready; continue with Grafana container configuration and deployment.
Grafana container
Create a Grafana configuration that specifies the Prometheus container as the data source. This way, you can focus on metrics and dashboards instead of setting up the data source in the Grafana web UI.
$ cat > $LEARN_VAULT/grafana-config/datasource.yml << EOF
# config file version
apiVersion: 1
datasources:
  - name: vault
    type: prometheus
    access: server
    orgId: 1
    url: http://10.42.74.110:9090
    password:
    user:
    database:
    basicAuth:
    basicAuthUser:
    basicAuthPassword:
    withCredentials:
    isDefault:
    jsonData:
      graphiteVersion: "1.1"
      tlsAuth: false
      tlsAuthWithCACert: false
    secureJsonData:
      tlsCACert: ""
      tlsClientCert: ""
      tlsClientKey: ""
    version: 1
    editable: true
EOF
Pull the latest Grafana image.
$ docker pull grafana/grafana:latest
Start the Grafana container.
$ docker run \
--detach \
--ip 10.42.74.120 \
--name learn-grafana \
--network learn-vault \
-p 3000:3000 \
--rm \
--volume $LEARN_VAULT/grafana-config/datasource.yml:/etc/grafana/provisioning/datasources/prometheus_datasource.yml \
grafana/grafana
Verify that the Grafana container is ready.
$ docker logs learn-grafana 2>&1 | grep "HTTP Server Listen"
The log should contain an entry like this one.
t=2021-10-19T13:13:51+0000 lvl=info msg="HTTP Server Listen" logger=http.server address=[::]:3000 protocol=http subUrl= socket=
You can also optionally check once more to verify that all containers are up and running.
$ docker ps -f name=learn --format "table {{.Names}}\t{{.Status}}"
The output should resemble this example:
NAMES              STATUS
learn-grafana      Up 21 seconds
learn-prometheus   Up 54 seconds
learn-vault        Up 2 minutes
With all containers ready, move on to accessing and configuring Grafana through the web UI.
Access and configure Grafana dashboard
Access the Grafana web interface to create a dashboard containing some example Vault metrics.
Open http://localhost:3000/ in a browser.
Enter admin for both the Email or username and Password fields. When prompted, change the admin password and confirm it.
Click the Dashboards icon in the navigation and select Manage.
Click New Dashboard.
Click Add an empty panel to add the first new panel for a metric from Vault.
Add a memory utilization graph
Let's add a graph to the dashboard for Vault memory utilization.
In the New dashboard/Edit Panel page, use the following steps to add the graph.
In the Data source drop-down, choose vault.
In the Metrics browser text input, notice that you can begin to type vault_ and Grafana will complete metric names for you from a listing. To specify the system memory usage of Vault, enter vault_runtime_sys_bytes here.
Under Panel options in the navigation, enter System memory utilization for Title.
Scroll down to Graph styles and select Bars.
Scroll to Standard options and use the drop-down to navigate to Data and select bytes(SI).
Click Apply.
Your dashboard should resemble this example screenshot.
You can add more panels with the Add panel button shown in the screenshot.
Add a request handling graph
Add another panel to measure request handling.
In the Data source drop-down, choose vault.
In the Metrics browser text input, notice that you can begin to type vault_ and Grafana will complete metric names for you from a listing. To specify the count of requests handled by Vault, enter vault_core_handle_request_count here.
Under Panel options in the navigation, enter Requests handled count for Title.
Click Apply.
Generate requests with token lookup
To generate work for the new request handling graph, go to your terminal session and perform 100 token lookup operations.
$ for i in {1..100}; do vault token lookup; done
Return to the Grafana web UI and if necessary, click the refresh button.
Now observe your Grafana dashboard.
It should resemble this example screenshot, showing 100+ requests handled.
Feel free to experiment, and add more panels and metrics types to your dashboard.
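For example, a rate expression over the request counter makes a useful panel query, and you can try expressions against the Prometheus HTTP API before adding them to Grafana; jq is assumed to be installed here.
$ curl --silent --get http://localhost:9090/api/v1/query \
    --data-urlencode 'query=rate(vault_core_handle_request_count[5m])' | jq .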
Tip
A listing of popular metrics for monitoring Vault appears in the Telemetry Metrics Reference.
Cleanup
Stop and remove the Docker containers.
$ docker stop learn-grafana learn-prometheus learn-vault
Remove the Docker network.
$ docker network rm learn-vault
Remove the project directory.
$ rm -r /tmp/learn-vault-monitoring
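Optionally, unset the environment variables exported for this scenario.
$ unset LEARN_VAULT VAULT_ADDR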
Summary
You learned how to configure a Vault server to enable Prometheus metrics with a specific retention time and hostname setting. You also learned how to enable Prometheus metrics scraping and Grafana metrics visualization with dashboard panels.