Dashboards

Codename: Friday — The UI. The visual interface overlay for all your platform's vitals.

Dashboard and visualization platforms for observability data.

Overview

Visualization services provide unified dashboards for viewing logs, metrics, and traces from observability backends.

Grafana - Visualization Platform

Unified observability dashboards for metrics, logs, and traces.

Overview

Grafana provides:

Multi-datasource dashboards
Unified visualization
Alerting and notifications
Explore mode for ad-hoc queries
Dashboard templates and sharing

Ports

3000 - Web UI and API

Configuration

Grafana is auto-provisioned with datasources for:

Prometheus - Metrics
Loki - Logs
Jaeger - Traces

Datasource Config: provisioning/datasources/

Usage

Start Service

make up-observability
# or
docker compose up grafana

Access Web UI

open http://localhost:3000
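
If the page does not load right away, a scripted check can tell whether Grafana has finished starting. The sketch below polls Grafana's `GET /api/health` endpoint using only the standard library. The function names are illustrative, and the base URL assumes port 3000 is published on localhost as described above.

```python
import json
import urllib.request

GRAFANA_URL = "http://localhost:3000"  # from this page; adjust if the port is remapped


def parse_health(body: bytes) -> bool:
    """Return True when the /api/health payload reports a healthy database."""
    payload = json.loads(body)
    return payload.get("database") == "ok"


def grafana_is_healthy(base_url: str = GRAFANA_URL, timeout: float = 3.0) -> bool:
    """Call GET /api/health and report whether Grafana answered as healthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as resp:
            return resp.status == 200 and parse_health(resp.read())
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body:
        # treat all of these as "not ready yet".
        return False
```

Calling `grafana_is_healthy()` in a retry loop is one way to wait for the container before opening the browser.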

Default Credentials

Username: admin
Password: admin

Change the default password on first login.

Key Features

1. Dashboards

Create visual dashboards with:

Time-series graphs
Gauge panels
Tables and lists
Heatmaps
Stat panels
Bar charts

2. Explore Mode

Ad-hoc querying across datasources:

Query metrics (PromQL)
Search logs (LogQL)
Find traces (Jaeger datasource)
Correlate data across sources

3. Alerting

Set up alerts based on queries:

Threshold alerts
Query-based alerts
Alert routing
Notification channels

4. Unified Search

Search across all observability data:

Find logs by trace ID
Jump from metric to trace
Correlate events across sources

Quick Start

1. Explore Metrics

Open Grafana (http://localhost:3000)
Go to Explore (compass icon)
Select "Prometheus" datasource
Enter PromQL query:
```
rate(http_requests_total[5m])
```
Click "Run Query"
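
The same query can be run outside Grafana against the Prometheus HTTP API, which is handy for checking that a metric exists before building a panel. This is a minimal sketch: it assumes Prometheus port 9090 is published on localhost (this page only shows the in-network address `http://prometheus:9090`), and the function names are illustrative.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumption: port 9090 is published on the host


def build_instant_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build the URL for Prometheus' instant-query endpoint, /api/v1/query."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def run_instant_query(expr: str, base_url: str = PROMETHEUS_URL, timeout: float = 5.0) -> list:
    """Run a PromQL expression and return the list under data.result."""
    with urllib.request.urlopen(build_instant_query_url(expr, base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    return payload["data"]["result"]
```

For example, `run_instant_query("rate(http_requests_total[5m])")` returns one entry per matching series, or an empty list if the metric is not being scraped.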

2. Search Logs

Go to Explore
Select "Loki" datasource
Enter LogQL query:
```
{service_name="swiss-army-go"}
```
Filter and search logs
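
The LogQL selector above can also be sent to Loki's `query_range` HTTP endpoint directly. The sketch below builds the request and flattens the per-stream response into a single list. It assumes Loki port 3100 is published on localhost, and the helper names are illustrative.

```python
import json
import time
import urllib.parse
import urllib.request

LOKI_URL = "http://localhost:3100"  # assumption: port 3100 is published on the host


def build_query_range_url(logql, minutes=15, limit=100, base_url=LOKI_URL, now_ns=None):
    """Build a /loki/api/v1/query_range URL covering the last `minutes` minutes."""
    end_ns = time.time_ns() if now_ns is None else now_ns
    start_ns = end_ns - minutes * 60 * 1_000_000_000
    params = {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{base_url}/loki/api/v1/query_range?" + urllib.parse.urlencode(params)


def flatten_streams(payload):
    """Turn Loki's per-stream result into (timestamp_ns, line) tuples, oldest first."""
    entries = [
        (int(ts), line)
        for stream in payload["data"]["result"]
        for ts, line in stream["values"]
    ]
    return sorted(entries)


def fetch_logs(logql, **kwargs):
    """Run the query against Loki and return the flattened log lines."""
    with urllib.request.urlopen(build_query_range_url(logql, **kwargs), timeout=10) as resp:
        return flatten_streams(json.load(resp))
```

`fetch_logs('{service_name="swiss-army-go"}', minutes=5)` would return the last five minutes of log lines for that service.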

3. View Traces

Go to Explore
Select "Jaeger" datasource
Search by service or trace ID
Click trace to view details

Creating Dashboards

Basic Dashboard

Click "+" then "Dashboard"
Add panel
Select datasource (Prometheus, Loki, Jaeger)
Write query
Choose visualization type
Save dashboard
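
The same steps can be scripted through Grafana's HTTP API (`POST /api/dashboards/db`), which makes dashboards reproducible. The sketch below is a minimal example, not a complete dashboard model: the helper names and panel layout are illustrative, it authenticates with the default admin credentials from this page, and it leaves out the panel datasource so that the default one (Prometheus in the provisioning shown here) is expected to apply.

```python
import base64
import json
import urllib.request

GRAFANA_URL = "http://localhost:3000"


def timeseries_panel(title, expr, panel_id, y=0):
    """Describe one full-width time-series panel with a single PromQL target."""
    return {
        "id": panel_id,
        "type": "timeseries",
        "title": title,
        "gridPos": {"h": 8, "w": 24, "x": 0, "y": y},
        # No "datasource" key: the panel is expected to use the default datasource.
        "targets": [{"refId": "A", "expr": expr}],
    }


def dashboard_payload(title, panels, overwrite=False):
    """Wrap panels in the request body that /api/dashboards/db expects."""
    dashboard = {"id": None, "uid": None, "title": title, "panels": panels}
    return {"dashboard": dashboard, "overwrite": overwrite}


def create_dashboard(payload, base_url=GRAFANA_URL, user="admin", password="admin"):
    """POST the payload to Grafana with HTTP basic auth and return the JSON reply."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{base_url}/api/dashboards/db",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)
```

A call such as `create_dashboard(dashboard_payload("Request Rate", [timeseries_panel("Rate", "sum(rate(http_requests_total[5m])) by (service)", 1)]))` would create a one-panel dashboard. Replace the credentials once the default password has been changed.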

Example Panels

Request Rate (Prometheus)

sum(rate(http_requests_total[5m])) by (service)

Error Logs (Loki)

sum(count_over_time({service_name="api"} |= "level=error" [5m])) by (service_name)

Trace Count (Custom)

Query Jaeger API for trace statistics

Dashboard Templates

RED Metrics Dashboard

Monitor Rate, Errors, Duration:

# Rate
sum(rate(http_requests_total[5m])) by (service)

# Errors
sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)

# Duration (P95)
histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)
)
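
To make the P95 query less of a black box, the sketch below reproduces the core idea behind `histogram_quantile`: locate the bucket containing the target rank, then interpolate linearly inside it. It is a simplified illustration written for this page, not Prometheus' implementation. It assumes cumulative buckets, a lowest bucket that starts at 0, and a `+Inf` bucket at the end.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    `buckets` must include a final bucket whose upper bound is +Inf.
    """
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan

    rank = q * total  # how many observations lie at or below the quantile
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: the best available
                # answer is the highest finite bucket boundary.
                return prev_bound
            if count == prev_count:
                return prev_bound
            # Linear interpolation within the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# 100 requests: 50 finished within 0.1s, 90 within 0.5s, 98 within 1s.
example = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
print(histogram_quantile(0.95, example))  # 0.8125
```

The result depends on the bucket boundaries: the estimate can never be more precise than the width of the bucket the quantile lands in, which is why bucket layout matters for latency panels.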

Service Overview Dashboard

Request rate graph
Error rate graph
Latency percentiles (P50, P95, P99)
Active instances
Recent error logs
Trace samples

Trace-Log-Metric Correlation

Drill-down Flow

1. See metric spike in dashboard
   ↓
2. Click to explore metrics
   ↓
3. Find high-latency traces
   ↓
4. Click trace ID to view in Jaeger
   ↓
5. Find error span in trace
   ↓
6. Copy trace ID
   ↓
7. Search logs for trace_id in Loki
   ↓
8. Find root cause in logs
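
Step 7 can be sketched as a small helper that turns a copied trace ID into a LogQL line filter. It assumes the trace ID appears verbatim in the log line and is 16 or 32 hexadecimal characters long. Both are assumptions about how the services log, and the function name is illustrative.

```python
import re

# Assumption: trace IDs are 64-bit or 128-bit values printed as lowercase hex.
_TRACE_ID = re.compile(r"^(?:[0-9a-f]{16}|[0-9a-f]{32})$")


def trace_log_query(service_name: str, trace_id: str) -> str:
    """Build a LogQL query that finds log lines mentioning one trace ID."""
    trace_id = trace_id.strip().lower()
    if not _TRACE_ID.match(trace_id):
        raise ValueError(f"not a valid trace ID: {trace_id!r}")
    # Escape backslashes and quotes so the label value stays a valid LogQL string.
    safe_service = service_name.replace("\\", "\\\\").replace('"', '\\"')
    return f'{{service_name="{safe_service}"}} |= "{trace_id}"'


print(trace_log_query("api", "4BF92F3577B34DA6A3CE929D0E0E4736"))
# {service_name="api"} |= "4bf92f3577b34da6a3ce929d0e0e4736"
```

The resulting string can be pasted into Explore with the Loki datasource selected.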

Example Workflow

Dashboard alert - High error rate
Explore metrics - Which endpoint?
Search logs - What errors?
Find trace - Which request failed?
Analyze trace - Where did it fail?
Check logs - Why did it fail?

Alerting

Create Alert

Open dashboard panel
Click "Alert" tab

Define alert rule:

WHEN avg() OF query(A, 5m, now)
IS ABOVE 100

Add notification channel
Save alert
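
The rule above reads as "average query A over the last five minutes and fire when the result exceeds 100". The sketch below spells out that condition on plain data. It only illustrates the semantics, not how Grafana evaluates rules internally, and the function name and sample format are assumptions.

```python
from statistics import fmean


def alert_is_firing(samples, now, window_seconds=300, threshold=100.0):
    """Evaluate "WHEN avg() OF query(A, 5m, now) IS ABOVE 100".

    `samples` is an iterable of (unix_timestamp, value) pairs for query A.
    """
    recent = [value for ts, value in samples if now - window_seconds <= ts <= now]
    if not recent:
        # No data in the window; Grafana treats this as a separate state,
        # so this sketch simply reports "not firing".
        return False
    return fmean(recent) > threshold


samples = [(0, 500), (700, 90), (800, 120), (900, 150)]
print(alert_is_firing(samples, now=1000))  # True: avg(90, 120, 150) = 120 > 100
```

The sample at timestamp 0 is ignored because it falls outside the five-minute window, which is why a single old spike does not keep the alert firing.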

Notification Channels

Email
Slack
PagerDuty
Webhook
Discord
Teams

Data Source Configuration

Prometheus

# provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    isDefault: true

Loki

# provisioning/datasources/loki.yml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    url: http://loki:3100

Jaeger

# provisioning/datasources/jaeger.yml
apiVersion: 1
datasources:
  - name: Jaeger
    type: jaeger
    url: http://jaeger:16686

Variables and Templating

Dashboard Variables

Create dynamic dashboards:

# Service selector: define a Query variable named "service" on the Prometheus datasource
label_values(http_requests_total, service)

# Query using variable
rate(http_requests_total{service="$service"}[5m])
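
Before a query is sent to the datasource, Grafana replaces each variable reference with the value selected in the dashboard. The sketch below imitates that substitution for the simple single-value case. It is a simplified illustration: Grafana's real interpolation also handles multi-value variables and formatting options, which are not modelled here.

```python
import re

# Matches $name and ${name}.
_VARIABLE = re.compile(r"\$(\w+)|\$\{(\w+)\}")


def interpolate(query: str, variables: dict) -> str:
    """Replace $name / ${name} with the selected value; leave unknown names as-is."""
    def substitute(match):
        name = match.group(1) or match.group(2)
        return str(variables.get(name, match.group(0)))

    return _VARIABLE.sub(substitute, query)


print(interpolate('rate(http_requests_total{service="$service"}[5m])', {"service": "api"}))
# rate(http_requests_total{service="api"}[5m])
```

Switching the dropdown from `api` to another service changes only the dictionary value, so one dashboard serves every service.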

Common Variables

Service - Filter by service
Environment - dev/staging/prod
Time range - Quick time selection
Instance - Filter by instance

Performance Tips

Limit time ranges - Don't query years of data
Use caching - Enable query caching where available (it is a Grafana Enterprise and Grafana Cloud feature)
Reduce refresh rate - Don't refresh every second
Optimize queries - Use recording rules in Prometheus
Dashboard organization - Separate dashboards by team/service

Production Notes

Authentication - Enable proper auth (OAuth, LDAP, etc.)
User Management - Set up teams and permissions
Backup Dashboards - Export and version control dashboards
High Availability - Deploy multiple Grafana instances
Database - Use external database (PostgreSQL) instead of SQLite
Security - Use HTTPS, secure datasource credentials
Monitoring - Monitor Grafana itself
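
For the dashboard backup item, one approach is to pull every dashboard's JSON through the HTTP API and commit the files to version control. The sketch below uses `GET /api/search` and `GET /api/dashboards/uid/<uid>`. The function names and file layout are illustrative, and the default credentials are only a placeholder for whatever authentication production uses.

```python
import base64
import json
import re
import urllib.request
from pathlib import Path

GRAFANA_URL = "http://localhost:3000"


def _get_json(url, user, password, timeout=10.0):
    """GET a Grafana API URL with HTTP basic auth and decode the JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


def dashboard_filename(title: str, uid: str) -> str:
    """Derive a stable, filesystem-safe file name from the title and UID."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "dashboard"
    return f"{slug}-{uid}.json"


def export_dashboards(out_dir, base_url=GRAFANA_URL, user="admin", password="admin"):
    """Write every dashboard's JSON model into out_dir and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for hit in _get_json(f"{base_url}/api/search?type=dash-db", user, password):
        full = _get_json(f"{base_url}/api/dashboards/uid/{hit['uid']}", user, password)
        path = out / dashboard_filename(hit["title"], hit["uid"])
        # sort_keys keeps diffs small between exports.
        path.write_text(json.dumps(full["dashboard"], indent=2, sort_keys=True) + "\n")
        written.append(path)
    return written
```

Running `export_dashboards("dashboards/")` on a schedule and committing the directory gives a reviewable history of dashboard changes.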

Troubleshooting

Datasource Not Working

Check datasource configuration
Verify network connectivity
Test datasource URL from Grafana container
Check datasource logs
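
The connectivity checks above can be scripted as a plain TCP probe against each datasource URL from the provisioning files. This is a minimal sketch with illustrative names. The hostnames `prometheus`, `loki`, and `jaeger` only resolve inside the Compose network, so it needs to run from a container on that network that has Python available.

```python
import socket
from urllib.parse import urlsplit

# URLs as listed in the datasource provisioning files on this page.
DATASOURCE_URLS = {
    "Prometheus": "http://prometheus:9090",
    "Loki": "http://loki:3100",
    "Jaeger": "http://jaeger:16686",
}


def host_and_port(url: str):
    """Split a URL into (host, port), falling back to the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def can_connect(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_and_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_all(urls=DATASOURCE_URLS) -> dict:
    """Probe every datasource and map its name to a reachable/unreachable flag."""
    return {name: can_connect(url) for name, url in urls.items()}
```

A `False` entry from `check_all()` points at networking or a stopped container, while all `True` with a failing datasource suggests a configuration problem instead.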

Dashboard Not Loading

Check query syntax
Verify time range
Check datasource availability
Review Grafana logs

Slow Performance

Reduce time range
Optimize queries
Enable query caching
Increase Grafana resources

Alternatives

If Grafana doesn't fit your needs:

Kibana - For Elasticsearch/OpenSearch stack
Datadog - SaaS, full platform
Custom Dashboards - Build your own
Prometheus UI - Basic metrics UI
Jaeger UI - For traces only
