Northern-Thailand-Ping-Rive…/docs/GRAFANA_MATRIX_SETUP.md
grabowski ca730e484b Add comprehensive Matrix alerting system with Grafana integration
- Implement custom Python alerting system (src/alerting.py) with water level monitoring, data freshness checks, and Matrix notifications
- Add complete Grafana Matrix alerting setup guide (docs/GRAFANA_MATRIX_SETUP.md) with webhook configuration, alert rules, and notification policies
- Create Matrix quick start guide (docs/MATRIX_QUICK_START.md) for rapid deployment
- Integrate alerting commands into main application (--alert-check, --alert-test)
- Add Matrix configuration to environment variables (.env.example)
- Update Makefile with alerting targets (alert-check, alert-test)
- Enhance status command to show Matrix notification status
- Support station-specific water level thresholds and escalation rules
- Provide dual alerting approach: native Grafana alerts and custom Python system

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-26 16:18:02 +07:00

Complete Grafana Matrix Alerting Setup Guide

Overview

Configure Grafana to send water level alerts directly to Matrix rooms when thresholds are exceeded.

Prerequisites

  • Grafana instance running (v8.0+)
  • PostgreSQL data source configured in Grafana
  • Matrix account
  • Matrix room for alerts

Step 1: Get Matrix Access Token

Method 1: Using curl

curl -X POST https://matrix.org/_matrix/client/v3/login \
  -H "Content-Type: application/json" \
  -d '{
    "type": "m.login.password",
    "user": "your_username",
    "password": "your_password"
  }'
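
A successful login returns a JSON object whose access_token field holds the token. As a rough Python equivalent of the curl call, using only the standard library (the helper names build_login_request and extract_access_token are illustrative and not part of any Matrix SDK):

```python
import json
import urllib.request


def build_login_request(homeserver: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the password-login request shown above."""
    body = {"type": "m.login.password", "user": user, "password": password}
    return urllib.request.Request(
        url=f"{homeserver.rstrip('/')}/_matrix/client/v3/login",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_access_token(response_body: bytes) -> str:
    """Pull the access token out of a successful login response body."""
    return json.loads(response_body)["access_token"]


if __name__ == "__main__":
    req = build_login_request("https://matrix.org", "your_username", "your_password")
    print(req.get_method(), req.full_url)
    # To actually log in: urllib.request.urlopen(req).read()
```

Keeping request construction separate from sending makes it possible to inspect the request without touching the network.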

Method 2: Using Element Web Client

  1. Open Element in browser: https://app.element.io
  2. Log in to your account
  3. Go to Settings → Help & About → Advanced
  4. Copy your Access Token

Method 3: Using Matrix Admin Panel

  • If you have admin access to your homeserver, generate a token via the admin API

Step 2: Create Alert Room

curl -X POST "https://matrix.org/_matrix/client/v3/createRoom" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Water Level Alerts - Northern Thailand",
    "topic": "Automated alerts for Ping River water monitoring",
    "preset": "private_chat"
  }'

Save the room_id from the response (format: !roomid:homeserver.com)
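
Room IDs contain ! and :, which should be percent-encoded when the ID is placed in a URL path. A minimal sketch of building the message-send URL for a room, where build_send_url is a hypothetical helper and the transaction ID is any string that is unique per message:

```python
from urllib.parse import quote


def build_send_url(homeserver: str, room_id: str, txn_id: str) -> str:
    """Return the client-server API URL for sending an m.room.message event."""
    return (
        f"{homeserver.rstrip('/')}/_matrix/client/v3/rooms/"
        f"{quote(room_id, safe='')}/send/m.room.message/{quote(txn_id, safe='')}"
    )


print(build_send_url("https://matrix.org", "!roomid:homeserver.com", "alert-1"))
```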

Step 3: Configure Grafana Contact Point

Navigate to Alerting

  1. In Grafana, go to Alerting → Contact Points
  2. Click Add contact point

Contact Point Settings

Name: matrix-water-alerts
Integration: Webhook
URL: https://matrix.org/_matrix/client/v3/rooms/!YOUR_ROOM_ID:matrix.org/send/m.room.message/{{ .GroupLabels.alertname }}_{{ .GroupLabels.severity }}_{{ now.Unix }}
HTTP Method: PUT (the Matrix send endpoint that takes a transaction ID in its path is defined as PUT)

Headers

Authorization: Bearer YOUR_MATRIX_ACCESS_TOKEN
Content-Type: application/json

Message Template (JSON Body)

{
  "msgtype": "m.text",
  "body": "🌊 **PING RIVER WATER ALERT**\n\n**Alert:** {{ .GroupLabels.alertname }}\n**Severity:** {{ .GroupLabels.severity | toUpper }}\n**Station:** {{ .GroupLabels.station_code }} ({{ .GroupLabels.station_name }})\n\n{{ range .Alerts }}**Status:** {{ .Status | toUpper }}\n**Water Level:** {{ .Annotations.water_level }}m\n**Threshold:** {{ .Annotations.threshold }}m\n**Time:** {{ .StartsAt.Format \"2006-01-02 15:04:05\" }}\n{{ if .Annotations.discharge }}**Discharge:** {{ .Annotations.discharge }} cms\n{{ end }}{{ if .Annotations.message }}**Details:** {{ .Annotations.message }}\n{{ end }}{{ end }}\n📈 **Dashboard:** {{ .ExternalURL }}\n📍 **Location:** Northern Thailand Ping River"
}
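
Depending on the Grafana version, the webhook integration may send its own fixed JSON payload instead of a custom body. In that case a small relay between Grafana and Matrix can do the formatting. The sketch below shows only the formatting step, assuming an Alertmanager-style payload with alerts, labels, annotations, and externalURL fields. format_alert is a hypothetical helper, not something provided by Grafana or by this project.

```python
import json


def format_alert(payload: dict) -> dict:
    """Turn an Alertmanager-style webhook payload into m.room.message content."""
    lines = ["PING RIVER WATER ALERT", ""]
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        notes = alert.get("annotations", {})
        lines.append(f"Alert: {labels.get('alertname', 'unknown')}")
        lines.append(f"Severity: {labels.get('severity', 'unknown').upper()}")
        lines.append(f"Station: {labels.get('station_code', 'unknown')}")
        lines.append(f"Status: {alert.get('status', 'unknown').upper()}")
        # Optional annotations are only printed when the rule sets them.
        if "water_level" in notes:
            lines.append(f"Water Level: {notes['water_level']}m")
        if "threshold" in notes:
            lines.append(f"Threshold: {notes['threshold']}m")
        lines.append("")
    if payload.get("externalURL"):
        lines.append(f"Dashboard: {payload['externalURL']}")
    return {"msgtype": "m.text", "body": "\n".join(lines)}


sample = {
    "externalURL": "http://grafana:3000",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "High Water Level", "severity": "critical", "station_code": "P.1"},
            "annotations": {"water_level": "6.75", "threshold": "6.0"},
        }
    ],
}
print(json.dumps(format_alert(sample), indent=2))
```

The sketch leaves out the ** markers because a plain m.text body is displayed as written. Bold text in Matrix clients needs the additional format and formatted_body fields.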

Step 4: Create Alert Rules

High Water Level Alert

# Rule Configuration
Rule Name: high-water-level
Evaluation Group: water-level-alerts
Folder: Water Monitoring

# Query A
SELECT
  station_code,
  station_name_th as station_name,
  water_level,
  discharge,
  timestamp
FROM water_measurements
WHERE
  timestamp > now() - interval '5 minutes'
  AND water_level > 6.0

# Condition
IS ABOVE 6.0 FOR 5 minutes

# Labels
severity: critical
alertname: High Water Level
station_code: {{ $labels.station_code }}
station_name: {{ $labels.station_name }}

# Annotations ($values is keyed by query RefID; B assumes a second query that returns discharge)
water_level: {{ $values.A.Value }}
threshold: 6.0
discharge: {{ $values.B.Value }}
summary: Critical water level detected at {{ $labels.station_code }}

Emergency Water Level Alert

Rule Name: emergency-water-level
Query: water_level > 8.0
Condition: IS ABOVE 8.0 FOR 2 minutes
Labels:
  severity: emergency
  alertname: Emergency Water Level
Annotations:
  threshold: 8.0
  message: IMMEDIATE ACTION REQUIRED - Flood risk imminent

Low Water Level Alert

Rule Name: low-water-level
Query: water_level < 1.0
Condition: IS BELOW 1.0 FOR 15 minutes
Labels:
  severity: warning
  alertname: Low Water Level
Annotations:
  threshold: 1.0
  message: Drought conditions detected
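
The three level-based rules above amount to a tiered classification. A minimal sketch of that logic, using the thresholds from this guide (classify_level is an illustrative name, and the FOR durations are deliberately not modelled):

```python
from typing import Optional, Tuple


def classify_level(level_m: float) -> Optional[Tuple[str, str]]:
    """Map a water level in metres to (alertname, severity), or None if normal."""
    if level_m > 8.0:
        return ("Emergency Water Level", "emergency")
    if level_m > 6.0:
        return ("High Water Level", "critical")
    if level_m < 1.0:
        return ("Low Water Level", "warning")
    return None


for level in (0.4, 3.2, 6.75, 8.3):
    print(level, classify_level(level))
```

The emergency check runs first so that a reading above 8.0 m is reported once at the highest severity rather than also matching the 6.0 m rule.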

Data Gap Alert

Rule Name: data-gap
Query:
  SELECT
    station_code,
    MAX(timestamp) as last_seen
  FROM water_measurements
  GROUP BY station_code
  HAVING MAX(timestamp) < now() - interval '2 hours'

Condition: query returns at least one row (a station with no data for over 2 hours) FOR 30 minutes
Labels:
  severity: warning
  alertname: Data Gap
  issue: missing-data
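
The same staleness check can be expressed outside SQL. A sketch assuming the caller already has the latest timestamp per station (stale_stations is a hypothetical helper; the 2-hour cutoff mirrors the query above):

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def stale_stations(
    last_seen: Dict[str, datetime],
    now: datetime,
    max_age: timedelta = timedelta(hours=2),
) -> List[str]:
    """Return station codes whose newest measurement is older than max_age."""
    return sorted(code for code, ts in last_seen.items() if now - ts > max_age)


now = datetime(2025, 9, 26, 14, 30, tzinfo=timezone.utc)
last_seen = {
    "P.1": now - timedelta(minutes=10),
    "P.20": now - timedelta(hours=3),
}
print(stale_stations(last_seen, now))
```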

Rapid Level Change Alert

Rule Name: rapid-level-change
Query:
  SELECT station_code, water_level, prev_level
  FROM (
    SELECT
      station_code,
      water_level,
      LAG(water_level, 1) OVER (PARTITION BY station_code ORDER BY timestamp) as prev_level
    FROM water_measurements
    WHERE timestamp > now() - interval '15 minutes'
  ) AS recent
  WHERE ABS(water_level - prev_level) > 0.5

Condition: CHANGE > 0.5m FOR 1 minute
Labels:
  severity: warning
  alertname: Rapid Water Level Change
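
The rule compares each reading with the previous reading from the same station. A sketch of that comparison in Python, assuming readings arrive as (station_code, timestamp, water_level) tuples (rapid_changes is an illustrative name):

```python
from datetime import datetime
from typing import Dict, List, Tuple

Reading = Tuple[str, datetime, float]


def rapid_changes(readings: List[Reading], max_delta_m: float = 0.5) -> List[Tuple[str, float, float]]:
    """Return (station_code, previous_level, level) for jumps larger than max_delta_m."""
    previous: Dict[str, float] = {}
    flagged = []
    # Sort by station, then time, so each reading follows its predecessor.
    for code, _ts, level in sorted(readings, key=lambda r: (r[0], r[1])):
        prev = previous.get(code)
        if prev is not None and abs(level - prev) > max_delta_m:
            flagged.append((code, prev, level))
        previous[code] = level
    return flagged


readings = [
    ("P.1", datetime(2025, 9, 26, 14, 0), 3.0),
    ("P.20", datetime(2025, 9, 26, 14, 0), 2.0),
    ("P.1", datetime(2025, 9, 26, 14, 5), 3.2),
    ("P.20", datetime(2025, 9, 26, 14, 5), 2.4),
    ("P.1", datetime(2025, 9, 26, 14, 10), 3.9),
]
print(rapid_changes(readings))
```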

Step 5: Configure Notification Policy

Create Notification Policy

# Policy Tree
- receiver: matrix-water-alerts
  match_re:
    severity: emergency|critical
  group_wait: 10s
  group_interval: 5m
  repeat_interval: 30m

- receiver: matrix-water-alerts
  match:
    severity: warning
  group_wait: 30s
  group_interval: 10m
  repeat_interval: 2h

Grouping Rules

group_by: [alertname, station_code]
group_wait: 10s
group_interval: 5m
repeat_interval: 1h

Step 6: Station-Specific Thresholds

Create separate rules for each station with appropriate thresholds:

-- P.1 (Chiang Mai) - Urban area, higher thresholds
SELECT * FROM water_measurements
WHERE station_code = 'P.1' AND water_level > 6.5

-- P.4A (Mae Ping) - Agricultural area
SELECT * FROM water_measurements
WHERE station_code = 'P.4A' AND water_level > 5.0

-- P.20 (Downstream) - Lower threshold
SELECT * FROM water_measurements
WHERE station_code = 'P.20' AND water_level > 4.0
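
Per-station thresholds can also be kept in one lookup table instead of being repeated in each query. A sketch using the three stations above, where falling back to the generic 6.0 m threshold from Step 4 for unlisted stations is an assumption of this example:

```python
STATION_THRESHOLDS_M = {
    "P.1": 6.5,   # Chiang Mai, urban area
    "P.4A": 5.0,  # Mae Ping, agricultural area
    "P.20": 4.0,  # downstream
}
DEFAULT_THRESHOLD_M = 6.0  # assumption: generic high-water threshold from Step 4


def exceeds_threshold(station_code: str, level_m: float) -> bool:
    """True when level_m is above the threshold configured for the station."""
    return level_m > STATION_THRESHOLDS_M.get(station_code, DEFAULT_THRESHOLD_M)


print(exceeds_threshold("P.20", 4.2), exceeds_threshold("P.1", 4.2))
```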

Step 7: Advanced Features

Time-Based Routing

# Different receivers for day/night
time_intervals:
  - name: working_hours
    time_intervals:
      - times:
        - start_time: '08:00'
          end_time: '20:00'
        weekdays: ['monday:friday']

routes:
  - receiver: matrix-alerts-day
    match:
      severity: warning
    active_time_intervals: [working_hours]

  - receiver: matrix-alerts-night
    match:
      severity: warning
    mute_time_intervals: [working_hours]
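
The routing decision itself is small: warnings go to the day receiver on weekdays between 08:00 and 20:00, and to the night receiver otherwise. A sketch of that decision, assuming the datetime passed in is already in the time zone the intervals are meant for (pick_receiver is an illustrative name):

```python
from datetime import datetime, time


def pick_receiver(local_time: datetime) -> str:
    """Choose the warning receiver for a given local time."""
    in_hours = time(8, 0) <= local_time.time() < time(20, 0)  # end time is exclusive
    is_weekday = local_time.weekday() < 5  # Monday is 0, Friday is 4
    return "matrix-alerts-day" if (in_hours and is_weekday) else "matrix-alerts-night"


print(pick_receiver(datetime(2025, 9, 26, 14, 30)))  # a Friday afternoon
print(pick_receiver(datetime(2025, 9, 27, 14, 30)))  # a Saturday afternoon
```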

Multi-Channel Alerts

# Send critical alerts to multiple rooms
- receiver: matrix-emergency
  webhook_configs:
    - url: https://matrix.org/_matrix/client/v3/rooms/!emergency:matrix.org/send/m.room.message
      http_config:
        authorization:
          credentials: "EMERGENCY_TOKEN"  # authorization type defaults to Bearer
    - url: https://matrix.org/_matrix/client/v3/rooms/!general:matrix.org/send/m.room.message
      http_config:
        authorization:
          credentials: "GENERAL_TOKEN"  # authorization type defaults to Bearer

Step 8: Testing

Test Contact Point

  1. Go to Contact Points in Grafana
  2. Select your Matrix contact point
  3. Click "Test" button
  4. Check Matrix room for test message

Test Alert Rules

  1. Temporarily lower thresholds
  2. Wait for condition to trigger
  3. Verify alert appears in Grafana
  4. Verify Matrix message received
  5. Reset thresholds

Manual Alert Trigger

-- Simulate high water level in database
INSERT INTO water_measurements (station_code, water_level, timestamp)
VALUES ('P.1', 7.5, NOW());

Troubleshooting

Common Issues

403 Forbidden

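  • Cause: The token's account has not joined the alert room or lacks permission to post there (an invalid or expired token returns 401 instead)
  • Fix: Invite the account to the room and accept the invite, or check the room's power levels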

Room Not Found

  • Cause: Incorrect room ID format
  • Fix: Ensure the room ID starts with ! and includes the homeserver, and percent-encode it in URLs (! as %21, : as %3A)

No Alerts Firing

  • Cause: Query returns no results
  • Fix: Test queries in Grafana Explore, check data availability

Alert Spam

  • Cause: No grouping configured
  • Fix: Configure proper group_by and intervals

Messages Not Formatted

  • Cause: Template syntax errors
  • Fix: Validate JSON template, check Grafana template docs

Debug Steps

  1. Check Grafana alert rule status
  2. Verify contact point test succeeds
  3. Check Grafana logs: /var/log/grafana/grafana.log
  4. Test Matrix API directly with curl
  5. Verify database connectivity and query results

Environment Variables

Add to your .env:

MATRIX_HOMESERVER=https://matrix.org
MATRIX_ACCESS_TOKEN=your_access_token_here
MATRIX_ROOM_ID=!your_room_id:matrix.org
GRAFANA_URL=http://your-grafana-host:3000
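
A script that reads these settings can fail early when one is missing or malformed. The sketch below is one way to do that and is not the project's actual loader. load_matrix_config is a hypothetical helper, and treating GRAFANA_URL as optional is an assumption.

```python
import os
from typing import Dict, Mapping

REQUIRED = ("MATRIX_HOMESERVER", "MATRIX_ACCESS_TOKEN", "MATRIX_ROOM_ID")


def load_matrix_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required Matrix settings, raising ValueError if any is unusable."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise ValueError(f"Missing settings: {', '.join(missing)}")
    room_id = env["MATRIX_ROOM_ID"]
    if not room_id.startswith("!") or ":" not in room_id:
        raise ValueError("MATRIX_ROOM_ID must look like !roomid:homeserver")
    return {key: env[key] for key in REQUIRED}


example = {
    "MATRIX_HOMESERVER": "https://matrix.org",
    "MATRIX_ACCESS_TOKEN": "your_access_token_here",
    "MATRIX_ROOM_ID": "!your_room_id:matrix.org",
}
print(load_matrix_config(example)["MATRIX_ROOM_ID"])
```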

Example Alert Message

Your Matrix messages will appear as:

🌊 **PING RIVER WATER ALERT**

**Alert:** High Water Level
**Severity:** CRITICAL
**Station:** P.1 (สถานีเชียงใหม่)

**Status:** FIRING
**Water Level:** 6.75m
**Threshold:** 6.0m
**Time:** 2025-09-26 14:30:00
**Discharge:** 450.2 cms

📈 **Dashboard:** http://grafana:3000
📍 **Location:** Northern Thailand Ping River

Security Notes

  • Store Matrix tokens securely (environment variables)
  • Use a dedicated alerting account that is joined only to the alert rooms, because an access token grants access to the whole account rather than to a single room
  • Enable rate limiting to prevent spam
  • Regularly rotate access tokens

This setup provides comprehensive water level monitoring with immediate Matrix notifications when thresholds are exceeded.