- Implement custom Python alerting system (src/alerting.py) with water level monitoring, data freshness checks, and Matrix notifications
- Add complete Grafana Matrix alerting setup guide (docs/GRAFANA_MATRIX_SETUP.md) with webhook configuration, alert rules, and notification policies
- Create Matrix quick start guide (docs/MATRIX_QUICK_START.md) for rapid deployment
- Integrate alerting commands into main application (--alert-check, --alert-test)
- Add Matrix configuration to environment variables (.env.example)
- Update Makefile with alerting targets (alert-check, alert-test)
- Enhance status command to show Matrix notification status
- Support station-specific water level thresholds and escalation rules
- Provide dual alerting approach: native Grafana alerts and custom Python system

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
8.8 KiB
Complete Grafana Matrix Alerting Setup Guide
Overview
Configure Grafana to send water level alerts directly to Matrix channels when thresholds are exceeded.
Prerequisites
- Grafana instance running (v8.0+)
- PostgreSQL data source configured in Grafana
- Matrix account
- Matrix room for alerts
Step 1: Get Matrix Access Token
Method 1: Using curl
curl -X POST https://matrix.org/_matrix/client/v3/login \
-H "Content-Type: application/json" \
-d '{
"type": "m.login.password",
"user": "your_username",
"password": "your_password"
}'
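The same login call can be prepared from Python. The following is a minimal sketch using only the standard library; the helper names `build_login_request` and `extract_access_token` are illustrative and not part of the project, and the body uses the `identifier` form of the password login rather than the top-level `user` field shown in the curl example.

```python
import json
import urllib.request


def build_login_request(homeserver: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a password login request for a Matrix homeserver."""
    body = {
        "type": "m.login.password",
        # Assumption: the identifier object form is used instead of a bare "user" key.
        "identifier": {"type": "m.id.user", "user": username},
        "password": password,
    }
    return urllib.request.Request(
        url=f"{homeserver.rstrip('/')}/_matrix/client/v3/login",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_access_token(response_body: bytes) -> str:
    """Return the access token from a successful login response body."""
    payload = json.loads(response_body)
    if "access_token" not in payload:
        raise ValueError(f"login failed: {payload.get('error', 'no access_token in response')}")
    return payload["access_token"]
```

To actually log in, pass the request to `urllib.request.urlopen` and feed the response bytes to `extract_access_token`. Keeping request construction separate from the network call makes the helper easy to check without contacting a homeserver.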
Method 2: Using Element Web Client
- Open Element in browser: https://app.element.io
- Login to your account
- Go to Settings → Help & About → Advanced
- Copy your Access Token
Method 3: Using Matrix Admin Panel
- If you have admin access to your homeserver, generate token via admin API
Step 2: Create Alert Room
curl -X POST "https://matrix.org/_matrix/client/v3/createRoom" \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "Water Level Alerts - Northern Thailand",
"topic": "Automated alerts for Ping River water monitoring",
"preset": "private_chat"
}'
Save the room_id from the response (format: !roomid:homeserver.com)
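Because the room ID contains `!` and `:`, it is safer to validate and percent-encode it before placing it in a URL path. A small sketch, assuming the `!localpart:server` shape described above; `is_valid_room_id` and `encode_room_id` are hypothetical helper names:

```python
import re
from urllib.parse import quote

# Assumed shape: "!" + opaque localpart + ":" + server name, with no whitespace.
ROOM_ID_PATTERN = re.compile(r"^![^:\s]+:[^\s]+$")


def is_valid_room_id(room_id: str) -> bool:
    """Check that the value looks like a room ID rather than an alias or a bare name."""
    return bool(ROOM_ID_PATTERN.match(room_id))


def encode_room_id(room_id: str) -> str:
    """Percent-encode a room ID so it can be used as a URL path segment."""
    if not is_valid_room_id(room_id):
        raise ValueError(f"not a room ID (expected !localpart:server): {room_id!r}")
    # safe="" forces "!" and ":" to be encoded as well.
    return quote(room_id, safe="")
```

A room alias such as `#alerts:matrix.org` is rejected on purpose, since the send endpoint expects the room ID.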
Step 3: Configure Grafana Contact Point
Navigate to Alerting
- In Grafana, go to Alerting → Contact Points
- Click Add contact point
Contact Point Settings
Name: matrix-water-alerts
Integration: Webhook
URL: https://matrix.org/_matrix/client/v3/rooms/!YOUR_ROOM_ID:matrix.org/send/m.room.message/{{ .GroupLabels.alertname }}_{{ .GroupLabels.severity }}_{{ now.Unix }}
HTTP Method: PUT (the Matrix send endpoint that ends in a transaction ID is a PUT route)
Headers
Authorization: Bearer YOUR_MATRIX_ACCESS_TOKEN
Content-Type: application/json
Message Template (JSON Body)
{
"msgtype": "m.text",
"body": "🌊 **PING RIVER WATER ALERT**\n\n**Alert:** {{ .GroupLabels.alertname }}\n**Severity:** {{ .GroupLabels.severity | toUpper }}\n**Station:** {{ .GroupLabels.station_code }} ({{ .GroupLabels.station_name }})\n\n{{ range .Alerts }}**Status:** {{ .Status | toUpper }}\n**Water Level:** {{ .Annotations.water_level }}m\n**Threshold:** {{ .Annotations.threshold }}m\n**Time:** {{ .StartsAt.Format \"2006-01-02 15:04:05\" }}\n{{ if .Annotations.discharge }}**Discharge:** {{ .Annotations.discharge }} cms\n{{ end }}{{ if .Annotations.message }}**Details:** {{ .Annotations.message }}\n{{ end }}{{ end }}\n📈 **Dashboard:** {{ .ExternalURL }}\n📍 **Location:** Northern Thailand Ping River"
}
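For testing outside Grafana, the request that the contact point is meant to produce can be assembled in Python. This is a sketch under a few assumptions: the function names are illustrative, the transaction ID scheme is arbitrary, and an HTML `formatted_body` is added because Matrix clients generally do not render `**bold**` markers in a plain `m.text` body.

```python
import html
import time
import uuid
from typing import Optional
from urllib.parse import quote


def build_send_url(homeserver: str, room_id: str, txn_id: Optional[str] = None) -> str:
    """Return the URL for sending an m.room.message event; send it with HTTP PUT."""
    # Assumption: any value unique per access token works as a transaction ID.
    txn_id = txn_id or f"alert-{int(time.time())}-{uuid.uuid4().hex[:8]}"
    return (
        f"{homeserver.rstrip('/')}/_matrix/client/v3/rooms/"
        f"{quote(room_id, safe='')}/send/m.room.message/{quote(txn_id, safe='')}"
    )


def build_message(alertname: str, severity: str, station_code: str,
                  water_level: float, threshold: float) -> dict:
    """Build an event body with a plain-text fallback and an HTML version."""
    fields = [
        ("Alert", alertname),
        ("Severity", severity.upper()),
        ("Station", station_code),
        ("Water Level", f"{water_level}m"),
        ("Threshold", f"{threshold}m"),
    ]
    plain = "PING RIVER WATER ALERT\n" + "\n".join(f"{k}: {v}" for k, v in fields)
    rich = "<strong>PING RIVER WATER ALERT</strong><br>" + "<br>".join(
        f"<strong>{html.escape(k)}:</strong> {html.escape(v)}" for k, v in fields
    )
    return {
        "msgtype": "m.text",
        "body": plain,
        "format": "org.matrix.custom.html",
        "formatted_body": rich,
    }
```

The dictionary can be serialized with `json.dumps` and sent with the `Authorization: Bearer ...` header from the settings above.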
Step 4: Create Alert Rules
High Water Level Alert
# Rule Configuration
Rule Name: high-water-level
Evaluation Group: water-level-alerts
Folder: Water Monitoring
# Query A
SELECT
station_code,
station_name_th as station_name,
water_level,
discharge,
timestamp
FROM water_measurements
WHERE
timestamp > now() - interval '5 minutes'
AND water_level > 6.0
# Condition
IS ABOVE 6.0 FOR 5 minutes
# Labels
severity: critical
alertname: High Water Level
station_code: {{ $labels.station_code }}
station_name: {{ $labels.station_name }}
# Annotations
water_level: {{ $values.water_level }}
threshold: 6.0
discharge: {{ $values.discharge }}
summary: Critical water level detected at {{ $labels.station_code }}
Emergency Water Level Alert
Rule Name: emergency-water-level
Query: water_level > 8.0
Condition: IS ABOVE 8.0 FOR 2 minutes
Labels:
severity: emergency
alertname: Emergency Water Level
Annotations:
threshold: 8.0
message: IMMEDIATE ACTION REQUIRED - Flood risk imminent
Low Water Level Alert
Rule Name: low-water-level
Query: water_level < 1.0
Condition: IS BELOW 1.0 FOR 15 minutes
Labels:
severity: warning
alertname: Low Water Level
Annotations:
threshold: 1.0
message: Drought conditions detected
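The three threshold rules above can be summarized as one classification function. This is only a sketch of the rule logic for reasoning about the thresholds, not the code in src/alerting.py; the function name is hypothetical, and unlike Grafana (where the critical and emergency rules would both fire above 8.0 m) it returns only the most severe match.

```python
from typing import Optional

# Thresholds in metres, taken from the rules above.
EMERGENCY_LEVEL_M = 8.0
CRITICAL_LEVEL_M = 6.0
LOW_LEVEL_M = 1.0


def classify_water_level(level_m: float) -> Optional[tuple[str, str]]:
    """Return (alertname, severity) for a reading, or None if no rule matches."""
    if level_m > EMERGENCY_LEVEL_M:
        return ("Emergency Water Level", "emergency")
    if level_m > CRITICAL_LEVEL_M:
        return ("High Water Level", "critical")
    if level_m < LOW_LEVEL_M:
        return ("Low Water Level", "warning")
    return None
```

The comparisons are strict, matching the `>` and `<` operators in the rule queries, so a reading of exactly 6.0 m does not alert.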
Data Gap Alert
Rule Name: data-gap
Query:
SELECT
station_code,
MAX(timestamp) as last_seen
FROM water_measurements
GROUP BY station_code
HAVING MAX(timestamp) < now() - interval '2 hours'
Condition: HAS NO DATA FOR 30 minutes
Labels:
severity: warning
alertname: Data Gap
issue: missing-data
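The data gap query boils down to comparing each station's latest timestamp with a cutoff. A minimal Python equivalent, assuming the latest timestamps have already been fetched into a dictionary (the function name and input shape are illustrative):

```python
from datetime import datetime, timedelta


def find_stale_stations(last_seen: dict[str, datetime], now: datetime,
                        max_age: timedelta = timedelta(hours=2)) -> list[str]:
    """Return station codes whose newest measurement is older than max_age."""
    cutoff = now - max_age
    # Strictly older than the cutoff, mirroring the "<" in the HAVING clause.
    return sorted(code for code, ts in last_seen.items() if ts < cutoff)
```

All timestamps need to share one timezone convention (all naive or all aware), otherwise the comparison raises a `TypeError`.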
Rapid Level Change Alert
Rule Name: rapid-level-change
Query:
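-- A window function result cannot be filtered in the same SELECT, so wrap it in a subquery
SELECT station_code, water_level, prev_level
FROM (
SELECT
station_code,
water_level,
LAG(water_level, 1) OVER (PARTITION BY station_code ORDER BY timestamp) AS prev_level
FROM water_measurements
WHERE timestamp > now() - interval '15 minutes'
) AS recent
WHERE ABS(water_level - prev_level) > 0.5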
Condition: CHANGE > 0.5m FOR 1 minute
Labels:
severity: warning
alertname: Rapid Water Level Change
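The rapid-change rule compares each reading with the previous reading from the same station. The sketch below mirrors that `LAG(...) OVER (PARTITION BY station_code ...)` logic in Python, assuming the readings arrive already sorted by timestamp; the function name and tuple layout are illustrative.

```python
def find_rapid_changes(readings: list[tuple[str, float]],
                       max_delta_m: float = 0.5) -> list[tuple[str, float, float]]:
    """Return (station_code, previous_level, current_level) for jumps above max_delta_m.

    readings: (station_code, water_level) pairs ordered by timestamp.
    """
    previous: dict[str, float] = {}
    flagged = []
    for station, level in readings:
        prev = previous.get(station)
        # The first reading of a station has nothing to compare against.
        if prev is not None and abs(level - prev) > max_delta_m:
            flagged.append((station, prev, level))
        previous[station] = level
    return flagged
```

Both rises and drops are flagged, since the rule uses the absolute difference.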
Step 5: Configure Notification Policy
Create Notification Policy
# Policy Tree
- receiver: matrix-water-alerts
  # match_re is needed here: plain match compares the literal string "emergency|critical"
  match_re:
    severity: emergency|critical
  group_wait: 10s
  group_interval: 5m
  repeat_interval: 30m
- receiver: matrix-water-alerts
  match:
    severity: warning
  group_wait: 30s
  group_interval: 10m
  repeat_interval: 2h
Grouping Rules
group_by: [alertname, station_code]
group_wait: 10s
group_interval: 5m
repeat_interval: 1h
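To see what `group_by: [alertname, station_code]` does to a batch of alerts, the grouping can be imitated in a few lines of Python. This only illustrates the grouping key; it is not how Grafana implements it, and the alert dictionaries are an assumed shape with a `labels` mapping.

```python
from collections import defaultdict


def group_alerts(alerts: list[dict],
                 group_by: tuple[str, ...] = ("alertname", "station_code")) -> dict[tuple, list[dict]]:
    """Bucket alerts by the values of the group_by labels."""
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for alert in alerts:
        labels = alert.get("labels", {})
        # Missing labels fall into an empty-string bucket rather than raising.
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)
```

Each resulting bucket corresponds to one notification, so two firing alerts for the same rule and station produce a single Matrix message instead of two.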
Step 6: Station-Specific Thresholds
Create separate rules for each station with appropriate thresholds:
-- P.1 (Chiang Mai) - Urban area, higher thresholds
SELECT * FROM water_measurements
WHERE station_code = 'P.1' AND water_level > 6.5
-- P.4A (Mae Ping) - Agricultural area
SELECT * FROM water_measurements
WHERE station_code = 'P.4A' AND water_level > 5.0
-- P.20 (Downstream) - Lower threshold
SELECT * FROM water_measurements
WHERE station_code = 'P.20' AND water_level > 4.0
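The per-station thresholds can also be kept in one lookup table, which is roughly what a custom check would need. A sketch using the three values above; the fallback of 6.0 m for unlisted stations is an assumption borrowed from the generic high-water rule, and the names are hypothetical.

```python
# Thresholds in metres from the station-specific queries above.
STATION_THRESHOLDS_M = {
    "P.1": 6.5,   # Chiang Mai, urban area
    "P.4A": 5.0,  # Mae Ping, agricultural area
    "P.20": 4.0,  # downstream
}

# Assumption: stations without an entry use the generic high-water threshold.
DEFAULT_THRESHOLD_M = 6.0


def exceeds_threshold(station_code: str, water_level_m: float) -> bool:
    """Return True when a reading is above the threshold for its station."""
    threshold = STATION_THRESHOLDS_M.get(station_code, DEFAULT_THRESHOLD_M)
    return water_level_m > threshold
```

Keeping the values in one table makes it easier to review them against the Grafana rules when a threshold changes.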
Step 7: Advanced Features
Time-Based Routing
# Different receivers for day/night
time_intervals:
  - name: working_hours
    time_intervals:
      - times:
          - start_time: '08:00'
            end_time: '20:00'
        weekdays: ['monday:friday']
routes:
  - receiver: matrix-alerts-day
    match:
      severity: warning
    active_time_intervals: [working_hours]
  - receiver: matrix-alerts-night
    match:
      severity: warning
    # Interval names cannot be negated; muting during working hours leaves nights and weekends active
    mute_time_intervals: [working_hours]
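The day/night split can be checked with a small Python function before relying on it in routing. This sketch assumes the start time is inclusive and the end time exclusive, and that the datetime passed in is already in the timezone the interval is meant for; the receiver names come from the config above, the function names are illustrative.

```python
from datetime import datetime, time

WORK_START = time(8, 0)
WORK_END = time(20, 0)


def in_working_hours(moment: datetime) -> bool:
    """True for Monday to Friday between 08:00 (inclusive) and 20:00 (exclusive)."""
    is_weekday = moment.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and WORK_START <= moment.time() < WORK_END


def pick_receiver(moment: datetime) -> str:
    """Choose the receiver a warning would be routed to at the given time."""
    return "matrix-alerts-day" if in_working_hours(moment) else "matrix-alerts-night"
```

Weekend daytime falls through to the night receiver here, which matches routing everything outside `working_hours` to `matrix-alerts-night`.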
Multi-Channel Alerts
# Send critical alerts to multiple rooms
- name: matrix-emergency
  webhook_configs:
    - url: https://matrix.org/_matrix/client/v3/rooms/!emergency:matrix.org/send/m.room.message
      http_config:
        authorization:
          # The type is sent as the scheme, so credentials hold only the token
          type: Bearer
          credentials: "EMERGENCY_TOKEN"
    - url: https://matrix.org/_matrix/client/v3/rooms/!general:matrix.org/send/m.room.message
      http_config:
        authorization:
          type: Bearer
          credentials: "GENERAL_TOKEN"
Step 8: Testing
Test Contact Point
- Go to Contact Points in Grafana
- Select your Matrix contact point
- Click "Test" button
- Check Matrix room for test message
Test Alert Rules
- Temporarily lower thresholds
- Wait for condition to trigger
- Verify alert appears in Grafana
- Verify Matrix message received
- Reset thresholds
Manual Alert Trigger
-- Simulate high water level in database
INSERT INTO water_measurements (station_code, water_level, timestamp)
VALUES ('P.1', 7.5, NOW());
Troubleshooting
Common Issues
403 Forbidden
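- Cause: The token's account has not joined the room or lacks permission to send messages there (an invalid or expired token returns 401 instead)
- Fix: Invite and join the alerting account to the room and check its power level; regenerate the token if the response is 401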
Room Not Found
- Cause: Incorrect room ID format
- Fix: Ensure room ID starts with ! and includes homeserver
No Alerts Firing
- Cause: Query returns no results
- Fix: Test queries in Grafana Explore, check data availability
Alert Spam
- Cause: No grouping configured
- Fix: Configure proper group_by and intervals
Messages Not Formatted
- Cause: Template syntax errors
- Fix: Validate JSON template, check Grafana template docs
Debug Steps
- Check Grafana alert rule status
- Verify contact point test succeeds
- Check Grafana logs: /var/log/grafana/grafana.log
- Test Matrix API directly with curl
- Verify database connectivity and query results
Environment Variables
Add to your .env:
MATRIX_HOMESERVER=https://matrix.org
MATRIX_ACCESS_TOKEN=your_access_token_here
MATRIX_ROOM_ID=!your_room_id:matrix.org
GRAFANA_URL=http://your-grafana-host:3000
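A script that reads these variables should fail early when one is missing. The sketch below shows one way to load them; the `MatrixConfig` class and `load_matrix_config` function are hypothetical names, and the `https://matrix.org` default for the homeserver is an assumption taken from the example values.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class MatrixConfig:
    homeserver: str
    access_token: str
    room_id: str


def load_matrix_config(env: Mapping[str, str] = os.environ) -> MatrixConfig:
    """Read Matrix settings from the environment, raising if required ones are absent."""
    required = ("MATRIX_ACCESS_TOKEN", "MATRIX_ROOM_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return MatrixConfig(
        # Assumption: fall back to matrix.org when no homeserver is set.
        homeserver=env.get("MATRIX_HOMESERVER", "https://matrix.org").rstrip("/"),
        access_token=env["MATRIX_ACCESS_TOKEN"],
        room_id=env["MATRIX_ROOM_ID"],
    )
```

Passing the environment as a parameter keeps the function testable with a plain dictionary and avoids touching the real process environment.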
Example Alert Message
Your Matrix messages will appear as:
🌊 **PING RIVER WATER ALERT**
**Alert:** High Water Level
**Severity:** CRITICAL
**Station:** P.1 (สถานีเชียงใหม่)
**Status:** FIRING
**Water Level:** 6.75m
**Threshold:** 6.0m
**Time:** 2025-09-26 14:30:00
**Discharge:** 450.2 cms
📈 **Dashboard:** http://grafana:3000
📍 **Location:** Northern Thailand Ping River
Security Notes
- Store Matrix tokens securely (environment variables)
- Matrix access tokens belong to an account, not a room; limit exposure by joining the alerting account only to the alert rooms
- Enable rate limiting to prevent spam
- Consider using dedicated alerting user account
- Regularly rotate access tokens
This setup provides comprehensive water level monitoring with immediate Matrix notifications when thresholds are exceeded.