- Implement custom Python alerting system (src/alerting.py) with water level monitoring, data freshness checks, and Matrix notifications
- Add complete Grafana Matrix alerting setup guide (docs/GRAFANA_MATRIX_SETUP.md) with webhook configuration, alert rules, and notification policies
- Create Matrix quick start guide (docs/MATRIX_QUICK_START.md) for rapid deployment
- Integrate alerting commands into main application (--alert-check, --alert-test)
- Add Matrix configuration to environment variables (.env.example)
- Update Makefile with alerting targets (alert-check, alert-test)
- Enhance status command to show Matrix notification status
- Support station-specific water level thresholds and escalation rules
- Provide dual alerting approach: native Grafana alerts and custom Python system

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
351 lines
8.8 KiB
Markdown
# Complete Grafana Matrix Alerting Setup Guide

## Overview
Configure Grafana to send water level alerts directly to Matrix channels when thresholds are exceeded.

## Prerequisites
- Grafana instance running (v8.0+)
- PostgreSQL data source configured in Grafana
- Matrix account
- Matrix room for alerts

## Step 1: Get Matrix Access Token

### Method 1: Using curl
```bash
curl -X POST https://matrix.org/_matrix/client/v3/login \
  -H "Content-Type: application/json" \
  -d '{
    "type": "m.login.password",
    "user": "your_username",
    "password": "your_password"
  }'
```

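The login call returns a JSON object whose `access_token` field holds the token. As a minimal sketch (the helper names `build_login_payload` and `extract_access_token` are illustrative and not part of this project), the request body and response handling could look like this in Python:

```python
import json


def build_login_payload(user: str, password: str) -> dict:
    """Body for POST /_matrix/client/v3/login, mirroring the curl example."""
    return {"type": "m.login.password", "user": user, "password": password}


def extract_access_token(response_text: str) -> str:
    """Return the access token from a login response, or raise ValueError."""
    data = json.loads(response_text)
    token = data.get("access_token")
    if not token:
        # Matrix error responses carry "errcode" and "error" fields.
        raise ValueError(f"login failed: {data.get('errcode')} {data.get('error')}")
    return token
```

Raising on a missing token keeps a failed login from silently producing an empty `Authorization` header later.
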
### Method 2: Using Element Web Client
1. Open Element in browser: https://app.element.io
2. Log in to your account
3. Go to Settings → Help & About → Advanced
4. Copy your Access Token

### Method 3: Using Matrix Admin Panel
- If you have admin access to your homeserver, generate a token via the admin API

## Step 2: Create Alert Room

```bash
curl -X POST "https://matrix.org/_matrix/client/v3/createRoom" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Water Level Alerts - Northern Thailand",
    "topic": "Automated alerts for Ping River water monitoring",
    "preset": "private_chat"
  }'
```

Save the `room_id` from the response (format: `!roomid:homeserver.com`).

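Because the room ID contains `!` and `:`, it is safest to check its shape and percent-encode it before placing it in a URL path. The following is a sketch only; `is_valid_room_id` and `room_path` are hypothetical helper names, and the pattern is a loose format check rather than full Matrix identifier validation:

```python
import re
from urllib.parse import quote

# Loose check: leading "!", an opaque part, a ":", then the homeserver name.
ROOM_ID_PATTERN = re.compile(r"^![^:]+:.+$")


def is_valid_room_id(room_id: str) -> bool:
    """True when the ID looks like !opaque:homeserver."""
    return bool(ROOM_ID_PATTERN.match(room_id))


def room_path(room_id: str) -> str:
    """Return the client-server API path for a room, percent-encoded."""
    if not is_valid_room_id(room_id):
        raise ValueError(f"not a room ID: {room_id!r}")
    return "/_matrix/client/v3/rooms/" + quote(room_id, safe="")
```
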
## Step 3: Configure Grafana Contact Point

### Navigate to Alerting
1. In Grafana, go to **Alerting → Contact Points**
2. Click **Add contact point**

### Contact Point Settings
```
Name: matrix-water-alerts
Integration: Webhook
URL: https://matrix.org/_matrix/client/v3/rooms/!YOUR_ROOM_ID:matrix.org/send/m.room.message/{{ .GroupLabels.alertname }}_{{ .GroupLabels.severity }}_{{ now.Unix }}
HTTP Method: PUT
```

The last path segment is the Matrix transaction ID. The Matrix client-server API defines this endpoint (`/send/{eventType}/{txnId}`) as a `PUT` request, and the transaction ID must be unique for each message.

### Headers
```
Authorization: Bearer YOUR_MATRIX_ACCESS_TOKEN
Content-Type: application/json
```

### Message Template (JSON Body)
```json
{
  "msgtype": "m.text",
  "body": "🌊 **PING RIVER WATER ALERT**\n\n**Alert:** {{ .GroupLabels.alertname }}\n**Severity:** {{ .GroupLabels.severity | toUpper }}\n**Station:** {{ .GroupLabels.station_code }} ({{ .GroupLabels.station_name }})\n\n{{ range .Alerts }}**Status:** {{ .Status | toUpper }}\n**Water Level:** {{ .Annotations.water_level }}m\n**Threshold:** {{ .Annotations.threshold }}m\n**Time:** {{ .StartsAt.Format \"2006-01-02 15:04:05\" }}\n{{ if .Annotations.discharge }}**Discharge:** {{ .Annotations.discharge }} cms\n{{ end }}{{ if .Annotations.message }}**Details:** {{ .Annotations.message }}\n{{ end }}{{ end }}\n📈 **Dashboard:** {{ .ExternalURL }}\n📍 **Location:** Northern Thailand Ping River"
}
```

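To see what the contact point is expected to send, it can help to build the same request by hand. This sketch uses only the Python standard library; `build_send_request` is a hypothetical helper, and the millisecond-timestamp transaction ID is an assumption chosen for illustration. It sends the message as a `PUT`, which is the method the Matrix client-server API specifies for the `/send/{eventType}/{txnId}` endpoint:

```python
import json
import time
import urllib.request
from urllib.parse import quote


def build_send_request(homeserver, room_id, token, body, txn_id=None):
    """Build (but do not send) a request that posts an m.text message to a room."""
    # Assumption: a millisecond timestamp is unique enough for a transaction ID.
    txn_id = txn_id or f"alert_{int(time.time() * 1000)}"
    url = (
        f"{homeserver}/_matrix/client/v3/rooms/{quote(room_id, safe='')}"
        f"/send/m.room.message/{quote(txn_id, safe='')}"
    )
    payload = json.dumps({"msgtype": "m.text", "body": body}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` would actually send the message; keeping construction separate makes the URL and headers easy to inspect first.
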
## Step 4: Create Alert Rules

### High Water Level Alert
```yaml
# Rule Configuration
Rule Name: high-water-level
Evaluation Group: water-level-alerts
Folder: Water Monitoring

# Query A
SELECT
  station_code,
  station_name_th as station_name,
  water_level,
  discharge,
  timestamp
FROM water_measurements
WHERE
  timestamp > now() - interval '5 minutes'
  AND water_level > 6.0

# Condition
IS ABOVE 6.0 FOR 5 minutes

# Labels
severity: critical
alertname: High Water Level
station_code: {{ $labels.station_code }}
station_name: {{ $labels.station_name }}

# Annotations
water_level: {{ $values.water_level }}
threshold: 6.0
discharge: {{ $values.discharge }}
summary: Critical water level detected at {{ $labels.station_code }}
```

### Emergency Water Level Alert
```yaml
Rule Name: emergency-water-level
Query: water_level > 8.0
Condition: IS ABOVE 8.0 FOR 2 minutes
Labels:
  severity: emergency
  alertname: Emergency Water Level
Annotations:
  threshold: 8.0
  message: IMMEDIATE ACTION REQUIRED - Flood risk imminent
```

### Low Water Level Alert
```yaml
Rule Name: low-water-level
Query: water_level < 1.0
Condition: IS BELOW 1.0 FOR 15 minutes
Labels:
  severity: warning
  alertname: Low Water Level
Annotations:
  threshold: 1.0
  message: Drought conditions detected
```

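The high, emergency, and low rules reduce to a simple threshold ladder. The sketch below restates it in Python for a single reading; `classify_water_level` is a hypothetical name, it ignores the `FOR` durations, and it returns only the most severe match, whereas in Grafana the high and emergency rules would both fire above 8.0 m:

```python
from typing import Optional, Tuple

# Thresholds in metres, copied from the high, emergency, and low rules.
EMERGENCY_LEVEL_M = 8.0
CRITICAL_LEVEL_M = 6.0
LOW_LEVEL_M = 1.0


def classify_water_level(level_m: float) -> Optional[Tuple[str, str]]:
    """Return (alertname, severity), or None when the level is in the normal range."""
    if level_m > EMERGENCY_LEVEL_M:
        return ("Emergency Water Level", "emergency")
    if level_m > CRITICAL_LEVEL_M:
        return ("High Water Level", "critical")
    if level_m < LOW_LEVEL_M:
        return ("Low Water Level", "warning")
    return None
```
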
### Data Gap Alert
```yaml
Rule Name: data-gap
Query:
  SELECT
    station_code,
    MAX(timestamp) as last_seen
  FROM water_measurements
  GROUP BY station_code
  HAVING MAX(timestamp) < now() - interval '2 hours'

Condition: HAS NO DATA FOR 30 minutes
Labels:
  severity: warning
  alertname: Data Gap
  issue: missing-data
```

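The same freshness check can be expressed outside SQL. This is a sketch under the assumption that the latest timestamp per station is already available as a dictionary of timezone-aware datetimes; `stale_stations` is an illustrative name:

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

MAX_AGE = timedelta(hours=2)  # same window as the HAVING clause


def stale_stations(last_seen: Dict[str, datetime],
                   now: Optional[datetime] = None) -> List[str]:
    """Return the station codes whose newest reading is older than MAX_AGE."""
    now = now or datetime.now(timezone.utc)
    return sorted(code for code, ts in last_seen.items() if now - ts > MAX_AGE)
```

Accepting `now` as a parameter keeps the check deterministic in tests.
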
### Rapid Level Change Alert
```yaml
Rule Name: rapid-level-change
Query:
  -- The window function is computed in a subquery so the outer
  -- WHERE clause can filter on prev_level.
  SELECT station_code, water_level, prev_level
  FROM (
    SELECT
      station_code,
      water_level,
      LAG(water_level, 1) OVER (PARTITION BY station_code ORDER BY timestamp) as prev_level
    FROM water_measurements
    WHERE timestamp > now() - interval '15 minutes'
  ) AS recent
  WHERE ABS(water_level - prev_level) > 0.5

Condition: CHANGE > 0.5m FOR 1 minute
Labels:
  severity: warning
  alertname: Rapid Water Level Change
```

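The comparison this rule performs (each reading against the previous reading from the same station) can be sketched in plain Python. `rapid_changes` is a hypothetical helper, and the input shape, a list of `(station_code, timestamp, level)` tuples with sortable timestamps, is an assumption:

```python
def rapid_changes(readings, max_delta_m=0.5):
    """Return (station_code, previous_level, level) for each jump above max_delta_m."""
    previous = {}  # station_code -> last level seen
    flagged = []
    # Sort by station, then time, so each reading follows its predecessor.
    for station, _ts, level in sorted(readings, key=lambda r: (r[0], r[1])):
        prev = previous.get(station)
        if prev is not None and abs(level - prev) > max_delta_m:
            flagged.append((station, prev, level))
        previous[station] = level
    return flagged
```
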
## Step 5: Configure Notification Policy

### Create Notification Policy
```yaml
# Policy Tree
- receiver: matrix-water-alerts
  # match_re takes a regular expression; plain match compares exact strings
  match_re:
    severity: emergency|critical
  group_wait: 10s
  group_interval: 5m
  repeat_interval: 30m

- receiver: matrix-water-alerts
  match:
    severity: warning
  group_wait: 30s
  group_interval: 10m
  repeat_interval: 2h
```

### Grouping Rules
```yaml
group_by: [alertname, station_code]
group_wait: 10s
group_interval: 5m
repeat_interval: 1h
```

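Grouping by `alertname` and `station_code` means alerts that share both labels are delivered in one notification. A rough illustration of that bucketing, assuming each alert is a dictionary with a `labels` mapping (`group_alerts` is an illustrative name, and this is not Grafana's actual implementation):

```python
from collections import defaultdict


def group_alerts(alerts, group_by=("alertname", "station_code")):
    """Bucket alerts by the values of the group_by labels."""
    groups = defaultdict(list)
    for alert in alerts:
        labels = alert.get("labels", {})
        # A missing label is treated as an empty string.
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)
```
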
## Step 6: Station-Specific Thresholds

Create separate rules for each station with appropriate thresholds:

```sql
-- P.1 (Chiang Mai) - Urban area, higher thresholds
SELECT * FROM water_measurements
WHERE station_code = 'P.1' AND water_level > 6.5;

-- P.4A (Mae Ping) - Agricultural area
SELECT * FROM water_measurements
WHERE station_code = 'P.4A' AND water_level > 5.0;

-- P.20 (Downstream) - Lower threshold
SELECT * FROM water_measurements
WHERE station_code = 'P.20' AND water_level > 4.0;
```

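In a script, the per-station thresholds fit naturally in a lookup table. This sketch copies the three values from the queries; the fallback of 6.0 m for unlisted stations is an assumption borrowed from the generic high-water rule, and `exceeds_threshold` is a hypothetical name:

```python
# Per-station thresholds in metres, taken from the queries.
STATION_THRESHOLDS_M = {"P.1": 6.5, "P.4A": 5.0, "P.20": 4.0}

# Assumption: stations without an entry use the generic high-water threshold.
DEFAULT_THRESHOLD_M = 6.0


def exceeds_threshold(station_code: str, level_m: float) -> bool:
    """True when the reading is above the station's threshold."""
    return level_m > STATION_THRESHOLDS_M.get(station_code, DEFAULT_THRESHOLD_M)
```
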
## Step 7: Advanced Features

### Time-Based Routing
```yaml
# Different receivers for day/night
time_intervals:
  - name: working_hours
    time_intervals:
      - times:
          - start_time: '08:00'
            end_time: '20:00'
        weekdays: ['monday:friday']

routes:
  - receiver: matrix-alerts-day
    match:
      severity: warning
    active_time_intervals: [working_hours]

  # Muted during working hours, so this route only notifies outside them
  - receiver: matrix-alerts-night
    match:
      severity: warning
    mute_time_intervals: [working_hours]
```

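The day/night split amounts to one check on the alert timestamp. The sketch below assumes the timestamp is already in the timezone the interval is defined in and treats the end time as exclusive; `in_working_hours` and `pick_receiver` are illustrative names:

```python
from datetime import datetime, time

WORK_START = time(8, 0)
WORK_END = time(20, 0)


def in_working_hours(ts: datetime) -> bool:
    """True on Monday to Friday between 08:00 (inclusive) and 20:00 (exclusive)."""
    # weekday(): Monday is 0, Friday is 4.
    return ts.weekday() <= 4 and WORK_START <= ts.time() < WORK_END


def pick_receiver(ts: datetime) -> str:
    """Choose the receiver name used in the routing example."""
    return "matrix-alerts-day" if in_working_hours(ts) else "matrix-alerts-night"
```
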
### Multi-Channel Alerts
```yaml
# Send critical alerts to multiple rooms
- receiver: matrix-emergency
  webhook_configs:
    - url: https://matrix.org/_matrix/client/v3/rooms/!emergency:matrix.org/send/m.room.message
      http_config:
        authorization:
          # The scheme goes in type; credentials holds only the token
          type: Bearer
          credentials: "EMERGENCY_TOKEN"
    - url: https://matrix.org/_matrix/client/v3/rooms/!general:matrix.org/send/m.room.message
      http_config:
        authorization:
          type: Bearer
          credentials: "GENERAL_TOKEN"
```

## Step 8: Testing

### Test Contact Point
1. Go to Contact Points in Grafana
2. Select your Matrix contact point
3. Click the "Test" button
4. Check the Matrix room for the test message

### Test Alert Rules
1. Temporarily lower thresholds
2. Wait for the condition to trigger
3. Verify the alert appears in Grafana
4. Verify the Matrix message is received
5. Reset thresholds

### Manual Alert Trigger
```sql
-- Simulate high water level in database
INSERT INTO water_measurements (station_code, water_level, timestamp)
VALUES ('P.1', 7.5, NOW());
```

## Troubleshooting

### Common Issues

#### 403 Forbidden
- **Cause**: The token's user is not allowed to post in the room (an invalid or expired token usually returns 401 `M_UNKNOWN_TOKEN` instead)
- **Fix**: Check that the user has joined the room and may send messages; regenerate the token if it is rejected

#### Room Not Found
- **Cause**: Incorrect room ID format
- **Fix**: Ensure the room ID starts with `!` and includes the homeserver

#### No Alerts Firing
- **Cause**: Query returns no results
- **Fix**: Test queries in Grafana Explore, check data availability

#### Alert Spam
- **Cause**: No grouping configured
- **Fix**: Configure proper `group_by` and intervals

#### Messages Not Formatted
- **Cause**: Template syntax errors
- **Fix**: Validate JSON template, check Grafana template docs

### Debug Steps
1. Check Grafana alert rule status
2. Verify contact point test succeeds
3. Check Grafana logs: `/var/log/grafana/grafana.log`
4. Test Matrix API directly with curl
5. Verify database connectivity and query results

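When testing the Matrix API directly, the response body says more than the HTTP status alone. A small sketch that maps standard Matrix error codes to hints follows; the `diagnose` helper and the hint wording are illustrative, not part of this project:

```python
import json

# Standard Matrix error codes mapped to a short hint.
HINTS = {
    "M_UNKNOWN_TOKEN": "access token is invalid or expired; log in again",
    "M_MISSING_TOKEN": "no access token was sent; check the Authorization header",
    "M_FORBIDDEN": "user may not post in this room; check membership and permissions",
    "M_NOT_FOUND": "room or endpoint not found; check the room ID and URL",
    "M_LIMIT_EXCEEDED": "rate limited; wait before retrying",
}


def diagnose(response_text: str) -> str:
    """Turn a Matrix send response body into a one-line hint."""
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError:
        return "response is not JSON; check the homeserver URL"
    if not isinstance(data, dict):
        return "unexpected response shape"
    if "event_id" in data:
        return "ok: message accepted"
    errcode = data.get("errcode", "")
    return HINTS.get(errcode, f"unrecognized error: {errcode or 'none'}")
```
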
## Environment Variables

Add to your `.env`:
```bash
MATRIX_HOMESERVER=https://matrix.org
MATRIX_ACCESS_TOKEN=your_access_token_here
MATRIX_ROOM_ID=!your_room_id:matrix.org
GRAFANA_URL=http://your-grafana-host:3000
```

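A script that consumes these variables might load and validate them up front. This is a sketch only: `MatrixConfig` and `load_matrix_config` are hypothetical names, and defaulting the homeserver to `https://matrix.org` is an assumption based on the example values:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class MatrixConfig:
    homeserver: str
    access_token: str
    room_id: str


def load_matrix_config(env=None) -> MatrixConfig:
    """Read Matrix settings from the environment, failing fast when required ones are missing."""
    env = os.environ if env is None else env
    missing = [k for k in ("MATRIX_ACCESS_TOKEN", "MATRIX_ROOM_ID") if not env.get(k)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return MatrixConfig(
        # Assumption: fall back to matrix.org when no homeserver is set.
        homeserver=env.get("MATRIX_HOMESERVER", "https://matrix.org").rstrip("/"),
        access_token=env["MATRIX_ACCESS_TOKEN"],
        room_id=env["MATRIX_ROOM_ID"],
    )
```
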
## Example Alert Message
Your Matrix messages will appear as:
```
🌊 **PING RIVER WATER ALERT**

**Alert:** High Water Level
**Severity:** CRITICAL
**Station:** P.1 (สถานีเชียงใหม่)

**Status:** FIRING
**Water Level:** 6.75m
**Threshold:** 6.0m
**Time:** 2025-09-26 14:30:00
**Discharge:** 450.2 cms

📈 **Dashboard:** http://grafana:3000
📍 **Location:** Northern Thailand Ping River
```

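For comparison with the Grafana template, the core of this message can be assembled in Python. `format_alert` is a hypothetical helper that covers the per-alert fields only and leaves out the dashboard and location footer:

```python
from datetime import datetime


def format_alert(alertname, severity, station_code, station_name,
                 status, water_level, threshold, starts_at, discharge=None):
    """Build the alert text in the layout of the example message."""
    lines = [
        "🌊 **PING RIVER WATER ALERT**",
        "",
        f"**Alert:** {alertname}",
        f"**Severity:** {severity.upper()}",
        f"**Station:** {station_code} ({station_name})",
        "",
        f"**Status:** {status.upper()}",
        f"**Water Level:** {water_level}m",
        f"**Threshold:** {threshold}m",
        f"**Time:** {starts_at:%Y-%m-%d %H:%M:%S}",
    ]
    # Discharge is optional, mirroring the template's `if` block.
    if discharge is not None:
        lines.append(f"**Discharge:** {discharge} cms")
    return "\n".join(lines)
```
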
## Security Notes
- Store Matrix tokens securely (environment variables)
- Use room-specific tokens when possible
- Enable rate limiting to prevent spam
- Consider using a dedicated alerting user account
- Regularly rotate access tokens

This setup provides comprehensive water level monitoring with immediate Matrix notifications when thresholds are exceeded.