Add comprehensive Matrix alerting system with Grafana integration

- Implement custom Python alerting system (src/alerting.py) with water level monitoring, data freshness checks, and Matrix notifications
- Add complete Grafana Matrix alerting setup guide (docs/GRAFANA_MATRIX_SETUP.md) with webhook configuration, alert rules, and notification policies
- Create Matrix quick start guide (docs/MATRIX_QUICK_START.md) for rapid deployment
- Integrate alerting commands into main application (--alert-check, --alert-test)
- Add Matrix configuration to environment variables (.env.example)
- Update Makefile with alerting targets (alert-check, alert-test)
- Enhance status command to show Matrix notification status
- Support station-specific water level thresholds and escalation rules
- Provide dual alerting approach: native Grafana alerts and custom Python system

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
351
docs/GRAFANA_MATRIX_SETUP.md
Normal file
@@ -0,0 +1,351 @@
# Complete Grafana Matrix Alerting Setup Guide

## Overview
Configure Grafana to send water level alerts directly to Matrix channels when thresholds are exceeded.

## Prerequisites
- Grafana instance running (v8.0+)
- PostgreSQL data source configured in Grafana
- Matrix account
- Matrix room for alerts

## Step 1: Get Matrix Access Token

### Method 1: Using curl
```bash
curl -X POST https://matrix.org/_matrix/client/v3/login \
  -H "Content-Type: application/json" \
  -d '{
    "type": "m.login.password",
    "user": "your_username",
    "password": "your_password"
  }'
```
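
The same login call can be sketched in Python using only the standard library. This is a minimal sketch, not the project's code: the helper names `build_login_request` and `extract_access_token` are hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_login_request(homeserver, user, password):
    """Build (but do not send) the password-login request from the curl example."""
    payload = {"type": "m.login.password", "user": user, "password": password}
    return urllib.request.Request(
        url=f"{homeserver.rstrip('/')}/_matrix/client/v3/login",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_access_token(response_body):
    """Pull the access_token field out of a login response body."""
    data = json.loads(response_body)
    token = data.get("access_token")
    if not token:
        raise ValueError(f"login failed: {data.get('errcode', 'no access_token in response')}")
    return token
```

To actually log in, pass the request to `urllib.request.urlopen` and feed the decoded response body to `extract_access_token`.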

### Method 2: Using Element Web Client
1. Open Element in a browser: https://app.element.io
2. Log in to your account
3. Go to Settings → Help & About → Advanced
4. Copy your Access Token

### Method 3: Using Matrix Admin Panel
- If you have admin access to your homeserver, generate a token via the admin API

## Step 2: Create Alert Room

```bash
curl -X POST "https://matrix.org/_matrix/client/v3/createRoom" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Water Level Alerts - Northern Thailand",
    "topic": "Automated alerts for Ping River water monitoring",
    "preset": "private_chat"
  }'
```

Save the `room_id` from the response (format: `!roomid:homeserver.com`).
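
A quick sanity check on the saved value can catch copy-paste mistakes early, such as pasting a room alias (`#...`) instead of a room ID. The pattern below is a deliberately loose assumption for illustration, not the full Matrix identifier grammar, and `is_room_id` is a hypothetical helper.

```python
import re

# Loose check for the "!opaque_id:server_name" shape (illustrative assumption,
# not the complete Matrix identifier grammar).
ROOM_ID_PATTERN = re.compile(r"^![^:\s]+:[^\s]+$")


def is_room_id(value):
    """Return True when the value looks like a Matrix room ID."""
    return bool(ROOM_ID_PATTERN.match(value))
```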

## Step 3: Configure Grafana Contact Point

### Navigate to Alerting
1. In Grafana, go to **Alerting → Contact Points**
2. Click **Add contact point**

### Contact Point Settings
```
Name: matrix-water-alerts
Integration: Webhook
URL: https://matrix.org/_matrix/client/v3/rooms/!YOUR_ROOM_ID:matrix.org/send/m.room.message/{{ .GroupLabels.alertname }}_{{ .GroupLabels.severity }}_{{ now.Unix }}
HTTP Method: PUT
```
The Matrix send endpoint that ends in a transaction ID accepts only `PUT`, so the method is `PUT` rather than `POST`.
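
The room ID contains `!` and `:`, which are safest percent-encoded when placed in a URL path, and the last path segment is a transaction ID that should be unique per message. A minimal sketch of building such a URL follows; `build_send_url` and the timestamp-plus-UUID transaction ID scheme are assumptions for illustration.

```python
import time
import uuid
from urllib.parse import quote


def build_send_url(homeserver, room_id, txn_id=None):
    """Build the PUT URL for sending an m.room.message event to a room."""
    if txn_id is None:
        # Assumed scheme: any string that is unique per message works as a transaction ID.
        txn_id = f"{int(time.time())}-{uuid.uuid4().hex}"
    return (
        f"{homeserver.rstrip('/')}/_matrix/client/v3/rooms/"
        f"{quote(room_id, safe='')}/send/m.room.message/{quote(txn_id, safe='')}"
    )
```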

### Headers
```
Authorization: Bearer YOUR_MATRIX_ACCESS_TOKEN
Content-Type: application/json
```

### Message Template (JSON Body)
```json
{
  "msgtype": "m.text",
  "body": "🌊 **PING RIVER WATER ALERT**\n\n**Alert:** {{ .GroupLabels.alertname }}\n**Severity:** {{ .GroupLabels.severity | toUpper }}\n**Station:** {{ .GroupLabels.station_code }} ({{ .GroupLabels.station_name }})\n\n{{ range .Alerts }}**Status:** {{ .Status | toUpper }}\n**Water Level:** {{ .Annotations.water_level }}m\n**Threshold:** {{ .Annotations.threshold }}m\n**Time:** {{ .StartsAt.Format \"2006-01-02 15:04:05\" }}\n{{ if .Annotations.discharge }}**Discharge:** {{ .Annotations.discharge }} cms\n{{ end }}{{ if .Annotations.message }}**Details:** {{ .Annotations.message }}\n{{ end }}{{ end }}\n📈 **Dashboard:** {{ .ExternalURL }}\n📍 **Location:** Northern Thailand Ping River"
}
```
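
Matrix clients generally display `body` as plain text, so the `**bold**` markers in the template may appear literally. Rich formatting is carried in the optional `format` and `formatted_body` fields. The sketch below shows one way a sender such as the custom Python alerter might build both variants; `build_message_content` is a hypothetical helper, not a function from `src/alerting.py`.

```python
import html
import json


def build_message_content(title, fields):
    """Build an m.room.message content dict with plain-text and HTML bodies."""
    plain_lines = [title] + [f"{key}: {value}" for key, value in fields.items()]
    html_lines = [f"<strong>{html.escape(title)}</strong>"] + [
        f"<strong>{html.escape(str(key))}:</strong> {html.escape(str(value))}"
        for key, value in fields.items()
    ]
    return {
        "msgtype": "m.text",
        "body": "\n".join(plain_lines),           # fallback for clients without HTML support
        "format": "org.matrix.custom.html",
        "formatted_body": "<br>".join(html_lines),
    }
```

Serializing the result with `json.dumps` takes care of escaping newlines and quotes, which is easy to get wrong in a hand-written JSON template.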

## Step 4: Create Alert Rules

### High Water Level Alert
```yaml
# Rule Configuration
Rule Name: high-water-level
Evaluation Group: water-level-alerts
Folder: Water Monitoring

# Query A
SELECT
  station_code,
  station_name_th as station_name,
  water_level,
  discharge,
  timestamp
FROM water_measurements
WHERE
  timestamp > now() - interval '5 minutes'
  AND water_level > 6.0

# Condition
IS ABOVE 6.0 FOR 5 minutes

# Labels
severity: critical
alertname: High Water Level
station_code: {{ $labels.station_code }}
station_name: {{ $labels.station_name }}

# Annotations
water_level: {{ $values.water_level }}
threshold: 6.0
discharge: {{ $values.discharge }}
summary: Critical water level detected at {{ $labels.station_code }}
```
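
The "IS ABOVE 6.0 FOR 5 minutes" condition means a single spike should not fire the alert; the level has to stay above the threshold for the whole hold period. A minimal sketch of that logic, assuming readings arrive as `(timestamp, level)` tuples sorted oldest first (`is_sustained_above` is a hypothetical name, and this is not how Grafana implements it internally):

```python
from datetime import datetime, timedelta


def is_sustained_above(readings, threshold, hold=timedelta(minutes=5)):
    """Return True when the trailing run of readings above threshold spans at least `hold`."""
    breach_start = None
    for ts, level in readings:
        if level > threshold:
            if breach_start is None:
                breach_start = ts  # first reading of the current breach
        else:
            breach_start = None    # any dip resets the pending period
    if breach_start is None:
        return False
    return readings[-1][0] - breach_start >= hold
```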

### Emergency Water Level Alert
```yaml
Rule Name: emergency-water-level
Query: water_level > 8.0
Condition: IS ABOVE 8.0 FOR 2 minutes
Labels:
  severity: emergency
  alertname: Emergency Water Level
Annotations:
  threshold: 8.0
  message: IMMEDIATE ACTION REQUIRED - Flood risk imminent
```

### Low Water Level Alert
```yaml
Rule Name: low-water-level
Query: water_level < 1.0
Condition: IS BELOW 1.0 FOR 15 minutes
Labels:
  severity: warning
  alertname: Low Water Level
Annotations:
  threshold: 1.0
  message: Drought conditions detected
```

### Data Gap Alert
```yaml
Rule Name: data-gap
Query:
  SELECT
    station_code,
    MAX(timestamp) as last_seen
  FROM water_measurements
  GROUP BY station_code
  HAVING MAX(timestamp) < now() - interval '2 hours'

Condition: HAS NO DATA FOR 30 minutes
Labels:
  severity: warning
  alertname: Data Gap
  issue: missing-data
```
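
The same freshness check can be expressed outside SQL, for example in a custom checker. A minimal sketch, assuming the newest timestamp per station is already available as a dict (`find_stale_stations` is a hypothetical helper):

```python
from datetime import datetime, timedelta


def find_stale_stations(last_seen, now, max_age=timedelta(hours=2)):
    """Return station codes whose newest measurement is older than max_age.

    last_seen maps station_code to the datetime of its most recent measurement.
    """
    return sorted(code for code, ts in last_seen.items() if now - ts > max_age)
```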

### Rapid Level Change Alert
```yaml
Rule Name: rapid-level-change
Query:
  -- A window-function result cannot be filtered with HAVING, so filter in an outer query
  SELECT station_code, water_level, prev_level
  FROM (
    SELECT
      station_code,
      water_level,
      LAG(water_level, 1) OVER (PARTITION BY station_code ORDER BY timestamp) as prev_level
    FROM water_measurements
    WHERE timestamp > now() - interval '15 minutes'
  ) recent
  WHERE ABS(water_level - prev_level) > 0.5

Condition: CHANGE > 0.5m FOR 1 minute
Labels:
  severity: warning
  alertname: Rapid Water Level Change
```
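
What the `LAG` comparison computes is the difference between each reading and the one before it at the same station. A minimal Python sketch of that comparison, assuming the levels for one station are given oldest first (`find_rapid_changes` is a hypothetical name):

```python
def find_rapid_changes(levels, max_delta=0.5):
    """Return (previous, current) pairs whose absolute difference exceeds max_delta."""
    return [
        (prev, cur)
        for prev, cur in zip(levels, levels[1:])  # pair each reading with its predecessor
        if abs(cur - prev) > max_delta
    ]
```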

## Step 5: Configure Notification Policy

### Create Notification Policy
```yaml
# Policy Tree
- receiver: matrix-water-alerts
  # match_re because the value is a regular expression; match compares exact strings
  match_re:
    severity: emergency|critical
  group_wait: 10s
  group_interval: 5m
  repeat_interval: 30m

- receiver: matrix-water-alerts
  match:
    severity: warning
  group_wait: 30s
  group_interval: 10m
  repeat_interval: 2h
```

### Grouping Rules
```yaml
group_by: [alertname, station_code]
group_wait: 10s
group_interval: 5m
repeat_interval: 1h
```
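
To make the routing and grouping behaviour concrete, here is a small sketch that mirrors the policy tree: the first route whose severity pattern matches wins, and alerts are bucketed by the `group_by` labels. The `ROUTES` table, `pick_route`, and `group_alerts` are illustrative assumptions, not Grafana or Alertmanager APIs.

```python
import re
from collections import defaultdict

# Hypothetical mirror of the policy tree: (severity regex, timing settings).
ROUTES = [
    (re.compile(r"emergency|critical"), {"group_wait": "10s", "repeat_interval": "30m"}),
    (re.compile(r"warning"), {"group_wait": "30s", "repeat_interval": "2h"}),
]


def pick_route(labels):
    """Return the settings of the first route whose regex fully matches the severity label."""
    for pattern, settings in ROUTES:
        if pattern.fullmatch(labels.get("severity", "")):
            return settings
    return None


def group_alerts(alerts, keys=("alertname", "station_code")):
    """Bucket alert label dicts by the group_by keys."""
    groups = defaultdict(list)
    for labels in alerts:
        groups[tuple(labels.get(key) for key in keys)].append(labels)
    return dict(groups)
```

Grouping by both `alertname` and `station_code` means two stations breaching at once produce two notifications, while repeated evaluations for the same station collapse into one.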

## Step 6: Station-Specific Thresholds

Create separate rules for each station with appropriate thresholds:

```sql
-- P.1 (Chiang Mai) - Urban area, higher thresholds
SELECT * FROM water_measurements
WHERE station_code = 'P.1' AND water_level > 6.5

-- P.4A (Mae Ping) - Agricultural area
SELECT * FROM water_measurements
WHERE station_code = 'P.4A' AND water_level > 5.0

-- P.20 (Downstream) - Lower threshold
SELECT * FROM water_measurements
WHERE station_code = 'P.20' AND water_level > 4.0
```
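
In code, per-station thresholds reduce to a lookup table. The sketch below uses the three thresholds from the queries above; the fallback to 6.0 for unlisted stations is an assumption borrowed from the generic high-water rule in Step 4, and `exceeds_threshold` is a hypothetical helper.

```python
STATION_THRESHOLDS = {"P.1": 6.5, "P.4A": 5.0, "P.20": 4.0}
DEFAULT_THRESHOLD = 6.0  # assumption: generic high-water threshold for unlisted stations


def exceeds_threshold(station_code, water_level):
    """Return True when the level is above the station's configured threshold."""
    return water_level > STATION_THRESHOLDS.get(station_code, DEFAULT_THRESHOLD)
```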

## Step 7: Advanced Features

### Time-Based Routing
```yaml
# Different receivers for day/night
time_intervals:
  - name: working_hours
    time_intervals:
      - times:
          - start_time: '08:00'
            end_time: '20:00'
        weekdays: ['monday:friday']

routes:
  - receiver: matrix-alerts-day
    match:
      severity: warning
    active_time_intervals: [working_hours]
    # An inactive route still ends matching, so continue to the night route
    continue: true

  # There is no negation syntax; mute this route during working hours instead
  - receiver: matrix-alerts-night
    match:
      severity: warning
    mute_time_intervals: [working_hours]
```
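
The day/night split boils down to a single predicate on the alert time. A minimal sketch, assuming the datetime passed in is already in the timezone the interval is defined in and that the end time is exclusive (`in_working_hours` and `pick_receiver` are hypothetical helpers):

```python
from datetime import datetime, time


def in_working_hours(moment):
    """True on Monday-Friday from 08:00 (inclusive) to 20:00 (exclusive)."""
    return moment.weekday() < 5 and time(8, 0) <= moment.time() < time(20, 0)


def pick_receiver(moment):
    """Choose the day or night receiver for a warning raised at the given time."""
    return "matrix-alerts-day" if in_working_hours(moment) else "matrix-alerts-night"
```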

### Multi-Channel Alerts
```yaml
# Send critical alerts to multiple rooms
- name: matrix-emergency
  webhook_configs:
    - url: https://matrix.org/_matrix/client/v3/rooms/!emergency:matrix.org/send/m.room.message
      http_config:
        authorization:
          # type defaults to Bearer, so credentials holds only the token
          credentials: "EMERGENCY_TOKEN"
    - url: https://matrix.org/_matrix/client/v3/rooms/!general:matrix.org/send/m.room.message
      http_config:
        authorization:
          credentials: "GENERAL_TOKEN"
```

## Step 8: Testing

### Test Contact Point
1. Go to Contact Points in Grafana
2. Select your Matrix contact point
3. Click the "Test" button
4. Check the Matrix room for the test message

### Test Alert Rules
1. Temporarily lower thresholds
2. Wait for the condition to trigger
3. Verify the alert appears in Grafana
4. Verify the Matrix message is received
5. Reset thresholds

### Manual Alert Trigger
```sql
-- Simulate high water level in database
INSERT INTO water_measurements (station_code, water_level, timestamp)
VALUES ('P.1', 7.5, NOW());
```

## Troubleshooting

### Common Issues

#### 403 Forbidden
- **Cause**: The token's user lacks permission, for example it has not joined the alert room (an invalid or expired token usually returns 401 instead)
- **Fix**: Join the alerting user to the room and check its permissions, or regenerate the token

#### Room Not Found
- **Cause**: Incorrect room ID format
- **Fix**: Ensure the room ID starts with `!` and includes the homeserver

#### No Alerts Firing
- **Cause**: Query returns no results
- **Fix**: Test queries in Grafana Explore, check data availability

#### Alert Spam
- **Cause**: No grouping configured
- **Fix**: Configure proper `group_by` and intervals

#### Messages Not Formatted
- **Cause**: Template syntax errors
- **Fix**: Validate JSON template, check Grafana template docs

### Debug Steps
1. Check Grafana alert rule status
2. Verify contact point test succeeds
3. Check Grafana logs: `/var/log/grafana/grafana.log`
4. Test Matrix API directly with curl
5. Verify database connectivity and query results
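
When testing the Matrix API directly, the HTTP status code narrows down the cause quickly. The mapping below is a rough sketch of common cases, not an exhaustive list; the hint texts and the `diagnose` helper are assumptions for illustration.

```python
# Rough hints for common Matrix client-server API status codes (illustrative).
HINTS = {
    401: "access token missing, invalid, or expired; log in again",
    403: "token is valid but the user may not be joined to the room or lacks permission to post",
    404: "endpoint or room not found; check the room ID and URL encoding",
    429: "rate limited by the homeserver; back off and retry later",
}


def diagnose(status_code):
    """Translate an HTTP status code from the homeserver into a troubleshooting hint."""
    if 200 <= status_code < 300:
        return "ok"
    return HINTS.get(status_code, f"unexpected HTTP status {status_code}; check homeserver logs")
```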

## Environment Variables

Add to your `.env`:
```bash
MATRIX_HOMESERVER=https://matrix.org
MATRIX_ACCESS_TOKEN=your_access_token_here
MATRIX_ROOM_ID=!your_room_id:matrix.org
GRAFANA_URL=http://your-grafana-host:3000
```
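
A consumer of these variables can fail fast when one is missing instead of sending unauthenticated requests. This is a minimal sketch of such a loader and not the code in `src/alerting.py`; `load_matrix_config` is a hypothetical name, and treating `GRAFANA_URL` as optional is an assumption.

```python
import os

REQUIRED = ("MATRIX_HOMESERVER", "MATRIX_ACCESS_TOKEN", "MATRIX_ROOM_ID")


def load_matrix_config(env=None):
    """Read Matrix settings from the environment, raising if a required one is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "homeserver": env["MATRIX_HOMESERVER"].rstrip("/"),
        "access_token": env["MATRIX_ACCESS_TOKEN"],
        "room_id": env["MATRIX_ROOM_ID"],
        "grafana_url": env.get("GRAFANA_URL", ""),  # assumed optional
    }
```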

## Example Alert Message
Your Matrix messages will appear as:
```
🌊 **PING RIVER WATER ALERT**

**Alert:** High Water Level
**Severity:** CRITICAL
**Station:** P.1 (สถานีเชียงใหม่)

**Status:** FIRING
**Water Level:** 6.75m
**Threshold:** 6.0m
**Time:** 2025-09-26 14:30:00
**Discharge:** 450.2 cms

📈 **Dashboard:** http://grafana:3000
📍 **Location:** Northern Thailand Ping River
```

## Security Notes
- Store Matrix tokens securely (environment variables)
- Use a separate account and token per alert room when possible, since Matrix access tokens are tied to a user and device rather than to a room
- Enable rate limiting to prevent spam
- Consider using a dedicated alerting user account
- Regularly rotate access tokens

This setup provides comprehensive water level monitoring with immediate Matrix notifications when thresholds are exceeded.