grabowski ca730e484b Add comprehensive Matrix alerting system with Grafana integration
- Implement custom Python alerting system (src/alerting.py) with water level monitoring, data freshness checks, and Matrix notifications
- Add complete Grafana Matrix alerting setup guide (docs/GRAFANA_MATRIX_SETUP.md) with webhook configuration, alert rules, and notification policies
- Create Matrix quick start guide (docs/MATRIX_QUICK_START.md) for rapid deployment
- Integrate alerting commands into main application (--alert-check, --alert-test)
- Add Matrix configuration to environment variables (.env.example)
- Update Makefile with alerting targets (alert-check, alert-test)
- Enhance status command to show Matrix notification status
- Support station-specific water level thresholds and escalation rules
- Provide dual alerting approach: native Grafana alerts and custom Python system

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-26 16:18:02 +07:00


# Complete Grafana Matrix Alerting Setup Guide
## Overview
Configure Grafana to send water level alerts to Matrix rooms when thresholds are exceeded.
## Prerequisites
- Grafana instance running (v8.0+)
- PostgreSQL data source configured in Grafana
- Matrix account
- Matrix room for alerts
## Step 1: Get Matrix Access Token
### Method 1: Using curl
```bash
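curl -X POST https://matrix.org/_matrix/client/v3/login \
  -H "Content-Type: application/json" \
  -d '{
    "type": "m.login.password",
    "identifier": {"type": "m.id.user", "user": "your_username"},
    "password": "your_password"
  }'
```
Copy `access_token` from the JSON response. Each login creates a new device session, and logging that session out invalidates its token.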
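The custom Python alerting system or a setup script can prepare the same login with the standard library. This is a minimal sketch under stated assumptions. `build_login_request` and `extract_access_token` are hypothetical helper names. The code only builds the request, so you pass it to `urllib.request.urlopen` yourself.

```python
import json
import urllib.request


def build_login_request(homeserver: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a Matrix v3 password-login request."""
    payload = {
        "type": "m.login.password",
        # Current spec form; the older top-level "user" field is deprecated.
        "identifier": {"type": "m.id.user", "user": user},
        "password": password,
    }
    return urllib.request.Request(
        f"{homeserver.rstrip('/')}/_matrix/client/v3/login",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_access_token(response_body: bytes) -> str:
    """Return the access_token field of a successful login response."""
    return json.loads(response_body)["access_token"]
```

For example, `extract_access_token(urllib.request.urlopen(build_login_request("https://matrix.org", "your_username", "your_password")).read())` returns the token string. Store the token in `MATRIX_ACCESS_TOKEN` instead of logging in on every run, because each login creates another device session.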
### Method 2: Using Element Web Client
1. Open Element in browser: https://app.element.io
2. Log in to your account
3. Go to Settings → Help & About → Advanced
4. Copy your Access Token. It belongs to this Element session, so logging out of Element invalidates it.
### Method 3: Using the Homeserver Admin API
- If you administer a Synapse homeserver, an admin can issue a token for the alerting account with `POST /_synapse/admin/v1/users/<user_id>/login`.
## Step 2: Create Alert Room
```bash
curl -X POST "https://matrix.org/_matrix/client/v3/createRoom" \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "Water Level Alerts - Northern Thailand",
"topic": "Automated alerts for Ping River water monitoring",
"preset": "private_chat"
}'
```
Save the `room_id` from the response (format: !roomid:homeserver.com)
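A common mistake is pasting a room alias such as `#alerts:matrix.org` where a room ID is expected, because the message-sending endpoint takes a room ID. The sketch below uses a hypothetical helper name. It pulls `room_id` out of the `createRoom` response and checks only the `!` sigil, since the homeserver chooses the rest of the ID.

```python
import json
import re

# Room IDs start with "!"; aliases start with "#" and must be resolved first.
ROOM_ID_RE = re.compile(r"![^\s/]+")


def room_id_from_create_response(response_body: bytes) -> str:
    """Extract room_id from a createRoom response and check it is a room ID."""
    room_id = json.loads(response_body)["room_id"]
    if not ROOM_ID_RE.fullmatch(room_id):
        raise ValueError(f"not a room ID: {room_id!r}")
    return room_id
```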
## Step 3: Configure Grafana Contact Point
### Navigate to Alerting
1. In Grafana, go to **Alerting → Contact Points**
2. Click **Add contact point**
### Contact Point Settings
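```
Name: matrix-water-alerts
Integration: Webhook
URL: https://matrix.org/_matrix/client/v3/rooms/!YOUR_ROOM_ID:matrix.org/send/m.room.message/{{ .GroupLabels.alertname }}_{{ .CommonLabels.severity }}_{{ now.Unix }}
HTTP Method: PUT
```
The Matrix send endpoint is `PUT /_matrix/client/v3/rooms/{roomId}/send/{eventType}/{txnId}`, so the method must be PUT. The last path segment is a transaction ID. The homeserver treats a repeated ID from the same access token as a retry and does not post a second message, so the ID must be unique per notification. It must also be URL-safe: an alertname such as `High Water Level` contains spaces. Whether the URL field expands templates, and whether `now.Unix` is available, depends on your Grafana version. Confirm with the contact point's **Test** button before relying on it.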
### Headers
```
Authorization: Bearer YOUR_MATRIX_ACCESS_TOKEN
Content-Type: application/json
```
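You can check the request the contact point has to produce outside Grafana, which also helps with debugging. This is a minimal sketch that uses only the standard library; `build_send_request` is a hypothetical name. It uses PUT with a fresh transaction ID and URL-encodes the room ID, which is what the Matrix client-server API expects for `send/m.room.message`.

```python
import json
import time
import urllib.parse
import urllib.request
import uuid
from typing import Optional


def build_send_request(homeserver: str, room_id: str, token: str, text: str,
                       txn_id: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a Matrix m.room.message request."""
    if txn_id is None:
        # Unique per message: the server treats a reused ID (same token) as a retry.
        txn_id = f"alert-{int(time.time())}-{uuid.uuid4().hex}"
    path = "/_matrix/client/v3/rooms/{}/send/m.room.message/{}".format(
        urllib.parse.quote(room_id, safe=""),  # "!" -> %21, ":" -> %3A
        urllib.parse.quote(txn_id, safe=""),
    )
    return urllib.request.Request(
        homeserver.rstrip("/") + path,
        data=json.dumps({"msgtype": "m.text", "body": text}).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the request to `urllib.request.urlopen` returns JSON containing the new `event_id`. A 401 or 403 at this point means the problem is the token or room membership, not Grafana.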
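### Message Template (JSON Body)
In many Grafana versions, a webhook contact point sends Grafana's own fixed JSON payload. A custom request body like the one below needs either a Grafana version that offers a custom payload option or a small relay that converts Grafana's payload into a Matrix event. Labels outside the notification policy's `group_by` (here `severity` and `station_name`) are empty in `.GroupLabels`, so read them from `.CommonLabels`. An `m.text` body is plain text, so Matrix clients show the `**` markers literally. To render bold text, also send `"format": "org.matrix.custom.html"` with an HTML `formatted_body`.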
```json
{
  "msgtype": "m.text",
  "body": "🌊 **PING RIVER WATER ALERT**\n\n**Alert:** {{ .GroupLabels.alertname }}\n**Severity:** {{ .CommonLabels.severity | toUpper }}\n**Station:** {{ .GroupLabels.station_code }} ({{ .CommonLabels.station_name }})\n\n{{ range .Alerts }}**Status:** {{ .Status | toUpper }}\n**Water Level:** {{ .Annotations.water_level }}m\n**Threshold:** {{ .Annotations.threshold }}m\n**Time:** {{ .StartsAt.Format "2006-01-02 15:04:05" }}\n{{ if .Annotations.discharge }}**Discharge:** {{ .Annotations.discharge }} cms\n{{ end }}{{ if .Annotations.message }}**Details:** {{ .Annotations.message }}\n{{ end }}{{ end }}\n📈 **Dashboard:** {{ .ExternalURL }}\n📍 **Location:** Northern Thailand Ping River"
}
```
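Matrix clients render the `body` of an `m.text` message as plain text, so Markdown markers appear verbatim. The sketch below shows how a sender such as the custom Python system could include both a plain `body` and an HTML `formatted_body` with the same fields as the template. The function is hypothetical and uses only the standard library.

```python
import html
from typing import Optional


def build_alert_event(alertname: str, severity: str, station_code: str,
                      station_name: str, water_level: float, threshold: float,
                      discharge: Optional[float] = None) -> dict:
    """Build an m.room.message event with plain-text and HTML bodies."""
    fields = [
        ("Alert", alertname),
        ("Severity", severity.upper()),
        ("Station", f"{station_code} ({station_name})"),
        ("Water Level", f"{water_level}m"),
        ("Threshold", f"{threshold}m"),
    ]
    if discharge is not None:
        fields.append(("Discharge", f"{discharge} cms"))
    title = "🌊 PING RIVER WATER ALERT"
    plain = "\n".join([title] + [f"{k}: {v}" for k, v in fields])
    rich = "<br>".join(
        [f"<b>{html.escape(title)}</b>"]
        + [f"<b>{html.escape(k)}:</b> {html.escape(str(v))}" for k, v in fields]
    )
    return {
        "msgtype": "m.text",
        "body": plain,                       # fallback for clients without HTML support
        "format": "org.matrix.custom.html",  # marks formatted_body as HTML
        "formatted_body": rich,
    }
```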
## Step 4: Create Alert Rules
### High Water Level Alert
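```yaml
# Rule configuration
Rule Name: High Water Level   # Grafana uses the rule name as the alertname label
Evaluation Group: water-level-alerts
Folder: Water Monitoring
# Query A (PostgreSQL, Format = Time series): one series per station;
# string columns (station_code, station_name) become labels
Query A: |
  SELECT
    timestamp AS time,
    station_code,
    station_name_th AS station_name,
    water_level
  FROM water_measurements
  WHERE timestamp > now() - interval '5 minutes'
  ORDER BY 1
# Expressions and condition
B: Reduce (Last) of A              # latest level per station
C: Threshold, B IS ABOVE 6.0       # alert condition
Pending period: 5m                 # C must hold for 5 minutes before firing
# Labels (station_code and station_name come from the query)
severity: critical
# Annotations ($values is keyed by expression RefID, not by column name)
water_level: "{{ $values.B.Value }}"
threshold: 6.0
summary: "Critical water level detected at {{ $labels.station_code }}"
# Discharge: add a second query plus a Reduce expression and reference its RefID
```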
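The condition `IS ABOVE 6.0 FOR 5 minutes` means each station's series must stay above the threshold on every evaluation for the whole pending period before the alert fires. The sketch below approximates that state machine for one series. It assumes a 60-second evaluation interval in the usage. Grafana's real evaluator also handles NoData and Error states, which are omitted here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuleState:
    state: str = "Normal"                 # Normal -> Pending -> Firing
    active_since: Optional[float] = None  # when the condition first became true (s)


def evaluate(rs: RuleState, now: float, value: float,
             threshold: float = 6.0, pending_s: float = 300.0) -> str:
    """Advance one evaluation of an 'IS ABOVE threshold FOR pending' rule."""
    if value <= threshold:
        # "IS ABOVE" is strict; any false evaluation resets the timer.
        rs.state, rs.active_since = "Normal", None
        return rs.state
    if rs.active_since is None:
        rs.active_since = now
    rs.state = "Firing" if now - rs.active_since >= pending_s else "Pending"
    return rs.state
```

A single reading at or below 6.0 resets the timer, so a level that oscillates around the threshold may stay Pending and never fire.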
### Emergency Water Level Alert
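```yaml
Rule Name: Emergency Water Level   # becomes the alertname label
Query A: per-station water_level series, without a threshold filter in the SQL
Expressions: B = Reduce (Last) of A; C = Threshold, B IS ABOVE 8.0
Pending period: 2m
Labels:
  severity: emergency
Annotations:
  water_level: "{{ $values.B.Value }}"
  threshold: 8.0
  message: IMMEDIATE ACTION REQUIRED - Flood risk imminent
```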
### Low Water Level Alert
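```yaml
Rule Name: Low Water Level   # becomes the alertname label
Query A: per-station water_level series, without a threshold filter in the SQL
Expressions: B = Reduce (Last) of A; C = Threshold, B IS BELOW 1.0
Pending period: 15m
Labels:
  severity: warning
Annotations:
  water_level: "{{ $values.B.Value }}"
  threshold: 1.0
  message: Drought conditions detected
```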
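Grafana evaluates each rule independently, so a single reading can match more than one of them. The sketch below copies the thresholds from the rules above and ignores pending periods. It lists which alerts a given level would eventually raise.

```python
from typing import List, Tuple

# (alertname, severity, comparison, limit in metres), copied from the rules above
RULES = [
    ("Emergency Water Level", "emergency", "above", 8.0),
    ("High Water Level", "critical", "above", 6.0),
    ("Low Water Level", "warning", "below", 1.0),
]


def rules_triggered(level: float) -> List[Tuple[str, str]]:
    """Return (alertname, severity) for every rule whose condition holds."""
    hits = []
    for alertname, severity, op, limit in RULES:
        if (op == "above" and level > limit) or (op == "below" and level < limit):
            hits.append((alertname, severity))
    return hits
```

At 8.5 m both the emergency and high-water rules fire, so the room gets two messages. If you don't want that, one option is to make the high-water condition a range (above 6.0 and at most 8.0), provided your Grafana version's Threshold expression offers a range mode.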
### Data Gap Alert
```yaml
Rule Name: Data Gap
# Query A (Format = Table): one row per station, so the rule has data
# whether or not a gap exists
Query A: |
  SELECT
    station_code,
    EXTRACT(EPOCH FROM now() - MAX(timestamp)) / 60 AS minutes_since_last
  FROM water_measurements
  GROUP BY station_code
Condition: Threshold, A IS ABOVE 120   # more than 2 hours without data
Pending period: 30m
Labels:
  severity: warning
  issue: missing-data
```
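Here is the data-gap logic written out in code: a station is stale when its newest measurement is more than two hours old. `stale_stations` and its optional `expected` list are hypothetical. The extra list matters because a station with no rows at all never appears in a `GROUP BY station_code` result, so neither the SQL nor this check can flag it without one.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List


def stale_stations(last_seen: Dict[str, datetime], now: datetime,
                   expected: Iterable[str] = (),
                   max_age: timedelta = timedelta(hours=2)) -> List[str]:
    """Station codes with no measurement newer than max_age, sorted."""
    # Same test as MAX(timestamp) < now() - interval '2 hours'
    stale = {code for code, ts in last_seen.items() if now - ts > max_age}
    # Stations that never reported have no row to group, so list them explicitly.
    stale |= {code for code in expected if code not in last_seen}
    return sorted(stale)
```

In the usage below, `P.99` stands for a hypothetical station that has never reported.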
### Rapid Level Change Alert
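```yaml
Rule Name: Rapid Water Level Change
# Query A (Format = Table): largest step between consecutive readings per station.
# Window functions cannot be filtered in the same SELECT, hence the subquery.
Query A: |
  SELECT station_code, MAX(ABS(water_level - prev_level)) AS level_change
  FROM (
    SELECT
      station_code,
      water_level,
      LAG(water_level, 1) OVER (PARTITION BY station_code ORDER BY timestamp) AS prev_level
    FROM water_measurements
    WHERE timestamp > now() - interval '15 minutes'
  ) AS recent
  WHERE prev_level IS NOT NULL
  GROUP BY station_code
Condition: Threshold, A IS ABOVE 0.5   # metres between consecutive readings
Pending period: 1m
Labels:
  severity: warning
```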
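The `LAG(...) OVER (PARTITION BY station_code ORDER BY timestamp)` approach works like this: sort each station's readings by time and compare each reading with the previous one. The sketch below uses hypothetical names and assumes readings arrive as `(station_code, timestamp_seconds, water_level)` tuples.

```python
from typing import Dict, List, Tuple


def max_step_change(readings: List[Tuple[str, float, float]]) -> Dict[str, float]:
    """Largest |level - previous level| per station, in any input order."""
    by_station: Dict[str, List[Tuple[float, float]]] = {}
    for code, ts, level in readings:
        by_station.setdefault(code, []).append((ts, level))  # PARTITION BY station_code
    result: Dict[str, float] = {}
    for code, series in by_station.items():
        series.sort()  # ORDER BY timestamp
        steps = [abs(cur[1] - prev[1]) for prev, cur in zip(series, series[1:])]
        if steps:  # a single reading has no previous value (LAG returns NULL)
            result[code] = max(steps)
    return result
```

This compares consecutive readings only. A rise of 0.4 m per reading never exceeds 0.5 m, even if the total change over 15 minutes is larger. Comparing each reading against the oldest one in the window would catch that case instead.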
## Step 5: Configure Notification Policy
### Create Notification Policy
```yaml
# Policy tree (child policies under the default policy)
- receiver: matrix-water-alerts
  matchers:
    - 'severity=~"emergency|critical"'   # =~ is a regex match; match: is literal equality
  group_wait: 10s
  group_interval: 5m
  repeat_interval: 30m
- receiver: matrix-water-alerts
  matchers:
    - 'severity="warning"'
  group_wait: 30s
  group_interval: 10m
  repeat_interval: 2h
```
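Two routing details are easy to get wrong. A regex matcher (`=~`) must match the entire label value, and the first child policy that matches handles the alert unless that policy is set to continue. The sketch below models that lookup with the policy values from the tree above. The policy names are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Child policies in tree order; timing values copied from the policy tree above.
POLICIES: List[Dict[str, str]] = [
    {"name": "urgent", "severity": "emergency|critical", "group_wait": "10s",
     "group_interval": "5m", "repeat_interval": "30m"},
    {"name": "warning", "severity": "warning", "group_wait": "30s",
     "group_interval": "10m", "repeat_interval": "2h"},
]


def match_policy(labels: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return the first child policy whose severity regex matches the whole value."""
    for policy in POLICIES:
        # re.fullmatch mirrors the anchored behaviour of =~ matchers.
        if re.fullmatch(policy["severity"], labels.get("severity", "")):
            return policy
    return None  # no child matched: the default (root) policy handles the alert
```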
### Grouping Rules
```yaml
# Default (root) policy; child policies inherit these unless they override them
group_by: [alertname, station_code]
group_wait: 10s
group_interval: 5m
repeat_interval: 1h
```
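Grouping decides how many Matrix messages arrive: each distinct combination of `group_by` label values becomes its own notification. The sketch below uses hypothetical helpers. It also shows why a template should read `severity` or `station_name` from `.CommonLabels`: `.GroupLabels` contains only the labels listed in `group_by`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

GROUP_BY = ("alertname", "station_code")
Labels = Dict[str, str]


def group_alerts(alerts: List[Labels]) -> Dict[Tuple[str, ...], List[Labels]]:
    """Bucket alerts by their group_by label values; one bucket = one notification."""
    groups: Dict[Tuple[str, ...], List[Labels]] = defaultdict(list)
    for labels in alerts:
        groups[tuple(labels.get(name, "") for name in GROUP_BY)].append(labels)
    return dict(groups)


def group_labels(group: List[Labels]) -> Labels:
    """Roughly what a template sees as .GroupLabels: only the group_by labels."""
    return {name: group[0][name] for name in GROUP_BY if name in group[0]}


def common_labels(group: List[Labels]) -> Labels:
    """Roughly what a template sees as .CommonLabels: labels shared by all alerts."""
    first = group[0]
    return {k: v for k, v in first.items() if all(a.get(k) == v for a in group)}
```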
## Step 6: Station-Specific Thresholds
Create separate rules for each station with appropriate thresholds:
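```sql
-- One rule per station; each rule's Threshold expression holds the limit
-- P.1 (Chiang Mai) - Urban area, higher threshold: IS ABOVE 6.5
SELECT timestamp AS time, station_code, water_level
FROM water_measurements
WHERE station_code = 'P.1' AND timestamp > now() - interval '5 minutes'
ORDER BY 1;
-- P.4A (Mae Ping) - Agricultural area: IS ABOVE 5.0
SELECT timestamp AS time, station_code, water_level
FROM water_measurements
WHERE station_code = 'P.4A' AND timestamp > now() - interval '5 minutes'
ORDER BY 1;
-- P.20 (Downstream) - Lower threshold: IS ABOVE 4.0
SELECT timestamp AS time, station_code, water_level
FROM water_measurements
WHERE station_code = 'P.20' AND timestamp > now() - interval '5 minutes'
ORDER BY 1;
```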
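This sketch keeps the same per-station limits in one mapping so they can be reviewed together. It uses a hypothetical function name. Stations without a configured limit are skipped rather than given a guessed default.

```python
from typing import Dict, List

# Station-specific limits in metres, copied from the queries above.
STATION_THRESHOLDS: Dict[str, float] = {"P.1": 6.5, "P.4A": 5.0, "P.20": 4.0}


def stations_over_threshold(latest: Dict[str, float]) -> List[str]:
    """Stations whose latest level is above their own configured threshold."""
    return sorted(
        code for code, level in latest.items()
        if code in STATION_THRESHOLDS and level > STATION_THRESHOLDS[code]
    )
```

Keeping the limits in one place, whether a mapping like this or a database table, is a design choice. It makes threshold changes easier to review than edits spread across separate rules.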
## Step 7: Advanced Features
### Time-Based Routing
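```yaml
# Different receivers for day/night (Alertmanager-style configuration)
time_intervals:
  - name: working_hours
    time_intervals:
      - times:
          - start_time: '08:00'
            end_time: '20:00'
        weekdays: ['monday:friday']
        location: 'Asia/Bangkok'   # intervals are evaluated in UTC unless set
route:
  receiver: matrix-water-alerts
  routes:
    - receiver: matrix-alerts-day
      matchers:
        - 'severity="warning"'
      active_time_intervals: [working_hours]
      continue: true   # otherwise routing stops here and the night route never matches
    - receiver: matrix-alerts-night
      matchers:
        - 'severity="warning"'
      mute_time_intervals: [working_hours]   # '!working_hours' negation is not supported
```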
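Alertmanager evaluates time intervals in UTC unless a location is configured, so `08:00` to `20:00` without one corresponds to 15:00 to 03:00 in Thailand. The sketch below shows the intended working-hours check. It assumes a fixed UTC+7 offset (Thailand does not observe daylight saving time) and Alertmanager's convention that the start time is inclusive and the end time exclusive. The function name is hypothetical.

```python
from datetime import datetime, time, timedelta, timezone

# Fixed UTC+7 matches Asia/Bangkok because Thailand has no daylight saving time.
BANGKOK = timezone(timedelta(hours=7))
START, END = time(8, 0), time(20, 0)


def in_working_hours(ts: datetime) -> bool:
    """True for Monday-Friday with 08:00 <= Bangkok local time < 20:00.

    ts must be timezone-aware.
    """
    local = ts.astimezone(BANGKOK)
    return local.weekday() < 5 and START <= local.time() < END
```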
### Multi-Channel Alerts
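```
# Send critical alerts to multiple rooms: one Grafana contact point
# with one Webhook integration per room, each with its own token
Name: matrix-emergency
Integration: Webhook
URL: https://matrix.org/_matrix/client/v3/rooms/!emergency:matrix.org/send/m.room.message/<unique-txn-id>
HTTP Method: PUT
Authorization: Bearer EMERGENCY_TOKEN
Integration: Webhook
URL: https://matrix.org/_matrix/client/v3/rooms/!general:matrix.org/send/m.room.message/<unique-txn-id>
HTTP Method: PUT
Authorization: Bearer GENERAL_TOKEN
```
Then route `severity=~"emergency|critical"` to `matrix-emergency` in the notification policy.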
## Step 8: Testing
### Test Contact Point
1. Go to Contact Points in Grafana
2. Select your Matrix contact point
3. Click "Test" button
4. Check Matrix room for test message
### Test Alert Rules
1. Temporarily lower thresholds
2. Wait for condition to trigger
3. Verify alert appears in Grafana
4. Verify Matrix message received
5. Reset thresholds
### Manual Alert Trigger
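```sql
-- Simulate a high water level at P.1 (run with psql against the monitoring database).
-- Other NOT NULL columns in your schema may also need values; delete the row after testing.
INSERT INTO water_measurements (station_code, water_level, timestamp)
VALUES ('P.1', 7.5, NOW());
```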
## Troubleshooting
### Common Issues
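#### 401 Unauthorized
- **Cause**: Missing, invalid, or logged-out Matrix access token (`M_MISSING_TOKEN` or `M_UNKNOWN_TOKEN`)
- **Fix**: Regenerate the token and update the contact point
#### 403 Forbidden
- **Cause**: The token is valid, but the account is not joined to the room or lacks permission to post (`M_FORBIDDEN`)
- **Fix**: Invite the alerting account, join it to the room, or raise its power level
#### M_UNRECOGNIZED Error
- **Cause**: Request sent with POST, or without a transaction ID in the path
- **Fix**: Use `PUT .../rooms/{roomId}/send/m.room.message/{txnId}`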
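#### Room Not Found
- **Cause**: Incorrect room ID, or a room alias used instead of a room ID
- **Fix**: Use the room ID (it starts with `!`), not an alias starting with `#`. If a proxy mangles the path, URL-encode `!` as `%21` and `:` as `%3A`.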
#### No Alerts Firing
- **Cause**: Query returns no results
- **Fix**: Test queries in Grafana Explore, check data availability
#### Alert Spam
- **Cause**: No grouping configured
- **Fix**: Configure proper group_by and intervals
#### Messages Not Formatted
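- **Cause**: Template syntax errors (for example, escaped `\"` inside a `{{ }}` action), or Markdown in a plain `m.text` body, which clients display literally
- **Fix**: Validate the template and the rendered JSON against the Grafana template docs. For rich text, send `"format": "org.matrix.custom.html"` with an HTML `formatted_body`.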
### Debug Steps
1. Check Grafana alert rule status
2. Verify contact point test succeeds
3. Check Grafana logs: `/var/log/grafana/grafana.log` for package installs; Docker containers log to stdout (`docker logs <container>`)
4. Test Matrix API directly with curl
5. Verify database connectivity and query results
## Environment Variables
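Add to your `.env`. These variables are used by the custom Python alerting system; the Grafana contact point is configured in the Grafana UI.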
```bash
MATRIX_HOMESERVER=https://matrix.org
MATRIX_ACCESS_TOKEN=your_access_token_here
MATRIX_ROOM_ID=!your_room_id:matrix.org
GRAFANA_URL=http://your-grafana-host:3000
```
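A fail-fast loader for these variables ensures a typo in `.env` shows up at startup, not only when the first alert is due. This is a hypothetical sketch, not the code in `src/alerting.py`. It checks that the three Matrix variables are set and that the room setting is a room ID starting with `!` rather than an alias starting with `#`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class MatrixConfig:
    homeserver: str
    access_token: str
    room_id: str


def load_matrix_config(env: Mapping[str, str] = os.environ) -> MatrixConfig:
    """Read the Matrix variables, failing fast on missing or malformed values."""
    required = ("MATRIX_HOMESERVER", "MATRIX_ACCESS_TOKEN", "MATRIX_ROOM_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    room_id = env["MATRIX_ROOM_ID"]
    if not room_id.startswith("!"):
        raise ValueError("MATRIX_ROOM_ID must be a room ID starting with '!', not an alias")
    return MatrixConfig(env["MATRIX_HOMESERVER"].rstrip("/"),
                        env["MATRIX_ACCESS_TOKEN"], room_id)
```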
## Example Alert Message
Your Matrix messages will appear as:
```
🌊 **PING RIVER WATER ALERT**
**Alert:** High Water Level
**Severity:** CRITICAL
**Station:** P.1 (สถานีเชียงใหม่)
**Status:** FIRING
**Water Level:** 6.75m
**Threshold:** 6.0m
**Time:** 2025-09-26 14:30:00
**Discharge:** 450.2 cms
📈 **Dashboard:** http://grafana:3000
📍 **Location:** Northern Thailand Ping River
```
## Security Notes
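- Store Matrix tokens securely (environment variables, never committed files)
- Use a dedicated alerting account that has joined only the alert rooms. Matrix access tokens carry the full permissions of their account and cannot be scoped to a single room.
- Limit message volume with grouping and repeat intervals. Homeservers also rate-limit senders and reject bursts with `429 M_LIMIT_EXCEEDED`.
- Regularly rotate access tokens, and log out the old session so its token stops working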
This setup provides comprehensive water level monitoring with immediate Matrix notifications when thresholds are exceeded.