Add comprehensive Matrix alerting system with Grafana integration

- Implement custom Python alerting system (src/alerting.py) with water level monitoring, data freshness checks, and Matrix notifications
- Add complete Grafana Matrix alerting setup guide (docs/GRAFANA_MATRIX_SETUP.md) with webhook configuration, alert rules, and notification policies
- Create Matrix quick start guide (docs/MATRIX_QUICK_START.md) for rapid deployment
- Integrate alerting commands into main application (--alert-check, --alert-test)
- Add Matrix configuration to environment variables (.env.example)
- Update Makefile with alerting targets (alert-check, alert-test)
- Enhance status command to show Matrix notification status
- Support station-specific water level thresholds and escalation rules
- Provide dual alerting approach: native Grafana alerts and custom Python system

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
168
docs/GRAFANA_MATRIX_ALERTING.md
Normal file
@@ -0,0 +1,168 @@
# Grafana Matrix Alerting Setup

## Overview
Configure Grafana to send water level alerts directly to Matrix channels when thresholds are exceeded.

## Prerequisites
- Grafana instance with your PostgreSQL data source
- Matrix account and access token
- Matrix room for alerts

## Step 1: Configure Matrix Contact Point

1. **In Grafana, go to Alerting → Contact Points**
2. **Add new contact point:**
   ```
   Name: matrix-water-alerts
   Integration: Webhook
   URL: https://matrix.org/_matrix/client/v3/rooms/!ROOM_ID:matrix.org/send/m.room.message
   HTTP Method: POST
   ```
   The Matrix client-server specification defines this endpoint as `PUT` with a trailing transaction ID. Synapse-based homeservers such as matrix.org also accept the `POST` form shown here; other homeservers may not.

3. **Add Headers:**
   ```
   Authorization: Bearer YOUR_MATRIX_ACCESS_TOKEN
   Content-Type: application/json
   ```

4. **Message Template:**
   ```json
   {
     "msgtype": "m.text",
     "body": "🌊 WATER ALERT: {{ .CommonLabels.alertname }}\n\nStation: {{ .CommonLabels.station_code }}\nLevel: {{ .CommonAnnotations.water_level }}m\nStatus: {{ .CommonLabels.severity }}\n\nTime: {{ .CommonAnnotations.time }}"
   }
   ```
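
To check the homeserver URL, room ID, and token independently of Grafana, it can help to build the same request by hand. The sketch below only constructs the request object and does not send it. It uses the spec-defined `PUT` form with a transaction ID, and the helper name `build_send_request` is an illustrative assumption, not part of this repository.

```python
import json
import urllib.parse
import urllib.request
import uuid


def build_send_request(homeserver, room_id, access_token, text, txn_id=None):
    """Build (but do not send) a Matrix m.room.message request.

    Hypothetical helper for manual testing; names are assumptions.
    """
    # A fresh transaction ID lets the homeserver de-duplicate retries.
    txn_id = txn_id or uuid.uuid4().hex
    url = "{}/_matrix/client/v3/rooms/{}/send/m.room.message/{}".format(
        homeserver.rstrip("/"),
        # Room IDs contain "!" and ":", so percent-encode them.
        urllib.parse.quote(room_id, safe=""),
        urllib.parse.quote(txn_id, safe=""),
    )
    payload = json.dumps({"msgtype": "m.text", "body": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would actually post the message, so use a test room when trying it.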

## Step 2: Create Alert Rules

### High Water Level Alert
```yaml
Rule Name: high-water-level
Query: water_level > 6.0
Condition: IS ABOVE 6.0 FOR 5m
Labels:
  - severity: critical
  - station_code: "{{ $labels.station_code }}"
Annotations:
  # $values is indexed by RefID; "A" is an assumption about this rule
  - water_level: "{{ $values.A.Value }}"
  - summary: "Critical water level at {{ $labels.station_code }}"
```

### Low Water Level Alert
```yaml
Rule Name: low-water-level
Query: water_level < 1.0
Condition: IS BELOW 1.0 FOR 10m
Labels:
  - severity: warning
  - station_code: "{{ $labels.station_code }}"
```

### Data Gap Alert
```yaml
Rule Name: data-gap
Query: increase(measurements_total[1h]) == 0
Condition: IS EQUAL TO 0 FOR 30m
Labels:
  - severity: warning
  - issue: data-gap
```
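
The `FOR 5m` part of a condition means the threshold must stay breached for the whole duration before the alert fires. The sketch below approximates that idea over a list of timestamped readings. It is a simplification, not Grafana's evaluation loop, and the function name `breached_for` is hypothetical.

```python
from datetime import datetime, timedelta


def breached_for(samples, threshold, hold):
    """Return True if the newest readings have exceeded `threshold`
    continuously for at least `hold`.

    samples: list of (datetime, float), sorted oldest first.
    """
    if not samples:
        return False
    newest_time, newest_value = samples[-1]
    if newest_value <= threshold:
        return False
    # Walk backwards to find where the current breach started.
    breach_start = newest_time
    for ts, value in reversed(samples):
        if value <= threshold:
            break
        breach_start = ts
    return newest_time - breach_start >= hold
```

For example, readings of 6.1, 6.2, and 6.3 m spread over five minutes satisfy `breached_for(samples, 6.0, timedelta(minutes=5))`, while a single reading above 6.0 m does not.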

## Step 3: Matrix Setup

### Get Matrix Access Token
```bash
curl -X POST https://matrix.org/_matrix/client/v3/login \
  -H "Content-Type: application/json" \
  -d '{
    "type": "m.login.password",
    "identifier": {"type": "m.id.user", "user": "your_username"},
    "password": "your_password"
  }'
```

### Create Alert Room
```bash
curl -X POST "https://matrix.org/_matrix/client/v3/createRoom" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Water Level Alerts - Northern Thailand",
    "topic": "Automated alerts for Ping River water monitoring",
    "preset": "trusted_private_chat"
  }'
```
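
The login response carries the token in its `access_token` field, and the room creation response carries the new room in its `room_id` field. As a rough sketch, the two values can be turned into the contact point settings from Step 1 like this. The helper names and sample values are made up for illustration.

```python
import json
import urllib.parse


def bearer_header(login_response_body):
    """Extract access_token from a /login response body (JSON text)."""
    token = json.loads(login_response_body)["access_token"]
    return {"Authorization": "Bearer " + token}


def webhook_url(homeserver, create_room_response_body):
    """Build the Step 1 webhook URL from a /createRoom response body."""
    room_id = json.loads(create_room_response_body)["room_id"]
    return "{}/_matrix/client/v3/rooms/{}/send/m.room.message".format(
        homeserver.rstrip("/"),
        # Percent-encode "!" and ":" in the room ID.
        urllib.parse.quote(room_id, safe=""),
    )


if __name__ == "__main__":
    # Hypothetical response bodies, shortened to the relevant fields.
    print(bearer_header('{"access_token": "syt_example"}'))
    print(webhook_url("https://matrix.org", '{"room_id": "!abc123:matrix.org"}'))
```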

## Example Alert Queries

These queries use PromQL and assume the measurements are exposed as Prometheus metrics. With the PostgreSQL data source from the prerequisites, express the same conditions in SQL.

### Critical Water Levels
```promql
# High water alert
water_level{station_code=~"P.1|P.4A|P.20"} > 6.0

# Dangerous discharge
discharge{station_code=~".*"} > 500

# Rapid level change (delta() because water_level is a gauge, not a counter)
delta(water_level[15m]) > 0.5
```

### System Health
```promql
# No data received
up{job="water-monitor"} == 0

# Old data
(time() - timestamp) > 7200
```
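
The rapid-change and old-data conditions can also be checked in plain Python, for instance in a custom alerting script. The sketch below mirrors the 15-minute change and 7200-second staleness limits from the queries above. The function names are hypothetical and the reading format (a list of timestamp and level pairs) is an assumption.

```python
from datetime import datetime, timedelta


def level_change(samples, window):
    """Change in level between the oldest and newest reading inside
    `window`, measured back from the newest reading.

    samples: list of (datetime, float), sorted oldest first.
    """
    if not samples:
        return 0.0
    newest_time, newest_value = samples[-1]
    in_window = [value for ts, value in samples if newest_time - ts <= window]
    return newest_value - in_window[0]


def is_stale(last_seen, now, max_age=timedelta(seconds=7200)):
    """True if the latest reading is older than `max_age`."""
    return now - last_seen > max_age
```

A rise of more than 0.5 m from `level_change(samples, timedelta(minutes=15))` corresponds to the rapid level change alert.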

## Alert Notification Format

Matrix messages follow this general shape. The discharge, trend, and location lines require additional annotations beyond the Step 1 message template:
```
🌊 WATER ALERT: High Water Level

Station: P.1 (Chiang Mai)
Level: 6.2m (CRITICAL)
Discharge: 450 cms
Status: DANGER

Time: 2025-09-26 14:30:00
Trend: Rising (+0.3m in 30min)

📍 Location: 18.7883°N, 98.9853°E
```
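
If the message is assembled outside Grafana, for example in a small relay between the webhook and Matrix, the same text can be produced from the alert's common labels and annotations. The sketch below assumes a payload with `commonLabels` and `commonAnnotations` keys and reuses the label and annotation names from the Step 1 template. The function name is hypothetical.

```python
def matrix_body_from_grafana(payload):
    """Turn a webhook-style alert payload into a Matrix m.text body.

    Assumes `commonLabels` and `commonAnnotations` dictionaries; missing
    fields fall back to "unknown" rather than raising.
    """
    labels = payload.get("commonLabels", {})
    annotations = payload.get("commonAnnotations", {})
    text = (
        "🌊 WATER ALERT: {alertname}\n\n"
        "Station: {station}\n"
        "Level: {level}m\n"
        "Status: {severity}\n\n"
        "Time: {time}"
    ).format(
        alertname=labels.get("alertname", "unknown"),
        station=labels.get("station_code", "unknown"),
        level=annotations.get("water_level", "unknown"),
        severity=labels.get("severity", "unknown"),
        time=annotations.get("time", "unknown"),
    )
    return {"msgtype": "m.text", "body": text}
```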

## Advanced Features

### Escalation Rules
```yaml
# Send to different rooms based on severity
# Pseudocode: in Grafana, express these as notification policy label
# matchers and mute timings
- if: severity == "critical"
  receiver: matrix-emergency
- if: severity == "warning"
  receiver: matrix-alerts
- if: time_of_day() outside "08:00-20:00"
  receiver: matrix-night-duty
```
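
The same routing can be sketched in Python for a custom alerting script. This version assumes first-match semantics in the order the rules are listed, treats 08:00-20:00 as including 08:00 and excluding 20:00, and returns `None` when no rule applies. These are all interpretation choices rather than behavior stated above, and `pick_receiver` is a hypothetical name.

```python
from datetime import time


def pick_receiver(severity, now):
    """Pick a receiver name for an alert.

    severity: label value such as "critical" or "warning".
    now: datetime.time of day when the alert fires.
    """
    if severity == "critical":
        return "matrix-emergency"
    if severity == "warning":
        return "matrix-alerts"
    # Outside daytime hours, anything else goes to the night-duty room.
    if not (time(8, 0) <= now < time(20, 0)):
        return "matrix-night-duty"
    return None
```

With first-match ordering, a critical alert at 03:00 still goes to `matrix-emergency`; reorder the checks if night-time routing should take priority.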

### Rate Limiting
```yaml
group_wait: 5m
group_interval: 10m
repeat_interval: 30m
```
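
To illustrate what `repeat_interval` does, the sketch below suppresses repeat notifications for the same alert group until 30 minutes have passed. It models only the repeat interval, not `group_wait` or `group_interval`, and the class name `RepeatLimiter` is hypothetical.

```python
from datetime import datetime, timedelta


class RepeatLimiter:
    """Allow one notification per group every `repeat_interval`."""

    def __init__(self, repeat_interval=timedelta(minutes=30)):
        self.repeat_interval = repeat_interval
        self._last_sent = {}  # group key -> time of last notification

    def should_send(self, group_key, now):
        last = self._last_sent.get(group_key)
        if last is not None and now - last < self.repeat_interval:
            return False  # still inside the quiet period
        self._last_sent[group_key] = now
        return True
```

Keying the limiter by station code keeps a noisy station from silencing alerts for the others.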

## Testing Alerts

1. **Test Contact Point** - Use Grafana's test button
2. **Simulate Alert** - Manually trigger with test data
3. **Verify Matrix** - Check message formatting and delivery

## Troubleshooting

### Common Issues
- **403 Forbidden**: Check the Matrix access token, and that the account has joined the room and may post in it
- **Room not found**: Verify room ID format
- **No alerts**: Check query syntax and thresholds
- **Spam**: Configure proper grouping and intervals
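
Matrix error responses are JSON objects with an `errcode` field, which is usually more telling than the HTTP status alone. The sketch below maps a few standard error codes to hints matching the issues above. The hint wording and the function name `explain_matrix_error` are illustrative assumptions.

```python
import json

# Standard Matrix error codes mapped to troubleshooting hints.
HINTS = {
    "M_UNKNOWN_TOKEN": "Access token is invalid or expired; log in again.",
    "M_MISSING_TOKEN": "No access token was sent; check the Authorization header.",
    "M_FORBIDDEN": "The account may not post here; check that it joined the room.",
    "M_NOT_FOUND": "Resource not found; verify the room ID and URL path.",
    "M_LIMIT_EXCEEDED": "Rate limited; increase grouping and repeat intervals.",
}


def explain_matrix_error(response_body):
    """Return a hint for a Matrix error response body (JSON text)."""
    try:
        errcode = json.loads(response_body).get("errcode", "")
    except (ValueError, AttributeError):
        return "Response is not a Matrix error object; check the homeserver URL."
    return HINTS.get(errcode, "Unrecognized errcode {!r}.".format(errcode))
```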