Add comprehensive Matrix alerting system with Grafana integration

- Implement custom Python alerting system (src/alerting.py) with water level monitoring, data freshness checks, and Matrix notifications
- Add complete Grafana Matrix alerting setup guide (docs/GRAFANA_MATRIX_SETUP.md) with webhook configuration, alert rules, and notification policies
- Create Matrix quick start guide (docs/MATRIX_QUICK_START.md) for rapid deployment
- Integrate alerting commands into main application (--alert-check, --alert-test)
- Add Matrix configuration to environment variables (.env.example)
- Update Makefile with alerting targets (alert-check, alert-test)
- Enhance status command to show Matrix notification status
- Support station-specific water level thresholds and escalation rules
- Provide dual alerting approach: native Grafana alerts and custom Python system

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-26 16:18:02 +07:00
parent 6c7c128b4d
commit ca730e484b
7 changed files with 1062 additions and 9 deletions


@@ -0,0 +1,168 @@
# Grafana Matrix Alerting Setup
## Overview
Configure Grafana to send water level alerts directly to Matrix rooms when thresholds are exceeded.
## Prerequisites
- Grafana instance with your PostgreSQL data source
- Matrix account and access token
- Matrix room for alerts
## Step 1: Configure Matrix Contact Point
1. **In Grafana, go to Alerting → Contact Points**
2. **Add new contact point:**
```
Name: matrix-water-alerts
Integration: Webhook
URL: https://matrix.org/_matrix/client/v3/rooms/!ROOM_ID:matrix.org/send/m.room.message
HTTP Method: POST
```
3. **Add Headers:**
```
Authorization: Bearer YOUR_MATRIX_ACCESS_TOKEN
Content-Type: application/json
```
4. **Message Template:**
```json
{
"msgtype": "m.text",
"body": "🌊 WATER ALERT: {{ .CommonLabels.alertname }}\n\nStation: {{ .CommonLabels.station_code }}\nLevel: {{ .CommonAnnotations.water_level }}m\nStatus: {{ .CommonLabels.severity }}\n\nTime: {{ .CommonAnnotations.time }}"
}
```
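For comparison, the same message can be sent from a script. The Matrix client-server specification defines message sending as a `PUT` to `/send/{eventType}/{txnId}`, where the transaction ID must be unique per message. The sketch below only builds such a request with the standard library; the function name `build_matrix_request` and the placeholder values are illustrative assumptions, not code from this project.

```python
import json
import urllib.parse
import urllib.request
import uuid


def build_matrix_request(homeserver, room_id, access_token, body, txn_id=None):
    """Build (but do not send) a PUT request for an m.room.message event."""
    # A fresh transaction ID per message prevents the server from
    # treating a new alert as a retry of an earlier one.
    txn_id = txn_id or uuid.uuid4().hex
    # Room IDs contain "!" and ":", so percent-encode them for the URL path.
    room = urllib.parse.quote(room_id, safe="")
    url = (
        f"{homeserver.rstrip('/')}/_matrix/client/v3/rooms/{room}"
        f"/send/m.room.message/{txn_id}"
    )
    payload = json.dumps({"msgtype": "m.text", "body": body}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
```

To actually send it, pass the result to `urllib.request.urlopen(req, timeout=10)`.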
## Step 2: Create Alert Rules
### High Water Level Alert
```yaml
Rule Name: high-water-level
Query: water_level > 6.0
Condition: IS ABOVE 6.0 FOR 5m
Labels:
- severity: critical
- station_code: {{ $labels.station_code }}
Annotations:
# $values.A assumes the query's RefID is A
- water_level: {{ $values.A }}
- summary: "Critical water level at {{ $labels.station_code }}"
```
### Low Water Level Alert
```yaml
Rule Name: low-water-level
Query: water_level < 1.0
Condition: IS BELOW 1.0 FOR 10m
Labels:
- severity: warning
- station_code: {{ $labels.station_code }}
```
### Data Gap Alert
```yaml
Rule Name: data-gap
Query: increase(measurements_total[1h]) == 0
Condition: IS EQUAL TO 0 FOR 30m
Labels:
- severity: warning
- issue: data-gap
```
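The `FOR 5m` part of a rule means the threshold must stay breached for the whole period before the alert fires. A minimal sketch of that idea in Python, assuming readings arrive as time-sorted `(timestamp, level)` tuples; the function name `breached_for` is hypothetical and this is a simplification of how Grafana evaluates pending alerts:

```python
from datetime import datetime, timedelta


def breached_for(samples, threshold, hold, now):
    """Return True if the level has stayed above `threshold` for at least `hold`.

    `samples` is a list of (timestamp, level) tuples sorted oldest first.
    """
    breach_start = None
    # Walk back from the newest reading until one is at or below the threshold.
    for ts, level in reversed(samples):
        if level <= threshold:
            break
        breach_start = ts
    return breach_start is not None and now - breach_start >= hold


now = datetime(2025, 9, 26, 14, 30)
readings = [
    (now - timedelta(minutes=10), 5.8),
    (now - timedelta(minutes=6), 6.1),
    (now - timedelta(minutes=3), 6.2),
    (now, 6.3),
]
fires = breached_for(readings, 6.0, timedelta(minutes=5), now)
```

Here the level has been above 6.0 for six minutes, so a five-minute hold is satisfied while a ten-minute hold would not be.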
## Step 3: Matrix Setup
### Get Matrix Access Token
```bash
curl -X POST https://matrix.org/_matrix/client/v3/login \
-H "Content-Type: application/json" \
-d '{
"type": "m.login.password",
"identifier": {"type": "m.id.user", "user": "your_username"},
"password": "your_password"
}'
```
### Create Alert Room
```bash
curl -X POST "https://matrix.org/_matrix/client/v3/createRoom" \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "Water Level Alerts - Northern Thailand",
"topic": "Automated alerts for Ping River water monitoring",
"preset": "trusted_private_chat"
}'
```
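Both calls return JSON: a successful login contains `access_token`, a successful `createRoom` contains `room_id`, and failures carry `errcode` and `error`. A small sketch for pulling out the field you need, assuming the response body was captured as a string (the helper name `parse_matrix_response` is hypothetical):

```python
import json


def parse_matrix_response(raw, field):
    """Return `field` from a Matrix JSON response, or raise on an error reply."""
    data = json.loads(raw)
    if "errcode" in data:
        # Matrix error replies look like {"errcode": "...", "error": "..."}.
        raise ValueError(f"{data['errcode']}: {data.get('error', '')}")
    return data[field]
```

For example, `parse_matrix_response(login_body, "access_token")` yields the token to paste into the contact point headers, and `parse_matrix_response(room_body, "room_id")` yields the room ID for the webhook URL.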
## Example Alert Queries
### Critical Water Levels
```promql
# High water alert (dots escaped so "." matches literally)
water_level{station_code=~"P\\.1|P\\.4A|P\\.20"} > 6.0
# Dangerous discharge
discharge > 500
# Rapid level change (delta() suits gauges; increase() is for counters)
delta(water_level[15m]) > 0.5
```
### System Health
```promql
# No data received
up{job="water-monitor"} == 0
# Old data (timestamp() needs a series argument)
(time() - timestamp(water_level)) > 7200
```
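The rapid-change and stale-data checks can also be expressed in plain Python, which may help when reasoning about thresholds. This is a sketch under the assumption that readings are time-sorted `(timestamp, level)` tuples; `level_change` and `is_stale` are hypothetical names.

```python
from datetime import datetime, timedelta


def level_change(samples, window, now):
    """Change in level between the oldest and newest reading inside the window."""
    recent = [(ts, lv) for ts, lv in samples if now - ts <= window]
    if len(recent) < 2:
        return 0.0  # not enough data to measure a change
    return recent[-1][1] - recent[0][1]


def is_stale(last_timestamp, now, max_age=timedelta(seconds=7200)):
    """True when the newest reading is older than `max_age` (2 hours by default)."""
    return now - last_timestamp > max_age


now = datetime(2025, 9, 26, 14, 30)
readings = [
    (now - timedelta(minutes=20), 5.0),
    (now - timedelta(minutes=15), 5.2),
    (now - timedelta(minutes=5), 5.6),
    (now, 5.8),
]
rising_fast = level_change(readings, timedelta(minutes=15), now) > 0.5
```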
## Alert Notification Format
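A fully populated alert might look like the following. The message template in Step 1 emits only the alert name, station, level, status, and time; the other lines need extra labels or annotations: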
```
🌊 WATER ALERT: High Water Level
Station: P.1 (Chiang Mai)
Level: 6.2m (CRITICAL)
Discharge: 450 cms
Status: DANGER
Time: 2025-09-26 14:30:00
Trend: Rising (+0.3m in 30min)
📍 Location: 18.7883°N, 98.9853°E
```
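A message in this layout could be assembled in Python along the following lines. The `WaterAlert` dataclass and `format_alert` function are hypothetical names for illustration, and the hard-coded N/E hemispheres assume stations in northern Thailand.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class WaterAlert:
    title: str
    station_code: str
    station_name: str
    level_m: float
    severity: str
    discharge_cms: float
    status: str
    time: datetime
    trend: str
    lat: float
    lon: float


def format_alert(a):
    """Render an alert as the plain-text body of an m.text message."""
    return "\n".join([
        f"🌊 WATER ALERT: {a.title}",
        f"Station: {a.station_code} ({a.station_name})",
        f"Level: {a.level_m:.1f}m ({a.severity.upper()})",
        f"Discharge: {a.discharge_cms:.0f} cms",
        f"Status: {a.status}",
        f"Time: {a.time:%Y-%m-%d %H:%M:%S}",
        f"Trend: {a.trend}",
        # Hemispheres are fixed to N/E (assumption: northern Thailand).
        f"📍 Location: {a.lat:.4f}°N, {a.lon:.4f}°E",
    ])


message = format_alert(WaterAlert(
    "High Water Level", "P.1", "Chiang Mai", 6.2, "critical", 450,
    "DANGER", datetime(2025, 9, 26, 14, 30), "Rising (+0.3m in 30min)",
    18.7883, 98.9853,
))
```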
## Advanced Features
### Escalation Rules
```yaml
# Send to different rooms based on severity
- if: severity == "critical"
receiver: matrix-emergency
- if: severity == "warning"
receiver: matrix-alerts
- if: time_of_day() outside "08:00-20:00"
receiver: matrix-night-duty
```
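The rules above are pseudo-configuration. Read as "first match wins", they amount to the following Python sketch; the function name `pick_receiver` and the fallback receiver are assumptions, since the rules do not say what happens when nothing matches.

```python
from datetime import time

DEFAULT_RECEIVER = "matrix-alerts"  # assumed fallback; not stated in the rules


def pick_receiver(severity, local_time, day_start=time(8, 0), day_end=time(20, 0)):
    """Apply the escalation rules in order and return the first matching receiver."""
    if severity == "critical":
        return "matrix-emergency"
    if severity == "warning":
        return "matrix-alerts"
    # Anything else outside 08:00-20:00 goes to the night-duty room.
    if not (day_start <= local_time < day_end):
        return "matrix-night-duty"
    return DEFAULT_RECEIVER
```

With this ordering, a critical alert at 03:00 still goes to `matrix-emergency`; reorder the checks if night-time routing should take priority.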
### Rate Limiting
```yaml
group_wait: 5m
group_interval: 10m
repeat_interval: 30m
```
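In these settings, `group_wait` delays the first notification for a new alert group, `group_interval` spaces out updates when alerts join an existing group, and `repeat_interval` controls how often a still-firing alert is re-sent. The sketch below illustrates only the `repeat_interval` idea for a custom sender; the class name `RepeatThrottle` is hypothetical.

```python
from datetime import datetime, timedelta


class RepeatThrottle:
    """Suppress repeats of the same alert key within `repeat_interval`."""

    def __init__(self, repeat_interval):
        self.repeat_interval = repeat_interval
        self._last_sent = {}

    def should_send(self, key, now):
        last = self._last_sent.get(key)
        if last is not None and now - last < self.repeat_interval:
            return False  # sent too recently; skip this one
        self._last_sent[key] = now
        return True


throttle = RepeatThrottle(timedelta(minutes=30))
start = datetime(2025, 9, 26, 14, 30)
first = throttle.should_send("high-water-level:P.1", start)
repeat = throttle.should_send("high-water-level:P.1", start + timedelta(minutes=10))
later = throttle.should_send("high-water-level:P.1", start + timedelta(minutes=31))
```

Keying on rule name plus station keeps one noisy station from silencing alerts for the others.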
## Testing Alerts
1. **Test Contact Point** - Use Grafana's test button
2. **Simulate Alert** - Manually trigger with test data
3. **Verify Matrix** - Check message formatting and delivery
## Troubleshooting
### Common Issues
- **401 Unauthorized / 403 Forbidden**: Check the Matrix access token (401), and that the account has joined the room and may send messages (403)
- **Room not found**: Verify room ID format
- **No alerts**: Check query syntax and thresholds
- **Spam**: Configure proper grouping and intervals
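
Matrix error responses include a standard `errcode` field, which narrows down most of the issues above. A minimal sketch that maps a response body to a hint; the function name `diagnose` and the hint wording are illustrative assumptions:

```python
import json

# Hints keyed by standard Matrix error codes.
HINTS = {
    "M_MISSING_TOKEN": "No access token was sent; check the Authorization header.",
    "M_UNKNOWN_TOKEN": "Token is invalid or expired; log in again for a new one.",
    "M_FORBIDDEN": "The account may not post here; check it has joined the room.",
    "M_NOT_FOUND": "Nothing at this URL; check the room ID and the endpoint path.",
    "M_LIMIT_EXCEEDED": "Rate limited; slow down and widen the alert intervals.",
}


def diagnose(raw_body):
    """Return a troubleshooting hint for a Matrix error response body."""
    try:
        errcode = json.loads(raw_body).get("errcode", "")
    except (ValueError, AttributeError):
        # Not JSON, or not a JSON object: probably not a Matrix endpoint.
        return "Response is not a Matrix JSON error; check the homeserver URL."
    return HINTS.get(errcode, f"Unrecognised error code: {errcode or 'none'}")
```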