Add comprehensive Matrix alerting system with Grafana integration

- Implement custom Python alerting system (src/alerting.py) with water level monitoring, data freshness checks, and Matrix notifications
- Add complete Grafana Matrix alerting setup guide (docs/GRAFANA_MATRIX_SETUP.md) with webhook configuration, alert rules, and notification policies
- Create Matrix quick start guide (docs/MATRIX_QUICK_START.md) for rapid deployment
- Integrate alerting commands into main application (--alert-check, --alert-test)
- Add Matrix configuration to environment variables (.env.example)
- Update Makefile with alerting targets (alert-check, alert-test)
- Enhance status command to show Matrix notification status
- Support station-specific water level thresholds and escalation rules
- Provide dual alerting approach: native Grafana alerts and custom Python system

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-26 16:18:02 +07:00
parent 6c7c128b4d
commit ca730e484b
7 changed files with 1062 additions and 9 deletions


@@ -0,0 +1,168 @@
# Grafana Matrix Alerting Setup
## Overview
Configure Grafana to send water level alerts directly to Matrix rooms when thresholds are exceeded.
## Prerequisites
- Grafana instance with your PostgreSQL data source
- Matrix account and access token
- Matrix room for alerts
## Step 1: Configure Matrix Contact Point
1. **In Grafana, go to Alerting → Contact Points**
2. **Add new contact point:**
```
Name: matrix-water-alerts
Integration: Webhook
URL: https://matrix.org/_matrix/client/v3/rooms/!ROOM_ID:matrix.org/send/m.room.message
HTTP Method: POST
```
3. **Add Headers:**
```
Authorization: Bearer YOUR_MATRIX_ACCESS_TOKEN
Content-Type: application/json
```
4. **Message Template:**
```json
{
"msgtype": "m.text",
"body": "🌊 WATER ALERT: {{ .CommonLabels.alertname }}\n\nStation: {{ .CommonLabels.station_code }}\nLevel: {{ .CommonAnnotations.water_level }}m\nStatus: {{ .CommonLabels.severity }}\n\nTime: {{ .CommonAnnotations.time }}"
}
```
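The same message can be sent from Python, which is handy for checking the token and room ID outside Grafana. This is a minimal sketch, not the code in `src/alerting.py`: `build_send_request` is a hypothetical helper, and it uses the `PUT .../send/m.room.message/{txnId}` form of the Matrix client-server API with a random transaction ID.
```python
import json
import urllib.parse
import urllib.request
import uuid


def build_send_request(homeserver, room_id, token, body, txn_id=None):
    """Build (but do not send) a PUT request for an m.room.message event."""
    # A unique transaction ID lets the homeserver de-duplicate retries.
    txn_id = txn_id or uuid.uuid4().hex
    url = "{}/_matrix/client/v3/rooms/{}/send/m.room.message/{}".format(
        homeserver.rstrip("/"),
        urllib.parse.quote(room_id, safe=""),  # "!" and ":" are percent-encoded
        urllib.parse.quote(txn_id, safe=""),
    )
    payload = json.dumps({"msgtype": "m.text", "body": body}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
    )

# To actually send (placeholder values, requires network access):
# req = build_send_request("https://matrix.org", "!ROOM_ID:matrix.org",
#                          "YOUR_MATRIX_ACCESS_TOKEN", "test message")
# urllib.request.urlopen(req, timeout=10)
```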
## Step 2: Create Alert Rules
### High Water Level Alert
```yaml
Rule Name: high-water-level
Query: water_level > 6.0
Condition: IS ABOVE 6.0 FOR 5m
Labels:
- severity: critical
- station_code: {{ .station_code }}
Annotations:
- water_level: {{ .water_level }}
- summary: "Critical water level at {{ .station_code }}"
```
### Low Water Level Alert
```yaml
Rule Name: low-water-level
Query: water_level < 1.0
Condition: IS BELOW 1.0 FOR 10m
Labels:
- severity: warning
- station_code: {{ .station_code }}
```
### Data Gap Alert
```yaml
Rule Name: data-gap
Query: increase(measurements_total[1h]) == 0
Condition: IS EQUAL TO 0 FOR 30m
Labels:
- severity: warning
- issue: data-gap
```
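The `FOR 5m` and `FOR 10m` clauses in the rules above mean a threshold has to stay breached for the whole period before an alert fires. A rough Python sketch of that idea follows, assuming samples arrive as `(timestamp, water_level)` tuples. `breached_for` is a hypothetical helper and a simplification, not Grafana's evaluator.
```python
from datetime import datetime, timedelta


def breached_for(samples, threshold, hold, now, above=True):
    """Return True if every sample within the last `hold` period breaches the threshold.

    samples: list of (timestamp, water_level) tuples, in any order.
    """
    window = [level for ts, level in samples if now - hold <= ts <= now]
    if not window:
        # An empty window is a data-gap problem, not a level breach.
        return False
    if above:
        return all(level > threshold for level in window)
    return all(level < threshold for level in window)
```
For the high-water rule this would be called as `breached_for(samples, 6.0, timedelta(minutes=5), now)`, and for the low-water rule with `above=False` and a 10-minute hold.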
## Step 3: Matrix Setup
### Get Matrix Access Token
```bash
curl -X POST https://matrix.org/_matrix/client/v3/login \
-H "Content-Type: application/json" \
-d '{
"type": "m.login.password",
"user": "your_username",
"password": "your_password"
}'
```
### Create Alert Room
```bash
curl -X POST "https://matrix.org/_matrix/client/v3/createRoom" \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "Water Level Alerts - Northern Thailand",
"topic": "Automated alerts for Ping River water monitoring",
"preset": "trusted_private_chat"
}'
```
## Example Alert Queries
### Critical Water Levels
```promql
# High water alert
water_level{station_code=~"P.1|P.4A|P.20"} > 6.0
# Dangerous discharge
discharge{station_code=~".*"} > 500
# Rapid level change (water_level is a gauge, so use delta() rather than increase())
delta(water_level[15m]) > 0.5
```
### System Health
```promql
# No data received
up{job="water-monitor"} == 0
# Old data (timestamp() needs a series argument; water_level is assumed here)
(time() - timestamp(water_level)) > 7200
```
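The "old data" check above flags anything older than 7200 seconds (2 hours). The same freshness test can be sketched in Python, assuming you already have each station's latest measurement time in a dict. `stale_stations` is a hypothetical name and the dict shape is an assumption.
```python
from datetime import datetime, timedelta


def stale_stations(last_seen, now, max_age=timedelta(hours=2)):
    """Return station codes whose latest measurement is older than max_age, sorted."""
    return sorted(code for code, ts in last_seen.items() if now - ts > max_age)
```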
## Alert Notification Format
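The Step 1 template produces the alert name, station, level, status, and time. With extra annotations for discharge, trend, and location added to the rules and template, your Matrix messages could look like: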
```
🌊 WATER ALERT: High Water Level
Station: P.1 (Chiang Mai)
Level: 6.2m (CRITICAL)
Discharge: 450 cms
Status: DANGER
Time: 2025-09-26 14:30:00
Trend: Rising (+0.3m in 30min)
📍 Location: 18.7883°N, 98.9853°E
```
## Advanced Features
### Escalation Rules
```yaml
# Send to different rooms based on severity
- if: severity == "critical"
receiver: matrix-emergency
- if: severity == "warning"
receiver: matrix-alerts
- if: time_of_day() outside "08:00-20:00"
receiver: matrix-night-duty
```
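The escalation rules above are pseudocode. One way to read them is sketched below in Python, under the assumption that every matching rule applies (so a night-time warning goes to two rooms). `pick_receivers` is a hypothetical helper, and the receiver names are taken from the rules.
```python
from datetime import time


def pick_receivers(severity, at, day_start=time(8, 0), day_end=time(20, 0)):
    """Map a severity and a time of day to the list of receivers to notify."""
    receivers = []
    if severity == "critical":
        receivers.append("matrix-emergency")
    elif severity == "warning":
        receivers.append("matrix-alerts")
    # Outside 08:00-20:00 the night-duty room is notified as well.
    if not (day_start <= at < day_end):
        receivers.append("matrix-night-duty")
    return receivers
```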
### Rate Limiting
```yaml
group_wait: 5m
group_interval: 10m
repeat_interval: 30m
```
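To illustrate what `repeat_interval` does, here is a small Python sketch that suppresses repeat notifications for the same alert until the interval has passed. It models only `repeat_interval`, not `group_wait` or `group_interval`, and `RepeatThrottle` is a hypothetical class name.
```python
from datetime import datetime, timedelta


class RepeatThrottle:
    """Allow one notification per key per repeat_interval."""

    def __init__(self, repeat_interval=timedelta(minutes=30)):
        self.repeat_interval = repeat_interval
        self._last_sent = {}  # key -> time of the last notification

    def should_send(self, key, now):
        last = self._last_sent.get(key)
        if last is not None and now - last < self.repeat_interval:
            return False  # still inside the quiet period
        self._last_sent[key] = now
        return True
```
A key such as `(alertname, station_code)` keeps one station's alerts from silencing another's.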
## Testing Alerts
1. **Test Contact Point** - Use Grafana's test button
2. **Simulate Alert** - Manually trigger with test data
3. **Verify Matrix** - Check message formatting and delivery
## Troubleshooting
### Common Issues
- **403 Forbidden**: Check that the token's user has joined the room and may post there (an invalid or expired token usually returns 401 instead)
- **Room not found**: Verify room ID format
- **No alerts**: Check query syntax and thresholds
- **Spam**: Configure proper grouping and intervals


@@ -0,0 +1,351 @@
# Complete Grafana Matrix Alerting Setup Guide
## Overview
Configure Grafana to send water level alerts directly to Matrix rooms when thresholds are exceeded.
## Prerequisites
- Grafana instance running (v8.0+)
- PostgreSQL data source configured in Grafana
- Matrix account
- Matrix room for alerts
## Step 1: Get Matrix Access Token
### Method 1: Using curl
```bash
curl -X POST https://matrix.org/_matrix/client/v3/login \
-H "Content-Type: application/json" \
-d '{
"type": "m.login.password",
"user": "your_username",
"password": "your_password"
}'
```
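The curl call can also be made from Python, which keeps the password out of shell history. This is a hedged sketch using only the standard library. `build_login_request` and `extract_access_token` are hypothetical helpers, and the request uses the `identifier` object that the Matrix login API accepts in place of the bare `user` field.
```python
import json
import urllib.request


def build_login_request(homeserver, username, password):
    """Build (but do not send) a password login request."""
    body = {
        "type": "m.login.password",
        "identifier": {"type": "m.id.user", "user": username},
        "password": password,
    }
    return urllib.request.Request(
        homeserver.rstrip("/") + "/_matrix/client/v3/login",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def extract_access_token(response_text):
    """Return access_token from the login response, or raise with the server's error."""
    data = json.loads(response_text)
    if "access_token" not in data:
        raise ValueError(data.get("error", "login failed"))
    return data["access_token"]

# To actually log in (requires network access):
# import getpass
# req = build_login_request("https://matrix.org", "your_username", getpass.getpass())
# with urllib.request.urlopen(req, timeout=10) as resp:
#     token = extract_access_token(resp.read().decode("utf-8"))
```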
### Method 2: Using Element Web Client
1. Open Element in browser: https://app.element.io
2. Login to your account
3. Go to Settings → Help & About → Advanced
4. Copy your Access Token
### Method 3: Using Matrix Admin Panel
- If you have admin access to your homeserver, generate a token via its admin API
## Step 2: Create Alert Room
```bash
curl -X POST "https://matrix.org/_matrix/client/v3/createRoom" \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "Water Level Alerts - Northern Thailand",
"topic": "Automated alerts for Ping River water monitoring",
"preset": "private_chat"
}'
```
Save the `room_id` from the response (format: `!roomid:homeserver.com`). The contact point URL and the `.env` file both need it.
## Step 3: Configure Grafana Contact Point
### Navigate to Alerting
1. In Grafana, go to **Alerting → Contact Points**
2. Click **Add contact point**
### Contact Point Settings
```
Name: matrix-water-alerts
Integration: Webhook
URL: https://matrix.org/_matrix/client/v3/rooms/!YOUR_ROOM_ID:matrix.org/send/m.room.message/{{ .GroupLabels.alertname }}_{{ .GroupLabels.severity }}_{{ now.Unix }}
HTTP Method: PUT
```
### Headers
```
Authorization: Bearer YOUR_MATRIX_ACCESS_TOKEN
Content-Type: application/json
```
### Message Template (JSON Body)
```json
{
"msgtype": "m.text",
"body": "🌊 **PING RIVER WATER ALERT**\n\n**Alert:** {{ .GroupLabels.alertname }}\n**Severity:** {{ .GroupLabels.severity | toUpper }}\n**Station:** {{ .GroupLabels.station_code }} ({{ .GroupLabels.station_name }})\n\n{{ range .Alerts }}**Status:** {{ .Status | toUpper }}\n**Water Level:** {{ .Annotations.water_level }}m\n**Threshold:** {{ .Annotations.threshold }}m\n**Time:** {{ .StartsAt.Format \"2006-01-02 15:04:05\" }}\n{{ if .Annotations.discharge }}**Discharge:** {{ .Annotations.discharge }} cms\n{{ end }}{{ if .Annotations.message }}**Details:** {{ .Annotations.message }}\n{{ end }}{{ end }}\n📈 **Dashboard:** {{ .ExternalURL }}\n📍 **Location:** Northern Thailand Ping River"
}
```
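For comparison, the same message body can be assembled in Python, for example in a small relay or in a custom alerting script. The sketch below is an illustration only: `format_alert_body` is a hypothetical function, and the payload keys (`groupLabels`, `alerts`, `status`, `annotations`) are assumed to follow the shape of Grafana's webhook payload.
```python
def format_alert_body(payload):
    """Render a plain-text Matrix message body from a webhook-style alert payload."""
    labels = payload.get("groupLabels", {})
    lines = [
        "🌊 **PING RIVER WATER ALERT**",
        "",
        "**Alert:** {}".format(labels.get("alertname", "unknown")),
        "**Severity:** {}".format(labels.get("severity", "unknown").upper()),
    ]
    for alert in payload.get("alerts", []):
        ann = alert.get("annotations", {})
        lines.append("**Status:** {}".format(alert.get("status", "").upper()))
        lines.append("**Water Level:** {}m".format(ann.get("water_level", "?")))
        lines.append("**Threshold:** {}m".format(ann.get("threshold", "?")))
        # Optional fields are only printed when the rule supplies them.
        if ann.get("discharge"):
            lines.append("**Discharge:** {} cms".format(ann["discharge"]))
        if ann.get("message"):
            lines.append("**Details:** {}".format(ann["message"]))
    return "\n".join(lines)
```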
## Step 4: Create Alert Rules
### High Water Level Alert
```yaml
# Rule Configuration
Rule Name: high-water-level
Evaluation Group: water-level-alerts
Folder: Water Monitoring
# Query A
SELECT
station_code,
station_name_th as station_name,
water_level,
discharge,
timestamp
FROM water_measurements
WHERE
timestamp > now() - interval '5 minutes'
AND water_level > 6.0
# Condition
IS ABOVE 6.0 FOR 5 minutes
# Labels
severity: critical
alertname: High Water Level
station_code: {{ $labels.station_code }}
station_name: {{ $labels.station_name }}
# Annotations
water_level: {{ $values.water_level }}
threshold: 6.0
discharge: {{ $values.discharge }}
summary: Critical water level detected at {{ $labels.station_code }}
```
### Emergency Water Level Alert
```yaml
Rule Name: emergency-water-level
Query: water_level > 8.0
Condition: IS ABOVE 8.0 FOR 2 minutes
Labels:
severity: emergency
alertname: Emergency Water Level
Annotations:
threshold: 8.0
message: IMMEDIATE ACTION REQUIRED - Flood risk imminent
```
### Low Water Level Alert
```yaml
Rule Name: low-water-level
Query: water_level < 1.0
Condition: IS BELOW 1.0 FOR 15 minutes
Labels:
severity: warning
alertname: Low Water Level
Annotations:
threshold: 1.0
message: Drought conditions detected
```
### Data Gap Alert
```yaml
Rule Name: data-gap
Query:
SELECT
station_code,
MAX(timestamp) as last_seen
FROM water_measurements
GROUP BY station_code
HAVING MAX(timestamp) < now() - interval '2 hours'
Condition: RETURNS AT LEAST ONE ROW FOR 30 minutes
Labels:
severity: warning
alertname: Data Gap
issue: missing-data
```
### Rapid Level Change Alert
```yaml
Rule Name: rapid-level-change
Query:
SELECT station_code, water_level, prev_level
FROM (
SELECT
station_code,
water_level,
LAG(water_level, 1) OVER (PARTITION BY station_code ORDER BY timestamp) AS prev_level
FROM water_measurements
WHERE timestamp > now() - interval '15 minutes'
) AS recent
WHERE ABS(water_level - prev_level) > 0.5
Condition: CHANGE > 0.5m FOR 1 minute
Labels:
severity: warning
alertname: Rapid Water Level Change
```
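The rapid-change rule compares each reading with the previous one from the same station. A minimal Python sketch of that comparison follows, assuming readings are `(station_code, timestamp, water_level)` tuples. `rapid_changes` is a hypothetical helper, not part of the committed code.
```python
def rapid_changes(readings, limit=0.5):
    """Return (station_code, previous_level, current_level) for jumps larger than limit."""
    flagged = []
    previous = {}  # station_code -> last level seen for that station
    # Sort by station, then time, so consecutive readings are compared in order.
    for station, ts, level in sorted(readings, key=lambda r: (r[0], r[1])):
        prev = previous.get(station)
        if prev is not None and abs(level - prev) > limit:
            flagged.append((station, prev, level))
        previous[station] = level
    return flagged
```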
## Step 5: Configure Notification Policy
### Create Notification Policy
```yaml
# Policy Tree
- receiver: matrix-water-alerts
match_re:
severity: emergency|critical
group_wait: 10s
group_interval: 5m
repeat_interval: 30m
- receiver: matrix-water-alerts
match:
severity: warning
group_wait: 30s
group_interval: 10m
repeat_interval: 2h
```
### Grouping Rules
```yaml
group_by: [alertname, station_code]
group_wait: 10s
group_interval: 5m
repeat_interval: 1h
```
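`group_by: [alertname, station_code]` means alerts sharing both labels are bundled into a single notification. As an illustration only, the grouping step can be sketched in Python. `group_alerts` is a hypothetical function, and alerts are assumed to be dicts with a `labels` mapping.
```python
from collections import defaultdict


def group_alerts(alerts, group_by=("alertname", "station_code")):
    """Bucket alerts by the values of the group_by labels."""
    groups = defaultdict(list)
    for alert in alerts:
        # A missing label is treated as an empty string.
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)
```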
## Step 6: Station-Specific Thresholds
Create separate rules for each station with appropriate thresholds:
```sql
-- P.1 (Chiang Mai) - Urban area, higher thresholds
SELECT * FROM water_measurements
WHERE station_code = 'P.1' AND water_level > 6.5
-- P.4A (Mae Ping) - Agricultural area
SELECT * FROM water_measurements
WHERE station_code = 'P.4A' AND water_level > 5.0
-- P.20 (Downstream) - Lower threshold
SELECT * FROM water_measurements
WHERE station_code = 'P.20' AND water_level > 4.0
```
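The same per-station thresholds can be kept in one lookup table instead of three separate queries. This sketch assumes the values from the SQL above plus a 6.0 m fallback taken from the general high-water rule. `STATION_THRESHOLDS` and `exceeds_threshold` are hypothetical names, not the ones used in `src/alerting.py`.
```python
# Thresholds in metres, copied from the station-specific queries above.
STATION_THRESHOLDS = {"P.1": 6.5, "P.4A": 5.0, "P.20": 4.0}
DEFAULT_THRESHOLD = 6.0  # assumed fallback for stations without their own entry


def exceeds_threshold(station_code, water_level):
    """Return True if the level is above the station's threshold."""
    return water_level > STATION_THRESHOLDS.get(station_code, DEFAULT_THRESHOLD)
```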
## Step 7: Advanced Features
### Time-Based Routing
```yaml
# Different receivers for day/night
time_intervals:
- name: working_hours
time_intervals:
- times:
- start_time: '08:00'
end_time: '20:00'
weekdays: ['monday:friday']
routes:
- receiver: matrix-alerts-day
match:
severity: warning
active_time_intervals: [working_hours]
- receiver: matrix-alerts-night
match:
severity: warning
mute_time_intervals: [working_hours]
```
### Multi-Channel Alerts
```yaml
# Send critical alerts to multiple rooms
- receiver: matrix-emergency
webhook_configs:
- url: https://matrix.org/_matrix/client/v3/rooms/!emergency:matrix.org/send/m.room.message
http_config:
authorization:
type: Bearer
credentials: "EMERGENCY_TOKEN"
- url: https://matrix.org/_matrix/client/v3/rooms/!general:matrix.org/send/m.room.message
http_config:
authorization:
type: Bearer
credentials: "GENERAL_TOKEN"
```
## Step 8: Testing
### Test Contact Point
1. Go to Contact Points in Grafana
2. Select your Matrix contact point
3. Click "Test" button
4. Check Matrix room for test message
### Test Alert Rules
1. Temporarily lower thresholds
2. Wait for condition to trigger
3. Verify alert appears in Grafana
4. Verify Matrix message received
5. Reset thresholds
### Manual Alert Trigger
```sql
-- Simulate high water level in database
INSERT INTO water_measurements (station_code, water_level, timestamp)
VALUES ('P.1', 7.5, NOW());
```
## Troubleshooting
### Common Issues
#### 403 Forbidden
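- **Cause**: The token's user has not joined the room or lacks permission to post (an invalid or expired token usually returns 401 instead)
- **Fix**: Invite the alerting user and join the room; regenerate the token if the response is 401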
#### Room Not Found
- **Cause**: Incorrect room ID format
- **Fix**: Ensure room ID starts with ! and includes homeserver
#### No Alerts Firing
- **Cause**: Query returns no results
- **Fix**: Test queries in Grafana Explore, check data availability
#### Alert Spam
- **Cause**: No grouping configured
- **Fix**: Configure proper group_by and intervals
#### Messages Not Formatted
- **Cause**: Template syntax errors
- **Fix**: Validate JSON template, check Grafana template docs
### Debug Steps
1. Check Grafana alert rule status
2. Verify contact point test succeeds
3. Check Grafana logs: `/var/log/grafana/grafana.log`
4. Test Matrix API directly with curl
5. Verify database connectivity and query results
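When testing the Matrix API directly, the HTTP status and the `errcode` field in the JSON error body usually point at the cause. The sketch below maps common cases to hints. `explain_matrix_error` and `build_whoami_request` are hypothetical helpers, and the hint wording is an assumption. The `whoami` endpoint is a quick way to check whether a token is still valid.
```python
import urllib.request


def build_whoami_request(homeserver, token):
    """Build (but do not send) a request that returns the user owning the token."""
    return urllib.request.Request(
        homeserver.rstrip("/") + "/_matrix/client/v3/account/whoami",
        method="GET",
        headers={"Authorization": "Bearer " + token},
    )


def explain_matrix_error(status, errcode=""):
    """Translate an HTTP status and Matrix errcode into a troubleshooting hint."""
    if status == 401 or errcode in ("M_UNKNOWN_TOKEN", "M_MISSING_TOKEN"):
        return "token missing, invalid, or expired: log in again and update the token"
    if status == 403 or errcode == "M_FORBIDDEN":
        return "user is not in the room or may not post: invite and join the alerting user"
    if status == 404 or errcode == "M_NOT_FOUND":
        return "room or endpoint not found: check the room ID and the URL path"
    if status == 429 or errcode == "M_LIMIT_EXCEEDED":
        return "rate limited: increase group and repeat intervals"
    return "unexpected response: check homeserver and Grafana logs"
```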
## Environment Variables
Add to your `.env`:
```bash
MATRIX_HOMESERVER=https://matrix.org
MATRIX_ACCESS_TOKEN=your_access_token_here
MATRIX_ROOM_ID=!your_room_id:matrix.org
GRAFANA_URL=http://your-grafana-host:3000
```
## Example Alert Message
Your Matrix messages will appear as:
```
🌊 **PING RIVER WATER ALERT**
**Alert:** High Water Level
**Severity:** CRITICAL
**Station:** P.1 (สถานีเชียงใหม่)
**Status:** FIRING
**Water Level:** 6.75m
**Threshold:** 6.0m
**Time:** 2025-09-26 14:30:00
**Discharge:** 450.2 cms
📈 **Dashboard:** http://grafana:3000
📍 **Location:** Northern Thailand Ping River
```
## Security Notes
- Store Matrix tokens securely (environment variables)
- Matrix access tokens belong to a user and device, not a room; limit exposure by joining the alerting account only to the alert rooms
- Enable rate limiting to prevent spam
- Consider using dedicated alerting user account
- Regularly rotate access tokens
This setup provides comprehensive water level monitoring with immediate Matrix notifications when thresholds are exceeded.


@@ -0,0 +1,85 @@
# Quick Matrix Alerting Setup
## Step 1: Get Matrix Account
1. Go to https://app.element.io or install Element app
2. Create account or login with existing Matrix account
## Step 2: Get Access Token
### Method 1: Element Web (Recommended)
1. Open Element in browser: https://app.element.io
2. Login to your account
3. Click Settings (gear icon) → Help & About → Advanced
4. Copy your "Access Token" (starts with `syt_...` or similar)
### Method 2: Command Line
```bash
curl -X POST https://matrix.org/_matrix/client/v3/login \
-H "Content-Type: application/json" \
-d '{
"type": "m.login.password",
"user": "your_username",
"password": "your_password"
}'
```
## Step 3: Create Alert Room
1. In Element, click "+" to create new room
2. Name: "Water Level Alerts"
3. Set to Private
4. Copy the room ID from room settings (format: `!roomid:matrix.org`)
## Step 4: Configure .env File
Add these to your `.env` file:
```bash
# Matrix Alerting Configuration
MATRIX_HOMESERVER=https://matrix.org
MATRIX_ACCESS_TOKEN=syt_your_access_token_here
MATRIX_ROOM_ID=!your_room_id:matrix.org
# Grafana Integration (optional)
GRAFANA_URL=http://localhost:3000
```
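A script that reads these settings can fail early with a clear message when one is missing. The sketch below is a hedged illustration, not the loader in `src/alerting.py`. `load_matrix_config` is a hypothetical function, and it expects the variables to be present in the process environment already (for example exported by your shell or a dotenv loader).
```python
import os

REQUIRED = ("MATRIX_HOMESERVER", "MATRIX_ACCESS_TOKEN", "MATRIX_ROOM_ID")


def load_matrix_config(env=None):
    """Read Matrix settings from the environment and validate the basics."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing Matrix settings: " + ", ".join(missing))
    room_id = env["MATRIX_ROOM_ID"]
    if not room_id.startswith("!"):
        raise RuntimeError("MATRIX_ROOM_ID should start with '!'")
    return {
        "homeserver": env["MATRIX_HOMESERVER"].rstrip("/"),
        "access_token": env["MATRIX_ACCESS_TOKEN"],
        "room_id": room_id,
        "grafana_url": env.get("GRAFANA_URL"),  # optional
    }
```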
## Step 5: Test Configuration
```bash
# Test Matrix connection
uv run python run.py --alert-test
# Check system status (shows Matrix config)
uv run python run.py --status
# Run alert check
uv run python run.py --alert-check
```
## Example Alert Message
When thresholds are exceeded, you'll receive messages like:
```
🌊 **WATER LEVEL ALERT**
**Station:** P.1 (สถานีเชียงใหม่)
**Alert Type:** Critical Water Level
**Severity:** CRITICAL
**Current Level:** 6.75m
**Threshold:** 6.0m
**Difference:** +0.75m
**Discharge:** 450.2 cms
**Time:** 2025-09-26 14:30:00
📈 View dashboard: http://localhost:3000
```
## Cron Job Setup (Optional)
Add to crontab for automatic alerting:
```bash
# Check water levels every 15 minutes
*/15 * * * * cd /path/to/monitor && uv run python run.py --alert-check >> alerts.log 2>&1
```
## Troubleshooting
- **403 Error**: Check that the token's user has joined the room and may post there (an invalid or expired token usually returns 401 instead)
- **Room Not Found**: Verify room ID includes `!` prefix and `:homeserver.com` suffix
- **No Alerts**: Check database has recent data with `uv run python run.py --status`