Add comprehensive Matrix alerting system with Grafana integration
- Implement custom Python alerting system (src/alerting.py) with water level monitoring, data freshness checks, and Matrix notifications
- Add complete Grafana Matrix alerting setup guide (docs/GRAFANA_MATRIX_SETUP.md) with webhook configuration, alert rules, and notification policies
- Create Matrix quick start guide (docs/MATRIX_QUICK_START.md) for rapid deployment
- Integrate alerting commands into main application (--alert-check, --alert-test)
- Add Matrix configuration to environment variables (.env.example)
- Update Makefile with alerting targets (alert-check, alert-test)
- Enhance status command to show Matrix notification status
- Support station-specific water level thresholds and escalation rules
- Provide dual alerting approach: native Grafana alerts and custom Python system

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
12
.env.example
@@ -82,6 +82,18 @@ SMTP_PORT=587
SMTP_USERNAME=
SMTP_PASSWORD=

# Matrix Alerting Configuration
MATRIX_HOMESERVER=https://matrix.org
MATRIX_ACCESS_TOKEN=
MATRIX_ROOM_ID=

# Grafana Integration
GRAFANA_URL=http://localhost:3000

# Alert Configuration
ALERT_MAX_AGE_HOURS=2
ALERT_CHECK_INTERVAL_MINUTES=15

# Development Settings
DEBUG=false
DEVELOPMENT_MODE=false
12
Makefile
@@ -21,6 +21,11 @@ help:
	@echo " run Run the monitor in continuous mode"
	@echo " run-api Run the web API server"
	@echo " run-test Run a single test cycle"
	@echo " run-status Show system status"
	@echo ""
	@echo "Alerting:"
	@echo " alert-check Check water levels and send alerts"
	@echo " alert-test Send test Matrix message"
	@echo ""
	@echo "Distribution:"
	@echo " build-exe Build standalone executable"
@@ -92,6 +97,13 @@ run-test:
run-status:
	uv run python run.py --status

# Alerting
alert-check:
	uv run python run.py --alert-check

alert-test:
	uv run python run.py --alert-test

# Docker
docker-build:
	docker build -t ping-river-monitor .

168
docs/GRAFANA_MATRIX_ALERTING.md
Normal file
@@ -0,0 +1,168 @@
|
||||
# Grafana Matrix Alerting Setup
|
||||
|
||||
## Overview
|
||||
Configure Grafana to send water level alerts directly to Matrix channels when thresholds are exceeded.
|
||||
|
||||
## Prerequisites
|
||||
- Grafana instance with your PostgreSQL data source
|
||||
- Matrix account and access token
|
||||
- Matrix room for alerts
|
||||
|
||||
## Step 1: Configure Matrix Contact Point
|
||||
|
||||
1. **In Grafana, go to Alerting → Contact Points**
|
||||
2. **Add new contact point:**
|
||||
```
|
||||
Name: matrix-water-alerts
|
||||
Integration: Webhook
|
||||
URL: https://matrix.org/_matrix/client/v3/rooms/!ROOM_ID:matrix.org/send/m.room.message
|
||||
HTTP Method: POST
|
||||
```
|
||||
|
||||
3. **Add Headers:**
|
||||
```
|
||||
Authorization: Bearer YOUR_MATRIX_ACCESS_TOKEN
|
||||
Content-Type: application/json
|
||||
```
|
||||
|
||||
4. **Message Template:**
|
||||
```json
|
||||
{
|
||||
"msgtype": "m.text",
|
||||
"body": "🌊 WATER ALERT: {{ .CommonLabels.alertname }}\n\nStation: {{ .CommonLabels.station_code }}\nLevel: {{ .CommonAnnotations.water_level }}m\nStatus: {{ .CommonLabels.severity }}\n\nTime: {{ .CommonAnnotations.time }}"
|
||||
}
|
||||
```
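
Hand-written JSON bodies are easy to break with an unescaped newline or quote. A minimal sketch, assuming a hypothetical helper `build_matrix_body` that is not part of this repository, lets `json.dumps` do the escaping for the same fields as the template above:

```python
import json


def build_matrix_body(alertname: str, station_code: str, water_level: float,
                      severity: str, time_text: str) -> str:
    """Return the JSON string for an m.room.message event (hypothetical helper)."""
    text = (
        f"🌊 WATER ALERT: {alertname}\n\n"
        f"Station: {station_code}\n"
        f"Level: {water_level}m\n"
        f"Status: {severity}\n\n"
        f"Time: {time_text}"
    )
    # json.dumps escapes newlines and quotes; ensure_ascii=False keeps the emoji readable.
    return json.dumps({"msgtype": "m.text", "body": text}, ensure_ascii=False)


# Illustrative values only.
payload = build_matrix_body("High Water Level", "P.1", 6.2, "critical", "2025-09-26 14:30:00")
```

The resulting string can be pasted into the contact point as a static test body, or sent with any HTTP client.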
|
||||
|
||||
## Step 2: Create Alert Rules
|
||||
|
||||
### High Water Level Alert
|
||||
```yaml
|
||||
Rule Name: high-water-level
|
||||
Query: water_level > 6.0
|
||||
Condition: IS ABOVE 6.0 FOR 5m
|
||||
Labels:
|
||||
- severity: critical
|
||||
- station_code: {{ .station_code }}
|
||||
Annotations:
|
||||
- water_level: {{ .water_level }}
|
||||
- summary: "Critical water level at {{ .station_code }}"
|
||||
```
|
||||
|
||||
### Low Water Level Alert
|
||||
```yaml
|
||||
Rule Name: low-water-level
|
||||
Query: water_level < 1.0
|
||||
Condition: IS BELOW 1.0 FOR 10m
|
||||
Labels:
|
||||
- severity: warning
|
||||
- station_code: {{ .station_code }}
|
||||
```
|
||||
|
||||
### Data Gap Alert
|
||||
```yaml
|
||||
Rule Name: data-gap
|
||||
Query: increase(measurements_total[1h]) == 0
|
||||
Condition: IS EQUAL TO 0 FOR 30m
|
||||
Labels:
|
||||
- severity: warning
|
||||
- issue: data-gap
|
||||
```
|
||||
|
||||
## Step 3: Matrix Setup
|
||||
|
||||
### Get Matrix Access Token
|
||||
```bash
|
||||
curl -X POST https://matrix.org/_matrix/client/v3/login \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"type": "m.login.password",
|
||||
"user": "your_username",
|
||||
"password": "your_password"
|
||||
}'
|
||||
```
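
The login call returns a JSON object that includes an `access_token` field. As a rough sketch, the request body and the token extraction could look like this in Python; both helper names are hypothetical, and the sample response is made up rather than captured from a homeserver:

```python
from typing import Any, Dict


def build_login_payload(username: str, password: str) -> Dict[str, Any]:
    """Body for POST /_matrix/client/v3/login, mirroring the curl example above."""
    return {
        "type": "m.login.password",
        "user": username,
        "password": password,
    }


def extract_access_token(login_response: Dict[str, Any]) -> str:
    """Return the access token, or raise if the homeserver reported an error."""
    if "access_token" not in login_response:
        # Matrix error responses carry "errcode" and "error" fields.
        raise ValueError(login_response.get("error", "login failed"))
    return login_response["access_token"]


# Made-up response for illustration; a real one comes from the homeserver.
sample_response = {
    "user_id": "@alerts:matrix.org",
    "access_token": "syt_example",
    "device_id": "ABC",
}
token = extract_access_token(sample_response)
```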
|
||||
|
||||
### Create Alert Room
|
||||
```bash
|
||||
curl -X POST "https://matrix.org/_matrix/client/v3/createRoom" \
|
||||
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"name": "Water Level Alerts - Northern Thailand",
|
||||
"topic": "Automated alerts for Ping River water monitoring",
|
||||
"preset": "trusted_private_chat"
|
||||
}'
|
||||
```
|
||||
|
||||
## Example Alert Queries
|
||||
|
||||
### Critical Water Levels
|
||||
```promql
|
||||
# High water alert
|
||||
water_level{station_code=~"P.1|P.4A|P.20"} > 6.0
|
||||
|
||||
# Dangerous discharge
|
||||
discharge{station_code=~".*"} > 500
|
||||
|
||||
# Rapid level change
|
||||
increase(water_level[15m]) > 0.5
|
||||
```
|
||||
|
||||
### System Health
|
||||
```promql
|
||||
# No data received
|
||||
up{job="water-monitor"} == 0
|
||||
|
||||
# Old data
|
||||
(time() - timestamp) > 7200
|
||||
```
|
||||
|
||||
## Alert Notification Format
|
||||
|
||||
Your Matrix messages will look like:
|
||||
```
|
||||
🌊 WATER ALERT: High Water Level
|
||||
|
||||
Station: P.1 (Chiang Mai)
|
||||
Level: 6.2m (CRITICAL)
|
||||
Discharge: 450 cms
|
||||
Status: DANGER
|
||||
|
||||
Time: 2025-09-26 14:30:00
|
||||
Trend: Rising (+0.3m in 30min)
|
||||
|
||||
📍 Location: 18.7883°N, 98.9853°E
|
||||
```
|
||||
|
||||
## Advanced Features
|
||||
|
||||
### Escalation Rules
|
||||
```yaml
|
||||
# Send to different rooms based on severity
|
||||
- if: severity == "critical"
|
||||
receiver: matrix-emergency
|
||||
- if: severity == "warning"
|
||||
receiver: matrix-alerts
|
||||
- if: time_of_day() outside "08:00-20:00"
|
||||
receiver: matrix-night-duty
|
||||
```
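
The escalation rules above are pseudocode. One way to read them, sketched in Python under two assumptions that the rules themselves do not state (the first matching rule wins, and anything unmatched falls back to `matrix-alerts`):

```python
from datetime import time


def choose_receiver(severity: str, now: time) -> str:
    """Pick a receiver name; rule order and the fallback are assumptions."""
    if severity == "critical":
        return "matrix-emergency"
    if severity == "warning":
        return "matrix-alerts"
    # Outside 08:00-20:00, remaining alerts go to the night-duty room.
    if not (time(8, 0) <= now < time(20, 0)):
        return "matrix-night-duty"
    return "matrix-alerts"


receiver = choose_receiver("critical", time(3, 0))
```

With a different rule order, for example checking the time of day first, warnings raised at night would go to the night-duty room instead.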
|
||||
|
||||
### Rate Limiting
|
||||
```yaml
|
||||
group_wait: 5m
|
||||
group_interval: 10m
|
||||
repeat_interval: 30m
|
||||
```
|
||||
|
||||
## Testing Alerts
|
||||
|
||||
1. **Test Contact Point** - Use Grafana's test button
|
||||
2. **Simulate Alert** - Manually trigger with test data
|
||||
3. **Verify Matrix** - Check message formatting and delivery
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
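- **403 Forbidden**: The token is valid but the account may not post in the room; confirm the alerting account has joined the room (an invalid or expired token returns 401 instead)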
|
||||
- **Room not found**: Verify room ID format
|
||||
- **No alerts**: Check query syntax and thresholds
|
||||
- **Spam**: Configure proper grouping and intervals
|
351
docs/GRAFANA_MATRIX_SETUP.md
Normal file
@@ -0,0 +1,351 @@
|
||||
# Complete Grafana Matrix Alerting Setup Guide
|
||||
|
||||
## Overview
|
||||
Configure Grafana to send water level alerts directly to Matrix channels when thresholds are exceeded.
|
||||
|
||||
## Prerequisites
|
||||
- Grafana instance running (v8.0+)
|
||||
- PostgreSQL data source configured in Grafana
|
||||
- Matrix account
|
||||
- Matrix room for alerts
|
||||
|
||||
## Step 1: Get Matrix Access Token
|
||||
|
||||
### Method 1: Using curl
|
||||
```bash
|
||||
curl -X POST https://matrix.org/_matrix/client/v3/login \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"type": "m.login.password",
|
||||
"user": "your_username",
|
||||
"password": "your_password"
|
||||
}'
|
||||
```
|
||||
|
||||
### Method 2: Using Element Web Client
|
||||
1. Open Element in browser: https://app.element.io
|
||||
2. Login to your account
|
||||
3. Go to Settings → Help & About → Advanced
|
||||
4. Copy your Access Token
|
||||
|
||||
### Method 3: Using Matrix Admin Panel
|
||||
- If you have admin access to your homeserver, generate token via admin API
|
||||
|
||||
## Step 2: Create Alert Room
|
||||
|
||||
```bash
|
||||
curl -X POST "https://matrix.org/_matrix/client/v3/createRoom" \
|
||||
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"name": "Water Level Alerts - Northern Thailand",
|
||||
"topic": "Automated alerts for Ping River water monitoring",
|
||||
"preset": "private_chat"
|
||||
}'
|
||||
```
|
||||
|
||||
Save the `room_id` from the response (format: !roomid:homeserver.com)
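
Because the room ID contains `!` and `:`, it is safest to percent-encode it before placing it in a URL path. The sketch below, using a hypothetical `build_send_url` helper, checks the `!localpart:server` shape described above and builds the send URL with a transaction ID; in the Matrix client-server API this transaction-ID form of the endpoint is a `PUT` request:

```python
import re
import uuid
from urllib.parse import quote

# Shape stated in this guide: "!" + local part + ":" + homeserver name.
ROOM_ID_PATTERN = re.compile(r"^![^:]+:.+$")


def build_send_url(homeserver: str, room_id: str, txn_id: str) -> str:
    """Return the m.room.message send URL for a room (hypothetical helper)."""
    if not ROOM_ID_PATTERN.match(room_id):
        raise ValueError(f"unexpected room ID format: {room_id!r}")
    encoded_room = quote(room_id, safe="")
    encoded_txn = quote(txn_id, safe="")
    return (
        f"{homeserver.rstrip('/')}/_matrix/client/v3/rooms/"
        f"{encoded_room}/send/m.room.message/{encoded_txn}"
    )


# A fresh transaction ID per message keeps retries from creating duplicates.
url = build_send_url("https://matrix.org", "!roomid:matrix.org", uuid.uuid4().hex)
```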
|
||||
|
||||
## Step 3: Configure Grafana Contact Point
|
||||
|
||||
### Navigate to Alerting
|
||||
1. In Grafana, go to **Alerting → Contact Points**
|
||||
2. Click **Add contact point**
|
||||
|
||||
### Contact Point Settings
|
||||
```
|
||||
Name: matrix-water-alerts
|
||||
Integration: Webhook
|
||||
URL: https://matrix.org/_matrix/client/v3/rooms/!YOUR_ROOM_ID:matrix.org/send/m.room.message/{{ .GroupLabels.alertname }}_{{ .GroupLabels.severity }}_{{ now.Unix }}
|
||||
HTTP Method: PUT
|
||||
```
|
||||
|
||||
### Headers
|
||||
```
|
||||
Authorization: Bearer YOUR_MATRIX_ACCESS_TOKEN
|
||||
Content-Type: application/json
|
||||
```
|
||||
|
||||
### Message Template (JSON Body)
|
||||
```json
|
||||
{
|
||||
"msgtype": "m.text",
|
||||
"body": "🌊 **PING RIVER WATER ALERT**\n\n**Alert:** {{ .GroupLabels.alertname }}\n**Severity:** {{ .GroupLabels.severity | toUpper }}\n**Station:** {{ .GroupLabels.station_code }} ({{ .GroupLabels.station_name }})\n\n{{ range .Alerts }}**Status:** {{ .Status | toUpper }}\n**Water Level:** {{ .Annotations.water_level }}m\n**Threshold:** {{ .Annotations.threshold }}m\n**Time:** {{ .StartsAt.Format \"2006-01-02 15:04:05\" }}\n{{ if .Annotations.discharge }}**Discharge:** {{ .Annotations.discharge }} cms\n{{ end }}{{ if .Annotations.message }}**Details:** {{ .Annotations.message }}\n{{ end }}{{ end }}\n📈 **Dashboard:** {{ .ExternalURL }}\n📍 **Location:** Northern Thailand Ping River"
|
||||
}
|
||||
```
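
To preview what the template produces, the same fields can be rendered from a sample payload in Python. This is only a sketch: `format_grafana_alert` is a hypothetical helper, the payload below is invented, and it assumes the webhook JSON exposes `groupLabels`, `alerts`, `status` and `annotations` keys corresponding to the template fields used above:

```python
from typing import Any, Dict


def format_grafana_alert(payload: Dict[str, Any]) -> str:
    """Render a Grafana-style alert payload as plain text (hypothetical helper)."""
    labels = payload.get("groupLabels", {})
    lines = [
        "🌊 PING RIVER WATER ALERT",
        "",
        f"Alert: {labels.get('alertname', 'unknown')}",
        f"Severity: {labels.get('severity', 'unknown').upper()}",
    ]
    for alert in payload.get("alerts", []):
        annotations = alert.get("annotations", {})
        lines.append(f"Status: {alert.get('status', 'unknown').upper()}")
        # Optional annotations are only printed when present.
        if "water_level" in annotations:
            lines.append(f"Water Level: {annotations['water_level']}m")
        if "threshold" in annotations:
            lines.append(f"Threshold: {annotations['threshold']}m")
    return "\n".join(lines)


# Invented payload for illustration.
sample = {
    "groupLabels": {"alertname": "High Water Level", "severity": "critical"},
    "alerts": [
        {"status": "firing", "annotations": {"water_level": "6.75", "threshold": "6.0"}}
    ],
}
message = format_grafana_alert(sample)
```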
|
||||
|
||||
## Step 4: Create Alert Rules
|
||||
|
||||
### High Water Level Alert
|
||||
```yaml
|
||||
# Rule Configuration
|
||||
Rule Name: high-water-level
|
||||
Evaluation Group: water-level-alerts
|
||||
Folder: Water Monitoring
|
||||
|
||||
# Query A
|
||||
SELECT
|
||||
station_code,
|
||||
station_name_th as station_name,
|
||||
water_level,
|
||||
discharge,
|
||||
timestamp
|
||||
FROM water_measurements
|
||||
WHERE
|
||||
timestamp > now() - interval '5 minutes'
|
||||
AND water_level > 6.0
|
||||
|
||||
# Condition
|
||||
IS ABOVE 6.0 FOR 5 minutes
|
||||
|
||||
# Labels
|
||||
severity: critical
|
||||
alertname: High Water Level
|
||||
station_code: {{ $labels.station_code }}
|
||||
station_name: {{ $labels.station_name }}
|
||||
|
||||
# Annotations
|
||||
water_level: {{ $values.water_level }}
|
||||
threshold: 6.0
|
||||
discharge: {{ $values.discharge }}
|
||||
summary: Critical water level detected at {{ $labels.station_code }}
|
||||
```
|
||||
|
||||
### Emergency Water Level Alert
|
||||
```yaml
|
||||
Rule Name: emergency-water-level
|
||||
Query: water_level > 8.0
|
||||
Condition: IS ABOVE 8.0 FOR 2 minutes
|
||||
Labels:
|
||||
severity: emergency
|
||||
alertname: Emergency Water Level
|
||||
Annotations:
|
||||
threshold: 8.0
|
||||
message: IMMEDIATE ACTION REQUIRED - Flood risk imminent
|
||||
```
|
||||
|
||||
### Low Water Level Alert
|
||||
```yaml
|
||||
Rule Name: low-water-level
|
||||
Query: water_level < 1.0
|
||||
Condition: IS BELOW 1.0 FOR 15 minutes
|
||||
Labels:
|
||||
severity: warning
|
||||
alertname: Low Water Level
|
||||
Annotations:
|
||||
threshold: 1.0
|
||||
message: Drought conditions detected
|
||||
```
|
||||
|
||||
### Data Gap Alert
|
||||
```yaml
|
||||
Rule Name: data-gap
|
||||
Query:
|
||||
SELECT
|
||||
station_code,
|
||||
MAX(timestamp) as last_seen
|
||||
FROM water_measurements
|
||||
GROUP BY station_code
|
||||
HAVING MAX(timestamp) < now() - interval '2 hours'
|
||||
|
||||
Condition: query returns at least one row FOR 30 minutes
|
||||
Labels:
|
||||
severity: warning
|
||||
alertname: Data Gap
|
||||
issue: missing-data
|
||||
```
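
The same staleness test can be expressed in Python, which is handy for checking the two-hour cutoff against sample data. This is a sketch with a hypothetical `find_stale_stations` helper; it uses timezone-aware timestamps so the comparison stays unambiguous:

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def find_stale_stations(last_seen: Dict[str, datetime], now: datetime,
                        max_age: timedelta = timedelta(hours=2)) -> List[str]:
    """Return station codes whose latest reading is older than max_age."""
    cutoff = now - max_age
    return sorted(code for code, seen in last_seen.items() if seen < cutoff)


# Illustrative readings: P.1 is fresh, P.20 was last seen three hours ago.
now = datetime(2025, 9, 26, 14, 30, tzinfo=timezone.utc)
last_seen = {
    "P.1": now - timedelta(minutes=20),
    "P.20": now - timedelta(hours=3),
}
stale = find_stale_stations(last_seen, now)
```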
|
||||
|
||||
### Rapid Level Change Alert
|
||||
```yaml
|
||||
Rule Name: rapid-level-change
|
||||
Query:
|
||||
  SELECT station_code, water_level, prev_level
  FROM (
    SELECT
      station_code,
      water_level,
      LAG(water_level, 1) OVER (PARTITION BY station_code ORDER BY timestamp) AS prev_level
    FROM water_measurements
    WHERE timestamp > now() - interval '15 minutes'
  ) AS recent
  WHERE ABS(water_level - prev_level) > 0.5
|
||||
|
||||
Condition: CHANGE > 0.5m FOR 1 minute
|
||||
Labels:
|
||||
severity: warning
|
||||
alertname: Rapid Water Level Change
|
||||
```
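
The rule compares each reading with the previous reading from the same station. A small Python sketch of that comparison, using a hypothetical `find_rapid_changes` helper and invented readings, may make the logic easier to verify:

```python
from typing import Dict, List, Tuple


def find_rapid_changes(readings: List[Tuple[str, float]],
                       limit: float = 0.5) -> List[Tuple[str, float]]:
    """readings are (station_code, water_level) pairs ordered by time.

    Returns (station_code, delta) for each step whose absolute change exceeds limit.
    """
    previous: Dict[str, float] = {}
    flagged: List[Tuple[str, float]] = []
    for station, level in readings:
        if station in previous:
            delta = level - previous[station]
            if abs(delta) > limit:
                flagged.append((station, round(delta, 2)))
        previous[station] = level
    return flagged


# Illustrative data: P.1 rises 0.75 m, P.20 rises 0.25 m.
readings = [("P.1", 5.0), ("P.20", 3.0), ("P.1", 5.75), ("P.20", 3.25)]
changes = find_rapid_changes(readings)
```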
|
||||
|
||||
## Step 5: Configure Notification Policy
|
||||
|
||||
### Create Notification Policy
|
||||
```yaml
# Policy Tree
- receiver: matrix-water-alerts
  match_re:
    severity: emergency|critical
  group_wait: 10s
  group_interval: 5m
  repeat_interval: 30m

- receiver: matrix-water-alerts
  match:
    severity: warning
  group_wait: 30s
  group_interval: 10m
  repeat_interval: 2h
```
|
||||
|
||||
### Grouping Rules
|
||||
```yaml
|
||||
group_by: [alertname, station_code]
|
||||
group_wait: 10s
|
||||
group_interval: 5m
|
||||
repeat_interval: 1h
|
||||
```
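
As a rough illustration of what these settings do, the sketch below groups alerts by the same two labels and applies a repeat interval. It is a simplified model with hypothetical helper names, not Grafana's actual scheduler, and it ignores `group_wait` and `group_interval`:

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

GroupKey = Tuple[str, str]


def group_alerts(alerts: List[Dict[str, str]]) -> Dict[GroupKey, List[Dict[str, str]]]:
    """Bucket alerts by (alertname, station_code), as in group_by above."""
    groups: Dict[GroupKey, List[Dict[str, str]]] = defaultdict(list)
    for alert in alerts:
        groups[(alert["alertname"], alert["station_code"])].append(alert)
    return dict(groups)


def should_notify(last_sent: Optional[datetime], now: datetime,
                  repeat_interval: timedelta = timedelta(hours=1)) -> bool:
    """Notify on the first occurrence, then again only after repeat_interval."""
    return last_sent is None or now - last_sent >= repeat_interval


# Invented alerts: two for the same station and rule collapse into one group.
alerts = [
    {"alertname": "High Water Level", "station_code": "P.1"},
    {"alertname": "High Water Level", "station_code": "P.1"},
    {"alertname": "Data Gap", "station_code": "P.20"},
]
groups = group_alerts(alerts)
```

Two firing alerts for the same station and rule therefore produce one notification, and that notification is not repeated until the interval has passed.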
|
||||
|
||||
## Step 6: Station-Specific Thresholds
|
||||
|
||||
Create separate rules for each station with appropriate thresholds:
|
||||
|
||||
```sql
|
||||
-- P.1 (Chiang Mai) - Urban area, higher thresholds
|
||||
SELECT * FROM water_measurements
|
||||
WHERE station_code = 'P.1' AND water_level > 6.5
|
||||
|
||||
-- P.4A (Mae Ping) - Agricultural area
|
||||
SELECT * FROM water_measurements
|
||||
WHERE station_code = 'P.4A' AND water_level > 5.0
|
||||
|
||||
-- P.20 (Downstream) - Lower threshold
|
||||
SELECT * FROM water_measurements
|
||||
WHERE station_code = 'P.20' AND water_level > 4.0
|
||||
```
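
For comparison, the custom Python alerter in this commit keeps three thresholds per station and reports the highest one reached. A condensed sketch of that lookup follows; the threshold values are copied from `src/alerting.py`, while the `classify` helper is a hypothetical name:

```python
from typing import Optional

# Subset of the defaults defined in src/alerting.py.
THRESHOLDS = {
    "P.1": {"warning": 5.0, "critical": 6.5, "emergency": 8.0},
    "P.20": {"warning": 3.0, "critical": 4.5, "emergency": 6.0},
    "default": {"warning": 4.0, "critical": 6.0, "emergency": 8.0},
}


def classify(station_code: str, water_level: float) -> Optional[str]:
    """Return the highest severity whose threshold is reached, or None."""
    limits = THRESHOLDS.get(station_code, THRESHOLDS["default"])
    # Check from most to least severe so the first hit is the highest level.
    for severity in ("emergency", "critical", "warning"):
        if water_level >= limits[severity]:
            return severity
    return None


severity = classify("P.1", 6.75)
```

The same 6.75 m reading is critical at P.1 but already an emergency at P.20, which is why per-station rules are worth the extra setup.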
|
||||
|
||||
## Step 7: Advanced Features
|
||||
|
||||
### Time-Based Routing
|
||||
```yaml
# Different receivers for day/night
time_intervals:
  - name: working_hours
    time_intervals:
      - times:
          - start_time: '08:00'
            end_time: '20:00'
        weekdays: ['monday:friday']

routes:
  - receiver: matrix-alerts-day
    match:
      severity: warning
    active_time_intervals: [working_hours]

  - receiver: matrix-alerts-night
    match:
      severity: warning
    mute_time_intervals: [working_hours]
```
|
||||
|
||||
### Multi-Channel Alerts
|
||||
```yaml
# Send critical alerts to multiple rooms
- receiver: matrix-emergency
  webhook_configs:
    - url: https://matrix.org/_matrix/client/v3/rooms/!emergency:matrix.org/send/m.room.message
      http_config:
        authorization:
          type: Bearer
          credentials: "EMERGENCY_TOKEN"
    - url: https://matrix.org/_matrix/client/v3/rooms/!general:matrix.org/send/m.room.message
      http_config:
        authorization:
          type: Bearer
          credentials: "GENERAL_TOKEN"
```
|
||||
|
||||
## Step 8: Testing
|
||||
|
||||
### Test Contact Point
|
||||
1. Go to Contact Points in Grafana
|
||||
2. Select your Matrix contact point
|
||||
3. Click "Test" button
|
||||
4. Check Matrix room for test message
|
||||
|
||||
### Test Alert Rules
|
||||
1. Temporarily lower thresholds
|
||||
2. Wait for condition to trigger
|
||||
3. Verify alert appears in Grafana
|
||||
4. Verify Matrix message received
|
||||
5. Reset thresholds
|
||||
|
||||
### Manual Alert Trigger
|
||||
```sql
-- Simulate high water level in database
INSERT INTO water_measurements (station_code, water_level, timestamp)
VALUES ('P.1', 7.5, NOW());
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
#### 403 Forbidden
|
||||
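- **Cause**: The account behind the token is not allowed to post in the room (an invalid or expired token returns 401 instead)
- **Fix**: Join the alerting account to the room and check its permissions; regenerate the token if the response is 401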
|
||||
|
||||
#### Room Not Found
|
||||
- **Cause**: Incorrect room ID format
|
||||
- **Fix**: Ensure room ID starts with ! and includes homeserver
|
||||
|
||||
#### No Alerts Firing
|
||||
- **Cause**: Query returns no results
|
||||
- **Fix**: Test queries in Grafana Explore, check data availability
|
||||
|
||||
#### Alert Spam
|
||||
- **Cause**: No grouping configured
|
||||
- **Fix**: Configure proper group_by and intervals
|
||||
|
||||
#### Messages Not Formatted
|
||||
- **Cause**: Template syntax errors
|
||||
- **Fix**: Validate JSON template, check Grafana template docs
|
||||
|
||||
### Debug Steps
|
||||
1. Check Grafana alert rule status
|
||||
2. Verify contact point test succeeds
|
||||
3. Check Grafana logs: `/var/log/grafana/grafana.log`
|
||||
4. Test Matrix API directly with curl
|
||||
5. Verify database connectivity and query results
|
||||
|
||||
## Environment Variables
|
||||
|
||||
Add to your `.env`:
|
||||
```bash
|
||||
MATRIX_HOMESERVER=https://matrix.org
|
||||
MATRIX_ACCESS_TOKEN=your_access_token_here
|
||||
MATRIX_ROOM_ID=!your_room_id:matrix.org
|
||||
GRAFANA_URL=http://your-grafana-host:3000
|
||||
```
|
||||
|
||||
## Example Alert Message
|
||||
Your Matrix messages will appear as:
|
||||
```
|
||||
🌊 **PING RIVER WATER ALERT**
|
||||
|
||||
**Alert:** High Water Level
|
||||
**Severity:** CRITICAL
|
||||
**Station:** P.1 (สถานีเชียงใหม่)
|
||||
|
||||
**Status:** FIRING
|
||||
**Water Level:** 6.75m
|
||||
**Threshold:** 6.0m
|
||||
**Time:** 2025-09-26 14:30:00
|
||||
**Discharge:** 450.2 cms
|
||||
|
||||
📈 **Dashboard:** http://grafana:3000
|
||||
📍 **Location:** Northern Thailand Ping River
|
||||
```
|
||||
|
||||
## Security Notes
|
||||
- Store Matrix tokens securely (environment variables)
|
||||
- Use room-specific tokens when possible
|
||||
- Enable rate limiting to prevent spam
|
||||
- Consider using dedicated alerting user account
|
||||
- Regularly rotate access tokens
|
||||
|
||||
This setup provides comprehensive water level monitoring with immediate Matrix notifications when thresholds are exceeded.
|
85
docs/MATRIX_QUICK_START.md
Normal file
@@ -0,0 +1,85 @@
|
||||
# Quick Matrix Alerting Setup
|
||||
|
||||
## Step 1: Get Matrix Account
|
||||
1. Go to https://app.element.io or install Element app
|
||||
2. Create account or login with existing Matrix account
|
||||
|
||||
## Step 2: Get Access Token
|
||||
|
||||
### Method 1: Element Web (Recommended)
|
||||
1. Open Element in browser: https://app.element.io
|
||||
2. Login to your account
|
||||
3. Click Settings (gear icon) → Help & About → Advanced
|
||||
4. Copy your "Access Token" (starts with `syt_...` or similar)
|
||||
|
||||
### Method 2: Command Line
|
||||
```bash
|
||||
curl -X POST https://matrix.org/_matrix/client/v3/login \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"type": "m.login.password",
|
||||
"user": "your_username",
|
||||
"password": "your_password"
|
||||
}'
|
||||
```
|
||||
|
||||
## Step 3: Create Alert Room
|
||||
1. In Element, click "+" to create new room
|
||||
2. Name: "Water Level Alerts"
|
||||
3. Set to Private
|
||||
4. Copy the room ID from room settings (format: `!roomid:matrix.org`)
|
||||
|
||||
## Step 4: Configure .env File
|
||||
Add these to your `.env` file:
|
||||
```bash
|
||||
# Matrix Alerting Configuration
|
||||
MATRIX_HOMESERVER=https://matrix.org
|
||||
MATRIX_ACCESS_TOKEN=syt_your_access_token_here
|
||||
MATRIX_ROOM_ID=!your_room_id:matrix.org
|
||||
|
||||
# Grafana Integration (optional)
|
||||
GRAFANA_URL=http://localhost:3000
|
||||
```
|
||||
|
||||
## Step 5: Test Configuration
|
||||
```bash
|
||||
# Test Matrix connection
|
||||
uv run python run.py --alert-test
|
||||
|
||||
# Check system status (shows Matrix config)
|
||||
uv run python run.py --status
|
||||
|
||||
# Run alert check
|
||||
uv run python run.py --alert-check
|
||||
```
|
||||
|
||||
## Example Alert Message
|
||||
When thresholds are exceeded, you'll receive messages like:
|
||||
```
|
||||
🌊 **WATER LEVEL ALERT**
|
||||
|
||||
**Station:** P.1 (สถานีเชียงใหม่)
|
||||
**Alert Type:** Critical Water Level
|
||||
**Severity:** CRITICAL
|
||||
|
||||
**Current Level:** 6.75m
|
||||
**Threshold:** 6.0m
|
||||
**Difference:** +0.75m
|
||||
**Discharge:** 450.2 cms
|
||||
|
||||
**Time:** 2025-09-26 14:30:00
|
||||
|
||||
📈 View dashboard: http://localhost:3000
|
||||
```
|
||||
|
||||
## Cron Job Setup (Optional)
|
||||
Add to crontab for automatic alerting:
|
||||
```bash
|
||||
# Check water levels every 15 minutes
|
||||
*/15 * * * * cd /path/to/monitor && uv run python run.py --alert-check >> alerts.log 2>&1
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
- **403 Error**: Check that the alerting account has joined the room and may post there (an invalid or expired token returns 401 instead)
|
||||
- **Room Not Found**: Verify room ID includes `!` prefix and `:homeserver.com` suffix
|
||||
- **No Alerts**: Check database has recent data with `uv run python run.py --status`
|
334
src/alerting.py
Normal file
@@ -0,0 +1,334 @@
#!/usr/bin/env python3
"""
Water Level Alerting System with Matrix Integration
"""

import os
import json
import requests
import datetime
from typing import List, Dict, Optional
from dataclasses import dataclass
from enum import Enum

try:
    from .config import Config
    from .database_adapters import create_database_adapter
    from .logging_config import get_logger
except ImportError:
    from config import Config
    from database_adapters import create_database_adapter
    import logging
    def get_logger(name):
        return logging.getLogger(name)

logger = get_logger(__name__)

class AlertLevel(Enum):
    INFO = "info"
    WARNING = "warning"
    CRITICAL = "critical"
    EMERGENCY = "emergency"

@dataclass
class WaterAlert:
    station_code: str
    station_name: str
    alert_type: str
    level: AlertLevel
    water_level: float
    threshold: float
    discharge: Optional[float] = None
    timestamp: Optional[datetime.datetime] = None
    message: Optional[str] = None

class MatrixNotifier:
    def __init__(self, homeserver: str, access_token: str, room_id: str):
        self.homeserver = homeserver.rstrip('/')
        self.access_token = access_token
        self.room_id = room_id
        self.session = requests.Session()

    def send_message(self, message: str, msgtype: str = "m.text") -> bool:
        """Send message to Matrix room"""
        try:
            url = f"{self.homeserver}/_matrix/client/v3/rooms/{self.room_id}/send/m.room.message"

            headers = {
                "Authorization": f"Bearer {self.access_token}",
                "Content-Type": "application/json"
            }

            data = {
                "msgtype": msgtype,
                "body": message
            }

            # Add transaction ID to prevent duplicates
            txn_id = datetime.datetime.now().strftime("%Y%m%d_%H%M%S_%f")
            url += f"/{txn_id}"

            # The Matrix client-server API defines the transaction-ID form of this endpoint as PUT
            response = self.session.put(url, headers=headers, json=data, timeout=10)
            response.raise_for_status()

            logger.info(f"Matrix message sent successfully: {response.json().get('event_id')}")
            return True

        except Exception as e:
            logger.error(f"Failed to send Matrix message: {e}")
            return False

    def send_alert(self, alert: WaterAlert) -> bool:
        """Send formatted water alert to Matrix"""
        emoji_map = {
            AlertLevel.INFO: "ℹ️",
            AlertLevel.WARNING: "⚠️",
            AlertLevel.CRITICAL: "🚨",
            AlertLevel.EMERGENCY: "🆘"
        }

        emoji = emoji_map.get(alert.level, "📊")

        message = f"""{emoji} **WATER LEVEL ALERT**

**Station:** {alert.station_code} ({alert.station_name})
**Alert Type:** {alert.alert_type}
**Severity:** {alert.level.value.upper()}

**Current Level:** {alert.water_level:.2f}m
**Threshold:** {alert.threshold:.2f}m
**Difference:** {(alert.water_level - alert.threshold):+.2f}m
"""

        # Compare against None so a discharge of 0.0 is still reported
        if alert.discharge is not None:
            message += f"**Discharge:** {alert.discharge:.1f} cms\n"

        if alert.timestamp:
            message += f"**Time:** {alert.timestamp.strftime('%Y-%m-%d %H:%M:%S')}\n"

        if alert.message:
            message += f"\n**Details:** {alert.message}\n"

        message += f"\n📈 View dashboard: {os.getenv('GRAFANA_URL', 'http://localhost:3000')}"

        return self.send_message(message)

class WaterLevelAlertSystem:
    def __init__(self):
        self.db_adapter = None
        self.matrix_notifier = None
        self.thresholds = self._load_thresholds()

        # Matrix configuration from environment
        matrix_homeserver = os.getenv('MATRIX_HOMESERVER', 'https://matrix.org')
        matrix_token = os.getenv('MATRIX_ACCESS_TOKEN')
        matrix_room = os.getenv('MATRIX_ROOM_ID')

        if matrix_token and matrix_room:
            self.matrix_notifier = MatrixNotifier(matrix_homeserver, matrix_token, matrix_room)
            logger.info("Matrix notifications enabled")
        else:
            logger.warning("Matrix configuration missing - notifications disabled")

    def _load_thresholds(self) -> Dict[str, Dict[str, float]]:
        """Load alert thresholds from config or database"""
        # Default thresholds for Northern Thailand stations
        return {
            "P.1": {"warning": 5.0, "critical": 6.5, "emergency": 8.0},
            "P.4A": {"warning": 4.5, "critical": 6.0, "emergency": 7.5},
            "P.20": {"warning": 3.0, "critical": 4.5, "emergency": 6.0},
            "P.21": {"warning": 4.0, "critical": 5.5, "emergency": 7.0},
            "P.67": {"warning": 6.0, "critical": 8.0, "emergency": 10.0},
            "P.75": {"warning": 5.5, "critical": 7.5, "emergency": 9.5},
            "P.103": {"warning": 7.0, "critical": 9.0, "emergency": 11.0},
            # Default for unknown stations
            "default": {"warning": 4.0, "critical": 6.0, "emergency": 8.0}
        }

    def connect_database(self):
        """Initialize database connection"""
        try:
            db_config = Config.get_database_config()
            self.db_adapter = create_database_adapter(
                db_config['type'],
                connection_string=db_config['connection_string']
            )

            if self.db_adapter.connect():
                logger.info("Database connection established for alerting")
                return True
            else:
                logger.error("Failed to connect to database")
                return False

        except Exception as e:
            logger.error(f"Database connection error: {e}")
            return False

    def check_water_levels(self) -> List[WaterAlert]:
        """Check current water levels against thresholds"""
        alerts = []

        if not self.db_adapter:
            logger.error("Database not connected")
            return alerts

        try:
            # Get latest measurements
            measurements = self.db_adapter.get_latest_measurements(limit=50)

            for measurement in measurements:
                station_code = measurement.get('station_code', 'UNKNOWN')
                water_level = measurement.get('water_level')

                # Skip only missing readings; a level of 0.0 is still valid data
                if water_level is None:
                    continue

                # Get thresholds for this station
                station_thresholds = self.thresholds.get(station_code, self.thresholds['default'])

                # Check each threshold level
                alert_level = None
                threshold_value = None
                alert_type = None

                if water_level >= station_thresholds['emergency']:
                    alert_level = AlertLevel.EMERGENCY
                    threshold_value = station_thresholds['emergency']
                    alert_type = "Emergency Water Level"
                elif water_level >= station_thresholds['critical']:
                    alert_level = AlertLevel.CRITICAL
                    threshold_value = station_thresholds['critical']
                    alert_type = "Critical Water Level"
                elif water_level >= station_thresholds['warning']:
                    alert_level = AlertLevel.WARNING
                    threshold_value = station_thresholds['warning']
                    alert_type = "High Water Level"

                if alert_level:
                    alert = WaterAlert(
                        station_code=station_code,
                        station_name=measurement.get('station_name_th', f'Station {station_code}'),
                        alert_type=alert_type,
                        level=alert_level,
                        water_level=water_level,
                        threshold=threshold_value,
                        discharge=measurement.get('discharge'),
                        timestamp=measurement.get('timestamp')
                    )
                    alerts.append(alert)

        except Exception as e:
            logger.error(f"Error checking water levels: {e}")

        return alerts

    def check_data_freshness(self, max_age_hours: int = 2) -> List[WaterAlert]:
        """Check if data is fresh enough"""
        alerts = []

        if not self.db_adapter:
            return alerts

        try:
            measurements = self.db_adapter.get_latest_measurements(limit=20)
            max_age = datetime.timedelta(hours=max_age_hours)

            for measurement in measurements:
                timestamp = measurement.get('timestamp')
                if not timestamp:
                    continue

                # Use the timestamp's own timezone so naive and aware values are never compared
                now = datetime.datetime.now(tz=timestamp.tzinfo)

                if timestamp < now - max_age:
                    station_code = measurement.get('station_code', 'UNKNOWN')

                    age_hours = (now - timestamp).total_seconds() / 3600

                    alert = WaterAlert(
                        station_code=station_code,
                        station_name=measurement.get('station_name_th', f'Station {station_code}'),
                        alert_type="Stale Data",
                        level=AlertLevel.WARNING,
                        # Fall back to 0.0 when the stored level is missing or None
                        water_level=measurement.get('water_level') or 0.0,
                        threshold=max_age_hours,
                        timestamp=timestamp,
                        message=f"No fresh data for {age_hours:.1f} hours"
                    )
                    alerts.append(alert)

        except Exception as e:
            logger.error(f"Error checking data freshness: {e}")

        return alerts

    def send_alerts(self, alerts: List[WaterAlert]) -> int:
        """Send alerts via configured channels"""
        sent_count = 0

        if not alerts:
            return sent_count

        if self.matrix_notifier:
            for alert in alerts:
                if self.matrix_notifier.send_alert(alert):
                    sent_count += 1

        # Could add other notification channels here:
        # - Email
        # - Discord
        # - Telegram
        # - SMS

        return sent_count

    def run_alert_check(self) -> Dict[str, int]:
        """Run complete alert check cycle"""
        if not self.connect_database():
            return {"error": 1}

        # Check water levels
        water_alerts = self.check_water_levels()

        # Check data freshness
        data_alerts = self.check_data_freshness()

        # Combine alerts
        all_alerts = water_alerts + data_alerts

        # Send alerts
        sent_count = self.send_alerts(all_alerts)

        logger.info(f"Alert check complete: {len(all_alerts)} alerts, {sent_count} sent")

        return {
            "water_alerts": len(water_alerts),
            "data_alerts": len(data_alerts),
            "total_alerts": len(all_alerts),
            "sent": sent_count
        }

def main():
    """Standalone alerting check"""
    import argparse

    parser = argparse.ArgumentParser(description="Water Level Alert System")
    parser.add_argument("--check", action="store_true", help="Run alert check")
    parser.add_argument("--test", action="store_true", help="Send test message")
    args = parser.parse_args()

    alerting = WaterLevelAlertSystem()

    if args.test:
        if alerting.matrix_notifier:
            test_message = "🧪 **Test Alert**\n\nThis is a test message from the Water Level Alert System.\n\nIf you received this, Matrix notifications are working correctly!"
            success = alerting.matrix_notifier.send_message(test_message)
            print(f"Test message sent: {success}")
        else:
            print("Matrix notifier not configured")

    elif args.check:
        results = alerting.run_alert_check()
        print(f"Alert check results: {results}")

    else:
        print("Use --check or --test")

if __name__ == "__main__":
    main()
91
src/main.py
@@ -180,6 +180,65 @@ def run_web_api():
        logger.error(f"Web API failed: {e}")
        return False

def run_alert_check():
    """Run water level alert check"""
    logger.info("Running water level alert check...")

    try:
        from .alerting import WaterLevelAlertSystem

        # Initialize alerting system
        alerting = WaterLevelAlertSystem()

        # Run alert check
        results = alerting.run_alert_check()

        if 'error' in results:
            logger.error("❌ Alert check failed due to database connection")
            return False

        logger.info(f"✅ Alert check completed:")
        logger.info(f" • Water level alerts: {results['water_alerts']}")
        logger.info(f" • Data freshness alerts: {results['data_alerts']}")
        logger.info(f" • Total alerts generated: {results['total_alerts']}")
        logger.info(f" • Alerts sent: {results['sent']}")

        return True

    except Exception as e:
        logger.error(f"❌ Alert check failed: {e}")
        return False

def run_alert_test():
    """Send test alert message"""
    logger.info("Sending test alert message...")

    try:
        from .alerting import WaterLevelAlertSystem

        # Initialize alerting system
        alerting = WaterLevelAlertSystem()

        if not alerting.matrix_notifier:
            logger.error("❌ Matrix notifier not configured")
            logger.info("Please set MATRIX_ACCESS_TOKEN and MATRIX_ROOM_ID in your .env file")
            return False

        # Send test message
        test_message = "🧪 **Test Alert**\n\nThis is a test message from the Northern Thailand Ping River Monitor.\n\nIf you received this, Matrix notifications are working correctly!"
        success = alerting.matrix_notifier.send_message(test_message)

        if success:
            logger.info("✅ Test alert message sent successfully")
        else:
            logger.error("❌ Test alert message failed to send")

        return success

    except Exception as e:
        logger.error(f"❌ Test alert failed: {e}")
        return False

def show_status():
    """Show current system status"""
    logger.info("=== Northern Thailand Ping River Monitor Status ===")
@@ -210,6 +269,20 @@ def show_status():
        else:
            logger.error("❌ Database connection failed")

        # Test alerting system
        logger.info("\n=== Alerting System Status ===")
        try:
            from .alerting import WaterLevelAlertSystem
            alerting = WaterLevelAlertSystem()

            if alerting.matrix_notifier:
                logger.info("✅ Matrix notifications configured")
            else:
                logger.warning("⚠️ Matrix notifications not configured")
                logger.info("Set MATRIX_ACCESS_TOKEN and MATRIX_ROOM_ID in .env file")
        except Exception as e:
            logger.error(f"❌ Alerting system error: {e}")

        # Show metrics if available
        metrics_collector = get_metrics_collector()
        metrics = metrics_collector.get_all_metrics()
@@ -239,6 +312,8 @@ Examples:
  %(prog)s --fill-gaps 7 # Fill missing data for last 7 days
  %(prog)s --update-data 2 # Update existing data for last 2 days
  %(prog)s --status # Show system status
  %(prog)s --alert-check # Check water levels and send alerts
  %(prog)s --alert-test # Send test Matrix message
        """
    )

@@ -274,6 +349,18 @@ Examples:
        help="Show current system status"
    )

    parser.add_argument(
        "--alert-check",
        action="store_true",
        help="Run water level alert check"
    )

    parser.add_argument(
        "--alert-test",
        action="store_true",
        help="Send test alert message to Matrix"
    )

    parser.add_argument(
        "--log-level",
        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
@@ -314,6 +401,10 @@ Examples:
            success = run_data_update(args.update_data)
        elif args.status:
            success = show_status()
        elif args.alert_check:
            success = run_alert_check()
        elif args.alert_test:
            success = run_alert_test()
        else:
            success = run_continuous_monitoring()
