Initial commit: Northern Thailand Ping River Monitor v3.1.0

Features:
- Real-time water level monitoring for Ping River Basin (16 stations)
- Coverage from Chiang Dao to Nakhon Sawan in Northern Thailand
- FastAPI web interface with interactive dashboard and station management
- Multi-database support (SQLite, MySQL, PostgreSQL, InfluxDB, VictoriaMetrics)
- Comprehensive monitoring with health checks and metrics collection
- Docker deployment with Grafana integration
- Production-ready architecture with enterprise-grade observability

 CI/CD & Automation:
- Complete Gitea Actions workflows for CI/CD, security, and releases
- Multi-Python version testing (3.9-3.12)
- Multi-architecture Docker builds (amd64, arm64)
- Daily security scanning and dependency monitoring
- Automated documentation generation
- Performance testing and validation

 Production Ready:
- Type safety with Pydantic models and comprehensive type hints
- Data validation layer with range checking and error handling
- Rate limiting and request tracking for API protection
- Enhanced logging with rotation, colors, and performance metrics
- Station management API for dynamic CRUD operations
- Comprehensive documentation and deployment guides

 Technical Stack:
- Python 3.9+ with FastAPI and Pydantic
- Multi-database architecture with adapter pattern
- Docker containerization with multi-stage builds
- Grafana dashboards for visualization
- Gitea Actions for CI/CD automation
- Enterprise monitoring and alerting

 Ready for deployment to B4L infrastructure!

# Database Deployment Guide for Thailand Water Monitor
This guide covers deployment options for storing water monitoring data in production environments.
## 🏆 Recommendation Summary
| Database | Best For | Performance | Complexity | Cost |
|----------|----------|-------------|------------|------|
| **InfluxDB** | Time-series data, dashboards | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ |
| **VictoriaMetrics** | High-performance metrics | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| **PostgreSQL** | Complex queries, reliability | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
| **MySQL** | Familiar, existing infrastructure | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
## 1. InfluxDB Deployment (Recommended for Time-Series)
### Why InfluxDB?
- **Purpose-built** for time-series data
- **Excellent compression** (10:1 typical ratio)
- **Built-in retention policies** and downsampling
- **Great Grafana integration** for dashboards
- **High write throughput** (100k+ points/second)
### Docker Deployment
```yaml
# docker-compose.yml
version: '3.8'

services:
  influxdb:
    image: influxdb:1.8
    container_name: water_influxdb
    ports:
      - "8086:8086"
    volumes:
      - influxdb_data:/var/lib/influxdb
      - ./influxdb.conf:/etc/influxdb/influxdb.conf:ro
    environment:
      - INFLUXDB_DB=water_monitoring
      - INFLUXDB_ADMIN_USER=admin
      - INFLUXDB_ADMIN_PASSWORD=your_secure_password
      - INFLUXDB_USER=water_user
      - INFLUXDB_USER_PASSWORD=water_password
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    container_name: water_grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin_password
    restart: unless-stopped

volumes:
  influxdb_data:
  grafana_data:
```
### Environment Variables
```bash
# .env file
DB_TYPE=influxdb
INFLUX_HOST=localhost
INFLUX_PORT=8086
INFLUX_DATABASE=water_monitoring
INFLUX_USERNAME=water_user
INFLUX_PASSWORD=water_password
```
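For reference, writing a single point from Python with the `influxdb` client package (the one listed later in `requirements.txt`) might look roughly like the sketch below; the measurement name, tags, and values are illustrative, not the scraper's exact schema.

```python
from influxdb import InfluxDBClient

# Connection values mirror the .env example above
client = InfluxDBClient(
    host="localhost",
    port=8086,
    username="water_user",
    password="water_password",
    database="water_monitoring",
)

# Hypothetical point layout: station metadata as tags, readings as fields
point = {
    "measurement": "water_data",
    "tags": {"station_code": "P.1", "station_name_en": "Nawarat Bridge"},
    "time": "2025-07-26T01:00:00Z",
    "fields": {"water_level": 2.45, "discharge": 120.5, "discharge_percent": 35.2},
}

client.write_points([point])  # returns True on success
```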
### InfluxDB Configuration
```toml
# influxdb.conf
[meta]
dir = "/var/lib/influxdb/meta"
[data]
dir = "/var/lib/influxdb/data"
wal-dir = "/var/lib/influxdb/wal"
# Optimize for time-series data
cache-max-memory-size = "1g"
cache-snapshot-memory-size = "25m"
cache-snapshot-write-cold-duration = "10m"
# Retention and compression
compact-full-write-cold-duration = "4h"
max-series-per-database = 1000000
max-values-per-tag = 100000
[coordinator]
write-timeout = "10s"
max-concurrent-queries = 0
query-timeout = "0s"
[retention]
enabled = true
check-interval = "30m"
[http]
enabled = true
bind-address = ":8086"
auth-enabled = true
max-body-size = "25000000"
max-concurrent-requests = 0
max-enqueued-requests = 0
```
### Production Setup Commands
```bash
# Start services
docker-compose up -d
# Create retention policies
docker exec -it water_influxdb influx -username admin -password your_secure_password -execute "
CREATE RETENTION POLICY \"raw_data\" ON \"water_monitoring\" DURATION 90d REPLICATION 1 DEFAULT;
CREATE RETENTION POLICY \"downsampled\" ON \"water_monitoring\" DURATION 730d REPLICATION 1;
"
# Create continuous queries for downsampling
docker exec -it water_influxdb influx -username admin -password your_secure_password -execute "
CREATE CONTINUOUS QUERY \"downsample_hourly\" ON \"water_monitoring\"
BEGIN
SELECT mean(water_level) AS water_level, mean(discharge) AS discharge, mean(discharge_percent) AS discharge_percent
INTO \"downsampled\".\"water_data_hourly\"
FROM \"water_data\"
GROUP BY time(1h), station_code, station_name_en, station_name_th
END
"
```
## 2. VictoriaMetrics Deployment (High Performance)
### Why VictoriaMetrics?
- **Extremely fast** and resource-efficient
- **Better compression** than InfluxDB
- **Prometheus-compatible** API
- **Lower memory usage**
- **Built-in clustering**
### Docker Deployment
```yaml
# docker-compose.yml
version: '3.8'

services:
  victoriametrics:
    image: victoriametrics/victoria-metrics:latest
    container_name: water_victoriametrics
    ports:
      - "8428:8428"
    volumes:
      - vm_data:/victoria-metrics-data
    command:
      - '--storageDataPath=/victoria-metrics-data'
      - '--retentionPeriod=2y'
      - '--httpListenAddr=:8428'
      - '--maxConcurrentInserts=16'
    restart: unless-stopped

volumes:
  vm_data:
```
### Environment Variables
```bash
# .env file
DB_TYPE=victoriametrics
VM_HOST=localhost
VM_PORT=8428
```
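Because VictoriaMetrics is written to over plain HTTP, a minimal sketch of pushing one data point through its InfluxDB-compatible `/write` endpoint needs nothing beyond `requests`; the host, measurement name, and field names here are assumptions, not the project's actual schema.

```python
import requests

VM_WRITE_URL = "http://localhost:8428/write"  # assumed host/port from the .env above

# One measurement in InfluxDB line protocol: tags, then a space, then fields
line = (
    "water_data,station_code=P.1,station_name_en=Nawarat\\ Bridge "
    "water_level=2.45,discharge=120.5,discharge_percent=35.2"
)

resp = requests.post(VM_WRITE_URL, data=line, timeout=10)
resp.raise_for_status()  # VictoriaMetrics answers 204 No Content on success
```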
## 3. PostgreSQL Deployment (Relational + Time-Series)
### Why PostgreSQL?
- **Mature and reliable**
- **Excellent for complex queries**
- **TimescaleDB extension** for time-series optimization
- **Strong consistency guarantees**
- **Rich ecosystem**
### Docker Deployment with TimescaleDB
```yaml
# docker-compose.yml
version: '3.8'

services:
  postgres:
    image: timescale/timescaledb:latest-pg14
    container_name: water_postgres
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    environment:
      - POSTGRES_DB=water_monitoring
      - POSTGRES_USER=water_user
      - POSTGRES_PASSWORD=secure_password
    restart: unless-stopped

volumes:
  postgres_data:
```
### Database Initialization
```sql
-- init.sql
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
-- Create hypertable for time-series optimization
CREATE TABLE water_measurements (
    id BIGSERIAL,
    timestamp TIMESTAMPTZ NOT NULL,
    station_id INT NOT NULL,
    water_level NUMERIC(10,3),
    discharge NUMERIC(10,2),
    discharge_percent NUMERIC(5,2),
    status VARCHAR(20) DEFAULT 'active',
    created_at TIMESTAMPTZ DEFAULT NOW(),
    -- TimescaleDB requires the partitioning column in every unique constraint,
    -- so the primary key includes timestamp rather than id alone
    PRIMARY KEY (id, timestamp)
);
-- Convert to hypertable (TimescaleDB)
SELECT create_hypertable('water_measurements', 'timestamp', chunk_time_interval => INTERVAL '1 day');
-- Create indexes
CREATE INDEX idx_water_measurements_station_time ON water_measurements (station_id, timestamp DESC);
CREATE INDEX idx_water_measurements_timestamp ON water_measurements (timestamp DESC);
-- Create retention policy (keep raw data for 2 years)
SELECT add_retention_policy('water_measurements', INTERVAL '2 years');
-- Create continuous aggregates for performance
CREATE MATERIALIZED VIEW water_measurements_hourly
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', timestamp) AS bucket,
station_id,
AVG(water_level) as avg_water_level,
MAX(water_level) as max_water_level,
MIN(water_level) as min_water_level,
AVG(discharge) as avg_discharge,
MAX(discharge) as max_discharge,
MIN(discharge) as min_discharge,
AVG(discharge_percent) as avg_discharge_percent
FROM water_measurements
GROUP BY bucket, station_id;
-- Refresh policy for continuous aggregates
SELECT add_continuous_aggregate_policy('water_measurements_hourly',
start_offset => INTERVAL '1 day',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 hour');
```
### Environment Variables
```bash
# .env file
DB_TYPE=postgresql
POSTGRES_CONNECTION_STRING=postgresql://water_user:secure_password@localhost:5432/water_monitoring
```
## 4. MySQL Deployment (Traditional Relational)
### Docker Deployment
```yaml
# docker-compose.yml
version: '3.8'

services:
  mysql:
    image: mysql:8.0
    container_name: water_mysql
    ports:
      - "3306:3306"
    volumes:
      - mysql_data:/var/lib/mysql
      - ./mysql.cnf:/etc/mysql/conf.d/mysql.cnf
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    environment:
      - MYSQL_ROOT_PASSWORD=root_password
      - MYSQL_DATABASE=water_monitoring
      - MYSQL_USER=water_user
      - MYSQL_PASSWORD=water_password
    restart: unless-stopped

volumes:
  mysql_data:
```
### MySQL Configuration
```ini
# mysql.cnf
[mysqld]
# Optimize for time-series data
innodb_buffer_pool_size = 1G
innodb_log_file_size = 256M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT

# Note: the query cache and the standalone partition option were removed in
# MySQL 8.0, so they are not configured here.

# Connection settings
max_connections = 200
connect_timeout = 10
wait_timeout = 600
```
### Environment Variables
```bash
# .env file
DB_TYPE=mysql
MYSQL_CONNECTION_STRING=mysql://water_user:water_password@localhost:3306/water_monitoring
```
## 5. Installation and Dependencies
### Required Python Packages
```bash
# Base requirements
pip install requests schedule
# Database-specific packages
pip install influxdb # For InfluxDB
pip install sqlalchemy pymysql # For MySQL
pip install sqlalchemy psycopg2-binary # For PostgreSQL
# VictoriaMetrics uses HTTP API (no extra packages needed)
```
### Updated requirements.txt
```txt
requests>=2.28.0
schedule>=1.2.0
pandas>=1.5.0
# Database adapters (install as needed)
influxdb>=5.3.1
sqlalchemy>=1.4.0
pymysql>=1.0.2
psycopg2-binary>=2.9.0
```
## 6. Production Deployment Examples
### Using InfluxDB (Recommended)
```bash
# Set environment variables
export DB_TYPE=influxdb
export INFLUX_HOST=your-influx-server.com
export INFLUX_PORT=8086
export INFLUX_DATABASE=water_monitoring
export INFLUX_USERNAME=water_user
export INFLUX_PASSWORD=your_secure_password
# Run the scraper
python water_scraper_v3.py
```
### Using PostgreSQL with TimescaleDB
```bash
# Set environment variables
export DB_TYPE=postgresql
export POSTGRES_CONNECTION_STRING=postgresql://water_user:password@your-postgres-server.com:5432/water_monitoring
# Run the scraper
python water_scraper_v3.py
```
### Using VictoriaMetrics
```bash
# Set environment variables
export DB_TYPE=victoriametrics
export VM_HOST=your-vm-server.com
export VM_PORT=8428
# Run the scraper
python water_scraper_v3.py
```
## 7. Monitoring and Alerting
### Grafana Dashboard Setup
1. **Add Data Source**: Configure your database as a Grafana data source
2. **Import Dashboard**: Use pre-built water monitoring dashboards
3. **Set Alerts**: Configure alerts for abnormal water levels or discharge rates
### Example Grafana Queries
#### InfluxDB Queries
```sql
-- Current water levels
SELECT last("water_level") FROM "water_data" GROUP BY "station_code"
-- Discharge trends (last 24h)
SELECT mean("discharge") FROM "water_data" WHERE time >= now() - 24h GROUP BY time(1h), "station_code"
```
#### PostgreSQL/TimescaleDB Queries
```sql
-- Current water levels
SELECT DISTINCT ON (station_id)
station_id, water_level, discharge, timestamp
FROM water_measurements
ORDER BY station_id, timestamp DESC;
-- Hourly averages (last 24h)
SELECT
time_bucket('1 hour', timestamp) as hour,
station_id,
AVG(water_level) as avg_level,
AVG(discharge) as avg_discharge
FROM water_measurements
WHERE timestamp >= NOW() - INTERVAL '24 hours'
GROUP BY hour, station_id
ORDER BY hour DESC;
```
## 8. Performance Optimization Tips
### For All Databases
- **Batch inserts**: Insert multiple measurements at once (see the sketch after this list)
- **Connection pooling**: Reuse database connections
- **Indexing**: Ensure proper indexes on timestamp and station_id
- **Retention policies**: Automatically delete old data
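As a rough illustration of the batch-insert advice above, the sketch below writes several rows in one round trip with SQLAlchemy, reusing the `water_measurements` table from the PostgreSQL example; the connection string and sample rows are placeholders.

```python
from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql://water_user:secure_password@localhost:5432/water_monitoring",
    pool_pre_ping=True,  # simple connection-pool health check
)

rows = [
    {"ts": "2025-07-26 01:00:00+07", "station_id": 1, "level": 2.45, "q": 120.5, "pct": 35.2},
    {"ts": "2025-07-26 01:00:00+07", "station_id": 2, "level": 1.87, "q": 98.0, "pct": 28.4},
]

insert_sql = text("""
    INSERT INTO water_measurements (timestamp, station_id, water_level, discharge, discharge_percent)
    VALUES (:ts, :station_id, :level, :q, :pct)
""")

# Passing a list of parameter dicts makes SQLAlchemy batch the statement
# instead of issuing one INSERT per measurement.
with engine.begin() as conn:
    conn.execute(insert_sql, rows)
```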
### InfluxDB Specific
- Use **tags** for metadata (station codes, names)
- Use **fields** for numeric values (water levels, discharge)
- Configure **retention policies** and **continuous queries**
- Enable **compression** for long-term storage
### PostgreSQL/TimescaleDB Specific
- Use **hypertables** for automatic partitioning
- Create **continuous aggregates** for common queries
- Configure **compression** for older chunks
- Use **parallel queries** for large datasets
### VictoriaMetrics Specific
- Use **labels** efficiently (similar to Prometheus)
- Configure **retention periods** appropriately
- Use **downsampling** for long-term storage
- Enable **deduplication** if needed
This deployment guide provides production-ready configurations for all supported database backends. Choose the one that best fits your infrastructure and requirements.

# Debian/Linux Troubleshooting Guide
This guide addresses common issues when running the Thailand Water Monitor on Debian and other Linux distributions.
## Fixed Issues
### SQLAlchemy Connection Error (RESOLVED)
**Error Message:**
```
2025-07-24 19:48:31,920 - ERROR - Failed to connect to SQLITE: 'Connection' object has no attribute 'commit'
2025-07-24 19:48:32,740 - ERROR - Error saving to SQLITE: 'Connection' object has no attribute 'commit'
```
**Root Cause:**
This error occurred due to incompatibility between the database adapter code and newer versions of SQLAlchemy. The code was calling `conn.commit()` on a connection object that doesn't have a `commit()` method in newer SQLAlchemy versions.
**Solution Applied:**
Changed from `engine.connect()` to `engine.begin()` context manager, which automatically handles transactions:
```python
# OLD (problematic) code:
with self.engine.connect() as conn:
conn.execute(text(sql))
conn.commit() # This fails in newer SQLAlchemy
# NEW (fixed) code:
with self.engine.begin() as conn:
conn.execute(text(sql))
# Transaction automatically committed when context exits
```
**Status:** **FIXED** - The issue has been resolved in the current version.
## Installation on Debian/Ubuntu
### System Requirements
```bash
# Update package list
sudo apt update
# Install Python and pip
sudo apt install python3 python3-pip python3-venv
# Install system dependencies for database drivers
sudo apt install build-essential python3-dev
# For MySQL support (optional)
sudo apt install default-libmysqlclient-dev
# For PostgreSQL support (optional)
sudo apt install libpq-dev
```
### Python Environment Setup
```bash
# Create virtual environment
python3 -m venv water_monitor_env
# Activate virtual environment
source water_monitor_env/bin/activate
# Install requirements
pip install -r requirements.txt
```
### Running the Monitor
```bash
# Test run
python water_scraper_v3.py --test
# Run with specific database
export DB_TYPE=sqlite
python water_scraper_v3.py
# Run demo
python demo_databases.py
```
## Common Linux Issues
### 1. Permission Errors
**Error:**
```
PermissionError: [Errno 13] Permission denied: 'water_levels.db'
```
**Solution:**
```bash
# Check current directory permissions
ls -la
# Create data directory with proper permissions
mkdir -p data
chmod 755 data
# Set database path to data directory
export WATER_DB_PATH=data/water_levels.db
```
### 2. Missing System Dependencies
**Error:**
```
ImportError: No module named '_sqlite3'
```
**Solution:**
```bash
# Install SQLite development headers
sudo apt install libsqlite3-dev
# Reinstall Python if needed
sudo apt install python3-sqlite
```
### 3. Network/Firewall Issues
**Error:**
```
requests.exceptions.ConnectionError: HTTPSConnectionPool
```
**Solution:**
```bash
# Test network connectivity
curl -I https://hyd-app-db.rid.go.th/hydro1h.html
# Check firewall rules
sudo ufw status
# Allow outbound HTTPS if needed
sudo ufw allow out 443
```
### 4. Systemd Service Setup
Create service file `/etc/systemd/system/water-monitor.service`:
```ini
[Unit]
Description=Thailand Water Level Monitor
After=network.target
[Service]
Type=simple
User=water-monitor
Group=water-monitor
WorkingDirectory=/opt/water_level_monitor
Environment=PATH=/opt/water_level_monitor/venv/bin
Environment=DB_TYPE=sqlite
Environment=WATER_DB_PATH=/opt/water_level_monitor/data/water_levels.db
ExecStart=/opt/water_level_monitor/venv/bin/python water_scraper_v3.py
Restart=always
RestartSec=60
# Security settings
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/water_level_monitor/data
ReadWritePaths=/opt/water_level_monitor/logs
[Install]
WantedBy=multi-user.target
```
Enable and start:
```bash
sudo systemctl daemon-reload
sudo systemctl enable water-monitor.service
sudo systemctl start water-monitor.service
sudo systemctl status water-monitor.service
```
### 5. Log Rotation
Create `/etc/logrotate.d/water-monitor`:
```
/opt/water_level_monitor/water_monitor.log {
daily
missingok
rotate 30
compress
delaycompress
notifempty
create 644 water-monitor water-monitor
postrotate
systemctl reload water-monitor.service
endscript
}
```
## Database-Specific Issues
### SQLite
**Issue:** Database locked
```bash
# Check for processes using the database
sudo lsof /path/to/water_levels.db
# Kill processes if needed
sudo pkill -f water_scraper_v3.py
```
### VictoriaMetrics with HTTPS
**Configuration:**
```bash
export DB_TYPE=victoriametrics
export VM_HOST=https://your-vm-server.com
export VM_PORT=443
```
**Test connection:**
```bash
curl -k https://your-vm-server.com/health
```
## Performance Optimization
### 1. System Tuning
```bash
# Increase file descriptor limits
echo "* soft nofile 65536" >> /etc/security/limits.conf
echo "* hard nofile 65536" >> /etc/security/limits.conf
# Optimize network settings
echo "net.core.rmem_max = 16777216" >> /etc/sysctl.conf
echo "net.core.wmem_max = 16777216" >> /etc/sysctl.conf
sysctl -p
```
### 2. Database Optimization
```bash
# For SQLite
export SQLITE_CACHE_SIZE=10000
export SQLITE_SYNCHRONOUS=NORMAL
# Monitor database size
du -h data/water_levels.db
```
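The `SQLITE_CACHE_SIZE` and `SQLITE_SYNCHRONOUS` variables above are presumably translated by the application into SQLite PRAGMA statements; for reference, the equivalent settings applied directly from Python would look like this sketch (the values simply mirror the environment defaults).

```python
import sqlite3

conn = sqlite3.connect("data/water_levels.db")

# Equivalent PRAGMAs to the environment variables above
conn.execute("PRAGMA cache_size = 10000")    # pages kept in memory
conn.execute("PRAGMA synchronous = NORMAL")  # fewer fsyncs; safe when combined with WAL
conn.execute("PRAGMA journal_mode = WAL")    # better concurrency for one writer + readers

print(conn.execute("PRAGMA journal_mode").fetchone())  # ('wal',)
conn.close()
```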
## Monitoring and Maintenance
### Health Check Script
Create `health_check.sh`:
```bash
#!/bin/bash
LOG_FILE="/opt/water_level_monitor/water_monitor.log"
SERVICE_NAME="water-monitor"
# Check if service is running
if ! systemctl is-active --quiet $SERVICE_NAME; then
echo "ERROR: $SERVICE_NAME is not running"
systemctl restart $SERVICE_NAME
exit 1
fi
# Check recent log entries
RECENT_ERRORS=$(tail -n 100 $LOG_FILE | grep -c "ERROR")
if [ $RECENT_ERRORS -gt 5 ]; then
echo "WARNING: $RECENT_ERRORS errors found in recent logs"
exit 1
fi
echo "OK: Service is healthy"
exit 0
```
### Cron Job for Health Checks
```bash
# Add to crontab
*/5 * * * * /opt/water_level_monitor/health_check.sh >> /var/log/water-monitor-health.log 2>&1
```
## Getting Help
### Debug Information
```bash
# System information
uname -a
python3 --version
pip list | grep -E "(sqlalchemy|requests|influxdb)"
# Service logs
journalctl -u water-monitor.service -f
# Application logs
tail -f water_monitor.log
# Database information
sqlite3 water_levels.db ".schema"
sqlite3 water_levels.db "SELECT COUNT(*) FROM water_measurements;"
```
### Common Commands
```bash
# Restart service
sudo systemctl restart water-monitor.service
# View logs
sudo journalctl -u water-monitor.service --since "1 hour ago"
# Test configuration
python config.py
# Test database connection
python demo_databases.py
# Manual data fetch
python water_scraper_v3.py --test
```
This troubleshooting guide should help resolve most common issues encountered when running the Thailand Water Monitor on Debian and other Linux distributions.

# Enhanced Scheduler Guide
This guide explains the new 15-minute scheduling system that runs continuously throughout each hour to ensure comprehensive data coverage.
## ✅ **New Scheduling Behavior**
### **15-Minute Schedule Pattern**
- **Timing**: Runs every 15 minutes: 1:00, 1:15, 1:30, 1:45, 2:00, 2:15, 2:30, 2:45, etc.
- **Hourly Full Checks**: At :00 minutes (includes gap filling and data updates)
- **Quarter-Hour Quick Checks**: At :15, :30, :45 minutes (data fetch only)
- **Continuous Coverage**: Ensures no data is missed throughout each hour
### **Operation Types**
- **Full Operations** (at :00): Data fetching + gap filling + data updates
- **Quick Operations** (at :15, :30, :45): Data fetching only for performance
## 🔧 **Technical Implementation**
### **Scheduler States**
```python
# State tracking variables
self.last_successful_update = None # Timestamp of last successful data update
self.retry_mode = False # Whether in quick check mode (skip gap filling)
self.next_hourly_check = None # Next scheduled hourly check
```
### **Quarter-Hour Check Process**
```python
def quarter_hour_check(self):
    """15-minute check for new data"""
    current_time = datetime.datetime.now()
    minute = current_time.minute

    # Determine if this is a full hourly check (at :00) or a quarter-hour check
    if minute == 0:
        logging.info("=== HOURLY CHECK (00:00) ===")
        self.retry_mode = False  # Full check with gap filling and updates
    else:
        logging.info(f"=== 15-MINUTE CHECK ({minute:02d}:00) ===")
        self.retry_mode = True  # Skip gap filling and updates on 15-min checks

    new_data_found = self.run_scraping_cycle()

    if new_data_found:
        self.last_successful_update = datetime.datetime.now()
        if minute == 0:
            logging.info("New data found during hourly check")
        else:
            logging.info(f"New data found during 15-minute check at :{minute:02d}")
    else:
        if minute == 0:
            logging.info("No new data found during hourly check")
        else:
            logging.info(f"No new data found during 15-minute check at :{minute:02d}")
```
### **Scheduler Setup**
```python
def start_scheduler(self):
    """Start enhanced scheduler with 15-minute checks"""
    # Schedule checks every 15 minutes (at :00, :15, :30, :45)
    schedule.every().hour.at(":00").do(self.quarter_hour_check)
    schedule.every().hour.at(":15").do(self.quarter_hour_check)
    schedule.every().hour.at(":30").do(self.quarter_hour_check)
    schedule.every().hour.at(":45").do(self.quarter_hour_check)

    while True:
        schedule.run_pending()
        time.sleep(30)  # Check every 30 seconds
```
## 📊 **New Data Detection Logic**
### **Smart Detection Algorithm**
```python
def has_new_data(self) -> bool:
    """Check if there is new data available since last successful update"""
    # Get most recent timestamp from database
    # (assumes get_latest_data() returns a list of rows with a 'timestamp' key)
    latest_data = self.get_latest_data(limit=1)
    if not latest_data:
        return True  # Nothing stored yet, so new data is expected
    latest_timestamp = latest_data[0]['timestamp']

    # Check if we should have newer data by now
    now = datetime.datetime.now()
    expected_latest = now.replace(minute=0, second=0, microsecond=0)

    # If current time is past 5 minutes after the hour, we should have data for this hour
    if now.minute >= 5:
        if latest_timestamp < expected_latest:
            return True  # New data expected

    # Check if we have data for the previous hour
    previous_hour = expected_latest - datetime.timedelta(hours=1)
    if latest_timestamp < previous_hour:
        return True  # Missing recent data

    return False  # Data is up to date
```
### **Actual Data Verification**
```python
# Compare timestamps before and after scraping
initial_timestamp = get_latest_timestamp_before_scraping()

# ... perform scraping ...

latest_timestamp = get_latest_timestamp_after_scraping()

if initial_timestamp is None or latest_timestamp > initial_timestamp:
    new_data_found = True
    self.last_successful_update = datetime.datetime.now()
```
## 🚀 **Operational Modes**
### **Mode 1: Full Hourly Operation (at :00)**
- **Schedule**: Every hour at :00 minutes (1:00, 2:00, 3:00, etc.)
- **Operations**:
- ✅ Fetch current data
- ✅ Fill data gaps (last 7 days)
- ✅ Update existing data (last 2 days)
- **Purpose**: Comprehensive data collection and maintenance
### **Mode 2: Quick 15-Minute Checks (at :15, :30, :45)**
- **Schedule**: Every 15 minutes at quarter-hour marks
- **Operations**:
- ✅ Fetch current data only
- ❌ Skip gap filling (performance optimization)
- ❌ Skip data updates (performance optimization)
- **Purpose**: Ensure no new data is missed between hourly checks
## 📋 **Logging Output Examples**
### **Successful Hourly Check (at :00)**
```
2025-07-26 01:00:00,123 - INFO - === HOURLY CHECK (00:00) ===
2025-07-26 01:00:00,124 - INFO - Starting scraping cycle...
2025-07-26 01:00:01,456 - INFO - Successfully fetched 384 data points from API
2025-07-26 01:00:02,789 - INFO - New data found: 2025-07-26 01:00:00
2025-07-26 01:00:03,012 - INFO - Filled 5 data gaps
2025-07-26 01:00:04,234 - INFO - Updated 2 existing measurements
2025-07-26 01:00:04,235 - INFO - New data found during hourly check
```
### **15-Minute Quick Check (at :15, :30, :45)**
```
2025-07-26 01:15:00,123 - INFO - === 15-MINUTE CHECK (15:00) ===
2025-07-26 01:15:00,124 - INFO - Starting scraping cycle...
2025-07-26 01:15:01,456 - INFO - Successfully fetched 299 data points from API
2025-07-26 01:15:02,789 - INFO - New data found: 2025-07-26 01:00:00
2025-07-26 01:15:02,790 - INFO - New data found during 15-minute check at :15
```
### **Continuous 15-Minute Pattern**
```
2025-07-26 01:00:00,123 - INFO - === HOURLY CHECK (00:00) ===
2025-07-26 01:00:04,235 - INFO - New data found during hourly check
2025-07-26 01:15:00,123 - INFO - === 15-MINUTE CHECK (15:00) ===
2025-07-26 01:15:02,790 - INFO - No new data found during 15-minute check at :15
2025-07-26 01:30:00,123 - INFO - === 15-MINUTE CHECK (30:00) ===
2025-07-26 01:30:02,790 - INFO - No new data found during 15-minute check at :30
2025-07-26 01:45:00,123 - INFO - === 15-MINUTE CHECK (45:00) ===
2025-07-26 01:45:02,790 - INFO - No new data found during 15-minute check at :45
2025-07-26 02:00:00,123 - INFO - === HOURLY CHECK (00:00) ===
2025-07-26 02:00:04,235 - INFO - New data found during hourly check
```
## ⚙️ **Configuration Options**
### **Environment Variables**
```bash
# Retry interval (default: 5 minutes)
export RETRY_INTERVAL_MINUTES=5
# Data availability buffer (default: 5 minutes after hour)
export DATA_BUFFER_MINUTES=5
# Gap filling days (default: 7 days)
export GAP_FILL_DAYS=7
# Update check days (default: 2 days)
export UPDATE_DAYS=2
```
### **Scheduler Timing**
```python
# Hourly checks at top of hour
schedule.every().hour.at(":00").do(self.hourly_check)
# 5-minute retries (dynamically scheduled)
schedule.every(5).minutes.do(self.retry_check).tag('retry')
# Check every 30 seconds for responsive retry scheduling
time.sleep(30)
```
## 🔍 **Performance Optimizations**
### **Retry Mode Optimizations**
- **Skip Gap Filling**: Avoids expensive historical data fetching during retries
- **Skip Data Updates**: Avoids comparison operations during retries
- **Focused API Calls**: Only fetches current day data during retries
- **Reduced Database Queries**: Minimal database operations during retries
### **Resource Management**
- **API Rate Limiting**: 1-second delays between API calls
- **Database Connection Pooling**: Efficient connection reuse
- **Memory Efficiency**: Selective data processing
- **Error Recovery**: Automatic retry with exponential backoff
## 🛠️ **Troubleshooting**
### **Common Scenarios**
#### **Stuck in Retry Mode**
```
# Check if API is returning data
curl -X POST https://hyd-app-db.rid.go.th/webservice/getGroupHourlyWaterLevelReportAllHL.ashx
# Check database connectivity
python water_scraper_v3.py --check-gaps 1
# Manual data fetch test
python water_scraper_v3.py --test
```
#### **Missing Hourly Triggers**
```
# Check system time synchronization
timedatectl status
# Verify scheduler is running
ps aux | grep water_scraper
# Check logs for scheduler activity
tail -f water_monitor.log | grep "HOURLY CHECK"
```
#### **False New Data Detection**
```
# Check latest data in database
sqlite3 water_monitoring.db "SELECT MAX(timestamp) FROM water_measurements;"
# Verify timestamp parsing
python -c "
import datetime
print('Current hour:', datetime.datetime.now().replace(minute=0, second=0, microsecond=0))
"
```
## 📈 **Monitoring and Alerts**
### **Key Metrics to Monitor**
- **Hourly Success Rate**: Percentage of hourly checks that find new data
- **Retry Duration**: How long system stays in retry mode
- **Data Freshness**: Time since last successful data update
- **API Response Time**: Performance of data fetching operations
### **Alert Conditions**
- **Extended Retry Mode**: System in retry mode for > 30 minutes
- **No Data for 2+ Hours**: No new data found for extended period
- **High Error Rate**: Multiple consecutive API failures
- **Database Issues**: Connection or save failures
### **Health Check Script**
```bash
#!/bin/bash
# Check if system is stuck in retry mode
RETRY_COUNT=$(tail -n 100 water_monitor.log | grep -c "RETRY CHECK")
if [ $RETRY_COUNT -gt 6 ]; then
echo "WARNING: System may be stuck in retry mode ($RETRY_COUNT retries in last 100 log entries)"
fi
# Check data freshness
LATEST_DATA=$(sqlite3 water_monitoring.db "SELECT MAX(timestamp) FROM water_measurements;")
echo "Latest data timestamp: $LATEST_DATA"
```
## 🎯 **Best Practices**
### **Production Deployment**
1. **Monitor Logs**: Watch for retry mode patterns
2. **Set Alerts**: Configure notifications for extended retry periods
3. **Regular Maintenance**: Weekly gap filling and data validation
4. **Backup Strategy**: Regular database backups before major operations
### **Performance Tuning**
1. **Adjust Buffer Time**: Modify data availability buffer based on API patterns
2. **Optimize Retry Interval**: Balance between responsiveness and API load
3. **Database Indexing**: Ensure proper indexes for timestamp queries
4. **Connection Pooling**: Configure appropriate database connection limits
This enhanced scheduler ensures reliable, efficient, and intelligent water level monitoring with automatic adaptation to data availability patterns.

# 🚀 Northern Thailand Ping River Monitor - Enhancement Summary
## 🎯 **What We've Accomplished**
We've successfully transformed your water monitoring system from a simple scraper into a **production-ready, enterprise-grade monitoring platform** focused on the Ping River Basin in Northern Thailand, with modern web interfaces, station management capabilities, and comprehensive observability.
## 🌟 **Major New Features Added**
### 1. **FastAPI Web Interface** 🌐
- **Interactive Dashboard** at `http://localhost:8000`
- **REST API** with comprehensive endpoints
- **Station Management** - Add, update, delete monitoring stations
- **Real-time Health Monitoring**
- **Manual Data Collection Triggers**
- **Interactive API Documentation** at `/docs`
- **CORS Support** for web applications
### 2. **Enhanced Architecture** 🏗️
- **Type Safety** with Pydantic models and comprehensive type hints
- **Data Validation Layer** with range checking and error handling
- **Custom Exception Classes** for better error management
- **Modular Design** with separated concerns
### 3. **Observability & Monitoring** 📊
- **Metrics Collection System** (counters, gauges, histograms)
- **Health Checks** for database, API, and system resources
- **Performance Tracking** with response times and success rates
- **Enhanced Logging** with colors, rotation, and performance logs
### 4. **Production Features** 🚀
- **Rate Limiting** to prevent API abuse
- **Request Tracking** with detailed statistics
- **Configuration Validation** on startup
- **Graceful Error Handling** and recovery
- **Background Task Management**
## 📁 **New Files Created**
```
src/
├── models.py # Data models and type definitions
├── exceptions.py # Custom exception classes
├── validators.py # Data validation layer
├── metrics.py # Metrics collection system
├── health_check.py # Health monitoring system
├── rate_limiter.py # Rate limiting and request tracking
├── logging_config.py # Enhanced logging configuration
├── web_api.py # FastAPI web interface
├── main.py # Enhanced CLI with multiple modes
└── __init__.py # Package initialization
# Root files
├── run.py # Simple startup script
├── test_integration.py # Integration test suite
├── test_api.py # API endpoint tests
└── ENHANCEMENT_SUMMARY.md # This file
```
## 🔧 **Enhanced Existing Files**
- **`src/water_scraper_v3.py`** - Integrated new features, metrics, validation
- **`src/config.py`** - Added configuration validation
- **`requirements.txt`** - Added FastAPI, Pydantic, and monitoring dependencies
- **`docker-compose.victoriametrics.yml`** - Added web API service
- **`Dockerfile`** - Updated for new startup script
- **`README.md`** - Updated with new features and usage instructions
## 🌐 **Web API Endpoints**
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Interactive dashboard |
| `/docs` | GET | API documentation |
| `/health` | GET | System health status |
| `/metrics` | GET | Application metrics |
| `/stations` | GET | List all monitoring stations |
| `/measurements/latest` | GET | Latest measurements |
| `/measurements/station/{code}` | GET | Station-specific data |
| `/scrape/trigger` | POST | Trigger manual data collection |
| `/scraping/status` | GET | Scraping status and statistics |
| `/config` | GET | Current configuration (masked) |
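The endpoints above can be exercised with plain HTTP calls. The exact response payloads depend on the API's Pydantic models, so this sketch just prints whatever JSON comes back; the base URL assumes the default `python run.py --web-api` address.

```python
import requests

BASE = "http://localhost:8000"

# Overall system health
health = requests.get(f"{BASE}/health", timeout=10)
print(health.status_code, health.json())

# Latest measurements for all stations
latest = requests.get(f"{BASE}/measurements/latest", timeout=10)
print(latest.json())

# Trigger a manual data-collection cycle
trigger = requests.post(f"{BASE}/scrape/trigger", timeout=30)
print(trigger.status_code)
```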
## 🚀 **Usage Examples**
### **Traditional Mode (Enhanced)**
```bash
# Test single cycle
python run.py --test
# Continuous monitoring
python run.py
# Fill data gaps
python run.py --fill-gaps 7
# Show system status
python run.py --status
```
### **Web API Mode (NEW!)**
```bash
# Start web API server
python run.py --web-api
# Access dashboard
open http://localhost:8000
# View API documentation
open http://localhost:8000/docs
```
### **Docker Deployment**
```bash
# Start complete stack
docker-compose -f docker-compose.victoriametrics.yml up -d
# Services available:
# - Water API: http://localhost:8000
# - Grafana: http://localhost:3000
# - VictoriaMetrics: http://localhost:8428
```
## 📊 **Monitoring & Observability**
### **Built-in Metrics**
- API request counts and response times
- Database connection status and save operations
- Scraping cycle success/failure rates
- System resource usage (memory, etc.)
### **Health Checks**
- Database connectivity and data freshness
- External API availability
- Memory usage monitoring
- Overall system health status
### **Enhanced Logging**
- Colored console output for better readability
- File rotation to prevent disk space issues
- Performance logging for optimization
- Structured logging with proper levels
## 🔒 **Production Ready Features**
### **Security & Reliability**
- Rate limiting to prevent API abuse
- Input validation and sanitization
- Graceful error handling and recovery
- Configuration validation on startup
### **Performance**
- Efficient metrics collection with minimal overhead
- Background task management
- Connection pooling and resource management
- Optimized database operations
### **Scalability**
- Modular architecture for easy extension
- Async support for high concurrency
- Configurable resource limits
- Health checks for load balancer integration
## 🧪 **Testing**
### **Integration Tests**
```bash
# Run all integration tests
python test_integration.py
```
### **API Tests**
```bash
# Test API endpoints (server must be running)
python test_api.py
```
## 📈 **Performance Improvements**
1. **Request Tracking** - Monitor API performance and success rates
2. **Rate Limiting** - Prevent API abuse and ensure stability
3. **Data Validation** - Catch errors early and improve data quality
4. **Metrics Collection** - Identify bottlenecks and optimization opportunities
5. **Health Monitoring** - Proactive issue detection and alerting
## 🎉 **Benefits Achieved**
### **For Developers**
- **Better Developer Experience** with type hints and validation
- **Easier Debugging** with enhanced logging and error messages
- **Comprehensive Testing** with integration and API tests
- **Modern Architecture** following best practices
### **For Operations**
- **Web Dashboard** for easy monitoring and management
- **Health Checks** for automated monitoring integration
- **Metrics Collection** for performance analysis
- **Production-Ready** deployment with Docker support
### **For Users**
- **REST API** for integration with other systems
- **Real-time Data Access** via web interface
- **Manual Controls** for triggering data collection
- **Status Monitoring** for system visibility
## 🔮 **Future Enhancement Opportunities**
1. **Authentication & Authorization** - Add user management and API keys
2. **Real-time WebSocket Updates** - Live data streaming to web clients
3. **Advanced Analytics** - Trend analysis and forecasting
4. **Alert System** - Email/SMS notifications for critical conditions
5. **Multi-tenant Support** - Support for multiple organizations
6. **Data Export** - CSV, Excel, and other format exports
7. **Mobile App** - React Native or Flutter mobile interface
## 🏆 **Summary**
Your Thailand Water Monitor has been transformed from a simple data scraper into a **comprehensive, enterprise-grade monitoring platform** that includes:
- **Modern Web Interface** with FastAPI
- **Production-Ready Architecture** with proper error handling
- **Comprehensive Monitoring** with metrics and health checks
- **Type Safety** and data validation
- **Enhanced Logging** and observability
- **Docker Support** for easy deployment
- **Extensive Testing** for reliability
The system is now ready for production deployment and can serve as a foundation for further enhancements and integrations!

# Gap Filling and Data Integrity Guide
This guide explains the enhanced gap-filling functionality that addresses data gaps and missing timestamps in the Thailand Water Monitor.
## ✅ **Issues Resolved**
### **1. Data Gaps Problem**
- **Before**: Tool only fetched current day data, leaving gaps in historical records
- **After**: Automatically detects and fills missing timestamps for the last 7 days
### **2. Missing Midnight Timestamps**
- **Before**: Jump from 23:00 to 01:00 (missing 00:00 midnight data)
- **After**: Specifically checks for and fills midnight hour gaps
### **3. Changed Values**
- **Before**: No mechanism to update existing data if values changed on the server
- **After**: Compares existing data with fresh API data and updates changed values
## 🔧 **New Features**
### **Command Line Interface**
```bash
# Check for missing data gaps
python water_scraper_v3.py --check-gaps [days]
# Fill missing data gaps
python water_scraper_v3.py --fill-gaps [days]
# Update existing data with latest values
python water_scraper_v3.py --update-data [days]
# Run single test cycle
python water_scraper_v3.py --test
# Show help
python water_scraper_v3.py --help
```
### **Automatic Gap Detection**
The system now automatically (see the sketch after this list):
- Generates expected hourly timestamps for the specified time range
- Compares with existing database records
- Identifies missing timestamps
- Groups missing data by date for efficient API calls
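A minimal sketch of that detection step, using a plain set of already-stored timestamps rather than the project's actual per-station database query (the function name and return shape are illustrative):

```python
import datetime

def find_missing_hours(existing_timestamps, days_back=7):
    """Return {date: [missing hours]} for hourly slots absent from the database."""
    now = datetime.datetime.now().replace(minute=0, second=0, microsecond=0)
    start = now - datetime.timedelta(days=days_back)

    total_hours = int((now - start).total_seconds() // 3600)
    expected = [start + datetime.timedelta(hours=i) for i in range(total_hours + 1)]

    missing = sorted(ts for ts in expected if ts not in existing_timestamps)

    # Group by date so each gap-filling API call covers a single day
    by_date = {}
    for ts in missing:
        by_date.setdefault(ts.date(), []).append(ts.hour)
    return by_date
```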
### **Intelligent Gap Filling**
- **Historical Data Fetching**: Retrieves data for specific dates to fill gaps
- **Selective Insertion**: Only inserts data for actually missing timestamps
- **API Rate Limiting**: Includes delays between API calls to be respectful
- **Error Handling**: Continues processing even if some dates fail
### **Data Update Mechanism**
- **Change Detection**: Compares water levels, discharge rates, and percentages
- **Precision Checking**: Uses appropriate thresholds (0.001 m for water level, 0.1 cms for discharge); see the comparison sketch below
- **Selective Updates**: Only updates records where values have actually changed
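A comparison sketch for the change-detection step, assuming measurements are handled as simple dicts (the real adapter works on database rows; the tolerances mirror the thresholds listed above):

```python
def needs_update(existing, fresh,
                 level_tol=0.001,     # metres
                 discharge_tol=0.1,   # cubic metres per second
                 percent_tol=0.1):
    """Return True when freshly fetched values differ beyond the tolerances."""
    def differs(a, b, tol):
        if a is None or b is None:
            return a != b
        return abs(a - b) > tol

    return (
        differs(existing.get("water_level"), fresh.get("water_level"), level_tol)
        or differs(existing.get("discharge"), fresh.get("discharge"), discharge_tol)
        or differs(existing.get("discharge_percent"), fresh.get("discharge_percent"), percent_tol)
    )
```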
## 📊 **Test Results**
### **Before Enhancement**
```
Found 22 missing timestamps in the last 2 days:
2025-07-23: Missing hours [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
2025-07-24: Missing hours [0, 20, 21, 22, 23]
2025-07-25: Missing hours [0, 9]
```
### **After Gap Filling**
```
Gap filling completed. Filled 96 missing data points
Remaining gaps:
2025-07-24: Missing hours [10]
2025-07-25: Missing hours [0, 10]
```
**Improvement**: Reduced from 22 missing timestamps to 3 (86% improvement)
## 🚀 **Enhanced Scraping Cycle**
The regular scraping cycle now includes three phases:
### **Phase 1: Current Data Collection**
```python
# Fetch and save current data
water_data = self.fetch_water_data()
success = self.save_to_database(water_data)
```
### **Phase 2: Gap Filling (Last 7 Days)**
```python
# Check for and fill missing data
filled_count = self.fill_data_gaps(days_back=7)
```
### **Phase 3: Data Updates (Last 2 Days)**
```python
# Update existing data with latest values
updated_count = self.update_existing_data(days_back=2)
```
## 🔧 **Technical Improvements**
### **Database Connection Handling**
- **SQLite Optimization**: Added timeout and thread safety parameters
- **Retry Logic**: Exponential backoff for database lock errors
- **Transaction Management**: Proper use of `engine.begin()` for automatic commits
### **Error Recovery**
```python
# Retry logic with exponential backoff
for attempt in range(max_retries):
    try:
        success = self.db_adapter.save_measurements(water_data)
        if success:
            return True
    except Exception as e:
        if "database is locked" in str(e).lower():
            time.sleep(2 ** attempt)  # 1s, 2s, 4s delays
            continue
```
### **Memory Efficiency**
- **Selective Data Processing**: Only processes data for missing timestamps
- **Batch Processing**: Groups operations by date to minimize API calls
- **Resource Management**: Proper cleanup and connection handling
## 📋 **Usage Examples**
### **Daily Maintenance**
```bash
# Check for gaps in the last week
python water_scraper_v3.py --check-gaps 7
# Fill any found gaps
python water_scraper_v3.py --fill-gaps 7
# Update recent data for accuracy
python water_scraper_v3.py --update-data 2
```
### **Historical Data Recovery**
```bash
# Check for gaps in the last month
python water_scraper_v3.py --check-gaps 30
# Fill gaps for the last month (be patient, this takes time)
python water_scraper_v3.py --fill-gaps 30
```
### **Production Monitoring**
```bash
# Quick test to ensure system is working
python water_scraper_v3.py --test
# Check for recent gaps
python water_scraper_v3.py --check-gaps 1
```
## 🔍 **Monitoring and Alerts**
### **Gap Detection Output**
```
Found 22 missing timestamps:
2025-07-23: Missing hours [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
2025-07-24: Missing hours [0, 20, 21, 22, 23]
2025-07-25: Missing hours [0, 9]
```
### **Gap Filling Progress**
```
Fetching data for 2025-07-24 to fill 5 missing timestamps
Successfully fetched 368 data points from API for 2025-07-24
Filled 80 data points for 2025-07-24
Gap filling completed. Filled 96 missing data points
```
### **Update Detection**
```
Checking for updates on 2025-07-24
Update needed for P.1 at 2025-07-24 15:00:00
Updated 5 measurements for 2025-07-24
Data update completed. Updated 5 measurements
```
## ⚙️ **Configuration Options**
### **Environment Variables**
```bash
# Database configuration
export DB_TYPE=sqlite
export WATER_DB_PATH=water_monitoring.db
# Gap filling settings (can be added to config.py)
export GAP_FILL_DAYS=7 # Days to check for gaps
export UPDATE_DAYS=2 # Days to check for updates
export API_DELAY=1 # Seconds between API calls
export MAX_RETRIES=3 # Database retry attempts
```
### **Customizable Parameters**
- **Gap Check Period**: Default 7 days, configurable via command line
- **Update Period**: Default 2 days, configurable via command line
- **API Rate Limiting**: 1-second delay between calls (configurable)
- **Retry Logic**: 3 attempts with exponential backoff (configurable)
## 🛠️ **Troubleshooting**
### **Common Issues**
#### **Database Locked Errors**
```
ERROR - Error saving to SQLITE: database is locked
```
**Solution**: The retry logic now handles this automatically with exponential backoff.
#### **API Rate Limiting**
```
WARNING - Too many requests to API
```
**Solution**: Increase delay between API calls or reduce the number of days processed at once.
#### **Missing Data Still Present**
```
Found X missing timestamps after gap filling
```
**Possible Causes**:
- Data not available on the Thai government server for those timestamps
- Network issues during API calls
- API returned empty data for those specific times
### **Debug Commands**
```bash
# Enable debug logging
export LOG_LEVEL=DEBUG
python water_scraper_v3.py --check-gaps 1
# Test specific date range
python water_scraper_v3.py --fill-gaps 1
# Check database directly
sqlite3 water_monitoring.db "SELECT COUNT(*) FROM water_measurements;"
sqlite3 water_monitoring.db "SELECT timestamp, COUNT(*) FROM water_measurements GROUP BY timestamp ORDER BY timestamp DESC LIMIT 10;"
```
## 📈 **Performance Metrics**
### **Gap Filling Efficiency**
- **API Calls**: Grouped by date to minimize requests
- **Processing Speed**: ~100-400 data points per API call
- **Success Rate**: 86% gap reduction in test case
- **Resource Usage**: Minimal memory footprint with selective processing
### **Database Performance**
- **SQLite Optimization**: Connection pooling and timeout handling
- **Transaction Efficiency**: Batch inserts with proper transaction management
- **Retry Success**: Automatic recovery from temporary lock conditions
## 🎯 **Best Practices**
### **Regular Maintenance**
1. **Daily**: Run `--check-gaps 1` to monitor recent data quality
2. **Weekly**: Run `--fill-gaps 7` to catch any missed data
3. **Monthly**: Run `--update-data 7` to ensure data accuracy
### **Production Deployment**
1. **Automated Scheduling**: Use cron or systemd timers for regular gap checks
2. **Monitoring**: Set up alerts for excessive missing data
3. **Backup**: Regular database backups before major gap-filling operations
### **Data Quality Assurance**
1. **Validation**: Check for reasonable value ranges after gap filling (see the sketch after this list)
2. **Comparison**: Compare filled data with nearby timestamps for consistency
3. **Documentation**: Log all gap-filling activities for audit trails
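A small range-check sketch for the validation step above; the bounds are illustrative guesses, not the project's official validation rules, and should be tuned per station.

```python
def validate_measurement(water_level, discharge, discharge_percent):
    """Return a list of plausibility problems for one back-filled measurement."""
    problems = []
    if water_level is not None and not (-10.0 <= water_level <= 30.0):
        problems.append(f"water_level out of range: {water_level}")
    if discharge is not None and not (0.0 <= discharge <= 10000.0):
        problems.append(f"discharge out of range: {discharge}")
    if discharge_percent is not None and not (0.0 <= discharge_percent <= 200.0):
        problems.append(f"discharge_percent out of range: {discharge_percent}")
    return problems
```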
This enhanced gap-filling system ensures comprehensive and accurate water level monitoring with minimal data loss and automatic recovery capabilities.

# Geolocation Support for Grafana Geomap
This guide explains the geolocation functionality added to the Thailand Water Monitor for use with Grafana's geomap visualization.
## ✅ **Implemented Features**
### **Database Schema Updates**
All database adapters now support geolocation fields:
- **latitude**: Decimal latitude coordinates (DECIMAL(10,8) for SQL, REAL for SQLite)
- **longitude**: Decimal longitude coordinates (DECIMAL(11,8) for SQL, REAL for SQLite)
- **geohash**: Geohash string for efficient spatial indexing (VARCHAR(20)/TEXT)
### **Station Data Enhancement**
Station mapping now includes geolocation fields:
```python
'8': {
    'code': 'P.1',
    'thai_name': 'สะพานนวรัฐ',
    'english_name': 'Nawarat Bridge',
    'latitude': 15.6944,            # Decimal degrees
    'longitude': 100.2028,          # Decimal degrees
    'geohash': 'w5q6uuhvfcfp25'     # Geohash for P.1
}
```
## 🗄️ **Database Schema**
### **Updated Stations Table**
```sql
CREATE TABLE stations (
id INTEGER PRIMARY KEY,
station_code TEXT UNIQUE NOT NULL,
thai_name TEXT NOT NULL,
english_name TEXT NOT NULL,
latitude REAL, -- NEW: Latitude coordinate
longitude REAL, -- NEW: Longitude coordinate
geohash TEXT, -- NEW: Geohash for spatial indexing
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
### **Database Support**
- **SQLite**: REAL columns for coordinates, TEXT for geohash
- **PostgreSQL**: DECIMAL(10,8) and DECIMAL(11,8) for coordinates, VARCHAR(20) for geohash
- **MySQL**: DECIMAL(10,8) and DECIMAL(11,8) for coordinates, VARCHAR(20) for geohash
- **VictoriaMetrics**: Geolocation data included in metric labels
## 📊 **Current Station Data**
### **P.1 - Nawarat Bridge (Sample)**
- **Station Code**: P.1
- **Thai Name**: สะพานนวรัฐ
- **English Name**: Nawarat Bridge
- **Latitude**: 15.6944
- **Longitude**: 100.2028
- **Geohash**: w5q6uuhvfcfp25
### **Remaining Stations**
The following stations are ready for geolocation data when coordinates become available:
- P.20 - บ้านเชียงดาว (Ban Chiang Dao)
- P.75 - บ้านช่อแล (Ban Chai Lat)
- P.92 - บ้านเมืองกึ๊ด (Ban Muang Aut)
- P.4A - บ้านแม่แตง (Ban Mae Taeng)
- P.67 - บ้านแม่แต (Ban Tae)
- P.21 - บ้านริมใต้ (Ban Rim Tai)
- P.103 - สะพานวงแหวนรอบ 3 (Ring Bridge 3)
- P.82 - บ้านสบวิน (Ban Sob win)
- P.84 - บ้านพันตน (Ban Panton)
- P.81 - บ้านโป่ง (Ban Pong)
- P.5 - สะพานท่านาง (Tha Nang Bridge)
- P.77 - บ้านสบแม่สะป๊วด (Baan Sop Mae Sapuord)
- P.87 - บ้านป่าซาง (Ban Pa Sang)
- P.76 - บ้านแม่อีไฮ (Banb Mae I Hai)
- P.85 - บ้านหล่ายแก้ว (Baan Lai Kaew)
## 🗺️ **Grafana Geomap Integration**
### **Data Source Configuration**
The geolocation data is automatically included in all database queries and can be used directly in Grafana:
#### **SQLite/PostgreSQL/MySQL Query Example**
```sql
SELECT
m.timestamp,
s.station_code,
s.english_name,
s.thai_name,
s.latitude,
s.longitude,
s.geohash,
m.water_level,
m.discharge,
m.discharge_percent
FROM water_measurements m
JOIN stations s ON m.station_id = s.id
WHERE s.latitude IS NOT NULL
AND s.longitude IS NOT NULL
ORDER BY m.timestamp DESC
```
#### **VictoriaMetrics Query Example**
```promql
water_level{latitude!="",longitude!=""}
```
### **Geomap Panel Configuration**
#### **1. Create Geomap Panel**
1. Add new panel in Grafana
2. Select "Geomap" visualization
3. Configure data source (SQLite/PostgreSQL/MySQL/VictoriaMetrics)
#### **2. Configure Location Fields**
- **Latitude Field**: `latitude`
- **Longitude Field**: `longitude`
- **Alternative**: Use `geohash` field for geohash-based positioning
#### **3. Configure Display Options**
- **Station Labels**: Use `station_code` or `english_name`
- **Tooltip Information**: Include `thai_name`, `water_level`, `discharge`
- **Color Mapping**: Map to `water_level` or `discharge_percent`
#### **4. Sample Geomap Configuration**
```json
{
"type": "geomap",
"title": "Thailand Water Stations",
"targets": [
{
"rawSql": "SELECT latitude, longitude, station_code, english_name, water_level, discharge_percent FROM stations s JOIN water_measurements m ON s.id = m.station_id WHERE s.latitude IS NOT NULL AND m.timestamp = (SELECT MAX(timestamp) FROM water_measurements WHERE station_id = s.id)",
"format": "table"
}
],
"fieldConfig": {
"defaults": {
"custom": {
"hideFrom": {
"legend": false,
"tooltip": false,
"vis": false
}
},
"mappings": [],
"color": {
"mode": "continuous-GrYlRd",
"field": "water_level"
}
}
},
"options": {
"view": {
"id": "coords",
"lat": 15.6944,
"lon": 100.2028,
"zoom": 8
},
"controls": {
"mouseWheelZoom": true,
"showZoom": true,
"showAttribution": true
},
"layers": [
{
"type": "markers",
"config": {
"size": {
"field": "discharge_percent",
"min": 5,
"max": 20
},
"color": {
"field": "water_level"
},
"showLegend": true
}
}
]
}
}
```
## 🔧 **Adding New Station Coordinates**
### **Method 1: Update Station Mapping**
Edit `water_scraper_v3.py` and add coordinates to the station mapping:
```python
'1': {
    'code': 'P.20',
    'thai_name': 'บ้านเชียงดาว',
    'english_name': 'Ban Chiang Dao',
    'latitude': 19.3056,    # Add actual coordinates
    'longitude': 98.9264,   # Add actual coordinates
    'geohash': 'w4r6...'    # Add actual geohash
}
```
### **Method 2: Direct Database Update**
```sql
UPDATE stations
SET latitude = 19.3056, longitude = 98.9264, geohash = 'w4r6uuhvfcfp25'
WHERE station_code = 'P.20';
```
### **Method 3: Bulk Update Script**
```python
import sqlite3

coordinates = {
    'P.20': {'lat': 19.3056, 'lon': 98.9264, 'geohash': 'w4r6uuhvfcfp25'},
    'P.75': {'lat': 18.7756, 'lon': 99.1234, 'geohash': 'w4r5uuhvfcfp25'},
    # Add more stations...
}

conn = sqlite3.connect('water_monitoring.db')
cursor = conn.cursor()

for station_code, coords in coordinates.items():
    cursor.execute("""
        UPDATE stations
        SET latitude = ?, longitude = ?, geohash = ?
        WHERE station_code = ?
    """, (coords['lat'], coords['lon'], coords['geohash'], station_code))

conn.commit()
conn.close()
```
## 🌐 **Geohash Information**
### **What is Geohash?**
Geohash is a geocoding system that represents geographic coordinates as a short alphanumeric string (an encoding sketch follows this section). It provides:
- **Spatial Indexing**: Efficient spatial queries
- **Proximity**: Similar geohashes indicate nearby locations
- **Hierarchical**: Longer geohashes provide more precision
### **Geohash Precision Levels**
- **5 characters**: ~2.4km precision
- **6 characters**: ~610m precision
- **7 characters**: ~76m precision
- **8 characters**: ~19m precision
- **9+ characters**: <5m precision
### **Example: P.1 Geohash**
- **Geohash**: `w5q6uuhvfcfp25`
- **Length**: 14 characters
- **Precision**: Sub-meter accuracy
- **Location**: Nawarat Bridge, Thailand
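If you need to generate geohashes for the remaining stations, a small helper library can do it; the sketch below assumes the third-party `pygeohash` package, which is not part of this project's requirements.

```python
import pygeohash as pgh  # pip install pygeohash

# Coordinates for P.1 from the example above
lat, lon = 15.6944, 100.2028

geohash = pgh.encode(lat, lon, precision=12)  # longer string = finer precision
print(geohash)

# Round trip back to approximate coordinates
print(pgh.decode(geohash))
```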
## 📈 **Grafana Visualization Examples**
### **1. Station Location Map**
- **Type**: Geomap with markers
- **Data**: Current station locations
- **Color**: Water level or discharge percentage
- **Size**: Discharge volume
### **2. Regional Water Levels**
- **Type**: Geomap with heatmap
- **Data**: Water level data across regions
- **Visualization**: Color-coded intensity map
- **Filters**: Time range, station groups
### **3. Alert Zones**
- **Type**: Geomap with threshold markers
- **Data**: Stations exceeding alert thresholds
- **Visualization**: Red markers for high water levels
- **Alerts**: Automated notifications for critical levels
## 🔄 **Updating a Running System**
### **Automated Migration Script**
Use the provided migration script to safely add geolocation columns to your existing database:
```bash
# Stop the water monitoring service first
sudo systemctl stop water-monitor
# Run the migration script
python migrate_geolocation.py
# Restart the service
sudo systemctl start water-monitor
```
### **Migration Script Features**
- **Auto-detects database type** from environment variables
- **Checks existing columns** to avoid conflicts
- **Supports all database types** (SQLite, PostgreSQL, MySQL)
- **Adds sample data** for P.1 station
- **Safe operation** - won't break existing data
### **Step-by-Step Migration Process**
#### **1. Stop the Application**
```bash
# If running as systemd service
sudo systemctl stop water-monitor
# If running in screen/tmux
# Use Ctrl+C to stop the process
# If running as Docker container
docker stop water-monitor
```
#### **2. Backup Your Database**
```bash
# SQLite backup
cp water_monitoring.db water_monitoring.db.backup
# PostgreSQL backup
pg_dump water_monitoring > water_monitoring_backup.sql
# MySQL backup
mysqldump water_monitoring > water_monitoring_backup.sql
```
#### **3. Run Migration Script**
```bash
# Default (uses environment variables)
python migrate_geolocation.py
# Or specify database path for SQLite
SQLITE_DB_PATH=/path/to/water_monitoring.db python migrate_geolocation.py
```
#### **4. Verify Migration**
```bash
# Check SQLite schema
sqlite3 water_monitoring.db ".schema stations"
# Check PostgreSQL schema
psql -d water_monitoring -c "\d stations"
# Check MySQL schema
mysql -e "DESCRIBE water_monitoring.stations"
```
#### **5. Update Application Code**
Ensure you have the latest version of the application with geolocation support:
```bash
# Pull latest code
git pull origin main
# Install any new dependencies
pip install -r requirements.txt
```
#### **6. Restart Application**
```bash
# Systemd service
sudo systemctl start water-monitor
# Docker container
docker start water-monitor
# Manual execution
python water_scraper_v3.py
```
### **Migration Output Example**
```
2025-07-28 17:30:00,123 - INFO - Starting geolocation column migration...
2025-07-28 17:30:00,124 - INFO - Detected database type: SQLITE
2025-07-28 17:30:00,125 - INFO - Migrating SQLite database: water_monitoring.db
2025-07-28 17:30:00,126 - INFO - Current columns in stations table: ['id', 'station_code', 'thai_name', 'english_name', 'created_at', 'updated_at']
2025-07-28 17:30:00,127 - INFO - Added latitude column
2025-07-28 17:30:00,128 - INFO - Added longitude column
2025-07-28 17:30:00,129 - INFO - Added geohash column
2025-07-28 17:30:00,130 - INFO - Successfully added columns: latitude, longitude, geohash
2025-07-28 17:30:00,131 - INFO - Updated P.1 station with sample geolocation data
2025-07-28 17:30:00,132 - INFO - P.1 station geolocation: ('P.1', 15.6944, 100.2028, 'w5q6uuhvfcfp25')
2025-07-28 17:30:00,133 - INFO - ✅ Migration completed successfully!
2025-07-28 17:30:00,134 - INFO - You can now restart your water monitoring application
2025-07-28 17:30:00,135 - INFO - The system will automatically use the new geolocation columns
```
## 🔍 **Troubleshooting**
### **Migration Issues**
#### **Database Locked Error**
```bash
# Stop all processes using the database
sudo systemctl stop water-monitor
pkill -f water_scraper
# Wait a few seconds, then run migration
sleep 5
python migrate_geolocation.py
```
#### **Permission Denied**
```bash
# Check database file permissions
ls -la water_monitoring.db
# Fix permissions if needed
sudo chown $USER:$USER water_monitoring.db
chmod 664 water_monitoring.db
```
#### **Missing Dependencies**
```bash
# For PostgreSQL
pip install psycopg2-binary
# For MySQL
pip install pymysql
# For all databases
pip install -r requirements.txt
```
### **Verification Issues**
#### **Missing Coordinates**
If stations don't appear on the geomap:
1. Check if latitude/longitude are NULL in the database (see the query sketch below)
2. Verify geolocation data in station mapping
3. Ensure database schema includes geolocation columns
4. Run migration script if columns are missing
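For the first check in the list above, a quick query (SQLite syntax shown; the same SQL works through `psql` or `mysql`) lists stations that are missing coordinates:
```sql
-- Stations without coordinates will not be plotted on the geomap
SELECT station_code, english_name
FROM stations
WHERE latitude IS NULL OR longitude IS NULL;
```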
#### **Incorrect Positioning**
If stations appear in wrong locations:
1. Verify coordinate format (decimal degrees)
2. Check latitude/longitude order (lat first, lon second)
3. Validate geohash accuracy
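These checks can be scripted. A minimal sketch, assuming the optional `pygeohash` package (`pip install pygeohash`) and a rough bounding box for Thailand, flags out-of-range coordinates and geohash mismatches:
```python
import sqlite3

import pygeohash  # assumption: optional helper package, not a project dependency

conn = sqlite3.connect("water_monitoring.db")
rows = conn.execute(
    "SELECT station_code, latitude, longitude, geohash FROM stations "
    "WHERE latitude IS NOT NULL AND longitude IS NOT NULL"
)

for code, lat, lon, stored in rows:
    # Thailand lies roughly between 5-21 N and 97-106 E; values outside
    # this box usually mean swapped or mistyped coordinates.
    if not (5 <= lat <= 21 and 97 <= lon <= 106):
        print(f"{code}: coordinates look wrong ({lat}, {lon})")
    # Compare the stored geohash prefix against a freshly encoded one.
    if stored and stored[:5] != pygeohash.encode(lat, lon, precision=12)[:5]:
        print(f"{code}: geohash {stored} does not match coordinates")

conn.close()
```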
### **Rollback Procedure**
If migration causes issues:
#### **SQLite Rollback**
```bash
# Stop application
sudo systemctl stop water-monitor
# Restore backup
cp water_monitoring.db.backup water_monitoring.db
# Restart with old version
sudo systemctl start water-monitor
```
#### **PostgreSQL Rollback**
```sql
-- Remove added columns
ALTER TABLE stations DROP COLUMN IF EXISTS latitude;
ALTER TABLE stations DROP COLUMN IF EXISTS longitude;
ALTER TABLE stations DROP COLUMN IF EXISTS geohash;
```
#### **MySQL Rollback**
```sql
-- Remove added columns
ALTER TABLE stations DROP COLUMN latitude;
ALTER TABLE stations DROP COLUMN longitude;
ALTER TABLE stations DROP COLUMN geohash;
```
## 🎯 **Next Steps**
### **Immediate Actions**
1. **Gather Coordinates**: Collect GPS coordinates for all 16 stations
2. **Update Database**: Add coordinates to remaining stations
3. **Create Dashboards**: Build Grafana geomap visualizations
### **Future Enhancements**
1. **Automatic Geocoding**: API integration for address-to-coordinate conversion
2. **Mobile GPS**: Mobile app for field coordinate collection
3. **Satellite Integration**: Satellite imagery overlay in Grafana
4. **Geofencing**: Alert zones based on geographic boundaries
The geolocation functionality is now fully implemented and ready for use with Grafana's geomap visualization. Station P.1 (Nawarat Bridge) serves as a working example with complete coordinate data.
---
**File**: `docs/GITEA_WORKFLOWS.md`
# 🔄 Gitea Actions Workflows - Northern Thailand Ping River Monitor
## 📋 Overview
This document describes the Gitea Actions workflows configured for the Northern Thailand Ping River Monitor project. These workflows provide comprehensive CI/CD, security scanning, and documentation generation.
## 🚀 Available Workflows
### 1. **CI/CD Pipeline** (`.gitea/workflows/ci.yml`)
**Triggers:**
- Push to `main` or `develop` branches
- Pull requests to `main`
- Daily scheduled runs at 2 AM UTC
**Jobs:**
- **Test Suite**: Multi-version Python testing (3.9-3.12)
- **Code Quality**: Linting, formatting, and type checking
- **Build**: Docker image creation and testing
- **Integration Test**: Testing with VictoriaMetrics service
- **Deploy Staging**: Automatic deployment to staging (develop branch)
- **Deploy Production**: Manual deployment to production (main branch)
- **Performance Test**: Load testing after production deployment
**Key Features:**
- ✅ Multi-Python version testing
- ✅ Docker multi-architecture builds (amd64, arm64)
- ✅ Service integration testing
- ✅ Automatic staging deployment
- ✅ Manual production approval
- ✅ Performance validation
### 2. **Security & Dependency Updates** (`.gitea/workflows/security.yml`)
**Triggers:**
- Daily scheduled runs at 3 AM UTC
- Manual dispatch
- Changes to requirements files or Dockerfile
**Jobs:**
- **Dependency Scan**: Safety, Bandit, Semgrep security scans
- **Docker Security**: Trivy vulnerability scanning
- **License Check**: License compliance verification
- **Dependency Updates**: Automated update detection
- **Code Quality**: Complexity and maintainability analysis
**Key Features:**
- 🔒 Daily security scans
- 📦 Dependency vulnerability detection
- 📄 License compliance checking
- 🔄 Automated update notifications
- 📊 Code quality metrics
### 3. **Release Workflow** (`.gitea/workflows/release.yml`)
**Triggers:**
- Git tags matching `v*.*.*` pattern
- Manual dispatch with version input
**Jobs:**
- **Create Release**: Automated release creation with changelog
- **Test Release**: Comprehensive testing across Python versions
- **Build Release**: Multi-architecture Docker images with proper tags
- **Security Scan**: Trivy security scanning of release images
- **Deploy Release**: Production deployment with health checks
- **Validate Release**: Post-deployment validation and testing
**Key Features:**
- 🏷️ Automated release creation
- 📝 Changelog generation
- 🐳 Multi-architecture Docker builds
- 🔒 Security scanning
- ✅ Comprehensive validation
### 4. **Documentation** (`.gitea/workflows/docs.yml`)
**Triggers:**
- Changes to documentation files
- Changes to Python source files
- Manual dispatch
**Jobs:**
- **Validate Docs**: Link checking and structure validation
- **Generate API Docs**: OpenAPI specification generation
- **Build Sphinx Docs**: Comprehensive API documentation
- **Documentation Summary**: Build status and artifact summary
**Key Features:**
- 📚 Automated API documentation
- 🔗 Link validation
- 📖 Sphinx documentation generation
- ✅ Documentation completeness checking
## 🔧 Workflow Configuration
### **Required Secrets**
Configure these secrets in your Gitea repository settings:
```bash
GITEA_TOKEN # Gitea access token for container registry
SLACK_WEBHOOK_URL # Optional: Slack notifications
STAGING_WEBHOOK_URL # Optional: Staging deployment webhook
PRODUCTION_WEBHOOK_URL # Optional: Production deployment webhook
```
### **Environment Variables**
Key environment variables used across workflows:
```yaml
PYTHON_VERSION: '3.11' # Default Python version
REGISTRY: git.b4l.co.th # Container registry
IMAGE_NAME: grabowski/northern-thailand-ping-river-monitor
```
## 📊 Workflow Status
### **CI/CD Pipeline Status**
- **Test Coverage**: Multi-version Python testing
- **Code Quality**: Automated linting and formatting
- **Security**: Integrated security scanning
- **Deployment**: Automated staging, manual production
### **Security Monitoring**
- **Daily Scans**: Automated vulnerability detection
- **Dependency Updates**: Proactive update notifications
- **License Compliance**: Automated license checking
- **Code Quality**: Continuous quality monitoring
### **Release Management**
- **Automated Releases**: Tag-based release creation
- **Multi-Architecture**: Support for amd64 and arm64
- **Security Validation**: Pre-deployment security checks
- **Health Monitoring**: Post-deployment validation
## 🚀 Usage Examples
### **Triggering Workflows**
**Manual CI/CD Run:**
```bash
# Push to trigger CI/CD
git push origin main
# Create pull request to trigger testing
git checkout -b feature/new-feature
git push origin feature/new-feature
# Create PR in Gitea UI
```
**Manual Security Scan:**
```bash
# Trigger via Gitea Actions UI
# Go to Actions → Security & Dependency Updates → Run workflow
```
**Creating a Release:**
```bash
# Create and push a tag
git tag v3.1.1
git push origin v3.1.1
# Or use manual dispatch in Gitea Actions UI
```
### **Monitoring Workflow Results**
**Check Workflow Status:**
1. Navigate to your repository in Gitea
2. Click on "Actions" tab
3. View workflow runs and their status
**Download Artifacts:**
1. Click on a completed workflow run
2. Scroll to "Artifacts" section
3. Download reports and logs
**View Security Reports:**
1. Go to Security workflow runs
2. Download security-reports artifacts
3. Review JSON reports for vulnerabilities
## 🔍 Troubleshooting
### **Common Issues**
**Workflow Fails on Dependencies:**
```bash
# Check requirements.txt for version conflicts
pip-compile requirements.in
```
**Docker Build Fails:**
```bash
# Test Docker build locally
make docker-build
docker run --rm ping-river-monitor python run.py --test
```
**Security Scan Failures:**
```bash
# Run security scans locally
safety check -r requirements.txt
bandit -r src/
```
**Test Failures:**
```bash
# Run tests locally
make test
python tests/test_integration.py
```
### **Debugging Workflows**
**Enable Debug Logging:**
Add to workflow file:
```yaml
env:
ACTIONS_STEP_DEBUG: true
ACTIONS_RUNNER_DEBUG: true
```
**Check Workflow Logs:**
1. Go to failed workflow run
2. Click on failed job
3. Expand failed step to see detailed logs
**Validate Workflow Syntax:**
```bash
# Validate YAML syntax
make validate-workflows
```
## 📈 Performance Optimization
### **Caching Strategy**
- **Pip Cache**: Cached across workflow runs
- **Docker Layer Cache**: GitHub Actions cache for faster builds
- **Dependency Cache**: Cached based on requirements.txt hash
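As an illustration, a pip-cache step inside one of these workflows could look like the sketch below (the actual step names and cache action version in `.gitea/workflows/*.yml` may differ):
```yaml
# Sketch: cache pip downloads, keyed on the requirements.txt hash
- name: Cache pip dependencies
  uses: actions/cache@v3
  with:
    path: ~/.cache/pip
    key: pip-${{ runner.os }}-${{ hashFiles('requirements.txt') }}
    restore-keys: |
      pip-${{ runner.os }}-
```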
### **Parallel Execution**
- **Matrix Builds**: Multiple Python versions tested in parallel
- **Independent Jobs**: Security scans run independently of tests
- **Conditional Execution**: Jobs skip when not needed
### **Resource Management**
- **Timeout Settings**: Prevent hanging workflows
- **Resource Limits**: Appropriate runner sizing
- **Artifact Cleanup**: Automatic cleanup of old artifacts
## 🔒 Security Best Practices
### **Secret Management**
- Use Gitea repository secrets for sensitive data
- Never commit secrets to repository
- Rotate secrets regularly
- Use least-privilege access tokens
### **Container Security**
- Multi-stage Docker builds for smaller images
- Non-root user in containers
- Regular base image updates
- Vulnerability scanning before deployment
### **Code Security**
- Automated security scanning in CI/CD
- Dependency vulnerability monitoring
- License compliance checking
- Code quality enforcement
## 📚 Additional Resources
### **Gitea Actions Documentation**
- [Gitea Actions Overview](https://docs.gitea.io/en-us/usage/actions/)
- [Workflow Syntax](https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions)
- [Available Actions](https://github.com/marketplace?type=actions)
### **Project-Specific Resources**
- [Contributing Guide](../CONTRIBUTING.md)
- [Deployment Checklist](../DEPLOYMENT_CHECKLIST.md)
- [Project Structure](PROJECT_STRUCTURE.md)
### **Monitoring and Alerts**
- Workflow status badges in README
- Email notifications for failures
- Slack/Discord integration for team updates
- Grafana dashboards for deployment metrics
---
**Workflow Version**: v3.1.0
**Last Updated**: 2025-08-12
**Maintained By**: Ping River Monitor Team
---
**File**: `docs/HTTPS_CONFIGURATION.md`
# HTTPS VictoriaMetrics Configuration Guide
This guide explains how to configure the Thailand Water Monitor to connect to VictoriaMetrics through HTTPS and reverse proxies.
## Configuration Options
### 1. Environment Variables for HTTPS
```bash
# Option 1: Full HTTPS URL (Recommended)
export DB_TYPE=victoriametrics
export VM_HOST=https://vm.example.com
export VM_PORT=443
# Option 2: Host and port separately
export DB_TYPE=victoriametrics
export VM_HOST=vm.example.com
export VM_PORT=443
# Option 3: Custom port with HTTPS
export DB_TYPE=victoriametrics
export VM_HOST=https://vm.example.com
export VM_PORT=8443
```
### 2. Windows PowerShell Configuration
```powershell
# Set environment variables for HTTPS
$env:DB_TYPE="victoriametrics"
$env:VM_HOST="https://vm.example.com"
$env:VM_PORT="443"
# Run the water monitor
python water_scraper_v3.py
```
### 3. Linux/Mac Configuration
```bash
# Set environment variables for HTTPS
export DB_TYPE=victoriametrics
export VM_HOST=https://vm.example.com
export VM_PORT=443
# Run the water monitor
python water_scraper_v3.py
```
## Reverse Proxy Examples
### 1. Nginx Reverse Proxy
```nginx
server {
listen 443 ssl http2;
server_name vm.example.com;
# SSL Configuration
ssl_certificate /path/to/certificate.crt;
ssl_certificate_key /path/to/private.key;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-GCM-SHA384;
# Security headers
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
add_header X-Frame-Options DENY always;
add_header X-Content-Type-Options nosniff always;
# Optional: Basic authentication
# auth_basic "VictoriaMetrics";
# auth_basic_user_file /etc/nginx/.htpasswd;
location / {
proxy_pass http://localhost:8428;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# WebSocket support (if needed)
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
# Timeouts
proxy_connect_timeout 60s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
}
}
# Redirect HTTP to HTTPS
server {
listen 80;
server_name vm.example.com;
return 301 https://$server_name$request_uri;
}
```
### 2. Apache Reverse Proxy
```apache
<VirtualHost *:443>
ServerName vm.example.com
# SSL Configuration
SSLEngine on
SSLCertificateFile /path/to/certificate.crt
SSLCertificateKeyFile /path/to/private.key
SSLProtocol all -SSLv3 -TLSv1 -TLSv1.1
SSLCipherSuite ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384
# Security headers
Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
Header always set X-Frame-Options DENY
Header always set X-Content-Type-Options nosniff
# Reverse proxy configuration
ProxyPreserveHost On
ProxyPass / http://localhost:8428/
ProxyPassReverse / http://localhost:8428/
# Optional: Basic authentication
# AuthType Basic
# AuthName "VictoriaMetrics"
# AuthUserFile /etc/apache2/.htpasswd
# Require valid-user
</VirtualHost>
<VirtualHost *:80>
ServerName vm.example.com
Redirect permanent / https://vm.example.com/
</VirtualHost>
```
### 3. Traefik Reverse Proxy
```yaml
# docker-compose.yml with Traefik
version: '3.8'
services:
traefik:
image: traefik:v2.10
command:
- --api.dashboard=true
- --entrypoints.web.address=:80
- --entrypoints.websecure.address=:443
- --providers.docker=true
- --certificatesresolvers.letsencrypt.acme.tlschallenge=true
- --certificatesresolvers.letsencrypt.acme.email=admin@example.com
- --certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json
ports:
- "80:80"
- "443:443"
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- letsencrypt:/letsencrypt
labels:
- traefik.http.routers.api.rule=Host(`traefik.example.com`)
- traefik.http.routers.api.tls.certresolver=letsencrypt
victoriametrics:
image: victoriametrics/victoria-metrics:latest
command:
- '--storageDataPath=/victoria-metrics-data'
- '--retentionPeriod=2y'
- '--httpListenAddr=:8428'
volumes:
- vm_data:/victoria-metrics-data
labels:
- traefik.enable=true
- traefik.http.routers.vm.rule=Host(`vm.example.com`)
- traefik.http.routers.vm.tls.certresolver=letsencrypt
- traefik.http.services.vm.loadbalancer.server.port=8428
volumes:
vm_data:
letsencrypt:
```
## Testing HTTPS Configuration
### 1. Test Connection
```bash
# Test HTTPS connection
curl -k https://vm.example.com/health
# Test with specific port
curl -k https://vm.example.com:8443/health
# Test API endpoint
curl -k "https://vm.example.com/api/v1/query?query=up"
```
### 2. Test with Water Monitor
```bash
# Set environment variables
export DB_TYPE=victoriametrics
export VM_HOST=https://vm.example.com
export VM_PORT=443
# Test with demo script
python demo_databases.py victoriametrics
# Run full water monitor
python water_scraper_v3.py
```
### 3. Verify SSL Certificate
```bash
# Check SSL certificate
openssl s_client -connect vm.example.com:443 -servername vm.example.com
# Check certificate expiration
echo | openssl s_client -connect vm.example.com:443 2>/dev/null | openssl x509 -noout -dates
```
## Configuration Examples
### 1. Production HTTPS Setup
```bash
# Environment variables for production
export DB_TYPE=victoriametrics
export VM_HOST=https://metrics.company.com
export VM_PORT=443
export LOG_LEVEL=INFO
export SCRAPING_INTERVAL_HOURS=1
# Run water monitor
python water_scraper_v3.py
```
### 2. Development with Self-Signed Certificate
```bash
# For development with self-signed certificates
export DB_TYPE=victoriametrics
export VM_HOST=https://dev-vm.local
export VM_PORT=443
export PYTHONHTTPSVERIFY=0 # Disable SSL verification (dev only)
python water_scraper_v3.py
```
### 3. Custom Port Configuration
```bash
# Custom HTTPS port
export DB_TYPE=victoriametrics
export VM_HOST=https://vm.example.com
export VM_PORT=8443
python water_scraper_v3.py
```
## Troubleshooting HTTPS Issues
### 1. SSL Certificate Errors
```bash
# Error: SSL certificate verify failed
# Solution: Check certificate validity
openssl x509 -in certificate.crt -text -noout
# Temporary workaround (not recommended for production)
export PYTHONHTTPSVERIFY=0
```
### 2. Connection Timeout
```bash
# Error: Connection timeout
# Check firewall and network connectivity
telnet vm.example.com 443
nc -zv vm.example.com 443
```
### 3. DNS Resolution Issues
```bash
# Error: Name resolution failed
# Check DNS resolution
nslookup vm.example.com
dig vm.example.com
```
### 4. Proxy Configuration Issues
```bash
# Check proxy logs
# Nginx
tail -f /var/log/nginx/error.log
# Apache
tail -f /var/log/apache2/error.log
# Test direct connection to backend
curl http://localhost:8428/health
```
## Security Best Practices
### 1. SSL/TLS Configuration
- Use TLS 1.2 or higher
- Disable weak ciphers
- Enable HSTS headers
- Use strong SSL certificates
### 2. Authentication
```nginx
# Basic authentication in Nginx
auth_basic "VictoriaMetrics Access";
auth_basic_user_file /etc/nginx/.htpasswd;
# Create the password file (run in a shell, not inside nginx.conf):
# htpasswd -c /etc/nginx/.htpasswd username
```
### 3. Network Security
- Use firewall rules to restrict access
- Consider VPN for internal access
- Implement rate limiting
- Monitor access logs
### 4. Certificate Management
```bash
# Auto-renewal with Let's Encrypt
certbot renew --dry-run
# Certificate monitoring
echo | openssl s_client -connect vm.example.com:443 2>/dev/null | \
openssl x509 -noout -dates | grep notAfter
```
## Docker Configuration for HTTPS
### 1. Docker Compose with HTTPS
```yaml
version: '3.8'
services:
water-monitor:
build: .
environment:
- DB_TYPE=victoriametrics
- VM_HOST=https://vm.example.com
- VM_PORT=443
restart: unless-stopped
depends_on:
- victoriametrics
victoriametrics:
image: victoriametrics/victoria-metrics:latest
ports:
- "8428:8428"
volumes:
- vm_data:/victoria-metrics-data
command:
- '--storageDataPath=/victoria-metrics-data'
- '--retentionPeriod=2y'
- '--httpListenAddr=:8428'
volumes:
vm_data:
```
### 2. Environment File (.env)
```bash
# .env file
DB_TYPE=victoriametrics
VM_HOST=https://vm.example.com
VM_PORT=443
LOG_LEVEL=INFO
SCRAPING_INTERVAL_HOURS=1
```
This configuration guide provides comprehensive instructions for setting up HTTPS connectivity to VictoriaMetrics through reverse proxies, ensuring secure and reliable data transmission for the Thailand Water Monitor.
---
# Geolocation Migration Quick Start
This is a quick reference guide for updating a running Thailand Water Monitor system to add geolocation support for Grafana geomap.
## 🚀 **Quick Migration (5 minutes)**
### **Step 1: Stop Application**
```bash
# Stop the service (choose your method)
sudo systemctl stop water-monitor
# OR
docker stop water-monitor
# OR use Ctrl+C if running manually
```
### **Step 2: Backup Database**
```bash
# SQLite backup
cp water_monitoring.db water_monitoring.db.backup
# PostgreSQL backup
pg_dump water_monitoring > backup.sql
# MySQL backup
mysqldump water_monitoring > backup.sql
```
### **Step 3: Run Migration**
```bash
# Run the automated migration script
python migrate_geolocation.py
```
### **Step 4: Restart Application**
```bash
# Restart the service
sudo systemctl start water-monitor
# OR
docker start water-monitor
# OR
python water_scraper_v3.py
```
## ✅ **Expected Output**
```
2025-07-28 17:30:00,123 - INFO - Starting geolocation column migration...
2025-07-28 17:30:00,124 - INFO - Detected database type: SQLITE
2025-07-28 17:30:00,127 - INFO - Added latitude column
2025-07-28 17:30:00,128 - INFO - Added longitude column
2025-07-28 17:30:00,129 - INFO - Added geohash column
2025-07-28 17:30:00,133 - INFO - ✅ Migration completed successfully!
```
## 🗺️ **Verify Geolocation Works**
### **Check Database**
```bash
# SQLite
sqlite3 water_monitoring.db "SELECT station_code, latitude, longitude, geohash FROM stations WHERE station_code = 'P.1';"
# Expected output: P.1|15.6944|100.2028|w5q6uuhvfcfp25
```
### **Test Application**
```bash
# Run a test cycle
python water_scraper_v3.py --test
# Should complete without errors
```
## 🔧 **Grafana Setup**
### **Query for Geomap**
```sql
SELECT
s.latitude, s.longitude, s.station_code, s.english_name,
m.water_level, m.discharge_percent
FROM stations s
JOIN water_measurements m ON s.id = m.station_id
WHERE s.latitude IS NOT NULL
AND m.timestamp = (SELECT MAX(timestamp) FROM water_measurements WHERE station_id = s.id)
```
### **Geomap Configuration**
1. Create new panel → Select "Geomap"
2. Set **Latitude field**: `latitude`
3. Set **Longitude field**: `longitude`
4. Set **Color field**: `water_level`
5. Set **Size field**: `discharge_percent`
## 🚨 **Troubleshooting**
### **Database Locked**
```bash
sudo systemctl stop water-monitor
pkill -f water_scraper
sleep 5
python migrate_geolocation.py
```
### **Permission Error**
```bash
sudo chown $USER:$USER water_monitoring.db
chmod 664 water_monitoring.db
```
### **Missing Dependencies**
```bash
pip install psycopg2-binary pymysql
```
## 🔄 **Rollback (if needed)**
```bash
# Stop application
sudo systemctl stop water-monitor
# Restore backup
cp water_monitoring.db.backup water_monitoring.db
# Restart
sudo systemctl start water-monitor
```
## 📚 **More Information**
- **Full Guide**: See `GEOLOCATION_GUIDE.md`
- **Migration Script**: `migrate_geolocation.py`
- **Database Schema**: Updated with latitude, longitude, geohash columns
## 🎯 **What You Get**
- **P.1 Station** ready for geomap (Nawarat Bridge)
- **Database Schema** updated for all 16 stations
- **Grafana Compatible** data structure
- **Backward Compatible** - existing data preserved
**Total Time**: ~5 minutes for complete migration
---
**File**: `docs/PROJECT_STATUS.md`
# Thailand Water Monitor - Current Project Status
## 📁 **Clean Project Structure**
The project has been cleaned up and organized with the following structure:
```
water_level_monitor/
├── 📄 .gitignore # Git ignore rules
├── 📄 README.md # Main project documentation
├── 📄 requirements.txt # Python dependencies
├── 📄 config.py # Configuration management
├── 📄 water_scraper_v3.py # Main application (15-min scheduler)
├── 📄 database_adapters.py # Multi-database support
├── 📄 demo_databases.py # Database demonstration
├── 📄 Dockerfile # Container configuration
├── 📄 docker-compose.victoriametrics.yml # VictoriaMetrics stack
├── 📚 Documentation/
│ ├── 📄 DATABASE_DEPLOYMENT_GUIDE.md # Multi-database setup guide
│ ├── 📄 DEBIAN_TROUBLESHOOTING.md # Linux deployment guide
│ ├── 📄 ENHANCED_SCHEDULER_GUIDE.md # 15-minute scheduler guide
│ ├── 📄 GAP_FILLING_GUIDE.md # Data gap filling guide
│ ├── 📄 HTTPS_CONFIGURATION.md # HTTPS setup guide
│ └── 📄 VICTORIAMETRICS_SETUP.md # VictoriaMetrics guide
└── 📁 grafana/ # Grafana configuration
├── 📁 provisioning/
│ ├── 📁 datasources/
│ │ └── 📄 victoriametrics.yml # VictoriaMetrics data source
│ └── 📁 dashboards/
│ └── 📄 dashboard.yml # Dashboard provider config
└── 📁 dashboards/
└── 📄 water-monitoring-dashboard.json # Pre-built dashboard
```
## 🧹 **Files Removed During Cleanup**
### **Old Data Files**
- `thailand_water_data_v2.csv` - Old CSV export
- `water_monitor.log` - Log file (regenerated automatically)
- `water_monitoring.db` - SQLite database (recreated automatically)
### **Outdated Documentation**
- `FINAL_SUMMARY.md` - Contained references to non-existent v2 files
- `PROJECT_SUMMARY.md` - Outdated project information
### **System Files**
- `__pycache__/` - Python compiled files directory
## ✅ **Current Features**
### **Enhanced 15-Minute Scheduler**
- **Timing**: Runs every 15 minutes (1:00, 1:15, 1:30, 1:45, 2:00, etc.)
- **Full Checks**: At :00 minutes (gap filling + data updates)
- **Quick Checks**: At :15, :30, :45 minutes (data fetch only)
- **Gap Filling**: Automatically fills missing historical data
- **Data Updates**: Updates existing records when values change
### **Multi-Database Support**
- **VictoriaMetrics** (Recommended) - High-performance time-series
- **InfluxDB** - Purpose-built time-series database
- **PostgreSQL + TimescaleDB** - Relational with time-series optimization
- **MySQL** - Traditional relational database
- **SQLite** - Local development and testing
### **Production Features**
- **Docker Support**: Complete containerization
- **Grafana Integration**: Pre-built dashboards
- **HTTPS Configuration**: Secure deployment options
- **Health Monitoring**: Comprehensive logging and error handling
- **Gap Detection**: Automatic identification of missing data
- **Retry Logic**: Database lock handling and network error recovery
## 🚀 **Quick Start**
### **1. Basic Setup (SQLite)**
```bash
cd water_level_monitor
pip install -r requirements.txt
python water_scraper_v3.py
```
### **2. VictoriaMetrics Setup**
```bash
# Start VictoriaMetrics + Grafana
docker-compose -f docker-compose.victoriametrics.yml up -d
# Configure environment
export DB_TYPE=victoriametrics
export VM_HOST=localhost
export VM_PORT=8428
# Run monitor
python water_scraper_v3.py
```
### **3. Test Different Databases**
```bash
# Test all supported databases
python demo_databases.py all
# Test specific database
python demo_databases.py victoriametrics
```
## 📊 **Data Collection**
### **Station Coverage**
- **16 Water Monitoring Stations** across Thailand
- **Accurate Station Codes**: P.1, P.20, P.21, P.4A, P.5, P.67, P.75, P.76, P.77, P.81, P.82, P.84, P.85, P.87, P.92, P.103
- **Bilingual Names**: Thai and English station identification
### **Metrics Collected**
- 🌊 **Water Level**: Measured in meters (m)
- 💧 **Discharge**: Measured in cubic meters per second (cms)
- 📊 **Discharge Percentage**: Relative to station capacity
- **Timestamp**: Hour 24 handling (midnight = 00:00 next day)
### **Data Frequency**
- **Every 15 Minutes**: Continuous monitoring
- **~300+ Data Points**: Per collection cycle
- **Automatic Gap Filling**: Historical data recovery
- **Data Updates**: Changed values detection and correction
## 🔧 **Command Line Tools**
### **Main Application**
```bash
python water_scraper_v3.py # Run continuous monitoring
python water_scraper_v3.py --test # Single test cycle
python water_scraper_v3.py --help # Show help
```
### **Gap Management**
```bash
python water_scraper_v3.py --check-gaps [days] # Check for missing data
python water_scraper_v3.py --fill-gaps [days] # Fill missing data gaps
python water_scraper_v3.py --update-data [days] # Update existing data
```
### **Database Testing**
```bash
python demo_databases.py # SQLite demo
python demo_databases.py victoriametrics # VictoriaMetrics demo
python demo_databases.py all # Test all databases
```
## 📈 **Monitoring & Visualization**
### **Grafana Dashboard**
- **URL**: http://localhost:3000 (when using docker-compose)
- **Username**: admin
- **Password**: admin_password
- **Features**: Time series charts, status tables, gauges, alerts
### **VictoriaMetrics API**
- **URL**: http://localhost:8428
- **Health**: http://localhost:8428/health
- **Metrics**: http://localhost:8428/metrics
- **Query API**: http://localhost:8428/api/v1/query
## 🛡️ **Security & Production**
### **HTTPS Configuration**
- Complete guide in `HTTPS_CONFIGURATION.md`
- SSL certificate setup
- Reverse proxy configuration
- Security best practices
### **Deployment Options**
- **Docker**: Containerized deployment
- **Systemd**: Linux service configuration
- **Cloud**: AWS, GCP, Azure deployment guides
- **Monitoring**: Health checks and alerting
## 📚 **Documentation**
### **Available Guides**
1. **README.md** - Main project documentation
2. **DATABASE_DEPLOYMENT_GUIDE.md** - Multi-database setup
3. **ENHANCED_SCHEDULER_GUIDE.md** - 15-minute scheduler details
4. **GAP_FILLING_GUIDE.md** - Data integrity and gap filling
5. **DEBIAN_TROUBLESHOOTING.md** - Linux deployment troubleshooting
6. **VICTORIAMETRICS_SETUP.md** - VictoriaMetrics configuration
7. **HTTPS_CONFIGURATION.md** - Secure deployment setup
### **Key Features Documented**
- ✅ Installation and configuration
- ✅ Multi-database support
- ✅ 15-minute scheduling system
- ✅ Gap filling and data integrity
- ✅ Production deployment
- ✅ Monitoring and troubleshooting
- ✅ Security configuration
## 🎯 **Project Status: PRODUCTION READY**
The Thailand Water Monitor is now:
- **Clean**: All old and redundant files removed
- **Organized**: Clear project structure with proper documentation
- **Enhanced**: 15-minute scheduling with gap filling
- **Scalable**: Multi-database support with VictoriaMetrics
- **Secure**: HTTPS configuration and security best practices
- **Monitored**: Comprehensive logging and Grafana dashboards
- **Documented**: Complete guides for all features and deployment options
The project is ready for production deployment with professional-grade monitoring capabilities.
---
**File**: `docs/PROJECT_STRUCTURE.md`
# 🏗️ Project Structure - Northern Thailand Ping River Monitor
## 📁 Directory Layout
```
Northern-Thailand-Ping-River-Monitor/
├── 📁 src/ # Main application source code
│ ├── __init__.py # Package initialization
│ ├── main.py # CLI entry point and main application
│ ├── water_scraper_v3.py # Core data collection engine
│ ├── web_api.py # FastAPI web interface
│ ├── config.py # Configuration management
│ ├── database_adapters.py # Database abstraction layer
│ ├── models.py # Data models and type definitions
│ ├── exceptions.py # Custom exception classes
│ ├── validators.py # Data validation layer
│ ├── metrics.py # Metrics collection system
│ ├── health_check.py # Health monitoring system
│ ├── rate_limiter.py # Rate limiting and request tracking
│ └── logging_config.py # Enhanced logging configuration
├── 📁 docs/ # Documentation files
│ ├── STATION_MANAGEMENT_GUIDE.md # Station management documentation
│ ├── ENHANCEMENT_SUMMARY.md # Feature enhancement summary
│ └── PROJECT_STRUCTURE.md # This file
├── 📁 scripts/ # Utility scripts
│ └── migrate_geolocation.py # Database migration script
├── 📁 grafana/ # Grafana configuration
│ ├── dashboards/ # Dashboard definitions
│ └── provisioning/ # Grafana provisioning config
├── 📁 tests/ # Test files
│ ├── test_integration.py # Integration test suite
│ ├── test_station_management.py # Station management tests
│ └── test_api.py # API endpoint tests
├── 📄 run.py # Simple startup script
├── 📄 requirements.txt # Production dependencies
├── 📄 requirements-dev.txt # Development dependencies
├── 📄 setup.py # Package installation script
├── 📄 Dockerfile # Docker container definition
├── 📄 docker-compose.victoriametrics.yml # Complete stack deployment
├── 📄 Makefile # Common development tasks
├── 📄 .env.example # Environment configuration template
├── 📄 .gitignore # Git ignore patterns
├── 📄 .gitlab-ci.yml # CI/CD pipeline configuration
├── 📄 LICENSE # MIT license
├── 📄 README.md # Main project documentation
└── 📄 CONTRIBUTING.md # Contribution guidelines
```
## 🔧 Core Components
### **Application Layer**
- **`src/main.py`** - Command-line interface and application orchestration
- **`src/web_api.py`** - FastAPI web interface with REST endpoints
- **`src/water_scraper_v3.py`** - Core data collection and processing engine
### **Data Layer**
- **`src/database_adapters.py`** - Multi-database support (SQLite, MySQL, PostgreSQL, InfluxDB, VictoriaMetrics)
- **`src/models.py`** - Pydantic data models and type definitions
- **`src/validators.py`** - Data validation and sanitization
### **Infrastructure Layer**
- **`src/config.py`** - Configuration management with environment variable support
- **`src/logging_config.py`** - Structured logging with rotation and colors
- **`src/metrics.py`** - Application metrics collection (counters, gauges, histograms)
- **`src/health_check.py`** - System health monitoring and status checks
### **Utility Layer**
- **`src/exceptions.py`** - Custom exception hierarchy
- **`src/rate_limiter.py`** - API rate limiting and request tracking
## 🌐 Web API Structure
### **Endpoints Organization**
```
/ # Dashboard homepage
├── /health # System health status
├── /metrics # Application metrics
├── /config # Configuration (masked)
├── /stations # Station management
│ ├── GET / # List all stations
│ ├── POST / # Create new station
│ ├── GET /{id} # Get specific station
│ ├── PUT /{id} # Update station
│ └── DELETE /{id} # Delete station
├── /measurements # Data access
│ ├── /latest # Latest measurements
│ └── /station/{code} # Station-specific data
└── /scraping # Data collection control
├── /trigger # Manual data collection
└── /status # Scraping status
```
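For example, manual collection can be triggered and checked from the command line (a sketch; the trigger endpoint is assumed to accept POST, as is conventional for action endpoints):
```bash
# Trigger a manual data-collection cycle (assumed POST)
curl -X POST http://localhost:8000/scraping/trigger

# Check the current scraping status
curl http://localhost:8000/scraping/status
```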
### **API Models**
- **Request Models**: Station creation/update, query parameters
- **Response Models**: Station info, measurements, health status
- **Error Models**: Standardized error responses
## 🗄️ Database Architecture
### **Supported Databases**
1. **SQLite** - Local development and testing
2. **MySQL** - Traditional relational database
3. **PostgreSQL** - Advanced relational with TimescaleDB support
4. **InfluxDB** - Purpose-built time-series database
5. **VictoriaMetrics** - High-performance metrics storage
### **Schema Design**
```sql
-- Stations table
stations (
id INTEGER PRIMARY KEY,
station_code VARCHAR(10) UNIQUE,
thai_name VARCHAR(255),
english_name VARCHAR(255),
latitude DECIMAL(10,8),
longitude DECIMAL(11,8),
geohash VARCHAR(20),
status VARCHAR(20),
created_at TIMESTAMP,
updated_at TIMESTAMP
)
-- Measurements table
water_measurements (
id BIGINT PRIMARY KEY,
timestamp DATETIME,
station_id INTEGER,
water_level DECIMAL(10,3),
discharge DECIMAL(10,2),
discharge_percent DECIMAL(5,2),
status VARCHAR(20),
created_at TIMESTAMP,
FOREIGN KEY (station_id) REFERENCES stations(id),
UNIQUE(timestamp, station_id)
)
```
## 🐳 Docker Architecture
### **Multi-Stage Build**
1. **Builder Stage** - Compile dependencies and build artifacts
2. **Production Stage** - Minimal runtime environment
### **Service Composition**
- **ping-river-monitor** - Data collection service
- **ping-river-api** - Web API service
- **victoriametrics** - Time-series database
- **grafana** - Visualization dashboard
## 📊 Monitoring Architecture
### **Metrics Collection**
- **Counters** - API requests, database operations, scraping cycles
- **Gauges** - Current values, connection status, resource usage
- **Histograms** - Response times, processing durations
### **Health Checks**
- **Database Health** - Connection status, data freshness
- **API Health** - External API availability, response times
- **System Health** - Memory usage, disk space, CPU load
### **Logging Levels**
- **DEBUG** - Detailed execution information
- **INFO** - General operational messages
- **WARNING** - Potential issues and recoverable errors
- **ERROR** - Serious problems requiring attention
- **CRITICAL** - System-threatening issues
## 🔧 Configuration Management
### **Environment Variables**
```bash
# Database
DB_TYPE=victoriametrics
VM_HOST=localhost
VM_PORT=8428
# Application
SCRAPING_INTERVAL_HOURS=1
LOG_LEVEL=INFO
DATA_RETENTION_DAYS=365
# Security
SECRET_KEY=your-secret-key
API_KEY=your-api-key
```
### **Configuration Hierarchy**
1. Environment variables (highest priority)
2. .env file
3. Default values in config.py (lowest priority)
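A minimal sketch of this precedence using `python-dotenv` (the real `src/config.py` may be structured differently):
```python
import os

from dotenv import load_dotenv

# Existing process environment variables win because override=False
# prevents values from the .env file from replacing them.
load_dotenv(override=False)

# The second argument to os.getenv supplies the lowest-priority default.
DB_TYPE = os.getenv("DB_TYPE", "sqlite")
VM_HOST = os.getenv("VM_HOST", "localhost")
VM_PORT = int(os.getenv("VM_PORT", "8428"))
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
```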
## 🧪 Testing Architecture
### **Test Categories**
- **Unit Tests** - Individual component testing
- **Integration Tests** - System component interaction
- **API Tests** - Endpoint functionality and responses
- **Performance Tests** - Load and stress testing
### **Test Data**
- **Mock Data** - Simulated API responses
- **Test Database** - Isolated test environment
- **Fixtures** - Reusable test data sets
## 📦 Deployment Architecture
### **Development**
```bash
python run.py --web-api # Local development server
```
### **Production**
```bash
docker-compose up -d # Full stack deployment
```
### **CI/CD Pipeline**
1. **Test Stage** - Run all tests and quality checks
2. **Build Stage** - Create Docker images
3. **Deploy Stage** - Deploy to staging/production
4. **Health Check** - Verify deployment success
## 🔒 Security Architecture
### **Input Validation**
- Pydantic models for API requests
- Data range validation for measurements
- SQL injection prevention through ORM
### **Authentication** (Future)
- API key authentication
- JWT token support
- Role-based access control
### **Data Protection**
- Environment variable configuration
- Sensitive data masking in logs
- HTTPS support for production
## 📈 Performance Architecture
### **Optimization Strategies**
- Database connection pooling
- Query optimization and indexing
- Response caching for static data
- Async processing for I/O operations
### **Scalability Considerations**
- Horizontal scaling with load balancers
- Database read replicas
- Microservice architecture readiness
- Container orchestration support
## 🔄 Data Flow Architecture
### **Collection Flow**
```
External API → Rate Limiter → Data Validator → Database Adapter → Database
```
### **API Flow**
```
HTTP Request → FastAPI → Business Logic → Database Adapter → HTTP Response
```
### **Monitoring Flow**
```
Application Events → Metrics Collector → Health Checks → Monitoring Dashboard
```
This architecture provides a solid foundation for a production-ready water monitoring system with excellent maintainability, scalability, and observability.
---
# 🏔️ Station Management Guide - Northern Thailand Ping River Monitor
## 🎯 **Overview**
The Northern Thailand Ping River Monitor now includes comprehensive station management capabilities, allowing you to dynamically add, update, and remove monitoring stations through the web API.
## 🌊 **Current Coverage**
The system currently monitors **16 water stations** along the Ping River Basin:
### **Upper Ping River (Chiang Mai Province)**
- **P.20** - Ban Chiang Dao (บ้านเชียงดาว)
- **P.75** - Ban Chai Lat (บ้านช่อแล)
- **P.92** - Ban Muang Aut (บ้านเมืองกึ๊ด)
- **P.4A** - Ban Mae Taeng (บ้านแม่แตง)
- **P.67** - Ban Tae (บ้านแม่แต)
- **P.21** - Ban Rim Tai (บ้านริมใต้)
- **P.103** - Ring Bridge 3 (สะพานวงแหวนรอบ 3)
### **Middle Ping River**
- **P.1** - Nawarat Bridge (สะพานนวรัฐ) - *Main reference station*
- **P.82** - Ban Sob win (บ้านสบวิน)
- **P.84** - Ban Panton (บ้านพันตน)
- **P.81** - Ban Pong (บ้านโป่ง)
- **P.5** - Tha Nang Bridge (สะพานท่านาง)
### **Lower Ping River**
- **P.77** - Baan Sop Mae Sapuord (บ้านสบแม่สะป๊วด)
- **P.87** - Ban Pa Sang (บ้านป่าซาง)
- **P.76** - Banb Mae I Hai (บ้านแม่อีไฮ)
- **P.85** - Baan Lai Kaew (บ้านหล่ายแก้ว)
## 🔧 **Station Management API**
### **List All Stations**
```bash
GET /stations
```
**Response:**
```json
[
{
"station_id": 1,
"station_code": "P.20",
"thai_name": "บ้านเชียงดาว",
"english_name": "Ban Chiang Dao",
"latitude": 19.36731448032191,
"longitude": 98.9688487015384,
"geohash": null,
"status": "active"
}
]
```
### **Get Specific Station**
```bash
GET /stations/{station_id}
```
### **Add New Station**
```bash
POST /stations
Content-Type: application/json
{
"station_code": "P.NEW",
"thai_name": "สถานีใหม่",
"english_name": "New Station",
"latitude": 18.7875,
"longitude": 99.0045,
"geohash": "w5q6uuhvfcfp25",
"status": "active"
}
```
### **Update Station Information**
```bash
PUT /stations/{station_id}
Content-Type: application/json
{
"thai_name": "ชื่อใหม่",
"english_name": "Updated Name",
"latitude": 18.8000,
"longitude": 99.0100
}
```
### **Delete Station**
```bash
DELETE /stations/{station_id}
```
## 🧪 **Testing Station Management**
Use the provided test script to verify station management functionality:
```bash
# Test all station management endpoints
python test_station_management.py
```
This will:
1. List existing stations
2. Create a test station
3. Retrieve station details
4. Update station information
5. Verify changes
6. Delete the test station
7. Confirm deletion
## 📊 **Station Data Model**
### **Required Fields**
- `station_code`: Unique identifier (e.g., "P.1", "P.20")
- `thai_name`: Thai language name
- `english_name`: English language name
### **Optional Fields**
- `latitude`: GPS latitude coordinate (-90 to 90)
- `longitude`: GPS longitude coordinate (-180 to 180)
- `geohash`: Geohash string for location
- `status`: Station status ("active", "inactive", "maintenance", "error")
### **Validation Rules**
- Station codes must be unique
- Latitude must be between -90 and 90
- Longitude must be between -180 and 180
- Names cannot be empty
- Status must be a valid enum value
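A sketch of how these rules map onto a Pydantic request model (class name and field constraints are illustrative; the actual models live in `src/models.py` and may differ):
```python
from enum import Enum
from typing import Optional

from pydantic import BaseModel, Field


class StationStatus(str, Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"
    MAINTENANCE = "maintenance"
    ERROR = "error"


class StationCreateRequest(BaseModel):
    # min_length=1 enforces "names cannot be empty"; station_code
    # uniqueness still has to be checked against the database.
    station_code: str = Field(..., min_length=1)
    thai_name: str = Field(..., min_length=1)
    english_name: str = Field(..., min_length=1)
    latitude: Optional[float] = Field(None, ge=-90, le=90)
    longitude: Optional[float] = Field(None, ge=-180, le=180)
    geohash: Optional[str] = None
    status: StationStatus = StationStatus.ACTIVE
```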
## 🌐 **Web Interface**
Access the station management interface through the web dashboard:
1. **Start the API server:**
```bash
python run.py --web-api
```
2. **Open your browser:**
- Dashboard: http://localhost:8000
- API Documentation: http://localhost:8000/docs
3. **Use the interactive API docs** to test station management endpoints
## 🔄 **Integration with Data Collection**
- **Dynamic Station Discovery**: New stations are automatically included in data collection
- **Real-time Updates**: Station information changes are reflected immediately
- **Data Continuity**: Historical data is preserved when updating station details
- **Error Handling**: Invalid stations are skipped during data collection
## 📍 **Geographic Coverage**
The Ping River Basin monitoring network covers:
- **Total Distance**: ~400 km from Chiang Dao to Nakhon Sawan
- **Elevation Range**: 300m to 1,200m above sea level
- **Catchment Area**: ~25,000 km²
- **Major Cities**: Chiang Mai, Lamphun, Tak, Nakhon Sawan
## 🚀 **Usage Examples**
### **Add a New Upstream Station**
```bash
curl -X POST "http://localhost:8000/stations" \
-H "Content-Type: application/json" \
-d '{
"station_code": "P.UPSTREAM",
"thai_name": "สถานีต้นน้ำ",
"english_name": "Upstream Station",
"latitude": 19.5000,
"longitude": 98.9000,
"status": "active"
}'
```
### **Update Station Coordinates**
```bash
curl -X PUT "http://localhost:8000/stations/1" \
-H "Content-Type: application/json" \
-d '{
"latitude": 19.3700,
"longitude": 98.9700
}'
```
### **Mark Station for Maintenance**
```bash
curl -X PUT "http://localhost:8000/stations/5" \
-H "Content-Type: application/json" \
-d '{
"status": "maintenance"
}'
```
## 🔒 **Best Practices**
### **Station Naming**
- Use consistent code format (P.XX)
- Include both Thai and English names
- Use descriptive location names
### **Coordinate Accuracy**
- Use high-precision GPS coordinates (6+ decimal places)
- Verify coordinates match actual station location
- Include geohash for efficient spatial queries
### **Status Management**
- Set status to "maintenance" during repairs
- Use "inactive" for temporarily offline stations
- Use "error" for stations with data quality issues
### **Data Integrity**
- Test new stations before adding to production
- Backup station configuration before major changes
- Monitor data quality after station updates
## 🎯 **Future Enhancements**
Planned improvements for station management:
1. **Bulk Operations** - Import/export multiple stations
2. **Station Groups** - Organize stations by river section
3. **Automated Validation** - GPS coordinate verification
4. **Historical Tracking** - Track station configuration changes
5. **Alert Integration** - Notifications for station status changes
6. **Map Interface** - Visual station management on interactive map
## 📞 **Support**
For station management issues:
1. Check the API documentation at `/docs`
2. Run the test script: `python test_station_management.py`
3. Review logs for error details
4. Verify station data format and validation rules
The station management system provides flexible control over your monitoring network while maintaining data integrity and system reliability.
---
# VictoriaMetrics Setup Guide for Thailand Water Monitor
This guide provides comprehensive instructions for setting up VictoriaMetrics as the time-series database backend for the Thailand Water Monitor.
## Why VictoriaMetrics?
VictoriaMetrics is an excellent choice for water monitoring data because:
- **High Performance**: Up to 10x faster than InfluxDB
- **Low Resource Usage**: Uses 10x less RAM than Prometheus
- **Better Compression**: 70x better compression than Prometheus
- **Prometheus Compatible**: Drop-in replacement for Prometheus
- **Easy to Deploy**: Single binary, no dependencies
- **Cost Effective**: Open source with commercial support available
## Quick Start
### 1. Environment Variables
Set these environment variables to configure VictoriaMetrics:
```bash
# Windows (PowerShell)
$env:DB_TYPE="victoriametrics"
$env:VM_HOST="localhost"
$env:VM_PORT="8428"
# Linux/Mac
export DB_TYPE=victoriametrics
export VM_HOST=localhost
export VM_PORT=8428
```
### 2. Start VictoriaMetrics with Docker
```bash
# Simple setup
docker run -d \
--name victoriametrics \
-p 8428:8428 \
-v victoria-metrics-data:/victoria-metrics-data \
victoriametrics/victoria-metrics:latest \
--storageDataPath=/victoria-metrics-data \
--retentionPeriod=2y \
--httpListenAddr=:8428
# Verify it's running
curl http://localhost:8428/health
```
### 3. Run the Water Monitor
```bash
python water_scraper_v3.py
```
### 4. Access Grafana Dashboard
```bash
# Start with Docker Compose (includes Grafana)
docker-compose -f docker-compose.victoriametrics.yml up -d
# Access Grafana at http://localhost:3000
# Username: admin
# Password: admin_password
```
## Production Setup
### Docker Compose Configuration
Use the provided `docker-compose.victoriametrics.yml` file:
```bash
# Start the complete stack
docker-compose -f docker-compose.victoriametrics.yml up -d
# Check status
docker-compose -f docker-compose.victoriametrics.yml ps
# View logs
docker-compose -f docker-compose.victoriametrics.yml logs -f
```
### Manual VictoriaMetrics Configuration
#### High-Performance Configuration
```bash
docker run -d \
--name victoriametrics \
-p 8428:8428 \
-v victoria-metrics-data:/victoria-metrics-data \
victoriametrics/victoria-metrics:latest \
--storageDataPath=/victoria-metrics-data \
--retentionPeriod=2y \
--httpListenAddr=:8428 \
--maxConcurrentInserts=32 \
--search.maxQueryDuration=60s \
--search.maxConcurrentRequests=16 \
--dedup.minScrapeInterval=30s \
--memory.allowedPercent=80 \
--loggerLevel=INFO \
--loggerFormat=json \
--search.maxSeries=1000000 \
--search.maxPointsPerTimeseries=100000
```
#### Configuration Parameters Explained
| Parameter | Description | Recommended Value |
|-----------|-------------|-------------------|
| `--storageDataPath` | Data storage directory | `/victoria-metrics-data` |
| `--retentionPeriod` | How long to keep data | `2y` (2 years) |
| `--httpListenAddr` | HTTP listen address | `:8428` |
| `--maxConcurrentInserts` | Max concurrent inserts | `32` |
| `--search.maxQueryDuration` | Max query duration | `60s` |
| `--search.maxConcurrentRequests` | Max concurrent queries | `16` |
| `--dedup.minScrapeInterval` | Deduplication interval | `30s` |
| `--memory.allowedPercent` | Max memory usage | `80` |
| `--loggerLevel` | Log level | `INFO` |
| `--search.maxSeries` | Max time series | `1000000` |
### Kubernetes Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: victoriametrics
spec:
replicas: 1
selector:
matchLabels:
app: victoriametrics
template:
metadata:
labels:
app: victoriametrics
spec:
containers:
- name: victoriametrics
image: victoriametrics/victoria-metrics:latest
ports:
- containerPort: 8428
args:
- --storageDataPath=/victoria-metrics-data
- --retentionPeriod=2y
- --httpListenAddr=:8428
- --maxConcurrentInserts=32
volumeMounts:
- name: storage
mountPath: /victoria-metrics-data
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "2000m"
volumes:
- name: storage
persistentVolumeClaim:
claimName: victoriametrics-pvc
---
apiVersion: v1
kind: Service
metadata:
name: victoriametrics
spec:
selector:
app: victoriametrics
ports:
- port: 8428
targetPort: 8428
type: ClusterIP
```
## Data Queries
### HTTP API Queries
VictoriaMetrics provides a Prometheus-compatible HTTP API:
```bash
# Current water levels for all stations
curl "http://localhost:8428/api/v1/query?query=water_level"
# Water levels for specific station
curl "http://localhost:8428/api/v1/query?query=water_level{station_code=\"P.1\"}"
# Average discharge over last hour
curl "http://localhost:8428/api/v1/query?query=avg_over_time(water_discharge[1h])"
# High discharge alerts (>80%)
curl "http://localhost:8428/api/v1/query?query=water_discharge_percent>80"
# Time range query (last 6 hours)
START=$(date -d '6 hours ago' +%s)
END=$(date +%s)
curl "http://localhost:8428/api/v1/query_range?query=water_level&start=${START}&end=${END}&step=300"
```
### PromQL Examples
```promql
# Current water levels
water_level
# Water level trends (last 24h)
water_level[24h]
# Discharge rates by station
water_discharge{station_code="P.1"}
# Average discharge across all stations
avg(water_discharge)
# Stations with high discharge (>80%)
water_discharge_percent > 80
# Rate of change in water level
rate(water_level[5m])
# Maximum water level in last hour
max_over_time(water_level[1h])
# Stations with increasing water levels
increase(water_level[1h]) > 0
```
## Grafana Integration
### Data Source Configuration
1. **Add VictoriaMetrics as Prometheus Data Source**:
- URL: `http://localhost:8428` (or `http://victoriametrics:8428` in Docker)
- Access: Server (default)
- HTTP Method: POST
2. **Import Dashboard**:
- Use the provided `water-monitoring-dashboard.json`
- Or create custom dashboards with the queries above
### Dashboard Panels
The included dashboard provides:
- **Time Series**: Water levels and discharge over time
- **Table**: Current status of all stations
- **Pie Chart**: Discharge percentage distribution
- **Gauge**: Average discharge percentage
- **Variables**: Filter by station
## Monitoring and Maintenance
### Health Checks
```bash
# Check VictoriaMetrics health
curl http://localhost:8428/health
# Check metrics endpoint
curl http://localhost:8428/metrics
# Check configuration
curl http://localhost:8428/api/v1/status/config
```
### Performance Monitoring
```bash
# Query performance stats
curl http://localhost:8428/api/v1/status/tsdb
# Memory usage
curl http://localhost:8428/api/v1/status/runtime
# Active queries
curl http://localhost:8428/api/v1/status/active_queries
```
### Backup and Restore
```bash
# Create backup
docker exec victoriametrics /usr/bin/vmbackup \
-storageDataPath=/victoria-metrics-data \
-dst=fs:///backup/$(date +%Y%m%d)
# Restore from backup
docker exec victoriametrics /usr/bin/vmrestore \
-src=fs:///backup/20250724 \
-storageDataPath=/victoria-metrics-data
```
### Log Analysis
```bash
# View logs
docker logs victoriametrics
# Follow logs
docker logs -f victoriametrics
# Search for errors
docker logs victoriametrics 2>&1 | grep ERROR
```
## Troubleshooting
### Common Issues
1. **Connection Refused**:
```bash
# Check if VictoriaMetrics is running
docker ps | grep victoriametrics
# Check port binding
netstat -tlnp | grep 8428
```
2. **High Memory Usage**:
```bash
# Reduce memory limit
docker run ... --memory.allowedPercent=60 ...
```
3. **Slow Queries**:
```bash
# Increase query timeout
docker run ... --search.maxQueryDuration=120s ...
```
4. **Data Not Appearing**:
```bash
# Check if data is being written
curl "http://localhost:8428/api/v1/query?query=up"
# Check water monitor logs
tail -f water_monitor.log
```
### Performance Tuning
1. **For High Write Load**:
```bash
--maxConcurrentInserts=64
--insert.maxQueueDuration=60s
```
2. **For High Query Load**:
```bash
--search.maxConcurrentRequests=32
--search.maxQueryDuration=120s
```
3. **For Large Datasets**:
```bash
--search.maxSeries=10000000
--search.maxPointsPerTimeseries=1000000
```
## Security
### Authentication
VictoriaMetrics doesn't have built-in authentication. Use a reverse proxy:
```nginx
server {
listen 80;
server_name victoriametrics.example.com;
auth_basic "VictoriaMetrics";
auth_basic_user_file /etc/nginx/.htpasswd;
location / {
proxy_pass http://localhost:8428;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}
```
### TLS/SSL
```bash
# Use nginx or traefik for TLS termination
# Or use VictoriaMetrics with TLS:
docker run ... \
-v /path/to/cert.pem:/cert.pem \
-v /path/to/key.pem:/key.pem \
victoriametrics/victoria-metrics:latest \
--tls \
--tlsCertFile=/cert.pem \
--tlsKeyFile=/key.pem
```
## Scaling
### Cluster Setup
For high availability and horizontal scaling:
```bash
# Start multiple VictoriaMetrics instances
docker run -d --name vm1 -p 8428:8428 victoriametrics/victoria-metrics:latest
docker run -d --name vm2 -p 8429:8428 victoriametrics/victoria-metrics:latest
# Use load balancer to distribute queries
# Use vminsert/vmselect/vmstorage for true clustering
```
### Resource Requirements
| Data Points/Hour | RAM | CPU | Storage/Day |
|------------------|-----|-----|-------------|
| 1,000 | 100MB | 0.1 CPU | 10MB |
| 10,000 | 500MB | 0.5 CPU | 100MB |
| 100,000 | 2GB | 1 CPU | 1GB |
| 1,000,000 | 8GB | 2 CPU | 10GB |
## Migration
### From InfluxDB
```bash
# Export from InfluxDB
influx -database water_monitoring -execute "SELECT * FROM water_data" -format csv > data.csv
# Import to VictoriaMetrics (convert to Prometheus format first)
# Use vmctl tool for migration
```
### From Prometheus
```bash
# Use vmctl for direct migration
vmctl prometheus --prom-snapshot=/path/to/prometheus/data --vm-addr=http://localhost:8428
```
This comprehensive setup guide should help you configure VictoriaMetrics for optimal performance with the Thailand Water Monitor system.
---
# Notable Documents and References
This document contains important references and external resources related to the Thailand Water Level Monitoring System.
## 🌊 **Official Thai Government Water Resources**
### **Royal Irrigation Department (RID) Resources**
#### **1. Water Level Monitoring Diagram**
- **URL**: https://water.rid.go.th/hyd/Diagram/graphic_ping.pdf
- **Description**: Official diagram showing the water level monitoring network structure
- **Content**: Technical diagrams and network topology for Thailand's water monitoring system
- **Language**: Thai
- **Format**: PDF
- **Usage**: Understanding the official monitoring infrastructure and station relationships
#### **2. Hourly Water Level Data Portal**
- **URL**: https://hyd-app-db.rid.go.th/hydro1h.html
- **Description**: Real-time hourly water level data web interface
- **Content**: Live data from all 16 monitoring stations across Thailand
- **Language**: Thai
- **Format**: Web Application
- **Usage**: Primary data source for the monitoring system
- **API Endpoint**: Used by our scraper to fetch real-time data
- **Update Frequency**: Hourly updates
- **Data Points**: ~240-384 measurements per hour across all stations
#### **3. Individual Station Data - P.76 Example**
- **URL**: https://www.hydro-1.net/Data/STATION/P.76.html
- **Description**: Detailed individual station data page for station P.76
- **Content**: Historical data, station details, and specific measurements
- **Language**: Thai/English
- **Format**: Web Page
- **Usage**: Reference for individual station characteristics and historical data patterns
- **Station**: P.76 - บ้านแม่อีไฮ (Banb Mae I Hai)
## 📊 **Data Sources and APIs**
### **Primary Data Source**
- **API Endpoint**: `https://hyd-app-db.rid.go.th/webservice/getGroupHourlyWaterLevelReportAllHL.ashx`
- **Method**: POST
- **Data Format**: JSON
- **Update Schedule**: Hourly (top of each hour)
- **Coverage**: All 16 monitoring stations
- **Metrics**: Water level (m), Discharge (cms), Discharge percentage (%)
### **Station Coverage**
The system monitors 16 stations across Thailand:
- P.1 - สะพานนวรัฐ (Nawarat Bridge)
- P.5 - สะพานท่านาง (Tha Nang Bridge)
- P.20 - บ้านเชียงดาว (Ban Chiang Dao)
- P.21 - บ้านริมใต้ (Ban Rim Tai)
- P.4A - บ้านแม่แตง (Ban Mae Taeng)
- P.67 - บ้านแม่แต (Ban Tae)
- P.75 - บ้านช่อแล (Ban Chai Lat)
- P.76 - บ้านแม่อีไฮ (Banb Mae I Hai)
- P.77 - บ้านสบแม่สะป๊วด (Baan Sop Mae Sapuord)
- P.81 - บ้านโป่ง (Ban Pong)
- P.82 - บ้านสบวิน (Ban Sob win)
- P.84 - บ้านพันตน (Ban Panton)
- P.85 - บ้านหล่ายแก้ว (Baan Lai Kaew)
- P.87 - บ้านป่าซาง (Ban Pa Sang)
- P.92 - บ้านเมืองกึ๊ด (Ban Muang Aut)
- P.103 - สะพานวงแหวนรอบ 3 (Ring Bridge 3)
## 🔗 **Related Resources**
### **Technical Documentation**
- **Thai Water Resources**: https://water.rid.go.th/
- **Hydro Information Network**: https://www.hydro-1.net/
- **Royal Irrigation Department**: https://www.rid.go.th/
### **Data Standards**
- **Time Format**: Thai Buddhist calendar (BE) + 24-hour format
- **Coordinate System**: WGS84 decimal degrees
- **Water Level Units**: Meters (m)
- **Discharge Units**: Cubic meters per second (cms)
- **Update Frequency**: Hourly at :00 minutes
### **API Parameters**
```javascript
{
'DW[UtokID]': '1',
'DW[BasinID]': '6',
'DW[TimeCurrent]': 'DD/MM/YYYY', // Thai Buddhist calendar
'_search': 'false',
'nd': timestamp_milliseconds,
'rows': '100',
'page': '1',
'sidx': 'indexhourly',
'sord': 'asc'
}
```
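As a rough illustration of how these parameters come together, here is a minimal Python sketch of one request cycle. It assumes the endpoint, parameter names, and DD/MM/YYYY Buddhist-calendar date format documented above; the function name and error handling are illustrative, not the project's actual client code.
```python
# Minimal sketch: fetch one day of hourly rows from the RID endpoint.
# Parameter names follow the list above; this is not an official client.
import time
from datetime import datetime

import requests

API_URL = "https://hyd-app-db.rid.go.th/webservice/getGroupHourlyWaterLevelReportAllHL.ashx"

def fetch_hourly_rows(day: datetime) -> list:
    thai_year = day.year + 543  # Gregorian year -> Thai Buddhist year
    payload = {
        "DW[UtokID]": "1",
        "DW[BasinID]": "6",
        "DW[TimeCurrent]": f"{day:%d/%m}/{thai_year}",  # DD/MM/YYYY (BE)
        "_search": "false",
        "nd": str(int(time.time() * 1000)),
        "rows": "100",
        "page": "1",
        "sidx": "indexhourly",
        "sord": "asc",
    }
    response = requests.post(API_URL, data=payload, timeout=30)
    response.raise_for_status()
    return response.json().get("rows", [])

rows = fetch_hourly_rows(datetime.now())
print(f"Received {len(rows)} hourly rows")
```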
## 📋 **Data Structure Reference**
### **JSON Response Format**
```json
{
"rows": [
{
"hourlytime": "1.00", // Hour (1-24, where 24 = midnight next day)
"wlvalues1": "2.45", // Water level for station 1 (meters)
"qvalues1": "125.3", // Discharge for station 1 (cms)
"QPercent1": "45.2", // Discharge percentage for station 1
"wlvalues2": "1.89", // Station 2 data...
// ... continues for all 16 stations
}
]
}
```
### **Station ID Mapping**
- Station 1 → P.20 (Ban Chiang Dao)
- Station 2 → P.75 (Ban Chai Lat)
- Station 3 → P.92 (Ban Muang Aut)
- Station 4 → P.4A (Ban Mae Taeng)
- Station 5 → P.67 (Ban Tae)
- Station 6 → P.21 (Ban Rim Tai)
- Station 7 → P.103 (Ring Bridge 3)
- Station 8 → P.1 (Nawarat Bridge)
- Station 9 → P.82 (Ban Sob Win)
- Station 10 → P.84 (Ban Panton)
- Station 11 → P.81 (Ban Pong)
- Station 12 → P.5 (Tha Nang Bridge)
- Station 13 → P.77 (Baan Sop Mae Sapuord)
- Station 14 → P.87 (Ban Pa Sang)
- Station 15 → P.76 (Ban Mae I Hai)
- Station 16 → P.85 (Baan Lai Kaew)
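The mapping above can be captured as a simple lookup table. The fragment below is a hedged sketch (not the project's actual parser) showing how one hourly row's `wlvaluesN` / `qvaluesN` fields might be resolved to station codes:
```python
# Illustrative only: resolve numeric field suffixes to RID station codes.
STATION_BY_INDEX = {
    1: "P.20",  2: "P.75",  3: "P.92",  4: "P.4A",
    5: "P.67",  6: "P.21",  7: "P.103", 8: "P.1",
    9: "P.82", 10: "P.84", 11: "P.81", 12: "P.5",
    13: "P.77", 14: "P.87", 15: "P.76", 16: "P.85",
}

def parse_row(row: dict) -> dict:
    """Extract water level (m) and discharge (cms) per station from one hourly row."""
    readings = {}
    for index, station_code in STATION_BY_INDEX.items():
        level = row.get(f"wlvalues{index}")
        discharge = row.get(f"qvalues{index}")
        readings[station_code] = {
            # Values arrive as strings; empty or missing fields become None.
            "water_level_m": float(level) if level not in (None, "") else None,
            "discharge_cms": float(discharge) if discharge not in (None, "") else None,
        }
    return readings
```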
## 🌐 **Geolocation Reference**
### **Sample Coordinates (P.1 - Nawarat Bridge)**
- **Latitude**: 15.6944°N
- **Longitude**: 100.2028°E
- **Geohash**: w5q6uuhvfcfp25
- **Location**: Nakhon Sawan Province, Thailand
### **Coordinate System**
- **Datum**: WGS84
- **Format**: Decimal degrees
- **Precision**: 4 decimal places (~11m accuracy)
- **Usage**: Grafana geomap visualization
## 📝 **Usage Notes**
### **Data Collection**
- **Frequency**: Every 15 minutes (full check at :00, quick checks at :15, :30, :45)
- **Retention**: 2+ years of historical data
- **Gap Filling**: Automatic detection and filling of missing data
- **Data Updates**: Checks for changed values in recent data
### **Time Handling**
- **Thai Time**: UTC+7 (Asia/Bangkok)
- **Buddhist Calendar**: Thai year = Gregorian year + 543
- **Hour 24**: Represents midnight (00:00) of the next day (see the conversion sketch below)
- **API Format**: DD/MM/YYYY (Buddhist calendar)
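Combining these conventions, a timestamp conversion might look like the sketch below. It assumes Python 3.9's standard `zoneinfo` module; the function name is illustrative and error handling is omitted.
```python
# Illustrative conversion: BE date string + RID hour index (1-24) -> Bangkok time.
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

BANGKOK = ZoneInfo("Asia/Bangkok")

def to_timestamp(thai_date: str, hourlytime: str) -> datetime:
    """thai_date is 'DD/MM/YYYY' in the Buddhist calendar; hourlytime is '1.00'..'24.00'."""
    day, month, be_year = (int(part) for part in thai_date.split("/"))
    hour = int(float(hourlytime))
    base = datetime(be_year - 543, month, day, tzinfo=BANGKOK)
    # Hour 24 is midnight of the *next* day, so roll the date forward.
    return base + timedelta(hours=24) if hour == 24 else base.replace(hour=hour)

print(to_timestamp("12/08/2568", "24.00"))  # -> 2025-08-13 00:00+07:00
```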
### **Data Quality**
- **Validation**: Automatic data validation and error detection
- **Retry Logic**: 15-minute retry intervals when data is unavailable (see the sketch after this list)
- **Error Handling**: Comprehensive error logging and recovery
- **Monitoring**: Health checks and alert conditions
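As a minimal sketch of the retry behaviour described above (assuming a fetch callable like the one sketched earlier; the real collector's policy may differ):
```python
# Hedged sketch: retry a fetch every 15 minutes until data arrives or attempts run out.
import logging
import time

RETRY_INTERVAL_SECONDS = 15 * 60

def collect_with_retry(fetch, max_attempts: int = 4):
    """Call fetch() until it returns non-empty data or attempts are exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            rows = fetch()
            if rows:
                return rows
            logging.warning("No data returned (attempt %d/%d)", attempt, max_attempts)
        except Exception:
            logging.exception("Fetch failed (attempt %d/%d)", attempt, max_attempts)
        if attempt < max_attempts:
            time.sleep(RETRY_INTERVAL_SECONDS)
    return []
```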
## 🔍 **Research and Development**
### **Future Enhancements**
- **Additional Stations**: Potential expansion to more monitoring points
- **Real-time Alerts**: Threshold-based notification system
- **Predictive Analytics**: Water level forecasting capabilities
- **Mobile Integration**: Field data collection and verification
### **Technical Improvements**
- **API Optimization**: Enhanced data fetching efficiency
- **Database Performance**: Query optimization and indexing
- **Visualization**: Advanced Grafana dashboard features
- **Integration**: Connection with other water management systems
This document serves as a comprehensive reference for understanding the data sources, technical specifications, and official resources that support the Thailand Water Level Monitoring System.