Initial commit: Northern Thailand Ping River Monitor v3.1.0

Features:
- Real-time water level monitoring for Ping River Basin (16 stations)
- Coverage from Chiang Dao to Nakhon Sawan in Northern Thailand
- FastAPI web interface with interactive dashboard and station management
- Multi-database support (SQLite, MySQL, PostgreSQL, InfluxDB, VictoriaMetrics)
- Comprehensive monitoring with health checks and metrics collection
- Docker deployment with Grafana integration
- Production-ready architecture with enterprise-grade observability

CI/CD & Automation:
- Complete Gitea Actions workflows for CI/CD, security, and releases
- Multi-Python version testing (3.9-3.12)
- Multi-architecture Docker builds (amd64, arm64)
- Daily security scanning and dependency monitoring
- Automated documentation generation
- Performance testing and validation

Production Ready:
- Type safety with Pydantic models and comprehensive type hints
- Data validation layer with range checking and error handling
- Rate limiting and request tracking for API protection
- Enhanced logging with rotation, colors, and performance metrics
- Station management API for dynamic CRUD operations
- Comprehensive documentation and deployment guides

Technical Stack:
- Python 3.9+ with FastAPI and Pydantic
- Multi-database architecture with adapter pattern
- Docker containerization with multi-stage builds
- Grafana dashboards for visualization
- Gitea Actions for CI/CD automation
- Enterprise monitoring and alerting

Ready for deployment to B4L infrastructure!
commit af62cfef0b (2025-08-12 15:37:09 +07:00)
60 changed files with 13267 additions and 0 deletions


# Database Deployment Guide for Thailand Water Monitor
This guide covers deployment options for storing water monitoring data in production environments.
## 🏆 Recommendation Summary
| Database | Best For | Performance | Complexity | Cost |
|----------|----------|-------------|------------|------|
| **InfluxDB** | Time-series data, dashboards | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ |
| **VictoriaMetrics** | High-performance metrics | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| **PostgreSQL** | Complex queries, reliability | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
| **MySQL** | Familiar, existing infrastructure | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
## 1. InfluxDB Deployment (Recommended for Time-Series)
### Why InfluxDB?
- **Purpose-built** for time-series data
- **Excellent compression** (10:1 typical ratio)
- **Built-in retention policies** and downsampling
- **Great Grafana integration** for dashboards
- **High write throughput** (100k+ points/second)
### Docker Deployment
```yaml
# docker-compose.yml
version: '3.8'

services:
  influxdb:
    image: influxdb:1.8
    container_name: water_influxdb
    ports:
      - "8086:8086"
    volumes:
      - influxdb_data:/var/lib/influxdb
      - ./influxdb.conf:/etc/influxdb/influxdb.conf:ro
    environment:
      - INFLUXDB_DB=water_monitoring
      - INFLUXDB_ADMIN_USER=admin
      - INFLUXDB_ADMIN_PASSWORD=your_secure_password
      - INFLUXDB_USER=water_user
      - INFLUXDB_USER_PASSWORD=water_password
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    container_name: water_grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin_password
    restart: unless-stopped

volumes:
  influxdb_data:
  grafana_data:
```
### Environment Variables
```bash
# .env file
DB_TYPE=influxdb
INFLUX_HOST=localhost
INFLUX_PORT=8086
INFLUX_DATABASE=water_monitoring
INFLUX_USERNAME=water_user
INFLUX_PASSWORD=water_password
```
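The scraper reads these variables at startup. As a quick connectivity check, the snippet below (a minimal sketch, not the scraper's actual code) writes one point with the `influxdb` client from requirements.txt; the measurement, tag, and field names mirror the continuous query shown later in this section, and the station values are illustrative.

```python
import os
from datetime import datetime, timezone

from influxdb import InfluxDBClient

client = InfluxDBClient(
    host=os.getenv("INFLUX_HOST", "localhost"),
    port=int(os.getenv("INFLUX_PORT", "8086")),
    username=os.getenv("INFLUX_USERNAME", "water_user"),
    password=os.getenv("INFLUX_PASSWORD", "water_password"),
    database=os.getenv("INFLUX_DATABASE", "water_monitoring"),
)

# Tags hold station metadata, fields hold the numeric readings (see section 8).
point = {
    "measurement": "water_data",
    "tags": {"station_code": "P.1", "station_name_en": "Nawarat Bridge"},
    "time": datetime.now(timezone.utc),
    "fields": {"water_level": 2.45, "discharge": 120.5, "discharge_percent": 24.1},
}
client.write_points([point])
```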
### InfluxDB Configuration
```toml
# influxdb.conf
[meta]
dir = "/var/lib/influxdb/meta"
[data]
dir = "/var/lib/influxdb/data"
wal-dir = "/var/lib/influxdb/wal"
# Optimize for time-series data
cache-max-memory-size = "1g"
cache-snapshot-memory-size = "25m"
cache-snapshot-write-cold-duration = "10m"
# Retention and compression
compact-full-write-cold-duration = "4h"
max-series-per-database = 1000000
max-values-per-tag = 100000
[coordinator]
write-timeout = "10s"
max-concurrent-queries = 0
query-timeout = "0s"
[retention]
enabled = true
check-interval = "30m"
[http]
enabled = true
bind-address = ":8086"
auth-enabled = true
max-body-size = 25000000
max-concurrent-requests = 0
max-enqueued-requests = 0
```
### Production Setup Commands
```bash
# Start services
docker-compose up -d
# Create retention policies
docker exec -it water_influxdb influx -username admin -password your_secure_password -execute "
CREATE RETENTION POLICY \"raw_data\" ON \"water_monitoring\" DURATION 90d REPLICATION 1 DEFAULT;
CREATE RETENTION POLICY \"downsampled\" ON \"water_monitoring\" DURATION 730d REPLICATION 1;
"
# Create continuous queries for downsampling
docker exec -it water_influxdb influx -username admin -password your_secure_password -execute "
CREATE CONTINUOUS QUERY \"downsample_hourly\" ON \"water_monitoring\"
BEGIN
SELECT mean(water_level) AS water_level, mean(discharge) AS discharge, mean(discharge_percent) AS discharge_percent
INTO \"downsampled\".\"water_data_hourly\"
FROM \"water_data\"
GROUP BY time(1h), station_code, station_name_en, station_name_th
END
"
```
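To confirm the retention policies and continuous query were created, run `SHOW RETENTION POLICIES ON water_monitoring` in the `influx` shell, or query them from Python. A sketch using the same `influxdb` client and the admin credentials from docker-compose.yml:

```python
from influxdb import InfluxDBClient

# Admin credentials from docker-compose.yml above.
client = InfluxDBClient(
    host="localhost",
    port=8086,
    username="admin",
    password="your_secure_password",
    database="water_monitoring",
)

print(client.get_list_retention_policies("water_monitoring"))
print(client.get_list_continuous_queries())
```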
## 2. VictoriaMetrics Deployment (High Performance)
### Why VictoriaMetrics?
- **Extremely fast** and resource-efficient
- **Better compression** than InfluxDB
- **Prometheus-compatible** API
- **Lower memory usage**
- **Built-in clustering**
### Docker Deployment
```yaml
# docker-compose.yml
version: '3.8'

services:
  victoriametrics:
    image: victoriametrics/victoria-metrics:latest
    container_name: water_victoriametrics
    ports:
      - "8428:8428"
    volumes:
      - vm_data:/victoria-metrics-data
    command:
      - '--storageDataPath=/victoria-metrics-data'
      - '--retentionPeriod=2y'
      - '--httpListenAddr=:8428'
      - '--maxConcurrentInserts=16'
    restart: unless-stopped

volumes:
  vm_data:
```
### Environment Variables
```bash
# .env file
DB_TYPE=victoriametrics
VM_HOST=localhost
VM_PORT=8428
```
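VictoriaMetrics needs no dedicated client library: it accepts InfluxDB line protocol on its `/write` endpoint, so `requests` from the base requirements is enough. A minimal sketch (the station tag and readings are illustrative):

```python
import os

import requests

vm_url = f"http://{os.getenv('VM_HOST', 'localhost')}:{os.getenv('VM_PORT', '8428')}/write"

# One line per measurement: <measurement>,<tags> <fields>
line = "water_data,station_code=P.1 water_level=2.45,discharge=120.5,discharge_percent=24.1"

resp = requests.post(vm_url, data=line, timeout=10)
resp.raise_for_status()  # VictoriaMetrics answers 204 No Content on success
```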
## 3. PostgreSQL Deployment (Relational + Time-Series)
### Why PostgreSQL?
- **Mature and reliable**
- **Excellent for complex queries**
- **TimescaleDB extension** for time-series optimization
- **Strong consistency guarantees**
- **Rich ecosystem**
### Docker Deployment with TimescaleDB
```yaml
# docker-compose.yml
version: '3.8'

services:
  postgres:
    image: timescale/timescaledb:latest-pg14
    container_name: water_postgres
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    environment:
      - POSTGRES_DB=water_monitoring
      - POSTGRES_USER=water_user
      - POSTGRES_PASSWORD=secure_password
    restart: unless-stopped

volumes:
  postgres_data:
```
### Database Initialization
```sql
-- init.sql
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
-- Create hypertable for time-series optimization
CREATE TABLE water_measurements (
id BIGSERIAL PRIMARY KEY,
timestamp TIMESTAMPTZ NOT NULL,
station_id INT NOT NULL,
water_level NUMERIC(10,3),
discharge NUMERIC(10,2),
discharge_percent NUMERIC(5,2),
status VARCHAR(20) DEFAULT 'active',
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Convert to hypertable (TimescaleDB)
SELECT create_hypertable('water_measurements', 'timestamp', chunk_time_interval => INTERVAL '1 day');
-- Create indexes
CREATE INDEX idx_water_measurements_station_time ON water_measurements (station_id, timestamp DESC);
CREATE INDEX idx_water_measurements_timestamp ON water_measurements (timestamp DESC);
-- Create retention policy (keep raw data for 2 years)
SELECT add_retention_policy('water_measurements', INTERVAL '2 years');
-- Create continuous aggregates for performance
CREATE MATERIALIZED VIEW water_measurements_hourly
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', timestamp) AS bucket,
station_id,
AVG(water_level) as avg_water_level,
MAX(water_level) as max_water_level,
MIN(water_level) as min_water_level,
AVG(discharge) as avg_discharge,
MAX(discharge) as max_discharge,
MIN(discharge) as min_discharge,
AVG(discharge_percent) as avg_discharge_percent
FROM water_measurements
GROUP BY bucket, station_id;
-- Refresh policy for continuous aggregates
SELECT add_continuous_aggregate_policy('water_measurements_hourly',
start_offset => INTERVAL '1 day',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 hour');
```
### Environment Variables
```bash
# .env file
DB_TYPE=postgresql
POSTGRES_CONNECTION_STRING=postgresql://water_user:secure_password@localhost:5432/water_monitoring
```
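To verify the connection outside the scraper, the sketch below inserts a single row with SQLAlchemy (listed in requirements.txt) into the schema created by `init.sql`; the station id and readings are illustrative.

```python
import os
from datetime import datetime, timezone

from sqlalchemy import create_engine, text

engine = create_engine(os.environ["POSTGRES_CONNECTION_STRING"])

# engine.begin() opens a transaction and commits it on success.
with engine.begin() as conn:
    conn.execute(
        text(
            "INSERT INTO water_measurements "
            "(timestamp, station_id, water_level, discharge, discharge_percent) "
            "VALUES (:ts, :sid, :level, :q, :pct)"
        ),
        {
            "ts": datetime.now(timezone.utc),
            "sid": 1,
            "level": 2.45,
            "q": 120.5,
            "pct": 24.1,
        },
    )
```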
## 4. MySQL Deployment (Traditional Relational)
### Docker Deployment
```yaml
# docker-compose.yml
version: '3.8'

services:
  mysql:
    image: mysql:8.0
    container_name: water_mysql
    ports:
      - "3306:3306"
    volumes:
      - mysql_data:/var/lib/mysql
      - ./mysql.cnf:/etc/mysql/conf.d/mysql.cnf
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    environment:
      - MYSQL_ROOT_PASSWORD=root_password
      - MYSQL_DATABASE=water_monitoring
      - MYSQL_USER=water_user
      - MYSQL_PASSWORD=water_password
    restart: unless-stopped

volumes:
  mysql_data:
```
### MySQL Configuration
```ini
# mysql.cnf
[mysqld]
# Optimize for time-series data
innodb_buffer_pool_size = 1G
innodb_log_file_size = 256M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT

# Note: the query cache was removed in MySQL 8.0, and partitioning is
# built into InnoDB, so neither needs to be enabled here.

# Connection settings
max_connections = 200
connect_timeout = 10
wait_timeout = 600
```
### Environment Variables
```bash
# .env file
DB_TYPE=mysql
MYSQL_CONNECTION_STRING=mysql://water_user:water_password@localhost:3306/water_monitoring
```
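One caveat when reusing this string with SQLAlchemy directly: a bare `mysql://` URL resolves to the MySQLdb driver, so with only PyMySQL installed you need the `mysql+pymysql://` prefix. A minimal connectivity check (a sketch, assuming the PyMySQL driver from requirements.txt):

```python
import os

from sqlalchemy import create_engine, text

# Rewrite the scheme so SQLAlchemy picks the PyMySQL driver.
url = os.environ["MYSQL_CONNECTION_STRING"].replace("mysql://", "mysql+pymysql://", 1)

engine = create_engine(url, pool_pre_ping=True, pool_recycle=3600)

with engine.connect() as conn:
    print(conn.execute(text("SELECT VERSION()")).scalar())
```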
## 5. Installation and Dependencies
### Required Python Packages
```bash
# Base requirements
pip install requests schedule
# Database-specific packages
pip install influxdb # For InfluxDB
pip install sqlalchemy pymysql # For MySQL
pip install sqlalchemy psycopg2-binary # For PostgreSQL
# VictoriaMetrics uses HTTP API (no extra packages needed)
```
### Updated requirements.txt
```txt
requests>=2.28.0
schedule>=1.2.0
pandas>=1.5.0
# Database adapters (install as needed)
influxdb>=5.3.1
sqlalchemy>=1.4.0
pymysql>=1.0.2
psycopg2-binary>=2.9.0
```
## 6. Production Deployment Examples
### Using InfluxDB (Recommended)
```bash
# Set environment variables
export DB_TYPE=influxdb
export INFLUX_HOST=your-influx-server.com
export INFLUX_PORT=8086
export INFLUX_DATABASE=water_monitoring
export INFLUX_USERNAME=water_user
export INFLUX_PASSWORD=your_secure_password
# Run the scraper
python water_scraper_v3.py
```
### Using PostgreSQL with TimescaleDB
```bash
# Set environment variables
export DB_TYPE=postgresql
export POSTGRES_CONNECTION_STRING=postgresql://water_user:password@your-postgres-server.com:5432/water_monitoring
# Run the scraper
python water_scraper_v3.py
```
### Using VictoriaMetrics
```bash
# Set environment variables
export DB_TYPE=victoriametrics
export VM_HOST=your-vm-server.com
export VM_PORT=8428
# Run the scraper
python water_scraper_v3.py
```
## 7. Monitoring and Alerting
### Grafana Dashboard Setup
1. **Add Data Source**: Configure your database as a Grafana data source (a scripted example follows this list)
2. **Import Dashboard**: Use pre-built water monitoring dashboards
3. **Set Alerts**: Configure alerts for abnormal water levels or discharge rates
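Step 1 can also be scripted against Grafana's HTTP API instead of the web UI. The sketch below registers the InfluxDB container from section 1 as a data source; hostnames and credentials are the docker-compose values and should be adjusted for your environment.

```python
import requests

payload = {
    "name": "Water Monitoring (InfluxDB)",
    "type": "influxdb",
    "access": "proxy",
    "url": "http://influxdb:8086",  # service name from docker-compose.yml
    "user": "water_user",
    "jsonData": {"dbName": "water_monitoring", "httpMode": "GET"},
    "secureJsonData": {"password": "water_password"},
}

resp = requests.post(
    "http://localhost:3000/api/datasources",
    json=payload,
    auth=("admin", "admin_password"),  # GF_SECURITY_ADMIN_PASSWORD
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```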
### Example Grafana Queries
#### InfluxDB Queries
```sql
-- Current water levels
SELECT last("water_level") FROM "water_data" GROUP BY "station_code"
-- Discharge trends (last 24h)
SELECT mean("discharge") FROM "water_data" WHERE time >= now() - 24h GROUP BY time(1h), "station_code"
```
#### PostgreSQL/TimescaleDB Queries
```sql
-- Current water levels
SELECT DISTINCT ON (station_id)
station_id, water_level, discharge, timestamp
FROM water_measurements
ORDER BY station_id, timestamp DESC;
-- Hourly averages (last 24h)
SELECT
time_bucket('1 hour', timestamp) as hour,
station_id,
AVG(water_level) as avg_level,
AVG(discharge) as avg_discharge
FROM water_measurements
WHERE timestamp >= NOW() - INTERVAL '24 hours'
GROUP BY hour, station_id
ORDER BY hour DESC;
```
## 8. Performance Optimization Tips
### For All Databases
- **Batch inserts**: Insert multiple measurements at once (see the sketch after this list)
- **Connection pooling**: Reuse database connections
- **Indexing**: Ensure proper indexes on timestamp and station_id
- **Retention policies**: Automatically delete old data
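As a concrete illustration of the batch-insert advice (a sketch for the InfluxDB backend), `write_points` accepts a list of points plus a `batch_size`, so a backfill goes out in chunked HTTP requests rather than one request per measurement:

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="water_monitoring")

# Build all pending points first, then write them in chunks of 5000.
points = [
    {
        "measurement": "water_data",
        "tags": {"station_code": f"P.{i}"},      # illustrative station codes
        "fields": {"water_level": 2.0 + i * 0.05},
    }
    for i in range(1, 17)  # 16 stations in the Ping River Basin
]
client.write_points(points, batch_size=5000)
```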
### InfluxDB Specific
- Use **tags** for metadata (station codes, names)
- Use **fields** for numeric values (water levels, discharge)
- Configure **retention policies** and **continuous queries**
- Enable **compression** for long-term storage
### PostgreSQL/TimescaleDB Specific
- Use **hypertables** for automatic partitioning
- Create **continuous aggregates** for common queries
- Configure **compression** for older chunks
- Use **parallel queries** for large datasets
### VictoriaMetrics Specific
- Use **labels** efficiently (similar to Prometheus)
- Configure **retention periods** appropriately
- Use **downsampling** for long-term storage
- Enable **deduplication** if needed
This deployment guide provides production-ready configurations for all supported database backends. Choose the one that best fits your infrastructure and requirements.