Database Deployment Guide for Thailand Water Monitor

This guide covers deployment options for storing water monitoring data in production environments.

🏆 Recommendation Summary

| Database        | Best For                          | Performance | Complexity | Cost |
|-----------------|-----------------------------------|-------------|------------|------|
| InfluxDB        | Time-series data, dashboards      |             |            |      |
| VictoriaMetrics | High-performance metrics          |             |            |      |
| PostgreSQL      | Complex queries, reliability      |             |            |      |
| MySQL           | Familiar, existing infrastructure |             |            |      |

1. InfluxDB Deployment (Recommended)

Why InfluxDB?

  • Purpose-built for time-series data
  • Excellent compression (10:1 typical ratio)
  • Built-in retention policies and downsampling
  • Great Grafana integration for dashboards
  • High write throughput (100k+ points/second)
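
Writes go through the `influxdb` client's `write_points()` call, which takes a list of point dictionaries. The sketch below shows one way to shape a reading into that structure, following this guide's convention of tags for station metadata and fields for numeric values; the station code and name used in the comment are placeholders.

```python
# Sketch: shaping one reading into the JSON body expected by
# influxdb.InfluxDBClient.write_points(). Station metadata goes into
# tags, numeric readings into fields (per the tips later in this guide).
from datetime import datetime, timezone

def make_point(station_code, station_name_en, water_level, discharge,
               discharge_percent, when=None):
    """Build a single point for the water_data measurement."""
    when = when or datetime.now(timezone.utc)
    return {
        "measurement": "water_data",
        "tags": {
            "station_code": station_code,
            "station_name_en": station_name_en,
        },
        "time": when.isoformat(),
        "fields": {
            "water_level": float(water_level),
            "discharge": float(discharge),
            "discharge_percent": float(discharge_percent),
        },
    }

# Sending (requires `pip install influxdb` and a reachable server):
# from influxdb import InfluxDBClient
# client = InfluxDBClient(host="localhost", port=8086,
#                         username="water_user", password="water_password",
#                         database="water_monitoring")
# client.write_points([make_point("P.1", "Example Station", 2.31, 145.2, 18.5)])
```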

Docker Deployment

# docker-compose.yml
version: '3.8'
services:
  influxdb:
    image: influxdb:1.8
    container_name: water_influxdb
    ports:
      - "8086:8086"
    volumes:
      - influxdb_data:/var/lib/influxdb
      - ./influxdb.conf:/etc/influxdb/influxdb.conf:ro
    environment:
      - INFLUXDB_DB=water_monitoring
      - INFLUXDB_ADMIN_USER=admin
      - INFLUXDB_ADMIN_PASSWORD=your_secure_password
      - INFLUXDB_USER=water_user
      - INFLUXDB_USER_PASSWORD=water_password
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    container_name: water_grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin_password
    restart: unless-stopped

volumes:
  influxdb_data:
  grafana_data:

Environment Variables

# .env file
DB_TYPE=influxdb
INFLUX_HOST=localhost
INFLUX_PORT=8086
INFLUX_DATABASE=water_monitoring
INFLUX_USERNAME=water_user
INFLUX_PASSWORD=water_password
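
The `DB_TYPE` variable is what selects the backend at runtime. A hypothetical sketch of how such a factory might read the environment is below; the function name and returned dictionary are illustrative, not the project's actual API.

```python
# Hypothetical sketch of a DB_TYPE-driven configuration factory. The
# variable names match the .env examples in this guide; everything else
# (function name, dict layout, sqlite fallback) is illustrative.
import os

def get_db_config(env=None):
    """Map DB_TYPE plus related variables onto backend settings."""
    env = os.environ if env is None else env
    db_type = env.get("DB_TYPE", "sqlite").lower()
    if db_type == "influxdb":
        return {
            "type": "influxdb",
            "host": env.get("INFLUX_HOST", "localhost"),
            "port": int(env.get("INFLUX_PORT", "8086")),
            "database": env.get("INFLUX_DATABASE", "water_monitoring"),
        }
    if db_type == "victoriametrics":
        return {
            "type": "victoriametrics",
            "host": env.get("VM_HOST", "localhost"),
            "port": int(env.get("VM_PORT", "8428")),
        }
    if db_type in ("postgresql", "mysql"):
        key = ("POSTGRES_CONNECTION_STRING" if db_type == "postgresql"
               else "MYSQL_CONNECTION_STRING")
        return {"type": db_type, "connection_string": env[key]}
    # Default: local SQLite file, no extra configuration required
    return {"type": "sqlite", "path": env.get("SQLITE_PATH", "water_monitoring.db")}
```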

InfluxDB Configuration

# influxdb.conf
[meta]
  dir = "/var/lib/influxdb/meta"

[data]
  dir = "/var/lib/influxdb/data"
  wal-dir = "/var/lib/influxdb/wal"
  
  # Optimize for time-series data
  cache-max-memory-size = "1g"
  cache-snapshot-memory-size = "25m"
  cache-snapshot-write-cold-duration = "10m"
  
  # Retention and compression
  compact-full-write-cold-duration = "4h"
  max-series-per-database = 1000000
  max-values-per-tag = 100000

[coordinator]
  write-timeout = "10s"
  max-concurrent-queries = 0
  query-timeout = "0s"

[retention]
  enabled = true
  check-interval = "30m"

[http]
  enabled = true
  bind-address = ":8086"
  auth-enabled = true
  max-body-size = 25000000
  max-concurrent-requests = 0
  max-enqueued-requests = 0

Production Setup Commands

# Start services
docker-compose up -d

# Create retention policies
docker exec -it water_influxdb influx -username admin -password your_secure_password -execute "
CREATE RETENTION POLICY \"raw_data\" ON \"water_monitoring\" DURATION 90d REPLICATION 1 DEFAULT;
CREATE RETENTION POLICY \"downsampled\" ON \"water_monitoring\" DURATION 730d REPLICATION 1;
"

# Create continuous queries for downsampling
docker exec -it water_influxdb influx -username admin -password your_secure_password -execute "
CREATE CONTINUOUS QUERY \"downsample_hourly\" ON \"water_monitoring\" 
BEGIN 
  SELECT mean(water_level) AS water_level, mean(discharge) AS discharge, mean(discharge_percent) AS discharge_percent
  INTO \"downsampled\".\"water_data_hourly\" 
  FROM \"water_data\" 
  GROUP BY time(1h), station_code, station_name_en, station_name_th
END
"

2. VictoriaMetrics Deployment (High Performance)

Why VictoriaMetrics?

  • Extremely fast and resource-efficient
  • Better compression than InfluxDB
  • Prometheus-compatible API
  • Lower memory usage
  • Built-in clustering
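
Because VictoriaMetrics accepts InfluxDB line protocol on its `/write` endpoint, no client library is needed: a plain HTTP POST is enough. The sketch below renders a point as line protocol; it handles only simple tag/field values (no escaping of spaces or commas, floats only).

```python
# Sketch: rendering one point as InfluxDB line protocol, which
# VictoriaMetrics accepts on its /write endpoint. Kept minimal: tag and
# field values are assumed to contain no spaces/commas, fields are floats.
def to_line_protocol(measurement, tags, fields, ts_ns=None):
    """Render one point as a line-protocol string."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    line = f"{measurement},{tag_str} {field_str}"
    return f"{line} {ts_ns}" if ts_ns else line

# Posting (requires the `requests` package from requirements.txt):
# import requests
# line = to_line_protocol("water_data", {"station_code": "P.1"},
#                         {"water_level": 2.31, "discharge": 145.2})
# requests.post("http://localhost:8428/write", data=line, timeout=10)
```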

Docker Deployment

# docker-compose.yml
version: '3.8'
services:
  victoriametrics:
    image: victoriametrics/victoria-metrics:latest
    container_name: water_victoriametrics
    ports:
      - "8428:8428"
    volumes:
      - vm_data:/victoria-metrics-data
    command:
      - '--storageDataPath=/victoria-metrics-data'
      - '--retentionPeriod=2y'
      - '--httpListenAddr=:8428'
      - '--maxConcurrentInserts=16'
    restart: unless-stopped

volumes:
  vm_data:

Environment Variables

# .env file
DB_TYPE=victoriametrics
VM_HOST=localhost
VM_PORT=8428

3. PostgreSQL Deployment (Relational + Time-Series)

Why PostgreSQL?

  • Mature and reliable
  • Excellent for complex queries
  • TimescaleDB extension for time-series optimization
  • Strong consistency guarantees
  • Rich ecosystem

Docker Deployment with TimescaleDB

# docker-compose.yml
version: '3.8'
services:
  postgres:
    image: timescale/timescaledb:latest-pg14
    container_name: water_postgres
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    environment:
      - POSTGRES_DB=water_monitoring
      - POSTGRES_USER=water_user
      - POSTGRES_PASSWORD=secure_password
    restart: unless-stopped

volumes:
  postgres_data:

Database Initialization

-- init.sql
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;

-- Create hypertable for time-series optimization
CREATE TABLE water_measurements (
    id BIGSERIAL,
    timestamp TIMESTAMPTZ NOT NULL,
    station_id INT NOT NULL,
    water_level NUMERIC(10,3),
    discharge NUMERIC(10,2),
    discharge_percent NUMERIC(5,2),
    status VARCHAR(20) DEFAULT 'active',
    created_at TIMESTAMPTZ DEFAULT NOW(),
    -- TimescaleDB requires the partitioning column in every unique index,
    -- so the primary key must include "timestamp"
    PRIMARY KEY (id, timestamp)
);

-- Convert to hypertable (TimescaleDB)
SELECT create_hypertable('water_measurements', 'timestamp', chunk_time_interval => INTERVAL '1 day');

-- Create indexes
CREATE INDEX idx_water_measurements_station_time ON water_measurements (station_id, timestamp DESC);
CREATE INDEX idx_water_measurements_timestamp ON water_measurements (timestamp DESC);

-- Create retention policy (keep raw data for 2 years)
SELECT add_retention_policy('water_measurements', INTERVAL '2 years');

-- Create continuous aggregates for performance
CREATE MATERIALIZED VIEW water_measurements_hourly
WITH (timescaledb.continuous) AS
SELECT 
    time_bucket('1 hour', timestamp) AS bucket,
    station_id,
    AVG(water_level) as avg_water_level,
    MAX(water_level) as max_water_level,
    MIN(water_level) as min_water_level,
    AVG(discharge) as avg_discharge,
    MAX(discharge) as max_discharge,
    MIN(discharge) as min_discharge,
    AVG(discharge_percent) as avg_discharge_percent
FROM water_measurements
GROUP BY bucket, station_id;

-- Refresh policy for continuous aggregates
SELECT add_continuous_aggregate_policy('water_measurements_hourly',
    start_offset => INTERVAL '1 day',
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');
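
On the Python side, a batched insert into this table can go through `psycopg2.extras.execute_values()`, which sends many rows in one round trip. The sketch below assumes scraped readings arrive as dictionaries; the dict keys and the `to_rows` helper are illustrative, while the column list matches the `init.sql` schema above.

```python
# Sketch: batched insert into water_measurements. The column order in
# INSERT_SQL matches init.sql above; the input dict keys are assumptions
# about how the scraper hands over readings.
def to_rows(measurements):
    """Normalize scraped dicts into tuples matching the table's columns."""
    return [
        (m["timestamp"], m["station_id"], m.get("water_level"),
         m.get("discharge"), m.get("discharge_percent"))
        for m in measurements
    ]

INSERT_SQL = """
    INSERT INTO water_measurements
        (timestamp, station_id, water_level, discharge, discharge_percent)
    VALUES %s
"""

# Sending (requires `pip install psycopg2-binary` and a reachable server):
# import os, psycopg2
# from psycopg2.extras import execute_values
# with psycopg2.connect(os.environ["POSTGRES_CONNECTION_STRING"]) as conn:
#     with conn.cursor() as cur:
#         execute_values(cur, INSERT_SQL, to_rows(batch))
```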

Environment Variables

# .env file
DB_TYPE=postgresql
POSTGRES_CONNECTION_STRING=postgresql://water_user:secure_password@localhost:5432/water_monitoring

4. MySQL Deployment (Traditional Relational)

Docker Deployment

# docker-compose.yml
version: '3.8'
services:
  mysql:
    image: mysql:8.0
    container_name: water_mysql
    ports:
      - "3306:3306"
    volumes:
      - mysql_data:/var/lib/mysql
      - ./mysql.cnf:/etc/mysql/conf.d/mysql.cnf
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    environment:
      - MYSQL_ROOT_PASSWORD=root_password
      - MYSQL_DATABASE=water_monitoring
      - MYSQL_USER=water_user
      - MYSQL_PASSWORD=water_password
    restart: unless-stopped

volumes:
  mysql_data:

MySQL Configuration

# mysql.cnf
[mysqld]
# Optimize for time-series data
innodb_buffer_pool_size = 1G
innodb_log_file_size = 256M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT

# Note: MySQL 8.0 removed the query cache (query_cache_type/query_cache_size)
# and the old "partition" plugin option; InnoDB partitioning is built in and
# needs no setting here. Including those options would prevent startup.

# Connection settings
max_connections = 200
connect_timeout = 10
wait_timeout = 600

Environment Variables

# .env file
DB_TYPE=mysql
MYSQL_CONNECTION_STRING=mysql://water_user:water_password@localhost:3306/water_monitoring

5. Installation and Dependencies

Required Python Packages

# Base requirements
pip install requests schedule

# Database-specific packages
pip install influxdb                    # For InfluxDB
pip install sqlalchemy pymysql         # For MySQL
pip install sqlalchemy psycopg2-binary # For PostgreSQL
# VictoriaMetrics uses HTTP API (no extra packages needed)

Updated requirements.txt

requests>=2.28.0
schedule>=1.2.0
pandas>=1.5.0

# Database adapters (install as needed)
influxdb>=5.3.1
sqlalchemy>=1.4.0
pymysql>=1.0.2
psycopg2-binary>=2.9.0

6. Production Deployment Examples

Using InfluxDB

# Set environment variables
export DB_TYPE=influxdb
export INFLUX_HOST=your-influx-server.com
export INFLUX_PORT=8086
export INFLUX_DATABASE=water_monitoring
export INFLUX_USERNAME=water_user
export INFLUX_PASSWORD=your_secure_password

# Run the scraper
python water_scraper_v3.py

Using PostgreSQL with TimescaleDB

# Set environment variables
export DB_TYPE=postgresql
export POSTGRES_CONNECTION_STRING=postgresql://water_user:password@your-postgres-server.com:5432/water_monitoring

# Run the scraper
python water_scraper_v3.py

Using VictoriaMetrics

# Set environment variables
export DB_TYPE=victoriametrics
export VM_HOST=your-vm-server.com
export VM_PORT=8428

# Run the scraper
python water_scraper_v3.py

7. Monitoring and Alerting

Grafana Dashboard Setup

  1. Add Data Source: Configure your database as a Grafana data source
  2. Import Dashboard: Use pre-built water monitoring dashboards
  3. Set Alerts: Configure alerts for abnormal water levels or discharge rates

Example Grafana Queries

InfluxDB Queries

-- Current water levels
SELECT last("water_level") FROM "water_data" GROUP BY "station_code"

-- Discharge trends (last 24h)
SELECT mean("discharge") FROM "water_data" WHERE time >= now() - 24h GROUP BY time(1h), "station_code"

PostgreSQL/TimescaleDB Queries

-- Current water levels
SELECT DISTINCT ON (station_id) 
    station_id, water_level, discharge, timestamp
FROM water_measurements 
ORDER BY station_id, timestamp DESC;

-- Hourly averages (last 24h)
SELECT 
    time_bucket('1 hour', timestamp) as hour,
    station_id,
    AVG(water_level) as avg_level,
    AVG(discharge) as avg_discharge
FROM water_measurements 
WHERE timestamp >= NOW() - INTERVAL '24 hours'
GROUP BY hour, station_id
ORDER BY hour DESC;

8. Performance Optimization Tips

For All Databases

  • Batch inserts: Insert multiple measurements at once
  • Connection pooling: Reuse database connections
  • Indexing: Ensure proper indexes on timestamp and station_id
  • Retention policies: Automatically delete old data
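
The batching tip above can be sketched as a small chunking helper that works the same against any of the backends: collect readings and flush them N at a time instead of issuing one write per measurement. The `db_adapter.write()` call in the comment is a placeholder for whichever adapter is configured.

```python
# Sketch of the "batch inserts" tip: flush measurements in fixed-size
# batches rather than one write per reading. db_adapter.write() in the
# usage comment is a placeholder, not a real API.
def chunked(items, size):
    """Yield successive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# for batch in chunked(measurements, 500):
#     db_adapter.write(batch)   # one round trip per 500 points
```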

InfluxDB Specific

  • Use tags for metadata (station codes, names)
  • Use fields for numeric values (water levels, discharge)
  • Configure retention policies and continuous queries
  • Enable compression for long-term storage

PostgreSQL/TimescaleDB Specific

  • Use hypertables for automatic partitioning
  • Create continuous aggregates for common queries
  • Configure compression for older chunks
  • Use parallel queries for large datasets

VictoriaMetrics Specific

  • Use labels efficiently (similar to Prometheus)
  • Configure retention periods appropriately
  • Use downsampling for long-term storage
  • Enable deduplication if needed

This deployment guide provides production-ready configurations for all supported database backends. Choose the one that best fits your infrastructure and requirements.