# Database Deployment Guide for Thailand Water Monitor
This guide covers deployment options for storing water monitoring data in production environments.
## 🏆 Recommendation Summary
| Database | Best For | Performance | Complexity | Cost |
|---|---|---|---|---|
| InfluxDB | Time-series data, dashboards | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ |
| VictoriaMetrics | High-performance metrics | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| PostgreSQL | Complex queries, reliability | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
| MySQL | Familiar, existing infrastructure | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
## 1. InfluxDB Deployment (Recommended for Time-Series)

### Why InfluxDB?
- Purpose-built for time-series data
- Excellent compression (10:1 typical ratio)
- Built-in retention policies and downsampling
- Great Grafana integration for dashboards
- High write throughput (100k+ points/second)
### Docker Deployment

```yaml
# docker-compose.yml
version: '3.8'

services:
  influxdb:
    image: influxdb:1.8
    container_name: water_influxdb
    ports:
      - "8086:8086"
    volumes:
      - influxdb_data:/var/lib/influxdb
      - ./influxdb.conf:/etc/influxdb/influxdb.conf:ro
    environment:
      - INFLUXDB_DB=water_monitoring
      - INFLUXDB_ADMIN_USER=admin
      - INFLUXDB_ADMIN_PASSWORD=your_secure_password
      - INFLUXDB_USER=water_user
      - INFLUXDB_USER_PASSWORD=water_password
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    container_name: water_grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin_password
    restart: unless-stopped

volumes:
  influxdb_data:
  grafana_data:
```
### Environment Variables

```bash
# .env file
DB_TYPE=influxdb
INFLUX_HOST=localhost
INFLUX_PORT=8086
INFLUX_DATABASE=water_monitoring
INFLUX_USERNAME=water_user
INFLUX_PASSWORD=water_password
```
### InfluxDB Configuration

```toml
# influxdb.conf
[meta]
  dir = "/var/lib/influxdb/meta"

[data]
  dir = "/var/lib/influxdb/data"
  wal-dir = "/var/lib/influxdb/wal"
  # Optimize for time-series data
  cache-max-memory-size = "1g"
  cache-snapshot-memory-size = "25m"
  cache-snapshot-write-cold-duration = "10m"
  # Retention and compression
  compact-full-write-cold-duration = "4h"
  max-series-per-database = 1000000
  max-values-per-tag = 100000

[coordinator]
  write-timeout = "10s"
  max-concurrent-queries = 0
  query-timeout = "0s"

[retention]
  enabled = true
  check-interval = "30m"

[http]
  enabled = true
  bind-address = ":8086"
  auth-enabled = true
  max-body-size = 25000000
  max-concurrent-requests = 0
  max-enqueued-requests = 0
```
### Production Setup Commands

```bash
# Start services
docker-compose up -d

# Create retention policies
docker exec -it water_influxdb influx -username admin -password your_secure_password -execute "
CREATE RETENTION POLICY \"raw_data\" ON \"water_monitoring\" DURATION 90d REPLICATION 1 DEFAULT;
CREATE RETENTION POLICY \"downsampled\" ON \"water_monitoring\" DURATION 730d REPLICATION 1;
"

# Create continuous queries for downsampling
docker exec -it water_influxdb influx -username admin -password your_secure_password -execute "
CREATE CONTINUOUS QUERY \"downsample_hourly\" ON \"water_monitoring\"
BEGIN
  SELECT mean(water_level) AS water_level, mean(discharge) AS discharge, mean(discharge_percent) AS discharge_percent
  INTO \"downsampled\".\"water_data_hourly\"
  FROM \"water_data\"
  GROUP BY time(1h), station_code, station_name_en, station_name_th
END
"
```
## 2. VictoriaMetrics Deployment (High Performance)

### Why VictoriaMetrics?
- Extremely fast and resource-efficient
- Better compression than InfluxDB
- Prometheus-compatible API
- Lower memory usage
- Built-in clustering
### Docker Deployment

```yaml
# docker-compose.yml
version: '3.8'

services:
  victoriametrics:
    image: victoriametrics/victoria-metrics:latest
    container_name: water_victoriametrics
    ports:
      - "8428:8428"
    volumes:
      - vm_data:/victoria-metrics-data
    command:
      - '--storageDataPath=/victoria-metrics-data'
      - '--retentionPeriod=2y'
      - '--httpListenAddr=:8428'
      - '--maxConcurrentInserts=16'
    restart: unless-stopped

volumes:
  vm_data:
```
### Environment Variables

```bash
# .env file
DB_TYPE=victoriametrics
VM_HOST=localhost
VM_PORT=8428
```
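Because VictoriaMetrics accepts Influx line protocol on its `/write` endpoint, no client library is needed; the standard library is enough. A hedged sketch (the reading dict keys, including `timestamp_ns`, are assumptions; the metric and tag names mirror the InfluxDB schema used elsewhere in this guide):

```python
# Sketch: push readings to VictoriaMetrics over its Influx-compatible
# /write endpoint using only the standard library. Field/tag names are
# assumed to match the "water_data" schema used in the InfluxDB section.
import urllib.request

def to_line(r):
    """One reading dict -> one line of Influx line protocol."""
    return (
        f'water_data,station_code={r["station_code"]} '
        f'water_level={r["water_level"]},discharge={r["discharge"]} '
        f'{r["timestamp_ns"]}'  # nanosecond epoch timestamp
    )

def push(readings, host="localhost", port=8428):
    # Newline-separated lines, posted as one batch.
    body = "\n".join(to_line(r) for r in readings).encode()
    req = urllib.request.Request(
        f"http://{host}:{port}/write", data=body, method="POST")
    urllib.request.urlopen(req)  # raises on a non-2xx response
```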
## 3. PostgreSQL Deployment (Relational + Time-Series)

### Why PostgreSQL?
- Mature and reliable
- Excellent for complex queries
- TimescaleDB extension for time-series optimization
- Strong consistency guarantees
- Rich ecosystem
### Docker Deployment with TimescaleDB

```yaml
# docker-compose.yml
version: '3.8'

services:
  postgres:
    image: timescale/timescaledb:latest-pg14
    container_name: water_postgres
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    environment:
      - POSTGRES_DB=water_monitoring
      - POSTGRES_USER=water_user
      - POSTGRES_PASSWORD=secure_password
    restart: unless-stopped

volumes:
  postgres_data:
```
### Database Initialization

```sql
-- init.sql
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;

-- Create table for time-series data. TimescaleDB requires every unique
-- constraint to include the partitioning column, so the primary key is
-- composite (id, timestamp) rather than id alone.
CREATE TABLE water_measurements (
    id BIGSERIAL,
    timestamp TIMESTAMPTZ NOT NULL,
    station_id INT NOT NULL,
    water_level NUMERIC(10,3),
    discharge NUMERIC(10,2),
    discharge_percent NUMERIC(5,2),
    status VARCHAR(20) DEFAULT 'active',
    created_at TIMESTAMPTZ DEFAULT NOW(),
    PRIMARY KEY (id, timestamp)
);

-- Convert to hypertable (TimescaleDB)
SELECT create_hypertable('water_measurements', 'timestamp', chunk_time_interval => INTERVAL '1 day');

-- Create indexes
CREATE INDEX idx_water_measurements_station_time ON water_measurements (station_id, timestamp DESC);
CREATE INDEX idx_water_measurements_timestamp ON water_measurements (timestamp DESC);

-- Create retention policy (keep raw data for 2 years)
SELECT add_retention_policy('water_measurements', INTERVAL '2 years');

-- Create continuous aggregates for performance
CREATE MATERIALIZED VIEW water_measurements_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', timestamp) AS bucket,
    station_id,
    AVG(water_level) AS avg_water_level,
    MAX(water_level) AS max_water_level,
    MIN(water_level) AS min_water_level,
    AVG(discharge) AS avg_discharge,
    MAX(discharge) AS max_discharge,
    MIN(discharge) AS min_discharge,
    AVG(discharge_percent) AS avg_discharge_percent
FROM water_measurements
GROUP BY bucket, station_id;

-- Refresh policy for continuous aggregates
SELECT add_continuous_aggregate_policy('water_measurements_hourly',
    start_offset => INTERVAL '1 day',
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');
```
### Environment Variables

```bash
# .env file
DB_TYPE=postgresql
POSTGRES_CONNECTION_STRING=postgresql://water_user:secure_password@localhost:5432/water_monitoring
```
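Inserts into this schema should be batched rather than issued one row at a time. A hedged sketch using psycopg2 (the record keys and the `rows_from_scrape` helper are assumptions for illustration, not the project's actual code):

```python
# Sketch: batch-insert scraped measurements into the water_measurements
# hypertable defined in init.sql. The record dict keys are assumed.
def rows_from_scrape(records):
    """Convert scraped dicts to tuples matching the insert column order."""
    return [
        (r["timestamp"], r["station_id"], r["water_level"],
         r["discharge"], r["discharge_percent"])
        for r in records
    ]

INSERT_SQL = (
    "INSERT INTO water_measurements "
    "(timestamp, station_id, water_level, discharge, discharge_percent) "
    "VALUES (%s, %s, %s, %s, %s)"
)

def insert_batch(records, dsn):
    # Import deferred so rows_from_scrape() works without the driver installed.
    import psycopg2
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.executemany(INSERT_SQL, rows_from_scrape(records))
```

The DSN would be the `POSTGRES_CONNECTION_STRING` from the `.env` file above.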
## 4. MySQL Deployment (Traditional Relational)

### Docker Deployment

```yaml
# docker-compose.yml
version: '3.8'

services:
  mysql:
    image: mysql:8.0
    container_name: water_mysql
    ports:
      - "3306:3306"
    volumes:
      - mysql_data:/var/lib/mysql
      - ./mysql.cnf:/etc/mysql/conf.d/mysql.cnf
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    environment:
      - MYSQL_ROOT_PASSWORD=root_password
      - MYSQL_DATABASE=water_monitoring
      - MYSQL_USER=water_user
      - MYSQL_PASSWORD=water_password
    restart: unless-stopped

volumes:
  mysql_data:
```
### MySQL Configuration

```ini
# mysql.cnf
[mysqld]
# Optimize for time-series data
innodb_buffer_pool_size = 1G
innodb_log_file_size = 256M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT

# Note: the query cache was removed in MySQL 8.0, and partitioning is
# built into InnoDB, so no query_cache_* or partition options are needed
# (setting them would prevent the server from starting).

# Connection settings
max_connections = 200
connect_timeout = 10
wait_timeout = 600
```
### Environment Variables

```bash
# .env file
DB_TYPE=mysql
MYSQL_CONNECTION_STRING=mysql://water_user:water_password@localhost:3306/water_monitoring
```
## 5. Installation and Dependencies

### Required Python Packages

```bash
# Base requirements
pip install requests schedule

# Database-specific packages
pip install influxdb                    # For InfluxDB (1.x client)
pip install sqlalchemy pymysql          # For MySQL
pip install sqlalchemy psycopg2-binary  # For PostgreSQL

# VictoriaMetrics uses the HTTP API (no extra packages needed)
```
### Updated requirements.txt

```text
requests>=2.28.0
schedule>=1.2.0
pandas>=1.5.0

# Database adapters (install as needed)
influxdb>=5.3.1
sqlalchemy>=1.4.0
pymysql>=1.0.2
psycopg2-binary>=2.9.0
```
## 6. Production Deployment Examples

### Using InfluxDB (Recommended)

```bash
# Set environment variables
export DB_TYPE=influxdb
export INFLUX_HOST=your-influx-server.com
export INFLUX_PORT=8086
export INFLUX_DATABASE=water_monitoring
export INFLUX_USERNAME=water_user
export INFLUX_PASSWORD=your_secure_password

# Run the scraper
python water_scraper_v3.py
```
### Using PostgreSQL with TimescaleDB

```bash
# Set environment variables
export DB_TYPE=postgresql
export POSTGRES_CONNECTION_STRING=postgresql://water_user:password@your-postgres-server.com:5432/water_monitoring

# Run the scraper
python water_scraper_v3.py
```
### Using VictoriaMetrics

```bash
# Set environment variables
export DB_TYPE=victoriametrics
export VM_HOST=your-vm-server.com
export VM_PORT=8428

# Run the scraper
python water_scraper_v3.py
```
## 7. Monitoring and Alerting

### Grafana Dashboard Setup

1. **Add Data Source**: Configure your database as a Grafana data source
2. **Import Dashboard**: Use pre-built water monitoring dashboards
3. **Set Alerts**: Configure alerts for abnormal water levels or discharge rates
### Example Grafana Queries

#### InfluxDB Queries

```sql
-- Current water levels
SELECT last("water_level") FROM "water_data" GROUP BY "station_code"

-- Discharge trends (last 24h)
SELECT mean("discharge") FROM "water_data" WHERE time >= now() - 24h GROUP BY time(1h), "station_code"
```
#### PostgreSQL/TimescaleDB Queries

```sql
-- Current water levels
SELECT DISTINCT ON (station_id)
    station_id, water_level, discharge, timestamp
FROM water_measurements
ORDER BY station_id, timestamp DESC;

-- Hourly averages (last 24h)
SELECT
    time_bucket('1 hour', timestamp) AS hour,
    station_id,
    AVG(water_level) AS avg_level,
    AVG(discharge) AS avg_discharge
FROM water_measurements
WHERE timestamp >= NOW() - INTERVAL '24 hours'
GROUP BY hour, station_id
ORDER BY hour DESC;
```
## 8. Performance Optimization Tips

### For All Databases
- Batch inserts: Insert multiple measurements at once
- Connection pooling: Reuse database connections
- Indexing: Ensure proper indexes on timestamp and station_id
- Retention policies: Automatically delete old data
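The batch-insert advice above boils down to one DB-API call shared by all the Python drivers in this guide. A runnable sketch using the stdlib sqlite3 module (pymysql and psycopg2 expose the same `executemany()`, with `%s` placeholders instead of `?`; the table here is a trimmed stand-in for the real schema):

```python
# Demonstrates batched inserts via DB-API executemany(), shown with
# sqlite3 so the example is self-contained; the pattern is identical
# for pymysql and psycopg2. Table columns are a simplified stand-in.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE water_measurements "
    "(station_id INTEGER, water_level REAL, discharge REAL)")

rows = [(1, 2.45, 120.5), (2, 1.98, 95.0), (3, 3.10, 160.2)]
# One executemany() call sends the whole batch instead of three
# separate INSERT round trips.
conn.executemany(
    "INSERT INTO water_measurements VALUES (?, ?, ?)", rows)
conn.commit()

count = conn.execute(
    "SELECT COUNT(*) FROM water_measurements").fetchone()[0]
print(count)  # 3
```

For connection pooling, SQLAlchemy (already in requirements.txt) provides a pool by default on `create_engine()`.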
### InfluxDB Specific
- Use tags for metadata (station codes, names)
- Use fields for numeric values (water levels, discharge)
- Configure retention policies and continuous queries
- Enable compression for long-term storage
### PostgreSQL/TimescaleDB Specific
- Use hypertables for automatic partitioning
- Create continuous aggregates for common queries
- Configure compression for older chunks
- Use parallel queries for large datasets
### VictoriaMetrics Specific
- Use labels efficiently (similar to Prometheus)
- Configure retention periods appropriately
- Use downsampling for long-term storage
- Enable deduplication if needed
This deployment guide provides production-ready configurations for all supported database backends. Choose the one that best fits your infrastructure and requirements.