# Database Deployment Guide for Thailand Water Monitor
This guide covers deployment options for storing water monitoring data in production environments.
## 🏆 Recommendation Summary
| Database | Best For | Performance | Complexity | Cost |
|---|---|---|---|---|
| InfluxDB | Time-series data, dashboards | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ |
| VictoriaMetrics | High-performance metrics | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| PostgreSQL | Complex queries, reliability | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
| MySQL | Familiar, existing infrastructure | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
## 1. InfluxDB Deployment (Recommended for Time-Series)

### Why InfluxDB?
- Purpose-built for time-series data
- Excellent compression (10:1 typical ratio)
- Built-in retention policies and downsampling
- Great Grafana integration for dashboards
- High write throughput (100k+ points/second)
### Docker Deployment

```yaml
# docker-compose.yml
version: '3.8'

services:
  influxdb:
    image: influxdb:1.8
    container_name: water_influxdb
    ports:
      - "8086:8086"
    volumes:
      - influxdb_data:/var/lib/influxdb
      - ./influxdb.conf:/etc/influxdb/influxdb.conf:ro
    environment:
      - INFLUXDB_DB=water_monitoring
      - INFLUXDB_ADMIN_USER=admin
      - INFLUXDB_ADMIN_PASSWORD=your_secure_password
      - INFLUXDB_USER=water_user
      - INFLUXDB_USER_PASSWORD=water_password
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    container_name: water_grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin_password
    restart: unless-stopped

volumes:
  influxdb_data:
  grafana_data:
```
### Environment Variables

```bash
# .env file
DB_TYPE=influxdb
INFLUX_HOST=localhost
INFLUX_PORT=8086
INFLUX_DATABASE=water_monitoring
INFLUX_USERNAME=water_user
INFLUX_PASSWORD=water_password
```
### InfluxDB Configuration

```toml
# influxdb.conf
[meta]
  dir = "/var/lib/influxdb/meta"

[data]
  dir = "/var/lib/influxdb/data"
  wal-dir = "/var/lib/influxdb/wal"
  # Optimize for time-series data
  cache-max-memory-size = "1g"
  cache-snapshot-memory-size = "25m"
  cache-snapshot-write-cold-duration = "10m"
  # Retention and compression
  compact-full-write-cold-duration = "4h"
  max-series-per-database = 1000000
  max-values-per-tag = 100000

[coordinator]
  write-timeout = "10s"
  max-concurrent-queries = 0
  query-timeout = "0s"

[retention]
  enabled = true
  check-interval = "30m"

[http]
  enabled = true
  bind-address = ":8086"
  auth-enabled = true
  max-body-size = 25000000
  max-concurrent-requests = 0
  max-enqueued-requests = 0
```
### Production Setup Commands

```bash
# Start services
docker-compose up -d

# Create retention policies
docker exec -it water_influxdb influx -username admin -password your_secure_password -execute "
CREATE RETENTION POLICY \"raw_data\" ON \"water_monitoring\" DURATION 90d REPLICATION 1 DEFAULT;
CREATE RETENTION POLICY \"downsampled\" ON \"water_monitoring\" DURATION 730d REPLICATION 1;
"

# Create continuous queries for downsampling
docker exec -it water_influxdb influx -username admin -password your_secure_password -execute "
CREATE CONTINUOUS QUERY \"downsample_hourly\" ON \"water_monitoring\"
BEGIN
  SELECT mean(water_level) AS water_level, mean(discharge) AS discharge, mean(discharge_percent) AS discharge_percent
  INTO \"downsampled\".\"water_data_hourly\"
  FROM \"water_data\"
  GROUP BY time(1h), station_code, station_name_en, station_name_th
END
"
```
## 2. VictoriaMetrics Deployment (High Performance)

### Why VictoriaMetrics?
- Extremely fast and resource-efficient
- Better compression than InfluxDB
- Prometheus-compatible API
- Lower memory usage
- Built-in clustering
### Docker Deployment

```yaml
# docker-compose.yml
version: '3.8'

services:
  victoriametrics:
    image: victoriametrics/victoria-metrics:latest
    container_name: water_victoriametrics
    ports:
      - "8428:8428"
    volumes:
      - vm_data:/victoria-metrics-data
    command:
      - '--storageDataPath=/victoria-metrics-data'
      - '--retentionPeriod=2y'
      - '--httpListenAddr=:8428'
      - '--maxConcurrentInserts=16'
    restart: unless-stopped

volumes:
  vm_data:
```
### Environment Variables

```bash
# .env file
DB_TYPE=victoriametrics
VM_HOST=localhost
VM_PORT=8428
```
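Because VictoriaMetrics accepts Influx line protocol on its `/write` endpoint, no client library is needed; the standard library is enough. A hedged sketch (the reading dict keys, including `timestamp_ns`, are assumptions; the metric and tag names mirror the InfluxDB schema used elsewhere in this guide):

```python
# Sketch: push readings to VictoriaMetrics over its Influx-compatible
# /write endpoint using only the standard library. Field/tag names are
# assumed to match the "water_data" schema used in the InfluxDB section.
import urllib.request

def to_line(r):
    """One reading dict -> one line of Influx line protocol."""
    return (
        f'water_data,station_code={r["station_code"]} '
        f'water_level={r["water_level"]},discharge={r["discharge"]} '
        f'{r["timestamp_ns"]}'  # nanosecond epoch timestamp
    )

def push(readings, host="localhost", port=8428):
    # Newline-separated lines, posted as one batch.
    body = "\n".join(to_line(r) for r in readings).encode()
    req = urllib.request.Request(
        f"http://{host}:{port}/write", data=body, method="POST")
    urllib.request.urlopen(req)  # raises on a non-2xx response
```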
## 3. PostgreSQL Deployment (Relational + Time-Series)

### Why PostgreSQL?
- Mature and reliable
- Excellent for complex queries
- TimescaleDB extension for time-series optimization
- Strong consistency guarantees
- Rich ecosystem
### Docker Deployment with TimescaleDB

```yaml
# docker-compose.yml
version: '3.8'

services:
  postgres:
    image: timescale/timescaledb:latest-pg14
    container_name: water_postgres
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    environment:
      - POSTGRES_DB=water_monitoring
      - POSTGRES_USER=water_user
      - POSTGRES_PASSWORD=secure_password
    restart: unless-stopped

volumes:
  postgres_data:
```
### Database Initialization

```sql
-- init.sql
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;

-- Create table for time-series data. TimescaleDB requires every unique
-- constraint to include the partitioning column, so the primary key is
-- composite (id, timestamp) rather than id alone.
CREATE TABLE water_measurements (
    id BIGSERIAL,
    timestamp TIMESTAMPTZ NOT NULL,
    station_id INT NOT NULL,
    water_level NUMERIC(10,3),
    discharge NUMERIC(10,2),
    discharge_percent NUMERIC(5,2),
    status VARCHAR(20) DEFAULT 'active',
    created_at TIMESTAMPTZ DEFAULT NOW(),
    PRIMARY KEY (id, timestamp)
);

-- Convert to hypertable (TimescaleDB)
SELECT create_hypertable('water_measurements', 'timestamp', chunk_time_interval => INTERVAL '1 day');

-- Create indexes
CREATE INDEX idx_water_measurements_station_time ON water_measurements (station_id, timestamp DESC);
CREATE INDEX idx_water_measurements_timestamp ON water_measurements (timestamp DESC);

-- Create retention policy (keep raw data for 2 years)
SELECT add_retention_policy('water_measurements', INTERVAL '2 years');

-- Create continuous aggregates for performance
CREATE MATERIALIZED VIEW water_measurements_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', timestamp) AS bucket,
    station_id,
    AVG(water_level) AS avg_water_level,
    MAX(water_level) AS max_water_level,
    MIN(water_level) AS min_water_level,
    AVG(discharge) AS avg_discharge,
    MAX(discharge) AS max_discharge,
    MIN(discharge) AS min_discharge,
    AVG(discharge_percent) AS avg_discharge_percent
FROM water_measurements
GROUP BY bucket, station_id;

-- Refresh policy for continuous aggregates
SELECT add_continuous_aggregate_policy('water_measurements_hourly',
    start_offset => INTERVAL '1 day',
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');
```
### Environment Variables

```bash
# .env file
DB_TYPE=postgresql
POSTGRES_CONNECTION_STRING=postgresql://water_user:secure_password@localhost:5432/water_monitoring
```
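Inserts into this schema should be batched rather than issued one row at a time. A hedged sketch using psycopg2 (the record keys and the `rows_from_scrape` helper are assumptions for illustration, not the project's actual code):

```python
# Sketch: batch-insert scraped measurements into the water_measurements
# hypertable defined in init.sql. The record dict keys are assumed.
def rows_from_scrape(records):
    """Convert scraped dicts to tuples matching the insert column order."""
    return [
        (r["timestamp"], r["station_id"], r["water_level"],
         r["discharge"], r["discharge_percent"])
        for r in records
    ]

INSERT_SQL = (
    "INSERT INTO water_measurements "
    "(timestamp, station_id, water_level, discharge, discharge_percent) "
    "VALUES (%s, %s, %s, %s, %s)"
)

def insert_batch(records, dsn):
    # Import deferred so rows_from_scrape() works without the driver installed.
    import psycopg2
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.executemany(INSERT_SQL, rows_from_scrape(records))
```

The DSN would be the `POSTGRES_CONNECTION_STRING` from the `.env` file above.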
## 4. MySQL Deployment (Traditional Relational)

### Docker Deployment

```yaml
# docker-compose.yml
version: '3.8'

services:
  mysql:
    image: mysql:8.0
    container_name: water_mysql
    ports:
      - "3306:3306"
    volumes:
      - mysql_data:/var/lib/mysql
      - ./mysql.cnf:/etc/mysql/conf.d/mysql.cnf
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    environment:
      - MYSQL_ROOT_PASSWORD=root_password
      - MYSQL_DATABASE=water_monitoring
      - MYSQL_USER=water_user
      - MYSQL_PASSWORD=water_password
    restart: unless-stopped

volumes:
  mysql_data:
```
### MySQL Configuration

```ini
# mysql.cnf
[mysqld]
# Optimize for time-series data
innodb_buffer_pool_size = 1G
innodb_log_file_size = 256M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT

# Note: the query cache was removed in MySQL 8.0, and partitioning is
# built into InnoDB, so no query_cache_* or partition options are needed
# (setting them would prevent the server from starting).

# Connection settings
max_connections = 200
connect_timeout = 10
wait_timeout = 600
```
### Environment Variables

```bash
# .env file
DB_TYPE=mysql
MYSQL_CONNECTION_STRING=mysql://water_user:water_password@localhost:3306/water_monitoring
```
## 5. Installation and Dependencies

### Required Python Packages

```bash
# Base requirements
pip install requests schedule

# Database-specific packages
pip install influxdb                    # For InfluxDB (1.x client)
pip install sqlalchemy pymysql          # For MySQL
pip install sqlalchemy psycopg2-binary  # For PostgreSQL

# VictoriaMetrics uses the HTTP API (no extra packages needed)
```
### Updated requirements.txt

```text
requests>=2.28.0
schedule>=1.2.0
pandas>=1.5.0

# Database adapters (install as needed)
influxdb>=5.3.1
sqlalchemy>=1.4.0
pymysql>=1.0.2
psycopg2-binary>=2.9.0
```
## 6. Production Deployment Examples

### Using InfluxDB (Recommended)

```bash
# Set environment variables
export DB_TYPE=influxdb
export INFLUX_HOST=your-influx-server.com
export INFLUX_PORT=8086
export INFLUX_DATABASE=water_monitoring
export INFLUX_USERNAME=water_user
export INFLUX_PASSWORD=your_secure_password

# Run the scraper
python water_scraper_v3.py
```
### Using PostgreSQL with TimescaleDB

```bash
# Set environment variables
export DB_TYPE=postgresql
export POSTGRES_CONNECTION_STRING=postgresql://water_user:password@your-postgres-server.com:5432/water_monitoring

# Run the scraper
python water_scraper_v3.py
```
### Using VictoriaMetrics

```bash
# Set environment variables
export DB_TYPE=victoriametrics
export VM_HOST=your-vm-server.com
export VM_PORT=8428

# Run the scraper
python water_scraper_v3.py
```
## 7. Monitoring and Alerting

### Grafana Dashboard Setup

1. **Add Data Source**: Configure your database as a Grafana data source
2. **Import Dashboard**: Use pre-built water monitoring dashboards
3. **Set Alerts**: Configure alerts for abnormal water levels or discharge rates
### Example Grafana Queries

#### InfluxDB Queries

```sql
-- Current water levels
SELECT last("water_level") FROM "water_data" GROUP BY "station_code"

-- Discharge trends (last 24h)
SELECT mean("discharge") FROM "water_data" WHERE time >= now() - 24h GROUP BY time(1h), "station_code"
```
#### PostgreSQL/TimescaleDB Queries

```sql
-- Current water levels
SELECT DISTINCT ON (station_id)
    station_id, water_level, discharge, timestamp
FROM water_measurements
ORDER BY station_id, timestamp DESC;

-- Hourly averages (last 24h)
SELECT
    time_bucket('1 hour', timestamp) AS hour,
    station_id,
    AVG(water_level) AS avg_level,
    AVG(discharge) AS avg_discharge
FROM water_measurements
WHERE timestamp >= NOW() - INTERVAL '24 hours'
GROUP BY hour, station_id
ORDER BY hour DESC;
```
## 8. Performance Optimization Tips

### For All Databases
- Batch inserts: Insert multiple measurements at once
- Connection pooling: Reuse database connections
- Indexing: Ensure proper indexes on timestamp and station_id
- Retention policies: Automatically delete old data
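The batch-insert advice above boils down to one DB-API call shared by all the Python drivers in this guide. A runnable sketch using the stdlib sqlite3 module (pymysql and psycopg2 expose the same `executemany()`, with `%s` placeholders instead of `?`; the table here is a trimmed stand-in for the real schema):

```python
# Demonstrates batched inserts via DB-API executemany(), shown with
# sqlite3 so the example is self-contained; the pattern is identical
# for pymysql and psycopg2. Table columns are a simplified stand-in.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE water_measurements "
    "(station_id INTEGER, water_level REAL, discharge REAL)")

rows = [(1, 2.45, 120.5), (2, 1.98, 95.0), (3, 3.10, 160.2)]
# One executemany() call sends the whole batch instead of three
# separate INSERT round trips.
conn.executemany(
    "INSERT INTO water_measurements VALUES (?, ?, ?)", rows)
conn.commit()

count = conn.execute(
    "SELECT COUNT(*) FROM water_measurements").fetchone()[0]
print(count)  # 3
```

For connection pooling, SQLAlchemy (already in requirements.txt) provides a pool by default on `create_engine()`.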
### InfluxDB Specific
- Use tags for metadata (station codes, names)
- Use fields for numeric values (water levels, discharge)
- Configure retention policies and continuous queries
- Enable compression for long-term storage
### PostgreSQL/TimescaleDB Specific
- Use hypertables for automatic partitioning
- Create continuous aggregates for common queries
- Configure compression for older chunks
- Use parallel queries for large datasets
### VictoriaMetrics Specific
- Use labels efficiently (similar to Prometheus)
- Configure retention periods appropriately
- Use downsampling for long-term storage
- Enable deduplication if needed
This deployment guide provides production-ready configurations for all supported database backends. Choose the one that best fits your infrastructure and requirements.