Initial commit: Northern Thailand Ping River Monitor v3.1.0
Some checks failed
Security & Dependency Updates / Dependency Security Scan (push) Successful in 29s
Security & Dependency Updates / Docker Security Scan (push) Failing after 53s
Security & Dependency Updates / License Compliance (push) Successful in 13s
Security & Dependency Updates / Check for Dependency Updates (push) Successful in 19s
Security & Dependency Updates / Code Quality Metrics (push) Successful in 11s
Security & Dependency Updates / Security Summary (push) Successful in 7s
Features:
- Real-time water level monitoring for Ping River Basin (16 stations)
- Coverage from Chiang Dao to Nakhon Sawan in Northern Thailand
- FastAPI web interface with interactive dashboard and station management
- Multi-database support (SQLite, MySQL, PostgreSQL, InfluxDB, VictoriaMetrics)
- Comprehensive monitoring with health checks and metrics collection
- Docker deployment with Grafana integration
- Production-ready architecture with enterprise-grade observability

CI/CD & Automation:
- Complete Gitea Actions workflows for CI/CD, security, and releases
- Multi-Python version testing (3.9-3.12)
- Multi-architecture Docker builds (amd64, arm64)
- Daily security scanning and dependency monitoring
- Automated documentation generation
- Performance testing and validation

Production Ready:
- Type safety with Pydantic models and comprehensive type hints
- Data validation layer with range checking and error handling
- Rate limiting and request tracking for API protection
- Enhanced logging with rotation, colors, and performance metrics
- Station management API for dynamic CRUD operations
- Comprehensive documentation and deployment guides

Technical Stack:
- Python 3.9+ with FastAPI and Pydantic
- Multi-database architecture with adapter pattern
- Docker containerization with multi-stage builds
- Grafana dashboards for visualization
- Gitea Actions for CI/CD automation
- Enterprise monitoring and alerting

Ready for deployment to B4L infrastructure!
docs/DATABASE_DEPLOYMENT_GUIDE.md (new file, 447 lines)

# Database Deployment Guide for Thailand Water Monitor

This guide covers deployment options for storing water monitoring data in production environments.

## 🏆 Recommendation Summary

| Database | Best For | Performance | Complexity | Cost |
|----------|----------|-------------|------------|------|
| **InfluxDB** | Time-series data, dashboards | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ |
| **VictoriaMetrics** | High-performance metrics | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| **PostgreSQL** | Complex queries, reliability | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
| **MySQL** | Familiar, existing infrastructure | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |

## 1. InfluxDB Deployment (Recommended for Time-Series)

### Why InfluxDB?
- **Purpose-built** for time-series data
- **Excellent compression** (10:1 typical ratio)
- **Built-in retention policies** and downsampling
- **Great Grafana integration** for dashboards
- **High write throughput** (100k+ points/second)

### Docker Deployment

```yaml
# docker-compose.yml
version: '3.8'
services:
  influxdb:
    image: influxdb:1.8
    container_name: water_influxdb
    ports:
      - "8086:8086"
    volumes:
      - influxdb_data:/var/lib/influxdb
      - ./influxdb.conf:/etc/influxdb/influxdb.conf:ro
    environment:
      - INFLUXDB_DB=water_monitoring
      - INFLUXDB_ADMIN_USER=admin
      - INFLUXDB_ADMIN_PASSWORD=your_secure_password
      - INFLUXDB_USER=water_user
      - INFLUXDB_USER_PASSWORD=water_password
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    container_name: water_grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin_password
    restart: unless-stopped

volumes:
  influxdb_data:
  grafana_data:
```

### Environment Variables
```bash
# .env file
DB_TYPE=influxdb
INFLUX_HOST=localhost
INFLUX_PORT=8086
INFLUX_DATABASE=water_monitoring
INFLUX_USERNAME=water_user
INFLUX_PASSWORD=water_password
```

### InfluxDB Configuration
```toml
# influxdb.conf
[meta]
dir = "/var/lib/influxdb/meta"

[data]
dir = "/var/lib/influxdb/data"
wal-dir = "/var/lib/influxdb/wal"

# Optimize for time-series data
cache-max-memory-size = "1g"
cache-snapshot-memory-size = "25m"
cache-snapshot-write-cold-duration = "10m"

# Retention and compression
compact-full-write-cold-duration = "4h"
max-series-per-database = 1000000
max-values-per-tag = 100000

[coordinator]
write-timeout = "10s"
max-concurrent-queries = 0
query-timeout = "0s"

[retention]
enabled = true
check-interval = "30m"

[http]
enabled = true
bind-address = ":8086"
auth-enabled = true
# max-body-size is specified in bytes as an integer
max-body-size = 25000000
max-concurrent-requests = 0
max-enqueued-requests = 0
```

### Production Setup Commands
```bash
# Start services
docker-compose up -d

# Create retention policies
docker exec -it water_influxdb influx -username admin -password your_secure_password -execute "
CREATE RETENTION POLICY \"raw_data\" ON \"water_monitoring\" DURATION 90d REPLICATION 1 DEFAULT;
CREATE RETENTION POLICY \"downsampled\" ON \"water_monitoring\" DURATION 730d REPLICATION 1;
"

# Create continuous queries for downsampling
docker exec -it water_influxdb influx -username admin -password your_secure_password -execute "
CREATE CONTINUOUS QUERY \"downsample_hourly\" ON \"water_monitoring\"
BEGIN
  SELECT mean(water_level) AS water_level, mean(discharge) AS discharge, mean(discharge_percent) AS discharge_percent
  INTO \"downsampled\".\"water_data_hourly\"
  FROM \"water_data\"
  GROUP BY time(1h), station_code, station_name_en, station_name_th
END
"
```

## 2. VictoriaMetrics Deployment (High Performance)

### Why VictoriaMetrics?
- **Extremely fast** and resource-efficient
- **Better compression** than InfluxDB
- **Prometheus-compatible** API
- **Lower memory usage**
- **Built-in clustering**

### Docker Deployment
```yaml
# docker-compose.yml
version: '3.8'
services:
  victoriametrics:
    image: victoriametrics/victoria-metrics:latest
    container_name: water_victoriametrics
    ports:
      - "8428:8428"
    volumes:
      - vm_data:/victoria-metrics-data
    command:
      - '--storageDataPath=/victoria-metrics-data'
      - '--retentionPeriod=2y'
      - '--httpListenAddr=:8428'
      - '--maxConcurrentInserts=16'
    restart: unless-stopped

volumes:
  vm_data:
```

### Environment Variables
```bash
# .env file
DB_TYPE=victoriametrics
VM_HOST=localhost
VM_PORT=8428
```

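No client library is needed for VictoriaMetrics; it accepts writes over plain HTTP, including the InfluxDB line protocol on its `/write` endpoint. The monitor's own database adapter handles this internally, so the following is only a minimal sketch of what a write looks like, assuming the `VM_HOST`/`VM_PORT` values above; the measurement, tag, and field names are hypothetical.

```python
# Minimal sketch: push one measurement to VictoriaMetrics over HTTP.
# Assumes the VM_HOST/VM_PORT values from the .env example above;
# the measurement/tag/field names are illustrative only.
import os
import time

import requests

vm_url = f"http://{os.getenv('VM_HOST', 'localhost')}:{os.getenv('VM_PORT', '8428')}/write"

# InfluxDB line protocol: measurement,tag=value field=value timestamp(ns)
line = (
    "water_data,station_code=P.1 "
    "water_level=2.45,discharge=102.3 "
    f"{time.time_ns()}"
)

resp = requests.post(vm_url, data=line, timeout=10)
resp.raise_for_status()  # VictoriaMetrics replies 204 No Content on success
```
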
## 3. PostgreSQL Deployment (Relational + Time-Series)

### Why PostgreSQL?
- **Mature and reliable**
- **Excellent for complex queries**
- **TimescaleDB extension** for time-series optimization
- **Strong consistency guarantees**
- **Rich ecosystem**

### Docker Deployment with TimescaleDB
```yaml
# docker-compose.yml
version: '3.8'
services:
  postgres:
    image: timescale/timescaledb:latest-pg14
    container_name: water_postgres
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    environment:
      - POSTGRES_DB=water_monitoring
      - POSTGRES_USER=water_user
      - POSTGRES_PASSWORD=secure_password
    restart: unless-stopped

volumes:
  postgres_data:
```

### Database Initialization
```sql
-- init.sql
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;

-- Create hypertable for time-series optimization
CREATE TABLE water_measurements (
    id BIGSERIAL,
    timestamp TIMESTAMPTZ NOT NULL,
    station_id INT NOT NULL,
    water_level NUMERIC(10,3),
    discharge NUMERIC(10,2),
    discharge_percent NUMERIC(5,2),
    status VARCHAR(20) DEFAULT 'active',
    created_at TIMESTAMPTZ DEFAULT NOW(),
    -- TimescaleDB requires unique constraints to include the partitioning column
    PRIMARY KEY (id, timestamp)
);

-- Convert to hypertable (TimescaleDB)
SELECT create_hypertable('water_measurements', 'timestamp', chunk_time_interval => INTERVAL '1 day');

-- Create indexes
CREATE INDEX idx_water_measurements_station_time ON water_measurements (station_id, timestamp DESC);
CREATE INDEX idx_water_measurements_timestamp ON water_measurements (timestamp DESC);

-- Create retention policy (keep raw data for 2 years)
SELECT add_retention_policy('water_measurements', INTERVAL '2 years');

-- Create continuous aggregates for performance
CREATE MATERIALIZED VIEW water_measurements_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', timestamp) AS bucket,
    station_id,
    AVG(water_level) AS avg_water_level,
    MAX(water_level) AS max_water_level,
    MIN(water_level) AS min_water_level,
    AVG(discharge) AS avg_discharge,
    MAX(discharge) AS max_discharge,
    MIN(discharge) AS min_discharge,
    AVG(discharge_percent) AS avg_discharge_percent
FROM water_measurements
GROUP BY bucket, station_id;

-- Refresh policy for continuous aggregates
SELECT add_continuous_aggregate_policy('water_measurements_hourly',
    start_offset => INTERVAL '1 day',
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');
```

### Environment Variables
```bash
# .env file
DB_TYPE=postgresql
POSTGRES_CONNECTION_STRING=postgresql://water_user:secure_password@localhost:5432/water_monitoring
```

## 4. MySQL Deployment (Traditional Relational)

### Docker Deployment
```yaml
# docker-compose.yml
version: '3.8'
services:
  mysql:
    image: mysql:8.0
    container_name: water_mysql
    ports:
      - "3306:3306"
    volumes:
      - mysql_data:/var/lib/mysql
      - ./mysql.cnf:/etc/mysql/conf.d/mysql.cnf
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    environment:
      - MYSQL_ROOT_PASSWORD=root_password
      - MYSQL_DATABASE=water_monitoring
      - MYSQL_USER=water_user
      - MYSQL_PASSWORD=water_password
    restart: unless-stopped

volumes:
  mysql_data:
```

### MySQL Configuration
```ini
# mysql.cnf
[mysqld]
# Optimize for time-series data
innodb_buffer_pool_size = 1G
innodb_log_file_size = 256M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT

# Note: MySQL 8.0 ships native InnoDB partitioning and removed the query cache,
# so the old "partition" and query_cache_* options must not be set here
# (unknown options abort server startup).

# Connection settings
max_connections = 200
connect_timeout = 10
wait_timeout = 600
```

### Environment Variables
```bash
# .env file
DB_TYPE=mysql
MYSQL_CONNECTION_STRING=mysql://water_user:water_password@localhost:3306/water_monitoring
```

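A minimal sketch of how a connection string like this is typically consumed with SQLAlchemy and PyMySQL (the driver installed in Section 5). The dialect rewrite below is an assumption for illustration, not the monitor's actual adapter code, which may normalize the URL differently.

```python
# Minimal sketch: turn MYSQL_CONNECTION_STRING into a pooled SQLAlchemy engine.
# The mysql+pymysql rewrite is an assumption; the project's adapter may differ.
import os

from sqlalchemy import create_engine, text

url = os.environ["MYSQL_CONNECTION_STRING"]
# SQLAlchemy's bare "mysql://" scheme defaults to the MySQLdb driver;
# since requirements.txt installs PyMySQL, select it explicitly.
url = url.replace("mysql://", "mysql+pymysql://", 1)

engine = create_engine(url, pool_size=5, pool_recycle=3600, pool_pre_ping=True)

with engine.connect() as conn:
    print(conn.execute(text("SELECT VERSION()")).scalar())
```
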
## 5. Installation and Dependencies

### Required Python Packages
```bash
# Base requirements
pip install requests schedule

# Database-specific packages
pip install influxdb                    # For InfluxDB
pip install sqlalchemy pymysql          # For MySQL
pip install sqlalchemy psycopg2-binary  # For PostgreSQL
# VictoriaMetrics uses HTTP API (no extra packages needed)
```

### Updated requirements.txt
```txt
requests>=2.28.0
schedule>=1.2.0
pandas>=1.5.0

# Database adapters (install as needed)
influxdb>=5.3.1
sqlalchemy>=1.4.0
pymysql>=1.0.2
psycopg2-binary>=2.9.0
```

## 6. Production Deployment Examples

### Using InfluxDB (Recommended)
```bash
# Set environment variables
export DB_TYPE=influxdb
export INFLUX_HOST=your-influx-server.com
export INFLUX_PORT=8086
export INFLUX_DATABASE=water_monitoring
export INFLUX_USERNAME=water_user
export INFLUX_PASSWORD=your_secure_password

# Run the scraper
python water_scraper_v3.py
```

### Using PostgreSQL with TimescaleDB
```bash
# Set environment variables
export DB_TYPE=postgresql
export POSTGRES_CONNECTION_STRING=postgresql://water_user:password@your-postgres-server.com:5432/water_monitoring

# Run the scraper
python water_scraper_v3.py
```

### Using VictoriaMetrics
```bash
# Set environment variables
export DB_TYPE=victoriametrics
export VM_HOST=your-vm-server.com
export VM_PORT=8428

# Run the scraper
python water_scraper_v3.py
```

## 7. Monitoring and Alerting

### Grafana Dashboard Setup
1. **Add Data Source**: Configure your database as a Grafana data source (a scripted example follows below)
2. **Import Dashboard**: Use pre-built water monitoring dashboards
3. **Set Alerts**: Configure alerts for abnormal water levels or discharge rates

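Data sources can also be registered programmatically through Grafana's HTTP API. The sketch below assumes the InfluxDB and Grafana containers from Section 1; it is illustrative only, and exact payload fields can vary between Grafana versions.

```python
# Minimal sketch: add the Section 1 InfluxDB container as a Grafana data source
# via Grafana's HTTP API. Host names, credentials, and payload fields are
# assumptions based on the compose files above, not project code.
import requests

GRAFANA_URL = "http://localhost:3000"
AUTH = ("admin", "admin_password")  # GF_SECURITY_ADMIN_PASSWORD from docker-compose

payload = {
    "name": "Water Monitoring (InfluxDB)",
    "type": "influxdb",
    "access": "proxy",
    "url": "http://influxdb:8086",  # service name on the compose network
    "database": "water_monitoring",
    "user": "water_user",
    "secureJsonData": {"password": "water_password"},
}

resp = requests.post(f"{GRAFANA_URL}/api/datasources", json=payload, auth=AUTH, timeout=10)
resp.raise_for_status()
print(resp.json())
```
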
### Example Grafana Queries

#### InfluxDB Queries
```sql
-- Current water levels
SELECT last("water_level") FROM "water_data" GROUP BY "station_code"

-- Discharge trends (last 24h)
SELECT mean("discharge") FROM "water_data" WHERE time >= now() - 24h GROUP BY time(1h), "station_code"
```

#### PostgreSQL/TimescaleDB Queries
```sql
-- Current water levels
SELECT DISTINCT ON (station_id)
    station_id, water_level, discharge, timestamp
FROM water_measurements
ORDER BY station_id, timestamp DESC;

-- Hourly averages (last 24h)
SELECT
    time_bucket('1 hour', timestamp) AS hour,
    station_id,
    AVG(water_level) AS avg_level,
    AVG(discharge) AS avg_discharge
FROM water_measurements
WHERE timestamp >= NOW() - INTERVAL '24 hours'
GROUP BY hour, station_id
ORDER BY hour DESC;
```

## 8. Performance Optimization Tips

### For All Databases
- **Batch inserts**: Insert multiple measurements at once (see the sketch after this list)
- **Connection pooling**: Reuse database connections
- **Indexing**: Ensure proper indexes on timestamp and station_id
- **Retention policies**: Automatically delete old data

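A minimal sketch of the first two points, assuming the `water_measurements` table from the PostgreSQL `init.sql` in Section 3 and SQLAlchemy; it is illustrative rather than the monitor's actual insert path.

```python
# Minimal sketch: batched inserts over a pooled SQLAlchemy engine.
# Assumes the water_measurements table from Section 3; illustrative only.
from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql://water_user:secure_password@localhost:5432/water_monitoring",
    pool_size=5,         # connection pooling: reuse connections across scrape runs
    pool_pre_ping=True,
)

measurements = [
    {"ts": "2024-01-01T07:00:00+07:00", "station": 1, "level": 2.45, "q": 102.3, "qp": 41.2},
    {"ts": "2024-01-01T07:00:00+07:00", "station": 2, "level": 1.87, "q": 76.8, "qp": 30.5},
]

insert_sql = text(
    "INSERT INTO water_measurements (timestamp, station_id, water_level, discharge, discharge_percent) "
    "VALUES (:ts, :station, :level, :q, :qp)"
)

with engine.begin() as conn:
    # Passing a list of parameter dicts issues an executemany-style batch insert
    conn.execute(insert_sql, measurements)
```
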
### InfluxDB Specific
- Use **tags** for metadata (station codes, names)
- Use **fields** for numeric values (water levels, discharge; see the sketch below)
- Configure **retention policies** and **continuous queries**
- Enable **compression** for long-term storage

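A minimal sketch of the tags-vs-fields split, using the `influxdb` client from the requirements in Section 5. The measurement and key names mirror the `water_data` queries in Section 7, but the station values are illustrative.

```python
# Minimal sketch: tags for metadata, fields for numeric values.
# Uses the influxdb package from requirements.txt; station values are illustrative.
from influxdb import InfluxDBClient

client = InfluxDBClient(
    host="localhost", port=8086,
    username="water_user", password="water_password",
    database="water_monitoring",
)

points = [{
    "measurement": "water_data",
    "tags": {                      # indexed metadata
        "station_code": "P.1",
        "station_name_en": "Example Station",
    },
    "fields": {                    # numeric values, not indexed
        "water_level": 2.45,
        "discharge": 102.3,
        "discharge_percent": 41.2,
    },
    "time": "2024-01-01T00:00:00Z",
}]

client.write_points(points)
```
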
### PostgreSQL/TimescaleDB Specific
- Use **hypertables** for automatic partitioning
- Create **continuous aggregates** for common queries
- Configure **compression** for older chunks
- Use **parallel queries** for large datasets

### VictoriaMetrics Specific
- Use **labels** efficiently (similar to Prometheus)
- Configure **retention periods** appropriately
- Use **downsampling** for long-term storage
- Enable **deduplication** if needed

This deployment guide provides production-ready configurations for all supported database backends. Choose the one that best fits your infrastructure and requirements.