Initial commit: Northern Thailand Ping River Monitor v3.1.0

Features:
- Real-time water level monitoring for Ping River Basin (16 stations)
- Coverage from Chiang Dao to Nakhon Sawan in Northern Thailand
- FastAPI web interface with interactive dashboard and station management
- Multi-database support (SQLite, MySQL, PostgreSQL, InfluxDB, VictoriaMetrics)
- Comprehensive monitoring with health checks and metrics collection
- Docker deployment with Grafana integration
- Production-ready architecture with enterprise-grade observability

CI/CD & Automation:
- Complete Gitea Actions workflows for CI/CD, security, and releases
- Multi-Python version testing (3.9-3.12)
- Multi-architecture Docker builds (amd64, arm64)
- Daily security scanning and dependency monitoring
- Automated documentation generation
- Performance testing and validation

Production Ready:
- Type safety with Pydantic models and comprehensive type hints
- Data validation layer with range checking and error handling
- Rate limiting and request tracking for API protection
- Enhanced logging with rotation, colors, and performance metrics
- Station management API for dynamic CRUD operations
- Comprehensive documentation and deployment guides

Technical Stack:
- Python 3.9+ with FastAPI and Pydantic
- Multi-database architecture with adapter pattern
- Docker containerization with multi-stage builds
- Grafana dashboards for visualization
- Gitea Actions for CI/CD automation
- Enterprise monitoring and alerting

Ready for deployment to B4L infrastructure!
commit af62cfef0b (2025-08-12 15:37:09 +07:00)
60 changed files with 13267 additions and 0 deletions


# Database Deployment Guide for Thailand Water Monitor
This guide covers deployment options for storing water monitoring data in production environments.
## 🏆 Recommendation Summary
| Database | Best For | Performance | Complexity | Cost |
|----------|----------|-------------|------------|------|
| **InfluxDB** | Time-series data, dashboards | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ |
| **VictoriaMetrics** | High-performance metrics | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| **PostgreSQL** | Complex queries, reliability | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
| **MySQL** | Familiar, existing infrastructure | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
## 1. InfluxDB Deployment (Recommended for Time-Series)
### Why InfluxDB?
- **Purpose-built** for time-series data
- **Excellent compression** (10:1 typical ratio)
- **Built-in retention policies** and downsampling
- **Great Grafana integration** for dashboards
- **High write throughput** (100k+ points/second)
### Docker Deployment
```yaml
# docker-compose.yml
version: '3.8'

services:
  influxdb:
    image: influxdb:1.8
    container_name: water_influxdb
    ports:
      - "8086:8086"
    volumes:
      - influxdb_data:/var/lib/influxdb
      - ./influxdb.conf:/etc/influxdb/influxdb.conf:ro
    environment:
      - INFLUXDB_DB=water_monitoring
      - INFLUXDB_ADMIN_USER=admin
      - INFLUXDB_ADMIN_PASSWORD=your_secure_password
      - INFLUXDB_USER=water_user
      - INFLUXDB_USER_PASSWORD=water_password
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    container_name: water_grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin_password
    restart: unless-stopped

volumes:
  influxdb_data:
  grafana_data:
```
### Environment Variables
```bash
# .env file
DB_TYPE=influxdb
INFLUX_HOST=localhost
INFLUX_PORT=8086
INFLUX_DATABASE=water_monitoring
INFLUX_USERNAME=water_user
INFLUX_PASSWORD=water_password
```
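The scraper reads these variables at startup. As a quick connectivity check, the snippet below (a minimal sketch, not the scraper's actual code) writes one point with the `influxdb` client from requirements.txt; the measurement, tag, and field names mirror the continuous query shown later in this section, and the station values are illustrative.

```python
import os
from datetime import datetime, timezone

from influxdb import InfluxDBClient

client = InfluxDBClient(
    host=os.getenv("INFLUX_HOST", "localhost"),
    port=int(os.getenv("INFLUX_PORT", "8086")),
    username=os.getenv("INFLUX_USERNAME", "water_user"),
    password=os.getenv("INFLUX_PASSWORD", "water_password"),
    database=os.getenv("INFLUX_DATABASE", "water_monitoring"),
)

# Tags hold station metadata, fields hold the numeric readings (see section 8).
point = {
    "measurement": "water_data",
    "tags": {"station_code": "P.1", "station_name_en": "Nawarat Bridge"},
    "time": datetime.now(timezone.utc),
    "fields": {"water_level": 2.45, "discharge": 120.5, "discharge_percent": 24.1},
}
client.write_points([point])
```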
### InfluxDB Configuration
```toml
# influxdb.conf
[meta]
dir = "/var/lib/influxdb/meta"
[data]
dir = "/var/lib/influxdb/data"
wal-dir = "/var/lib/influxdb/wal"
# Optimize for time-series data
cache-max-memory-size = "1g"
cache-snapshot-memory-size = "25m"
cache-snapshot-write-cold-duration = "10m"
# Retention and compression
compact-full-write-cold-duration = "4h"
max-series-per-database = 1000000
max-values-per-tag = 100000
[coordinator]
write-timeout = "10s"
max-concurrent-queries = 0
query-timeout = "0s"
[retention]
enabled = true
check-interval = "30m"
[http]
enabled = true
bind-address = ":8086"
auth-enabled = true
max-body-size = 25000000
max-concurrent-requests = 0
max-enqueued-requests = 0
```
### Production Setup Commands
```bash
# Start services
docker-compose up -d
# Create retention policies
docker exec -it water_influxdb influx -username admin -password your_secure_password -execute "
CREATE RETENTION POLICY \"raw_data\" ON \"water_monitoring\" DURATION 90d REPLICATION 1 DEFAULT;
CREATE RETENTION POLICY \"downsampled\" ON \"water_monitoring\" DURATION 730d REPLICATION 1;
"
# Create continuous queries for downsampling
docker exec -it water_influxdb influx -username admin -password your_secure_password -execute "
CREATE CONTINUOUS QUERY \"downsample_hourly\" ON \"water_monitoring\"
BEGIN
SELECT mean(water_level) AS water_level, mean(discharge) AS discharge, mean(discharge_percent) AS discharge_percent
INTO \"downsampled\".\"water_data_hourly\"
FROM \"water_data\"
GROUP BY time(1h), station_code, station_name_en, station_name_th
END
"
```
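To confirm the retention policies and continuous query were created, run `SHOW RETENTION POLICIES ON water_monitoring` in the `influx` shell, or query them from Python. A sketch using the same `influxdb` client and the admin credentials from docker-compose.yml:

```python
from influxdb import InfluxDBClient

# Admin credentials from docker-compose.yml above.
client = InfluxDBClient(
    host="localhost",
    port=8086,
    username="admin",
    password="your_secure_password",
    database="water_monitoring",
)

print(client.get_list_retention_policies("water_monitoring"))
print(client.get_list_continuous_queries())
```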
## 2. VictoriaMetrics Deployment (High Performance)
### Why VictoriaMetrics?
- **Extremely fast** and resource-efficient
- **Better compression** than InfluxDB
- **Prometheus-compatible** API
- **Lower memory usage**
- **Built-in clustering**
### Docker Deployment
```yaml
# docker-compose.yml
version: '3.8'

services:
  victoriametrics:
    image: victoriametrics/victoria-metrics:latest
    container_name: water_victoriametrics
    ports:
      - "8428:8428"
    volumes:
      - vm_data:/victoria-metrics-data
    command:
      - '--storageDataPath=/victoria-metrics-data'
      - '--retentionPeriod=2y'
      - '--httpListenAddr=:8428'
      - '--maxConcurrentInserts=16'
    restart: unless-stopped

volumes:
  vm_data:
```
### Environment Variables
```bash
# .env file
DB_TYPE=victoriametrics
VM_HOST=localhost
VM_PORT=8428
```
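VictoriaMetrics needs no dedicated client library: it accepts InfluxDB line protocol on its `/write` endpoint, so `requests` from the base requirements is enough. A minimal sketch (the station tag and readings are illustrative):

```python
import os

import requests

vm_url = f"http://{os.getenv('VM_HOST', 'localhost')}:{os.getenv('VM_PORT', '8428')}/write"

# One line per measurement: <measurement>,<tags> <fields>
line = "water_data,station_code=P.1 water_level=2.45,discharge=120.5,discharge_percent=24.1"

resp = requests.post(vm_url, data=line, timeout=10)
resp.raise_for_status()  # VictoriaMetrics answers 204 No Content on success
```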
## 3. PostgreSQL Deployment (Relational + Time-Series)
### Why PostgreSQL?
- **Mature and reliable**
- **Excellent for complex queries**
- **TimescaleDB extension** for time-series optimization
- **Strong consistency guarantees**
- **Rich ecosystem**
### Docker Deployment with TimescaleDB
```yaml
# docker-compose.yml
version: '3.8'

services:
  postgres:
    image: timescale/timescaledb:latest-pg14
    container_name: water_postgres
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    environment:
      - POSTGRES_DB=water_monitoring
      - POSTGRES_USER=water_user
      - POSTGRES_PASSWORD=secure_password
    restart: unless-stopped

volumes:
  postgres_data:
```
### Database Initialization
```sql
-- init.sql
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
-- Create hypertable for time-series optimization
CREATE TABLE water_measurements (
id BIGSERIAL PRIMARY KEY,
timestamp TIMESTAMPTZ NOT NULL,
station_id INT NOT NULL,
water_level NUMERIC(10,3),
discharge NUMERIC(10,2),
discharge_percent NUMERIC(5,2),
status VARCHAR(20) DEFAULT 'active',
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Convert to hypertable (TimescaleDB)
SELECT create_hypertable('water_measurements', 'timestamp', chunk_time_interval => INTERVAL '1 day');
-- Create indexes
CREATE INDEX idx_water_measurements_station_time ON water_measurements (station_id, timestamp DESC);
CREATE INDEX idx_water_measurements_timestamp ON water_measurements (timestamp DESC);
-- Create retention policy (keep raw data for 2 years)
SELECT add_retention_policy('water_measurements', INTERVAL '2 years');
-- Create continuous aggregates for performance
CREATE MATERIALIZED VIEW water_measurements_hourly
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', timestamp) AS bucket,
station_id,
AVG(water_level) as avg_water_level,
MAX(water_level) as max_water_level,
MIN(water_level) as min_water_level,
AVG(discharge) as avg_discharge,
MAX(discharge) as max_discharge,
MIN(discharge) as min_discharge,
AVG(discharge_percent) as avg_discharge_percent
FROM water_measurements
GROUP BY bucket, station_id;
-- Refresh policy for continuous aggregates
SELECT add_continuous_aggregate_policy('water_measurements_hourly',
start_offset => INTERVAL '1 day',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 hour');
```
### Environment Variables
```bash
# .env file
DB_TYPE=postgresql
POSTGRES_CONNECTION_STRING=postgresql://water_user:secure_password@localhost:5432/water_monitoring
```
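To verify the connection outside the scraper, the sketch below inserts a single row with SQLAlchemy (listed in requirements.txt) into the schema created by `init.sql`; the station id and readings are illustrative.

```python
import os
from datetime import datetime, timezone

from sqlalchemy import create_engine, text

engine = create_engine(os.environ["POSTGRES_CONNECTION_STRING"])

# engine.begin() opens a transaction and commits it on success.
with engine.begin() as conn:
    conn.execute(
        text(
            "INSERT INTO water_measurements "
            "(timestamp, station_id, water_level, discharge, discharge_percent) "
            "VALUES (:ts, :sid, :level, :q, :pct)"
        ),
        {
            "ts": datetime.now(timezone.utc),
            "sid": 1,
            "level": 2.45,
            "q": 120.5,
            "pct": 24.1,
        },
    )
```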
## 4. MySQL Deployment (Traditional Relational)
### Docker Deployment
```yaml
# docker-compose.yml
version: '3.8'

services:
  mysql:
    image: mysql:8.0
    container_name: water_mysql
    ports:
      - "3306:3306"
    volumes:
      - mysql_data:/var/lib/mysql
      - ./mysql.cnf:/etc/mysql/conf.d/mysql.cnf
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    environment:
      - MYSQL_ROOT_PASSWORD=root_password
      - MYSQL_DATABASE=water_monitoring
      - MYSQL_USER=water_user
      - MYSQL_PASSWORD=water_password
    restart: unless-stopped

volumes:
  mysql_data:
```
### MySQL Configuration
```ini
# mysql.cnf
[mysqld]
# Optimize for time-series data
innodb_buffer_pool_size = 1G
innodb_log_file_size = 256M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT

# Note: the query cache was removed in MySQL 8.0, and partitioning is
# built into InnoDB, so neither needs to be enabled here.

# Connection settings
max_connections = 200
connect_timeout = 10
wait_timeout = 600
```
### Environment Variables
```bash
# .env file
DB_TYPE=mysql
MYSQL_CONNECTION_STRING=mysql://water_user:water_password@localhost:3306/water_monitoring
```
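One caveat when reusing this string with SQLAlchemy directly: a bare `mysql://` URL resolves to the MySQLdb driver, so with only PyMySQL installed you need the `mysql+pymysql://` prefix. A minimal connectivity check (a sketch, assuming the PyMySQL driver from requirements.txt):

```python
import os

from sqlalchemy import create_engine, text

# Rewrite the scheme so SQLAlchemy picks the PyMySQL driver.
url = os.environ["MYSQL_CONNECTION_STRING"].replace("mysql://", "mysql+pymysql://", 1)

engine = create_engine(url, pool_pre_ping=True, pool_recycle=3600)

with engine.connect() as conn:
    print(conn.execute(text("SELECT VERSION()")).scalar())
```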
## 5. Installation and Dependencies
### Required Python Packages
```bash
# Base requirements
pip install requests schedule
# Database-specific packages
pip install influxdb # For InfluxDB
pip install sqlalchemy pymysql # For MySQL
pip install sqlalchemy psycopg2-binary # For PostgreSQL
# VictoriaMetrics uses HTTP API (no extra packages needed)
```
### Updated requirements.txt
```txt
requests>=2.28.0
schedule>=1.2.0
pandas>=1.5.0
# Database adapters (install as needed)
influxdb>=5.3.1
sqlalchemy>=1.4.0
pymysql>=1.0.2
psycopg2-binary>=2.9.0
```
## 6. Production Deployment Examples
### Using InfluxDB (Recommended)
```bash
# Set environment variables
export DB_TYPE=influxdb
export INFLUX_HOST=your-influx-server.com
export INFLUX_PORT=8086
export INFLUX_DATABASE=water_monitoring
export INFLUX_USERNAME=water_user
export INFLUX_PASSWORD=your_secure_password
# Run the scraper
python water_scraper_v3.py
```
### Using PostgreSQL with TimescaleDB
```bash
# Set environment variables
export DB_TYPE=postgresql
export POSTGRES_CONNECTION_STRING=postgresql://water_user:password@your-postgres-server.com:5432/water_monitoring
# Run the scraper
python water_scraper_v3.py
```
### Using VictoriaMetrics
```bash
# Set environment variables
export DB_TYPE=victoriametrics
export VM_HOST=your-vm-server.com
export VM_PORT=8428
# Run the scraper
python water_scraper_v3.py
```
## 7. Monitoring and Alerting
### Grafana Dashboard Setup
1. **Add Data Source**: Configure your database as a Grafana data source (a scripted example follows this list)
2. **Import Dashboard**: Use pre-built water monitoring dashboards
3. **Set Alerts**: Configure alerts for abnormal water levels or discharge rates
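Step 1 can also be scripted against Grafana's HTTP API instead of the web UI. The sketch below registers the InfluxDB container from section 1 as a data source; hostnames and credentials are the docker-compose values and should be adjusted for your environment.

```python
import requests

payload = {
    "name": "Water Monitoring (InfluxDB)",
    "type": "influxdb",
    "access": "proxy",
    "url": "http://influxdb:8086",  # service name from docker-compose.yml
    "user": "water_user",
    "jsonData": {"dbName": "water_monitoring", "httpMode": "GET"},
    "secureJsonData": {"password": "water_password"},
}

resp = requests.post(
    "http://localhost:3000/api/datasources",
    json=payload,
    auth=("admin", "admin_password"),  # GF_SECURITY_ADMIN_PASSWORD
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```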
### Example Grafana Queries
#### InfluxDB Queries
```sql
-- Current water levels
SELECT last("water_level") FROM "water_data" GROUP BY "station_code"
-- Discharge trends (last 24h)
SELECT mean("discharge") FROM "water_data" WHERE time >= now() - 24h GROUP BY time(1h), "station_code"
```
#### PostgreSQL/TimescaleDB Queries
```sql
-- Current water levels
SELECT DISTINCT ON (station_id)
station_id, water_level, discharge, timestamp
FROM water_measurements
ORDER BY station_id, timestamp DESC;
-- Hourly averages (last 24h)
SELECT
time_bucket('1 hour', timestamp) as hour,
station_id,
AVG(water_level) as avg_level,
AVG(discharge) as avg_discharge
FROM water_measurements
WHERE timestamp >= NOW() - INTERVAL '24 hours'
GROUP BY hour, station_id
ORDER BY hour DESC;
```
## 8. Performance Optimization Tips
### For All Databases
- **Batch inserts**: Insert multiple measurements at once (see the sketch after this list)
- **Connection pooling**: Reuse database connections
- **Indexing**: Ensure proper indexes on timestamp and station_id
- **Retention policies**: Automatically delete old data
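As a concrete illustration of the batch-insert advice (a sketch for the InfluxDB backend), `write_points` accepts a list of points plus a `batch_size`, so a backfill goes out in chunked HTTP requests rather than one request per measurement:

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="water_monitoring")

# Build all pending points first, then write them in chunks of 5000.
points = [
    {
        "measurement": "water_data",
        "tags": {"station_code": f"P.{i}"},      # illustrative station codes
        "fields": {"water_level": 2.0 + i * 0.05},
    }
    for i in range(1, 17)  # 16 stations in the Ping River Basin
]
client.write_points(points, batch_size=5000)
```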
### InfluxDB Specific
- Use **tags** for metadata (station codes, names)
- Use **fields** for numeric values (water levels, discharge)
- Configure **retention policies** and **continuous queries**
- Enable **compression** for long-term storage
### PostgreSQL/TimescaleDB Specific
- Use **hypertables** for automatic partitioning
- Create **continuous aggregates** for common queries
- Configure **compression** for older chunks
- Use **parallel queries** for large datasets
### VictoriaMetrics Specific
- Use **labels** efficiently (similar to Prometheus)
- Configure **retention periods** appropriately
- Use **downsampling** for long-term storage
- Enable **deduplication** if needed
This deployment guide provides production-ready configurations for all supported database backends. Choose the one that best fits your infrastructure and requirements.