# Database Deployment Guide for Thailand Water Monitor This guide covers deployment options for storing water monitoring data in production environments. ## 🏆 Recommendation Summary | Database | Best For | Performance | Complexity | Cost | |----------|----------|-------------|------------|------| | **InfluxDB** | Time-series data, dashboards | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | | **VictoriaMetrics** | High-performance metrics | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | | **PostgreSQL** | Complex queries, reliability | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ | | **MySQL** | Familiar, existing infrastructure | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ | ## 1. InfluxDB Deployment (Recommended for Time-Series) ### Why InfluxDB? - **Purpose-built** for time-series data - **Excellent compression** (10:1 typical ratio) - **Built-in retention policies** and downsampling - **Great Grafana integration** for dashboards - **High write throughput** (100k+ points/second) ### Docker Deployment ```yaml # docker-compose.yml version: '3.8' services: influxdb: image: influxdb:1.8 container_name: water_influxdb ports: - "8086:8086" volumes: - influxdb_data:/var/lib/influxdb - ./influxdb.conf:/etc/influxdb/influxdb.conf:ro environment: - INFLUXDB_DB=water_monitoring - INFLUXDB_ADMIN_USER=admin - INFLUXDB_ADMIN_PASSWORD=your_secure_password - INFLUXDB_USER=water_user - INFLUXDB_USER_PASSWORD=water_password restart: unless-stopped grafana: image: grafana/grafana:latest container_name: water_grafana ports: - "3000:3000" volumes: - grafana_data:/var/lib/grafana environment: - GF_SECURITY_ADMIN_PASSWORD=admin_password restart: unless-stopped volumes: influxdb_data: grafana_data: ``` ### Environment Variables ```bash # .env file DB_TYPE=influxdb INFLUX_HOST=localhost INFLUX_PORT=8086 INFLUX_DATABASE=water_monitoring INFLUX_USERNAME=water_user INFLUX_PASSWORD=water_password ``` ### InfluxDB Configuration ```toml # influxdb.conf [meta] dir = "/var/lib/influxdb/meta" [data] dir = "/var/lib/influxdb/data" wal-dir = "/var/lib/influxdb/wal" # Optimize for time-series data cache-max-memory-size = "1g" cache-snapshot-memory-size = "25m" cache-snapshot-write-cold-duration = "10m" # Retention and compression compact-full-write-cold-duration = "4h" max-series-per-database = 1000000 max-values-per-tag = 100000 [coordinator] write-timeout = "10s" max-concurrent-queries = 0 query-timeout = "0s" [retention] enabled = true check-interval = "30m" [http] enabled = true bind-address = ":8086" auth-enabled = true max-body-size = "25000000" max-concurrent-requests = 0 max-enqueued-requests = 0 ``` ### Production Setup Commands ```bash # Start services docker-compose up -d # Create retention policies docker exec -it water_influxdb influx -username admin -password your_secure_password -execute " CREATE RETENTION POLICY \"raw_data\" ON \"water_monitoring\" DURATION 90d REPLICATION 1 DEFAULT; CREATE RETENTION POLICY \"downsampled\" ON \"water_monitoring\" DURATION 730d REPLICATION 1; " # Create continuous queries for downsampling docker exec -it water_influxdb influx -username admin -password your_secure_password -execute " CREATE CONTINUOUS QUERY \"downsample_hourly\" ON \"water_monitoring\" BEGIN SELECT mean(water_level) AS water_level, mean(discharge) AS discharge, mean(discharge_percent) AS discharge_percent INTO \"downsampled\".\"water_data_hourly\" FROM \"water_data\" GROUP BY time(1h), station_code, station_name_en, station_name_th END " ``` ## 2. VictoriaMetrics Deployment (High Performance) ### Why VictoriaMetrics? - **Extremely fast** and resource-efficient - **Better compression** than InfluxDB - **Prometheus-compatible** API - **Lower memory usage** - **Built-in clustering** ### Docker Deployment ```yaml # docker-compose.yml version: '3.8' services: victoriametrics: image: victoriametrics/victoria-metrics:latest container_name: water_victoriametrics ports: - "8428:8428" volumes: - vm_data:/victoria-metrics-data command: - '--storageDataPath=/victoria-metrics-data' - '--retentionPeriod=2y' - '--httpListenAddr=:8428' - '--maxConcurrentInserts=16' restart: unless-stopped volumes: vm_data: ``` ### Environment Variables ```bash # .env file DB_TYPE=victoriametrics VM_HOST=localhost VM_PORT=8428 ``` ## 3. PostgreSQL Deployment (Relational + Time-Series) ### Why PostgreSQL? - **Mature and reliable** - **Excellent for complex queries** - **TimescaleDB extension** for time-series optimization - **Strong consistency guarantees** - **Rich ecosystem** ### Docker Deployment with TimescaleDB ```yaml # docker-compose.yml version: '3.8' services: postgres: image: timescale/timescaledb:latest-pg14 container_name: water_postgres ports: - "5432:5432" volumes: - postgres_data:/var/lib/postgresql/data - ./init.sql:/docker-entrypoint-initdb.d/init.sql environment: - POSTGRES_DB=water_monitoring - POSTGRES_USER=water_user - POSTGRES_PASSWORD=secure_password restart: unless-stopped volumes: postgres_data: ``` ### Database Initialization ```sql -- init.sql CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE; -- Create hypertable for time-series optimization CREATE TABLE water_measurements ( id BIGSERIAL PRIMARY KEY, timestamp TIMESTAMPTZ NOT NULL, station_id INT NOT NULL, water_level NUMERIC(10,3), discharge NUMERIC(10,2), discharge_percent NUMERIC(5,2), status VARCHAR(20) DEFAULT 'active', created_at TIMESTAMPTZ DEFAULT NOW() ); -- Convert to hypertable (TimescaleDB) SELECT create_hypertable('water_measurements', 'timestamp', chunk_time_interval => INTERVAL '1 day'); -- Create indexes CREATE INDEX idx_water_measurements_station_time ON water_measurements (station_id, timestamp DESC); CREATE INDEX idx_water_measurements_timestamp ON water_measurements (timestamp DESC); -- Create retention policy (keep raw data for 2 years) SELECT add_retention_policy('water_measurements', INTERVAL '2 years'); -- Create continuous aggregates for performance CREATE MATERIALIZED VIEW water_measurements_hourly WITH (timescaledb.continuous) AS SELECT time_bucket('1 hour', timestamp) AS bucket, station_id, AVG(water_level) as avg_water_level, MAX(water_level) as max_water_level, MIN(water_level) as min_water_level, AVG(discharge) as avg_discharge, MAX(discharge) as max_discharge, MIN(discharge) as min_discharge, AVG(discharge_percent) as avg_discharge_percent FROM water_measurements GROUP BY bucket, station_id; -- Refresh policy for continuous aggregates SELECT add_continuous_aggregate_policy('water_measurements_hourly', start_offset => INTERVAL '1 day', end_offset => INTERVAL '1 hour', schedule_interval => INTERVAL '1 hour'); ``` ### Environment Variables ```bash # .env file DB_TYPE=postgresql POSTGRES_CONNECTION_STRING=postgresql://water_user:secure_password@localhost:5432/water_monitoring ``` ## 4. MySQL Deployment (Traditional Relational) ### Docker Deployment ```yaml # docker-compose.yml version: '3.8' services: mysql: image: mysql:8.0 container_name: water_mysql ports: - "3306:3306" volumes: - mysql_data:/var/lib/mysql - ./mysql.cnf:/etc/mysql/conf.d/mysql.cnf - ./init.sql:/docker-entrypoint-initdb.d/init.sql environment: - MYSQL_ROOT_PASSWORD=root_password - MYSQL_DATABASE=water_monitoring - MYSQL_USER=water_user - MYSQL_PASSWORD=water_password restart: unless-stopped volumes: mysql_data: ``` ### MySQL Configuration ```ini # mysql.cnf [mysqld] # Optimize for time-series data innodb_buffer_pool_size = 1G innodb_log_file_size = 256M innodb_flush_log_at_trx_commit = 2 innodb_flush_method = O_DIRECT # Partitioning support partition = ON # Query cache query_cache_type = 1 query_cache_size = 128M # Connection settings max_connections = 200 connect_timeout = 10 wait_timeout = 600 ``` ### Environment Variables ```bash # .env file DB_TYPE=mysql MYSQL_CONNECTION_STRING=mysql://water_user:water_password@localhost:3306/water_monitoring ``` ## 5. Installation and Dependencies ### Required Python Packages ```bash # Base requirements pip install requests schedule # Database-specific packages pip install influxdb # For InfluxDB pip install sqlalchemy pymysql # For MySQL pip install sqlalchemy psycopg2-binary # For PostgreSQL # VictoriaMetrics uses HTTP API (no extra packages needed) ``` ### Updated requirements.txt ```txt requests>=2.28.0 schedule>=1.2.0 pandas>=1.5.0 # Database adapters (install as needed) influxdb>=5.3.1 sqlalchemy>=1.4.0 pymysql>=1.0.2 psycopg2-binary>=2.9.0 ``` ## 6. Production Deployment Examples ### Using InfluxDB (Recommended) ```bash # Set environment variables export DB_TYPE=influxdb export INFLUX_HOST=your-influx-server.com export INFLUX_PORT=8086 export INFLUX_DATABASE=water_monitoring export INFLUX_USERNAME=water_user export INFLUX_PASSWORD=your_secure_password # Run the scraper python water_scraper_v3.py ``` ### Using PostgreSQL with TimescaleDB ```bash # Set environment variables export DB_TYPE=postgresql export POSTGRES_CONNECTION_STRING=postgresql://water_user:password@your-postgres-server.com:5432/water_monitoring # Run the scraper python water_scraper_v3.py ``` ### Using VictoriaMetrics ```bash # Set environment variables export DB_TYPE=victoriametrics export VM_HOST=your-vm-server.com export VM_PORT=8428 # Run the scraper python water_scraper_v3.py ``` ## 7. Monitoring and Alerting ### Grafana Dashboard Setup 1. **Add Data Source**: Configure your database as a Grafana data source 2. **Import Dashboard**: Use pre-built water monitoring dashboards 3. **Set Alerts**: Configure alerts for abnormal water levels or discharge rates ### Example Grafana Queries #### InfluxDB Queries ```sql -- Current water levels SELECT last("water_level") FROM "water_data" GROUP BY "station_code" -- Discharge trends (last 24h) SELECT mean("discharge") FROM "water_data" WHERE time >= now() - 24h GROUP BY time(1h), "station_code" ``` #### PostgreSQL/TimescaleDB Queries ```sql -- Current water levels SELECT DISTINCT ON (station_id) station_id, water_level, discharge, timestamp FROM water_measurements ORDER BY station_id, timestamp DESC; -- Hourly averages (last 24h) SELECT time_bucket('1 hour', timestamp) as hour, station_id, AVG(water_level) as avg_level, AVG(discharge) as avg_discharge FROM water_measurements WHERE timestamp >= NOW() - INTERVAL '24 hours' GROUP BY hour, station_id ORDER BY hour DESC; ``` ## 8. Performance Optimization Tips ### For All Databases - **Batch inserts**: Insert multiple measurements at once - **Connection pooling**: Reuse database connections - **Indexing**: Ensure proper indexes on timestamp and station_id - **Retention policies**: Automatically delete old data ### InfluxDB Specific - Use **tags** for metadata (station codes, names) - Use **fields** for numeric values (water levels, discharge) - Configure **retention policies** and **continuous queries** - Enable **compression** for long-term storage ### PostgreSQL/TimescaleDB Specific - Use **hypertables** for automatic partitioning - Create **continuous aggregates** for common queries - Configure **compression** for older chunks - Use **parallel queries** for large datasets ### VictoriaMetrics Specific - Use **labels** efficiently (similar to Prometheus) - Configure **retention periods** appropriately - Use **downsampling** for long-term storage - Enable **deduplication** if needed This deployment guide provides production-ready configurations for all supported database backends. Choose the one that best fits your infrastructure and requirements.