# Enhanced Scheduler Guide

This guide explains the 15-minute scheduling system, which runs a check at every quarter-hour mark so that new data is picked up throughout each hour.

## ✅ **New Scheduling Behavior**

### **15-Minute Schedule Pattern**
- **Timing**: Runs every 15 minutes: 1:00, 1:15, 1:30, 1:45, 2:00, 2:15, 2:30, 2:45, etc.
- **Hourly Full Checks**: At :00 minutes (includes gap filling and data updates)
- **Quarter-Hour Quick Checks**: At :15, :30, :45 minutes (data fetch only)
- **Continuous Coverage**: Four checks per hour, so data published between hourly runs is not missed

### **Operation Types**
- **Full Operations** (at :00): Data fetching + gap filling + data updates
- **Quick Operations** (at :15, :30, :45): Data fetching only, for performance

## 🔧 **Technical Implementation**

### **Scheduler States**
```python
# State tracking variables
self.last_successful_update = None  # Timestamp of last successful data update
self.retry_mode = False             # Whether in quick check mode (skip gap filling)
self.next_hourly_check = None       # Next scheduled hourly check
```

### **Quarter-Hour Check Process**
```python
import datetime
import logging

def quarter_hour_check(self):
    """15-minute check for new data"""
    current_time = datetime.datetime.now()
    minute = current_time.minute

    # Determine if this is a full hourly check (at :00) or a quarter-hour check
    if minute == 0:
        logging.info("=== HOURLY CHECK (00:00) ===")
        self.retry_mode = False  # Full check with gap filling and updates
        check_name = "hourly check"
    else:
        logging.info(f"=== 15-MINUTE CHECK ({minute:02d}:00) ===")
        self.retry_mode = True  # Skip gap filling and updates on 15-min checks
        check_name = f"15-minute check at :{minute:02d}"

    new_data_found = self.run_scraping_cycle()

    if new_data_found:
        self.last_successful_update = datetime.datetime.now()
        logging.info(f"New data found during {check_name}")
    else:
        logging.info(f"No new data found during {check_name}")
```

### **Scheduler Setup**
```python
import time

import schedule

def start_scheduler(self):
    """Start enhanced scheduler with 15-minute checks"""
    # Schedule checks every 15 minutes (at :00, :15, :30, :45)
    schedule.every().hour.at(":00").do(self.quarter_hour_check)
    schedule.every().hour.at(":15").do(self.quarter_hour_check)
    schedule.every().hour.at(":30").do(self.quarter_hour_check)
    schedule.every().hour.at(":45").do(self.quarter_hour_check)

    while True:
        schedule.run_pending()
        time.sleep(30)  # Check every 30 seconds
```

## 📊 **New Data Detection Logic**

### **Smart Detection Algorithm**
```python
def has_new_data(self) -> bool:
    """Check if new data should be available since the last successful update"""
    # Get most recent timestamp from database
    latest_data = self.get_latest_data(limit=1)
    if not latest_data:
        return True  # Nothing stored yet, so any data is new

    # Assumption: each record exposes its measurement time under "timestamp"
    latest_timestamp = latest_data[0]["timestamp"]

    # Check if we should have newer data by now
    now = datetime.datetime.now()
    expected_latest = now.replace(minute=0, second=0, microsecond=0)

    # If current time is past 5 minutes after the hour, we should have data
    if now.minute >= 5:
        if latest_timestamp < expected_latest:
            return True  # New data expected

    # Check if we have data for the previous hour
    previous_hour = expected_latest - datetime.timedelta(hours=1)
    if latest_timestamp < previous_hour:
        return True  # Missing recent data

    return False  # Data is up to date
```

### **Actual Data Verification**
```python
# Compare timestamps before and after scraping
initial_timestamp = get_latest_timestamp_before_scraping()
# ... perform scraping ...
latest_timestamp = get_latest_timestamp_after_scraping()

# Only report new data if something is stored and it is newer than before
if latest_timestamp is not None and (
    initial_timestamp is None or latest_timestamp > initial_timestamp
):
    new_data_found = True
    self.last_successful_update = datetime.datetime.now()
```

## 🚀 **Operational Modes**

### **Mode 1: Full Hourly Operation (at :00)**
- **Schedule**: Every hour at :00 minutes (1:00, 2:00, 3:00, etc.)
- **Operations**:
  - ✅ Fetch current data
  - ✅ Fill data gaps (last 7 days)
  - ✅ Update existing data (last 2 days)
- **Purpose**: Comprehensive data collection and maintenance

### **Mode 2: Quick 15-Minute Checks (at :15, :30, :45)**
- **Schedule**: Every hour at the :15, :30, and :45 marks
- **Operations**:
  - ✅ Fetch current data only
  - ❌ Skip gap filling (performance optimization)
  - ❌ Skip data updates (performance optimization)
- **Purpose**: Ensure no new data is missed between hourly checks

## 📋 **Logging Output Examples**

### **Successful Hourly Check (at :00)**
```
2025-07-26 01:00:00,123 - INFO - === HOURLY CHECK (00:00) ===
2025-07-26 01:00:00,124 - INFO - Starting scraping cycle...
2025-07-26 01:00:01,456 - INFO - Successfully fetched 384 data points from API
2025-07-26 01:00:02,789 - INFO - New data found: 2025-07-26 01:00:00
2025-07-26 01:00:03,012 - INFO - Filled 5 data gaps
2025-07-26 01:00:04,234 - INFO - Updated 2 existing measurements
2025-07-26 01:00:04,235 - INFO - New data found during hourly check
```

### **15-Minute Quick Check (at :15, :30, :45)**
```
2025-07-26 01:15:00,123 - INFO - === 15-MINUTE CHECK (15:00) ===
2025-07-26 01:15:00,124 - INFO - Starting scraping cycle...
2025-07-26 01:15:01,456 - INFO - Successfully fetched 299 data points from API
2025-07-26 01:15:02,789 - INFO - New data found: 2025-07-26 01:00:00
2025-07-26 01:15:02,790 - INFO - New data found during 15-minute check at :15
```

### **Continuous 15-Minute Pattern**
```
2025-07-26 01:00:00,123 - INFO - === HOURLY CHECK (00:00) ===
2025-07-26 01:00:04,235 - INFO - New data found during hourly check
2025-07-26 01:15:00,123 - INFO - === 15-MINUTE CHECK (15:00) ===
2025-07-26 01:15:02,790 - INFO - No new data found during 15-minute check at :15
2025-07-26 01:30:00,123 - INFO - === 15-MINUTE CHECK (30:00) ===
2025-07-26 01:30:02,790 - INFO - No new data found during 15-minute check at :30
2025-07-26 01:45:00,123 - INFO - === 15-MINUTE CHECK (45:00) ===
2025-07-26 01:45:02,790 - INFO - No new data found during 15-minute check at :45
2025-07-26 02:00:00,123 - INFO - === HOURLY CHECK (00:00) ===
2025-07-26 02:00:04,235 - INFO - New data found during hourly check
```

## ⚙️ **Configuration Options**

### **Environment Variables**
```bash
# Retry interval (default: 5 minutes)
export RETRY_INTERVAL_MINUTES=5

# Data availability buffer (default: 5 minutes after hour)
export DATA_BUFFER_MINUTES=5

# Gap filling days (default: 7 days)
export GAP_FILL_DAYS=7

# Update check days (default: 2 days)
export UPDATE_DAYS=2
```

### **Scheduler Timing**
```python
# Hourly checks at top of hour
schedule.every().hour.at(":00").do(self.hourly_check)

# 5-minute retries (dynamically scheduled)
schedule.every(5).minutes.do(self.retry_check).tag('retry')

# Check every 30 seconds for responsive retry scheduling
time.sleep(30)
```

## 🔍 **Performance Optimizations**

### **Retry Mode Optimizations**
Retry mode (`retry_mode = True`) is the state set by the quick checks at :15, :30, and :45. While it is active:
- **Skip Gap Filling**: Avoids expensive historical data fetching during retries
- **Skip Data Updates**: Avoids comparison operations during retries
- **Focused API Calls**: Only fetches current day data during retries
- **Reduced Database Queries**: Minimal database operations during retries

### **Resource Management**
- **API Rate Limiting**: 1-second delays between API calls
- **Database Connection Pooling**: Efficient connection reuse
- **Memory Efficiency**: Selective data processing
- **Error Recovery**: Automatic retry with exponential backoff

## 🛠️ **Troubleshooting**

### **Common Scenarios**

#### **Stuck in Retry Mode**
```bash
# Check if API is returning data
curl -X POST https://hyd-app-db.rid.go.th/webservice/getGroupHourlyWaterLevelReportAllHL.ashx

# Check database connectivity
python water_scraper_v3.py --check-gaps 1

# Manual data fetch test
python water_scraper_v3.py --test
```

#### **Missing Hourly Triggers**
```bash
# Check system time synchronization
timedatectl status

# Verify scheduler is running
ps aux | grep water_scraper

# Check logs for scheduler activity
tail -f water_monitor.log | grep "HOURLY CHECK"
```

#### **False New Data Detection**
```bash
# Check latest data in database
sqlite3 water_monitoring.db "SELECT MAX(timestamp) FROM water_measurements;"

# Verify timestamp parsing
python -c "
import datetime
print('Current hour:', datetime.datetime.now().replace(minute=0, second=0, microsecond=0))
"
```

## 📈 **Monitoring and Alerts**

### **Key Metrics to Monitor**
- **Hourly Success Rate**: Percentage of hourly checks that find new data
- **Retry Duration**: How long the system stays in retry mode
- **Data Freshness**: Time since last successful data update
- **API Response Time**: Performance of data fetching operations

### **Alert Conditions**
- **Extended Retry Mode**: System in retry mode for > 30 minutes
- **No Data for 2+ Hours**: No new data found for an extended period
- **High Error Rate**: Multiple consecutive API failures
- **Database Issues**: Connection or save failures

### **Health Check Script**
```bash
#!/bin/bash
# Check if system is stuck in retry mode
RETRY_COUNT=$(tail -n 100 water_monitor.log | grep -c "RETRY CHECK")
if [ "$RETRY_COUNT" -gt 6 ]; then
    echo "WARNING: System may be stuck in retry mode ($RETRY_COUNT retries in last 100 log entries)"
fi

# Check data freshness
LATEST_DATA=$(sqlite3 water_monitoring.db "SELECT MAX(timestamp) FROM water_measurements;")
echo "Latest data timestamp: $LATEST_DATA"
```

## 🎯 **Best Practices**

### **Production Deployment**
1. **Monitor Logs**: Watch for retry mode patterns
2. **Set Alerts**: Configure notifications for extended retry periods
3. **Regular Maintenance**: Weekly gap filling and data validation
4. **Backup Strategy**: Regular database backups before major operations

### **Performance Tuning**
1. **Adjust Buffer Time**: Modify the data availability buffer based on API patterns
2. **Optimize Retry Interval**: Balance responsiveness against API load
3. **Database Indexing**: Ensure proper indexes for timestamp queries
4. **Connection Pooling**: Configure appropriate database connection limits

Together, the hourly full checks and the quarter-hour quick checks keep water level data current while limiting the expensive gap-filling and update work to once per hour.
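
The timing rules in this guide (full check at :00, quick checks at :15, :30, and :45, and an alert when no data arrives for two hours) can be sketched as a few pure functions that are easy to test without running the scheduler. This is only an illustration under stated assumptions: the names `classify_check`, `next_check_time`, and `is_stale` are hypothetical and do not come from the scraper's code, and the two-hour threshold is taken from the alert conditions listed in this guide.

```python
import datetime

# Hypothetical helpers for illustration; not part of the scraper itself.

def classify_check(now: datetime.datetime) -> str:
    """Return "full" for the :00 check, "quick" for any other minute."""
    return "full" if now.minute == 0 else "quick"


def next_check_time(now: datetime.datetime) -> datetime.datetime:
    """Return the next quarter-hour boundary strictly after `now`."""
    base = now.replace(second=0, microsecond=0)
    minutes_to_add = 15 - (base.minute % 15)
    return base + datetime.timedelta(minutes=minutes_to_add)


def is_stale(
    latest: datetime.datetime,
    now: datetime.datetime,
    max_age: datetime.timedelta = datetime.timedelta(hours=2),
) -> bool:
    """Return True when the newest stored timestamp is at least `max_age` old."""
    return now - latest >= max_age


if __name__ == "__main__":
    now = datetime.datetime.now()
    print("Check type if run now:", classify_check(now))
    print("Next scheduled check:", next_check_time(now))
```

Keeping these rules separate from `quarter_hour_check` would let a health check or unit test verify the schedule arithmetic, including the rollover past midnight, without waiting for real clock time.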