Initial commit: Northern Thailand Ping River Monitor v3.1.0

Features:
- Real-time water level monitoring for Ping River Basin (16 stations)
- Coverage from Chiang Dao to Nakhon Sawan in Northern Thailand
- FastAPI web interface with interactive dashboard and station management
- Multi-database support (SQLite, MySQL, PostgreSQL, InfluxDB, VictoriaMetrics)
- Comprehensive monitoring with health checks and metrics collection
- Docker deployment with Grafana integration
- Production-ready architecture with enterprise-grade observability

CI/CD & Automation:
- Complete Gitea Actions workflows for CI/CD, security, and releases
- Multi-Python version testing (3.9-3.12)
- Multi-architecture Docker builds (amd64, arm64)
- Daily security scanning and dependency monitoring
- Automated documentation generation
- Performance testing and validation

Production Ready:
- Type safety with Pydantic models and comprehensive type hints
- Data validation layer with range checking and error handling
- Rate limiting and request tracking for API protection
- Enhanced logging with rotation, colors, and performance metrics
- Station management API for dynamic CRUD operations
- Comprehensive documentation and deployment guides

Technical Stack:
- Python 3.9+ with FastAPI and Pydantic
- Multi-database architecture with adapter pattern
- Docker containerization with multi-stage builds
- Grafana dashboards for visualization
- Gitea Actions for CI/CD automation
- Enterprise monitoring and alerting

Ready for deployment to B4L infrastructure!
2025-08-12 15:37:09 +07:00
commit af62cfef0b
60 changed files with 13267 additions and 0 deletions

docs/GAP_FILLING_GUIDE.md Normal file

@@ -0,0 +1,275 @@
# Gap Filling and Data Integrity Guide
This guide explains the enhanced gap-filling functionality that addresses data gaps and missing timestamps in the Thailand Water Monitor.
## ✅ **Issues Resolved**
### **1. Data Gaps Problem**
- **Before**: Tool only fetched current day data, leaving gaps in historical records
- **After**: Automatically detects and fills missing timestamps for the last 7 days
### **2. Missing Midnight Timestamps**
- **Before**: Jump from 23:00 to 01:00 (missing 00:00 midnight data)
- **After**: Specifically checks for and fills midnight hour gaps
### **3. Changed Values**
- **Before**: No mechanism to update existing data if values changed on the server
- **After**: Compares existing data with fresh API data and updates changed values
## 🔧 **New Features**
### **Command Line Interface**
```bash
# Check for missing data gaps
python water_scraper_v3.py --check-gaps [days]
# Fill missing data gaps
python water_scraper_v3.py --fill-gaps [days]
# Update existing data with latest values
python water_scraper_v3.py --update-data [days]
# Run single test cycle
python water_scraper_v3.py --test
# Show help
python water_scraper_v3.py --help
```
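Each flag above takes an optional day count. Below is a minimal `argparse` sketch of how such a CLI could be parsed. It assumes the flags are mutually exclusive and that an omitted day count falls back to the defaults listed under Customizable Parameters (7 days for gap checks and fills, 2 for updates). `build_parser` is a hypothetical name, not the scraper's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the flags documented above."""
    parser = argparse.ArgumentParser(prog="water_scraper_v3.py")
    group = parser.add_mutually_exclusive_group()
    # nargs="?" makes the day count optional: `const` is used when the flag is
    # given without a value, `default` when the flag is absent entirely.
    # The 7/2 defaults are assumptions taken from this guide.
    group.add_argument("--check-gaps", nargs="?", type=int, const=7,
                       default=None, metavar="DAYS", help="report missing timestamps")
    group.add_argument("--fill-gaps", nargs="?", type=int, const=7,
                       default=None, metavar="DAYS", help="fetch and insert missing data")
    group.add_argument("--update-data", nargs="?", type=int, const=2,
                       default=None, metavar="DAYS", help="refresh changed values")
    group.add_argument("--test", action="store_true", help="run a single test cycle")
    return parser
```

For example, `build_parser().parse_args(["--fill-gaps", "30"]).fill_gaps` is `30`, while a bare `--check-gaps` yields `7`. `--help` is generated by `argparse` automatically.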
### **Automatic Gap Detection**
The system now automatically:
- Generates expected hourly timestamps for the specified time range
- Compares with existing database records
- Identifies missing timestamps
- Groups missing data by date for efficient API calls
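The detection steps can be sketched as a single pass over the expected hourly slots. This minimal illustration assumes existing records are available as a set of hour-aligned `datetime` values. `find_missing_by_date` is a hypothetical helper. Because every hour is enumerated, midnight (hour 0) is checked like any other slot, which is what catches a 23:00 to 01:00 jump.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def find_missing_by_date(existing: set[datetime], end: datetime,
                         days_back: int) -> dict[date, list[int]]:
    """Return {date: [missing hours]} for hourly slots absent from `existing`.

    Assumes `existing` holds timestamps truncated to the hour.
    """
    end = end.replace(minute=0, second=0, microsecond=0)
    current = end - timedelta(days=days_back)
    missing: dict[date, list[int]] = defaultdict(list)
    # Walk every expected hourly timestamp, oldest first (inclusive of `end`).
    while current <= end:
        if current not in existing:
            # Group by calendar date so each date needs only one API call.
            missing[current.date()].append(current.hour)
        current += timedelta(hours=1)
    return dict(missing)
```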
### **Intelligent Gap Filling**
- **Historical Data Fetching**: Retrieves data for specific dates to fill gaps
- **Selective Insertion**: Only inserts data for actually missing timestamps
- **API Rate Limiting**: Pauses between API calls to avoid overloading the upstream server
- **Error Handling**: Continues processing even if some dates fail
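A sketch of the fill loop, assuming the gaps arrive grouped as `{date: [missing hours]}` and that fetching and inserting are supplied as callables. `fetch_for_date` and `insert` are hypothetical stand-ins for the scraper's API and database calls, and records are assumed to be dicts with a `timestamp` key.

```python
import time
from datetime import date, datetime
from typing import Callable, Iterable

Record = dict  # hypothetical shape: {"timestamp": datetime, "station_code": str, ...}


def fill_gaps(missing_by_date: dict[date, list[int]],
              fetch_for_date: Callable[[date], Iterable[Record]],
              insert: Callable[[list[Record]], None],
              api_delay: float = 1.0) -> int:
    """Fetch each affected date once and insert only the missing hours."""
    filled = 0
    for i, (day, hours) in enumerate(sorted(missing_by_date.items())):
        if i:
            time.sleep(api_delay)  # rate limit: pause between consecutive API calls
        wanted = set(hours)
        try:
            records = fetch_for_date(day)
        except Exception:
            continue  # one failed date should not stop the remaining dates
        # Selective insertion: keep only rows for hours that are actually missing.
        to_insert = [r for r in records
                     if r["timestamp"].date() == day and r["timestamp"].hour in wanted]
        if to_insert:
            insert(to_insert)
            filled += len(to_insert)
    return filled
```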
### **Data Update Mechanism**
- **Change Detection**: Compares water levels, discharge rates, and percentages
- **Precision Checking**: Treats differences below a threshold as unchanged (0.001 m for water level, 0.1 m³/s (cms) for discharge)
- **Selective Updates**: Only updates records where values have actually changed
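Change detection with the thresholds above might look like the following. The field names and the 0.01 threshold for percentages are assumptions, since this guide gives no percentage threshold; the water level and discharge tolerances come from this section.

```python
from dataclasses import dataclass
from typing import Optional

WATER_LEVEL_EPS = 0.001  # metres, from this guide
DISCHARGE_EPS = 0.1      # m³/s (cms), from this guide
PERCENT_EPS = 0.01       # hypothetical: the guide does not specify one


@dataclass
class Measurement:
    # Hypothetical field names for illustration.
    water_level: Optional[float]
    discharge: Optional[float]
    discharge_percent: Optional[float]


def _changed(old: Optional[float], new: Optional[float], eps: float) -> bool:
    if old is None or new is None:
        return old is not new  # a value appeared or disappeared
    return abs(old - new) > eps


def needs_update(old: Measurement, new: Measurement) -> bool:
    """True if any field differs by more than its tolerance."""
    return (_changed(old.water_level, new.water_level, WATER_LEVEL_EPS)
            or _changed(old.discharge, new.discharge, DISCHARGE_EPS)
            or _changed(old.discharge_percent, new.discharge_percent, PERCENT_EPS))
```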
## 📊 **Test Results**
### **Before Enhancement**
```
Found 22 missing timestamps in the last 2 days:
2025-07-23: Missing hours [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
2025-07-24: Missing hours [0, 20, 21, 22, 23]
2025-07-25: Missing hours [0, 9]
```
### **After Gap Filling**
```
Gap filling completed. Filled 96 missing data points
Remaining gaps:
2025-07-24: Missing hours [10]
2025-07-25: Missing hours [0, 10]
```
**Improvement**: Reduced missing timestamps from 22 to 3, an 86% reduction ((22 - 3) / 22)
## 🚀 **Enhanced Scraping Cycle**
The regular scraping cycle now includes three phases:
### **Phase 1: Current Data Collection**
```python
# Fetch and save current data
water_data = self.fetch_water_data()
success = self.save_to_database(water_data)
```
### **Phase 2: Gap Filling (Last 7 Days)**
```python
# Check for and fill missing data
filled_count = self.fill_data_gaps(days_back=7)
```
### **Phase 3: Data Updates (Last 2 Days)**
```python
# Update existing data with latest values
updated_count = self.update_existing_data(days_back=2)
```
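The three phases could be chained into one cycle as sketched below. It assumes the scraper exposes the three methods shown above and that a failure in one phase should not prevent the others from running. `run_scraping_cycle` is hypothetical, and the real cycle's error handling may differ.

```python
import logging

logger = logging.getLogger("water_monitor")


def run_scraping_cycle(scraper, gap_days: int = 7, update_days: int = 2) -> dict:
    """Run the three phases in order; each phase is isolated from the others."""
    result = {"saved": False, "filled": 0, "updated": 0}
    # Phase 1: current data collection
    try:
        water_data = scraper.fetch_water_data()
        result["saved"] = bool(water_data) and scraper.save_to_database(water_data)
    except Exception:
        logger.exception("Phase 1 (current data) failed")
    # Phase 2: gap filling
    try:
        result["filled"] = scraper.fill_data_gaps(days_back=gap_days)
    except Exception:
        logger.exception("Phase 2 (gap filling) failed")
    # Phase 3: data updates
    try:
        result["updated"] = scraper.update_existing_data(days_back=update_days)
    except Exception:
        logger.exception("Phase 3 (data updates) failed")
    return result
```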
## 🔧 **Technical Improvements**
### **Database Connection Handling**
- **SQLite Optimization**: Added timeout and thread safety parameters
- **Retry Logic**: Exponential backoff for database lock errors
- **Transaction Management**: Uses `engine.begin()` blocks, which commit on success and roll back on error
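The same ideas can be expressed with Python's standard `sqlite3` module, used here as a stand-in for the SQLAlchemy engine mentioned above: a lock `timeout`, `check_same_thread=False`, and a connection context manager that commits or rolls back the whole batch (the stdlib counterpart of `engine.begin()`). The `station_code` and `water_level` column names are assumed for illustration.

```python
import sqlite3


def open_db(path: str = "water_monitoring.db") -> sqlite3.Connection:
    # timeout: wait up to 30 s for another writer's lock before raising
    # "database is locked". check_same_thread=False permits use from other
    # threads; the caller must then serialize access itself.
    return sqlite3.connect(path, timeout=30, check_same_thread=False)


def save_batch(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    # `with conn:` commits if the block succeeds and rolls back if it raises,
    # so a batch is stored entirely or not at all.
    with conn:
        conn.executemany(
            "INSERT INTO water_measurements (timestamp, station_code, water_level) "
            "VALUES (?, ?, ?)",
            rows,
        )
```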
### **Error Recovery**
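```python
import time

# Scraper method: retries only on SQLite lock errors, with exponential backoff
def save_to_database(self, water_data, max_retries=3):
    for attempt in range(max_retries):
        try:
            success = self.db_adapter.save_measurements(water_data)
            if success:
                return True
        except Exception as e:
            if "database is locked" in str(e).lower():
                time.sleep(2 ** attempt)  # 1s, 2s, 4s delays
                continue
            break  # other errors are not retried
    return False
```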
### **Memory Efficiency**
- **Selective Data Processing**: Only processes data for missing timestamps
- **Batch Processing**: Groups operations by date to minimize API calls
- **Resource Management**: Proper cleanup and connection handling
## 📋 **Usage Examples**
### **Daily Maintenance**
```bash
# Check for gaps in the last week
python water_scraper_v3.py --check-gaps 7
# Fill any found gaps
python water_scraper_v3.py --fill-gaps 7
# Update recent data for accuracy
python water_scraper_v3.py --update-data 2
```
### **Historical Data Recovery**
```bash
# Check for gaps in the last month
python water_scraper_v3.py --check-gaps 30
# Fill gaps for the last month (be patient, this takes time)
python water_scraper_v3.py --fill-gaps 30
```
### **Production Monitoring**
```bash
# Quick test to ensure system is working
python water_scraper_v3.py --test
# Check for recent gaps
python water_scraper_v3.py --check-gaps 1
```
## 🔍 **Monitoring and Alerts**
### **Gap Detection Output**
```
Found 22 missing timestamps:
2025-07-23: Missing hours [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
2025-07-24: Missing hours [0, 20, 21, 22, 23]
2025-07-25: Missing hours [0, 9]
```
### **Gap Filling Progress**
```
Fetching data for 2025-07-24 to fill 5 missing timestamps
Successfully fetched 368 data points from API for 2025-07-24
Filled 80 data points for 2025-07-24
Gap filling completed. Filled 96 missing data points
```
### **Update Detection**
```
Checking for updates on 2025-07-24
Update needed for P.1 at 2025-07-24 15:00:00
Updated 5 measurements for 2025-07-24
Data update completed. Updated 5 measurements
```
## ⚙️ **Configuration Options**
### **Environment Variables**
```bash
# Database configuration
export DB_TYPE=sqlite
export WATER_DB_PATH=water_monitoring.db
# Gap filling settings (can be added to config.py)
export GAP_FILL_DAYS=7 # Days to check for gaps
export UPDATE_DAYS=2 # Days to check for updates
export API_DELAY=1 # Seconds between API calls
export MAX_RETRIES=3 # Database retry attempts
```
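If these suggested variables are wired into configuration, a small settings object keeps the defaults in one place. This is a sketch assuming the variable names above; `GapFillSettings` is a hypothetical class, not part of the current `config.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GapFillSettings:
    gap_fill_days: int = 7   # GAP_FILL_DAYS
    update_days: int = 2     # UPDATE_DAYS
    api_delay: float = 1.0   # API_DELAY, seconds
    max_retries: int = 3     # MAX_RETRIES

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "GapFillSettings":
        """Read overrides from the environment, falling back to the defaults."""
        env = os.environ if env is None else env
        return cls(
            gap_fill_days=int(env.get("GAP_FILL_DAYS", cls.gap_fill_days)),
            update_days=int(env.get("UPDATE_DAYS", cls.update_days)),
            api_delay=float(env.get("API_DELAY", cls.api_delay)),
            max_retries=int(env.get("MAX_RETRIES", cls.max_retries)),
        )
```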
### **Customizable Parameters**
- **Gap Check Period**: Default 7 days, configurable via command line
- **Update Period**: Default 2 days, configurable via command line
- **API Rate Limiting**: 1-second delay between calls (configurable)
- **Retry Logic**: 3 attempts with exponential backoff (configurable)
## 🛠️ **Troubleshooting**
### **Common Issues**
#### **Database Locked Errors**
```
ERROR - Error saving to SQLITE: database is locked
```
**Solution**: The retry logic now handles this automatically with exponential backoff.
#### **API Rate Limiting**
```
WARNING - Too many requests to API
```
**Solution**: Increase delay between API calls or reduce the number of days processed at once.
#### **Missing Data Still Present**
```
Found X missing timestamps after gap filling
```
**Possible Causes**:
- Data not available on the Thai government server for those timestamps
- Network issues during API calls
- API returned empty data for those specific times
### **Debug Commands**
```bash
# Enable debug logging
export LOG_LEVEL=DEBUG
python water_scraper_v3.py --check-gaps 1
# Test specific date range
python water_scraper_v3.py --fill-gaps 1
# Check database directly
sqlite3 water_monitoring.db "SELECT COUNT(*) FROM water_measurements;"
sqlite3 water_monitoring.db "SELECT timestamp, COUNT(*) FROM water_measurements GROUP BY timestamp ORDER BY timestamp DESC LIMIT 10;"
```
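The `GROUP BY timestamp` query above also reveals hours that exist but are incomplete. The sketch below lists timestamps reported by fewer than the 16 network stations, assuming one row per station per hour in `water_measurements`; `incomplete_timestamps` is a hypothetical helper. A check that only looks for missing timestamps would count such hours as present, so this serves as a complementary check.

```python
import sqlite3

EXPECTED_STATIONS = 16  # stations in the Ping River network


def incomplete_timestamps(conn: sqlite3.Connection,
                          expected: int = EXPECTED_STATIONS,
                          limit: int = 50) -> list[tuple]:
    """Most recent timestamps with fewer rows than stations (partial hours)."""
    sql = """
        SELECT timestamp, COUNT(*) AS n
        FROM water_measurements
        GROUP BY timestamp
        HAVING COUNT(*) < ?
        ORDER BY timestamp DESC
        LIMIT ?
    """
    return conn.execute(sql, (expected, limit)).fetchall()
```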
## 📈 **Performance Metrics**
### **Gap Filling Efficiency**
- **API Calls**: Grouped by date to minimize requests
- **Batch Size**: Roughly 100-400 data points returned per API call
- **Success Rate**: 86% gap reduction in test case
- **Resource Usage**: Minimal memory footprint with selective processing
### **Database Performance**
- **SQLite Optimization**: Connection pooling and timeout handling
- **Transaction Efficiency**: Batch inserts with proper transaction management
- **Retry Success**: Automatic recovery from temporary lock conditions
## 🎯 **Best Practices**
### **Regular Maintenance**
1. **Daily**: Run `--check-gaps 1` to monitor recent data quality
2. **Weekly**: Run `--fill-gaps 7` to catch any missed data
3. **Monthly**: Run `--update-data 7` to ensure data accuracy
### **Production Deployment**
1. **Automated Scheduling**: Use cron or systemd timers for regular gap checks
2. **Monitoring**: Set up alerts for excessive missing data
3. **Backup**: Regular database backups before major gap-filling operations
### **Data Quality Assurance**
1. **Validation**: Check for reasonable value ranges after gap filling
2. **Comparison**: Compare filled data with nearby timestamps for consistency
3. **Documentation**: Log all gap-filling activities for audit trails
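For the validation and comparison steps, a simple per-station check can flag filled values that are out of range or that jump away from both neighboring hours. This sketch uses hypothetical thresholds: sensible `min_level`, `max_level`, and `max_jump` values depend on each station's gauge datum and are not specified in this guide.

```python
from typing import Optional, Sequence


def flag_suspicious(levels: Sequence[Optional[float]], *, min_level: float,
                    max_level: float, max_jump: float) -> list[int]:
    """Return indices of hourly values that fail a range or consistency check.

    `levels` is one station's hourly series; None marks a still-missing hour.
    """
    flagged = []
    for i, value in enumerate(levels):
        if value is None:
            continue
        # Range check
        if not (min_level <= value <= max_level):
            flagged.append(i)
            continue
        # Consistency check: an isolated spike differs from BOTH neighbors.
        neighbors = [levels[j] for j in (i - 1, i + 1)
                     if 0 <= j < len(levels) and levels[j] is not None]
        if len(neighbors) == 2 and all(abs(value - n) > max_jump for n in neighbors):
            flagged.append(i)
    return flagged
```

Requiring a jump from both neighbors keeps the hours next to a spike from being flagged along with it.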
This enhanced gap-filling system ensures comprehensive and accurate water level monitoring with minimal data loss and automatic recovery capabilities.