Health Monitoring Dashboard v1.2

The integrated health monitoring system tracks live system metrics, component health, and operation performance for CrawlLama, and raises alerts when thresholds are exceeded.

Features

Live System Metrics

Component Health Checks

Performance Tracking

Alert System

Rich Terminal UI

Usage

Unified Health Dashboard

The Health Dashboard offers two modes in one application:

Interactive Menu:

# Windows
health-dashboard.bat

# Linux/Mac
./health-dashboard.sh

# Direct with Python
python health-dashboard.py

Direct Start Modes:

# Live System Monitor
python health-dashboard.py --monitor

# Test Dashboard
python health-dashboard.py --tests
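
The relationship between the interactive menu and the two direct start flags can be sketched with `argparse`. This is a minimal illustration, not the actual contents of `health-dashboard.py`: the function names `build_parser` and `select_mode` are hypothetical, and treating the two flags as mutually exclusive is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two direct start flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="health-dashboard.py")
    # Assumption: the two modes exclude each other, so passing both is rejected.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--monitor", action="store_true",
                       help="start the live system monitor")
    group.add_argument("--tests", action="store_true",
                       help="start the test dashboard")
    return parser


def select_mode(argv: list[str]) -> str:
    """Return 'monitor', 'tests', or 'menu' when no flag is given."""
    args = build_parser().parse_args(argv)
    if args.monitor:
        return "monitor"
    if args.tests:
        return "tests"
    return "menu"
```

With no flag, the sketch falls back to the interactive menu, which matches the behaviour described for a plain `python health-dashboard.py` call.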

Mode 1: Terminal-based Live Monitoring

Real-time monitoring with Rich Terminal UI:

Mode 2: GUI Test Dashboard

Tkinter-based GUI for test management:

Programmatic Usage

System Monitor

from core.health import SystemMonitor

# Initialize
monitor = SystemMonitor(update_interval=1.0)
monitor.start()

# Get metrics
metrics = monitor.get_latest_metrics()
print(f"CPU: {metrics.cpu_percent}%")
print(f"Memory: {metrics.memory_used_gb}/{metrics.memory_total_gb} GB")
print(f"Disk: {metrics.disk_percent}%")

# Stop
monitor.stop()

Component Health Checker

from pathlib import Path
from core.health import ComponentHealthChecker, HealthStatus

# Initialize
checker = ComponentHealthChecker(Path.cwd())

# Check all components
health = checker.check_all()

# Display results
for name, status in health.items():
    print(f"{name}: {status.status.value} - {status.message}")
    print(f" Response Time: {status.response_time_ms:.2f}ms")

Performance Tracker

from core.health import PerformanceTracker, PerformanceTimer

# Initialize
tracker = PerformanceTracker()

# Track operation
with PerformanceTimer(tracker, "llm_query") as timer:
    # Your operation here
    result = expensive_operation()

# Get statistics
stats = tracker.get_stats("llm_query")
print(f"Average: {stats.avg_duration_ms:.2f}ms")
print(f"P95: {stats.p95_duration_ms:.2f}ms")
print(f"Success Rate: {stats.success_rate:.1f}%")
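
To make the three reported figures concrete, here is one way an average, a P95, and a success rate could be derived from recorded calls. This is a sketch under stated assumptions: `OperationStats` and `summarize` are hypothetical stand-ins, and the nearest-rank percentile used here is not necessarily the method `PerformanceTracker` applies.

```python
from dataclasses import dataclass


@dataclass
class OperationStats:
    # Hypothetical stand-in; field names mirror those used in the example above.
    avg_duration_ms: float
    p95_duration_ms: float
    success_rate: float


def summarize(durations_ms: list[float], successes: list[bool]) -> OperationStats:
    """Summarize the recorded calls of one operation."""
    if not durations_ms or len(durations_ms) != len(successes):
        raise ValueError("need one success flag per recorded duration")
    ordered = sorted(durations_ms)
    # Nearest-rank P95: the smallest sample with at least 95% of all
    # samples at or below it. Integer arithmetic computes ceil(0.95 * n).
    rank = (95 * len(ordered) + 99) // 100
    return OperationStats(
        avg_duration_ms=sum(ordered) / len(ordered),
        p95_duration_ms=ordered[rank - 1],
        success_rate=100.0 * sum(successes) / len(successes),
    )
```

Because P95 ignores the slowest 5% of calls only once enough samples exist, small sample counts make it equal to the maximum duration.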

Alert System

from core.health import AlertSystem, AlertLevel

# Initialize
alerts = AlertSystem()

# Register alert callback
def on_alert(alert):
    print(f"[{alert.level.value}] {alert.component}: {alert.message}")

alerts.register_callback(on_alert)

# Check system data (monitor, checker, and tracker are the instances created in the sections above)
alerts.check_alerts({
    'system_metrics': monitor.get_latest_metrics(),
    'component_health': checker.check_all(),
    'performance_stats': tracker.get_all_stats()
})

# Get active alerts
active = alerts.get_alerts(unacknowledged_only=True)
for alert in active:
    print(f"{alert.level.value}: {alert.message}")
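
A callback does not have to print. The sketch below forwards alerts to Python's standard `logging` module instead. It assumes only what the example above shows: an alert object exposing `level.value`, `component`, and `message`. The logger name, the level mapping, and the `SimpleNamespace` stand-in alert are hypothetical.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger("crawllama.health")  # hypothetical logger name

# Assumed mapping from alert level names to logging levels.
_LEVELS = {
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def log_alert(alert) -> int:
    """Log one alert and return the logging level that was used."""
    # The casing of level.value is not documented, so normalise it.
    level_name = str(alert.level.value).upper()
    level = _LEVELS.get(level_name, logging.WARNING)
    logger.log(level, "%s: %s", alert.component, alert.message)
    return level


# Stand-in alert object for demonstration only.
demo_alert = SimpleNamespace(
    level=SimpleNamespace(value="error"),
    component="LLM Client",
    message="Component failure",
)
log_alert(demo_alert)
```

A function with this shape could be passed to `alerts.register_callback` in place of `on_alert`.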

Rich Terminal Dashboard

from pathlib import Path
from core.health import RichHealthDashboard

# Start dashboard
dashboard = RichHealthDashboard(
    project_root=Path.cwd(),
    update_interval=2.0,  # Seconds
)

dashboard.start() # Blocks until Ctrl+C

Integration in Your Code

LLM Client with Performance Tracking

from core.llm_client import LLMClient
from core.health import PerformanceTracker, PerformanceTimer

tracker = PerformanceTracker()
client = LLMClient("config.json")

# Wrapper function
def tracked_query(prompt: str):
    with PerformanceTimer(tracker, "llm_query") as timer:
        try:
            response = client.generate(prompt)
            return response
        except Exception:
            # Record the failure, then re-raise so the caller still sees the error
            timer.mark_failure()
            raise

# Use
response = tracked_query("What is AI?")

# Display statistics
stats = tracker.get_stats("llm_query")
print(f"Average response time: {stats.avg_duration_ms:.2f}ms")
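
For readers curious how a timing context manager with a `mark_failure()` hook can work, here is a self-contained sketch. `SketchTimer` is a hypothetical stand-in, not the real `PerformanceTimer`: it appends to a plain list instead of a tracker, and counting an escaping exception as a failure is an assumption.

```python
import time


class SketchTimer:
    """Time a block and record (name, duration_ms, success) in a list."""

    def __init__(self, records: list, name: str):
        self.records = records
        self.name = name
        self.success = True

    def mark_failure(self) -> None:
        # Lets the caller flag a failure without raising.
        self.success = False

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        duration_ms = (time.perf_counter() - self._start) * 1000.0
        # Assumption: an exception leaving the block also counts as a failure.
        ok = self.success and exc_type is None
        self.records.append((self.name, duration_ms, ok))
        return False  # never swallow the exception
```

Returning `False` from `__exit__` keeps error handling with the caller, which is why `tracked_query` above can re-raise safely.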

Web Search with Monitoring

from tools.web_search import web_search
from core.health import PerformanceTracker, PerformanceTimer

tracker = PerformanceTracker()

def monitored_search(query: str):
    with PerformanceTimer(tracker, "web_search"):
        return web_search(query)

# Use
results = monitored_search("Python tutorials")

Alert Rules

Default Rules

| Rule | Threshold | Level | Description |
|------|-----------|-------|-------------|
| CPU Warning | 85% | WARNING | High CPU usage |
| CPU Error | 95% | ERROR | Critical CPU usage |
| Memory Warning | 85% | WARNING | High RAM usage |
| Memory Error | 95% | ERROR | Critical RAM usage |
| Disk Warning | 5 GB free | WARNING | Low storage space |
| Disk Critical | 1 GB free | CRITICAL | Very low storage space |
| Component Health | Unhealthy | ERROR | Component failure |
| Performance | P95 > 5s | WARNING | Slow performance |
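
The resource thresholds in the table can be read as a small decision procedure. The sketch below encodes only the CPU, memory, and disk rows. The function name `evaluate_defaults` is hypothetical, and two details are assumptions rather than documented behaviour: thresholds are treated as inclusive, and the stricter rule replaces the weaker one instead of firing alongside it.

```python
def evaluate_defaults(cpu_percent: float, memory_percent: float,
                      disk_free_gb: float) -> list[tuple[str, str]]:
    """Return (level, message) pairs for the default resource rules."""
    alerts = []
    for name, value in (("CPU", cpu_percent), ("Memory", memory_percent)):
        # Check the 95% rule first so only the most severe level is reported.
        if value >= 95.0:
            alerts.append(("ERROR", f"Critical {name} usage: {value:.1f}%"))
        elif value >= 85.0:
            alerts.append(("WARNING", f"High {name} usage: {value:.1f}%"))
    # Disk rules are expressed in free space, so lower values are worse.
    if disk_free_gb <= 1.0:
        alerts.append(("CRITICAL", f"Very low storage space: {disk_free_gb:.1f} GB free"))
    elif disk_free_gb <= 5.0:
        alerts.append(("WARNING", f"Low storage space: {disk_free_gb:.1f} GB free"))
    return alerts
```

For example, `evaluate_defaults(87.5, 50.0, 100.0)` yields a single WARNING for CPU, while a machine with 0.5 GB of free disk space gets a CRITICAL entry.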

Custom Alert Rules

from core.health import AlertRule, AlertLevel

class CustomAlertRule(AlertRule):
    def __init__(self):
        super().__init__(
            name="Custom Rule",
            level=AlertLevel.WARNING,
            cooldown_minutes=10
        )

    def check(self, data: dict) -> str | None:
        # Your custom logic
        if some_condition:
            return "Custom alert message"
        return None

# Add rule
alerts.add_rule(CustomAlertRule())
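
The `cooldown_minutes` argument suggests that a rule which has just fired stays quiet for a while. How `AlertRule` implements this is not documented here, so the following is only a sketch of the general idea, with a hypothetical `Cooldown` helper that takes the current time as an argument to keep it easy to test.

```python
from datetime import datetime, timedelta
from typing import Optional


class Cooldown:
    """Suppress repeat firings inside a fixed window (hypothetical helper)."""

    def __init__(self, minutes: float):
        self.window = timedelta(minutes=minutes)
        self.last_fired: Optional[datetime] = None

    def allow(self, now: datetime) -> bool:
        """Return True and restart the window if the rule may fire now."""
        if self.last_fired is not None and now - self.last_fired < self.window:
            return False  # still cooling down; the window is not extended
        self.last_fired = now
        return True
```

With a 10-minute window, a rule that fires at 14:30 is suppressed at 14:35 and allowed again at 14:40.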

Production Environment

Development Environment

Troubleshooting

Dashboard Won't Start

Problem: ModuleNotFoundError: No module named 'rich'

Solution:

pip install rich psutil
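
Before starting the dashboard, it can help to confirm that both packages are importable from the interpreter you are actually using. A minimal sketch with the standard library follows. The helper name `missing_modules` is hypothetical, and it is meant for top-level module names only.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found."""
    # find_spec returns None when no importable module has that name.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["rich", "psutil"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dashboard dependencies are importable.")
```

If a package shows up as missing right after installation, `pip` and `python` probably point to different environments.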

No System Metrics

Problem: Metrics not displayed

Solution: Ensure psutil is installed:

pip install psutil

Component Checks Fail

Problem: All components show "Unhealthy"

Solution:

  1. Check config.json
  2. Ensure all directories exist:
     mkdir -p data/cache data/embeddings logs
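
The two steps above can also be scripted in Python, which is handy on Windows where `mkdir -p` is not available. This is a sketch under assumptions: `prepare_project` is a hypothetical helper, and "check config.json" is interpreted narrowly as "the file exists and parses as JSON", since the required keys are not listed here.

```python
import json
from pathlib import Path

# Directories taken from the mkdir command above.
REQUIRED_DIRS = ("data/cache", "data/embeddings", "logs")


def prepare_project(root: Path) -> list[str]:
    """Create the required directories and report config.json problems."""
    problems = []
    for rel in REQUIRED_DIRS:
        # parents=True and exist_ok=True mirror the behaviour of `mkdir -p`.
        (root / rel).mkdir(parents=True, exist_ok=True)
    config = root / "config.json"
    if not config.is_file():
        problems.append("config.json is missing")
    else:
        try:
            json.loads(config.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"config.json is not valid JSON: {exc}")
    return problems
```

Calling `prepare_project(Path.cwd())` from the project root returns an empty list when nothing needs attention.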
    

Performance Data Missing

Problem: No performance statistics

Solution: Integrate PerformanceTimer in your code (see examples above)

Dashboard Layout

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ CrawlLama Health Dashboard | 2025-10-24 14:30:00        ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

┌────────────────────────────┬────────────────────────────┐
│ System Metrics             │ Performance                │
│                            │                            │
│ CPU     45.2% ████░░       │ llm_query    1250ms        │
│ Memory  62.1% ██████░      │ web_search    850ms        │
│ Disk    38.5% ███░░░       │ cache_read     25ms        │
│ Network ↓1.2/↑0.3 MB/s     │                            │
├────────────────────────────┤                            │
│ Component Health           │                            │
│                            │                            │
│ LLM Client     45ms        │                            │
│ Cache System   12ms        │                            │
│ RAG System     89ms        │                            │
│ Search Tools   23ms        │                            │
└────────────────────────────┴────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ Alerts (2)                                              │
│                                                         │
│ High CPU usage: 87.5% (threshold: 85.0%)                │
│ Slow operations: llm_query (P95: 5200ms)                │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ Alerts: 0 1 1 | Press Ctrl+C to exit                    │
└─────────────────────────────────────────────────────────┘

Best Practices

  1. Regular Monitoring: Start the dashboard during development
  2. Performance Integration: Use PerformanceTimer for critical operations
  3. Alert Callbacks: Implement logging or notifications
  4. Adjust Thresholds: Adapt alerts to your environment
  5. Historical Data: Regularly export performance statistics
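
For the last point, one lightweight option is to append a timestamped snapshot to a CSV file at regular intervals. The sketch below is an assumption-laden illustration: `append_stats` is a hypothetical helper that takes plain dictionaries, so values from `tracker.get_all_stats()` would first need converting into that shape, and the column names simply reuse the statistic fields shown earlier.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ("timestamp", "operation", "avg_duration_ms",
          "p95_duration_ms", "success_rate")


def append_stats(csv_path: Path, stats: dict[str, dict[str, float]]) -> int:
    """Append one row per operation and return the number of rows written."""
    is_new = not csv_path.exists()
    stamp = datetime.now(timezone.utc).isoformat()
    with csv_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()  # write the header only once per file
        for operation, values in sorted(stats.items()):
            # Keys outside FIELDS raise ValueError; missing keys stay empty.
            writer.writerow({"timestamp": stamp, "operation": operation, **values})
    return len(stats)
```

Appending rather than overwriting keeps earlier snapshots, so trends such as a slowly rising P95 remain visible over time.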

Changelog

v1.2.0 (2025-10-24)

v1.0.0

Further Resources