Troubleshooting Guide¶
This guide covers common issues encountered when using the Gunicorn Prometheus Exporter and how to resolve them.
Common Issues¶
This section addresses the most frequently encountered problems and their solutions.
Port Already in Use¶
Error:
Solution:
- Change the metrics port in your configuration:
# In gunicorn.conf.py
import os
os.environ.setdefault("PROMETHEUS_METRICS_PORT", "9091")  # Use a different port
- Or kill the process using the port:
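Before applying either fix, it can help to confirm that the port really is occupied. The following is a minimal sketch using only the standard library; the helper name `is_port_free` is illustrative and not part of the exporter. The check is conservative: it reports a port as busy whenever a plain bind fails.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "Address already in use"
            return False
    return True


if __name__ == "__main__":
    # 9090 and 9091 are the ports used in this guide's examples
    for candidate in (9090, 9091):
        state = "free" if is_port_free(candidate) else "in use"
        print(f"port {candidate}: {state}")
```

If the port is reported as in use, either pick another value for `PROMETHEUS_METRICS_PORT` or stop the process that holds it.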
Permission Denied¶
Error:
Solution:
- Check multiprocess directory permissions:
# Create directory with proper permissions
mkdir -p /tmp/prometheus_multiproc
chmod 755 /tmp/prometheus_multiproc
- Or use a different directory:
# In gunicorn.conf.py
import os
os.environ.setdefault("PROMETHEUS_MULTIPROC_DIR", "/var/tmp/prometheus_multiproc")
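To check whether the current user can actually write to the multiprocess directory, a small probe like the one below may be useful. It is a sketch under the assumption that it runs as the same user that runs Gunicorn; the function name `check_multiproc_dir` is hypothetical.

```python
import os
import tempfile


def check_multiproc_dir(path: str) -> list[str]:
    """Return a list of human-readable problems found with the directory."""
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} does not exist or is not a directory")
        return problems
    try:
        # Creating and deleting a real file exercises the same permissions
        # that workers need when they write their metric files.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        problems.append(f"cannot create files in {path}: {exc}")
    return problems


if __name__ == "__main__":
    target = os.environ.get("PROMETHEUS_MULTIPROC_DIR", "/tmp/prometheus_multiproc")
    for problem in check_multiproc_dir(target):
        print(problem)
```

An empty result means the directory exists and is writable for the current user.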
Import Errors for Async Workers¶
Error:
Solution:
- Install the required dependencies:
# For eventlet workers
pip install gunicorn-prometheus-exporter[eventlet]
# For gevent workers
pip install gunicorn-prometheus-exporter[gevent]
# Or install all async dependencies
pip install gunicorn-prometheus-exporter[async]
- Verify the installation:
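One way to verify the installation is to ask Python whether the async libraries can be located. This is a minimal sketch; `missing_modules` is an illustrative helper, and the module names `eventlet` and `gevent` are the import names of the packages mentioned above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["eventlet", "gevent"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All async worker dependencies were found")
```

Run it with the same interpreter (or virtual environment) that launches Gunicorn, since a package installed elsewhere will not be visible to the workers.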
Metrics Not Updating¶
Issue: Metrics endpoint shows stale or no data.
Solutions:
- Check environment variables:
# Verify all required variables are set
echo $PROMETHEUS_MULTIPROC_DIR
echo $PROMETHEUS_METRICS_PORT
echo $PROMETHEUS_BIND_ADDRESS
echo $GUNICORN_WORKERS
- Check multiprocess directory:
- Restart Gunicorn:
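For the multiprocess directory check, the sketch below lists the metric files that workers have written. It assumes file-based multiprocess mode, where `prometheus_client` stores samples in `.db` files; the helper name `metric_files` is hypothetical.

```python
import os


def metric_files(directory: str) -> list[str]:
    """Return the sorted names of the .db metric files in `directory`."""
    return sorted(
        entry.name
        for entry in os.scandir(directory)
        if entry.is_file() and entry.name.endswith(".db")
    )


if __name__ == "__main__":
    target = os.environ.get("PROMETHEUS_MULTIPROC_DIR", "/tmp/prometheus_multiproc")
    if os.path.isdir(target):
        files = metric_files(target)
        print(f"{len(files)} metric file(s) in {target}")
    else:
        print(f"{target} does not exist")
```

An empty list while Gunicorn is serving requests suggests that the workers are writing somewhere else, for example because `PROMETHEUS_MULTIPROC_DIR` differs between the shell and the Gunicorn process.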
Worker Type Errors¶
Error:
Solution:
- Verify worker class is correctly specified:
# In gunicorn.conf.py
# Set exactly one worker_class; uncomment the line that matches your worker type.
worker_class = "gunicorn_prometheus_exporter.PrometheusWorker"  # Sync
# worker_class = "gunicorn_prometheus_exporter.PrometheusThreadWorker"  # Thread
# worker_class = "gunicorn_prometheus_exporter.PrometheusEventletWorker"  # Eventlet
# worker_class = "gunicorn_prometheus_exporter.PrometheusGeventWorker"  # Gevent
- Check if async dependencies are installed:
Configuration Issues¶
This section covers problems related to environment variables, Redis configuration, and other setup issues.
Environment Variables Not Set¶
Error:
Solution:
- Set environment variables in your configuration:
# In gunicorn.conf.py
import os
os.environ.setdefault("PROMETHEUS_MULTIPROC_DIR", "/tmp/prometheus_multiproc")
os.environ.setdefault("PROMETHEUS_METRICS_PORT", "9090")
os.environ.setdefault("PROMETHEUS_BIND_ADDRESS", "0.0.0.0")
os.environ.setdefault("GUNICORN_WORKERS", "2")
- Or export them in your shell:
export PROMETHEUS_MULTIPROC_DIR="/tmp/prometheus_multiproc"
export PROMETHEUS_METRICS_PORT="9090"
export PROMETHEUS_BIND_ADDRESS="0.0.0.0"
export GUNICORN_WORKERS="2"
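To see at a glance which of these variables are still unset, a check along the following lines could be run at the top of `gunicorn.conf.py` or in a standalone script. It is a sketch: `REQUIRED_VARS` and `missing_env_vars` are illustrative names, and the list contains only the four variables named in this guide.

```python
import os

REQUIRED_VARS = (
    "PROMETHEUS_MULTIPROC_DIR",
    "PROMETHEUS_METRICS_PORT",
    "PROMETHEUS_BIND_ADDRESS",
    "GUNICORN_WORKERS",
)


def missing_env_vars(environ=os.environ, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `environ`."""
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set")
```

Passing the environment as an argument keeps the function easy to test with a plain dictionary.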
Redis Configuration Issues¶
Error:
Solution:
- Check Redis server is running:
- Verify Redis configuration:
# In gunicorn.conf.py
import os
os.environ.setdefault("REDIS_ENABLED", "true")
os.environ.setdefault("REDIS_HOST", "localhost")
os.environ.setdefault("REDIS_PORT", "6379")
os.environ.setdefault("REDIS_DB", "0")
- Test Redis connection:
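A connection test does not require a Redis client library: Redis accepts an inline `PING` over a plain TCP connection and answers `+PONG`. The sketch below relies on that; `redis_ping` is a hypothetical helper, and the defaults mirror the `REDIS_HOST` and `REDIS_PORT` values shown above.

```python
import socket


def redis_ping(host: str = "localhost", port: int = 6379, timeout: float = 2.0) -> bool:
    """Send an inline PING and return True if the server answers +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            reply = conn.recv(64)
    except OSError:
        # Covers connection refused, unreachable host, and timeouts
        return False
    return reply.startswith(b"+PONG")


if __name__ == "__main__":
    print("Redis reachable:", redis_ping())
```

A server that requires authentication replies with an error instead of `+PONG`, so this check returns `False` in that case even though the server is running.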
Debug Mode¶
This section provides guidance on enabling debug logging and diagnostic tools for troubleshooting.
Enable Debug Logging¶
# In gunicorn.conf.py
import logging
logging.basicConfig(level=logging.DEBUG)
# Or set specific logger
logging.getLogger('gunicorn_prometheus_exporter').setLevel(logging.DEBUG)
Verbose Gunicorn Output¶
Check Metrics Endpoint¶
# Test metrics endpoint (0.0.0.0 is a bind address; query localhost or the host's IP instead)
curl http://localhost:9090/metrics
# Check for specific metrics
curl -s http://localhost:9090/metrics | grep gunicorn_worker
# Check for errors
curl -s http://localhost:9090/metrics | grep -i error
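The same check can be scripted, which is convenient in health checks or CI. The following is a sketch using only the standard library; the function names are illustrative, and the default URL assumes the exporter listens on port 9090 of the local machine.

```python
from urllib.request import urlopen


def matching_samples(exposition_text: str, name_prefix: str) -> list[str]:
    """Return sample lines whose metric name starts with `name_prefix`."""
    samples = []
    for line in exposition_text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP and TYPE comment lines
        if line.startswith(name_prefix):
            samples.append(line)
    return samples


def fetch_metrics(url: str = "http://localhost:9090/metrics", timeout: float = 5.0) -> str:
    """Download the raw exposition text from the metrics endpoint."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `matching_samples(fetch_metrics(), "gunicorn_worker")` returns the worker samples, and an empty list indicates that no worker metrics are being exported.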
Diagnostic Commands¶
This section provides command-line tools and techniques for diagnosing system issues.
Check Process Status¶
# List Gunicorn processes
ps aux | grep gunicorn
# Check open ports
netstat -tlnp | grep 9090
# Check multiprocess directory
ls -la /tmp/prometheus_multiproc/
Monitor Metrics¶
# Watch metrics in real time
watch -n 1 'curl -s http://localhost:9090/metrics | grep gunicorn_worker_requests_total'
# Monitor a specific worker
watch -n 1 'curl -s http://localhost:9090/metrics | grep "worker_id=\"worker_1\""'
Test Worker Types¶
# Test sync worker
gunicorn --config example/gunicorn_simple.conf.py example/app:app
# Test thread worker
gunicorn --config example/gunicorn_thread_worker.conf.py example/app:app
# Test eventlet worker
gunicorn --config example/gunicorn_eventlet_async.conf.py example/async_app:app
# Test gevent worker
gunicorn --config example/gunicorn_gevent_async.conf.py example/async_app:app
Async Worker Issues¶
This section addresses specific problems related to asynchronous worker types (Eventlet and Gevent).
Eventlet Worker Problems¶
Common Issues:
- Import errors: Install the eventlet package
- WSGI compatibility: Use an async-compatible application
- Worker connections: Set an appropriate worker_connections value
Solution:
# In gunicorn.conf.py
worker_class = "gunicorn_prometheus_exporter.PrometheusEventletWorker"
worker_connections = 1000
# Use an async-compatible app (wsgi_app is the Gunicorn setting, available since Gunicorn 20.1.0)
wsgi_app = "example.async_app:app"
Gevent Worker Problems¶
Common Issues:
- Import errors: Install the gevent package
- Monkey patching: May conflict with other libraries
- Worker connections: Set an appropriate worker_connections value
Solution:
# In gunicorn.conf.py
worker_class = "gunicorn_prometheus_exporter.PrometheusGeventWorker"
worker_connections = 1000
# Use an async-compatible app (wsgi_app is the Gunicorn setting, available since Gunicorn 20.1.0)
wsgi_app = "example.async_app:app"
Alternative Solution:
# In gunicorn.conf.py - Use EventletWorker instead
worker_class = "gunicorn_prometheus_exporter.PrometheusEventletWorker"
# Use an async-compatible app (wsgi_app is the Gunicorn setting, available since Gunicorn 20.1.0)
wsgi_app = "example.async_app:app"
Performance Issues¶
This section covers performance-related problems and optimization strategies.
High Memory Usage¶
Symptoms:
- Memory usage increases over time
- Workers restart frequently
Solutions:
- Reduce worker count:
- Enable metric cleanup:
- Monitor memory metrics:
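For the first option, reducing the worker count, one approach is to derive the number from the CPU count and cap it. The sketch below starts from the `(2 x cores) + 1` rule of thumb in the Gunicorn documentation; the helper `suggested_workers` and its `cap` parameter are assumptions for illustration, not exporter settings.

```python
import multiprocessing


def suggested_workers(cpu_count: int, cap: int = 4) -> int:
    """Apply the (2 x cores) + 1 starting point, limited to at most `cap`."""
    return max(1, min(2 * cpu_count + 1, cap))


# In gunicorn.conf.py
workers = suggested_workers(multiprocessing.cpu_count())
```

Each worker is a separate process with its own memory footprint, so lowering `cap` is a direct way to trade throughput for memory.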
High CPU Usage¶
Symptoms:
- CPU usage spikes during requests
- Slow response times
Solutions:
- Use appropriate worker type:
# For I/O-bound apps
worker_class = "gunicorn_prometheus_exporter.PrometheusThreadWorker"
# For async apps
worker_class = "gunicorn_prometheus_exporter.PrometheusEventletWorker"
- Monitor CPU metrics:
Slow Metrics Collection¶
Symptoms:
- Metrics endpoint responds slowly
- High latency in metric updates
Solutions:
- Reduce metric collection frequency:
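# Update worker metrics less frequently
import time

def worker_int(worker):
    # Note: Gunicorn calls worker_int only when a worker receives SIGINT or SIGQUIT;
    # apply the same throttle in whichever hook triggers your metric updates.
    # Skip the update if the previous one ran less than 10 seconds ago
    last_update = getattr(worker, "_last_metrics_update", None)
    if last_update is not None and time.time() - last_update < 10:
        return
    worker._last_metrics_update = time.time()
    worker.update_worker_metrics()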
Recovery Procedures¶
Clean Restart¶
# Stop all Gunicorn processes
pkill -f gunicorn
# Clean multiprocess directory
rm -rf /tmp/prometheus_multiproc/*
# Restart with fresh configuration
gunicorn -c gunicorn.conf.py app:app
Emergency Recovery¶
# Force kill all processes
pkill -9 -f gunicorn
# Clean all temporary files
rm -rf /tmp/prometheus_multiproc/*
rm -rf /tmp/gunicorn*
# Restart with minimal configuration
gunicorn --bind 0.0.0.0:8000 --workers 1 app:app
Data Recovery¶
# Backup metrics data
cp -r /tmp/prometheus_multiproc /backup/prometheus_multiproc_$(date +%Y%m%d_%H%M%S)
# Restore from backup (assumes /backup/prometheus_multiproc_latest points at the backup you want to restore)
cp -r /backup/prometheus_multiproc_latest/* /tmp/prometheus_multiproc/
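The backup step can also be scripted so that the newest backup is easy to find again. This is a minimal sketch using the standard library; `backup_dir` and `latest_backup` are hypothetical helpers, and the timestamp format matches the `date +%Y%m%d_%H%M%S` pattern used above.

```python
import shutil
import time
from pathlib import Path


def backup_dir(source: str, backup_root: str) -> Path:
    """Copy `source` into a new timestamped directory under `backup_root`."""
    stamp = time.strftime("%Y%m%d_%H%M%S")
    target = Path(backup_root) / f"{Path(source).name}_{stamp}"
    shutil.copytree(source, target)  # creates missing parent directories
    return target


def latest_backup(backup_root: str, prefix: str):
    """Return the newest backup directory by name, or None if there is none."""
    candidates = sorted(
        path for path in Path(backup_root).glob(f"{prefix}_*") if path.is_dir()
    )
    return candidates[-1] if candidates else None
```

Because the timestamp sorts lexicographically, the last entry in name order is also the most recent backup.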
Getting Help¶
This section provides information on obtaining additional support and reporting issues.
Debug Information¶
When reporting issues, include:
- Gunicorn version:
- Python version:
- Installed packages:
- Configuration file:
- Error logs:
- Metrics endpoint:
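Part of this information can be gathered with a short script. The sketch below collects the Python version, the platform, and the installed versions of a few packages; `collect_debug_info` is an illustrative helper, and the default package list is an assumption about which distributions are relevant.

```python
import platform
import sys
from importlib import metadata


def collect_debug_info(
    packages=("gunicorn", "gunicorn-prometheus-exporter", "prometheus-client"),
):
    """Return Python, platform, and installed package versions as a dict."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_debug_info().items():
        print(f"{key}: {value}")
```

The configuration file, error logs, and metrics endpoint output still need to be attached separately.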
Support Channels¶
- GitHub Issues: Report bugs and feature requests
- Documentation: Check the Backend API, Config API, Hooks API, Metrics API, or Plugin API
- Examples: See the example/ directory for working configurations
For more help, see the Installation Guide and Configuration Reference.