Troubleshooting Guide

Common issues and solutions for the Gunicorn Prometheus Exporter.

🚨 Common Issues

Port Already in Use

Error:

OSError: [Errno 98] Address already in use

Solution:

  1. Change the metrics port in your configuration:

    # In gunicorn.conf.py
    import os
    # setdefault has no effect if the variable is already set in the environment
    os.environ.setdefault("PROMETHEUS_METRICS_PORT", "9091")  # Use a different port

  2. Or stop the process that is using the port:
    # Find the process
    lsof -i :9090
    
    # Kill the process
    kill -9 <PID>
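
Before starting Gunicorn, it can help to confirm that nothing is listening on the metrics port. The following is a minimal sketch using only the standard library; `is_port_free` is a hypothetical helper name, not part of the exporter, and it only detects listeners that accept TCP connections on the given host.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is accepting connections on (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when a listener accepted the connection
        return sock.connect_ex((host, port)) != 0
```

Calling `is_port_free(9090)` before launch gives a quick yes/no answer; `lsof` is still needed to find out which process owns the port.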
    

Permission Denied

Error:

PermissionError: [Errno 13] Permission denied

Solution:

  1. Check the multiprocess directory permissions:

    # Create the directory with proper permissions
    mkdir -p /tmp/prometheus_multiproc
    chmod 755 /tmp/prometheus_multiproc

  2. Or use a different directory:
    # In gunicorn.conf.py
    import os
    os.environ.setdefault("PROMETHEUS_MULTIPROC_DIR", "/var/tmp/prometheus_multiproc")
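
Mode bits alone do not always show whether the user running Gunicorn can write to the directory. A more direct check is to create the directory and write a temporary file there. This is a sketch under that assumption; `check_multiproc_dir` is a hypothetical helper name and should be run as the same user that runs Gunicorn.

```python
import os
import tempfile


def check_multiproc_dir(path: str) -> bool:
    """Create path if needed and return True if this process can write a file there."""
    try:
        os.makedirs(path, mode=0o755, exist_ok=True)
        # Writing a real file is more reliable than inspecting mode bits
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError:
        return False
    return True
```

For example, `check_multiproc_dir("/var/tmp/prometheus_multiproc")` returns False when the directory cannot be created or written to.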
    

Import Errors for Async Workers

Error:

ModuleNotFoundError: No module named 'eventlet'

Solution:

  1. Install the required dependencies (the quotes keep shells such as zsh from expanding the brackets):

    # For eventlet workers
    pip install "gunicorn-prometheus-exporter[eventlet]"

    # For gevent workers
    pip install "gunicorn-prometheus-exporter[gevent]"

    # For tornado workers (⚠️ Not recommended)
    pip install "gunicorn-prometheus-exporter[tornado]"

    # Or install all async dependencies
    pip install "gunicorn-prometheus-exporter[async]"

  2. Verify the installation:
    python -c "import eventlet; print('eventlet available')"
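
To check several worker dependencies at once without importing them, the standard library can look each module up on the import path. This is a small sketch; `missing_modules` is a hypothetical helper name, and it expects top-level module names such as `eventlet`, `gevent`, or `tornado`.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from `missing_modules(["eventlet", "gevent"])` means both packages are visible to the interpreter that ran the check, so run it with the same Python that runs Gunicorn.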
    

Metrics Not Updating

Issue: Metrics endpoint shows stale or no data.

Solutions:

  1. Check environment variables:

    # Verify all required variables are set
    echo $PROMETHEUS_MULTIPROC_DIR
    echo $PROMETHEUS_METRICS_PORT
    echo $PROMETHEUS_BIND_ADDRESS
    echo $GUNICORN_WORKERS
    

  2. Check multiprocess directory:

    # Verify directory exists and is writable
    ls -la /tmp/prometheus_multiproc/
    

  3. Restart Gunicorn:

    # Kill existing process
    pkill -f gunicorn
    
    # Start fresh
    gunicorn -c gunicorn.conf.py app:app
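
One way to confirm that metrics are stale is to scrape the endpoint twice, send some requests in between, and compare the two results. The sketch below assumes the text exposition format without trailing timestamps; `parse_samples` and `changed_samples` are hypothetical helper names, not part of the exporter.

```python
def parse_samples(text):
    """Map each sample (name plus labels) to its value, skipping comments."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumes no trailing timestamp: the value is the last space-separated field
        key, _, value = line.rpartition(" ")
        samples[key] = float(value)
    return samples


def changed_samples(before, after):
    """Return sample keys whose value differs between two scrapes."""
    old, new = parse_samples(before), parse_samples(after)
    return sorted(key for key in new if old.get(key) != new[key])
```

If `changed_samples` returns an empty list even though the application served requests between the two scrapes, the workers are probably not writing metrics.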
    

Worker Type Errors

Error:

TypeError: 'NoneType' object is not callable

Solution:

  1. Verify that the worker class is correctly specified. Set exactly one value, because a later assignment overrides an earlier one; the alternatives are shown commented out:

    # In gunicorn.conf.py
    worker_class = "gunicorn_prometheus_exporter.PrometheusWorker"  # Sync
    # worker_class = "gunicorn_prometheus_exporter.PrometheusThreadWorker"  # Thread
    # worker_class = "gunicorn_prometheus_exporter.PrometheusEventletWorker"  # Eventlet
    # worker_class = "gunicorn_prometheus_exporter.PrometheusGeventWorker"  # Gevent
    # worker_class = "gunicorn_prometheus_exporter.PrometheusTornadoWorker"  # Tornado (⚠️ Not recommended)

  2. Check that the async dependencies are installed:
    # For eventlet workers
    python -c "import eventlet"
    
    # For gevent workers
    python -c "import gevent"
    
    # For tornado workers (⚠️ Not recommended)
    python -c "import tornado"
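
A typo in the `worker_class` string can be caught before starting Gunicorn by importing the dotted path by hand. This is a sketch of that idea, assuming the `module.ClassName` form used above; `resolve_worker_class` is a hypothetical helper name.

```python
import importlib


def resolve_worker_class(dotted_path):
    """Import 'package.module.ClassName' and return the class object."""
    module_name, _, attr = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```

An `ImportError` points to a missing or misspelled package, while an `AttributeError` points to a misspelled class name.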
    

🔧 Configuration Issues

Environment Variables Not Set

Error:

ValueError: Environment variable PROMETHEUS_METRICS_PORT must be set in production

Solution:

  1. Set the environment variables in your configuration:

    # In gunicorn.conf.py
    import os
    os.environ.setdefault("PROMETHEUS_MULTIPROC_DIR", "/tmp/prometheus_multiproc")
    os.environ.setdefault("PROMETHEUS_METRICS_PORT", "9090")
    os.environ.setdefault("PROMETHEUS_BIND_ADDRESS", "0.0.0.0")
    os.environ.setdefault("GUNICORN_WORKERS", "2")

  2. Or export them in your shell:
    export PROMETHEUS_MULTIPROC_DIR="/tmp/prometheus_multiproc"
    export PROMETHEUS_METRICS_PORT="9090"
    export PROMETHEUS_BIND_ADDRESS="0.0.0.0"
    export GUNICORN_WORKERS="2"
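
The four variables above can also be checked programmatically, for example at the top of a deployment script. This is a minimal sketch; `REQUIRED_VARS` and `missing_env_vars` are hypothetical names, and the list of variables is taken from this guide rather than from the exporter's source.

```python
import os

REQUIRED_VARS = (
    "PROMETHEUS_MULTIPROC_DIR",
    "PROMETHEUS_METRICS_PORT",
    "PROMETHEUS_BIND_ADDRESS",
    "GUNICORN_WORKERS",
)


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests.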
    

Redis Configuration Issues

Error:

ConnectionError: Error connecting to Redis

Solution:

  1. Check that the Redis server is running:

    # A healthy server replies PONG
    redis-cli ping

  2. Verify the Redis configuration:

    # In gunicorn.conf.py
    import os
    os.environ.setdefault("REDIS_ENABLED", "true")
    os.environ.setdefault("REDIS_HOST", "localhost")
    os.environ.setdefault("REDIS_PORT", "6379")
    os.environ.setdefault("REDIS_DB", "0")

  3. Test the Redis connection:

    redis-cli -h localhost -p 6379 ping
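
When `redis-cli` is not installed on the application host, the same reachability test can be approximated from Python with a raw socket. The sketch below sends the Redis inline `PING` command and looks for `+PONG`; `redis_ping` is a hypothetical helper name, and the check reports False for servers that require authentication or TLS.

```python
import socket


def redis_ping(host="localhost", port=6379, timeout=2.0):
    """Return True if the server at (host, port) answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")  # Redis inline command
            reply = sock.recv(64)
    except OSError:
        # Covers refused connections, timeouts, and unresolved host names
        return False
    return reply.startswith(b"+PONG")
```

Running this from the same host and user as Gunicorn also exercises the network path and firewall rules that the exporter itself would use.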
    

🐛 Debug Mode

Enable Debug Logging

# In gunicorn.conf.py
import logging
logging.basicConfig(level=logging.DEBUG)

# Or set specific logger
logging.getLogger('gunicorn_prometheus_exporter').setLevel(logging.DEBUG)

Verbose Gunicorn Output

# Start with verbose logging
gunicorn -c gunicorn.conf.py app:app --log-level debug

Check Metrics Endpoint

# Test metrics endpoint
curl http://0.0.0.0:9090/metrics

# Check for specific metrics
curl http://0.0.0.0:9090/metrics | grep gunicorn_worker

# Check for errors
curl http://0.0.0.0:9090/metrics | grep -i error
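
The same endpoint checks can be scripted when `curl` and `grep` are not available, for example inside a minimal container. This is a sketch using only the standard library; `fetch_metrics` and `select_metrics` are hypothetical helper names, and the default URL assumes the exporter listens on port 9090 of the local machine.

```python
import urllib.request


def fetch_metrics(url="http://localhost:9090/metrics", timeout=5.0):
    """Download the metrics page and return it as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def select_metrics(text, name_prefix):
    """Return sample lines whose metric name starts with name_prefix."""
    return [
        line
        for line in text.splitlines()
        if line.startswith(name_prefix) and not line.startswith("#")
    ]
```

For example, `select_metrics(fetch_metrics(), "gunicorn_worker")` returns roughly the lines the second `curl` command above prints, minus the `# HELP` and `# TYPE` comments.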

🔍 Diagnostic Commands

Check Process Status

# List Gunicorn processes
ps aux | grep gunicorn

# Check open ports
netstat -tlnp | grep 9090

# Check multiprocess directory
ls -la /tmp/prometheus_multiproc/

Monitor Metrics

# Watch metrics in real-time
watch -n 1 'curl -s http://0.0.0.0:9090/metrics | grep gunicorn_worker_requests_total'

# Monitor specific worker
watch -n 1 'curl -s http://0.0.0.0:9090/metrics | grep "worker_id=\"worker_1\""'

Test Worker Types

# Test sync worker
gunicorn --config example/gunicorn_simple.conf.py example/app:app

# Test thread worker
gunicorn --config example/gunicorn_thread_worker.conf.py example/app:app

# Test eventlet worker
gunicorn --config example/gunicorn_eventlet_async.conf.py example/async_app:app

# Test gevent worker
gunicorn --config example/gunicorn_gevent_async.conf.py example/async_app:app

# Test tornado worker (⚠️ Not recommended)
gunicorn --config example/gunicorn_tornado_async.conf.py example/async_app:app

🚨 Async Worker Issues

Eventlet Worker Problems

Common Issues:

  1. Import errors: Install the eventlet package
  2. WSGI compatibility: Use an async-compatible application
  3. Worker connections: Set an appropriate worker_connections value

Solution:

# In gunicorn.conf.py
worker_class = "gunicorn_prometheus_exporter.PrometheusEventletWorker"
worker_connections = 1000

# Use async-compatible app (the wsgi_app setting requires Gunicorn 20.1.0 or later)
wsgi_app = "example.async_app:app"

Gevent Worker Problems

Common Issues:

  1. Import errors: Install the gevent package
  2. Monkey patching: May conflict with other libraries
  3. Worker connections: Set an appropriate worker_connections value

Solution:

# In gunicorn.conf.py
worker_class = "gunicorn_prometheus_exporter.PrometheusGeventWorker"
worker_connections = 1000

# Use async-compatible app (the wsgi_app setting requires Gunicorn 20.1.0 or later)
wsgi_app = "example.async_app:app"

Tornado Worker Problems

Common Issues:

  1. Import errors: Install the tornado package
  2. IOLoop conflicts: May conflict with other async libraries
  3. Application compatibility: Requires an async-compatible app
  4. ⚠️ Metrics endpoint hanging: The Prometheus metrics endpoint may become unresponsive
  5. ⚠️ Thread safety issues: Metrics collection can cause deadlocks

⚠️ Warning: TornadoWorker has known compatibility issues with metrics collection. The Prometheus metrics endpoint may hang or become unresponsive. Use PrometheusEventletWorker or PrometheusGeventWorker instead for async applications.

Alternative Solution:

# In gunicorn.conf.py - Use EventletWorker instead
worker_class = "gunicorn_prometheus_exporter.PrometheusEventletWorker"

# Use async-compatible app (the wsgi_app setting requires Gunicorn 20.1.0 or later)
wsgi_app = "example.async_app:app"

🔧 Performance Issues

High Memory Usage

Symptoms:

  • Memory usage increases over time
  • Workers restart frequently

Solutions:

  1. Reduce the worker count:

    # In gunicorn.conf.py
    workers = 2  # Lower than your current setting

  2. Enable metric cleanup:

    # In gunicorn.conf.py
    import os
    os.environ.setdefault("CLEANUP_DB_FILES", "true")

  3. Monitor memory metrics:

    # Check memory usage
    curl http://0.0.0.0:9090/metrics | grep gunicorn_worker_memory_bytes
    

High CPU Usage

Symptoms:

  • CPU usage spikes during requests
  • Slow response times

Solutions:

  1. Use an appropriate worker type (set only one worker_class):

    # For I/O-bound apps
    worker_class = "gunicorn_prometheus_exporter.PrometheusThreadWorker"

    # For async apps
    # worker_class = "gunicorn_prometheus_exporter.PrometheusEventletWorker"

  2. Monitor CPU metrics:
    # Check CPU usage
    curl http://0.0.0.0:9090/metrics | grep gunicorn_worker_cpu_percent
    

Slow Metrics Collection

Symptoms:

  • Metrics endpoint responds slowly
  • High latency in metric updates

Solutions:

  1. Reduce the metric collection frequency:

    # In gunicorn.conf.py
    import time

    # Update worker metrics less frequently
    def worker_int(worker):
        # Skip the update if the last one ran less than 10 seconds ago
        last_update = getattr(worker, "_last_metrics_update", None)
        if last_update is not None and time.time() - last_update < 10:
            return
        worker._last_metrics_update = time.time()
        worker.update_worker_metrics()

  2. Use Redis forwarding for aggregation:
    # Enable Redis forwarding
    import os
    os.environ.setdefault("REDIS_ENABLED", "true")
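
The rate-limiting idea behind the first solution can be separated from Gunicorn entirely, which makes it easier to test. Below is a minimal sketch of a reusable throttle; the `Throttle` class and its `clock` parameter are hypothetical and not part of the exporter. It uses a monotonic clock so that system time adjustments do not cause skipped or repeated updates.

```python
import time


class Throttle:
    """Allow an action at most once per `interval` seconds."""

    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self._last = None

    def ready(self):
        """Return True and record the time if the interval has elapsed."""
        now = self.clock()
        if self._last is not None and now - self._last < self.interval:
            return False
        self._last = now
        return True
```

A hook would create one `Throttle(10)` per worker and call the metrics update only when `ready()` returns True.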
    

🛠️ Recovery Procedures

Clean Restart

# Stop all Gunicorn processes
pkill -f gunicorn

# Clean multiprocess directory
rm -rf /tmp/prometheus_multiproc/*

# Restart with fresh configuration
gunicorn -c gunicorn.conf.py app:app

Emergency Recovery

# Force kill all processes
pkill -9 -f gunicorn

# Clean all temporary files
rm -rf /tmp/prometheus_multiproc/*
rm -rf /tmp/gunicorn*

# Restart with minimal configuration
gunicorn --bind 0.0.0.0:8000 --workers 1 app:app

Data Recovery

# Backup metrics data
cp -r /tmp/prometheus_multiproc /backup/prometheus_multiproc_$(date +%Y%m%d_%H%M%S)

# Restore from backup
cp -r /backup/prometheus_multiproc_latest/* /tmp/prometheus_multiproc/
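
The backup step can also be scripted, for example from a maintenance job that already runs Python. This is a sketch assuming the same timestamp format as the `cp` command above; `backup_directory` is a hypothetical helper name.

```python
import shutil
import time
from pathlib import Path


def backup_directory(source, backup_root):
    """Copy `source` into a new timestamped directory under backup_root."""
    source = Path(source)
    stamp = time.strftime("%Y%m%d_%H%M%S")
    target = Path(backup_root) / f"{source.name}_{stamp}"
    # copytree creates missing parent directories and fails if target exists
    shutil.copytree(source, target)
    return target
```

Because the target name has one-second resolution, two backups started within the same second collide and the second one raises `FileExistsError`.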

📞 Getting Help

Debug Information

When reporting issues, include:

  1. Gunicorn version:

    gunicorn --version
    

  2. Python version:

    python --version
    

  3. Installed packages:

    pip list | grep gunicorn
    

  4. Configuration file:

    cat gunicorn.conf.py
    

  5. Error logs:

    gunicorn -c gunicorn.conf.py app:app --log-level debug 2>&1
    

  6. Metrics endpoint:

    curl http://0.0.0.0:9090/metrics
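
The version details from the first three items can be gathered in one step for pasting into a bug report. This is a sketch built on `importlib.metadata`; `collect_versions` is a hypothetical helper name, and the default package names are assumed to match the distribution names on PyPI.

```python
import platform
from importlib import metadata


def collect_versions(
    packages=("gunicorn", "gunicorn-prometheus-exporter", "prometheus-client"),
):
    """Return the Python version and the installed version of each package."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info
```

Run it with the interpreter that runs Gunicorn, since a different virtual environment can report different versions.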
    

Support Channels

  • GitHub Issues: Report bugs and feature requests
  • Documentation: Check the API Reference
  • Examples: See the example/ directory for working configurations

For more help, see the Installation Guide and Configuration Reference.