Troubleshooting Guide¶
Common issues and solutions for the Gunicorn Prometheus Exporter.
🚨 Common Issues¶
Port Already in Use¶
Error:
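Solution:
1. Change the metrics port in your configuration:
# In gunicorn.conf.py
import os
# setdefault only applies when the variable is not already set in the environment
os.environ.setdefault("PROMETHEUS_METRICS_PORT", "9091")  # Use a different port
2. Or find and stop the process that is holding the port.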
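Before changing the port, it can help to confirm that the port really is taken. The following is a minimal sketch using only the standard library; `port_is_free` is a hypothetical helper name, not part of the exporter.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission problem
            return False
    return True
```

Calling `port_is_free(9090)` before starting Gunicorn tells you whether the default metrics port used in this guide is available. The check is only a snapshot: another process may claim the port between the check and the actual start.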
Permission Denied¶
Error:
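Solution:
1. Check the permissions on the multiprocess directory:
# Create the directory; 755 gives write access to the owner only,
# so run this as the user that runs Gunicorn
mkdir -p /tmp/prometheus_multiproc
chmod 755 /tmp/prometheus_multiproc
2. Or set PROMETHEUS_MULTIPROC_DIR to a different directory that the Gunicorn user can write to.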
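The permission check can also be done from Python, run as the same user that runs Gunicorn. This is a sketch under that assumption; `check_multiproc_dir` is a hypothetical helper, not an exporter API.

```python
import os
import tempfile


def check_multiproc_dir(path: str) -> bool:
    """Create path if needed and confirm the current user can write files in it."""
    try:
        os.makedirs(path, mode=0o755, exist_ok=True)
        # Creating (and automatically deleting) a temp file proves write access
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError:
        return False
    return True
```

For example, `check_multiproc_dir("/tmp/prometheus_multiproc")` returns False when the directory cannot be created or written to by the current user.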
Import Errors for Async Workers¶
Error:
Solution:
1. Install the required dependencies (the quotes keep shells such as zsh from expanding the brackets):
# For eventlet workers
pip install "gunicorn-prometheus-exporter[eventlet]"
# For gevent workers
pip install "gunicorn-prometheus-exporter[gevent]"
# For tornado workers (⚠️ Not recommended)
pip install "gunicorn-prometheus-exporter[tornado]"
# Or install all async dependencies
pip install "gunicorn-prometheus-exporter[async]"
2. Verify that the worker's package (eventlet, gevent, or tornado) can be imported in the same environment that runs Gunicorn.
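One way to verify the installation without importing the packages (and triggering any monkey patching) is to ask the import system whether it can locate them. A minimal sketch; `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# Top-level module names of the async worker dependencies named in this guide
ASYNC_PACKAGES = ("eventlet", "gevent", "tornado")


def missing_packages(names=ASYNC_PACKAGES):
    """Return the module names that the current interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Run it with the same interpreter (virtualenv) that runs Gunicorn; an empty list means every package was found.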
Metrics Not Updating¶
Issue: Metrics endpoint shows stale or no data.
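Solutions:
1. Check that the environment variables are set, in particular PROMETHEUS_MULTIPROC_DIR and PROMETHEUS_METRICS_PORT.
2. Check that the multiprocess directory exists and that the files in it are being updated.
3. Restart Gunicorn.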
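A quick way to tell whether workers are still writing metrics is to look at the age of the newest file in the multiprocess directory. The sketch below assumes the exporter stores its metric files directly in that directory; `newest_file_age` is a hypothetical helper.

```python
import os
import time


def newest_file_age(directory: str):
    """Seconds since the most recent file modification, or None if no files are found."""
    try:
        mtimes = [entry.stat().st_mtime for entry in os.scandir(directory) if entry.is_file()]
    except OSError:
        # Directory is missing or unreadable
        return None
    if not mtimes:
        return None
    return max(0.0, time.time() - max(mtimes))
```

For example, `newest_file_age(os.environ.get("PROMETHEUS_MULTIPROC_DIR", "/tmp/prometheus_multiproc"))` returning None, or a value that keeps growing while requests are being served, suggests the workers are not writing to the directory you expect.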
Worker Type Errors¶
Error:
Solution:
1. Verify that the worker class is correctly specified. Set exactly one; if several assignments are left active, the last one wins:
# In gunicorn.conf.py
worker_class = "gunicorn_prometheus_exporter.PrometheusWorker"  # Sync
# worker_class = "gunicorn_prometheus_exporter.PrometheusThreadWorker"  # Thread
# worker_class = "gunicorn_prometheus_exporter.PrometheusEventletWorker"  # Eventlet
# worker_class = "gunicorn_prometheus_exporter.PrometheusGeventWorker"  # Gevent
# worker_class = "gunicorn_prometheus_exporter.PrometheusTornadoWorker"  # Tornado (⚠️ Not recommended)
2. Check that the async dependencies for the chosen worker are installed.
🔧 Configuration Issues¶
Environment Variables Not Set¶
Error:
Solution:
1. Set the environment variables in your configuration:
# In gunicorn.conf.py
import os
# setdefault keeps any value that is already exported in the environment
os.environ.setdefault("PROMETHEUS_MULTIPROC_DIR", "/tmp/prometheus_multiproc")
os.environ.setdefault("PROMETHEUS_METRICS_PORT", "9090")
os.environ.setdefault("PROMETHEUS_BIND_ADDRESS", "0.0.0.0")
os.environ.setdefault("GUNICORN_WORKERS", "2")
2. Or export the same variables in your shell before starting Gunicorn.
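To see which variables are actually missing, a small check can be run in the environment that starts Gunicorn. This sketch assumes the four variables shown in this section are the ones your setup needs; `missing_env_vars` and `REQUIRED_VARS` are hypothetical names.

```python
import os

# Assumption: the variables this guide sets in gunicorn.conf.py
REQUIRED_VARS = (
    "PROMETHEUS_MULTIPROC_DIR",
    "PROMETHEUS_METRICS_PORT",
    "PROMETHEUS_BIND_ADDRESS",
    "GUNICORN_WORKERS",
)


def missing_env_vars(environ=None, required=REQUIRED_VARS):
    """Return the names in required that are unset or empty in environ."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]
```

Calling `missing_env_vars()` with no arguments inspects the current process environment; passing a dict makes the function easy to test.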
Redis Configuration Issues¶
Error:
Solution:
1. Check that the Redis server is running (redis-cli ping should answer PONG).
2. Verify the Redis configuration.
3. Test the Redis connection from the host that runs Gunicorn.
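If redis-cli is not available on the Gunicorn host, the connection test can be sketched with a plain socket, because Redis accepts PING as an inline command and answers +PONG. The host and port defaults below are assumptions (the usual Redis defaults), and `redis_ping` is a hypothetical helper, not part of the exporter.

```python
import socket


def redis_ping(host: str = "127.0.0.1", port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if the server at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            reply = conn.recv(64)
    except OSError:
        # Connection refused, timeout, DNS failure, and similar errors
        return False
    return reply.startswith(b"+PONG")
```

A server that requires authentication replies with an error instead of +PONG, so this check returns False in that case even though the server is reachable.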
🐛 Debug Mode¶
Enable Debug Logging¶
# In gunicorn.conf.py
import logging
logging.basicConfig(level=logging.DEBUG)
# Or set specific logger
logging.getLogger('gunicorn_prometheus_exporter').setLevel(logging.DEBUG)
Verbose Gunicorn Output¶
Start Gunicorn with --log-level debug to get verbose output.
Check Metrics Endpoint¶
# Test metrics endpoint (0.0.0.0 is a bind address; connect to localhost or the host's address)
curl http://localhost:9090/metrics
# Check for specific metrics
curl -s http://localhost:9090/metrics | grep gunicorn_worker
# Check for errors
curl -s http://localhost:9090/metrics | grep -i error
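The same check can be scripted, which is convenient in health checks or tests. This is a minimal sketch using the standard library; `fetch_metrics` and `filter_metrics` are hypothetical helpers, and the default URL assumes the port 9090 used in this guide.

```python
from urllib.request import urlopen


def fetch_metrics(url: str = "http://localhost:9090/metrics", timeout: float = 5.0) -> str:
    """Download the metrics page as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def filter_metrics(text: str, needle: str):
    """Return sample lines containing needle, skipping # HELP / # TYPE comments."""
    return [
        line
        for line in text.splitlines()
        if needle in line and not line.startswith("#")
    ]
```

For example, `filter_metrics(fetch_metrics(), "gunicorn_worker")` plays the role of the grep commands above; an empty list means no matching samples were exported.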
🔍 Diagnostic Commands¶
Check Process Status¶
# List Gunicorn processes
ps aux | grep gunicorn
# Check open ports
netstat -tlnp | grep 9090
# Check multiprocess directory
ls -la /tmp/prometheus_multiproc/
Monitor Metrics¶
# Watch metrics in real time
watch -n 1 'curl -s http://localhost:9090/metrics | grep gunicorn_worker_requests_total'
# Monitor a specific worker
watch -n 1 'curl -s http://localhost:9090/metrics | grep "worker_id=\"worker_1\""'
Test Worker Types¶
# Test sync worker
gunicorn --config example/gunicorn_simple.conf.py example/app:app
# Test thread worker
gunicorn --config example/gunicorn_thread_worker.conf.py example/app:app
# Test eventlet worker
gunicorn --config example/gunicorn_eventlet_async.conf.py example/async_app:app
# Test gevent worker
gunicorn --config example/gunicorn_gevent_async.conf.py example/async_app:app
# Test tornado worker (⚠️ Not recommended)
gunicorn --config example/gunicorn_tornado_async.conf.py example/async_app:app
🚨 Async Worker Issues¶
Eventlet Worker Problems¶
Common Issues:
1. Import errors: Install the eventlet package
2. WSGI compatibility: Use an async-compatible application
3. Worker connections: Set an appropriate worker_connections value
Solution:
# In gunicorn.conf.py
worker_class = "gunicorn_prometheus_exporter.PrometheusEventletWorker"
worker_connections = 1000
# Use an async-compatible app (wsgi_app needs Gunicorn 20.1+; an unknown name such as app is ignored)
wsgi_app = "example.async_app:app"
Gevent Worker Problems¶
Common Issues:
1. Import errors: Install the gevent package
2. Monkey patching: May conflict with other libraries
3. Worker connections: Set an appropriate worker_connections value
Solution:
# In gunicorn.conf.py
worker_class = "gunicorn_prometheus_exporter.PrometheusGeventWorker"
worker_connections = 1000
# Use an async-compatible app (wsgi_app needs Gunicorn 20.1+; an unknown name such as app is ignored)
wsgi_app = "example.async_app:app"
Tornado Worker Problems (⚠️ Not Recommended)¶
Common Issues:
1. Import errors: Install the tornado package
2. IOLoop conflicts: May conflict with other async libraries
3. Application compatibility: Requires an async-compatible app
4. ⚠️ Metrics endpoint hanging: The Prometheus metrics endpoint may become unresponsive
5. ⚠️ Thread safety issues: Metrics collection can cause deadlocks
⚠️ Warning: TornadoWorker has known compatibility issues with metrics collection. The Prometheus metrics endpoint may hang or become unresponsive. Use PrometheusEventletWorker or PrometheusGeventWorker instead for async applications.
Alternative Solution:
# In gunicorn.conf.py - Use EventletWorker instead
worker_class = "gunicorn_prometheus_exporter.PrometheusEventletWorker"
# Use an async-compatible app (wsgi_app needs Gunicorn 20.1+; an unknown name such as app is ignored)
wsgi_app = "example.async_app:app"
🔧 Performance Issues¶
High Memory Usage¶
Symptoms:
- Memory usage increases over time
- Workers restart frequently
Solutions:
1. Reduce the worker count.
2. Enable metric cleanup.
3. Monitor memory metrics.
High CPU Usage¶
Symptoms:
- CPU usage spikes during requests
- Slow response times
Solutions:
1. Use an appropriate worker type:
# For I/O-bound apps
worker_class = "gunicorn_prometheus_exporter.PrometheusThreadWorker"
# For async apps
worker_class = "gunicorn_prometheus_exporter.PrometheusEventletWorker"
2. Monitor CPU metrics.
Slow Metrics Collection¶
Symptoms:
- Metrics endpoint responds slowly
- High latency in metric updates
Solutions:
1. Reduce metric collection frequency:
# In gunicorn.conf.py
import time

# Update worker metrics less frequently.
# Note: Gunicorn calls the worker_int hook when a worker receives SIGINT or SIGQUIT.
def worker_int(worker):
    # Skip the update if the last one ran less than 10 seconds ago
    last_update = getattr(worker, "_last_metrics_update", None)
    if last_update is not None and time.time() - last_update < 10:
        return
    worker._last_metrics_update = time.time()
    worker.update_worker_metrics()
2. Use Redis forwarding for aggregation.
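The idea of limiting how often metrics are updated can be isolated into a small, testable throttle that does not depend on Gunicorn. This is an illustrative sketch; `Throttle` is a hypothetical class, not an exporter API.

```python
import time


class Throttle:
    """Allow an action at most once per `interval` seconds."""

    def __init__(self, interval: float, clock=time.monotonic):
        self.interval = interval
        self.clock = clock  # injectable so tests can supply a fake clock
        self._last = None

    def ready(self) -> bool:
        """Return True (and record the time) if enough time has passed."""
        now = self.clock()
        if self._last is not None and now - self._last < self.interval:
            return False
        self._last = now
        return True
```

A caller would guard the expensive update with `if throttle.ready(): ...`. Using `time.monotonic` as the default clock keeps the throttle unaffected by system clock adjustments.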
🛠️ Recovery Procedures¶
Clean Restart¶
# Stop all Gunicorn processes
pkill -f gunicorn
# Clean multiprocess directory
rm -rf /tmp/prometheus_multiproc/*
# Restart with fresh configuration
gunicorn -c gunicorn.conf.py app:app
Emergency Recovery¶
# Force kill all processes
pkill -9 -f gunicorn
# Clean all temporary files
rm -rf /tmp/prometheus_multiproc/*
rm -rf /tmp/gunicorn*
# Restart with minimal configuration
gunicorn --bind 0.0.0.0:8000 --workers 1 app:app
Data Recovery¶
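# Back up metrics data into a timestamped directory
cp -r /tmp/prometheus_multiproc /backup/prometheus_multiproc_$(date +%Y%m%d_%H%M%S)
# Restore from a backup (replace prometheus_multiproc_latest with the timestamped directory you want to restore)
cp -r /backup/prometheus_multiproc_latest/* /tmp/prometheus_multiproc/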
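The timestamped backup can also be scripted so that it returns the exact directory it created, which avoids guessing the name when restoring. A minimal sketch with the standard library; `backup_dir` is a hypothetical helper and the paths in the usage note are the ones used in this guide.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_dir(source, backup_root) -> Path:
    """Copy source into backup_root/<name>_<YYYYmmdd_HHMMSS> and return the new path."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = Path(backup_root) / f"{Path(source).name}_{stamp}"
    # copytree creates missing parent directories and fails if target already exists
    shutil.copytree(source, target)
    return target
```

For example, `backup_dir("/tmp/prometheus_multiproc", "/backup")` creates a directory such as /backup/prometheus_multiproc_<timestamp> and returns its path.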
📞 Getting Help¶
Debug Information¶
When reporting issues, include:
- Gunicorn version (gunicorn --version)
- Python version (python --version)
- Installed packages (pip list)
- Configuration file
- Error logs
- Metrics endpoint output
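The version details for a bug report can be gathered in one step from the interpreter that runs Gunicorn. This is a sketch; `collect_versions` is a hypothetical helper, and including prometheus-client in the default list is an assumption about which dependency versions are useful to report.

```python
import platform
from importlib import metadata


def collect_versions(packages=("gunicorn", "gunicorn-prometheus-exporter", "prometheus-client")):
    """Return a dict with the Python version and the installed version of each package."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info
```

Printing the result of `collect_versions()` gives a compact block that can be pasted into an issue alongside the configuration file and error logs.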
Support Channels¶
- GitHub Issues: Report bugs and feature requests
- Documentation: Check the API Reference
- Examples: See the example/ directory for working configurations
For more help, see the Installation Guide and Configuration Reference.