Troubleshooting - Runpod Documentation

Deployment issues

Worker fails to start

If your worker fails to start or initialize:

Check logs: View endpoint logs in the Runpod console for error messages.
Verify local testing: Ensure your handler works in local testing before deploying.
Check dependencies: Verify all dependencies are installed in your Docker image.
GPU compatibility: Ensure your Docker image is compatible with the selected GPU type.
Input format: Verify your input format matches what your handler expects.

Worker initializes but fails on requests

Issue	Solution
Input validation errors	Add input validation in your handler and check logs for the expected format
Missing dependencies	Verify all required packages are in your Dockerfile
Model loading failures	Check GPU memory requirements and model path
Permission errors	Ensure files are readable and directories are writable
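
The input-validation row above can be sketched roughly as follows. The schema, the field names (`prompt`, `max_tokens`), and the `validate_input` helper are hypothetical, and returning a dictionary with an `error` key is an assumption about how you want to surface failures; adapt all of them to what your handler actually expects.

```python
# Hypothetical schema: field name -> (expected type, required?)
SCHEMA = {
    "prompt": (str, True),
    "max_tokens": (int, False),
}


def validate_input(job_input):
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(job_input, dict):
        return ["input must be a JSON object"]
    problems = []
    for field, (expected_type, required) in SCHEMA.items():
        if field not in job_input:
            if required:
                problems.append(f"missing required field '{field}'")
            continue
        if not isinstance(job_input[field], expected_type):
            problems.append(
                f"field '{field}' must be {expected_type.__name__}, "
                f"got {type(job_input[field]).__name__}"
            )
    return problems


def handler(job):
    problems = validate_input(job.get("input"))
    if problems:
        # Print the problems so the expected format shows up in the endpoint logs.
        print(f"Rejected job input: {problems}", flush=True)
        return {"error": "; ".join(problems)}
    # Placeholder for the real work.
    return {"echo": job["input"]["prompt"]}
```

Validating before any model call keeps malformed requests cheap and makes the log message point directly at the offending field.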

Job issues

Jobs stuck in queue

If jobs remain IN_QUEUE for extended periods:

No workers available: Check if max_workers is set appropriately.
Workers throttled: Your endpoint may be hitting rate limits. Check the Workers tab for throttled workers.
Cold start delays: First requests after idle periods require worker initialization. Consider increasing min_workers or enabling FlashBoot.

Jobs timing out

Cause	Solution
Processing takes too long	Increase `executionTimeout` in your job policy
Model loading too slow	Use model caching or bake models into your image
TTL too short	Set `ttl` to cover both queue time and execution time
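
As a rough illustration of the last row, the sketch below derives `ttl` from an expected queue time plus the execution timeout. It assumes the policy is sent as a `policy` object alongside `input` and that both values are in milliseconds; the `build_job_request` helper and its `safety_factor` parameter are hypothetical, so check the API reference before relying on the exact shape.

```python
def build_job_request(job_input, expected_queue_s, expected_run_s, safety_factor=1.5):
    """Build a request body whose ttl covers queue time plus execution time."""
    # Give the job some slack over the expected run time.
    execution_timeout_ms = int(expected_run_s * safety_factor * 1000)
    # ttl must outlast the time spent waiting in the queue AND the run itself.
    ttl_ms = int(expected_queue_s * 1000) + execution_timeout_ms
    return {
        "input": job_input,
        "policy": {
            "executionTimeout": execution_timeout_ms,  # assumed unit: milliseconds
            "ttl": ttl_ms,  # assumed unit: milliseconds
        },
    }


request_body = build_job_request(
    {"prompt": "hello"}, expected_queue_s=60, expected_run_s=120
)
```

With these example numbers the execution timeout is 180,000 ms and the TTL is 240,000 ms, so a job that waits a full minute in the queue still has its whole execution window available.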

Jobs failing

Check the job status response for error details. Common causes:

Handler exceptions: Unhandled exceptions in your handler code. Add try/except blocks and return structured errors.
OOM (Out of Memory): Model or batch size exceeds GPU memory. Reduce batch size or use a larger GPU.
Timeout: Job exceeded execution timeout. Increase timeout or optimize processing.
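
One way to catch handler exceptions in a single place is a small wrapper, sketched below. The `with_structured_errors` decorator is hypothetical, and the assumption that a returned `error` key is what marks the job as failed should be checked against the SDK you use.

```python
import functools
import traceback


def with_structured_errors(handler_fn):
    """Wrap a handler so unhandled exceptions become an error result."""

    @functools.wraps(handler_fn)
    def wrapper(job):
        try:
            return handler_fn(job)
        except Exception as exc:
            # Print the traceback so it is captured in the endpoint logs.
            traceback.print_exc()
            return {"error": f"{type(exc).__name__}: {exc}"}

    return wrapper


@with_structured_errors
def handler(job):
    # Placeholder work: raises KeyError when "value" is missing.
    return {"result": job["input"]["value"] * 2}
```

The job status response then carries the exception type and message instead of an opaque failure, while the full traceback stays in the logs.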

Endpoint scaling issues

My endpoint was scaled down unexpectedly

If your endpoint’s max workers dropped without any change on your end, Runpod scaled the endpoint down automatically. This happens in two situations:

Prolonged inactivity: When an endpoint receives no requests for 3 days, its max workers is reduced to 2, and after 7 days its max workers is set to 0. Runpod emails you when the first reduction happens. For more details, see idle endpoint scale-down.
Repeated unhealthy workers: When an endpoint consistently produces unhealthy (crashing) workers, Runpod scales it down to stop billing and reduce thrashing, and sends you an email.

To bring the endpoint back, increase its max workers in the Runpod console. If the scale-down was caused by unhealthy workers, fix the underlying problem first, or the endpoint may be scaled down again. Check the logs for crash errors, and verify your worker using local testing.

Cold start issues

Slow cold starts

Cold start time includes container startup, model loading, and initialization. To reduce cold starts:

Use model caching: Store models on network volumes instead of downloading on each start.
Enable FlashBoot: Use FlashBoot for faster container initialization.
Optimize image size: Use smaller base images and remove unnecessary dependencies.
Initialize outside handler: Load models at module level, not inside the handler function.

# Good: Load model once at startup
model = load_model()

def handler(job):
    return model.predict(job["input"])

# Bad: Load model on every request
def handler(job):
    model = load_model()  # Slow!
    return model.predict(job["input"])

Too many cold starts

If you’re seeing frequent cold starts:

Increase idle timeout: Set a longer idle_timeout to keep workers warm between requests.
Set minimum workers: Configure min_workers > 0 to maintain warm workers.
Check traffic patterns: Sporadic traffic causes more cold starts than steady traffic.

Logging issues

Missing logs

If logs aren’t appearing in the console:

Check throttling: Excessive logging triggers throttling. Reduce log verbosity.
Verify output streams: Ensure you’re writing to stdout/stderr, not just files.
Check worker status: Logs only appear for successfully initialized workers.
Retention period: Logs older than 90 days are automatically removed.

Log throttling

To avoid log throttling:

Reduce log verbosity in production.
Use structured logging for efficiency.
Store detailed logs on network volumes instead of console output.
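
The first two points can be sketched with the standard `logging` module: one compact JSON line per record, written to stdout, with verbosity controlled by an environment variable. The `LOG_LEVEL` variable name, the `build_logger` helper, and the chosen JSON fields are assumptions for illustration.

```python
import json
import logging
import os
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as one compact JSON line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name="worker", stream=None):
    logger = logging.getLogger(name)
    # LOG_LEVEL is a hypothetical variable name; default to WARNING in production.
    level_name = os.environ.get("LOG_LEVEL", "WARNING").upper()
    logger.setLevel(getattr(logging, level_name, logging.WARNING))
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # replace handlers so repeated calls do not duplicate output
    logger.propagate = False
    return logger


log = build_logger()
log.debug("per-token detail")  # dropped unless LOG_LEVEL is set to DEBUG
log.warning("model reload took longer than expected")
```

Raising the level only while debugging keeps routine console output small, which is the simplest way to stay under a throttling threshold.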

vLLM-specific issues

OOM errors

If your vLLM worker runs out of memory:

Lower GPU_MEMORY_UTILIZATION from 0.90 to 0.85.
Reduce MAX_MODEL_LEN to limit context window.
Use a GPU with more VRAM.
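
To see what the first change buys you, the arithmetic below assumes that `GPU_MEMORY_UTILIZATION` is the fraction of total VRAM that vLLM is allowed to claim, and uses a 24 GB GPU purely as an example. The `vllm_memory_budget_gb` helper is hypothetical.

```python
def vllm_memory_budget_gb(total_vram_gb, gpu_memory_utilization):
    """Return (budget, headroom) in GB for a given utilization fraction."""
    if not 0 < gpu_memory_utilization <= 1:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    budget = total_vram_gb * gpu_memory_utilization
    return round(budget, 2), round(total_vram_gb - budget, 2)


for fraction in (0.90, 0.85):
    budget, headroom = vllm_memory_budget_gb(24, fraction)
    print(f"{fraction:.2f}: vLLM may use {budget} GB, leaving {headroom} GB free")
```

Under these assumptions, dropping from 0.90 to 0.85 on a 24 GB card frees roughly 1.2 GB of extra headroom for allocations outside vLLM's own budget.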

Model not loading

Issue	Solution
Model not found	Verify `MODEL_NAME` matches the Hugging Face model ID exactly
Gated model access denied	Set `HF_TOKEN` with a token that has access to the model
Incompatible model	Check vLLM supported models

OpenAI API errors

Error	Cause	Solution
401 Unauthorized	Invalid API key	Verify `RUNPOD_API_KEY` is correct
404 Not Found	Wrong endpoint URL	Use the format `https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1`
Connection refused	Endpoint not ready	Wait for workers to initialize
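
A small helper can rule out the first two rows before any request is sent. The sketch below builds the base URL in the format shown above and refuses to continue if `RUNPOD_API_KEY` is unset; the `openai_base_url` and `client_settings` helper names are hypothetical.

```python
import os


def openai_base_url(endpoint_id):
    """Build the OpenAI-compatible base URL for an endpoint ID."""
    endpoint_id = endpoint_id.strip()
    if not endpoint_id or "/" in endpoint_id:
        raise ValueError("endpoint_id must be the bare endpoint ID, not a URL")
    return f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1"


def client_settings(endpoint_id):
    """Collect the settings an OpenAI-compatible client needs."""
    api_key = os.environ.get("RUNPOD_API_KEY", "")
    if not api_key:
        # Fail early instead of waiting for a 401 from the API.
        raise RuntimeError("RUNPOD_API_KEY is not set")
    return {"api_key": api_key, "base_url": openai_base_url(endpoint_id)}
```

If you use the `openai` Python package, the two values map onto the `api_key` and `base_url` arguments of its `OpenAI` client constructor.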

Load balancing endpoint issues

“No workers available” error

This means workers didn’t initialize in time. Common causes:

First request: Workers need time to start. Retry the request. (See Handling cold starts for more information.)
All workers busy: Increase max_workers to handle more concurrent requests.
Workers crashing: Check logs for initialization errors.
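
For the first-request case, retrying with exponential backoff is usually enough. The sketch below is client-agnostic: `NoWorkersAvailable` is a hypothetical exception that your own request code would raise when it sees this error, and `send_request` is any callable that performs one attempt.

```python
import time


class NoWorkersAvailable(Exception):
    """Hypothetical error raised by your client when no worker is ready."""


def call_with_retry(send_request, attempts=5, base_delay_s=1.0, sleep=time.sleep):
    """Call send_request(), retrying with exponential backoff on NoWorkersAvailable."""
    for attempt in range(attempts):
        try:
            return send_request()
        except NoWorkersAvailable:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the error to the caller
            sleep(base_delay_s * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Only the "no workers" condition is retried here; other failures, such as a crashing worker, propagate immediately so that they are not hidden behind retries.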

Requests not reaching workers

Verify your HTTP server is:

Listening on port 8000 (or the port specified in your configuration).
Binding to 0.0.0.0, not 127.0.0.1.
Returning proper HTTP responses.
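
The three checks above can be sketched with the standard library alone. This is a minimal illustration, not a production server: the `/ping` path, the `PORT` environment variable name, and the `make_server` helper are assumptions, so substitute whatever your endpoint configuration specifies.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class WorkerHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # /ping is a hypothetical health-check path for this sketch.
        if self.path == "/ping":
            self._send_json(200, {"status": "healthy"})
        else:
            self._send_json(404, {"error": f"unknown path {self.path}"})

    def _send_json(self, status, payload):
        # Always send a status line, headers, and a body of the declared length.
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep per-request access logs out of the console


def make_server(port=None):
    """Create a server bound to all interfaces so external traffic can reach it."""
    if port is None:
        port = int(os.environ.get("PORT", "8000"))  # PORT is an assumed variable name
    return ThreadingHTTPServer(("0.0.0.0", port), WorkerHandler)


# To run the server: make_server().serve_forever()
```

Binding to `0.0.0.0` matters because `127.0.0.1` only accepts connections that originate inside the container, so requests routed from outside never arrive.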

Getting help

If you’re still experiencing issues:

Check endpoint logs for detailed error messages.
SSH into workers to debug in real time (see SSH access).
Review metrics in the Metrics tab to identify patterns.
Contact support at help@runpod.io with your endpoint ID and error details.