LiteLLM Provider Guide¶
Access 100+ AI providers through a unified OpenAI-compatible proxy with advanced features
Overview¶
LiteLLM is a powerful proxy server that unifies access to 100+ AI providers (OpenAI, Anthropic, Azure, Vertex, Bedrock, Cohere, etc.) through a single OpenAI-compatible API. It adds enterprise features like load balancing, fallbacks, budgets, and rate limiting on top of any AI provider.
Key Benefits¶
- 🌐 100+ Providers: Access every major AI provider through one interface
- 🔄 Load Balancing: Distribute requests across multiple providers/models
- 💰 Cost Tracking: Built-in budget management and spend tracking
- ⚡ Fallbacks: Automatic failover when providers are down
- 🔧 Proxy Mode: Run as standalone proxy server for team-wide use
- 📊 Observability: Detailed logging, metrics, and analytics
- 🔐 Virtual Keys: Manage API keys centrally with role-based access
Use Cases¶
- Multi-Provider Access: Unified interface for all AI providers
- Load Balancing: Distribute load across providers for reliability
- Cost Management: Track and limit AI spending across teams
- Provider Migration: Easy switching between providers
- Team Collaboration: Centralized proxy for entire organization
- Enterprise Features: Budgets, rate limits, audit logs
Quick Start¶
Option 1: Direct Integration (SDK Only)¶
Use LiteLLM directly in your code without running a proxy server.
1. Install LiteLLM¶
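A minimal install with pip (assuming a recent Python environment):

pip install litellm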
2. Configure NeuroLink¶
# Add provider API keys to .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_AI_API_KEY=AIza...
3. Use via LiteLLM Python Client¶
import litellm
# Use any provider with OpenAI-compatible interface
response = litellm.completion(
model="gpt-4",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
# Switch providers easily
response = litellm.completion(
model="claude-3-5-sonnet-20241022", # Anthropic
messages=[{"role": "user", "content": "Hello!"}]
)
response = litellm.completion(
model="gemini/gemini-pro", # Google AI
messages=[{"role": "user", "content": "Hello!"}]
)
Option 2: Proxy Server (Recommended for Teams)¶
Run LiteLLM as a standalone proxy server for team-wide access.
1. Install LiteLLM¶
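For proxy mode, install LiteLLM with the proxy extras:

pip install 'litellm[proxy]'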
2. Create Configuration File¶
Create litellm_config.yaml:
model_list:
- model_name: gpt-4
litellm_params:
model: gpt-4
api_key: ${OPENAI_API_KEY} # Use env vars for all secrets
- model_name: claude-3-5-sonnet
litellm_params:
model: claude-3-5-sonnet-20241022
api_key: ${ANTHROPIC_API_KEY} # Use env vars for all secrets
- model_name: gemini-pro
litellm_params:
model: gemini/gemini-pro
api_key: ${GOOGLE_API_KEY} # Use env vars for all secrets
# Optional: Load balancing across multiple instances
# SECURITY: Use environment variables or secret management (e.g., AWS Secrets Manager, HashiCorp Vault)
- model_name: gpt-4-balanced
litellm_params:
model: gpt-4
api_key: ${OPENAI_API_KEY_1} # Use env vars for all secrets
- model_name: gpt-4-balanced
litellm_params:
model: gpt-4
api_key: ${OPENAI_API_KEY_2} # Use env vars for all secrets
general_settings:
master_key: ${LITELLM_MASTER_KEY} # Use env vars for all secrets
database_url: "postgresql://..." # Optional: for persistence
3. Start Proxy Server¶
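Start the proxy against the config file created above (the same command appears in the CLI section below):

litellm --config litellm_config.yaml --port 8000

The proxy exposes an OpenAI-compatible API at http://localhost:8000/v1.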
4. Configure NeuroLink to Use Proxy¶
# Add to .env
OPENAI_COMPATIBLE_BASE_URL=http://localhost:8000
OPENAI_COMPATIBLE_API_KEY=sk-1234 # The master_key value from your LiteLLM config
5. Test Setup¶
# Test via NeuroLink
npx @juspay/neurolink generate "Hello from LiteLLM!" \
--provider openai-compatible \
--model "gpt-4"
# Or use any OpenAI-compatible client
curl http://localhost:8000/v1/chat/completions \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Provider Support¶
Supported Providers (100+)¶
LiteLLM supports all major AI providers:
| Category | Providers |
| --- | --- |
| Major Cloud | OpenAI, Anthropic, Google (Gemini, Vertex), Azure OpenAI, AWS Bedrock |
| Open Source | Hugging Face, Together AI, Replicate, Ollama, vLLM, LocalAI |
| Specialized | Cohere, AI21, Aleph Alpha, Perplexity, Groq, Fireworks AI |
| Aggregators | OpenRouter, Anyscale, Deep Infra, Mistral AI |
| Enterprise | SageMaker, Cloudflare Workers AI, Azure AI Studio |
| Custom | Any OpenAI-compatible endpoint |
Model Name Format¶
# OpenAI (default prefix)
model: gpt-4 # openai/gpt-4
model: gpt-4o-mini # openai/gpt-4o-mini
# Anthropic
model: claude-3-5-sonnet-20241022 # anthropic/claude-3-5-sonnet
model: anthropic/claude-3-opus-20240229
# Google AI
model: gemini/gemini-pro # Google AI Studio
model: vertex_ai/gemini-pro # Vertex AI
# Azure OpenAI
model: azure/gpt-4 # Requires azure config
# AWS Bedrock
model: bedrock/anthropic.claude-3-sonnet-20240229-v1:0
# Ollama (local)
model: ollama/llama2 # Requires Ollama running
# Hugging Face
model: huggingface/mistralai/Mistral-7B-Instruct-v0.2
# OpenRouter
model: openrouter/anthropic/claude-3.5-sonnet
# Together AI
model: together_ai/meta-llama/Llama-3-70b-chat-hf
# Full list: https://docs.litellm.ai/docs/providers
Advanced Features¶
1. Load Balancing¶
Distribute requests across multiple providers or API keys:
# litellm_config.yaml
model_list:
# Load balance across multiple OpenAI keys
- model_name: gpt-4-loadbalanced
litellm_params:
model: gpt-4
api_key: sk-key-1...
- model_name: gpt-4-loadbalanced
litellm_params:
model: gpt-4
api_key: sk-key-2...
- model_name: gpt-4-loadbalanced
litellm_params:
model: gpt-4
api_key: sk-key-3...
router_settings:
  routing_strategy: simple-shuffle # Randomly distributes requests across the keys above
# or: least-busy, usage-based-routing, latency-based-routing
Usage with NeuroLink:
const ai = new NeuroLink({
providers: [
{
name: "openai-compatible",
config: {
baseUrl: "http://localhost:8000",
apiKey: "sk-1234",
},
},
],
});
// Requests automatically balanced across all 3 API keys
const result = await ai.generate({
input: { text: "Your prompt" },
provider: "openai-compatible",
model: "gpt-4-loadbalanced",
});
2. Automatic Failover¶
Configure fallback providers for reliability:
# litellm_config.yaml
model_list:
# Primary: OpenAI
- model_name: smart-model
litellm_params:
model: gpt-4
api_key: sk-...
# Fallback 1: Anthropic
- model_name: smart-model
litellm_params:
model: claude-3-5-sonnet-20241022
api_key: sk-ant-...
# Fallback 2: Google
- model_name: smart-model
litellm_params:
model: gemini/gemini-pro
api_key: AIza...
router_settings:
enable_fallbacks: true
fallback_timeout: 30 # Seconds before trying fallback
num_retries: 2
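The smart-model alias is then called through NeuroLink like any other model; a minimal sketch reusing the ai instance from the load-balancing example above:

// LiteLLM tries gpt-4 first, then falls back to Claude and Gemini on failure
const result = await ai.generate({
  input: { text: "Your prompt" },
  provider: "openai-compatible",
  model: "smart-model",
});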
3. Budget Management¶
Set spending limits per user/team:
# litellm_config.yaml
general_settings:
master_key: sk-1234
database_url: "postgresql://..." # Required for budgets
# Create virtual keys with budgets
# litellm --config config.yaml --create_key \
# --key_name "team-frontend" \
# --budget 100 # $100 limit
Track spending:
# Check budget status
import litellm
budget_info = litellm.get_budget(api_key="sk-team-frontend-...")
print(f"Spent: ${budget_info['total_spend']}")
print(f"Budget: ${budget_info['max_budget']}")
4. Rate Limiting¶
Control request rates per user/model:
# litellm_config.yaml
model_list:
- model_name: gpt-4-limited
litellm_params:
model: gpt-4
api_key: sk-...
model_info:
max_parallel_requests: 10 # Max concurrent requests
max_requests_per_minute: 100 # RPM limit
max_tokens_per_minute: 100000 # TPM limit
5. Caching¶
Reduce costs by caching responses:
# litellm_config.yaml
general_settings:
cache: true
cache_params:
type: redis
host: localhost
port: 6379
ttl: 3600 # Cache for 1 hour
Usage:
// Identical requests within TTL return cached results
const result1 = await ai.generate({
input: { text: "What is AI?" },
provider: "openai-compatible",
model: "gpt-4",
});
// Cost: $0.03
const result2 = await ai.generate({
input: { text: "What is AI?" }, // Same query
provider: "openai-compatible",
model: "gpt-4",
});
// Cost: $0.00 (cached)
6. Virtual Keys (Team Management)¶
Create team-specific API keys with permissions:
# Create key for frontend team with budget
litellm --config config.yaml --create_key \
--key_name "team-frontend" \
--budget 100 \
--models "gpt-4,claude-3-5-sonnet"
# Create key for backend team
litellm --config config.yaml --create_key \
--key_name "team-backend" \
--budget 500 \
--models "gpt-4,gpt-4o-mini,claude-3-5-sonnet"
# Returns: sk-litellm-team-frontend-abc123...
Teams use their virtual key:
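Each team points NeuroLink at the proxy with its own virtual key instead of the master key, for example:

# .env for the frontend team
OPENAI_COMPATIBLE_BASE_URL=http://localhost:8000
OPENAI_COMPATIBLE_API_KEY=sk-litellm-team-frontend-abc123... # Virtual key, not the master key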
NeuroLink Integration¶
Basic Usage¶
import { NeuroLink } from "@juspay/neurolink";
const ai = new NeuroLink({
providers: [
{
name: "openai-compatible",
config: {
baseUrl: "http://localhost:8000", // LiteLLM proxy
apiKey: process.env.LITELLM_KEY, // Master key or virtual key
},
},
],
});
// Use any provider through LiteLLM
const result = await ai.generate({
input: { text: "Hello!" },
provider: "openai-compatible",
model: "gpt-4",
});
Multi-Model Workflow¶
// Easy switching between providers via LiteLLM
const models = {
fast: "gpt-4o-mini",
balanced: "claude-3-5-sonnet-20241022",
powerful: "gpt-4",
};
async function generateSmart(
prompt: string,
complexity: "low" | "medium" | "high",
) {
const modelMap = {
low: models.fast,
medium: models.balanced,
high: models.powerful,
};
return await ai.generate({
input: { text: prompt },
provider: "openai-compatible",
model: modelMap[complexity],
});
}
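Example usage, reusing the ai instance configured earlier:

// Simple prompts route to the cheaper model, complex ones to gpt-4
const quick = await generateSmart("Summarize this changelog", "low");
const deep = await generateSmart("Design a database migration plan", "high");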
Cost Tracking¶
// LiteLLM provides detailed cost tracking
const result = await ai.generate({
input: { text: "Your prompt" },
provider: "openai-compatible",
model: "gpt-4",
enableAnalytics: true,
});
console.log("Model used:", result.model);
console.log("Tokens:", result.usage.totalTokens);
console.log("Cost:", result.cost); // Calculated by LiteLLM
CLI Usage¶
Basic Commands¶
# Start LiteLLM proxy
litellm --config litellm_config.yaml --port 8000
# Use via NeuroLink CLI
npx @juspay/neurolink generate "Hello LiteLLM" \
--provider openai-compatible \
--model "gpt-4"
# Switch models easily
npx @juspay/neurolink gen "Write code" \
--provider openai-compatible \
--model "claude-3-5-sonnet-20241022"
# Check proxy status
curl http://localhost:8000/health
Proxy Management¶
# Create virtual key
litellm --config config.yaml --create_key \
--key_name "my-team" \
--budget 100
# List all keys
litellm --config config.yaml --list_keys
# Delete key
litellm --config config.yaml --delete_key \
--key "sk-litellm-abc123..."
# View spend by key
litellm --config config.yaml --spend \
--key "sk-litellm-abc123..."
Production Deployment¶
Docker Deployment¶
# Dockerfile
FROM ghcr.io/berriai/litellm:main-latest
COPY litellm_config.yaml /app/config.yaml
EXPOSE 8000
CMD ["litellm", "--config", "/app/config.yaml", "--port", "8000"]
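Build and run the image (a sketch; the image tag and the environment variables passed through depend on your config):

docker build -t litellm-proxy .
docker run -p 8000:8000 \
  -e OPENAI_API_KEY=$OPENAI_API_KEY \
  -e LITELLM_MASTER_KEY=$LITELLM_MASTER_KEY \
  litellm-proxy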
Docker Compose¶
# docker-compose.yml
version: "3.8"
services:
litellm:
image: ghcr.io/berriai/litellm:main-latest
ports:
- "8000:8000"
volumes:
- ./litellm_config.yaml:/app/config.yaml
command: ["litellm", "--config", "/app/config.yaml", "--port", "8000"]
environment:
- DATABASE_URL=postgresql://user:pass@postgres:5432/litellm
depends_on:
- postgres
postgres:
image: postgres:15
environment:
- POSTGRES_DB=litellm
- POSTGRES_USER=user
- POSTGRES_PASSWORD=pass
volumes:
- postgres_data:/var/lib/postgresql/data
volumes:
postgres_data:
Kubernetes Deployment¶
# litellm-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: litellm-proxy
spec:
replicas: 3
selector:
matchLabels:
app: litellm
template:
metadata:
labels:
app: litellm
spec:
containers:
- name: litellm
image: ghcr.io/berriai/litellm:main-latest
ports:
- containerPort: 8000
volumeMounts:
- name: config
mountPath: /app
command: ["litellm", "--config", "/app/config.yaml", "--port", "8000"]
volumes:
- name: config
configMap:
name: litellm-config
---
apiVersion: v1
kind: Service
metadata:
name: litellm-service
spec:
selector:
app: litellm
ports:
- port: 80
targetPort: 8000
type: LoadBalancer
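One way to deploy (a sketch; the resource and file names follow the manifests above):

# Create the ConfigMap the Deployment mounts as /app/config.yaml
kubectl create configmap litellm-config --from-file=config.yaml=litellm_config.yaml
# Apply the Deployment and Service
kubectl apply -f litellm-deployment.yaml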
High Availability Setup¶
# litellm_config.yaml - Production
model_list:
# Multiple instances of each model
- model_name: gpt-4-ha
litellm_params:
model: gpt-4
api_key: sk-key-1...
- model_name: gpt-4-ha
litellm_params:
model: gpt-4
api_key: sk-key-2...
- model_name: gpt-4-ha
litellm_params:
model: gpt-4
api_key: sk-key-3...
general_settings:
master_key: ${LITELLM_MASTER_KEY}
database_url: ${DATABASE_URL}
# Observability
success_callback: ["langfuse", "prometheus"]
failure_callback: ["sentry"]
# Performance
num_workers: 4
cache: true
cache_params:
type: redis
host: redis-cluster
port: 6379
router_settings:
routing_strategy: latency-based-routing
enable_fallbacks: true
num_retries: 3
timeout: 30
cooldown_time: 60
Observability & Monitoring¶
Logging¶
# litellm_config.yaml
general_settings:
success_callback: ["langfuse"] # Log successful requests
failure_callback: ["sentry"] # Log failures
# Langfuse integration for observability
langfuse_public_key: ${LANGFUSE_PUBLIC_KEY}
langfuse_secret_key: ${LANGFUSE_SECRET_KEY}
Prometheus Metrics¶
# litellm_config.yaml
general_settings:
success_callback: ["prometheus"]
# Metrics available at http://localhost:8000/metrics
# - litellm_requests_total
# - litellm_request_duration_seconds
# - litellm_tokens_total
# - litellm_cost_total
Custom Logging¶
// Add custom metadata to requests
const result = await ai.generate({
input: { text: "Your prompt" },
provider: "openai-compatible",
model: "gpt-4",
metadata: {
user_id: "user-123",
team: "frontend",
environment: "production",
},
});
Troubleshooting¶
Common Issues¶
1. "Connection refused"¶
Problem: LiteLLM proxy not running.
Solution:
# Check if proxy is running
curl http://localhost:8000/health
# Start proxy
litellm --config litellm_config.yaml --port 8000
# Check logs
litellm --config config.yaml --debug
2. "Invalid API key"¶
Problem: Master key or virtual key incorrect.
Solution:
# Verify master_key in config
grep master_key litellm_config.yaml
# List all virtual keys
litellm --config config.yaml --list_keys
# Ensure key matches in .env
echo $OPENAI_COMPATIBLE_API_KEY
3. "Budget exceeded"¶
Problem: Virtual key reached budget limit.
Solution:
# Check spend
litellm --config config.yaml --spend --key "sk-litellm-..."
# Increase budget
litellm --config config.yaml --update_key \
--key "sk-litellm-..." \
--budget 200
4. "Model not found"¶
Problem: Model not configured in model_list.
Solution:
# Add model to litellm_config.yaml
model_list:
- model_name: your-model
litellm_params:
model: gpt-4
api_key: sk-...
# Restart proxy
litellm --config litellm_config.yaml
Best Practices¶
1. Use Virtual Keys¶
# ✅ Good: Separate keys per team
# Team Frontend: sk-litellm-frontend-abc
# Team Backend: sk-litellm-backend-xyz
# Each with own budget and model access
2. Enable Fallbacks¶
# ✅ Good: Configure fallback providers
router_settings:
enable_fallbacks: true
fallback_models: ["claude-3-5-sonnet-20241022", "gemini/gemini-pro"]
3. Implement Caching¶
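Reuse the Redis-backed cache settings from the Caching section above, for example:

# ✅ Good: Cache repeated prompts to avoid paying for the same response twice
general_settings:
  cache: true
  cache_params:
    type: redis
    host: localhost
    port: 6379
    ttl: 3600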
4. Monitor Costs¶
# ✅ Good: Track spending
general_settings:
success_callback: ["langfuse", "prometheus"]
# Set budgets per team
# Create alerts when budgets approach limits
5. Use Load Balancing¶
# ✅ Good: Distribute load across providers
model_list:
- model_name: production-model
litellm_params:
model: gpt-4
api_key: sk-1...
- model_name: production-model
litellm_params:
model: claude-3-5-sonnet-20241022
api_key: sk-ant-...
router_settings:
routing_strategy: usage-based-routing
Related Documentation¶
- OpenAI Compatible Guide - OpenAI-compatible providers
- Provider Setup Guide - General provider configuration
- Cost Optimization - Reduce AI costs
- Load Balancing - Distribution strategies
Additional Resources¶
- LiteLLM Documentation - Official docs
- Supported Providers - 100+ providers list
- LiteLLM GitHub - Source code
- LiteLLM Proxy Docs - Proxy setup
Need Help? Join our GitHub Discussions or open an issue.