
Overview

Bifrost’s routing capabilities offer granular control over how requests are directed to different AI models and providers. By configuring routing rules on a Virtual Key, you can enforce which providers and models are accessible, implement sophisticated load balancing strategies, create automatic fallbacks, and restrict access to specific provider API keys. This powerful feature enables key use cases like:
  • Resilience & Failover: Automatically fall back to a secondary provider if the primary one fails.
  • Environment Separation: Dedicate specific virtual keys to development, testing, and production environments with different provider and key access.
  • Cost Management: Route traffic to cheaper models or providers based on weights to optimize costs.
  • Fine-grained Access Control: Ensure that different teams or applications only use the models and API keys they are explicitly permitted to.

Provider/Model Restrictions

Virtual Keys can be restricted to specific provider/model combinations. When restrictions are configured, the VK can only access the designated combinations, giving you fine-grained control over which providers and models different users or applications can use.
How It Works:
  • No Restrictions (default): VK can use any available provider/models based on global configuration
  • With Restrictions: VK limited to only the specified provider/models with weighted load balancing
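The allow-check described above can be sketched in a few lines of Python. This is an illustrative model of the behavior, not Bifrost's actual implementation; the `vk_config` shape and `is_allowed` helper are assumptions for the example.

```python
# Hypothetical sketch of the restriction check a gateway might perform.
# The config shape here is illustrative, not Bifrost's internal schema.

def is_allowed(vk_config: dict, provider: str, model: str) -> bool:
    """Return True if the VK may route to (provider, model)."""
    restrictions = vk_config.get("providers")
    if not restrictions:           # no restrictions: any provider/model allowed
        return True
    allowed_models = restrictions.get(provider)
    if allowed_models is None:     # provider not listed on the VK: blocked
        return False
    # An empty model list means "all models for this provider"
    return not allowed_models or model in allowed_models

vk = {"providers": {"openai": ["gpt-4o", "gpt-4o-mini"], "azure": ["gpt-4o"]}}
print(is_allowed(vk, "azure", "gpt-4o"))        # True
print(is_allowed(vk, "anthropic", "claude-3"))  # False: provider not on the VK
```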

Weighted Load Balancing

When you configure multiple providers on a Virtual Key, Bifrost automatically implements weighted load balancing. Each provider is assigned a weight, and requests are distributed proportionally.
Example Configuration:
Virtual Key: vk-prod-main
├── OpenAI 
│   ├── Allowed Models: [gpt-4o, gpt-4o-mini]
│   └── Weight: 0.2 (20% of traffic)
└── Azure
    ├── Allowed Models: [gpt-4o]
    └── Weight: 0.8 (80% of traffic)
Load Balancing Behavior:
  • For gpt-4o: 80% Azure, 20% OpenAI (both providers support it)
  • For gpt-4o-mini: 100% OpenAI (only provider that supports it)
Usage: To trigger weighted load balancing, send requests with just the model name:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "x-bf-vk: vk-prod-main" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
To bypass load balancing and target a specific provider:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "x-bf-vk: vk-prod-main" \
  -d '{"model": "openai/gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
Weights are automatically normalized to sum to 1.0 across all providers on the VK that support the requested model.
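The selection and normalization behavior described above can be sketched as follows. This is a minimal Python model of weighted selection under the stated rules, not Bifrost source code; the `vk` structure and function names are assumptions for illustration.

```python
import random

# Illustrative sketch: filter the VK's providers to those that support the
# requested model, normalize their weights to sum to 1.0, and pick one.

def candidates(vk_providers: dict, model: str) -> dict:
    """Providers on the VK that can serve `model`, with their raw weights."""
    return {
        name: cfg["weight"]
        for name, cfg in vk_providers.items()
        if not cfg["models"] or model in cfg["models"]
    }

def pick_provider(vk_providers: dict, model: str, rng=random) -> str:
    weights = candidates(vk_providers, model)
    total = sum(weights.values())  # normalization: weights / total sums to 1.0
    names = list(weights)
    return rng.choices(names, [w / total for w in weights.values()])[0]

vk = {
    "openai": {"models": ["gpt-4o", "gpt-4o-mini"], "weight": 0.2},
    "azure":  {"models": ["gpt-4o"], "weight": 0.8},
}
# gpt-4o is served 80/20; gpt-4o-mini has one candidate, so its 0.2
# weight renormalizes to 1.0 and OpenAI gets 100% of that traffic.
print(candidates(vk, "gpt-4o-mini"))  # {'openai': 0.2}
```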

Automatic Fallbacks

When multiple providers are configured on a Virtual Key, Bifrost automatically creates fallback chains for resilience. This feature provides automatic failover without manual intervention.
How It Works:
  • Only activated when: the request body contains no existing fallbacks array
  • Fallback creation: Providers are sorted by weight (highest first) and added as fallbacks
  • Respects existing fallbacks: If you manually specify fallbacks, they are preserved
Example Request Flow:
  1. Primary request goes to weighted-selected provider (e.g., Azure with 80% weight)
  2. If Azure fails, automatically retry with OpenAI
  3. Continue until success or all providers exhausted
Request with automatic fallbacks:
# This request will get automatic fallbacks
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "x-bf-vk: vk-prod-main" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
Request with manual fallbacks (no automatic fallbacks added):
# This request keeps your specified fallbacks
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "x-bf-vk: vk-prod-main" \
  -d '{
    "model": "gpt-4o", 
    "messages": [{"role": "user", "content": "Hello!"}],
    "fallbacks": ["anthropic/claude-3-sonnet-20240229"]
  }'
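The chain-building rules above can be sketched as a short Python function. This models the documented behavior (respect manual fallbacks; otherwise sort the VK's remaining providers by weight, highest first); the function name and data shapes are assumptions, not Bifrost internals.

```python
# Sketch of automatic fallback-chain construction (illustrative, not source):
# if the request already carries fallbacks, keep them; otherwise order the
# VK's other providers by weight, descending.

def build_fallbacks(vk_providers: dict, primary: str, request: dict) -> list:
    if request.get("fallbacks"):       # respect manually specified fallbacks
        return request["fallbacks"]
    others = [
        (name, cfg["weight"])
        for name, cfg in vk_providers.items()
        if name != primary
    ]
    return [name for name, _ in sorted(others, key=lambda p: p[1], reverse=True)]

vk = {"openai": {"weight": 0.2}, "azure": {"weight": 0.8}, "vertex": {"weight": 0.5}}
print(build_fallbacks(vk, "azure", {}))  # ['vertex', 'openai']
print(build_fallbacks(vk, "azure", {"fallbacks": ["anthropic/claude-3-sonnet-20240229"]}))
```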

Setting Provider/Model Routing

  • Web UI
  • API
  • config.json
  1. Go to Virtual Keys
  2. Create or edit a virtual key
  3. In the Provider Configurations section, add the provider you want to restrict the VK to
  4. Add the models you want to allow for this provider, or leave the list blank to allow all of its models
  5. Set the weight you want to give this provider
  6. Click Save

API Key Restrictions

Virtual Keys can be restricted to use only specific provider API keys. When key restrictions are configured, the VK can only access those designated keys, providing fine-grained control over which API keys different users or applications can utilize.
How It Works:
  • No Restrictions (default): VK can use any available provider keys based on load balancing
  • With Restrictions: VK limited to only the specified key IDs, regardless of other available keys
Example Scenario:
Available Provider Keys:
├── key-prod-001 → sk-prod-key... (Production OpenAI key)
├── key-dev-002  → sk-dev-key...  (Development OpenAI key)  
└── key-test-003 → sk-test-key... (Testing OpenAI key)

Virtual Key Restrictions:
├── vk-prod-main
│   ├── Allowed Models: [gpt-4o]
│   └── Restricted Keys: [key-prod-001] ← ONLY production key
├── vk-dev-main  
│   ├── Allowed Models: [gpt-4o-mini]
│   └── Restricted Keys: [key-dev-002, key-test-003] ← Dev + test keys
└── vk-unrestricted
    ├── Allowed Models: [all models]
    └── Restricted Keys: [] ← Can use ANY available key
Request Behavior:
# Production VK - will ONLY use key-prod-001
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "x-bf-vk: vk-prod-main" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'

# Development VK - will load balance between key-dev-002 and key-test-003
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "x-bf-vk: vk-dev-main" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}]}'

# VK with no key restrictions - can use any available OpenAI key
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "x-bf-vk: vk-unrestricted" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}]}'
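The key-selection behavior shown in these three requests reduces to a simple filter, sketched below in Python. The function name and list shapes are illustrative assumptions, not Bifrost's actual code.

```python
# Sketch of key restriction filtering: an empty restriction list means
# "any available key"; otherwise only the listed key IDs are usable.

def usable_keys(all_keys: list, restricted_ids: list) -> list:
    """Keys a VK may use for a provider, after applying its restrictions."""
    if not restricted_ids:
        return all_keys                      # unrestricted VK: any key
    return [k for k in all_keys if k in restricted_ids]

keys = ["key-prod-001", "key-dev-002", "key-test-003"]
print(usable_keys(keys, ["key-prod-001"]))   # ['key-prod-001']
print(usable_keys(keys, []))                 # all three keys
```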
Setting API Key Restrictions:
  • Web UI
  • API
  • config.json
  1. Go to Virtual Keys
  2. Create or edit a virtual key
  3. In the Allowed Keys section, select the API keys you want to restrict the VK to
  4. Click Save
Use Cases:
  • Environment Separation - Production VKs use production keys, dev VKs use dev keys
  • Cost Control - Different teams use keys with different billing accounts
  • Access Control - Restrict sensitive keys to specific VKs only
  • Compliance - Ensure certain workloads only use compliant/audited keys
Model restrictions applied to the keys of individual providers are always enforced and work together with the provider/model and API key restrictions set on the virtual key.