
Overview

Bifrost provides built-in telemetry and monitoring capabilities through Prometheus metrics collection. The telemetry system tracks both HTTP-level performance metrics and upstream provider interactions, giving you complete visibility into your AI gateway's performance and usage patterns.
Key Features:
  • Prometheus Integration - Native metrics collection at /metrics endpoint
  • Comprehensive Tracking - Success/error rates, token usage, costs, and cache performance
  • Custom Labels - Configurable dimensions for detailed analysis
  • Dynamic Headers - Runtime label injection via x-bf-prom-* headers
  • Cost Monitoring - Real-time tracking of AI provider costs in USD
  • Cache Analytics - Direct and semantic cache hit tracking
  • Async Collection - Zero-latency impact on request processing
  • Multi-Level Tracking - HTTP transport + upstream provider metrics
The telemetry plugin operates asynchronously to ensure metrics collection doesn’t impact request latency or connection performance.

Default Metrics

HTTP Transport Metrics

These metrics track all incoming HTTP requests to Bifrost:
| Metric | Type | Description |
| --- | --- | --- |
| http_requests_total | Counter | Total number of HTTP requests |
| http_request_duration_seconds | Histogram | Duration of HTTP requests |
| http_request_size_bytes | Histogram | Size of incoming HTTP requests |
| http_response_size_bytes | Histogram | Size of outgoing HTTP responses |
Labels:
  • path: HTTP endpoint path
  • method: HTTP verb (e.g., GET, POST, PUT, DELETE)
  • status: HTTP status code
  • custom labels: Custom labels configured in the Bifrost configuration
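For reference, these series are served at /metrics in the standard Prometheus text exposition format. The sample below is illustrative only; label values and numbers are made up:

```text
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{path="/v1/chat/completions",method="POST",status="200"} 1042
# HELP http_request_duration_seconds Duration of HTTP requests
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{path="/v1/chat/completions",method="POST",status="200",le="0.5"} 998
http_request_duration_seconds_sum{path="/v1/chat/completions",method="POST",status="200"} 312.4
http_request_duration_seconds_count{path="/v1/chat/completions",method="POST",status="200"} 1042
```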

Upstream Provider Metrics

These metrics track requests forwarded to AI providers:
| Metric | Type | Description | Labels |
| --- | --- | --- | --- |
| bifrost_upstream_requests_total | Counter | Total requests forwarded to upstream providers | Base Labels, custom labels |
| bifrost_success_requests_total | Counter | Total successful requests to upstream providers | Base Labels, custom labels |
| bifrost_error_requests_total | Counter | Total failed requests to upstream providers | Base Labels, reason, custom labels |
| bifrost_upstream_latency_seconds | Histogram | Latency of upstream provider requests | Base Labels, is_success, custom labels |
| bifrost_input_tokens_total | Counter | Total input tokens sent to upstream providers | Base Labels, custom labels |
| bifrost_output_tokens_total | Counter | Total output tokens received from upstream providers | Base Labels, custom labels |
| bifrost_cache_hits_total | Counter | Total cache hits by type (direct/semantic) | Base Labels, cache_type, custom labels |
| bifrost_cost_total | Counter | Total cost in USD for upstream provider requests | Base Labels, custom labels |
Base Labels:
  • provider: AI provider name (e.g., openai, anthropic, azure)
  • model: Model name (e.g., gpt-4o-mini, claude-3-sonnet)
  • method: Request type (chat, text, embedding, speech, transcription)
  • virtual_key_id: Virtual key ID
  • virtual_key_name: Virtual key name
  • selected_key_id: Selected key ID
  • selected_key_name: Selected key name
  • number_of_retries: Number of retries
  • fallback_index: Fallback index (0 for first attempt, 1 for second attempt, etc.)
  • custom labels: Custom labels configured in the Bifrost configuration
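Because every upstream metric carries these base labels, any query can be sliced by them. For example, daily cost attributed per virtual key (using the virtual_key_name label documented above):

```promql
# Daily cost per virtual key
sum by (virtual_key_name) (increase(bifrost_cost_total[1d]))
```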

Streaming Metrics

These metrics capture latency characteristics specific to streaming responses:
| Metric | Type | Description | Labels |
| --- | --- | --- | --- |
| bifrost_stream_first_token_latency_seconds | Histogram | Time from request start to first streamed token | Base Labels |
| bifrost_stream_inter_token_latency_seconds | Histogram | Latency between subsequent streamed tokens | Base Labels |
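Because these metrics are histograms, tail latencies can be derived with histogram_quantile over the _bucket series Prometheus generates for histograms. For example, p95 time-to-first-token per provider:

```promql
# p95 time-to-first-token by provider
histogram_quantile(0.95,
  sum by (provider, le) (rate(bifrost_stream_first_token_latency_seconds_bucket[5m])))
```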

Monitoring Examples

Success Rate Monitoring

Track the success rate of requests to different providers:
# Success rate by provider (%)
sum by (provider) (rate(bifrost_success_requests_total[5m])) /
sum by (provider) (rate(bifrost_upstream_requests_total[5m])) * 100

Token Usage Analysis

Monitor token consumption across different models:
# Input tokens per minute by model
sum by (model) (increase(bifrost_input_tokens_total[1m]))

# Output tokens per minute by model
sum by (model) (increase(bifrost_output_tokens_total[1m]))

# Token efficiency (output/input ratio)
rate(bifrost_output_tokens_total[5m]) / 
rate(bifrost_input_tokens_total[5m])

Cost Tracking

Monitor spending across providers and models:
# Cost per second by provider
sum by (provider) (rate(bifrost_cost_total[1m]))

# Daily cost estimate
sum by (provider) (increase(bifrost_cost_total[1d]))

# Cost per request by provider and model
sum by (provider, model) (rate(bifrost_cost_total[5m])) / 
sum by (provider, model) (rate(bifrost_upstream_requests_total[5m]))

Cache Performance

Track cache effectiveness:
# Overall cache hit rate (%); aggregation is needed because
# bifrost_cache_hits_total carries a cache_type label that
# bifrost_upstream_requests_total does not
sum(rate(bifrost_cache_hits_total[5m])) /
sum(rate(bifrost_upstream_requests_total[5m])) * 100

# Direct vs semantic cache hits
sum by (cache_type) (rate(bifrost_cache_hits_total[5m]))

Error Rate Analysis

Monitor error patterns:
# Error rate by provider (%); aggregation is needed because
# bifrost_error_requests_total carries a reason label that
# bifrost_upstream_requests_total does not
sum by (provider) (rate(bifrost_error_requests_total[5m])) /
sum by (provider) (rate(bifrost_upstream_requests_total[5m])) * 100

# Errors by model
sum by (model) (rate(bifrost_error_requests_total[5m]))

Configuration

Configure custom Prometheus labels to add dimensions for filtering and analysis. Labels can be configured via the Web UI, the API, or config.json.
Web UI:
  1. Open the Bifrost UI at http://localhost:8080 and go to the Config tab
  2. Under Prometheus Labels, add the label names you want to track, e.g. team, environment, organization, project
    
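For file-based setups, the same labels can be declared in config.json. The fragment below is an illustrative sketch only; the exact key names and nesting may differ by Bifrost version, so check your deployed schema:

```json
{
  "telemetry": {
    "prometheus_labels": ["team", "environment", "organization", "project"]
  }
}
```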

Dynamic Label Injection

Add custom label values at runtime using x-bf-prom-* headers:
# Add custom labels to specific requests
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-prom-team: engineering" \
  -H "x-bf-prom-environment: production" \
  -H "x-bf-prom-organization: my-org" \
  -H "x-bf-prom-project: my-project" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
Header Format:
  • Prefix: x-bf-prom-
  • Label name: Any string after the prefix
  • Value: String value for the label
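Following the format above, any client can inject labels by prefixing header names with x-bf-prom-. A small helper that turns a label map into request headers might look like this (the helper itself is hypothetical, not part of Bifrost):

```python
def prom_label_headers(labels: dict) -> dict:
    """Build x-bf-prom-* headers from a map of Prometheus label names to values."""
    return {f"x-bf-prom-{name}": value for name, value in labels.items()}

# Merge the label headers into an ordinary request header set
headers = {"Content-Type": "application/json"}
headers.update(prom_label_headers({"team": "engineering", "environment": "production"}))
# headers now contains "x-bf-prom-team" and "x-bf-prom-environment"
```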

Infrastructure Setup

Development & Testing

For local development and testing, use the provided Docker Compose setup:
# Navigate to telemetry plugin directory
cd plugins/telemetry

# Start Prometheus and Grafana
docker-compose up -d

# Access endpoints
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000 (admin/admin)
# Bifrost metrics: http://localhost:8080/metrics
Development Only: The provided Docker Compose setup is for testing purposes only. Do not use in production without proper security, scaling, and persistence configuration.
You can use the Prometheus scrape endpoint to create your own Grafana dashboards. A few examples built on the Docker Compose setup are shown below.
[Image: example Grafana dashboard]

Production Deployment

For production environments:
  1. Deploy Prometheus with proper persistence, retention, and security
  2. Configure scraping to target your Bifrost instances at /metrics
  3. Set up Grafana with authentication and dashboards
  4. Configure alerts based on your SLA requirements
Prometheus Scrape Configuration:
scrape_configs:
  - job_name: "bifrost-gateway"
    static_configs:
      - targets: ["bifrost-instance-1:8080", "bifrost-instance-2:8080"]
    scrape_interval: 30s
    metrics_path: /metrics

Production Alerting Examples

Configure alerts for critical scenarios using these metrics.
High Error Rate Alert:
- alert: BifrostHighErrorRate
  expr: sum by (provider) (rate(bifrost_error_requests_total[5m])) / sum by (provider) (rate(bifrost_upstream_requests_total[5m])) > 0.05
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "High error rate detected for provider {{ $labels.provider }} ({{ $value | humanizePercentage }})"
High Cost Alert:
- alert: BifrostHighCosts
  expr: sum by (provider) (increase(bifrost_cost_total[1d])) > 100  # $100/day threshold
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Daily cost for provider {{ $labels.provider }} exceeds $100 ({{ $value | printf \"%.2f\" }})"
Cache Performance Alert:
- alert: BifrostLowCacheHitRate
  expr: sum by (provider) (rate(bifrost_cache_hits_total[15m])) / sum by (provider) (rate(bifrost_upstream_requests_total[15m])) < 0.1
  for: 5m
  labels:
    severity: info
  annotations:
    summary: "Cache hit rate for provider {{ $labels.provider }} below 10% ({{ $value | humanizePercentage }})"
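To keep alert expressions cheap and consistent with dashboards, the ratios above can also be precomputed as Prometheus recording rules. A sketch, with an illustrative rule name:

```yaml
groups:
  - name: bifrost-recording
    rules:
      # Precompute per-provider error rate so alerts and panels share one definition
      - record: bifrost:error_rate:5m
        expr: sum by (provider) (rate(bifrost_error_requests_total[5m])) / sum by (provider) (rate(bifrost_upstream_requests_total[5m]))
```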

Next Steps