Overview

Bifrost exposes Prometheus metrics via two methods:
  1. Pull-based (Scraping): Traditional /metrics endpoint that Prometheus can scrape
  2. Push-based (Push Gateway): Push metrics to a Prometheus Push Gateway for cluster deployments
For multi-node deployments: Use the Push Gateway method to ensure accurate metric aggregation. Traditional scraping may miss nodes behind load balancers.

Pull-based Scraping

Bifrost automatically exposes a /metrics endpoint when the telemetry plugin is enabled (enabled by default). No additional configuration is needed.
When Bifrost’s authentication is enabled (auth_config.is_enabled = true), the /metrics endpoint requires Basic auth credentials. You must include the same admin_username and admin_password from your auth_config in the Prometheus scrape configuration. Without this, Prometheus will receive 401 Unauthorized responses and scraping will silently fail.

Prometheus Configuration

Add Bifrost as a scrape target in your prometheus.yml:
scrape_configs:
  - job_name: 'bifrost'
    static_configs:
      - targets: ['bifrost-host:8080']
    scrape_interval: 15s
If Bifrost authentication is enabled, add basic_auth to your scrape config:
scrape_configs:
  - job_name: 'bifrost'
    static_configs:
      - targets: ['bifrost-host:8080']
    scrape_interval: 15s
    basic_auth:
      username: '<admin_username>'
      password: '<admin_password>'
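
Before wiring the credentials into Prometheus, it can help to confirm that /metrics accepts them. The following is a minimal sketch using only the Python standard library; the helper names `build_metrics_request` and `check_metrics` are hypothetical, and the host and credentials are placeholders for your own values.

```python
import base64
import urllib.error
import urllib.request


def build_metrics_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request for /metrics that carries a Basic auth header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(base_url.rstrip("/") + "/metrics", method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


def check_metrics(base_url: str, username: str, password: str) -> int:
    """Return the HTTP status code from /metrics (401 means the credentials were rejected)."""
    request = build_metrics_request(base_url, username, password)
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

Calling `check_metrics("http://bifrost-host:8080", "<admin_username>", "<admin_password>")` should return 200 when the credentials match auth_config and 401 when they do not.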

Endpoint

GET /metrics
Returns metrics in Prometheus exposition format.

Push-based (Push Gateway)

For multi-node cluster deployments, the Prometheus plugin pushes metrics to a Prometheus Push Gateway. This ensures all nodes’ metrics are captured regardless of load balancer routing.

Configuration

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| push_gateway_url | string or EnvVar | ✅ Yes | - | Push Gateway URL (supports env.VAR_NAME) |
| job_name | string | ❌ No | bifrost | Job label for pushed metrics |
| instance_id | string | ❌ No | hostname | Instance identifier for metric grouping |
| push_interval | integer | ❌ No | 15 | Push interval in seconds (1-300) |
| basic_auth | object | ❌ No | - | Basic auth credentials |

Basic Auth Configuration

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| username | string or EnvVar | ✅ Yes | Basic auth username (supports env.VAR_NAME) |
| password | string or EnvVar | ✅ Yes | Basic auth password (supports env.VAR_NAME) |
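
The constraints in the two tables above can be checked before a configuration is saved. This is a rough sketch, not Bifrost's own validation: the function `validate_push_gateway_config` is hypothetical, and it assumes the settings are available as a plain dictionary keyed by the field names listed above.

```python
from typing import List


def validate_push_gateway_config(config: dict) -> List[str]:
    """Return a list of problems found in a Push Gateway config dict (empty if none)."""
    errors = []

    # push_gateway_url is the only required top-level field.
    url = config.get("push_gateway_url")
    if not isinstance(url, str) or not url:
        errors.append("push_gateway_url is required")

    # push_interval defaults to 15 and must be an integer from 1 to 300 seconds.
    interval = config.get("push_interval", 15)
    if isinstance(interval, bool) or not isinstance(interval, int) or not 1 <= interval <= 300:
        errors.append("push_interval must be an integer between 1 and 300")

    # basic_auth is optional, but both of its fields are required once it is set.
    auth = config.get("basic_auth")
    if auth is not None:
        for field in ("username", "password"):
            if not isinstance(auth, dict) or not auth.get(field):
                errors.append(f"basic_auth.{field} is required when basic_auth is set")

    return errors
```

An empty list means the dictionary satisfies the documented constraints; it does not prove that the Push Gateway is reachable.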

Setup

  1. Navigate to Observability > Prometheus in the Bifrost UI
  2. The /metrics endpoint is shown at the top for scraping configuration
  3. To enable Push Gateway:
    • Enter the Push Gateway URL
    • Configure Job Name and Push Interval as needed
    • Optionally set a custom Instance ID
    • Enable Basic Authentication if required
    • Toggle Enable Push Gateway on
    • Click Save Prometheus Configuration

Available Metrics

The following metrics are available from both the /metrics endpoint and Push Gateway:

HTTP Metrics

| Metric | Type | Description |
| --- | --- | --- |
| http_requests_total | Counter | Total HTTP requests by path, method, status |
| http_request_duration_seconds | Histogram | HTTP request latency |
| http_request_size_bytes | Histogram | Request body size |
| http_response_size_bytes | Histogram | Response body size |

Bifrost LLM Metrics

| Metric | Type | Description |
| --- | --- | --- |
| bifrost_upstream_requests_total | Counter | Total requests to LLM providers |
| bifrost_upstream_latency_seconds | Histogram | Provider request latency |
| bifrost_success_requests_total | Counter | Successful provider requests |
| bifrost_error_requests_total | Counter | Failed provider requests |
| bifrost_input_tokens_total | Counter | Total input tokens processed |
| bifrost_output_tokens_total | Counter | Total output tokens generated |
| bifrost_cost_total | Counter | Total cost in USD |
| bifrost_cache_hits_total | Counter | Cache hits by type |
| bifrost_stream_first_token_latency_seconds | Histogram | Time to first token (streaming) |
| bifrost_stream_inter_token_latency_seconds | Histogram | Inter-token latency (streaming) |
| bifrost_active_requests | Gauge | LLM requests currently in-flight (labeled by method only) |
| bifrost_provider_key_up | Gauge | Per-key health. 1 after a successful attempt, 0 after a failed attempt. Labels: provider, key_id, key_name. |
| bifrost_key_rotation_events_total | Counter | Key rotations triggered by per-key failures: rate-limit (429), auth (401/403), or billing (402). See Key Rotation Events below. Available in v1.5.0-prerelease4+. |
| bifrost_request_retries | Histogram | Number of retries used per request (observed once per request; buckets 0, 1, 2, 3, 5, 10). |

Default Labels

Most request-level Bifrost LLM metrics include these labels (the bifrost_key_rotation_events_total counter is an exception — see Key Rotation Events below for its narrower label set):
  • provider - LLM provider name
  • model - Model identifier
  • alias - Alias resolved to this model (empty if none)
  • method - Request type (chat, completion, embedding, etc.)
  • virtual_key_id / virtual_key_name - Virtual key identifiers
  • routing_engine_used - Comma-separated list of routing engines that contributed to the decision (e.g. governance, routing-rule, loadbalancing, model-catalog)
  • routing_rule_id / routing_rule_name - Routing rule that matched the request
  • selected_key_id / selected_key_name - API key that successfully served the request ("" when all attempts failed)
  • fallback_index - Fallback position
  • team_id / team_name - Team identifiers (empty when governance is not used)
  • customer_id / customer_name - Customer identifiers (empty when governance is not used)
v1.5.0-prerelease4+: selected_key_id / selected_key_name are only populated when the request succeeds. On final errors both are empty — use the attempt_trail log field to see which keys were tried.
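
To illustrate how these labels can be used, the sketch below parses a small piece of exposition-format text and sums a counter by one label. It is a deliberately simple parser that assumes label values contain no escaped quotes; the function `sum_by_label` is hypothetical, and the sample values are made up for the example.

```python
import re
from collections import defaultdict
from typing import Dict

# Matches "name{labels} value" or "name value"; any trailing timestamp is ignored.
SAMPLE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def sum_by_label(text: str, metric: str, label: str) -> Dict[str, float]:
    """Sum all samples of `metric`, grouped by the value of `label`."""
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        totals[labels.get(label, "")] += float(match.group("value"))
    return dict(totals)


# Illustrative sample only; real output carries the full label set listed above.
SAMPLE = """
# TYPE bifrost_input_tokens_total counter
bifrost_input_tokens_total{provider="openai",model="gpt-4o"} 120
bifrost_input_tokens_total{provider="openai",model="gpt-4o-mini"} 30
bifrost_input_tokens_total{provider="anthropic",model="example-model"} 50
bifrost_output_tokens_total{provider="openai",model="gpt-4o"} 10
"""

print(sum_by_label(SAMPLE, "bifrost_input_tokens_total", "provider"))
```

In a real deployment the same aggregation is normally done in PromQL, for example with `sum by (provider)`; the Python version only shows what the grouping does.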

Key Rotation Events (v1.5.0-prerelease4+)

bifrost_key_rotation_events_total is incremented once per actual key rotation — i.e. when a per-key failure causes the next retry to switch to a different key. Rotation-triggering failures are bound to the specific key/account rather than the request:
  • 429 Too Many Requests — this key is rate-limited; another may have capacity.
  • 401 Unauthorized / 403 Forbidden — bad / revoked key, or key lacks permission.
  • 402 Payment Required — billing issue on this key’s account.
It is not incremented for:
  • terminal failures (no retry happens, including max_retries = 0 or every key permanently dead),
  • same-key retries on transient 5xx / network errors,
  • non-retryable request-bound 4xx (400/404/422/…).
Labels are attributed to the key that failed and triggered the rotation:
| Label | Values | Description |
| --- | --- | --- |
| provider | e.g. openai | LLM provider |
| requested_model | e.g. gpt-4o | Model as requested (before any alias resolution) |
| key_id | UUID | The provider API key that failed and was rotated away from |
| key_name | string | Human-readable name of the provider API key |
| fail_reason | error type string | Reason the rotation fired: rate_limit_error (429), authentication_error (401/403), billing_error (402), or a provider-supplied error type for non-status-coded rate-limit messages |
To inspect every attempted key on a failed request (including terminal failures that did not rotate), read the attempt_trail field on the corresponding log entry instead. Example queries:
# Rate of key rotations per provider
sum by (provider) (
  rate(bifrost_key_rotation_events_total[5m])
)

# Which specific keys are hitting rate limits most often
topk(5, sum by (provider, key_name) (
  rate(bifrost_key_rotation_events_total[1h])
))
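
The counting rules in this section can be summarized in a few lines. This is a sketch of the documented behavior, not Bifrost's implementation: the function `rotation_fail_reason` and its `retry_switches_key` parameter are hypothetical, and it leaves out the provider-supplied error types used for rate-limit messages that carry no status code.

```python
from typing import Optional

# Status codes that are bound to the key or account rather than to the request.
ROTATION_REASONS = {
    429: "rate_limit_error",
    401: "authentication_error",
    403: "authentication_error",
    402: "billing_error",
}


def rotation_fail_reason(status_code: int, retry_switches_key: bool) -> Optional[str]:
    """Return the fail_reason label if this failed attempt counts as a rotation, else None."""
    if not retry_switches_key:
        # Terminal failures and same-key retries never increment the counter.
        return None
    # 5xx and request-bound 4xx (400/404/422/...) are not in the table, so they yield None.
    return ROTATION_REASONS.get(status_code)
```

For example, a 429 followed by a retry on a different key yields `rate_limit_error`, while the same 429 with no retry left yields `None`.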

Push Gateway Setup

If you don’t have a Push Gateway running, deploy one:

Docker

docker run -d -p 9091:9091 prom/pushgateway

Kubernetes (Helm)

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install pushgateway prometheus-community/prometheus-pushgateway

Configure Prometheus to Scrape Push Gateway

Add to your prometheus.yml:
scrape_configs:
  - job_name: 'pushgateway'
    honor_labels: true
    static_configs:
      - targets: ['pushgateway:9091']
The honor_labels: true setting is important - it preserves the job and instance labels pushed by Bifrost instead of overwriting them with the Push Gateway’s labels.
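
The job and instance labels come from the grouping key in the Push Gateway URL path, which follows the form /metrics/job/<job>/instance/<instance>. The sketch below builds that path from the configuration fields described earlier. The helper `pushgateway_target` is hypothetical, and whether Bifrost constructs its push URL in exactly this way is an assumption; the sketch also assumes the values contain no slashes, which the Push Gateway handles through a separate base64 form.

```python
import socket
from typing import Optional
from urllib.parse import quote


def pushgateway_target(push_gateway_url: str, job_name: str = "bifrost",
                       instance_id: Optional[str] = None) -> str:
    """Build the Push Gateway URL whose path carries the job and instance grouping key."""
    # instance_id falls back to the hostname, mirroring the documented default.
    instance = instance_id or socket.gethostname()
    base = push_gateway_url.rstrip("/")
    return f"{base}/metrics/job/{quote(job_name, safe='')}/instance/{quote(instance, safe='')}"


print(pushgateway_target("http://pushgateway:9091", "bifrost", "node-1"))
```

Because each instance pushes under its own grouping key, the Push Gateway keeps one metric group per node, and honor_labels: true carries those labels through to Prometheus unchanged.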

Pull vs Push: When to Use Each

| Scenario | Recommended Method |
| --- | --- |
| Single Bifrost instance | Pull (scraping) |
| Multiple instances, direct access | Pull (scraping) |
| Multiple instances behind load balancer | Push (Push Gateway) |
| Kubernetes with service mesh | Pull or Push |
| Serverless / ephemeral instances | Push (Push Gateway) |

Why Push for Clusters?

When multiple Bifrost instances run behind a load balancer:
  1. Scraping randomness: Each scrape may hit different nodes, missing metrics from others
  2. Instance tracking: Push Gateway properly tracks per-instance metrics via instance label
  3. Aggregation: Downstream tools (Grafana, Datadog) can aggregate across all instances

Troubleshooting

Push Gateway Connection Failed

failed to push metrics to push gateway: connection refused
  • Verify the Push Gateway URL is correct and reachable from Bifrost
  • Check firewall rules between Bifrost and Push Gateway
  • Ensure Push Gateway is running: curl http://pushgateway:9091/metrics

Metrics Not Appearing

  • Verify the telemetry plugin is enabled (required for metrics collection)
  • Check Bifrost logs for push errors
  • Verify Prometheus is scraping the Push Gateway with honor_labels: true

Authentication Failed

  • Double-check username and password
  • Ensure basic auth is configured on the Push Gateway side
  • Check for special characters that may need escaping