Overview

Fireworks is an OpenAI-compatible provider in Bifrost with native support for:
  • Chat Completions via /v1/chat/completions
  • Responses API via /v1/responses
  • Text Completions via /v1/completions
  • Embeddings via /v1/embeddings
  • Streaming for chat, responses, and completions
  • Tool calling for chat and responses
Unless noted below, Fireworks follows the standard OpenAI-compatible request and response behavior described on the OpenAI provider page.

Supported Operations

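| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | Yes | Yes | /v1/chat/completions |
| Responses API | Yes | Yes | /v1/responses |
| Text Completions | Yes | Yes | /v1/completions |
| Embeddings | Yes | No | /v1/embeddings |
| List Models | Yes | - | /v1/models |
| Images | No | No | - |
| Speech / Transcription | No | No | - |
| Files | No | No | - |
| Batch | No | No | - |
| Count Tokens | No | No | - |
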
Fireworks Responses support is native in Bifrost. Requests are sent to Fireworks’ /v1/responses endpoint directly, so fields such as previous_response_id, max_tool_calls, and store are preserved.

Setup & Configuration

Configure Fireworks as a provider.
[Figure: Fireworks AI provider dashboard]
  1. Navigate to Models > Model Providers. Look for Fireworks under Configured Providers. If it is missing, click on Add New Provider and select Fireworks.
  2. Click Add Key or edit an existing key.
  3. Set a name for your key.
  4. Paste your API key directly or use an environment variable (for example, env.FIREWORKS_API_KEY).
  5. Set Allowed Models to All Models (default) or the specific model allowlist you want this key to serve.
  6. Save the provider configuration.

1. Chat Completions

Fireworks chat completions use the standard OpenAI-compatible wire format.

Fireworks-specific handling

  • prediction is preserved and forwarded.
  • Bifrost maps prompt_cache_key to Fireworks prompt_cache_isolation_key for chat-completion cache isolation.
  • Assistant reasoning_content is preserved for Fireworks chat-completion models that support reasoning history.

Filtered Parameters

For Fireworks chat completions, Bifrost removes or rewrites a small set of OpenAI-specific fields before sending the request upstream:
  • prompt_cache_key is mapped to Fireworks prompt_cache_isolation_key
  • prompt_cache_retention is removed
  • verbosity is removed
  • store is removed
  • web_search_options is removed
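
The rewrites listed above can be restated as a small pure function. This is a minimal sketch of the documented behavior only, not Bifrost's actual code; the function name `to_fireworks_chat_body` and the constant `REMOVED_FIELDS` are hypothetical names chosen for illustration.

```python
from typing import Any

# Fields this page lists as removed for Fireworks chat completions.
REMOVED_FIELDS = ("prompt_cache_retention", "verbosity", "store", "web_search_options")


def to_fireworks_chat_body(body: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of an OpenAI-style chat body with the listed rewrites applied.

    Hypothetical helper: it mirrors the list above and nothing else.
    The input dict is not modified.
    """
    out = {key: value for key, value in body.items() if key not in REMOVED_FIELDS}
    # prompt_cache_key is renamed rather than dropped.
    if "prompt_cache_key" in out:
        out["prompt_cache_isolation_key"] = out.pop("prompt_cache_key")
    return out


request = {
    # The model string is left untouched; provider-prefix handling is outside this sketch.
    "model": "fireworks/accounts/fireworks/models/deepseek-v3p2",
    "messages": [{"role": "user", "content": "hi"}],
    "prompt_cache_key": "tenant-42",  # made-up key value
    "verbosity": "low",
    "store": True,
}
upstream = to_fireworks_chat_body(request)
print(sorted(upstream))
```

Fields that are not named in the list, such as prediction, pass through this sketch unchanged, which matches the handling described earlier in this section.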

Example

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fireworks/accounts/fireworks/models/deepseek-v3p2",
    "messages": [
      {"role": "user", "content": "Reply with exactly: fireworks ok"}
    ]
  }'
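
Streaming is listed as supported for chat completions. The sketch below shows one way a client might collect streamed text, assuming the gateway emits OpenAI-style server-sent events: `data:` lines carrying JSON chunks with `choices[].delta.content`, terminated by `data: [DONE]`. The helper name `iter_text_deltas` and the sample lines are illustrative, not captured gateway output.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (assumed wire format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample lines standing in for a streamed response body.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant","content":""}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"fireworks"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":" ok"}}]}',
    "",
    "data: [DONE]",
]
streamed = "".join(iter_text_deltas(sample))
print(streamed)
```

To feed it real data, the request above would be sent with "stream": true added to the body and the response read line by line, decoding each line to text first.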

2. Responses API

Fireworks Responses use the native Fireworks endpoint:
/v1/responses
This preserves Responses-only fields and semantics, including:
  • previous_response_id
  • max_tool_calls
  • store
  • native responses streaming

Example

curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fireworks/accounts/fireworks/models/deepseek-v3p2",
    "input": [
      {"role": "user", "content": "Reply with exactly: responses ok"}
    ],
    "max_tool_calls": 2
  }'
For continuation requests, Fireworks also supports previous_response_id.
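
A continuation might be built as follows. This is a sketch under two assumptions: the first response carries a top-level `id` field, as in the OpenAI Responses format, and that value is what goes into previous_response_id. The helper `build_responses_body` and the id `resp_example_123` are made up for illustration.

```python
import json
from typing import Any, Optional

MODEL = "fireworks/accounts/fireworks/models/deepseek-v3p2"


def build_responses_body(
    text: str,
    previous_response_id: Optional[str] = None,
    max_tool_calls: Optional[int] = None,
) -> dict[str, Any]:
    """Build a /v1/responses request body; optional fields are omitted when unset."""
    body: dict[str, Any] = {
        "model": MODEL,
        "input": [{"role": "user", "content": text}],
    }
    if previous_response_id is not None:
        body["previous_response_id"] = previous_response_id
    if max_tool_calls is not None:
        body["max_tool_calls"] = max_tool_calls
    return body


first = build_responses_body("Reply with exactly: responses ok", max_tool_calls=2)

# Stand-in for the JSON returned by the first call; the id value is invented.
first_response = {"id": "resp_example_123", "status": "completed"}

follow_up = build_responses_body(
    "Now repeat that in upper case.",
    previous_response_id=first_response["id"],
)
print(json.dumps(follow_up, indent=2))
```

Either body can be sent with the same curl command shown above by passing it as the -d payload.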

3. Text Completions

Fireworks text completions are sent to the native completions endpoint:
/v1/completions

Example

curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fireworks/accounts/fireworks/models/deepseek-v3p2",
    "prompt": "In fruits, A is for apple and B is for"
  }'
For Fireworks text completions, Bifrost extracts prompt_cache_key from extra_params and maps it to Fireworks prompt_cache_isolation_key.

4. Embeddings

Fireworks embeddings are sent to:
/v1/embeddings
Embedding-capable models may differ from the models used for chat and text completions.

Example

curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fireworks/nomic-ai/nomic-embed-text-v1.5",
    "input": "embedding test"
  }'
Fireworks documents additional embedding-specific fields such as prompt_template, return_logits, and normalize. This page describes the standard embeddings flow currently covered by Bifrost.
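
For the standard flow, a client usually only needs the vectors from the response. The sketch below assumes the OpenAI-compatible embeddings response shape, a `data` list whose items carry `index` and `embedding`. The helper name `extract_vectors` and the sample values are illustrative, not real model output.

```python
from typing import Any


def extract_vectors(response: dict[str, Any]) -> list[list[float]]:
    """Return embedding vectors ordered by their `index` field (assumed response shape)."""
    items = sorted(response.get("data", []), key=lambda item: item.get("index", 0))
    return [item["embedding"] for item in items]


# Hand-written stand-in for a response body; real vectors are much longer.
sample_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.0, 1.0]},
        {"object": "embedding", "index": 0, "embedding": [0.5, -0.5]},
    ],
}
vectors = extract_vectors(sample_response)
print(len(vectors))
```

Sorting by index keeps each vector aligned with its input when "input" is sent as a list of strings.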

5. Unsupported Features

The following operations are still unsupported by the Fireworks provider in Bifrost:
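| Feature | Status |
|---|---|
| Image generation / editing / variations | Not supported |
| Speech / TTS | Not supported |
| Transcription / STT | Not supported |
| Files | Not supported |
| Batch | Not supported |
| Count tokens | Not supported |
| Rerank | Not supported |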

6. Caveats

For Fireworks chat completions, Bifrost maps prompt_cache_key to prompt_cache_isolation_key, which is the Fireworks body field for cache isolation. Fireworks also accepts the header form x-prompt-cache-isolation-key. For text completions, Bifrost extracts prompt_cache_key from extra_params and maps it to the same Fireworks body field.
If you need Fireworks session-affinity behavior, you have three options: pass user, configure x-session-affinity in provider extra headers, or send the header through the HTTP gateway as x-bf-eh-x-session-affinity.
Live cache-hit behavior remains model- and deployment-dependent.
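
As an illustration of the gateway-header option, the sketch below builds (but does not send) a chat request that sets both prompt_cache_key and the x-bf-eh-x-session-affinity header. The helper name `build_request` and the key and session values are hypothetical. The URL is the local gateway address used in the examples on this page.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:8080/v1/chat/completions"


def build_request(prompt: str, cache_key: str, session_id: str) -> urllib.request.Request:
    """Build a POST request carrying a cache key in the body and an affinity header."""
    body = {
        "model": "fireworks/accounts/fireworks/models/deepseek-v3p2",
        "messages": [{"role": "user", "content": prompt}],
        # Mapped by Bifrost to prompt_cache_isolation_key, per this page.
        "prompt_cache_key": cache_key,
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Gateway header form named on this page for session affinity.
            "x-bf-eh-x-session-affinity": session_id,
        },
        method="POST",
    )


req = build_request("hello", cache_key="tenant-42", session_id="session-abc")
# urllib.request.urlopen(req) would send it to a running gateway.
print(req.get_method(), req.full_url)
```

Whether a given request then hits the cache still depends on the model and deployment.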
Bifrost preserves assistant reasoning_content for Fireworks chat models that support reasoning history. Fireworks-specific reasoning controls such as reasoning_history do not receive special typed handling in this provider.