Create async chat completion

Authorizations

Authorization

string

header

required

Bearer token authentication. Use your provider API key or Bifrost authentication token. Virtual keys (prefixed with sk-bf-) can also be passed here.

Headers

x-bf-async-job-result-ttl

integer

default:3600

Time-to-live in seconds for the job result after completion. Defaults to 3600 (1 hour). After expiry, the job result is automatically cleaned up.

Body

application/json

model

string

required

Model in provider/model format (e.g., openai/gpt-4)

Example:

"openai/gpt-4"

messages

object[]

required

List of messages in the conversation

Show child attributes

fallbacks

string[]

Fallback models in provider/model format

stream

boolean

Whether to stream the response

frequency_penalty

number

Required range: -2 <= x <= 2

logit_bias

object

Show child attributes

logprobs

boolean

max_completion_tokens

integer

metadata

object

modalities

string[]

parallel_tool_calls

boolean

presence_penalty

number

Required range: -2 <= x <= 2

prompt_cache_key

string

reasoning

object

Show child attributes

response_format

object

Format for the response

safety_identifier

string

service_tier

string

stream_options

object

Show child attributes

store

boolean

temperature

number

Required range: 0 <= x <= 2

tool_choice

Available options:

none,

auto,

required

tools

object[]

Show child attributes

seed

integer

Deterministic sampling seed

top_p

number

Nucleus sampling parameter

Required range: 0 <= x <= 1

top_logprobs

integer

Number of most likely tokens to return at each position

Required range: 0 <= x <= 20

stop

Up to 4 sequences where the API will stop generating tokens

prediction

object

Predicted output content for the model to reference (OpenAI only). Can reduce latency.

Show child attributes

prompt_cache_retention

enum<string>

Prompt cache retention policy

Available options:

in-memory,

24h

web_search_options

object

Web search options for chat completions (OpenAI only)

Show child attributes

truncation

string

user

string

verbosity

enum<string>

Available options:

low,

medium,

high

Response

Job accepted for processing

Response returned when creating or polling an async job

string

required

Unique identifier for the async job

status

enum<string>

required

The status of an async job

Available options:

pending,

processing,

completed,

failed

created_at

string<date-time>

required

When the job was created

expires_at

string<date-time>

When the job result expires and will be cleaned up

completed_at

string<date-time>

When the job completed (successfully or with failure)

status_code

integer

HTTP status code of the completed operation

result

any

The result of the completed operation (shape depends on the request type)

error

object

Error response from Bifrost

Show child attributes