
Overview

Cohere has a different API structure from OpenAI’s format. Bifrost performs conversions including:
  • Parameter renaming - e.g., max_completion_tokens → max_tokens, top_p → p, stop → stop_sequences
  • Message content conversion - String and content block formats handled
  • Tool conversion - Tool definitions and tool choice mapped to Cohere format
  • Thinking/Reasoning transformation - reasoning parameters mapped to Cohere’s thinking structure
  • Response format conversion - JSON schema handling adapted to Cohere’s format

Supported Operations

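| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | /v2/chat |
| Responses API | ✅ | ✅ | /v2/chat |
| Embeddings | ✅ | - | /v2/embed |
| List Models | ✅ | - | /v1/models |
| Text Completions | ❌ | ❌ | - |
| Image Generation | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Files | ❌ | ❌ | - |
| Batch | ❌ | ❌ | - |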
Unsupported Operations (❌): Text Completions, Image Generation, Speech, Transcriptions, Files, and Batch are not supported by the upstream Cohere API. These return UnsupportedOperationError.

Setup & Configuration

Configure Cohere as a provider.
  1. Navigate to Models > Model Providers. Look for Cohere under Configured Providers. If it is missing, click Add New Provider and select Cohere.
  2. Click Add Key or edit an existing key.
  3. Set a name for your key.
  4. Paste your API key directly or use an environment variable (for example, env.COHERE_API_KEY).
  5. Set Allowed Models to All Models (default) or the specific model allowlist you want this key to serve.
  6. Save the provider configuration.

1. Chat Completions

Request Parameters

Parameter Mapping

| Parameter | Transformation |
|---|---|
| max_completion_tokens | Renamed to max_tokens |
| temperature, top_p | Direct pass-through for temperature; top_p renamed to p |
| stop | Renamed to stop_sequences |
| frequency_penalty, presence_penalty | Direct pass-through |
| response_format | Converted to structured format (see Response Format) |
| tools | Schema structure adapted (see Tool Conversion) |
| tool_choice | Type mapped (see Tool Conversion) |
| reasoning | Mapped to thinking (see Reasoning / Thinking) |
| user | Via extra_params (not directly supported in Cohere v2 API) |
| top_k | Via extra_params (Cohere-specific) |

Dropped Parameters

The following parameters are silently ignored: logit_bias, logprobs, top_logprobs, seed, parallel_tool_calls, service_tier
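The renaming and dropping rules above amount to a key-by-key transformation of the request body. The Go sketch below assumes the request is held as a generic map[string]any. mapChatParams, renamed, and dropped are hypothetical names used for illustration, not Bifrost's actual implementation, and the sketch covers only the renamed and dropped keys, not the structural conversions (tools, response_format, reasoning).

```go
package main

import (
	"fmt"
	"sort"
)

// renamed lists OpenAI-style keys that Cohere expects under a different name.
var renamed = map[string]string{
	"max_completion_tokens": "max_tokens",
	"top_p":                 "p",
	"stop":                  "stop_sequences",
}

// dropped lists parameters that are silently ignored for Cohere.
var dropped = map[string]bool{
	"logit_bias": true, "logprobs": true, "top_logprobs": true,
	"seed": true, "parallel_tool_calls": true, "service_tier": true,
}

// mapChatParams renames or drops keys; every other key passes through unchanged.
func mapChatParams(in map[string]any) map[string]any {
	out := make(map[string]any, len(in))
	for k, v := range in {
		if dropped[k] {
			continue
		}
		if newKey, ok := renamed[k]; ok {
			k = newKey
		}
		out[k] = v
	}
	return out
}

func main() {
	out := mapChatParams(map[string]any{
		"max_completion_tokens": 256,
		"top_p":                 0.9,
		"temperature":           0.3,
		"seed":                  42,
	})
	keys := make([]string, 0, len(out))
	for k := range out {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(keys) // [max_tokens p temperature]
}
```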

Extra Parameters

Use extra_params (SDK), or pass the fields directly in the request body (Gateway), to send Cohere-specific fields:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/command-r-plus",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40,
    "safety_mode": "STRICT",
    "log_probs": true,
    "strict_tool_choice": false
  }'
```

Reasoning / Thinking

Documentation: See Bifrost Reasoning Reference

Parameter Mapping

  • reasoning.effort → thinking.type (mapped to "enabled" or "disabled")
  • reasoning.max_tokens → thinking.token_budget (token budget for thinking)

Critical Constraints

  • Minimum budget: the thinking budget must be at least 1 token; a reasoning.max_tokens of 0 disables thinking (thinking.type becomes "disabled")
  • Dynamic budget: a value of -1 is converted to 1 automatically

Example

```jsonc
// Request
{"reasoning": {"effort": "high", "max_tokens": 2048}}

// Cohere conversion
{"thinking": {"type": "enabled", "token_budget": 2048}}
```
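The budget rules above can be expressed as a small conversion function. This is a minimal Go sketch; Reasoning, Thinking, and toThinking are hypothetical types and names, and treating an effort of "none" as disabled is an assumption, since the page only says effort maps to "enabled" or "disabled". The 0 and -1 cases follow the constraints listed above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Reasoning holds the OpenAI-style request fields (hypothetical type).
type Reasoning struct {
	Effort    string
	MaxTokens *int
}

// Thinking is the Cohere-side structure shown in the example above.
type Thinking struct {
	Type        string `json:"type"`
	TokenBudget *int   `json:"token_budget,omitempty"`
}

// toThinking applies the documented budget rules. Mapping effort "none"
// to "disabled" is an assumption of this sketch.
func toThinking(r Reasoning) Thinking {
	if r.Effort == "none" {
		return Thinking{Type: "disabled"}
	}
	t := Thinking{Type: "enabled"}
	if r.MaxTokens != nil {
		budget := *r.MaxTokens
		switch budget {
		case 0:
			// A zero budget turns thinking off.
			return Thinking{Type: "disabled"}
		case -1:
			// Dynamic budget is raised to the 1-token minimum.
			budget = 1
		}
		t.TokenBudget = &budget
	}
	return t
}

func main() {
	n := 2048
	b, _ := json.Marshal(toThinking(Reasoning{Effort: "high", MaxTokens: &n}))
	fmt.Println(string(b)) // {"type":"enabled","token_budget":2048}
}
```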

Message Conversion

Content Handling

  • String content: Messages can have simple string content
  • Content blocks: Messages can have arrays of content blocks (text, images, thinking)
  • Image conversion: image_url blocks with URL are supported
  • Tool calls: Tool calls on assistant messages are converted to Cohere's format
  • Tool messages: Tool call results are passed with tool_call_id
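The content-handling rules above (plain strings, block arrays, image_url blocks with a URL) can be sketched as a normalization step. This Go sketch is illustrative only: ContentBlock and toBlocks are hypothetical, and wrapping a string in a single text block and skipping unrecognized block types are assumptions, not documented Bifrost behavior.

```go
package main

import "fmt"

// ContentBlock is a simplified, hypothetical content block type.
type ContentBlock struct {
	Type     string // "text", "image_url", or "thinking"
	Text     string
	ImageURL string
}

// toBlocks accepts string content or a slice of blocks and returns a
// uniform block list. Wrapping strings in a text block and skipping
// unrecognized block types are assumptions of this sketch.
func toBlocks(content any) []ContentBlock {
	switch c := content.(type) {
	case string:
		return []ContentBlock{{Type: "text", Text: c}}
	case []ContentBlock:
		out := make([]ContentBlock, 0, len(c))
		for _, b := range c {
			switch {
			case b.Type == "text", b.Type == "thinking":
				out = append(out, b)
			case b.Type == "image_url" && b.ImageURL != "":
				// Only image_url blocks that carry a URL are forwarded.
				out = append(out, b)
			}
		}
		return out
	default:
		return nil
	}
}

func main() {
	blocks := toBlocks([]ContentBlock{
		{Type: "text", Text: "What is in this image?"},
		{Type: "image_url", ImageURL: "https://example.com/cat.png"},
	})
	fmt.Println(len(blocks)) // 2
}
```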

Tool Conversion

Tool definitions are adapted to Cohere format with the following mappings:
  • Function name → name (unchanged)
  • Function parameters → parameters (flexible JSON format)
  • Strict mode (strict: true) is silently dropped (not supported)
Tool choice mapping:
  • "none" → "NONE"
  • "auto" → "AUTO", "required" → "REQUIRED"
  • Specific tool selection → "REQUIRED" (Cohere uses function-level selection)

Response Format

Supported formats:
  • text - Plain text response
  • json_object - Structured JSON response
  • json_schema - JSON with schema validation (converted to json_object)
The schema itself is passed through in the response_format.json_schema field.

Response Conversion

Field Mapping

  • finish_reason: COMPLETE / STOP_SEQUENCE → stop, MAX_TOKENS → length, TOOL_CALL → tool_calls
  • input_tokens → prompt_tokens | output_tokens → completion_tokens
  • cached_tokens → prompt_tokens_details.cached_tokens (if present)
  • Tool call arguments: passed through unchanged, since Cohere already returns them as strings (no JSON conversion needed)
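The finish-reason and usage mappings above are simple lookups. A minimal Go sketch; the types and function names are hypothetical, and passing unknown finish reasons through unchanged is an assumption not stated on this page.

```go
package main

import "fmt"

// finishReasons maps Cohere finish reasons to OpenAI-style values.
var finishReasons = map[string]string{
	"COMPLETE":      "stop",
	"STOP_SEQUENCE": "stop",
	"MAX_TOKENS":    "length",
	"TOOL_CALL":     "tool_calls",
}

// toFinishReason passes unknown values through unchanged (an assumption).
func toFinishReason(r string) string {
	if v, ok := finishReasons[r]; ok {
		return v
	}
	return r
}

// CohereUsage and OpenAIUsage are simplified, hypothetical types.
type CohereUsage struct {
	InputTokens  int
	OutputTokens int
	CachedTokens *int // nil when no cached tokens are reported
}

type OpenAIUsage struct {
	PromptTokens     int
	CompletionTokens int
	CachedTokens     int // stands in for prompt_tokens_details.cached_tokens
}

func toUsage(u CohereUsage) OpenAIUsage {
	out := OpenAIUsage{PromptTokens: u.InputTokens, CompletionTokens: u.OutputTokens}
	if u.CachedTokens != nil {
		out.CachedTokens = *u.CachedTokens
	}
	return out
}

func main() {
	fmt.Println(toFinishReason("MAX_TOKENS")) // length
}
```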

Streaming

Event sequence: message-start → content-start → content-delta → content-end → message-end

Delta types:
  • content-delta with text → message content
  • content-delta with thinking → reasoning text
  • tool-call-start/delta/end → tool call events
  • tool-plan-delta → tool planning output
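The delta routing above can be sketched as a switch on the event type. In this Go sketch, streamEvent and classify are hypothetical, and the nested delta.message.content.text / .thinking and delta.message.tool_plan field paths are assumptions about the Cohere v2 stream payload rather than something this page specifies.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// streamEvent models only the fields this sketch reads; the delta shape is
// an assumption about the Cohere v2 payload.
type streamEvent struct {
	Type  string `json:"type"`
	Delta struct {
		Message struct {
			Content struct {
				Text     string `json:"text"`
				Thinking string `json:"thinking"`
			} `json:"content"`
			ToolPlan string `json:"tool_plan"`
		} `json:"message"`
	} `json:"delta"`
}

// classify reports which output channel a raw event feeds, plus its text.
func classify(raw []byte) (kind, text string, err error) {
	var ev streamEvent
	if err = json.Unmarshal(raw, &ev); err != nil {
		return "", "", err
	}
	c := ev.Delta.Message.Content
	switch ev.Type {
	case "content-delta":
		if c.Thinking != "" {
			return "reasoning", c.Thinking, nil
		}
		return "content", c.Text, nil
	case "tool-plan-delta":
		return "tool_plan", ev.Delta.Message.ToolPlan, nil
	case "tool-call-start", "tool-call-delta", "tool-call-end":
		return "tool_call", "", nil
	default:
		// message-start, content-start, content-end, message-end, ...
		return "lifecycle", "", nil
	}
}

func main() {
	kind, text, _ := classify([]byte(`{"type":"content-delta","delta":{"message":{"content":{"text":"Hi"}}}}`))
	fmt.Println(kind, text) // content Hi
}
```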

Caveats

| Severity | Behavior | Impact | Code |
|---|---|---|---|
| Low | reasoning.max_tokens must be >= 1 | Very low; conversion happens automatically | chat.go:104-130 |
| Low | top_p parameter renamed to p | Parameter name changes internally | chat.go:99 |
| Low | strict: true in tool definitions silently dropped | No schema validation enforcement | chat.go:168-185 |
| Low | Tool arguments are already strings; no JSON serialization needed | Minimal; Cohere v2 API expects string format | chat.go:70-78 |

2. Responses API

The Responses API uses the same underlying /v2/chat endpoint but converts between OpenAI’s Responses format and Cohere’s format.

Request Parameters

Parameter Mapping

| Parameter | Transformation |
|---|---|
| max_output_tokens | Renamed to max_tokens |
| temperature, top_p | Direct pass-through for temperature; top_p renamed to p |
| instructions | Becomes system message |
| text.format | Converted to response_format |
| tools | Schema restructured (see Chat Completions) |
| tool_choice | Type mapped (see Chat Completions) |
| reasoning | Mapped to thinking (see Reasoning / Thinking) |
| stop | Via extra_params, renamed to stop_sequences |
| top_k | Via extra_params (Cohere-specific) |
| frequency_penalty, presence_penalty | Via extra_params |

Extra Parameters

Use extra_params (SDK), or pass the fields directly in the request body (Gateway):
```bash
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/command-r-plus",
    "input": "Hello, how are you?",
    "top_k": 40,
    "stop": [".", "!"]
  }'
```

Input & Instructions

  • Input: a string becomes a single user message; an array is converted into the corresponding messages
  • Instructions: Becomes system message (prepended to messages)
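The input and instructions handling above can be sketched as building a message list with the system message first. In this Go sketch, Message and buildMessages are hypothetical, and representing array input as an already-converted []Message is a simplification.

```go
package main

import "fmt"

// Message is a simplified, hypothetical chat message.
type Message struct {
	Role    string
	Content string
}

// buildMessages prepends instructions as a system message, then adds the
// input: a string becomes one user message, a []Message is appended as-is.
func buildMessages(instructions string, input any) []Message {
	var msgs []Message
	if instructions != "" {
		msgs = append(msgs, Message{Role: "system", Content: instructions})
	}
	switch in := input.(type) {
	case string:
		msgs = append(msgs, Message{Role: "user", Content: in})
	case []Message:
		msgs = append(msgs, in...)
	}
	return msgs
}

func main() {
	for _, m := range buildMessages("Be brief.", "Hello, how are you?") {
		fmt.Printf("%s: %s\n", m.Role, m.Content)
	}
	// system: Be brief.
	// user: Hello, how are you?
}
```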

Tool Support

Supported tool types: function. Tools are converted the same way as in Chat Completions.

Response Conversion

  • text → message | tool_use → function_call
  • input_tokens / output_tokens preserved
  • Token details with cached tokens support

Streaming

Event sequence: message-start → content-start → content-delta → content-end → message-end

Special handling:
  • Tool call arguments accumulated across chunks
  • Synthetic output_item.added events emitted for text/reasoning
  • Stable item IDs generated as msg_{messageID}_item_{outputIndex}
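Two of the special-handling rules above, argument accumulation and stable item IDs, can be sketched as follows. itemID follows the documented msg_{messageID}_item_{outputIndex} format; argAccumulator and its methods are hypothetical names, and keying fragments by output index is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// itemID builds the stable ID format described above.
func itemID(messageID string, outputIndex int) string {
	return fmt.Sprintf("msg_%s_item_%d", messageID, outputIndex)
}

// argAccumulator collects streamed tool-call argument fragments per output index.
type argAccumulator struct {
	parts map[int]*strings.Builder
}

func newArgAccumulator() *argAccumulator {
	return &argAccumulator{parts: make(map[int]*strings.Builder)}
}

// Add appends a fragment to the builder for the given output index.
func (a *argAccumulator) Add(index int, fragment string) {
	b, ok := a.parts[index]
	if !ok {
		b = &strings.Builder{}
		a.parts[index] = b
	}
	b.WriteString(fragment)
}

// Arguments returns everything accumulated so far for the index.
func (a *argAccumulator) Arguments(index int) string {
	if b, ok := a.parts[index]; ok {
		return b.String()
	}
	return ""
}

func main() {
	acc := newArgAccumulator()
	for _, frag := range []string{`{"city":`, `"Paris"}`} {
		acc.Add(1, frag)
	}
	fmt.Println(itemID("abc123", 1), acc.Arguments(1))
	// msg_abc123_item_1 {"city":"Paris"}
}
```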

3. Embeddings

Request Parameters

Parameter Mapping

| Parameter | Transformation |
|---|---|
| input (text or array) | Converted to texts array |
| dimensions | Renamed to output_dimension |
| input_type | Via extra_params (required; defaults to "search_document") |
| embedding_types | Via extra_params (array of embedding types) |
| truncate | Via extra_params (how to handle long inputs) |
| max_tokens | Via extra_params (max tokens to embed per input) |

Extra Parameters

Use extra_params for Cohere-specific embedding options:
```bash
curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/embed-english-v3.0",
    "input": ["text to embed"],
    "input_type": "search_query",
    "embedding_types": ["float"],
    "truncate": "START"
  }'
```

Critical Notes

  • Input Type Required: Cohere v3+ models require the input_type parameter; it defaults to "search_document" when omitted
  • Embedding Types: Specify which embedding types to return (e.g., "float", "int8")
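The request mapping above can be sketched as building the Cohere embed body: input becomes texts, dimensions becomes output_dimension, and a missing input_type falls back to "search_document". cohereEmbedRequest and buildEmbedRequest are hypothetical names; the example passes the model name without the cohere/ prefix, assuming the gateway strips it before calling Cohere.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cohereEmbedRequest lists the fields named in the mapping table above.
type cohereEmbedRequest struct {
	Model           string   `json:"model"`
	Texts           []string `json:"texts"`
	InputType       string   `json:"input_type"`
	EmbeddingTypes  []string `json:"embedding_types,omitempty"`
	Truncate        string   `json:"truncate,omitempty"`
	OutputDimension *int     `json:"output_dimension,omitempty"`
	MaxTokens       *int     `json:"max_tokens,omitempty"`
}

// buildEmbedRequest normalizes input (string or []string) into texts and
// applies the documented input_type default.
func buildEmbedRequest(model string, input any, inputType string, dimensions *int) cohereEmbedRequest {
	var texts []string
	switch in := input.(type) {
	case string:
		texts = []string{in}
	case []string:
		texts = in
	}
	if inputType == "" {
		inputType = "search_document"
	}
	return cohereEmbedRequest{Model: model, Texts: texts, InputType: inputType, OutputDimension: dimensions}
}

func main() {
	req := buildEmbedRequest("embed-english-v3.0", "text to embed", "", nil)
	b, _ := json.Marshal(req)
	fmt.Println(string(b))
	// {"model":"embed-english-v3.0","texts":["text to embed"],"input_type":"search_document"}
}
```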

Response Conversion

  • embeddings.float → data[].embedding
  • meta.tokens → usage information
  • Responses containing multiple embedding types are handled
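The embeddings.float → data[].embedding mapping above can be sketched as unpacking each vector into an OpenAI-style data entry. The Go types and toOpenAIData are hypothetical; the sketch reads only the float embeddings and ignores meta and other embedding types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cohereEmbedResponse models only the float embeddings.
type cohereEmbedResponse struct {
	Embeddings struct {
		Float [][]float64 `json:"float"`
	} `json:"embeddings"`
}

// embeddingData is one OpenAI-style data[] entry.
type embeddingData struct {
	Object    string    `json:"object"`
	Index     int       `json:"index"`
	Embedding []float64 `json:"embedding"`
}

// toOpenAIData turns embeddings.float[i] into data[i].embedding.
func toOpenAIData(raw []byte) ([]embeddingData, error) {
	var resp cohereEmbedResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	data := make([]embeddingData, len(resp.Embeddings.Float))
	for i, vec := range resp.Embeddings.Float {
		data[i] = embeddingData{Object: "embedding", Index: i, Embedding: vec}
	}
	return data, nil
}

func main() {
	data, err := toOpenAIData([]byte(`{"embeddings":{"float":[[0.1,0.2],[0.3,0.4]]}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(len(data), data[1].Embedding) // 2 [0.3 0.4]
}
```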

4. List Models

  • Request: GET /v1/models?page_size={defaultPageSize}
  • Field mapping: Model data is converted to the standard format
  • Pagination: Cursor-based with next_page_token
  • Note: endpoint and default_only filters are available via extra_params
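The list request and cursor pagination above can be sketched as a URL builder plus a loop that follows next_page_token until it comes back empty. listModelsURL, modelsPage, and collectAll are hypothetical; the page_token query parameter name and the base URL used in main are assumptions for illustration, and fetch stands in for the actual HTTP call.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// listModelsURL builds the list-models query; empty filters are omitted.
// The page_token parameter name is an assumption of this sketch.
func listModelsURL(base string, pageSize int, pageToken, endpoint string, defaultOnly bool) string {
	q := url.Values{}
	q.Set("page_size", strconv.Itoa(pageSize))
	if pageToken != "" {
		q.Set("page_token", pageToken)
	}
	if endpoint != "" {
		q.Set("endpoint", endpoint)
	}
	if defaultOnly {
		q.Set("default_only", "true")
	}
	return base + "?" + q.Encode() // Encode sorts keys alphabetically
}

// modelsPage is a simplified page of results.
type modelsPage struct {
	Models        []string
	NextPageToken string
}

// collectAll follows next_page_token until it comes back empty.
func collectAll(fetch func(pageToken string) modelsPage) []string {
	var all []string
	token := ""
	for {
		page := fetch(token)
		all = append(all, page.Models...)
		if page.NextPageToken == "" {
			return all
		}
		token = page.NextPageToken
	}
}

func main() {
	fmt.Println(listModelsURL("https://api.cohere.com/v1/models", 100, "", "chat", false))
	// https://api.cohere.com/v1/models?endpoint=chat&page_size=100
}
```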