# LLM Providers

Ryvos supports 18+ LLM providers out of the box. Every provider implements the same LlmClient trait, giving you streaming responses, tool calling, and token estimation regardless of which model you choose.
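Because every provider sits behind the same trait, the agent code never branches on which backend is configured. A minimal Rust sketch of what such an interface looks like (names and signatures here are illustrative; Ryvos's actual `LlmClient` trait is async and streaming):

```rust
/// Illustrative provider interface. The real trait in Ryvos is async,
/// streams deltas, and carries richer request/response types.
pub trait LlmClient {
    /// Rough token count used for context budgeting.
    fn estimate_tokens(&self, text: &str) -> usize;
    /// Send a prompt and return the response text.
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Stand-in provider demonstrating that callers depend only on the trait.
pub struct EchoProvider;

impl LlmClient for EchoProvider {
    fn estimate_tokens(&self, text: &str) -> usize {
        // Crude ~4-characters-per-token heuristic.
        (text.len() + 3) / 4
    }

    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {prompt}"))
    }
}
```

Swapping providers then means constructing a different `impl LlmClient` from config, with no changes to the agent loop.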

## Provider Overview

| Provider | Type | Tool Calling | Streaming | Notes |
|---|---|---|---|---|
| Anthropic | Cloud | Yes | Yes | Recommended. Native Messages API. |
| OpenAI | Cloud | Yes | Yes | Chat Completions API. |
| Gemini | Cloud | Yes | Yes | Google native Gemini API. |
| Azure OpenAI | Cloud | Yes | Yes | Enterprise Azure deployments. |
| Cohere | Cloud | Yes | Yes | Cohere v2 Chat API. |
| AWS Bedrock | Cloud | Yes | Yes | AWS-managed models. |
| Ollama | Local | Yes | Yes | Run models locally. No API key needed. |
| Groq | Cloud | Yes | Yes | Ultra-fast inference. |
| OpenRouter | Cloud | Yes | Yes | Multi-model gateway. |
| Together | Cloud | Yes | Yes | Together AI inference. |
| Fireworks | Cloud | Yes | Yes | Fireworks AI. |
| Cerebras | Cloud | Yes | Yes | Cerebras inference. |
| xAI | Cloud | Yes | Yes | Grok models. |
| Mistral | Cloud | Yes | Yes | Mistral AI. |
| Perplexity | Cloud | Yes | Yes | Search-augmented LLM. |
| DeepSeek | Cloud | Yes | Yes | DeepSeek models. |
| Claude Code | CLI | Yes | Yes | Wraps Claude Code CLI subprocess. |
| Copilot | CLI | Yes | Yes | Wraps GitHub Copilot CLI (JSONL). |
| Custom | Any | Varies | Yes | Any OpenAI-compatible endpoint. |

## Cloud Providers

### Anthropic (Recommended)

```toml
[model]
provider = "anthropic"
model_id = "claude-sonnet-4-20250514"
api_key = "${ANTHROPIC_API_KEY}"
```

Supported models: claude-sonnet-4-20250514, claude-opus-4-20250514, claude-haiku-4-20250506, claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022

Ryvos uses Anthropic's native Messages API with streaming, extended thinking support, and accurate token estimation via tiktoken.

:::tip
Anthropic models have the best tool-calling reliability. Claude Sonnet 4 is the recommended default for the best balance of speed, cost, and capability.
:::
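The `${ANTHROPIC_API_KEY}` value is an environment-variable placeholder substituted when the config file is loaded. A rough Rust sketch of that kind of expansion (illustrative only; how the real loader treats missing or malformed placeholders may differ — in this sketch an unset variable expands to an empty string):

```rust
use std::env;

/// Expand `${VAR}` placeholders in a config value.
/// Unset variables expand to "" in this sketch.
fn expand_env(value: &str) -> String {
    let mut out = String::new();
    let mut rest = value;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        match rest[start + 2..].find('}') {
            Some(end) => {
                let var = &rest[start + 2..start + 2 + end];
                out.push_str(&env::var(var).unwrap_or_default());
                rest = &rest[start + 2 + end + 1..];
            }
            None => {
                // No closing brace: keep the remaining text verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```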

### OpenAI

```toml
[model]
provider = "openai"
model_id = "gpt-4o"
api_key = "${OPENAI_API_KEY}"
```

Supported models: gpt-4o, gpt-4o-mini, gpt-4-turbo, o1, o1-mini, o3-mini

### Google Gemini

```toml
[model]
provider = "gemini"
model_id = "gemini-2.5-pro"
api_key = "${GOOGLE_API_KEY}"
```

Supported models: gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash

Uses the native Gemini API (not the OpenAI compatibility layer) for full feature support.

### Azure OpenAI

```toml
[model]
provider = "azure"
model_id = "gpt-4o"
api_key = "${AZURE_OPENAI_KEY}"
base_url = "https://your-resource.openai.azure.com/openai/deployments/gpt-4o"
```

The `base_url` must point to your specific Azure deployment. The API key is sent as the `api-key` header.

### Cohere

```toml
[model]
provider = "cohere"
model_id = "command-r-plus"
api_key = "${COHERE_API_KEY}"
```

Uses Cohere's v2 Chat API with native tool calling support.

### AWS Bedrock

```toml
[model]
provider = "bedrock"
model_id = "anthropic.claude-3-5-sonnet-20241022-v2:0"
base_url = "https://bedrock-runtime.us-east-1.amazonaws.com"
```

:::note
Bedrock authentication uses your AWS credentials from environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`) or IAM roles. No `api_key` field is needed.
:::

## Local Providers

### Ollama

```toml
[model]
provider = "ollama"
model_id = "llama3.1:70b"
```

No API key required. Ryvos connects to Ollama at `http://localhost:11434` by default.

```bash
# Install and start Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama3.1:70b
ollama serve
```

Override the endpoint for remote Ollama instances:

```toml
[model]
provider = "ollama"
model_id = "llama3.1:70b"
base_url = "http://192.168.1.100:11434"
```

:::tip
Ollama works great as a fallback model. Configure a cloud primary with a local Ollama fallback for resilience:

```toml
[model]
provider = "anthropic"
model_id = "claude-sonnet-4-20250514"
api_key = "${ANTHROPIC_API_KEY}"

[[fallback_models]]
provider = "ollama"
model_id = "llama3.1:70b"
```
:::

## OpenAI-Compatible Providers

These providers all use the OpenAI Chat Completions API format. Ryvos auto-fills the correct `base_url` and headers via `apply_preset_defaults()`.
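In other words, naming one of these presets is enough; you only override `base_url` for self-hosted gateways. A sketch of the kind of table behind `apply_preset_defaults()` (the URLs shown are these providers' well-known public endpoints, but treat the exact mapping as an assumption; the authoritative list lives in the Ryvos source):

```rust
/// Default base URL for a few provider presets (illustrative subset).
fn preset_base_url(provider: &str) -> Option<&'static str> {
    match provider {
        "openai" => Some("https://api.openai.com/v1"),
        "groq" => Some("https://api.groq.com/openai/v1"),
        "together" => Some("https://api.together.xyz/v1"),
        "fireworks" => Some("https://api.fireworks.ai/inference/v1"),
        "deepseek" => Some("https://api.deepseek.com"),
        "ollama" => Some("http://localhost:11434"),
        // Unknown presets must supply an explicit base_url.
        _ => None,
    }
}
```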

### Groq

```toml
[model]
provider = "groq"
model_id = "llama-3.3-70b-versatile"
api_key = "${GROQ_API_KEY}"
```

Known for extremely fast inference. Good for high-throughput tasks.

### OpenRouter

```toml
[model]
provider = "openrouter"
model_id = "anthropic/claude-sonnet-4-20250514"
api_key = "${OPENROUTER_API_KEY}"
```

Access hundreds of models through a single API. Model IDs use the provider/model format.

### Together AI

```toml
[model]
provider = "together"
model_id = "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"
api_key = "${TOGETHER_API_KEY}"
```

### Fireworks AI

```toml
[model]
provider = "fireworks"
model_id = "accounts/fireworks/models/llama-v3p1-70b-instruct"
api_key = "${FIREWORKS_API_KEY}"
```

### Cerebras

```toml
[model]
provider = "cerebras"
model_id = "llama3.1-70b"
api_key = "${CEREBRAS_API_KEY}"
```

Ultra-fast inference on Cerebras wafer-scale hardware.

### xAI (Grok)

```toml
[model]
provider = "xai"
model_id = "grok-2"
api_key = "${XAI_API_KEY}"
```

### Mistral AI

```toml
[model]
provider = "mistral"
model_id = "mistral-large-latest"
api_key = "${MISTRAL_API_KEY}"
```

### Perplexity

```toml
[model]
provider = "perplexity"
model_id = "llama-3.1-sonar-large-128k-online"
api_key = "${PERPLEXITY_API_KEY}"
```

Perplexity models include built-in web search. Useful for research-heavy tasks.

### DeepSeek

```toml
[model]
provider = "deepseek"
model_id = "deepseek-chat"
api_key = "${DEEPSEEK_API_KEY}"
```

## CLI Subprocess Providers

These providers wrap external CLI tools as subprocesses.

### Claude Code

```toml
[model]
provider = "claude_code"
model_id = "claude_code"
```

Wraps the `claude` CLI, which must be installed separately (`npm install -g @anthropic-ai/claude-code`). Useful if you have Claude Code access but not a direct Anthropic API key.

### GitHub Copilot

```toml
[model]
provider = "copilot"
model_id = "copilot"
```

Wraps the `gh copilot` CLI. Requires a GitHub Copilot subscription and the `gh` CLI. Communication uses the JSONL format (rewritten in v0.4.4 for reliability).

## Custom / Self-Hosted

Use any OpenAI-compatible endpoint:

```toml
[model]
provider = "openai"
model_id = "your-model-name"
api_key = "${YOUR_API_KEY}"
base_url = "http://localhost:8080/v1"
extra_headers = { "X-Custom-Header" = "value" }
```

This works with:

- vLLM: `base_url = "http://localhost:8000/v1"`
- llama.cpp server: `base_url = "http://localhost:8080/v1"`
- LocalAI: `base_url = "http://localhost:8080/v1"`
- LiteLLM proxy: `base_url = "http://localhost:4000/v1"`
- Text Generation Inference: `base_url = "http://localhost:8080/v1"`

## Fallback Chain

Configure multiple fallback models for resilience:

```toml
[model]
provider = "anthropic"
model_id = "claude-sonnet-4-20250514"
api_key = "${ANTHROPIC_API_KEY}"

[[fallback_models]]
provider = "openai"
model_id = "gpt-4o"
api_key = "${OPENAI_API_KEY}"

[[fallback_models]]
provider = "ollama"
model_id = "llama3.1:70b"
```

Ryvos tries each model in order when the previous one fails (network error, rate limit, timeout). The fallback is transparent to the conversation.
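The ordering can be sketched as a simple loop over the configured models (a minimal sketch; the real implementation also classifies errors, so that, for example, an invalid API key fails fast instead of cascading):

```rust
/// Try each configured model in order; return the first success.
fn complete_with_fallback<F>(models: &[F], prompt: &str) -> Result<String, String>
where
    F: Fn(&str) -> Result<String, String>,
{
    let mut last_err = String::from("no models configured");
    for model in models {
        match model(prompt) {
            Ok(text) => return Ok(text),
            // Record the failure and fall through to the next model.
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```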

## Streaming Architecture

All providers implement the `LlmClient` trait with streaming:

```text
LLM Provider --> SSE/JSONL stream --> StreamDelta events --> Agent Loop
```

`StreamDelta` variants:

- `TextDelta` — Partial text response
- `ThinkingDelta` — Extended thinking content (Anthropic)
- `ToolUseStart` — Beginning of a tool call
- `ToolInputDelta` — Partial tool input JSON
- `Stop` — Stream complete
- `Usage` — Token counts
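In Rust terms, the variant list above corresponds to an enum along these lines (field shapes are assumptions for illustration; only the variant names come from the list):

```rust
#[allow(dead_code)]
enum StreamDelta {
    TextDelta(String),
    ThinkingDelta(String),
    ToolUseStart { name: String },
    ToolInputDelta(String),
    Stop,
    Usage { input_tokens: u32, output_tokens: u32 },
}

/// Fold a finished stream into the user-visible text.
fn collect_text(deltas: &[StreamDelta]) -> String {
    let mut text = String::new();
    for delta in deltas {
        if let StreamDelta::TextDelta(chunk) = delta {
            text.push_str(chunk);
        }
    }
    text
}
```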

The streaming system includes a `RetryingClient` wrapper for automatic exponential backoff on transient failures.
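A typical exponential backoff schedule for such a wrapper looks like the following (the base delay and cap are illustrative values, not Ryvos's actual retry parameters):

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
fn backoff_delay(attempt: u32) -> Duration {
    let base_ms: u64 = 500;
    let cap_ms: u64 = 30_000;
    // Clamp the shift so the multiplication cannot overflow.
    let delay_ms = base_ms.saturating_mul(1u64 << attempt.min(16));
    Duration::from_millis(delay_ms.min(cap_ms))
}
```

Production retry loops usually also add jitter so concurrent clients don't retry in lockstep.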

## Verify Your Provider

After configuring a provider, verify it works:

```bash
ryvos doctor
```

The `doctor` command tests API connectivity for your primary and all fallback models.

## Next Steps