# LLM Providers

Ryvos supports 18+ LLM providers out of the box. Every provider implements the same LlmClient trait, giving you streaming responses, tool calling, and token estimation regardless of which model you choose.
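Because every provider sits behind the same trait, the agent code never branches on which backend is configured. A minimal Rust sketch of what such an interface looks like (names and signatures here are illustrative; Ryvos's actual `LlmClient` trait is async and streaming):

```rust
/// Illustrative provider interface. The real trait in Ryvos is async,
/// streams deltas, and carries richer request/response types.
pub trait LlmClient {
    /// Rough token count used for context budgeting.
    fn estimate_tokens(&self, text: &str) -> usize;
    /// Send a prompt and return the response text.
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Stand-in provider demonstrating that callers depend only on the trait.
pub struct EchoProvider;

impl LlmClient for EchoProvider {
    fn estimate_tokens(&self, text: &str) -> usize {
        // Crude ~4-characters-per-token heuristic.
        (text.len() + 3) / 4
    }

    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {prompt}"))
    }
}
```

Swapping providers then means constructing a different `impl LlmClient` from config, with no changes to the agent loop.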

## Provider Overview

| Provider | Type | Tool Calling | Streaming | Notes |
|---|---|---|---|---|
| Anthropic | Cloud | Yes | Yes | Recommended. Native Messages API. |
| OpenAI | Cloud | Yes | Yes | Chat Completions API. |
| Gemini | Cloud | Yes | Yes | Google native Gemini API. |
| Azure OpenAI | Cloud | Yes | Yes | Enterprise Azure deployments. |
| Cohere | Cloud | Yes | Yes | Cohere v2 Chat API. |
| AWS Bedrock | Cloud | Yes | Yes | AWS-managed models. |
| Ollama | Local | Yes | Yes | Run models locally. No API key needed. |
| Groq | Cloud | Yes | Yes | Ultra-fast inference. |
| OpenRouter | Cloud | Yes | Yes | Multi-model gateway. |
| Together | Cloud | Yes | Yes | Together AI inference. |
| Fireworks | Cloud | Yes | Yes | Fireworks AI. |
| Cerebras | Cloud | Yes | Yes | Cerebras inference. |
| xAI | Cloud | Yes | Yes | Grok models. |
| Mistral | Cloud | Yes | Yes | Mistral AI. |
| Perplexity | Cloud | Yes | Yes | Search-augmented LLM. |
| DeepSeek | Cloud | Yes | Yes | DeepSeek models. |
| Claude Code | CLI | Yes | Yes | Wraps Claude Code CLI subprocess. |
| Copilot | CLI | Yes | Yes | Wraps GitHub Copilot CLI (JSONL). |
| Custom | Any | Varies | Yes | Any OpenAI-compatible endpoint. |

## Cloud Providers

### Anthropic (Recommended)

```toml
[model]
provider = "anthropic"
model_id = "claude-sonnet-4-20250514"
api_key = "${ANTHROPIC_API_KEY}"
```

Supported models: claude-sonnet-4-20250514, claude-opus-4-20250514, claude-haiku-4-20250506, claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022

Ryvos uses Anthropic's native Messages API with streaming, extended thinking support, and accurate token estimation via tiktoken.

:::tip
Anthropic models have the best tool-calling reliability. Claude Sonnet 4 is the recommended default for the best balance of speed, cost, and capability.
:::
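The `${ANTHROPIC_API_KEY}` value is an environment-variable placeholder substituted when the config file is loaded. A rough Rust sketch of that kind of expansion (illustrative only; how the real loader treats missing or malformed placeholders may differ — in this sketch an unset variable expands to an empty string):

```rust
use std::env;

/// Expand `${VAR}` placeholders in a config value.
/// Unset variables expand to "" in this sketch.
fn expand_env(value: &str) -> String {
    let mut out = String::new();
    let mut rest = value;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        match rest[start + 2..].find('}') {
            Some(end) => {
                let var = &rest[start + 2..start + 2 + end];
                out.push_str(&env::var(var).unwrap_or_default());
                rest = &rest[start + 2 + end + 1..];
            }
            None => {
                // No closing brace: keep the remaining text verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```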

### OpenAI

```toml
[model]
provider = "openai"
model_id = "gpt-4o"
api_key = "${OPENAI_API_KEY}"
```

Supported models: gpt-4o, gpt-4o-mini, gpt-4-turbo, o1, o1-mini, o3-mini

### Google Gemini

```toml
[model]
provider = "gemini"
model_id = "gemini-2.5-pro"
api_key = "${GOOGLE_API_KEY}"
```

Supported models: gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash

Uses the native Gemini API (not the OpenAI compatibility layer) for full feature support.

### Azure OpenAI

```toml
[model]
provider = "azure"
model_id = "gpt-4o"
api_key = "${AZURE_OPENAI_KEY}"
base_url = "https://your-resource.openai.azure.com/openai/deployments/gpt-4o"
```

The `base_url` must point to your specific Azure deployment. The API key is sent as the `api-key` header.

### Cohere

```toml
[model]
provider = "cohere"
model_id = "command-r-plus"
api_key = "${COHERE_API_KEY}"
```

Uses Cohere's v2 Chat API with native tool calling support.

### AWS Bedrock

```toml
[model]
provider = "bedrock"
model_id = "anthropic.claude-3-5-sonnet-20241022-v2:0"
base_url = "https://bedrock-runtime.us-east-1.amazonaws.com"
```

:::note
Bedrock authentication uses your AWS credentials from environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`) or IAM roles. No `api_key` field is needed.
:::

## Local Providers

### Ollama

```toml
[model]
provider = "ollama"
model_id = "llama3.1:70b"
```

No API key required. Ryvos connects to Ollama at `http://localhost:11434` by default.

```bash
# Install and start Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama3.1:70b
ollama serve
```

Override the endpoint for remote Ollama instances:

```toml
[model]
provider = "ollama"
model_id = "llama3.1:70b"
base_url = "http://192.168.1.100:11434"
```

:::tip
Ollama works great as a fallback model. Configure a cloud primary with a local Ollama fallback for resilience:

```toml
[model]
provider = "anthropic"
model_id = "claude-sonnet-4-20250514"
api_key = "${ANTHROPIC_API_KEY}"

[[fallback_models]]
provider = "ollama"
model_id = "llama3.1:70b"
```
:::

## OpenAI-Compatible Providers

These providers all use the OpenAI Chat Completions API format. Ryvos auto-fills the correct `base_url` and headers via `apply_preset_defaults()`.
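In other words, naming one of these presets is enough; you only override `base_url` for self-hosted gateways. A sketch of the kind of table behind `apply_preset_defaults()` (the URLs shown are these providers' well-known public endpoints, but treat the exact mapping as an assumption; the authoritative list lives in the Ryvos source):

```rust
/// Default base URL for a few provider presets (illustrative subset).
fn preset_base_url(provider: &str) -> Option<&'static str> {
    match provider {
        "openai" => Some("https://api.openai.com/v1"),
        "groq" => Some("https://api.groq.com/openai/v1"),
        "together" => Some("https://api.together.xyz/v1"),
        "fireworks" => Some("https://api.fireworks.ai/inference/v1"),
        "deepseek" => Some("https://api.deepseek.com"),
        "ollama" => Some("http://localhost:11434"),
        // Unknown presets must supply an explicit base_url.
        _ => None,
    }
}
```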

### Groq

```toml
[model]
provider = "groq"
model_id = "llama-3.3-70b-versatile"
api_key = "${GROQ_API_KEY}"
```

Known for extremely fast inference. Good for high-throughput tasks.

### OpenRouter

```toml
[model]
provider = "openrouter"
model_id = "anthropic/claude-sonnet-4-20250514"
api_key = "${OPENROUTER_API_KEY}"
```

Access hundreds of models through a single API. Model IDs use the provider/model format.

### Together AI

```toml
[model]
provider = "together"
model_id = "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"
api_key = "${TOGETHER_API_KEY}"
```

### Fireworks AI

```toml
[model]
provider = "fireworks"
model_id = "accounts/fireworks/models/llama-v3p1-70b-instruct"
api_key = "${FIREWORKS_API_KEY}"
```

### Cerebras

```toml
[model]
provider = "cerebras"
model_id = "llama3.1-70b"
api_key = "${CEREBRAS_API_KEY}"
```

Ultra-fast inference on Cerebras wafer-scale hardware.

### xAI (Grok)

```toml
[model]
provider = "xai"
model_id = "grok-2"
api_key = "${XAI_API_KEY}"
```

### Mistral AI

```toml
[model]
provider = "mistral"
model_id = "mistral-large-latest"
api_key = "${MISTRAL_API_KEY}"
```

### Perplexity

```toml
[model]
provider = "perplexity"
model_id = "llama-3.1-sonar-large-128k-online"
api_key = "${PERPLEXITY_API_KEY}"
```

Perplexity models include built-in web search. Useful for research-heavy tasks.

### DeepSeek

```toml
[model]
provider = "deepseek"
model_id = "deepseek-chat"
api_key = "${DEEPSEEK_API_KEY}"
```

## CLI Subprocess Providers

These providers wrap external CLI tools as subprocesses.

### Claude Code

```toml
[model]
provider = "claude_code"
model_id = "claude_code"
```

Wraps the `claude` CLI, which must be installed separately (`npm install -g @anthropic-ai/claude-code`). Useful if you have Claude Code access but not a direct Anthropic API key.

### GitHub Copilot

```toml
[model]
provider = "copilot"
model_id = "copilot"
```

Wraps the `gh copilot` CLI. Requires a GitHub Copilot subscription and the `gh` CLI. Communication uses the JSONL format (rewritten in v0.4.4 for reliability).

## Custom / Self-Hosted

Use any OpenAI-compatible endpoint:

```toml
[model]
provider = "openai"
model_id = "your-model-name"
api_key = "${YOUR_API_KEY}"
base_url = "http://localhost:8080/v1"
extra_headers = { "X-Custom-Header" = "value" }
```

This works with:

- vLLM: `base_url = "http://localhost:8000/v1"`
- llama.cpp server: `base_url = "http://localhost:8080/v1"`
- LocalAI: `base_url = "http://localhost:8080/v1"`
- LiteLLM proxy: `base_url = "http://localhost:4000/v1"`
- Text Generation Inference: `base_url = "http://localhost:8080/v1"`

## Fallback Chain

Configure multiple fallback models for resilience:

```toml
[model]
provider = "anthropic"
model_id = "claude-sonnet-4-20250514"
api_key = "${ANTHROPIC_API_KEY}"

[[fallback_models]]
provider = "openai"
model_id = "gpt-4o"
api_key = "${OPENAI_API_KEY}"

[[fallback_models]]
provider = "ollama"
model_id = "llama3.1:70b"
```

Ryvos tries each model in order when the previous one fails (network error, rate limit, timeout). The fallback is transparent to the conversation.
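The ordering can be sketched as a simple loop over the configured models (a minimal sketch; the real implementation also classifies errors, so that, for example, an invalid API key fails fast instead of cascading):

```rust
/// Try each configured model in order; return the first success.
fn complete_with_fallback<F>(models: &[F], prompt: &str) -> Result<String, String>
where
    F: Fn(&str) -> Result<String, String>,
{
    let mut last_err = String::from("no models configured");
    for model in models {
        match model(prompt) {
            Ok(text) => return Ok(text),
            // Record the failure and fall through to the next model.
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```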

## Streaming Architecture

All providers implement the `LlmClient` trait with streaming:

```text
LLM Provider --> SSE/JSONL stream --> StreamDelta events --> Agent Loop
```

`StreamDelta` variants:

- `TextDelta` — Partial text response
- `ThinkingDelta` — Extended thinking content (Anthropic)
- `ToolUseStart` — Beginning of a tool call
- `ToolInputDelta` — Partial tool input JSON
- `Stop` — Stream complete
- `Usage` — Token counts
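In Rust terms, the variant list above corresponds to an enum along these lines (field shapes are assumptions for illustration; only the variant names come from the list):

```rust
#[allow(dead_code)]
enum StreamDelta {
    TextDelta(String),
    ThinkingDelta(String),
    ToolUseStart { name: String },
    ToolInputDelta(String),
    Stop,
    Usage { input_tokens: u32, output_tokens: u32 },
}

/// Fold a finished stream into the user-visible text.
fn collect_text(deltas: &[StreamDelta]) -> String {
    let mut text = String::new();
    for delta in deltas {
        if let StreamDelta::TextDelta(chunk) = delta {
            text.push_str(chunk);
        }
    }
    text
}
```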

The streaming system includes a `RetryingClient` wrapper for automatic exponential backoff on transient failures.
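A typical exponential backoff schedule for such a wrapper looks like the following (the base delay and cap are illustrative values, not Ryvos's actual retry parameters):

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
fn backoff_delay(attempt: u32) -> Duration {
    let base_ms: u64 = 500;
    let cap_ms: u64 = 30_000;
    // Clamp the shift so the multiplication cannot overflow.
    let delay_ms = base_ms.saturating_mul(1u64 << attempt.min(16));
    Duration::from_millis(delay_ms.min(cap_ms))
}
```

Production retry loops usually also add jitter so concurrent clients don't retry in lockstep.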

## Verify Your Provider

After configuring a provider, verify it works:

```bash
ryvos doctor
```

The `doctor` command tests API connectivity for your primary and all fallback models.

## Next Steps