AutoLLM Router

Automatically route queries to the optimal AI model based on task requirements

Cost Optimization · Smart Model Selection · Multi-Provider · Easy Integration

Stop choosing between dozens of AI models for every query. AutoLLM Router analyzes each request and automatically selects the best model based on cost, performance, and capabilities.

example.py
import asyncio

from autollm_router import LLMRouter

async def main():
    # Initialize with models from your config
    router = LLMRouter()

    # Let AutoLLM choose the optimal model
    response = await router.generate(
        "Explain quantum computing to a 10-year-old"
    )

    print(f"Selected model: {response.model_id}")
    print(f"Cost: ${response.estimated_cost:.5f}")
    print(response.content)

asyncio.run(main())

Key Benefits

Cost Optimization

Reduce your API costs by up to 70% by automatically routing to cost-effective models when premium capabilities aren't needed.

Superior Results

Get better responses by leveraging the unique strengths of different models for specific types of queries.

Simplified Integration

One API to access all major LLM providers, with smart routing handled automatically behind the scenes.

Real-World Cost Savings

AutoLLM Router intelligently selects more affordable models for simple tasks, automatically switching to premium models only when needed.

  • Simple content generation at 1/5 the cost
  • Code review with specialized coding models
  • Fast responses for time-sensitive queries
  • Automated fallbacks when services are disrupted
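The fallback behavior can be sketched in a few lines (hypothetical helper, not the actual AutoLLM Router API; the real router manages this internally):

```python
# Illustrative provider fallback: try each provider in preference order
# and move on when one raises, so a disrupted service degrades gracefully.
def generate_with_fallback(query, providers):
    errors = {}
    for name, call in providers:
        try:
            return call(query)
        except Exception as exc:  # e.g. rate limit, timeout, outage
            errors[name] = exc
    raise RuntimeError(f"All providers failed: {sorted(errors)}")
```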

Monthly Cost Comparison

  • OpenAI-only approach: $2,400
  • With AutoLLM Router: $720

Save up to 70% on API costs

How AutoLLM Router Works

  1. Query Analysis: your query is analyzed to determine its requirements
  2. Model Selection: the best model is selected based on capabilities and constraints
  3. API Management: the request is handled with appropriate provider-specific settings
  4. Results Delivery: the response is returned with metadata on model selection
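The four steps above can be sketched as a minimal routing loop (hypothetical helper names and a deliberately naive task detector, not the actual AutoLLM Router internals):

```python
# Minimal sketch of the four-step routing flow.
def route(query: str, registry: list, max_cost: float = 0.01) -> dict:
    # 1. Query Analysis: naive keyword-based task detection, for illustration
    task = "coding" if any(k in query.lower() for k in ("function", "code", "bug")) else "writing"

    # 2. Model Selection: best capability score within the cost constraint
    affordable = [m for m in registry
                  if m.get("performance", {}).get("cost_per_1k_tokens", 0) <= max_cost]
    best = max(affordable, key=lambda m: m["capabilities"].get(task, 0))

    # 3. API Management: dispatch to the chosen provider (stubbed here)
    content = f"[{best['provider']} response]"

    # 4. Results Delivery: return content plus selection metadata
    return {"model_id": best["id"], "task": task, "content": content}
```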

Intelligent Model Selection

AutoLLM Router maintains a registry of models with detailed capability scores, performance metrics, and cost data.

When a query arrives, the system analyzes its characteristics and matches them against available models, considering:

  • Task type (coding, writing, math, creative)
  • Performance requirements (speed, accuracy)
  • Cost constraints and budget limits
  • Context length and complexity
model_registry.py
# Excerpt from the AutoLLM Router model registry
models = [
    {
        "id": "gpt-4-turbo",
        "provider": "OPENAI",
        "capabilities": {
            "coding": 0.95,
            "math": 0.92,
            "writing": 0.97,
            "creative": 0.95,
            "analysis": 0.96
        },
        "performance": {
            "avg_latency": 2.8,
            "cost_per_1k_tokens": 0.01
        }
    },
    {
        "id": "claude-3-opus",
        "provider": "ANTHROPIC",
        "capabilities": {
            "coding": 0.94,
            "math": 0.88,
            "writing": 0.98,
            "creative": 0.92,
            "analysis": 0.95
        }
    },
    {
        "id": "llama3-70b-8192",
        "provider": "GROQ",
        "capabilities": {
            "coding": 0.93,
            "math": 0.86,
            "writing": 0.92,
            "creative": 0.86,
            "analysis": 0.89
        },
        "performance": {
            "avg_latency": 0.8,
            "cost_per_1k_tokens": 0.0001
        }
    }
]
query_analysis.py
# The Query Analyzer in action
async def analyze_query(query: str, constraints: dict):
    """Analyze query characteristics to find the best model."""
    # models_formatted is a text rendering of the model registry;
    # client is the unified provider client (both defined elsewhere).
    prompt = f"""You are an expert AI model selector.

Available LLMs and their metrics:
{models_formatted}

User query: "{query}"

Constraints: {constraints}

Analyze this query and select the most appropriate model.
Consider query domain, complexity, and user constraints.
"""

    # Use a small, fast model for the selection process
    selector_model = "gpt-3.5-turbo"
    response = await client.generate(selector_model, prompt)

    # Parse the response to get the selected model
    selected = parse_selection(response)
    return {
        "selected_model": selected["model_id"],
        "reasoning": selected["reasoning"],
        "estimated_cost": selected["estimated_cost"]
    }
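The `parse_selection` helper is not shown above; one simple approach, assuming the selector prompt asks the model to reply with a JSON object containing `model_id`, `reasoning`, and `estimated_cost`, is to pull the first JSON object out of the reply:

```python
import json
import re

def parse_selection(response_text: str) -> dict:
    """Extract a JSON object from the selector model's reply (sketch;
    assumes the prompt requested JSON output)."""
    match = re.search(r"\{.*\}", response_text, re.DOTALL)
    if not match:
        raise ValueError("No JSON object found in selector response")
    return json.loads(match.group(0))
```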

Example Analysis

"Write a Python function to calculate the Fibonacci sequence using dynamic programming"

Selected: llama3-70b-8192 (via Groq)

Reasoning: Coding task with medium complexity, fast execution preferred, low cost solution adequate

Alternative Analysis

"Explain different approaches to solving the P vs NP problem"

Selected: claude-3-opus (via Anthropic)

Reasoning: Complex theoretical CS topic requiring advanced reasoning and accuracy

Core Features

Model Registry

Comprehensive catalog of available LLMs with detailed capability scores and performance metrics.

  • Scoring for coding, math, writing, and creative tasks
  • Performance metrics for latency and cost per token
  • Configurable via JSON or YAML files
  • Easy to extend with new models as they're released
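Loading such a registry from a JSON file might look like this (a sketch; the file path is arbitrary and the schema follows the model_registry.py excerpt above):

```python
import json

def load_registry(path):
    """Load models from a JSON registry file and index them by id
    (sketch; schema matches the model_registry.py excerpt)."""
    with open(path) as f:
        models = json.load(f)
    return {m["id"]: m for m in models}
```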

Query Analyzer

An LLM-powered analyzer that determines the best model for each specific task.

  • Smart analysis of query intent and requirements
  • Consideration of user-specified constraints
  • Detailed reasoning for model selection
  • Handles complex mixed queries appropriately

Model Client

Unified API for multiple LLM providers with built-in token counting and cost estimation.

  • Support for OpenAI, Anthropic, Groq, and more
  • Automated token counting for all models
  • Accurate cost tracking and estimation
  • Secure API key management via environment variables
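Cost estimation from the registry's pricing data reduces to simple arithmetic (a sketch; real accounting may price prompt and completion tokens separately):

```python
def estimate_cost(token_count: int, model: dict) -> float:
    """Estimate request cost from the registry's cost_per_1k_tokens
    field (sketch; assumes a single flat per-token rate)."""
    rate = model.get("performance", {}).get("cost_per_1k_tokens", 0.0)
    return token_count / 1000 * rate
```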

CLI & API

Flexible interfaces for both command line usage and direct integration into your applications.

  • Intuitive CLI for quick queries and testing
  • Simple Python API for application integration
  • Async support for high-performance applications
  • Detailed response metadata for transparency
terminal

$ autollm-router query "Explain quantum entanglement simply"

AutoLLM Router: Analyzing query...

Selected model: claude-3-haiku (Anthropic)

Reason: Explanatory query requiring clarity; medium complexity; cost efficiency prioritized

Quantum entanglement is like having two magical coins that always match each other...

Tokens: 245 | Cost: $0.00025 | Processing time: 0.7s

Perfect For

AI Product Builders

Create more powerful and cost-effective AI products by leveraging the right model for each task.

Example: An AI writing assistant that uses affordable models for drafting but premium models for final editing.

Enterprise Teams

Optimize AI costs while maintaining quality across different business applications.

Example: An internal tool that routes customer service queries to affordable models but uses specialized models for technical or complex issues.

Researchers

Experiment with multiple models without constantly switching APIs and configurations.

Example: A research project comparing model performance across different tasks, with unified data collection and processing.

Ready to Optimize Your LLM Usage?

Start saving on API costs while delivering better results with intelligent model routing.