# Multi-LLM Orchestration & Routing in ARAL
*ARAL v1.2+ — Intelligent routing, fallback, aggregation, and optimization across multiple LLM providers*
## Overview
ARAL supports advanced multi-LLM orchestration, enabling agents to:
- ✅ Use multiple LLM providers (OpenAI, Anthropic, Azure, Google, Cohere, etc.)
- ✅ Route requests intelligently (specialized, cost-optimized, quality-first, latency-first)
- ✅ Blend models with custom weights (weighted ponderation)
- ✅ Aggregate responses (best-of-n, ensemble, consensus, voting)
- ✅ Fall back automatically (provider A → B → C on failure)
- ✅ Balance load (distribute requests across providers)
- ✅ Optimize cost and latency (budget limits, max latency)
- ✅ Route by specialty (task-specific model selection)
- ✅ Customize prompts per LLM (model-specific prompts)
- ✅ Moderate content (safety filters, guardrails)
- ✅ Track costs (per-provider usage and spend monitoring)
## ✅ Yes, ARAL Supports Multi-LLM!
### Current Support
| Provider | Models | Status |
|---|---|---|
| OpenAI | GPT-4, GPT-4-turbo, GPT-3.5-turbo | ✅ |
| Azure | Azure OpenAI models | ✅ |
| Anthropic | Claude 4 (Opus, Sonnet), Claude 3 | ✅ |
| Google | Gemini Pro, Gemini Ultra | 🔜 |
| Cohere | Command, Command-R | 🔜 |
| Mistral | Mistral Large, Medium, Small | 🔜 |
| Local | Ollama, LM Studio, vLLM | 🔜 |
## Architecture
```
┌────────────────────────────────────────────────────┐
│ LLM Orchestrator (Layer 4)                         │
│ ┌──────────────────────────────────────────────┐   │
│ │ Router (Specialized Routing)                 │   │
│ │ - Task-based model selection                 │   │
│ │ - Cost optimization                          │   │
│ │ - Quality-based selection                    │   │
│ │ - Latency optimization                       │   │
│ │ - Load balancing                             │   │
│ │ - Priority-based routing                     │   │
│ └──────────────────────────────────────────────┘   │
│ ┌──────────────────────────────────────────────┐   │
│ │ Aggregation Engine                           │   │
│ │ - Best-of-N selection                        │   │
│ │ - Ensemble combination                       │   │
│ │ - Consensus voting                           │   │
│ │ - Weighted blending                          │   │
│ │ - Quality-based ranking                      │   │
│ └──────────────────────────────────────────────┘   │
│ ┌──────────────────────────────────────────────┐   │
│ │ Fallback Manager                             │   │
│ │ - Primary → Secondary → Tertiary             │   │
│ │ - Circuit breaker per provider               │   │
│ │ - Health monitoring                          │   │
│ └──────────────────────────────────────────────┘   │
│ ┌──────────────────────────────────────────────┐   │
│ │ Optimization Layer                           │   │
│ │ - Cost tracking & budgets                    │   │
│ │ - Latency monitoring                         │   │
│ │ - Quality scoring                            │   │
│ │ - Performance analytics                      │   │
│ └──────────────────────────────────────────────┘   │
│ ┌──────────────────────────────────────────────┐   │
│ │ Moderation Layer                             │   │
│ │ - Content safety filters                     │   │
│ │ - PII detection                              │   │
│ │ - Guardrails enforcement                     │   │
│ └──────────────────────────────────────────────┘   │
└────────────────────────────────────────────────────┘
           │              │              │
           ▼              ▼              ▼
     ┌─────────┐    ┌─────────┐    ┌─────────┐
     │ OpenAI  │    │Anthropic│    │  Azure  │
     └─────────┘    └─────────┘    └─────────┘
```
## Configuration

### Basic Multi-Provider Setup
```typescript
import { ARALAgent, LLMOrchestrator } from "@aral-standard/sdk";

const agent = new ARALAgent({
  persona: "assistant.json",
  llm: new LLMOrchestrator({
    providers: [
      {
        name: "openai",
        type: "openai",
        apiKey: process.env.OPENAI_API_KEY,
        models: ["gpt-4", "gpt-3.5-turbo"],
        priority: 1,
        cost_per_1k_tokens: { input: 0.03, output: 0.06 },
      },
      {
        name: "anthropic",
        type: "anthropic",
        apiKey: process.env.ANTHROPIC_API_KEY,
        models: ["claude-3-opus", "claude-3-sonnet"],
        priority: 2,
        cost_per_1k_tokens: { input: 0.015, output: 0.075 },
      },
      {
        name: "azure",
        type: "azure",
        endpoint: process.env.AZURE_ENDPOINT,
        apiKey: process.env.AZURE_API_KEY,
        models: ["gpt-4-azure"],
        priority: 3,
        cost_per_1k_tokens: { input: 0.03, output: 0.06 },
      },
    ],
    routing_strategy: "cost_optimized",
    fallback_enabled: true,
    moderation: {
      enabled: true,
      provider: "openai",
    },
  }),
});
```
### Python SDK Example

```python
import os

from aral import ARALAgent, LLMOrchestrator, ProviderConfig

agent = ARALAgent(
    persona_id="assistant",
    llm=LLMOrchestrator(
        providers=[
            ProviderConfig(
                name="openai",
                type="openai",
                api_key=os.getenv("OPENAI_API_KEY"),
                models=["gpt-4", "gpt-3.5-turbo"],
                priority=1,
                cost_per_1k_tokens={"input": 0.03, "output": 0.06}
            ),
            ProviderConfig(
                name="anthropic",
                type="anthropic",
                api_key=os.getenv("ANTHROPIC_API_KEY"),
                models=["claude-3-opus", "claude-3-sonnet"],
                priority=2,
                cost_per_1k_tokens={"input": 0.015, "output": 0.075}
            )
        ],
        routing_strategy="cost_optimized",
        fallback_enabled=True,
        moderation={"enabled": True, "provider": "openai"}
    )
)
```
## Routing Strategies

### 1. Cost-Optimized Routing
Route to the cheapest model that meets quality requirements:
```typescript
const orchestrator = new LLMOrchestrator({
  routing_strategy: "cost_optimized",
  quality_threshold: 0.85, // Min quality score
  providers: [
    { name: "gpt-3.5", cost: 0.002, quality: 0.8 },        // Cheapest, below threshold
    { name: "claude-sonnet", cost: 0.015, quality: 0.88 }, // ✅ Selected (cheapest meeting threshold)
    { name: "gpt-4", cost: 0.03, quality: 0.95 },          // Most expensive
  ],
});

// Result: Uses claude-sonnet (meets quality + cheapest)
```
### 2. Quality-First Routing

Always use the highest-quality model:
```typescript
const orchestrator = new LLMOrchestrator({
  routing_strategy: "quality_first",
  providers: [
    { name: "gpt-3.5", quality: 0.8 },
    { name: "claude-sonnet", quality: 0.88 },
    { name: "gpt-4", quality: 0.95 }, // ✅ Always selected
  ],
});
```
### 3. Latency-Optimized Routing

Route to the fastest-responding provider:
```typescript
const orchestrator = new LLMOrchestrator({
  routing_strategy: "latency_first",
  latency_tracking: true,
  providers: [
    { name: "gpt-4", avg_latency_ms: 1200 },
    { name: "claude-sonnet", avg_latency_ms: 800 }, // ✅ Fastest
    { name: "gpt-3.5", avg_latency_ms: 900 },
  ],
});
```
### 4. Round-Robin Load Balancing

Distribute requests evenly:
```typescript
const orchestrator = new LLMOrchestrator({
  routing_strategy: "round_robin",
  providers: ["gpt-4", "claude-opus", "gpt-3.5"],
});

// Request 1 → gpt-4
// Request 2 → claude-opus
// Request 3 → gpt-3.5
// Request 4 → gpt-4 (cycle repeats)
```
### 5. Context-Aware Routing

Route based on task characteristics:
```typescript
const orchestrator = new LLMOrchestrator({
  routing_strategy: "context_aware",
  rules: [
    {
      condition: (ctx) => ctx.task_type === "code_generation",
      provider: "gpt-4", // Best for coding
    },
    {
      condition: (ctx) => ctx.task_type === "creative_writing",
      provider: "claude-opus", // Best for creative tasks
    },
    {
      condition: (ctx) => ctx.input_length > 100000,
      provider: "claude-opus", // Large context window
    },
    {
      condition: (ctx) => ctx.priority === "low",
      provider: "gpt-3.5", // Cheap for non-critical tasks
    },
  ],
});
```
## Fallback Chains

### Automatic Failover
```typescript
const orchestrator = new LLMOrchestrator({
  fallback_chain: [
    {
      provider: "gpt-4",
      retry_attempts: 2,
      timeout_ms: 30000,
    },
    {
      provider: "claude-opus",
      retry_attempts: 2,
      timeout_ms: 30000,
    },
    {
      provider: "gpt-3.5",
      retry_attempts: 1,
      timeout_ms: 15000,
    },
  ],
  circuit_breaker: {
    failure_threshold: 5,
    reset_timeout_ms: 60000,
  },
});

// Try gpt-4 (2 retries) → if fails → claude-opus (2 retries) → gpt-3.5
```
### Python Example

```python
orchestrator = LLMOrchestrator(
    fallback_chain=[
        {"provider": "gpt-4", "retry_attempts": 2},
        {"provider": "claude-opus", "retry_attempts": 2},
        {"provider": "gpt-3.5", "retry_attempts": 1}
    ],
    circuit_breaker={
        "failure_threshold": 5,
        "reset_timeout_ms": 60000
    }
)

try:
    response = await orchestrator.complete("Analyze this data...")
except AllProvidersFailedError as e:
    # All providers in chain failed
    logger.error(f"All providers failed: {e.failures}")
```
## Consensus Mode

Query multiple models and require agreement:
### Example: Critical Decision
```typescript
const orchestrator = new LLMOrchestrator({
  mode: "consensus",
  consensus_config: {
    providers: ["gpt-4", "claude-opus", "gemini-ultra"],
    min_agreement: 0.67, // Require 2/3 agreement
    voting_method: "majority",
    tie_breaker: "highest_confidence",
  },
});

const result = await orchestrator.complete({
  prompt: "Is this medical diagnosis correct?",
  context: diagnosticData,
});

// Result includes:
// - consensus_reached: true/false
// - agreement_score: 0.67
// - individual_responses: [...]
// - final_decision: "..."
// - confidence: 0.85
```
### Use Cases for Consensus Mode

- 🏥 Medical diagnoses
- ⚖️ Legal opinions
- 💰 Financial decisions
- 🔐 Security assessments
- 🎯 High-stakes recommendations
## Advanced Routing Strategies
ARAL v1.2+ supports advanced routing strategies that go beyond simple priority-based selection, enabling intelligent model routing based on task type, cost, quality, and latency requirements.
### Available Routing Strategies
#### 1. Single (Default for Simple Cases)
Routes all requests to a single primary model.
```typescript
llm_config: {
  providers: [
    { provider: "openai", model: "gpt-4", priority: 1 }
  ],
  routing_strategy: "single"
}
```
#### 2. Priority-Based

Routes to the highest priority available provider, with automatic fallback.
```typescript
llm_config: {
  providers: [
    { provider: "anthropic", model: "claude-opus-4", priority: 1 },
    { provider: "openai", model: "gpt-4", priority: 2, fallback: false },
    { provider: "openai", model: "gpt-3.5-turbo", priority: 3, fallback: true }
  ],
  routing_strategy: "priority_based"
}
```

**Use Cases:**
- General purpose workflows with fallback
- Quality-first approach with cost-effective fallbacks
#### 3. Specialized (Task-Based Routing)
Routes requests to different models based on task type using routing rules.
```typescript
llm_config: {
  providers: [
    { provider: "anthropic", model: "claude-opus-4", weight: 0.5 },
    { provider: "openai", model: "gpt-4", weight: 0.3 },
    { provider: "anthropic", model: "claude-sonnet-4", weight: 0.2 }
  ],
  routing_strategy: "specialized"
},
extensions: {
  routing_rules: {
    "brainstorming": "claude-opus-4",
    "long_form_content": "gpt-4",
    "refinement": "claude-sonnet-4",
    "code_generation": "gpt-4",
    "creative_fiction": "claude-opus-4",
    "quick_summary": "gpt-3.5-turbo"
  }
}
```

**Use Cases:**
- Creative applications with distinct workflow stages
- Code generation + documentation
- Research (search) + analysis (reasoning)
- Multi-stage content pipelines
#### 4. Cost-Optimized
Prefers cheaper models while respecting quality thresholds.
```typescript
llm_config: {
  providers: [
    { provider: "openai", model: "gpt-3.5-turbo", priority: 1 },
    { provider: "anthropic", model: "claude-sonnet-4", priority: 2 },
    { provider: "anthropic", model: "claude-opus-4", priority: 3 }
  ],
  routing_strategy: "cost_optimized",
  cost_optimization: {
    enabled: true,
    max_cost_per_request: 0.10,
    prefer_cheaper: true,
    budget_limit: 50.0
  }
}
```

**Use Cases:**
- High-volume production applications
- Budget-constrained projects
- Simple queries that don’t require top-tier models
#### 5. Quality-First
Always routes to the highest quality model regardless of cost.
```typescript
llm_config: {
  providers: [
    { provider: "anthropic", model: "claude-opus-4", priority: 1 },
    { provider: "openai", model: "gpt-4", priority: 2, fallback: true }
  ],
  routing_strategy: "quality_first"
}
```

**Use Cases:**
- Critical decision-making
- High-stakes content generation
- Complex reasoning tasks
- Medical/legal/financial applications
#### 6. Latency-First
Prioritizes fastest response time.
```typescript
llm_config: {
  providers: [
    { provider: "openai", model: "gpt-3.5-turbo", priority: 1 },
    { provider: "anthropic", model: "claude-sonnet-4", priority: 2 }
  ],
  routing_strategy: "latency_first",
  latency_optimization: {
    enabled: true,
    max_latency_ms: 2000,
    timeout_ms: 5000,
    prefer_faster: true
  }
}
```

**Use Cases:**
- Real-time chat applications
- Interactive UIs requiring instant feedback
- High-throughput APIs
#### 7. Round-Robin
Distributes requests evenly across providers for load balancing.
```typescript
llm_config: {
  providers: [
    { provider: "openai", model: "gpt-4" },
    { provider: "anthropic", model: "claude-opus-4" },
    { provider: "azure", model: "gpt-4" }
  ],
  routing_strategy: "round_robin"
}
```

**Use Cases:**
- Load distribution across multiple API keys
- Testing and comparison
- Avoiding rate limits
#### 8. Ponderation (Weighted Blending)
Combines outputs from multiple models with custom weights.
```typescript
llm_config: {
  providers: [
    { provider: "openai", model: "gpt-5.2", weight: 0.8 },
    { provider: "anthropic", model: "claude-sonnet-4.5", weight: 0.2 }
  ],
  routing_strategy: "ponderation"
}
```

See the LLM Ponderation section below for details.
#### 9. Consensus (Agreement-Based)
Queries multiple models and requires agreement.
```typescript
llm_config: {
  providers: [
    { provider: "openai", model: "gpt-4" },
    { provider: "anthropic", model: "claude-opus-4" },
    { provider: "google", model: "gemini-ultra" }
  ],
  routing_strategy: "consensus",
  aggregation: {
    method: "consensus",
    min_responses: 3,
    selection_criteria: ["agreement_score", "confidence"]
  }
}
```

**Use Cases:**
- High-stakes decisions
- Verification and validation
- Reducing hallucination risk
### Routing Strategy Comparison
| Strategy | Cost | Quality | Speed | Use Case |
|---|---|---|---|---|
| Single | Low | Varies | Fast | Simple applications |
| Priority-Based | Medium | High | Fast | General purpose |
| Specialized | Medium | High | Fast | Task-specific workflows |
| Cost-Optimized | Low | Medium | Fast | High-volume, budget-limited |
| Quality-First | High | Highest | Medium | Critical applications |
| Latency-First | Low | Medium | Fastest | Real-time interactive |
| Round-Robin | Medium | Varies | Fast | Load balancing |
| Ponderation | High | Highest | Slow | Ensemble quality |
| Consensus | High | Highest | Slowest | Verification, high-stakes |
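Strategies can also be changed at runtime via `setRoutingStrategy` (documented in the API Reference below). A minimal sketch, assuming the strategy names from the table above:

```typescript
// Switch routing strategies per request class at runtime.
const orchestrator = new LLMOrchestrator({
  providers: [/* provider configs as shown above */],
});

function routeFor(priority: "low" | "normal" | "critical"): void {
  if (priority === "critical") {
    orchestrator.setRoutingStrategy("quality_first");  // cost is secondary
  } else if (priority === "low") {
    orchestrator.setRoutingStrategy("cost_optimized"); // cheapest acceptable model
  } else {
    orchestrator.setRoutingStrategy("priority_based"); // default fallback chain
  }
}
```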
## Response Aggregation
When using multiple LLM providers simultaneously, ARAL can aggregate responses using various strategies.
### Aggregation Methods
#### 1. First (Default)
Returns the first successful response.
```typescript
aggregation: {
  method: "first"
}
```
#### 2. Best-of-N

Queries multiple models and selects the best response based on criteria.
```typescript
aggregation: {
  method: "best_of_n",
  selection_criteria: ["creativity", "coherence", "engagement", "originality"],
  min_responses: 2,
  max_responses: 3
}
```

**Example: Creative Specialist Persona**
```json
{
  "llm_config": {
    "providers": [
      { "provider": "anthropic", "model": "claude-opus-4", "weight": 0.5 },
      { "provider": "openai", "model": "gpt-4", "weight": 0.3 },
      { "provider": "anthropic", "model": "claude-sonnet-4", "weight": 0.2 }
    ],
    "routing_strategy": "specialized",
    "aggregation": {
      "method": "best_of_n",
      "selection_criteria": [
        "creativity",
        "coherence",
        "originality",
        "engagement"
      ],
      "min_responses": 2,
      "max_responses": 3
    }
  }
}
```

**Selection Criteria Options:**
- `accuracy`: Factual correctness
- `creativity`: Originality and novelty
- `coherence`: Logical flow and structure
- `relevance`: Alignment with prompt
- `engagement`: Reader interest and appeal
- `brevity`: Conciseness
- `detail`: Comprehensiveness
- `clarity`: Ease of understanding
- `technical_depth`: Expert-level detail
- `originality`: Unique perspectives
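Conceptually, best-of-N selection scores each response on the chosen criteria and keeps the top scorer. A minimal sketch (illustrative only; the SDK's actual scoring is internal):

```typescript
// Conceptual sketch of best-of-N selection (not the SDK's actual code).
type Scored = { text: string; scores: Record<string, number> };

function bestOfN(responses: Scored[], criteria: string[]): Scored {
  // Average each response's scores over the chosen criteria.
  const overall = (r: Scored) =>
    criteria.reduce((sum, c) => sum + (r.scores[c] ?? 0), 0) / criteria.length;
  // Assumes at least one response (min_responses >= 1).
  return responses.reduce((best, r) => (overall(r) > overall(best) ? r : best));
}
```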
#### 3. Weighted Blend
Combines responses using provider weights.
```typescript
aggregation: {
  method: "weighted_blend",
  selection_criteria: ["quality", "relevance"]
}
```

Uses the `weight` property from each provider configuration.
#### 4. Majority Vote
Selects the response supported by the majority of models.
```typescript
aggregation: {
  method: "majority_vote",
  min_responses: 3
}
```
#### 5. Consensus

Requires high agreement between all models.
```typescript
aggregation: {
  method: "consensus",
  selection_criteria: ["agreement_score", "confidence"],
  min_responses: 3
}
```
#### 6. Ensemble

Combines strengths of multiple responses into a synthesized output.
```typescript
aggregation: {
  method: "ensemble",
  selection_criteria: ["accuracy", "creativity", "completeness"],
  min_responses: 2,
  max_responses: 4
}
```
### Aggregation Strategy Comparison

| Method | Speed | Cost | Quality | Use Case |
|---|---|---|---|---|
| First | Fastest | Low | Varies | Simple, cost-effective |
| Best-of-N | Slow | High | High | Quality-critical, creative |
| Weighted Blend | Slow | High | High | Balanced ensemble |
| Majority Vote | Medium | High | Medium | Objective decisions |
| Consensus | Slowest | Highest | Highest | Mission-critical, verification |
| Ensemble | Slowest | Highest | Highest | Comprehensive analysis |
## Model-Specific Prompts
ARAL v1.2+ supports customizing prompts per LLM model, enabling specialized instructions that leverage each model’s unique strengths.
### Configuration
```json
{
  "prompts": {
    "system": "You are a Creative Specialist...",
    "prefix": "Creative task:\n\n",
    "llm_specific": {
      "claude-opus-4": {
        "system": "You excel at deep, nuanced creative work with rich narrative exploration.",
        "prefix": "For this creative challenge, bring your deepest creative capabilities:\n\n"
      },
      "gpt-4": {
        "system": "You excel at structured creativity with clear organization.",
        "prefix": "Creative task requiring structured innovation:\n\n"
      },
      "claude-sonnet-4": {
        "system": "You provide balanced creative refinement, enhancing coherence and polish.",
        "prefix": "Creative refinement task:\n\n"
      }
    }
  }
}
```
### Use Cases

**1. Leveraging Model Strengths**
```typescript
// Claude Opus: Deep creative exploration
"claude-opus-4": {
  "system": "Focus on deep creative exploration with vivid imagery and emotional resonance."
}

// GPT-4: Structured problem-solving
"gpt-4": {
  "system": "Focus on clear structure, logical organization, and actionable insights."
}

// Claude Sonnet: Balance and refinement
"claude-sonnet-4": {
  "system": "Focus on balance, coherence, and polishing for professional quality."
}
```

**2. Task-Specific Specialization**
```json
{
  "extensions": {
    "routing_rules": {
      "brainstorming": "claude-opus-4",
      "structuring": "gpt-4",
      "refinement": "claude-sonnet-4"
    }
  },
  "prompts": {
    "llm_specific": {
      "claude-opus-4": {
        "prefix": "BRAINSTORM MODE: Generate wild, creative ideas without constraints.\n\n"
      },
      "gpt-4": {
        "prefix": "STRUCTURE MODE: Organize and structure the content logically.\n\n"
      },
      "claude-sonnet-4": {
        "prefix": "REFINEMENT MODE: Polish and enhance for clarity and quality.\n\n"
      }
    }
  }
}
```

**3. Output Format Customization**
```json
{
  "llm_specific": {
    "gpt-4": {
      "suffix": "\n\nProvide response in JSON format."
    },
    "claude-opus-4": {
      "suffix": "\n\nProvide response in markdown with rich formatting."
    }
  }
}
```
## LLM Ponderation (Weighted Blending)
### Overview
Ponderation combines outputs from multiple LLMs with custom weights, creating ensemble responses that leverage each model’s strengths.
Example: GPT-5.2 = 0.8 + Claude Sonnet 4.5 = 0.2
- 80% weight to GPT-5.2 (strong reasoning)
- 20% weight to Claude (creativity, nuance)
### Basic Ponderation
```typescript
const orchestrator = new LLMOrchestrator({
  mode: "ponderation",
  ponderation: {
    enabled: true,
    weights: {
      "gpt-5.2": 0.8,
      "claude-sonnet-4.5": 0.2,
    },
    merge_strategy: "weighted_blend",
    normalize: true, // Auto-normalize weights to sum to 1.0
  },
});

const result = await orchestrator.complete({
  prompt: "Explain quantum computing in simple terms",
});

// Result: 80% GPT-5.2 + 20% Claude Sonnet 4.5 blended response
```
### Python Example

```python
from aral import LLMOrchestrator, PonderationConfig

orchestrator = LLMOrchestrator(
    mode='ponderation',
    ponderation=PonderationConfig(
        enabled=True,
        weights={
            'gpt-5.2': 0.8,
            'claude-sonnet-4.5': 0.2
        },
        merge_strategy='weighted_blend',
        normalize=True
    )
)

response = await orchestrator.complete(
    "Explain quantum computing in simple terms"
)

# Response combines:
# - GPT-5.2's technical accuracy (80%)
# - Claude's clear explanations (20%)
```
### Advanced Ponderation Strategies

#### 1. Task-Adaptive Weighting
Different weights for different task types:
```typescript
const orchestrator = new LLMOrchestrator({
  mode: "ponderation",
  adaptive_weighting: {
    code_generation: {
      "gpt-4": 0.7,
      "claude-opus": 0.2,
      codellama: 0.1,
    },
    creative_writing: {
      "claude-opus": 0.6,
      "gpt-4": 0.3,
      "mistral-large": 0.1,
    },
    data_analysis: {
      "gpt-4": 0.5,
      "claude-sonnet": 0.3,
      "gemini-pro": 0.2,
    },
  },
});

// Automatically selects weights based on detected task type
const result = await orchestrator.complete({
  prompt: "Write a Python function to analyze sales data",
  task_type: "code_generation", // Uses code_generation weights
});
```
#### 2. Confidence-Based Dynamic Weighting

Adjust weights based on model confidence scores:
```typescript
const orchestrator = new LLMOrchestrator({
  mode: "ponderation",
  dynamic_weighting: {
    enabled: true,
    base_weights: {
      "gpt-5.2": 0.5,
      "claude-sonnet-4.5": 0.5,
    },
    confidence_threshold: 0.8,
    adjustment_factor: 0.2, // Boost weight by 20% if confidence > threshold
  },
});

// If GPT-5.2 has confidence 0.95 and Claude has 0.75:
// GPT-5.2: 0.5 + (0.2 * 0.5) = 0.6
// Claude:  0.5 - (0.2 * 0.5) = 0.4
// (Automatically normalized to sum to 1.0)
```
#### 3. Multi-Stage Ponderation

Different weights for different stages of processing:
```typescript
const pipeline = new LLMPipeline([
  {
    stage: "research",
    ponderation: {
      "gpt-4": 0.4,
      "claude-opus": 0.4,
      "gemini-pro": 0.2,
    },
  },
  {
    stage: "analysis",
    ponderation: {
      "gpt-5.2": 0.7,
      "claude-sonnet": 0.3,
    },
  },
  {
    stage: "synthesis",
    ponderation: {
      "claude-opus": 0.6,
      "gpt-4": 0.4,
    },
  },
]);

const result = await pipeline.execute(inputData);
```
### Merge Strategies

| Strategy | Description | Use Case |
|---|---|---|
| weighted_blend | Combine text outputs with weights | General responses |
| weighted_voting | Vote on discrete choices | Classification, yes/no |
| weighted_average | Average numerical outputs | Scoring, predictions |
| confidence_weighted | Use confidence scores as weights | Dynamic blending |
| best_of_ensemble | Select highest-confidence response | Quality filtering |
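For discrete outputs (classification, yes/no), `weighted_voting` can be pictured as summing provider weights per candidate answer and keeping the heaviest. A minimal sketch; the `weightedVote` helper is illustrative, not an SDK export:

```typescript
// Illustrative weighted voting over discrete answers (not an SDK export).
function weightedVote(votes: { answer: string; weight: number }[]): string {
  const tally = new Map<string, number>();
  for (const { answer, weight } of votes) {
    tally.set(answer, (tally.get(answer) ?? 0) + weight);
  }
  // Return the answer with the highest total weight.
  return [...tally.entries()].sort((a, b) => b[1] - a[1])[0][0];
}

// weightedVote([
//   { answer: "yes", weight: 0.5 },
//   { answer: "no", weight: 0.3 },
//   { answer: "yes", weight: 0.2 },
// ]) → "yes" (0.7 vs 0.3)
```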
### Weighted Blend Example
```typescript
// Query both models
const gptResponse = await gpt52.complete(prompt);
const claudeResponse = await claudeSonnet.complete(prompt);

// Blend with 80/20 weights
const blended = weightedBlend([
  { response: gptResponse, weight: 0.8 },
  { response: claudeResponse, weight: 0.2 },
]);

console.log(blended.text);
// Output combines 80% GPT + 20% Claude
// Preserves style/tone weighted toward GPT
// Incorporates Claude's insights at 20%
```
### Practical Use Cases

#### 1. Balanced Accuracy + Creativity
```typescript
// 70% accuracy-focused, 30% creativity-focused
ponderation: {
  'gpt-4': 0.7,       // Strong reasoning
  'claude-opus': 0.3  // Creative insights
}
```

#### 2. Cost Optimization with Quality
```typescript
// 90% cheap model, 10% expensive for quality control
ponderation: {
  'gpt-3.5-turbo': 0.9, // Cheap
  'gpt-4': 0.1          // Quality check
}
```

#### 3. Domain Expertise Blending
```typescript
// Medical diagnosis: blend specialist models
ponderation: {
  'medical-llm': 0.6,  // Domain-specific
  'gpt-4': 0.3,        // General reasoning
  'claude-opus': 0.1   // Safety check
}
```

#### 4. Multi-Language Support
```typescript
// Translation with native speaker quality
ponderation: {
  'gpt-4': 0.5,                 // Technical accuracy
  'claude-multilingual': 0.3,   // Fluency
  'specialized-translator': 0.2 // Domain terms
}
```

### Performance Considerations
**Pros:**
- ✅ Better quality than single model
- ✅ Reduces individual model biases
- ✅ Leverages complementary strengths
- ✅ More robust than single-model dependency
**Cons:**
- ❌ Higher latency (queries multiple models)
- ❌ Increased cost (multiple API calls)
- ❌ Complex error handling
**Optimization:**
```typescript
// Parallel execution for speed
const orchestrator = new LLMOrchestrator({
  ponderation: {
    weights: { "gpt-5.2": 0.8, "claude-sonnet": 0.2 },
    execution: "parallel", // Query simultaneously
    timeout_ms: 30000,
    cache_enabled: true, // Cache repeated queries
  },
});
```
### Validation & Normalization

```typescript
// ARAL automatically validates and normalizes weights
const orchestrator = new LLMOrchestrator({
  ponderation: {
    weights: {
      "gpt-5.2": 4,       // Raw weight
      "claude-sonnet": 1, // Raw weight
    },
    normalize: true, // Auto-converts to 0.8 and 0.2
  },
});

// Validation rules:
// - Weights must be positive
// - Sum must be > 0
// - Auto-normalized to sum to 1.0
// - Warning if imbalance > 0.95 (one model dominates)
```
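The rules above can be read as a small normalization routine. A minimal sketch (the `normalizeWeights` helper is illustrative, not part of the SDK):

```typescript
// Illustrative sketch of the validation rules above (not the SDK's actual code).
function normalizeWeights(weights: Record<string, number>): Record<string, number> {
  const entries = Object.entries(weights);
  if (entries.some(([, w]) => w <= 0)) {
    throw new Error("Weights must be positive");
  }
  const sum = entries.reduce((acc, [, w]) => acc + w, 0);
  if (sum <= 0) {
    throw new Error("Weight sum must be > 0"); // e.g. empty weight map
  }
  const normalized = Object.fromEntries(
    entries.map(([name, w]) => [name, w / sum])
  );
  // Warn when one model dominates the blend (> 0.95 after normalization)
  if (Object.values(normalized).some((w) => w > 0.95)) {
    console.warn("Weight imbalance: one model dominates the blend");
  }
  return normalized;
}

// normalizeWeights({ "gpt-5.2": 4, "claude-sonnet": 1 })
// → { "gpt-5.2": 0.8, "claude-sonnet": 0.2 }
```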
## Content Moderation & Safety

### Built-in Moderation
```typescript
const orchestrator = new LLMOrchestrator({
  moderation: {
    enabled: true,
    provider: 'openai', // Use OpenAI moderation API

    input_checks: {
      hate_speech: true,
      sexual_content: true,
      violence: true,
      self_harm: true,
      pii_detection: true
    },

    output_checks: {
      factual_accuracy: false, // Requires external fact-checker
      bias_detection: true,
      toxicity_threshold: 0.7
    },

    actions: {
      block_on_violation: true,
      log_violations: true,
      notify_admin: true
    }
  }
});

// Example: Blocked harmful request
try {
  await orchestrator.complete("How to make explosives");
} catch (e) {
  if (e instanceof ModerationViolationError) {
    console.log(`Blocked: ${e.category} (score: ${e.confidence})`);
    // Blocked: violence/dangerous-content (score: 0.95)
  }
}
```
### Custom Guardrails

```python
from aral import (
    BiasDetectionGuardrail,
    Guardrail,
    GuardrailResult,
    LLMOrchestrator,
    ToxicityGuardrail,
)

class PIIGuardrail(Guardrail):
    """Detect and block personally identifiable information."""

    def check_input(self, text: str) -> GuardrailResult:
        # Check for SSN, credit cards, etc.
        if self.contains_pii(text):
            return GuardrailResult(
                blocked=True,
                reason="PII detected in input",
                violations=["ssn", "credit_card"]
            )
        return GuardrailResult(blocked=False)

    def check_output(self, text: str) -> GuardrailResult:
        # Redact any PII in output
        if self.contains_pii(text):
            return GuardrailResult(
                blocked=False,
                modified=True,
                text=self.redact_pii(text)
            )
        return GuardrailResult(blocked=False)

orchestrator = LLMOrchestrator(
    guardrails=[
        PIIGuardrail(),
        ToxicityGuardrail(threshold=0.8),
        BiasDetectionGuardrail()
    ]
)
```
## Cost Tracking & Monitoring

### Per-Provider Analytics
```typescript
const orchestrator = new LLMOrchestrator({
  tracking: {
    enabled: true,
    metrics: ["cost", "latency", "tokens", "errors"],
  },
});

// After some usage
const stats = await orchestrator.getStats();

console.log(stats);
// {
//   openai: {
//     requests: 150,
//     total_cost: 4.52,
//     avg_latency_ms: 1100,
//     tokens_used: 150000,
//     errors: 2
//   },
//   anthropic: {
//     requests: 50,
//     total_cost: 1.23,
//     avg_latency_ms: 850,
//     tokens_used: 82000,
//     errors: 0
//   }
// }
```
### Budget Limits

```typescript
const orchestrator = new LLMOrchestrator({
  budget: {
    daily_limit: 100.0,     // $100/day
    per_request_limit: 0.5, // $0.50/request max
    alert_threshold: 0.8,   // Alert at 80% of budget
    action_on_exceed: "switch_to_cheapest", // or 'block'
  },
});

// When approaching limit:
orchestrator.on("budget_alert", (event) => {
  console.log(`Budget at ${event.percentage}% - switching to cheaper models`);
});
```
## Advanced Patterns

### 1. Multi-Stage Pipeline
Use different models for different stages:
```typescript
const pipeline = new LLMPipeline([
  {
    stage: "data_collection",
    provider: "gpt-3.5", // Fast and cheap for simple tasks
    prompt: "Extract key facts from: {{input}}",
  },
  {
    stage: "analysis",
    provider: "gpt-4", // High quality for analysis
    prompt: "Analyze these facts: {{previous_output}}",
  },
  {
    stage: "report_writing",
    provider: "claude-opus", // Best for creative writing
    prompt: "Write a comprehensive report: {{previous_output}}",
  },
]);

const result = await pipeline.execute(inputData);
```
### 2. Parallel Processing

Query multiple models simultaneously:
```typescript
const orchestrator = new LLMOrchestrator({
  mode: "parallel",
  providers: ["gpt-4", "claude-opus", "gemini-pro"],
});

const results = await orchestrator.completeParallel({
  prompt: "What are the top 3 risks in this contract?",
  aggregation: "merge_unique", // Combine unique insights
});

// Result contains insights from all 3 models
```
### 3. A/B Testing

Compare model performance:
```typescript
const orchestrator = new LLMOrchestrator({
  ab_testing: {
    enabled: true,
    variants: [
      { name: "control", provider: "gpt-4", traffic: 0.5 },
      { name: "variant", provider: "claude-opus", traffic: 0.5 },
    ],
    metrics: ["quality", "latency", "cost", "user_satisfaction"],
  },
});

// Automatically routes traffic and collects metrics
```
## Best Practices

### 1. Use Fallback Chains
```typescript
// ✅ Good: Always have fallback
fallback_chain: ["primary", "secondary", "tertiary"];

// ❌ Bad: Single point of failure
providers: ["gpt-4"]; // What if OpenAI is down?
```

### 2. Enable Moderation for User-Facing Apps
```typescript
// ✅ Good: Protect users and comply with regulations
moderation: { enabled: true, provider: 'openai' }

// ❌ Bad: No safety checks
moderation: { enabled: false } // Risky!
```

### 3. Monitor Costs
```typescript
// ✅ Good: Set budget limits
budget: { daily_limit: 100.00, alert_threshold: 0.80 }

// ❌ Bad: Unlimited spending
// Could lead to unexpected bills
```

### 4. Use Quality Thresholds
```typescript
// ✅ Good: Ensure minimum quality
routing_strategy: 'cost_optimized',
quality_threshold: 0.85

// ❌ Bad: Cheapest without quality check
routing_strategy: 'cost_optimized' // Might use low-quality models
```

## Security & Privacy
### 1. API Key Management
```typescript
// ✅ Good: Use environment variables or key vaults
apiKey: process.env.OPENAI_API_KEY;

// ❌ Bad: Hardcoded keys
apiKey: "sk-1234..."; // Security risk!
```

### 2. Data Privacy
```typescript
// ✅ Good: Use providers with data protection
providers: [
  {
    name: "azure",
    data_residency: "eu-west", // GDPR compliant
    no_training: true, // Don't use for model training
  },
];
```

### 3. Audit Logging
```typescript
orchestrator.on("request", (event) => {
  auditLog.record({
    timestamp: event.timestamp,
    provider: event.provider,
    model: event.model,
    input_hash: hash(event.input), // Don't log sensitive data
    user_id: event.user_id,
    cost: event.cost,
  });
});
```
## Compliance

Multi-LLM orchestration complies with:
- ARAL-CORE-1.0 (L4: Reasoning, requirements L4-013 to L4-020)
- GDPR: Data residency, right to erasure
- SOC 2: Audit logging, access controls
- HIPAA: PHI protection, moderation
- ISO 27001: Security controls
## API Reference
### LLMOrchestrator
```typescript
class LLMOrchestrator {
  constructor(config: OrchestratorConfig);

  async complete(
    prompt: string,
    options?: CompletionOptions
  ): Promise<LLMResponse>;

  async completeParallel(
    prompt: string,
    options?: ParallelOptions
  ): Promise<LLMResponse[]>;

  async completeWithConsensus(
    prompt: string,
    options?: ConsensusOptions
  ): Promise<ConsensusResult>;

  getStats(): ProviderStats;
  getProvider(name: string): LLMProvider;
  setRoutingStrategy(strategy: RoutingStrategy): void;

  on(event: OrchestratorEvent, handler: EventHandler): void;
}
```
## Conclusion

Yes, ARAL fully supports multi-LLM orchestration and moderation!
Key capabilities:
- ✅ Multiple provider support (OpenAI, Anthropic, Azure, etc.)
- ✅ Intelligent routing (cost, quality, latency)
- ✅ Automatic fallback chains
- ✅ Load balancing
- ✅ Consensus mode for critical decisions
- ✅ Content moderation and safety guardrails
- ✅ Cost tracking and budget limits
- ✅ A/B testing and performance monitoring
Requirements: ARAL v1.2.0+ with Layer 4 (Reasoning) enhancements.
© 2026 ARAL Standard — CC BY 4.0