Executive Summary: The Right Architecture for the Right Task

BMT-06.02 Executive Summary
BlueMirror.tech | May 2026

The thirty-model portfolio uses four architecture types because different tasks have fundamentally different computational profiles. Forcing one architecture onto all tasks wastes parameters, increases latency, or sacrifices quality.

Fourteen models use State Space Model (SSM) architectures, built on three shared bases: Mamba-2 (150M parameters) for language and conversation tasks, Mamba-Sensor (80M) for physiological signals, and Mamba-Audio (80M) for voice processing. SSMs process sequential data in O(n) time in sequence length, compared with O(n²) for Transformer self-attention. For continuous tasks such as health monitoring, sleep analysis, and agitation detection, this is the difference between feasible and infeasible on edge hardware. Nine models share the Mamba-2 base and differentiate through specialized task heads, reducing stored parameters from 830 million to 500 million. The honest trade-off: SSMs are sensitive to hyperparameters, require custom CUDA kernels, and have a fraction of the tooling maturity that Transformers enjoy.
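The linear-versus-quadratic scaling described above can be illustrated with a toy example. The sketch below is a scalar, time-invariant state space recurrence, not Mamba's selective scan or any BlueMirror model; the function names and the coefficients `a`, `b`, and `c` are assumptions chosen for illustration. It shows that the recurrence touches each input once and keeps a fixed-size state, while full self-attention computes a score for every pair of positions.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Run a scalar linear state space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    The coefficients are illustrative placeholders, not trained values.
    """
    h = 0.0       # fixed-size state, independent of sequence length
    ys = []
    for x in xs:  # one state update per input element: O(n) total work
        h = a * h + b * x
        ys.append(c * h)
    return ys


def ssm_update_count(n):
    """Number of state updates the recurrence performs for n inputs."""
    return n


def attention_score_count(n):
    """Number of query-key scores full self-attention computes for n tokens."""
    return n * n


if __name__ == "__main__":
    # An impulse decays geometrically through the state.
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5))
    # Work grows linearly for the scan and quadratically for attention.
    for n in (1_000, 10_000):
        print(n, ssm_update_count(n), attention_score_count(n))
```

At 10,000 time steps the scan performs 10,000 updates while full attention would compute 100 million scores, which is the gap that matters for always-on monitoring on constrained hardware.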

Eleven models use Mixture of Experts (MoE) architectures, sharing a 50-million-parameter embedding layer and an 80-million-parameter gating network. The MoE family stores 625 million parameters but activates only 120 million per inference. The Safety Monitor and Privacy Filter bypass routing entirely: their experts are forced active on every query because safety and privacy screening cannot be conditional.
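A minimal sketch of routing with always-on experts follows. It assumes top-k selection over gate scores, which the summary does not specify; the value of `k`, the function name, and every expert name other than the two forced ones are hypothetical.

```python
# Only these two always-on experts come from the text; all other names
# used with this function are hypothetical.
ALWAYS_ON = ("safety_monitor", "privacy_filter")


def select_experts(gate_scores, k=2, always_on=ALWAYS_ON):
    """Return the experts to run: forced experts first, then the top-k routed ones.

    gate_scores maps expert name -> gate score. Forced experts are included
    regardless of their score, so the gate can never switch them off.
    """
    routable = [name for name in gate_scores if name not in always_on]
    routed = sorted(routable, key=lambda name: gate_scores[name], reverse=True)[:k]
    return list(always_on) + routed


if __name__ == "__main__":
    scores = {
        "safety_monitor": 0.01,  # a low score does not matter: forced active
        "privacy_filter": 0.02,
        "intent": 0.60,          # hypothetical expert
        "sentiment": 0.30,       # hypothetical expert
        "topic": 0.07,           # hypothetical expert
    }
    print(select_experts(scores, k=2))
```

Keeping the forced set outside the gate's control means a miscalibrated gate can degrade task quality but cannot skip the screening steps.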

Three models use full Transformer architectures. The Response Generator (150M parameters) produces natural language output where generation quality requires attending to the full input context. The Memory Anchor (75M) uses retrieval-augmented generation. The Context Compressor (75M) performs abstractive summarization at a quality SSM alternatives could not match: at the same compression ratio, an SSM compressor achieved 78% relevance where the Transformer achieved 95%.

Two hybrid models combine architectures: the Speech-to-Intent model uses a Conformer (SSM plus local attention) for audio processing, and the Relationship Mapper uses a graph neural network with cross-attention for social network navigation.

The total portfolio: 1.55 billion stored parameters, 450 million active per inference. The architecture mix matches the work the system actually does: most tasks are monitoring or classification (SSM and MoE territory), with generation (Transformer territory) as the minority. Early prototypes used Transformers for everything. Measurement forced the decomposition: SSMs reduced latency by 40% for monitoring tasks, MoE reduced active parameters by 75% for classification.
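The stored-versus-active figures quoted in this summary can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable and function names are illustrative.

```python
def active_fraction(stored_millions, active_millions):
    """Share of stored parameters that are active per inference."""
    return active_millions / stored_millions


# Figures quoted in the summary, in millions of parameters.
MOE_STORED, MOE_ACTIVE = 625, 120
PORTFOLIO_STORED, PORTFOLIO_ACTIVE = 1550, 450
TRANSFORMER_SIZES = {
    "Response Generator": 150,
    "Memory Anchor": 75,
    "Context Compressor": 75,
}

moe_share = active_fraction(MOE_STORED, MOE_ACTIVE)
portfolio_share = active_fraction(PORTFOLIO_STORED, PORTFOLIO_ACTIVE)
transformer_total = sum(TRANSFORMER_SIZES.values())

if __name__ == "__main__":
    print(f"MoE active share: {moe_share:.1%}")
    print(f"Portfolio active share: {portfolio_share:.1%}")
    print(f"Transformer stored total: {transformer_total}M")
```

By these figures, roughly 19% of the MoE parameters and roughly 29% of the whole portfolio are active on a given inference, and the three Transformers account for 300 million stored parameters. These ratios compare stored against active parameters within the current portfolio; they are a different measurement from the 40% and 75% reductions, which the text appears to report relative to the early all-Transformer prototypes.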

The full article is available at BlueMirror.tech.