BMT-02.03 Executive Summary
BlueMirror.tech | May 2026
Wei Chen has spent eleven years building production ML systems for healthcare companies that are now mostly defunct. She is on a due diligence call because the fund she advises is considering a position. Her question is whether the thirty small language models described in the BlueMirror specification are the system that runs today or the system the company wishes it had. The answer she receives is specific: thirty models is the engineering destination, the portfolio the system is being built toward over twenty-four to thirty-six months. At launch, no proprietary models run in any zone. The system runs entirely on a commercial cloud reasoning layer operating under a healthcare data processing agreement. Wei appreciates the specificity.
Five constraints force the decomposition away from a single model, and they compound.
Latency is first. The Safety Filter must respond within fifteen milliseconds because it gates every output the system produces. A large general-purpose model cannot meet that budget at the edge; a 120-million-parameter model optimized for safety classification can. Privacy is second. The Cognitive State Estimator, which processes the most sensitive data the system handles, must be deployable to Zone 1 for subscribers with a Local Pane, keeping behavioral data on the home device. A monolithic model cannot be split across that privacy boundary. Incrementality is third: when dietary research changes and the Nutrition Advisor needs updating, the team retrains one focused model, whereas a general model would require full retraining. Cost is fourth: total SLM portfolio development runs approximately $600,000 to $1 million over twenty-four months through university research partnerships, far below what a general healthcare model would require. Deployability is fifth: the thirty models distribute across a three-zone compute architecture according to each model's privacy sensitivity and latency requirements, a placement that a single model, deployed as one unit, cannot achieve.
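The gating role of the Safety Filter can be sketched as a wrapper that scores each candidate output and withholds it if the classifier flags it or overruns the fifteen-millisecond budget. This is a minimal sketch under stated assumptions: `classify` is a hypothetical callable returning a probability that the text is unsafe, the 0.5 threshold is hypothetical, and the fail-closed policy on a budget overrun is an assumption, not something the specification states.

```python
import time
from dataclasses import dataclass
from typing import Callable

# 15 ms budget for the Safety Filter, as stated in the text.
SAFETY_BUDGET_S = 0.015


@dataclass
class GateResult:
    allowed: bool
    reason: str
    elapsed_ms: float


def gate_output(
    text: str,
    classify: Callable[[str], float],  # hypothetical: returns P(unsafe) in [0, 1]
    threshold: float = 0.5,            # hypothetical decision threshold
) -> GateResult:
    """Release `text` only if the classifier finishes within budget and scores it safe.

    Assumes a fail-closed policy: an over-budget check blocks the output.
    Elapsed time is measured after the call returns; this sketch does not
    preempt a slow classifier.
    """
    start = time.perf_counter()
    score = classify(text)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if elapsed_ms > SAFETY_BUDGET_S * 1000.0:
        return GateResult(False, "budget_exceeded", elapsed_ms)
    if score >= threshold:
        return GateResult(False, "flagged_unsafe", elapsed_ms)
    return GateResult(True, "ok", elapsed_ms)


if __name__ == "__main__":
    def fast_safe_classifier(text: str) -> float:
        # Stand-in for a small safety model; always returns a low score.
        return 0.02

    print(gate_output("Time for your evening walk.", fast_safe_classifier))
```

A production gate would also need to enforce the deadline (for example by running inference under a timeout) rather than only measuring it afterward.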
The thirty models organize into five functional categories. Core Interaction handles real-time, user-facing language and includes the Response Generator, Intent Classifier, Emotion Detector, Empathy Responder, and Clarification Generator; these range from 100 to 400 million parameters, with inference latency targets under 100 milliseconds. Memory Care, which specializes in cognitive support, includes the Orientation Assessor, Cognitive State Estimator, Confusion Detector, Reminiscence Prompter, and Simplification Engine. Domain Expert, which provides knowledge in focused areas, includes the Medication Advisor, Nutrition Advisor, Exercise Coach, Sleep Pattern Analyzer, Financial Advisor, and Legal Advisor. Routing and Safety, which gates system behavior, includes the MoC Router, Safety Filter, Privacy Filter, Escalation Classifier, and Trust Evaluator; the Safety and Privacy Filters target sub-15-millisecond inference because they gate every output. Specialized Function, which handles sensor and analytical tasks, includes Speech-to-Intent, the Voice Tone Analyzer, Temporal Pattern Detector, Anomaly Detector, and Summary Generator. The total target portfolio is approximately 2 billion parameters, which quantizes to roughly 1 gigabyte, or about 4 bits per parameter.
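The 1-gigabyte figure follows from simple arithmetic: 2 billion parameters at 4 bits each is 8 billion bits, or 1 billion bytes. The short sketch below shows how the footprint scales with precision, assuming weight storage dominates and counting decimal gigabytes. The bit widths are illustrative; the text does not say which quantization scheme the portfolio uses.

```python
# Approximate weight-storage size of the SLM portfolio at several precisions.
# Assumes weights dominate storage; ignores quantization scales, metadata,
# and runtime activation memory.

TOTAL_PARAMS = 2_000_000_000  # ~2 billion parameters, from the text


def weight_size_gb(num_params: int, bits_per_param: float) -> float:
    """Return approximate weight storage in decimal gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit: {weight_size_gb(TOTAL_PARAMS, bits):.1f} GB")
    # 16-bit: 4.0 GB, 8-bit: 2.0 GB, 4-bit: 1.0 GB
```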
The portfolio uses four architecture types, each matched to task requirements. State space models handle temporal pattern recognition at a computational cost that grows linearly with sequence length, which suits the Anomaly Detector and Sleep Pattern Analyzer. Mixture-of-experts models suit classification tasks: only the expert sub-networks relevant to a query activate, so per-query compute stays well below what the total parameter count would imply. Transformers provide full attention over the context for generation tasks that require coherent, context-aware output. Hybrids combine architectures for tasks that need several capabilities at once, such as the Cognitive State Estimator, which needs both continuous monitoring and categorical output.
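To make the linear-complexity claim concrete, here is a minimal scalar linear state space recurrence. It is a textbook illustration, not the architecture of the Anomaly Detector or Sleep Pattern Analyzer; production state space models use learned, multi-dimensional state matrices, and the coefficients `a`, `b`, and `c` here are arbitrary illustrative values.

```python
from typing import List


def linear_ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Scalar linear state space model.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)

    The state is updated once per input step, so the cost is O(L) in the
    sequence length L, with a fixed-size state carried between steps.
    """
    h = 0.0
    ys: List[float] = []
    for x_t in x:
        h = a * h + b * x_t
        ys.append(c * h)
    return ys


if __name__ == "__main__":
    # An impulse decays geometrically when |a| < 1.
    print(linear_ssm_scan([1.0, 0.0, 0.0, 0.0], a=0.5, b=1.0, c=1.0))
    # [1.0, 0.5, 0.25, 0.125]
```

Self-attention, by contrast, compares every position with every other position, so its cost grows quadratically with sequence length; that difference is why long behavioral time series favor the state space design.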
The deployment timeline is concrete. Months 0 to 12: the system runs entirely in Zone 3, accumulating real interaction data, while the partner university teams in India pretrain V0.5 SLMs on synthetic data generated through Zone 3. Months 12 to 18: subscribers who acquire a Local Pane gain Zone 1, where the privacy-critical V0.5 models deploy first. Months 18 to 30: Zone 2 regional nodes deploy, and V1.0 SLMs for routine query classes pass A/B validation against Zone 3. Months 30 to 36: the full portfolio reaches Phase 3 maturity. For a subscriber with all three zones at that point, roughly 15 to 20 percent of inference runs in Zone 1, 55 to 60 percent in Zone 2, and the remaining 20 to 30 percent in Zone 3, which handles queries exceeding regional capacity. For a Zone 3-only subscriber, 100 percent of inference runs in Zone 3 throughout.
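The placement rule implied by the timeline can be sketched as a small routing function. This is a hypothetical policy, assuming three kinds of query (privacy-critical, routine, and everything else) with invented class labels; it is not the implementation of the MoC Router, whose logic this summary does not describe.

```python
from enum import Enum


class Zone(Enum):
    LOCAL = 1     # Zone 1: the subscriber's Local Pane
    REGIONAL = 2  # Zone 2: regional node
    CLOUD = 3     # Zone 3: commercial cloud reasoning layer (always available)


# Hypothetical query-class labels; the specification's actual taxonomy may differ.
PRIVACY_CRITICAL = {"cognitive_state"}
ROUTINE = {"intent", "reminder"}


def route(
    query_class: str,
    has_local_pane: bool,
    has_regional: bool,
    regional_has_capacity: bool = True,
) -> Zone:
    """Pick a zone for one query under a hypothetical placement policy."""
    # Privacy-critical work stays on the home device when one exists.
    if query_class in PRIVACY_CRITICAL and has_local_pane:
        return Zone.LOCAL
    # Routine classes go regional when a node exists and has headroom.
    if query_class in ROUTINE and has_regional and regional_has_capacity:
        return Zone.REGIONAL
    # Everything else, including every query for a Zone 3-only subscriber,
    # runs in Zone 3.
    return Zone.CLOUD


if __name__ == "__main__":
    print(route("cognitive_state", has_local_pane=True, has_regional=True))  # Zone.LOCAL
    print(route("intent", has_local_pane=False, has_regional=False))         # Zone.CLOUD
```

Zone 3 is the unconditional final branch, so a subscriber with no Local Pane and no regional node is always served, and any query that does not qualify for a lower zone lands there.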
Zone 3 is not a transitional state or a degraded fallback; it is permanent infrastructure. The Zone 3-only path is a first-class deployment, not an approximation of a better one. The architecture is designed not to retire Zone 3 but to keep it as one of three permanent tiers.
The full article, including the rationale for each model's architecture choice and measured performance comparisons against the alternatives considered and rejected, is at BlueMirror.tech.
