Edge Intelligence

The infrastructure architect reviewing the deployment plan flagged the edge/cloud boundary as the critical design decision. Most AI systems default to cloud-first architectures because cloud inference is simpler to manage: one deployment target, unlimited compute, easy scaling. The BlueMirror architecture reverses the default. Edge-first. Cloud when necessary. The reversal is not ideological. It follows from three requirements that cloud-first cannot satisfy.

Privacy requires that sensitive data never leave the device unless the person explicitly consents. Health data, cognitive assessments, financial information, and personal correspondence processed locally stay local. Processed in the cloud, they traverse networks, reside on servers operated by third parties, and create exposure surfaces that no amount of encryption can eliminate entirely. The consent architecture (BMT-05.05) enforces what the person allows to leave. The edge architecture ensures most data never needs to leave in the first place.

Latency requires sub-200-millisecond response for safety-critical functions. A cloud round-trip adds 50 to 150 milliseconds of network latency before inference begins. For the Safety Monitor detecting a potential fall, for the Agitation Detector identifying a behavioral crisis, for the Medication Assistant screening an interaction, the network hop alone can consume most of the time budget, leaving as little as 50 milliseconds for inference itself. Edge inference eliminates the network hop entirely.

Resilience requires that the system continue operating when the internet goes down. The person who needs fall detection during a power outage, medication reminders during a storm, or cognitive support during a network failure cannot wait for connectivity to return. Edge deployment means the critical functions continue operating on device power and local compute.

What runs on the device stays private, responds fast, and works when everything else fails. That is not an optimization. It is the architecture.

The edge/cloud boundary

The thirty-seven models in the portfolio (BMT-06.01) are distributed across four compute tiers based on their privacy sensitivity, latency requirements, and computational demands.

Sensors capture raw signals: vital signs from wearable devices, motion data from accelerometers, audio from microphones. No model inference happens at the sensor tier. The sensor’s job is to capture and transmit data to the edge device with minimal latency and minimal power consumption. Sensor data is the rawest form of personal information: heart rate, movement patterns, sleep cycles, voice recordings. It never leaves the local compute environment.

The edge device runs the majority of inference. Core Interaction models, Memory Care models, and the privacy-critical Domain Expert models run here. The Conversation Manager, Intent Classifier, Emotion Detector, Safety Monitor, and all six Memory Care models are edge-resident because their functions are the most latency-sensitive and the most privacy-critical. The Cognitive State Estimator processes behavioral signals that reveal cognitive function: response latency, linguistic complexity, error patterns. This data is among the most sensitive the system handles, and it never leaves the device. The Health Monitor, Sleep Analyzer, and Exercise Coach run on-device because they process continuous physiological data that should not leave. The Privacy Filter runs exclusively on-device and never in the cloud, because routing privacy screening through a cloud service defeats the purpose of privacy screening.

The quantization level varies by device tier. The primary deployment target runs all models at full precision or FP16. Lower-tier devices run models at 4-bit quantization. Wearable devices run only the most critical models at 2-bit quantization or use heuristic rule-based approximations for functions that cannot tolerate extreme compression. The quantization strategy preserves accuracy where it matters most: safety-critical models maintain higher precision even on constrained devices, while models that affect interaction quality but not safety can tolerate more aggressive compression.
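The tiering rule can be pictured as a small lookup. The following is a minimal sketch, not the deployed configuration: the tier names, the `safety_critical` flag, and the specific precisions chosen for safety-critical models on constrained devices are illustrative assumptions. The text above states only that safety-critical models keep higher precision.

```python
from enum import Enum


class DeviceTier(Enum):
    PRIMARY = "primary"
    LOWER = "lower"
    WEARABLE = "wearable"


# Baseline precision per tier, following the description above.
BASELINE = {
    DeviceTier.PRIMARY: "fp16",
    DeviceTier.LOWER: "int4",
    DeviceTier.WEARABLE: "int2",
}

# Assumption: safety-critical models step up one precision level
# on constrained devices. The actual levels are not given in the text.
SAFETY_OVERRIDE = {
    DeviceTier.LOWER: "int8",
    DeviceTier.WEARABLE: "int4",
}


def select_precision(tier: DeviceTier, safety_critical: bool) -> str:
    """Return the precision label for a model on a given device tier."""
    if safety_critical and tier in SAFETY_OVERRIDE:
        return SAFETY_OVERRIDE[tier]
    return BASELINE[tier]
```

Under these assumptions, a safety-critical model on a wearable would run at 4-bit while an interaction-quality model on the same device would run at 2-bit.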

The NPU local hub handles multi-model coordination when a query requires several models to collaborate. A complex health question might require the Intent Classifier, the MoC Router, the Medication Assistant, the Health Monitor, and the Response Generator to work in sequence. The local hub orchestrates this pipeline without cloud connectivity, managing model loading, context passing, and output composition on the local device’s neural processing unit. For the primary deployment target with 256 gigabytes of unified memory, the full model portfolio occupies less than 2 gigabytes, leaving abundant capacity for concurrent model execution.
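The sequential coordination described above can be sketched as a loop that passes a shared context from one model to the next. This is an illustration only: `run_pipeline`, the stage names, and the stub lambdas standing in for real models are hypothetical, and model loading on the NPU is not modeled here.

```python
from typing import Any, Callable

Stage = tuple[str, Callable[[dict[str, Any]], Any]]


def run_pipeline(stages: list[Stage], query: str) -> dict[str, Any]:
    """Run stages in order; each stage sees the outputs of all earlier stages."""
    context: dict[str, Any] = {"query": query}
    for name, model in stages:
        context[name] = model(context)
    return context


# Stub stages standing in for real models (hypothetical behavior).
stages: list[Stage] = [
    ("intent", lambda ctx: "medication_question"),
    ("route", lambda ctx: ["medication_assistant", "health_monitor"]
     if ctx["intent"] == "medication_question" else []),
    ("response", lambda ctx: f"consulted {len(ctx['route'])} experts"),
]

result = run_pipeline(stages, "Can I take ibuprofen with my blood pressure pills?")
```

Nothing in the loop touches the network, which is the property the local hub depends on.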

The cloud handles complex reasoning that exceeds edge capacity. Long-form response generation for detailed health reports and cross-domain analysis requiring simultaneous context from many domains run in the cloud, and model updates download from it. Approximately 75% of queries are handled entirely locally. The remaining 25% require cloud participation, but even these queries send minimal context: the MoC Router compresses the context package (BMT-05.01) before any cloud transmission, and the consent layer strips any data the person has not authorized for cloud processing. The result is a 95% reduction in cloud compute costs compared to a cloud-first architecture, and a privacy posture that is structurally stronger than any cloud-only system can achieve.
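The routing and consent-stripping step can be sketched as follows. The `Query` type, the `needs_cloud` flag, and the field names are assumptions made for illustration; context compression by the MoC Router is omitted, and the real consent layer is specified in BMT-05.05.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Query:
    text: str
    needs_cloud: bool
    context: dict[str, Any] = field(default_factory=dict)


def strip_unconsented(context: dict[str, Any], consented: set[str]) -> dict[str, Any]:
    """Keep only the context fields the person has authorized for cloud processing."""
    return {key: value for key, value in context.items() if key in consented}


def route(query: Query, consented: set[str]) -> tuple[str, Optional[dict[str, Any]]]:
    """Return the target tier and, for cloud queries, the payload allowed to leave."""
    if not query.needs_cloud:
        return ("edge", None)  # Nothing leaves the device.
    return ("cloud", strip_unconsented(query.context, consented))
```

The filter is an allow-list: a field leaves the device only if it is named in the consented set, so a newly added context field is withheld by default.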

FSSVA: federated validation without sharing data

The thirty-seven models running on thousands of edge devices need continuous validation. Are they performing correctly? Have they drifted from acceptable accuracy? Is a model producing outputs that deviate from expected patterns? Traditional validation requires collecting data centrally, which violates the privacy architecture. FSSVA solves this by federating deviation signals, not data and not model weights.

The Federated Sentinel-Surveillance Validation Architecture operates in two modes. Sentinel mode is lightweight monitoring: each edge node runs periodic validation checks against local held-out test cases and reports only a deviation score to the regional coordinator. The score is a scalar. It contains no patient data, no model weights, no interaction content. It says “this model’s performance on my local validation set has drifted by X percent from baseline.” The regional coordinator aggregates deviation scores across its node population and detects patterns: if deviation scores are increasing across many nodes simultaneously, something systematic is happening. A model update may have introduced a regression. A data distribution shift may be affecting a geographic region. A specific device configuration may be causing inference errors.
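A minimal sketch of the Sentinel-mode signal, assuming the deviation score is the relative accuracy drop from baseline expressed in percent. The function names, the 5 percent threshold, and the 30 percent node fraction are illustrative assumptions, not values from the FSSVA specification.

```python
def deviation_score(baseline_accuracy: float, current_accuracy: float) -> float:
    """Relative drop from baseline on the local validation set, in percent."""
    if baseline_accuracy <= 0:
        raise ValueError("baseline_accuracy must be positive")
    return 100.0 * (baseline_accuracy - current_accuracy) / baseline_accuracy


def cluster_detected(scores: list[float],
                     threshold_pct: float = 5.0,
                     min_fraction: float = 0.3) -> bool:
    """Coordinator-side check: are enough nodes drifting at the same time?"""
    if not scores:
        return False
    drifting = sum(1 for score in scores if score > threshold_pct)
    return drifting / len(scores) >= min_fraction
```

Each node would send only the single float returned by `deviation_score`; the coordinator sees a list of such floats and nothing else.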

Active Surveillance mode triggers when deviation scores exceed thresholds. The affected node runs a more comprehensive validation suite locally and reports detailed deviation metrics, still without sharing patient data. The detailed metrics include per-task accuracy breakdown, latency distribution, and output quality scores. The regional coordinator uses these metrics to diagnose the deviation cause and determine whether a model update, a configuration change, or a targeted intervention is needed.

The mode switching follows an epidemiological model borrowed from public health surveillance. When a deviation cluster is detected in a geographic region, FSSVA increases monitoring density in that region, analogous to ring vaccination: concentrate surveillance resources around the outbreak to contain it before it spreads. Nodes adjacent to the deviation cluster shift from Sentinel to Active mode. Nodes in unaffected regions remain in Sentinel mode, conserving bandwidth and compute. The analogy is precise: in epidemiology, you do not test everyone in the country when an outbreak appears in one city. You test intensively in and around the affected area. FSSVA applies the same principle to model validation.

The mode transitions are automatic. A regional coordinator that detects a deviation cluster above threshold triggers Active mode for affected nodes without requiring human intervention. When deviation scores return to normal, nodes transition back to Sentinel mode. The system self-heals for transient issues and escalates to the cloud learning agent for systemic ones. Human engineers are involved only when the cloud learning agent determines that a model update is needed, a training data problem has been identified, or a hardware-specific issue requires investigation.
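The automatic transitions can be sketched as a two-state machine. The thresholds below are placeholders, and the use of separate enter and exit thresholds (hysteresis, to avoid flapping when a score hovers near the limit) is an assumption of this sketch rather than something the text specifies.

```python
from enum import Enum


class Mode(Enum):
    SENTINEL = "sentinel"
    ACTIVE = "active"


def next_mode(current: Mode, score: float,
              enter_threshold: float = 5.0,
              exit_threshold: float = 2.0) -> Mode:
    """Escalate above enter_threshold; return to Sentinel below exit_threshold."""
    if current is Mode.SENTINEL and score > enter_threshold:
        return Mode.ACTIVE
    if current is Mode.ACTIVE and score < exit_threshold:
        return Mode.SENTINEL
    return current


# Trace a node through a transient deviation.
mode = Mode.SENTINEL
history = []
for score in [1.0, 6.0, 4.0, 3.0, 1.5]:
    mode = next_mode(mode, score)
    history.append(mode)
```

In this trace the node escalates on the second reading, stays in Active mode while scores sit between the two thresholds, and returns to Sentinel mode only on the last reading.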

The bandwidth advantage is dramatic. Traditional federated learning shares model weight updates: millions of floating-point numbers per round. FSSVA shares deviation signals: a few scalars per validation cycle. Orders of magnitude less bandwidth. This matters for edge devices that may be connected over cellular networks, where bandwidth is metered and expensive, or over home internet connections, where upload speeds are limited.
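A back-of-the-envelope comparison shows the scale of the difference. Every number here is an assumption chosen for illustration (a 10-million-parameter model sent as 32-bit floats, and four 64-bit scalars per validation cycle); none comes from the BlueMirror portfolio.

```python
def payload_bytes(num_values: int, bytes_per_value: int) -> int:
    """Raw payload size, ignoring protocol overhead and compression."""
    return num_values * bytes_per_value


# Assumed federated-learning round: 10M weights at 4 bytes each.
weights_payload = payload_bytes(10_000_000, 4)

# Assumed FSSVA cycle: 4 scalars at 8 bytes each.
signal_payload = payload_bytes(4, 8)

ratio = weights_payload / signal_payload
```

Under these assumptions a weight update is 40 MB against 32 bytes of deviation signals, a ratio of roughly a million to one before any protocol overhead.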

Three-tier FSSVA topology

The FSSVA architecture organizes into three levels. Edge nodes are the individual devices running the model portfolio. Each edge node runs its own validation cycles and computes its own deviation scores. Regional coordinators aggregate deviation scores from hundreds or thousands of edge nodes in a geographic area. They detect geographic patterns, temporal patterns, and model-specific patterns. The cloud learning agent receives aggregated reports from regional coordinators, analyzes system-wide trends, and triggers model updates when systematic drift is detected.

The three-tier topology keeps the majority of validation traffic local. Edge nodes communicate with their regional coordinator, not with each other and not directly with the cloud. Regional coordinators communicate with the cloud only when they detect patterns that require system-wide analysis or model updates. Most validation cycles produce a deviation score that stays within normal range, triggers no alert, and generates no upstream traffic beyond the periodic summary report.

Equity-aware monitoring

The FSSVA monitoring allocation could, if left unmanaged, underserve populations that are already underserved. If deviation detection depends on the density of deployed devices, and device density correlates with income, then the system detects and corrects problems faster in wealthy neighborhoods than in poor ones. The same structural inequality that shapes healthcare access shapes model validation coverage.

The ISHI integration (BMT-11.04) addresses this by weighting monitoring allocation inversely to device density. Regions with fewer deployed devices receive proportionally more Active Surveillance cycles per node. The system spends more validation budget per person in underserved areas, not less. The equity-aware monitoring does not require knowing anything about the individual people in those areas. It requires knowing the device density and adjusting the monitoring allocation accordingly.
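The inverse weighting can be sketched in a few lines. The function name, the region labels, and the normalization to weights that sum to one are assumptions of this sketch; the actual ISHI allocation rule is defined in BMT-11.04.

```python
def per_node_weights(device_density: dict[str, float]) -> dict[str, float]:
    """Per-node share of Active Surveillance cycles, inversely proportional to density."""
    if any(density <= 0 for density in device_density.values()):
        raise ValueError("device density must be positive")
    inverse = {region: 1.0 / density for region, density in device_density.items()}
    total = sum(inverse.values())
    return {region: value / total for region, value in inverse.items()}


# Hypothetical densities in devices per square kilometer.
weights = per_node_weights({"urban": 100.0, "rural": 25.0})
```

With a fourfold difference in density, each rural node receives four times the per-node share of an urban node. The only input is the density figure; no information about individuals is needed.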

The honest limitation: equity-aware monitoring improves detection coverage. It does not fix the underlying model quality if the training data underrepresents the affected population. If the Medication Assistant was trained primarily on medication data for conditions prevalent in majority populations, it may perform worse for conditions prevalent in minority populations regardless of monitoring coverage. Monitoring can detect the disparity. Addressing it requires training data improvements, which is a problem for the training philosophy (BMT-06.04), not the monitoring architecture. The value of equity-aware monitoring is that it makes the disparity visible. Without it, model quality problems in underserved populations go undetected because monitoring density is lowest where problems are most likely.

Offline resilience

The edge-first architecture’s third promise is resilience: the system works when the internet does not. This promise is tested most during the moments when it matters most. Power outages affect internet connectivity. Severe weather disrupts cellular networks. Rural areas have intermittent coverage by default. The person who needs fall detection during a storm, medication reminders during a power outage, or cognitive support in a rural area with spotty coverage cannot wait for connectivity to return.

In offline mode, the system operates with full capability for edge-resident functions: safety monitoring, medication tracking, cognitive support, basic conversation, and health monitoring all continue without interruption. Cloud-dependent functions degrade: complex multi-domain queries are deferred, model updates do not download, and FSSVA validation cycles do not report to the regional coordinator. The system queues deferred requests and processes them when connectivity returns.
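The queue-and-replay behavior for cloud-dependent requests can be sketched as below. `DeferredQueue` and the `send` callback are hypothetical names; persistence across restarts, retries, and request expiry are left out of this sketch.

```python
from collections import deque
from typing import Any, Callable


class DeferredQueue:
    """Holds cloud-dependent requests while offline and replays them in order."""

    def __init__(self) -> None:
        self._pending: deque[Any] = deque()

    def submit(self, request: Any, online: bool, send: Callable[[Any], None]) -> bool:
        """Send immediately when online; otherwise queue. Returns True if sent."""
        if online:
            send(request)
            return True
        self._pending.append(request)
        return False

    def flush(self, send: Callable[[Any], None]) -> int:
        """Call when connectivity returns. Returns the number of requests replayed."""
        count = 0
        while self._pending:
            send(self._pending.popleft())
            count += 1
        return count
```

Edge-resident functions never pass through this queue; only requests that need the cloud are deferred, and they are replayed in the order they were made.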

The MoC context layers (BMT-05.01) are fully available in offline mode because they are stored locally. The person’s identity, preferences, history, and deep knowledge do not require cloud access. The system knows Margaret just as well offline as online. It can do slightly less with that knowledge during an outage, but it does not forget who she is or what she needs.

The offline capability is not a fallback mode that the system reluctantly enters. It is a design target that shapes every architectural decision. Every model that runs a safety-critical function must fit on the edge device at acceptable quality. Every context layer must be stored locally. Every core workflow must complete without a network round-trip. The cloud enhances the system. It does not enable it.

Cross-References

BMT-07.01 Where Your Data Lives. Data residency as the storage complement to edge compute, showing how the physical location of data and processing align with the privacy architecture.

BMT-11.04 Population-Level Equity. ISHI equity monitoring as the framework that ensures FSSVA monitoring allocation does not replicate structural inequalities.

BMT-09.01 Where It Runs. Device tier deployment specifications that define the hardware targets for edge intelligence.

Technical Appendix BMT-06.03-A is available to partners and investors at partners.bluemirror.tech.
