Edge Intelligence

The infrastructure architect reviewing the deployment plan asked the question that matters: where does the inference run? It depends on which zones the subscriber has access to. The architecture distributes intelligence across three zones, and a given subscriber may have one, two, or all three depending on her hardware situation and the regional deployment status. Zone 3 (the cloud reasoning layer) is always present. Zone 2 (a regional node) is present where deployed. Zone 1 (an in-home device) is present where the subscriber has acquired one.

This is not an “edge-first” architecture in the conventional sense, because edge intelligence requires hardware that not every subscriber will have. It is a three-zone architecture designed so that the deepest reasoning is available to every subscriber, with stronger privacy and lower latency available to subscribers who have access to Zone 1 and Zone 2.

Three requirements drive the decomposition. Each is satisfied differently for each deployment path.

Privacy is the first requirement. For subscribers with a Local Pane (Zone 1), the most sensitive signals process locally and never transit anywhere. Cognitive state assessments, emotional patterns, voice recordings, and raw behavioral signals stay in Zone 1. The consent architecture (BMT-05.05) enforces what is permitted to leave. The Zone 1 architecture ensures the most sensitive data never needs to leave in the first place. For subscribers without a Local Pane, the same signals are processed by Zone 2 or Zone 3 under the consent architecture and the healthcare data processing agreement governing those tiers. That privacy posture is weaker than the one Zone 1 provides: it relies on contract rather than on architectural data residency.

Latency is the second requirement. Safety-critical functions need responses in under 200 milliseconds. For subscribers with a Local Pane, Zone 1 inference eliminates the network hop entirely. For subscribers without a Local Pane, safety-critical functions route to Zone 2 or Zone 3, and the network round-trip is absorbed into the latency budget through aggressive parallelism, caching, and prioritization. The budget is tighter, but it closes.
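How parallelism helps a budget close can be shown with a little arithmetic. The sketch below is illustrative only: every stage timing is a hypothetical number, not a measurement, and caching and prioritization are not modeled. Independent stages that run concurrently cost only as much as the slowest of them, while dependent stages add up.

```python
# Minimal latency-budget sketch. All stage timings are hypothetical
# illustrations, not measured figures from the system.
BUDGET_MS = 200  # safety-critical target: respond in under 200 ms


def end_to_end_ms(network_rtt_ms: float,
                  concurrent_stages_ms: list[float],
                  dependent_stages_ms: list[float]) -> float:
    """Estimate latency for one request.

    Stages that run concurrently cost only as much as the slowest of them;
    stages that depend on each other add up.
    """
    return (network_rtt_ms
            + max(concurrent_stages_ms, default=0)
            + sum(dependent_stages_ms))


def within_budget(total_ms: float, budget_ms: float = BUDGET_MS) -> bool:
    """True if the estimate is strictly under the budget."""
    return total_ms < budget_ms


# Hypothetical path without a Local Pane: 60 ms round trip, three independent
# checks (40, 55, 30 ms) and one dependent response step (50 ms).
parallel = end_to_end_ms(60, [40, 55, 30], [50])  # 60 + 55 + 50
serial = end_to_end_ms(60, [], [40, 55, 30, 50])  # 60 + 175
print(parallel, within_budget(parallel))  # 165 True
print(serial, within_budget(serial))      # 235 False
```

With the same stages, running the three independent checks concurrently is the difference between missing and meeting the 200 ms target in this toy case.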

Resilience is the third requirement. For subscribers with a Local Pane, the device continues operating during network outages, running the Zone 1 model portfolio on local compute. For subscribers without a Local Pane, network connectivity is required for every interaction. This is a real limitation of the Zone 3-only path. Where resilience matters most, a Local Pane is the answer the architecture offers. Where a Local Pane is not affordable, connectivity becomes a precondition for the service.

What this architecture promises: the deepest reasoning is available to every subscriber. What it does not promise: that every subscriber gets the same privacy posture or the same offline resilience. Those depend on the zones she has.

The three-zone model

The target architecture distributes intelligence across three zones, each with a distinct role, a distinct privacy boundary, and a distinct relationship to the subscriber’s hardware situation.

Zone 1: the home. The Local Pane is an optional in-home edge device. When a subscriber has one, it runs the privacy-critical Tiny LMs: Safety Filter, Privacy Filter, Cognitive State Estimator, Emotion Detector, Speech-to-Intent, Voice Tone Analyzer, Orientation Assessor, and Confusion Detector. Together these total approximately 850 million parameters, quantized to roughly 425 megabytes (about 4 bits per parameter) and running on an 8-to-16-gigabyte device. These models process the most sensitive signals the system handles: raw audio, cognitive fluctuation patterns, emotional state, and behavioral indicators that reveal things about her she may not have articulated to anyone. For subscribers with a Local Pane, this data never leaves Zone 1. The Local Pane transmits processed signals upward, not raw data. The Cognitive State Estimator sends “cognitive state: normal” or “cognitive state: fluctuating, confidence 0.73” to the regional or cloud reasoning layer. It does not send the behavioral observations that produced that assessment. The distinction is architectural, not a matter of policy. The system cannot leak what it does not transmit.
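Two claims in the Zone 1 description can be made concrete. The size figure is plain arithmetic, assuming 4-bit weights (425 MB divided by 850 million parameters works out to half a byte each). The "processed signals, not raw data" rule is a data-shape constraint. The sketch below is hypothetical throughout: `CognitiveStateSignal`, `outbound_payload`, and the toy estimator are illustrative names, not the product's interfaces.

```python
from dataclasses import asdict, dataclass
from typing import Callable


def quantized_size_mb(num_params: int, bits_per_param: int) -> float:
    """Approximate weight storage in decimal megabytes, ignoring format overhead."""
    return num_params * bits_per_param / 8 / 1_000_000


@dataclass(frozen=True)
class CognitiveStateSignal:
    """The only thing this sketch lets leave Zone 1: a label and a confidence."""
    state: str         # e.g. "normal" or "fluctuating"
    confidence: float  # estimator confidence in [0, 1]


def outbound_payload(raw_observations: list[dict],
                     estimator: Callable[[list[dict]], CognitiveStateSignal]) -> dict:
    """Run the local estimator and return only its summary for upstream zones.

    The raw observations are consumed here and never copied into the payload.
    """
    return asdict(estimator(raw_observations))


def toy_estimator(obs: list[dict]) -> CognitiveStateSignal:
    """Hypothetical stand-in for the Cognitive State Estimator."""
    if len(obs) > 2:
        return CognitiveStateSignal("fluctuating", 0.73)
    return CognitiveStateSignal("normal", 0.9)


print(quantized_size_mb(850_000_000, 4))  # 425.0
print(outbound_payload([{"pause_s": 4.1}, {"pause_s": 6.3}, {"repeat": True}],
                       toy_estimator))
# {'state': 'fluctuating', 'confidence': 0.73}
```

Typing the outbound message as a two-field record means anything not in those fields has no path upstream, which is the architectural (rather than policy) guarantee the text describes.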

Zone 1 is also where offline resilience lives for subscribers who have it. When the internet goes down, the Local Pane continues safety monitoring, medication reminders, cognitive support, and basic interaction. The subscriber’s MoC context layers 0 and 1 (BMT-05.01) are cached locally. The system knows her just as well offline as online for the layers Zone 1 caches. It can do less with that knowledge during an outage, but it does not forget who she is or what she needs.

Not every subscriber will have a Local Pane. The hardware is purpose-built and costs $150 to $300 at volume, which is a barrier some subscribers cannot or will not clear. The architecture is designed so that not having a Local Pane does not exclude a subscriber from the product. It changes the privacy posture and the offline resilience, not the availability.

Zone 2: the region. The Community Pane is a regional compute node serving 150 to 500 subscribers from a co-location facility, a care agency office, or a similar regional site. When deployed in a subscriber’s region, it runs heavy inference and most orchestration: Response Generator, Intent Classifier, Empathy Responder, all Domain Expert models, MoC Router, Escalation Classifier, Trust Evaluator, and the remaining Specialized Function models. It holds the full MoC context for each subscriber it serves, the P-RLHF individual preference models, and the session history. For subscribers with a Local Pane, Zone 2 receives only privacy-filtered data from Zone 1, never raw audio, cognitive signals, or emotional data. For subscribers without a Local Pane, Zone 2 (when present) receives data directly from the subscriber’s smartphone or other client device, under the consent the subscriber has granted.

Zone 2 does most of what Zone 3 does, but with limited compute capacity and a more bounded model portfolio. Cross-domain reasoning that fits within the regional node’s resources runs at Zone 2: for example, the medication question that requires checking the drug interaction database, correlating with the blood pressure trend, and generating a response calibrated to the person’s communication preferences. Queries that exceed Zone 2’s scope (complex multi-domain reasoning, novel query types, or workloads beyond the regional capacity) escalate to Zone 3.
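The three escalation conditions can be expressed as a single predicate. This is a minimal sketch under stated assumptions: the domain cap, the capacity figure, and the query-type labels are hypothetical, and a real router would likely weigh these signals rather than treat them as hard cutoffs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    query_type: str
    domains: frozenset[str]


@dataclass
class RegionalNode:
    supported_types: frozenset[str]  # query types the regional portfolio handles
    max_domains: int                 # hypothetical cap on cross-domain breadth
    active_load: int                 # in-flight requests right now
    capacity: int                    # hypothetical concurrent-request ceiling


def escalate_to_zone3(query: Query, node: RegionalNode) -> bool:
    """True if any of the three escalation conditions in the text holds."""
    novel = query.query_type not in node.supported_types
    too_complex = len(query.domains) > node.max_domains
    over_capacity = node.active_load >= node.capacity
    return novel or too_complex or over_capacity


node = RegionalNode(frozenset({"medication_question", "reminder"}),
                    max_domains=3, active_load=120, capacity=400)
med = Query("medication_question",
            frozenset({"drug_interactions", "blood_pressure", "communication_style"}))
print(escalate_to_zone3(med, node))  # False: stays in Zone 2
print(escalate_to_zone3(Query("benefits_appeal", frozenset({"insurance"})), node))  # True
```

The medication example from the text spans three domains and stays regional; an unfamiliar query type, or a saturated node, sends the same request to Zone 3.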

Zone 2 is also optional from the subscriber’s standpoint. Regional deployment depends on whether a Community Pane node exists in the subscriber’s area, which in turn depends on whether enough subscribers in that region justify a node, whether a hosting partner (PACE facility, care agency, co-location provider) is available, and whether the rollout has reached that region. Subscribers in regions without a deployed Community Pane are served directly by Zone 3.

Zone 3: the cloud reasoning layer. Always present. Performs deep inference, complex multi-domain reasoning, and orchestration that exceeds Zone 2’s capacity or covers queries the regional SLM portfolio does not yet handle. Zone 3 is the ceiling, not the basement. It is the deepest reasoning the system can perform on any query, regardless of which other zones are present for a given subscriber.

Zone 3 is also the sole inference tier for subscribers who do not have a Local Pane and who live outside a Zone 2 service area. For those subscribers, the cloud reasoning layer serves every query from the simplest medication reminder to the most complex care coordination. The product is the same product. The privacy posture is weaker because the subscriber’s data transits to Zone 3 for every inference (under the healthcare data processing agreement governing the cloud layer), and the offline resilience is weaker because every interaction requires connectivity. The architecture does not exclude this subscriber. It serves her through Zone 3.

The full-stack subscriber has Zone 1 + Zone 2 + Zone 3. Maximum privacy, lowest latency, deepest reasoning. The Zone 2 + Zone 3 subscriber has no Local Pane but is served by a regional node, with Zone 3 for deep reasoning. The Zone 3-only subscriber has no Local Pane and no regional node in her area. The cloud reasoning layer serves her end-to-end.
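The three deployment paths can be summarized as a routing rule over whichever zones a subscriber has. The sketch below is a simplification under assumptions: the tier labels (`"privacy_critical"`, `"routine"`, `"deep"`) are hypothetical, and the offline branch simply hands everything to Zone 1, which in practice would do less during an outage.

```python
from enum import Enum


class Zone(Enum):
    HOME = 1    # Zone 1: Local Pane (optional)
    REGION = 2  # Zone 2: Community Pane (where deployed)
    CLOUD = 3   # Zone 3: cloud reasoning layer (always present)


def route(tier: str, zones: set[Zone], online: bool = True) -> Zone:
    """Pick the serving zone for one inference.

    tier is a hypothetical label: "privacy_critical", "routine", or "deep".
    """
    if not online:
        # Only a Local Pane keeps working through an outage, at reduced capability.
        if Zone.HOME in zones:
            return Zone.HOME
        raise ConnectionError("no Local Pane: this path needs connectivity")
    if tier == "privacy_critical" and Zone.HOME in zones:
        return Zone.HOME
    if tier in ("privacy_critical", "routine") and Zone.REGION in zones:
        return Zone.REGION
    return Zone.CLOUD  # deep reasoning, or no nearer zone available


full = {Zone.HOME, Zone.REGION, Zone.CLOUD}
print(route("privacy_critical", full))                   # Zone.HOME
print(route("privacy_critical", {Zone.REGION, Zone.CLOUD}))  # Zone.REGION
print(route("routine", {Zone.CLOUD}))                    # Zone.CLOUD
```

Every path ends at `Zone.CLOUD` when nothing nearer applies, which is how the sketch reflects Zone 3 being the ceiling for everyone rather than a fallback for some.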

The three-zone architecture is what makes this range of deployment paths possible. A single-zone architecture cannot offer the same product to subscribers with different hardware situations. The decomposition is what makes the equity claim real: the deepest reasoning is available to every subscriber regardless of hardware affordability.

The phased rollout

The three-zone target architecture comes online in phases, not all at once. The phasing matters for diligence because it determines what the system actually does today, what it does in twelve months, and what it does at maturity.

Phase 1: Zone 3 only. Every subscriber is served by the cloud reasoning layer. No Local Pane device. No regional Community Pane. At this phase, Zone 3 is a commercial API operating under a healthcare data processing agreement (DPA). It handles every query for every subscriber, from privacy-critical to deep reasoning. The privacy posture rests on the DPA: no retention beyond the inference request lifecycle, no use for the provider’s own model training, HIPAA technical safeguard compliance, audit rights, and geographic residency constraints. The DPA is strong, but not as strong as the architectural data residency that Zone 1 and Zone 2 will eventually provide to subscribers who have them.

Phase 2: Zone 1 optional. Subscribers who receive a Local Pane device gain Zone 1. A small portfolio of privacy-critical Tiny LMs deploys locally: Safety Filter, Privacy Filter, Cognitive State Estimator, Emotion Detector, Speech-to-Intent. These are V0.5 models pretrained on synthetic data through the pipeline described in BMT-06.04. Zone 3 continues to handle everything else, including most inference. Zone 2 may or may not exist depending on regional rollout status. Subscribers without a Local Pane continue to be served entirely by Zone 3, with the same product, the same Zone 3 inference, and the same DPA-based privacy posture.

Phase 3: full SLM portfolio. The trained proprietary SLM portfolio deploys to Zone 1 (for subscribers with a Local Pane, an expanded set of locally running models) and to Zone 2 regional nodes (the heavy inference portfolio described above). Zone 3 continues. It still performs deep inference, complex multi-domain reasoning, and orchestration. What changes is that Zones 1 and 2 absorb the routine work for subscribers who have them, leaving Zone 3 to focus on the queries that exceed regional capacity and to serve subscribers who remain Zone 3-only. Zone 3 is not retired. It is the deepest reasoning the system can perform, in all phases.

At Phase 3 maturity, a query distribution looks like this for a subscriber with all three zones: 15 to 20 percent of inference runs in Zone 1, 55 to 60 percent in Zone 2, and the balance (20 to 30 percent) in Zone 3, including all queries that exceed regional capacity. For a Zone 2 + Zone 3 subscriber, the Zone 1 fraction shifts to Zone 2 and Zone 3 depending on the privacy classification of the inference. For a Zone 3-only subscriber, 100 percent of inference runs in Zone 3 in every phase.

At launch, edge intelligence does not yet exist in any subscriber’s home. Zone 3 handles everything for everyone. Over 24 to 36 months, edge intelligence expands for subscribers who have or who acquire Local Panes, and regional inference expands for subscribers in regions where Community Panes deploy. The architecture grows. Zone 3 remains the deep-reasoning ceiling for everyone, including the subscribers who never acquire a Local Pane and who never live near a deployed regional node.

FSSVA: federated validation without sharing data

The models running across hundreds or thousands of deployments need continuous validation. Are they performing correctly? Has a model drifted from acceptable accuracy? Is a specific device configuration producing anomalous outputs? Traditional validation requires collecting data centrally, which violates the privacy architecture. The Federated Sentinel-Surveillance Validation Architecture solves this by federating deviation signals, not data and not model weights.

FSSVA operates in two modes. Sentinel mode is lightweight monitoring: each deployment runs periodic validation checks against local held-out test cases and reports only a deviation score to a regional coordinator. The score is a scalar value. It contains no patient data, no model weights, no interaction content. It says “this model’s performance on my local validation set has drifted by X percent from baseline.” The regional coordinator aggregates deviation scores across its population and detects patterns: if scores increase across many deployments simultaneously, something systematic is happening. A model update may have introduced a regression. A data distribution shift may be affecting a geographic region.
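Sentinel mode as described comes down to one scalar per node and one aggregate check per coordinator. The sketch below assumes the deviation score is a relative accuracy drop and that "many deployments simultaneously" means a fraction of reporting nodes over a threshold; both thresholds and the function names are hypothetical, not FSSVA's actual parameters.

```python
def deviation_pct(baseline_accuracy: float, current_accuracy: float) -> float:
    """The scalar a Sentinel node reports: percent drop from its baseline
    on the local held-out set (positive means worse)."""
    return 100.0 * (baseline_accuracy - current_accuracy) / baseline_accuracy


def systematic_drift(scores: dict[str, float],
                     score_threshold: float,
                     fraction_threshold: float) -> bool:
    """Coordinator check: are many nodes drifting at once?

    scores maps an opaque node id to its latest deviation score; no patient
    data, model weights, or interaction content is involved.
    """
    if not scores:
        return False
    drifting = sum(1 for s in scores.values() if s > score_threshold)
    return drifting / len(scores) >= fraction_threshold


scores = {"node-a": 1.2, "node-b": 6.8, "node-c": 7.5, "node-d": 5.9}
print(systematic_drift(scores, score_threshold=5.0, fraction_threshold=0.5))  # True
```

Three of the four hypothetical nodes exceed the score threshold, so the coordinator flags a systematic pattern; a single drifting node would not.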

Active Surveillance mode triggers when deviation scores exceed thresholds. The affected deployment runs a more comprehensive validation suite locally and reports detailed deviation metrics, still without sharing patient data. The detailed metrics include per-task accuracy breakdown, latency distribution, and output quality scores. The regional coordinator uses these to diagnose the deviation cause and determine whether a model update, a configuration change, or a targeted intervention is needed.
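The detailed Active Surveillance metrics are still aggregates, which a report type makes explicit. This is a sketch under assumptions: the field names, the pass/fail representation of held-out cases, and the choice of p50/p95 for the latency distribution are illustrative, not the documented report format.

```python
from dataclasses import dataclass
from statistics import mean, quantiles


@dataclass(frozen=True)
class ActiveSurveillanceReport:
    per_task_accuracy: dict[str, float]
    latency_p50_ms: float
    latency_p95_ms: float
    mean_output_quality: float


def build_report(task_outcomes: dict[str, list[bool]],
                 latencies_ms: list[float],
                 quality_scores: list[float]) -> ActiveSurveillanceReport:
    """Summarize a local validation run into aggregate metrics only.

    task_outcomes maps a task name to pass/fail results on held-out cases.
    """
    per_task = {task: sum(r) / len(r) for task, r in task_outcomes.items()}
    cuts = quantiles(latencies_ms, n=100)  # 99 cut points; needs at least 2 samples
    return ActiveSurveillanceReport(per_task, cuts[49], cuts[94], mean(quality_scores))


report = build_report({"intent": [True, True, False, True], "safety": [True] * 10},
                      [80, 95, 110, 130, 240], [0.8, 0.9])
print(report.per_task_accuracy)  # {'intent': 0.75, 'safety': 1.0}
```

Nothing in the report identifies a person or an interaction; the coordinator sees only which tasks degraded and how latency and quality shifted.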

The mode switching follows an epidemiological model borrowed from public health surveillance. When a deviation cluster is detected in a geographic region, FSSVA increases monitoring density in that region, analogous to ring vaccination: concentrate surveillance resources around the outbreak to contain it before it spreads. Nodes adjacent to the deviation cluster shift from Sentinel to Active mode. Nodes in unaffected regions remain in Sentinel mode, conserving bandwidth and compute. The analogy is precise: in epidemiology, you do not test everyone in the country when an outbreak appears in one city. You test intensively in and around the affected area.
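The ring-surveillance rule reduces to a one-hop neighborhood over a graph of nodes. A minimal sketch, assuming adjacency is supplied as a neighbor map (for example, geographically neighboring regional nodes); the graph and region names below are hypothetical.

```python
def ring_activation(adjacency: dict[str, set[str]], cluster: set[str]) -> set[str]:
    """Return the nodes to switch to Active mode: the cluster plus its ring of
    immediate neighbours. Everything else stays in Sentinel mode."""
    active = set(cluster)
    for node in cluster:
        active |= adjacency.get(node, set())
    return active


# Hypothetical neighbour graph of regional nodes.
adjacency = {
    "north": {"central"},
    "central": {"north", "east", "south"},
    "east": {"central"},
    "south": {"central", "coast"},
    "coast": {"south"},
    "valley": set(),
}
print(sorted(ring_activation(adjacency, {"east"})))  # ['central', 'east']
```

A deviation cluster at "east" activates only "east" and its neighbor; "valley" and "coast" keep running lightweight Sentinel checks.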

The mode transitions are automatic. A regional coordinator that detects a deviation cluster above threshold triggers Active mode for affected nodes without requiring human intervention. When deviation scores return to normal, nodes transition back to Sentinel mode. The system self-heals for transient issues and escalates to the cloud learning agent for systemic ones. Human engineers are involved only when the cloud learning agent determines that a model update is needed, a training data problem has been identified, or a hardware-specific issue requires investigation.
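The automatic transitions can be sketched as a small per-node state machine. The text does not say how transient and systemic issues are told apart, so this sketch assumes a persistence rule: a node that stays in Active mode for a hypothetical number of consecutive checks is escalated to the cloud learning agent once. It also simplifies by tracking one node, whereas the text has the regional coordinator trigger Active mode for a cluster.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SENTINEL = "sentinel"
    ACTIVE = "active"


@dataclass
class NodeMonitor:
    threshold: float     # deviation score that triggers Active mode
    escalate_after: int  # hypothetical: consecutive Active checks treated as systemic
    mode: Mode = Mode.SENTINEL
    active_streak: int = 0

    def observe(self, score: float) -> bool:
        """Apply the latest deviation score; return True once when escalation is due."""
        if score > self.threshold:
            self.mode = Mode.ACTIVE
            self.active_streak += 1
        else:
            # Scores back to normal: return to Sentinel with no human involved.
            self.mode = Mode.SENTINEL
            self.active_streak = 0
        return self.active_streak == self.escalate_after


monitor = NodeMonitor(threshold=5.0, escalate_after=3)
print([monitor.observe(s) for s in [6.1, 2.0, 6.5, 7.0, 7.2]])
# [False, False, False, False, True]
```

The first spike clears on its own (a transient issue that self-heals); the later run of three elevated scores is what reaches the cloud learning agent.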

At launch (Phase 1), no Zone 1 or Zone 2 models exist to monitor. FSSVA monitors the quality of Zone 3 inference indirectly: response quality tracking, latency distribution, and consistency checks against held-out validation sets. The Zone 3 provider’s internal model quality is the provider’s responsibility, not BlueMirror’s, and is governed by the SLA in the DPA. As Zone 1 Tiny LMs deploy to subscriber Local Panes in Phase 2, FSSVA begins monitoring them. As Zone 2 regional nodes deploy in Phase 3, FSSVA monitoring expands to cover their full SLM portfolio. The FSSVA three-tier topology (edge nodes, regional coordinators, cloud learning agent) maps naturally to the zones once they are deployed: Zone 1 devices are the edge nodes, Zone 2 regional nodes host the coordinators, and the cloud learning agent runs in BlueMirror’s infrastructure, separate from the Zone 3 inference layer.

Equity-aware monitoring

The FSSVA monitoring allocation could, if left unmanaged, underserve populations that are already underserved. If deviation detection depends on the density of deployed devices, and device density correlates with income, then the system detects and corrects problems faster for subscribers in wealthy neighborhoods than for subscribers in underserved areas. The same structural inequality that shapes healthcare access shapes model validation coverage.

The ISHI integration (BMT-11.04) addresses this by weighting monitoring allocation inversely to device density. Regions with fewer deployed devices receive proportionally more Active Surveillance cycles per node. The system spends more validation budget per person in underserved areas, not less. The equity-aware monitoring does not require knowing anything about the individual people in those areas. It requires knowing the device density and adjusting the monitoring allocation accordingly.
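Weighting inversely to device density can be written as a normalized split of a fixed budget. This is a hypothetical sketch, not the ISHI method: it assumes a total Active Surveillance budget in cycles, a density figure per region (here, devices per 1,000 older adults, an assumed unit), and a per-node share proportional to 1/density.

```python
def cycles_per_node(budget: float, regions: dict[str, tuple[int, float]]) -> dict[str, float]:
    """Split an Active Surveillance budget so each node's share scales with 1/density.

    regions maps a region name to (node_count, device_density). The shares are
    normalized so that node_count * share summed over regions equals budget.
    """
    total_weight = sum(nodes / density for nodes, density in regions.values())
    return {name: budget / total_weight / density
            for name, (_nodes, density) in regions.items()}


# Hypothetical regions: device density in devices per 1,000 older adults.
alloc = cycles_per_node(350, {"dense": (100, 4.0), "sparse": (10, 1.0)})
print(alloc)  # {'dense': 2.5, 'sparse': 10.0}
```

Each node in the sparse region gets four times the cycles of a node in the dense region, matching its one-quarter density, while the total still sums to the 350-cycle budget.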

Equity-aware monitoring improves detection coverage. It does not fix the underlying model quality if the training data underrepresents the affected population. If a Domain Expert SLM was trained primarily on interaction patterns from majority populations, it may perform worse for minority populations regardless of monitoring coverage. Monitoring can detect the disparity. Addressing it requires training data improvements, which is a problem for the training pipeline (BMT-06.04), not the monitoring architecture. The value of equity-aware monitoring is that it makes the disparity visible. Without it, model quality problems in underserved populations go undetected because monitoring density is lowest where problems are most likely.

The expanding envelope

The edge intelligence story is a twenty-four to thirty-six month arc, but it is not the story of Zone 3 retiring. Zone 3 stays. What expands is Zone 1 (for subscribers who acquire a Local Pane) and Zone 2 (for regions where a Community Pane deploys). The deep reasoning Zone 3 always provided continues to be available to every subscriber.

At Phase 1, Zone 3 handles every query for every subscriber. There is no edge intelligence in any home. At Phase 2 (months 12 to 18), the first Tiny LMs deploy to Local Panes for subscribers who have them. Privacy-critical signals process locally for those subscribers; everything else still routes through Zone 3. At Phase 3 (months 18 to 36), Zone 2 regional nodes deploy in served regions, the full SLM portfolio comes online, and the inference distribution for subscribers with all three zones reaches the 15-20 / 55-60 / balance pattern described above. For Zone 3-only subscribers, Zone 3 continues to handle 100 percent of inference.

At month thirty-six, the architecture has the full three-zone topology available, but not every subscriber sits inside it. The system serves three deployment paths in parallel: full-stack subscribers (Zone 1 + Zone 2 + Zone 3), regional subscribers without a Local Pane (Zone 2 + Zone 3), and Zone 3-only subscribers. Each path is a first-class deployment, not a degraded fallback. The architecture is designed so that the deepest reasoning is available to every subscriber regardless of which zones they have.

The person does not see any of this. She sees an AI concierge that responds quickly, knows her well, and respects her privacy. Whether the inference behind her response runs on a device in her living room, on a server forty miles away, or in a data center two thousand miles away is invisible to her. The orchestration layer (BMT-02.01) makes the substrate disappear. What matters to her is the response. What matters to the architect is that the architecture grows in a way that improves the privacy posture and latency for subscribers who have access to Zone 1 and Zone 2 hardware, without ever cutting off the subscribers who do not.

Cross-References

BMT-07.01 Where Your Data Lives. Data residency as the storage complement to the three-zone compute model, showing how the physical location of data and processing align.

BMT-11.04 Population-Level Equity. ISHI equity monitoring as the framework that ensures FSSVA monitoring allocation does not replicate structural inequalities.

BMT-06.04 The Training Philosophy. The synthetic-to-proprietary pipeline that produces the SLMs whose edge deployment this article describes.

BMT-02.03 The Thirty Models. The SLM portfolio distributed across the three zones, including the launch-versus-target portfolio distinction.

Technical Appendix BMT-06.03-A is available to partners and investors at partners.bluemirror.tech.