How a Request Becomes an Action

Margaret Chen says it out loud, into the room, at 3:14 in the afternoon. “I think my blood pressure medication is making me dizzy.” Ten words. The system has roughly five hundred milliseconds to produce a response that is medically responsible, emotionally appropriate, calibrated to Margaret’s communication preferences, and aware that Margaret’s daughter Sarah is listed as the primary caregiver contact for health concerns. The clock starts.

The architecture document for the orchestration layer describes components. This article shows them working. One request is traced through the full stack: from natural language input through intent classification, context routing, delegation, infrastructure agent activation, small language model inference, response synthesis, safety filtering, and delivery. By the end, the reader should understand not just what the system does but how long each step takes, why each step exists, and what happens when a step fails.

This is the trace the technical due diligence reader cares about most. Not the theory. The execution.

Step 1: Intent classification

The Speech-to-Intent model, a 50M parameter model running in Zone 1 (the Local Pane in Margaret’s home), converts Margaret’s spoken sentence to a structured intent representation in roughly forty milliseconds. The output is not the words. It is a categorical vector indicating that the input is a self-reported symptom, the symptom relates to a medication, and the speaker is in a normal conversational register rather than a distressed one. The Voice Tone Analyzer running in parallel, also in Zone 1, confirms the conversational register. Raw audio never leaves Zone 1.

The Intent Classifier, a 150M parameter mixture-of-experts model in Zone 2 (the regional Community Pane node), then categorizes the request. Domain: healthcare. Sub-domain: medication side effects. Urgency: moderate. The category is symptom report, not emergency, because Margaret’s wording is exploratory rather than urgent and her tone confirms it. The classifier confidence is 0.92. The transmission from Zone 1 to Zone 2 carries the processed intent vector and the categorical tone result, not the raw audio.
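The Zone 1 to Zone 2 boundary is easiest to see as a data type: if the payload type has no field for audio, audio cannot cross. The sketch below assumes hypothetical names (`IntentPayload`, `build_payload`, the `Register` labels) and placeholder scores; only the rule that the intent vector and categorical tone cross while raw audio stays local comes from the text.

```python
from dataclasses import asdict, dataclass
from enum import Enum


class Register(Enum):
    """Categorical tone result from the Voice Tone Analyzer (labels illustrative)."""
    CONVERSATIONAL = "conversational"
    DISTRESSED = "distressed"


@dataclass(frozen=True)
class IntentPayload:
    """Everything that crosses from Zone 1 to Zone 2. Field names are hypothetical."""
    intent_vector: tuple[float, ...]  # categorical vector from Speech-to-Intent
    intent_labels: tuple[str, ...]    # readable view of the same categories
    register: Register                # tone category, never the waveform


def build_payload(raw_audio: bytes, intent_vector: list[float],
                  intent_labels: list[str], register: Register) -> IntentPayload:
    # raw_audio is accepted only to make the boundary explicit: the Zone 1
    # models have already consumed it, and it is not copied into the payload.
    return IntentPayload(tuple(intent_vector), tuple(intent_labels), register)


payload = build_payload(
    raw_audio=b"\x00\x01",             # placeholder for captured audio
    intent_vector=[0.91, 0.07, 0.02],  # placeholder scores
    intent_labels=["self_reported_symptom", "medication_related"],
    register=Register.CONVERSATIONAL,
)
print(sorted(asdict(payload)))  # ['intent_labels', 'intent_vector', 'register']
```

Making the payload immutable (`frozen=True`) also keeps later stages from attaching extra data to it after it leaves the Local Pane.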

The classification determines everything that follows. It decides which concierge agent activates, which Mixture of Context layers load, what autonomy rules apply to the response. A misclassification at this step propagates through the entire response. If the classifier read “dizzy” as a cognitive symptom, the cognitive concierge would activate instead of the health concierge. The response would be an orientation check rather than a medication review. Margaret would experience this as the system failing to understand her.

The architecture defends against this by triggering multi-path processing when classifier confidence falls below 0.85. At 0.92, the single path is taken. Total time elapsed: 50 milliseconds.
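The confidence gate can be sketched in a few lines. The 0.85 threshold and the two concierges come from the text; the `Classification` shape, the domain-to-concierge table, and the assumption that the classifier exposes a runner-up domain are hypothetical.

```python
from dataclasses import dataclass

MULTI_PATH_THRESHOLD = 0.85  # below this, fan out to more than one concierge

# Hypothetical mapping from classified domain to the concierge that handles it.
CONCIERGE_FOR_DOMAIN = {
    "healthcare": "health_concierge",
    "cognitive": "cognitive_concierge",
}


@dataclass
class Classification:
    domain: str        # top-scoring domain
    confidence: float  # classifier confidence for that domain
    runner_up: str     # second-ranked domain (assumed to be exposed)


def concierges_to_activate(c: Classification) -> list[str]:
    primary = CONCIERGE_FOR_DOMAIN[c.domain]
    if c.confidence >= MULTI_PATH_THRESHOLD:
        return [primary]  # single path
    # Ambiguous: activate both candidates and let synthesis pick the framing.
    return [primary, CONCIERGE_FOR_DOMAIN[c.runner_up]]


print(concierges_to_activate(Classification("healthcare", 0.92, "cognitive")))
# ['health_concierge']
print(concierges_to_activate(Classification("cognitive", 0.80, "healthcare")))
# ['cognitive_concierge', 'health_concierge']
```

The comparison is `>=` because the text triggers multi-path processing only when confidence falls *below* 0.85.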

Step 2: MoC routing

The MoC Router, a 150M parameter mixture-of-experts model in Zone 2, selects which context layers to load for the response. The decision takes 22 milliseconds.

For a medication side-effect report, the router selects four layers. Layer 0 contains the core identity and is always loaded: Margaret, 78, female, English-speaking, prefers direct communication, healthcare autonomy 0.6. Layer 1 contains session context and is loaded: current interaction state, time of day at 3:14 PM (well outside sundowning hours), no prior conversation context from the last forty-five minutes. Layer 2 contains historical patterns and is loaded: communication preference for data-first responses, previous medication concerns including a question about metformin timing the previous month, no prior dizziness reports. Layer 3 contains deep knowledge and is loaded: full medication list, blood pressure history for the last six months, known allergies, Dr. Patel as cardiologist, last cardiology visit on March 4.

Layer 4, retrieval-augmented generation, is not activated. No external document retrieval is needed for the question at hand.

Total context package: approximately 800 tokens. The same query handled with naive context loading would pass approximately 5,000 tokens to the downstream skills. Token reduction: 84%, with relevance maintained at 95% for this query type. The reduction is not theoretical. It is what allows the latency budget to close.
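The layer selection and the 84% figure can be reproduced with a small sketch. A lookup table stands in for the learned MoC Router here, so `OPTIONAL_LAYERS` and `ContextRequest` are hypothetical; the rules that Layer 0 is always loaded and Layer 4 is added only when external retrieval is needed, and the 800 and 5,000 token counts, come from the text.

```python
from dataclasses import dataclass


@dataclass
class ContextRequest:
    sub_domain: str
    needs_external_documents: bool  # true only when RAG (Layer 4) is warranted


# Hypothetical routing table: optional layers each sub-domain pulls in.
OPTIONAL_LAYERS = {
    "medication_side_effects": {1, 2, 3},
}


def select_layers(req: ContextRequest) -> list[int]:
    layers = {0}  # Layer 0, core identity, is always loaded
    layers |= OPTIONAL_LAYERS.get(req.sub_domain, set())
    if req.needs_external_documents:
        layers.add(4)  # Layer 4: retrieval-augmented generation
    return sorted(layers)


def token_reduction(routed_tokens: int, naive_tokens: int) -> float:
    """Fraction of tokens saved relative to naive context loading."""
    return 1 - routed_tokens / naive_tokens


print(select_layers(ContextRequest("medication_side_effects", False)))  # [0, 1, 2, 3]
print(f"{token_reduction(800, 5000):.0%}")                               # 84%
```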

Cumulative time: 72 milliseconds.

Step 3: H-layer delegation

The H-layer orchestrator in Zone 2 receives the classified intent and the context package. It makes delegation decisions in roughly 45 milliseconds.

Primary delegation: the Health Concierge, which routes to its Medication Manager infrastructure agent. The Medication Manager is the right agent for a self-reported medication side effect.

Supporting delegations, fired in parallel: the Symptom Monitor, to check for a dizziness pattern in recent vital signs and prior reports; the Cognitive State Assessor, to determine whether this is a cognitive concern presenting as a medication concern. The supporting delegations are not redundant. They are the cross-domain check that prevents the system from missing a non-obvious explanation.

Safety delegation: the Safety Filter is assigned to screen whatever output the response path generates. It runs once per response, immediately before delivery.

The H-layer also checks the Human Agency Scale. Margaret’s healthcare autonomy is 0.6. For a symptom report that does not require action, the system can respond autonomously. If the response will include a recommendation to stop taking the medication, the response requires Margaret’s explicit approval before being acted on. The H-layer notes the threshold and proceeds.
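"Notes the threshold and proceeds" suggests the H-layer records approval requirements rather than blocking. The sketch below encodes only the two cases the text names; the action categories, `DelegationPlan`, and `note_threshold` are hypothetical, and because the text does not say how the 0.6 score shifts other thresholds, the score is carried but not used.

```python
from dataclasses import dataclass, field

# Hypothetical action categories; the trace names only these two cases.
INFORMATIONAL = "informational"        # e.g. answering a symptom report
TREATMENT_CHANGE = "treatment_change"  # e.g. recommending that a drug be stopped

NEEDS_EXPLICIT_APPROVAL = {INFORMATIONAL: False, TREATMENT_CHANGE: True}


@dataclass
class DelegationPlan:
    # Carried along for downstream policy; not interpreted in this sketch.
    healthcare_autonomy: float
    approval_gates: list[str] = field(default_factory=list)

    def note_threshold(self, action_kind: str, description: str) -> None:
        # The H-layer does not block here. It records which parts of the
        # response may be acted on only after the person explicitly approves.
        if NEEDS_EXPLICIT_APPROVAL[action_kind]:
            self.approval_gates.append(description)


plan = DelegationPlan(healthcare_autonomy=0.6)
plan.note_threshold(INFORMATIONAL, "respond to symptom report")
plan.note_threshold(TREATMENT_CHANGE, "any recommendation to stop a medication")
print(plan.approval_gates)  # ['any recommendation to stop a medication']
```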

Cumulative time: 117 milliseconds.

Step 4: Infrastructure agent execution

Three infrastructure agents fire in parallel. They run on three different small language models. The total step time is determined by the slowest path, not the sum of paths.

The Medication Manager calls the Medication Advisor SLM, a 150M parameter model in Zone 2. The advisor checks the medication list against a drug interaction database. It identifies amlodipine, prescribed for blood pressure, as a known cause of dizziness, especially when combined with the furosemide diuretic Margaret also takes. The interaction is well-documented. The advisor confidence is 0.88. Latency: 73 milliseconds.

The Symptom Monitor checks recent vital signs data and prior symptom reports. Margaret’s blood pressure readings from her at-home cuff have trended lower over the past week, averaging 118 over 72 compared to 135 over 82 the previous month. The trend is consistent with the dizziness complaint and suggests the dose may be too high for her current blood pressure trajectory. The monitor flags the pattern with confidence 0.91. Latency: 58 milliseconds.

The Cognitive State Assessor evaluates the current conversation. Margaret is oriented. Her language is articulate. She is specific about the symptom and its suspected cause, and she correctly refers to her blood pressure medication, which matches the amlodipine prescription on her medication list. The cognitive state is normal. This is a medication issue, not a cognitive issue. The assessor confidence is 0.94. Latency: 51 milliseconds.

All three agents complete within the 73 millisecond ceiling set by the Medication Advisor. The H-layer receives three structured results.
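The "slowest path, not the sum" rule is the behavior of a concurrent fan-out. In the sketch below, `asyncio.gather` runs three stand-in agents concurrently; the agent names, findings, confidences, and latencies are the ones reported in this step, while `run_agent` and `AgentResult` are hypothetical and simply sleep instead of calling a model.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class AgentResult:
    agent: str
    finding: str
    confidence: float
    latency_ms: int


async def run_agent(agent: str, finding: str, confidence: float,
                    latency_ms: int) -> AgentResult:
    # Stand-in for an SLM call: sleep for the latency reported in the trace.
    await asyncio.sleep(latency_ms / 1000)
    return AgentResult(agent, finding, confidence, latency_ms)


async def fan_out() -> tuple[list[AgentResult], int]:
    results = await asyncio.gather(
        run_agent("medication_advisor", "amlodipine + furosemide interaction", 0.88, 73),
        run_agent("symptom_monitor", "BP trending lower", 0.91, 58),
        run_agent("cognitive_state_assessor", "cognitive state normal", 0.94, 51),
    )
    # Concurrent step: the slowest path sets the step time.
    step_ms = max(r.latency_ms for r in results)
    return list(results), step_ms


results, step_ms = asyncio.run(fan_out())
print(step_ms)                             # 73
print(sum(r.latency_ms for r in results))  # 182, the cost if run one after another
```

`gather` returns results in the order the coroutines were passed, so the H-layer can read them positionally regardless of which agent finished first.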

Cumulative time: 190 milliseconds.

Step 5: Response synthesis

The Response Generator, a 400M parameter transformer in Zone 2, receives a synthesis package from the H-layer. The package contains the medication interaction finding, the vital signs trend, the cognitive assessment, and Margaret’s P-RLHF preference profile. The profile says: data first, recommendation second, direct language, no infantilization, no medical disclaimers that read as legal liability protection.

The generator produces this response: “Your amlodipine and furosemide together can cause dizziness, especially since your blood pressure has been trending lower this week. The average over the last seven days is 118 over 72, compared to 135 over 82 last month. That difference is worth raising with Dr. Patel. I can prepare a summary for your next appointment. If the dizziness is happening when you stand up, we should flag that sooner.”

The response is 68 words. The data appears first per Margaret’s preference. The recommendation is concrete. The escalation path is clear. The hedge about positional dizziness is warranted by clinical guidance about orthostatic hypotension, which is a more urgent concern than baseline dizziness.

The Safety Filter, a 100M parameter mixture-of-experts model running in Zone 1, validates the response in 19 milliseconds as it returns through the Local Pane on the way to Margaret. The validation checks for diagnostic claims (none present), medication change recommendations (none present), and harmful or alarming content (none present). The response passes. Placing the Safety Filter at the Zone 1 boundary means every output is screened on the same device that captures Margaret’s input, before any response reaches her.
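To make the three checks concrete, the sketch below swaps the 100M-parameter model for plain keyword rules. Only the check categories come from the text; the regular expressions are illustrative and far cruder than a learned filter, so treat this as a picture of the interface (response in, list of failed checks out), not of the detection quality.

```python
import re

# Keyword rules standing in for the Safety Filter model. Patterns are illustrative.
CHECKS = {
    "diagnostic_claim": re.compile(r"\byou have\b|\bdiagnos(?:is|ed)\b", re.IGNORECASE),
    "medication_change": re.compile(r"\b(?:stop|skip|double|halve) (?:taking|your)\b", re.IGNORECASE),
    "alarming_content": re.compile(r"\b(?:fatal|life-threatening|emergency)\b", re.IGNORECASE),
}


def screen(response: str) -> list[str]:
    """Return the names of failed checks; an empty list means the response passes."""
    return [name for name, pattern in CHECKS.items() if pattern.search(response)]


response = (
    "Your amlodipine and furosemide together can cause dizziness, especially since "
    "your blood pressure has been trending lower this week. The average over the last "
    "seven days is 118 over 72, compared to 135 over 82 last month. That difference is "
    "worth raising with Dr. Patel. I can prepare a summary for your next appointment. "
    "If the dizziness is happening when you stand up, we should flag that sooner."
)
print(screen(response))                              # []
print(screen("You should stop taking amlodipine."))  # ['medication_change']
```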

The Empathy Responder, a 200M parameter transformer, evaluates whether the response register matches Margaret’s emotional state. The Voice Tone Analyzer’s result from Step 1 placed Margaret in a normal conversational register: concerned, but not distressed. The response is informative and action-oriented, which matches that register. No adjustment is needed.

Total synthesis latency: 96 milliseconds.

Cumulative time: 286 milliseconds.

Step 6: Delivery and learning

The response is delivered to Margaret through the system’s text-to-speech path. Delivery latency is 47 milliseconds.

In parallel with delivery, three things happen. The P-RLHF system logs the interaction for learning. The signal will resolve as Margaret responds. If she engages with the appointment preparation offer, the system updates her preference vector to reinforce that actionable next steps are valued. If she asks a follow-up data question, the system reinforces that more data was wanted. If she dismisses both, the system reinforces that information-only responses are preferred for symptom concerns.
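The three feedback signals map naturally onto a preference update. The text names the signals and which preference each reinforces, but not the update rule, so the sketch assumes a simple nudge of the reinforced weight toward 1.0; the weight names, starting values, and `LEARNING_RATE` are hypothetical.

```python
# Hypothetical learning rate and preference names; only the three
# signal-to-preference pairings come from the trace.
LEARNING_RATE = 0.1

SIGNAL_TO_PREFERENCE = {
    "engaged_with_offer": "actionable_next_steps",
    "asked_data_followup": "more_data",
    "dismissed_both": "information_only",
}


def update_preferences(prefs: dict[str, float], signal: str) -> dict[str, float]:
    """Return a new preference vector with the reinforced weight nudged upward."""
    target = SIGNAL_TO_PREFERENCE[signal]
    updated = dict(prefs)
    # Move the reinforced preference a fraction of the way toward 1.0.
    updated[target] = prefs[target] + LEARNING_RATE * (1.0 - prefs[target])
    return updated


prefs = {"actionable_next_steps": 0.5, "more_data": 0.5, "information_only": 0.5}
prefs = update_preferences(prefs, "engaged_with_offer")
print(round(prefs["actionable_next_steps"], 3))  # 0.55
```

Returning a new dictionary instead of mutating in place keeps the pre-update vector available for the audit trail.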

The Audit Trail Logger records the full interaction. Which infrastructure agents activated, which models fired, what context layers were loaded, what response was generated, at what time, at what confidence. The log is cryptographically signed. The log is what allows after-the-fact reconstruction if a question arises about why the system said what it said.
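A signed log entry can be sketched with Python's standard `hmac` module. The text says only that the log is "cryptographically signed"; HMAC-SHA256 is a stand-in chosen because it ships with the standard library, and a production audit trail would more plausibly use an asymmetric signature so verifiers never hold the signing key. The key, field names, and `response_id` are hypothetical.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key-not-for-production"  # hypothetical key


def _canonical(entry: dict) -> bytes:
    # Sorted keys and fixed separators so the same entry always yields the same bytes.
    return json.dumps(entry, sort_keys=True, separators=(",", ":")).encode()


def sign_entry(entry: dict) -> dict:
    tag = hmac.new(SIGNING_KEY, _canonical(entry), hashlib.sha256).hexdigest()
    return {"entry": entry, "hmac_sha256": tag}


def verify_entry(record: dict) -> bool:
    expected = hmac.new(SIGNING_KEY, _canonical(record["entry"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["hmac_sha256"])  # constant-time compare


record = sign_entry({
    "response_id": "example-0001",  # hypothetical identifier
    "local_time": "15:14",
    "agents": ["medication_manager", "symptom_monitor", "cognitive_state_assessor"],
    "context_layers": [0, 1, 2, 3],
    "confidence": {"medication_advisor": 0.88, "symptom_monitor": 0.91,
                   "cognitive_state_assessor": 0.94},
})
tampered = {"entry": dict(record["entry"], context_layers=[0, 1]),
            "hmac_sha256": record["hmac_sha256"]}
print(verify_entry(record), verify_entry(tampered))  # True False
```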

The Symptom Monitor adds the interaction to Margaret’s symptom history. The next time she reports something, the prior dizziness report will be in the context the router considers loading.

Total time elapsed from the end of Margaret’s sentence to the start of the system’s spoken response: 333 milliseconds. Margaret experiences the response as immediate. The system has used roughly two-thirds of its 500 millisecond budget. The clock stops.
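The cumulative figures in the six steps can be checked with a few lines of arithmetic. The per-step values are the ones reported in the trace, with Step 4 counted at its slowest path; the dictionary is only a convenient container.

```python
from itertools import accumulate

BUDGET_MS = 500
STEP_MS = {
    "intent classification": 50,
    "MoC routing": 22,
    "H-layer delegation": 45,
    "infrastructure agents (slowest path)": 73,
    "response synthesis": 96,
    "delivery": 47,
}

cumulative = list(accumulate(STEP_MS.values()))
print(cumulative)                                     # [50, 72, 117, 190, 286, 333]
print(f"{cumulative[-1] / BUDGET_MS:.0%} of budget")  # 67% of budget
```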

The trace at launch

The trace above describes the target architecture for a full-stack subscriber at Phase 3 maturity. At launch (Phase 1), no Zone 1 or Zone 2 deployments exist for any subscriber. Every step in this trace runs through Zone 3 (the cloud reasoning layer). The orchestration logic is identical: the eleven computational steps still execute in the same order with the same decomposition. The latency budget grows because every inference step crosses the network to Zone 3 rather than running locally or regionally.

The Privacy Filter, which would run in Zone 1 at maturity for subscribers with a Local Pane, runs in the platform’s coordinator layer at Phase 1 and validates the outbound context package before it is transmitted to the Zone 3 provider. The Safety Filter, also in the coordinator layer at Phase 1, screens the response before delivery. The person sees a slower response than the target trace would deliver, but she sees the same product. An architect reviewing the trace sees a different substrate: Zone 3 processes the privacy-filtered context under a healthcare data processing agreement (DPA) and returns the structured result that the H-layer would have synthesized from a multi-zone trace at maturity.

As Phase 2 brings Zone 1 online for subscribers who acquire a Local Pane, and Phase 3 brings Zone 2 online for subscribers in served regions, the trace for those subscribers shifts toward the target latency profile above. For Zone 3-only subscribers, the trace remains as described here in every phase: Zone 3 continues to be the substrate, and the DPA continues to be the privacy posture. The Zone 3-only path stays slower than the full-stack path, a consequence of serving subscribers who do not have local hardware.
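The phase rollout amounts to a placement rule per component. The sketch below paraphrases the rules stated in this section; the function, its parameters, and the component names are hypothetical, and the assumption that a Local Pane subscriber outside a served region keeps Zone 2 work in Zone 3 is an inference from the text rather than a stated rule.

```python
FILTERS = {"privacy_filter", "safety_filter"}


def placement(component: str, target_zone: int, phase: int,
              has_local_pane: bool, in_served_region: bool) -> str:
    """Where a component runs, given the zone it targets at maturity.

    Assumed rules, paraphrasing the rollout described above:
      * Zone 1 work runs locally from Phase 2 for subscribers with a Local Pane.
      * Zone 2 work runs regionally from Phase 3 for subscribers in served regions.
      * Otherwise the filters run in the platform coordinator layer and all
        other inference crosses the network to Zone 3.
    """
    if target_zone == 1 and phase >= 2 and has_local_pane:
        return "zone1"
    if target_zone == 2 and phase >= 3 and in_served_region:
        return "zone2"
    if component in FILTERS:
        return "coordinator"
    return "zone3"


# Phase 1: nothing runs in Zone 1 or Zone 2, whatever hardware the subscriber has.
print(placement("safety_filter", 1, phase=1, has_local_pane=True, in_served_region=True))       # coordinator
print(placement("response_generator", 2, phase=1, has_local_pane=True, in_served_region=True))  # zone3
# Phase 3, full-stack subscriber: the target trace.
print(placement("safety_filter", 1, phase=3, has_local_pane=True, in_served_region=True))       # zone1
print(placement("response_generator", 2, phase=3, has_local_pane=True, in_served_region=True))  # zone2
```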

What could have gone wrong

The trace above is the system working. The architecture also has to handle the system not working. Three failure scenarios are worth naming because they exercise the resilience design.

Intent misclassification. If the classifier had read “dizzy” as a cognitive symptom with confidence below 0.85, the multi-path processing would have triggered. Both the cognitive concierge and the health concierge would have activated, with the H-layer presenting both findings to the response generator and letting the synthesis logic select the right framing based on the additional evidence. The cost is roughly 50 milliseconds of additional latency. The benefit is that the system does not fail silently when the categorization is ambiguous.

Medication database gap. If Margaret’s medication list were incomplete because the system had not synced with the pharmacy in two weeks, the Medication Advisor could miss the amlodipine-furosemide interaction. The architecture defends against this with a staleness check. The advisor receives the timestamp of the last medication reconciliation. If the time elapsed since that reconciliation exceeds the freshness threshold, the advisor flags the staleness in its output. The synthesis path then produces a response that acknowledges the gap rather than a confident answer based on stale data.
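The staleness check compares the age of the last reconciliation, not the timestamp itself, against a threshold. The text does not give the threshold, so the seven-day value below is an assumption, as are the names `AdvisorOutput` and `attach_staleness`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical value; the trace says a freshness threshold exists but not its size.
FRESHNESS_THRESHOLD = timedelta(days=7)


@dataclass
class AdvisorOutput:
    findings: list[str]
    stale: bool
    staleness_note: Optional[str] = None


def attach_staleness(findings: list[str], last_reconciled: datetime,
                     now: datetime) -> AdvisorOutput:
    """Flag advisor output when the medication list may be out of date."""
    age = now - last_reconciled
    if age > FRESHNESS_THRESHOLD:
        note = f"medication list last reconciled {age.days} days ago; findings may be incomplete"
        return AdvisorOutput(findings, stale=True, staleness_note=note)
    return AdvisorOutput(findings, stale=False)


now = datetime.now()
# Two weeks without a pharmacy sync: no interaction found, but the gap is flagged.
out = attach_staleness([], now - timedelta(days=14), now)
print(out.stale)           # True
print(out.staleness_note)
```

An empty `findings` list paired with `stale=True` is exactly the case the synthesis path must not read as "no interactions".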

Vital signs data missing. If Margaret’s at-home blood pressure cuff has not synced in a week, the Symptom Monitor cannot produce a trend analysis. The architecture handles this by allowing partial responses. The Symptom Monitor returns a structured null result with the reason. The Response Generator produces a response that omits the trend reference and notes that the trend would be available if the device were syncing. Margaret is asked to check the device. The response degrades gracefully rather than fabricating a trend.
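A structured null result keeps "no data" distinct from "no change". In the sketch below, the Symptom Monitor either returns both averages or neither plus a reason, and the sentence builder omits the trend when it is missing. The sync tolerance, type names, and readings are hypothetical; the readings are placeholders chosen to reproduce the averages quoted in the trace.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional

Reading = tuple[int, int]  # (systolic, diastolic)

MAX_SYNC_GAP = timedelta(days=2)  # hypothetical tolerance


@dataclass
class TrendResult:
    """Either both averages are set, or both are None and `reason` says why."""
    recent_avg: Optional[tuple[float, float]]
    prior_avg: Optional[tuple[float, float]]
    reason: Optional[str] = None


def _average(readings: list[Reading]) -> tuple[float, float]:
    return mean(r[0] for r in readings), mean(r[1] for r in readings)


def bp_trend(recent: list[Reading], prior: list[Reading],
             last_sync: datetime, now: datetime) -> TrendResult:
    if now - last_sync > MAX_SYNC_GAP or not recent or not prior:
        # Structured null: no trend, plus the reason one could not be computed.
        return TrendResult(None, None, reason="blood pressure cuff has not synced recently")
    return TrendResult(_average(recent), _average(prior))


def trend_sentence(t: TrendResult) -> str:
    if t.recent_avg is None:
        # Degrade gracefully: omit the trend rather than fabricate one.
        return ("I don't have this week's readings from your blood pressure cuff, "
                "so I can't show a trend. Could you check that it's syncing?")
    (s1, d1), (s0, d0) = t.recent_avg, t.prior_avg
    return (f"The average over the last seven days is {s1:.0f} over {d1:.0f}, "
            f"compared to {s0:.0f} over {d0:.0f} last month.")


now = datetime.now()
recent = [(120, 73), (116, 71)]  # placeholder readings
prior = [(136, 83), (134, 81)]
print(trend_sentence(bp_trend(recent, prior, last_sync=now, now=now)))
print(trend_sentence(bp_trend(recent, prior, last_sync=now - timedelta(days=7), now=now)))
```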

The defenses are not optional. They are the difference between a system that works in the demo and a system that works in deployment. The full enumeration of failure modes per step, including timeout behavior, partial result handling, and the cascade rules when multiple steps degrade simultaneously, sits in the technical appendix.

Cross-references

The Brain and the Hands (BMT-02.01). The H-layer and L-layer architecture this trace demonstrates. The article specifies the design; this trace shows it operating.

The Health Concierge (BMT-01.02). The concierge agent activated in this trace. Series 01 describes the concierge from Margaret’s perspective; this trace shows what runs underneath.

The Five Layers (BMT-05.01). The Mixture of Context hierarchy the router selects from. The article specifies what each layer contains and how they update.

Sensor Fusion (BMT-07.03). How the vital signs data arrived at the Symptom Monitor. Series 07 covers the data architecture that feeds this trace’s evidence base.

Technical Appendix BMT-02.04-A is available to partners and investors at partners.bluemirror.tech.