The Health Concierge

Margaret’s cardiologist sees her once a year for thirty minutes. He is a thoughtful physician, attentive to her chart, careful with her medications. He has nine hundred patients. The arithmetic is unforgiving: his attention to Margaret, summed across the year, comes to thirty minutes. The remaining 525,570 minutes belong to no one.

This is the gap the health concierge addresses. Not the thirty minutes; those belong to the cardiologist, and the architecture must not cross into them. The 525,570. The minutes when Margaret’s blood pressure climbs three points across four readings and no one notices. The minutes when her weight gain begins three weeks before the ER visit and no one connects the two. The minutes when her medication timing slides from 7:00 to 7:45 to 8:30 because no one is watching. The cardiologist decides. The concierge watches.

The health concierge is the most complex single agent in the BlueMirror system. Six infrastructure agents, five small language models, a mixed autonomy profile that runs high for routine monitoring and low for care transitions. The complexity is not gratuitous. It maps to the complexity of the gap.

Six infrastructure agents, one voice

From Margaret’s perspective, the health concierge is one entity. She asks her health concierge whether she should take her morning blood pressure medication if she already took her ACE inhibitor an hour ago. She does not address the Medication Manager. She does not invoke the Medication Advisor SLM. She speaks to her health concierge.

Beneath the surface, six infrastructure agents do the work. The Medication Manager owns reminders, refills, adherence tracking, and interaction checking against a continuously updated drug database. It runs at 0.75 autonomy on the edge: most of its actions execute without human approval, and most of its computation happens on the local device because the data is sensitive and the latency requirements are tight. The Symptom Monitor tracks reported symptoms across time, looks for pattern signatures (the five-day fatigue trend that often precedes infection in elderly diabetics, for example), and generates alerts when patterns cross thresholds. It runs at 0.5 autonomy on the edge: alerts go through human review by default, but routine logging and trend tracking are autonomous.

The Vital Signs Analyst ingests blood pressure, blood glucose, weight, oxygen saturation, and heart rate from connected devices, computes trends across windows from days to months, and flags deviations against the person’s own baseline. It runs at 0.75 autonomy on the edge. The Exercise Monitor tracks activity from wearables and ambient sensors, assesses mobility patterns (gait variability, transfer time from sit to stand, stair climbing rate), and detects regression that might warrant clinical attention. It runs at 0.5 autonomy on the edge. The Appointment Coordinator manages scheduling across the person’s clinicians, transportation logistics, pre-visit preparation (medication lists, symptom summaries, questions to ask), and post-visit follow-up. It runs at 0.5 autonomy across edge and cloud. The Care Transition Manager handles the most consequential and most regulated category: discharge planning, home services coordination, and the transition between care settings. It runs at 0.25 autonomy in the cloud. Every meaningful action requires human approval.
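
The Vital Signs Analyst's comparison against "the person's own baseline" can be illustrated with a simple z-score check. This is a minimal sketch, not the production method: the function name, the z-score technique, and the threshold of 2.0 are all assumptions made for illustration.

```python
from statistics import mean, stdev


def deviates_from_baseline(baseline, recent, z_threshold=2.0):
    """Return True if the mean of recent readings is far from the personal baseline.

    baseline: historical readings for this one person (at least two values).
    recent: the latest window of readings to test.
    z_threshold: hypothetical cutoff, in baseline standard deviations.
    """
    if len(baseline) < 2 or not recent:
        raise ValueError("need at least two baseline readings and one recent reading")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all counts as a deviation.
        return mean(recent) != mu
    return abs(mean(recent) - mu) / sigma > z_threshold


# Illustrative systolic readings (mmHg), invented for this example.
systolic_baseline = [128, 130, 132, 129, 131, 130, 128, 132]
print(deviates_from_baseline(systolic_baseline, [138, 139, 141, 140]))
print(deviates_from_baseline(systolic_baseline, [130, 131]))
```

Comparing against the person's own history rather than a population norm is the point: a reading that is unremarkable in general can still be a meaningful shift for Margaret.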

The voice the user hears is one. The infrastructure agents underneath are six because the work decomposes into six distinct concerns with distinct autonomy profiles, distinct latency targets, and distinct privacy requirements. Decomposition is not a UI choice. It is the structure of the work.
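
The six agents and their profiles can be summarized as a small registry. The autonomy values and placements below are taken from the descriptions above; the class names, the helper function, and the reading of the scale (lower means more human approval) are assumptions for illustration only.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    EDGE = "edge"
    EDGE_AND_CLOUD = "edge+cloud"
    CLOUD = "cloud"


@dataclass(frozen=True)
class InfrastructureAgent:
    name: str
    autonomy: float  # assumed scale: lower values mean more human approval
    placement: Placement


# Values copied from the text; the data structure itself is hypothetical.
AGENTS = (
    InfrastructureAgent("Medication Manager", 0.75, Placement.EDGE),
    InfrastructureAgent("Symptom Monitor", 0.50, Placement.EDGE),
    InfrastructureAgent("Vital Signs Analyst", 0.75, Placement.EDGE),
    InfrastructureAgent("Exercise Monitor", 0.50, Placement.EDGE),
    InfrastructureAgent("Appointment Coordinator", 0.50, Placement.EDGE_AND_CLOUD),
    InfrastructureAgent("Care Transition Manager", 0.25, Placement.CLOUD),
)


def agents_at_or_below(threshold: float) -> list[str]:
    """Names of agents whose autonomy is at or below the given threshold."""
    return [a.name for a in AGENTS if a.autonomy <= threshold]


print(agents_at_or_below(0.25))
```

Keeping autonomy and placement as data per agent, rather than as one global setting, mirrors the claim that the work decomposes into concerns with distinct profiles.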

The SLM stack

The health concierge calls five small language models. The model selection is as deliberate as the agent decomposition.

The Medication Advisor runs at 150 million parameters and targets inference under 75 milliseconds. Its job is drug interaction checking, contraindication evaluation, and dose-adjustment reasoning given comorbidities. The 150M parameter count is the smallest size at which the model reliably handles the combinatorial space of polypharmacy reasoning for the elderly population (where the median patient takes seven concurrent medications). Smaller models miss interactions. Larger models add latency that pushes interactive medication checks past the threshold where Margaret loses patience and stops asking.

The Cognitive State Estimator runs at 200 million parameters, targets under 75ms inference, and is shared across the health concierge and the cognitive concierge. Its job is to assess the person’s current lucidity from conversational signals: response coherence, vocabulary range, time-orientation cues, repetition patterns. The output drives behavioral adaptation across the system. The model is sized at 200M because cognitive state inference from conversation is harder than medication interaction checking. The shared deployment across concierge agents is what allows one assessment to drive thirteen adjustments, a structural feature discussed in BMT-01.07.

The Safety Filter runs at 100M parameters and targets under 25 milliseconds. It validates every output before delivery. Its job is to catch responses that could cause harm if delivered: medication advice that contradicts the clinical record, instructions that exceed the agent’s authority, content that could endanger a person in cognitive distress. The 25ms target is non-negotiable. The Safety Filter sits in the response path on every interaction, so any latency here is added to every response, whichever other models the query touches.

The Intent Classifier runs at 150M parameters and targets under 50ms. Its job is to route the request to the right capability. “Should I take my evening medication if I missed the morning dose?” is a Medication Advisor query. “I feel dizzy” is a Symptom Monitor query plus a possible escalation. “What did the doctor say last week about my potassium?” is an Appointment Coordinator query against the post-visit summary. Misrouting at this stage cascades into the wrong response from the wrong specialist.
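
The routing step can be sketched as a lookup from an intent label to a specialist. The three routes come from the examples above. Everything else is hypothetical: the label names, the `Route` type, and especially `classify_intent`, which stands in for the Intent Classifier SLM with crude keyword rules purely so the sketch runs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    handler: str
    check_escalation: bool = False


# Routing table built from the three examples in the text.
ROUTES = {
    "medication": Route("Medication Advisor"),
    "symptom": Route("Symptom Monitor", check_escalation=True),
    "visit_recall": Route("Appointment Coordinator"),
}


def classify_intent(utterance: str) -> str:
    """Placeholder for the Intent Classifier SLM (keyword rules, not a model)."""
    text = utterance.lower()
    if any(word in text for word in ("dose", "medication", "pill")):
        return "medication"
    if any(word in text for word in ("doctor say", "appointment", "visit")):
        return "visit_recall"
    if any(word in text for word in ("dizzy", "pain", "tired")):
        return "symptom"
    return "unknown"


def route(utterance: str) -> Optional[Route]:
    """Return the route for an utterance, or None when the intent is unknown."""
    return ROUTES.get(classify_intent(utterance))


print(route("I feel dizzy"))
```

Returning `None` for an unknown intent, instead of guessing a specialist, is one way to keep a misclassification from cascading into the wrong response.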

The Response Generator runs at 400M parameters and targets under 100ms. Its job is to compose the conversational output once the specialist models have done their work. It is the largest of the five because conversational quality at the surface determines whether Margaret keeps engaging. Models smaller than 400M produce prose that the population we serve experiences as cold, halting, or robotic. The threshold is empirical, not aesthetic.

Total SLM footprint for the health concierge: 1.0 billion parameters across five models, with a cumulative inference budget of approximately 325 milliseconds (the sum of the five per-model targets) for a typical query that touches all five models. The footprint is what allows the entire stack to run on consumer hardware (a current-generation tablet or capable phone) without cloud round-trips. Edge deployment is what makes the privacy guarantees credible: the medication context never leaves the device.
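
The totals can be checked directly from the per-model figures given in this section. The sketch assumes the five models run one after another and that each uses its full latency target, so the latency sum is an upper bound on the budget, not a measurement.

```python
# (parameters in millions, latency target in milliseconds), as stated in the text.
SLM_STACK = {
    "Medication Advisor": (150, 75),
    "Cognitive State Estimator": (200, 75),
    "Safety Filter": (100, 25),
    "Intent Classifier": (150, 50),
    "Response Generator": (400, 100),
}

total_params_m = sum(params for params, _ in SLM_STACK.values())
total_latency_ms = sum(latency for _, latency in SLM_STACK.values())

print(f"{total_params_m / 1000:.1f}B parameters, {total_latency_ms} ms sequential budget")
```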

What runs autonomously, and what does not

The autonomy gradient is the design surface where the health concierge’s ethics meet its engineering. Every action maps to a position on the scale. The gradient follows risk, not convenience.

Medication reminders execute autonomously. The agent decides when to remind, how to phrase the reminder, whether to escalate if the person ignores the first prompt. The risk of an autonomous reminder is low (the person can always decline the action) and the cost of human-in-the-loop on every reminder would be absurd.

Refill requests to the pharmacy execute autonomously with notification. The agent reorders the prescription, applies any patient assistance program discount it has discovered, and notifies Margaret that the refill is on the way. The notification is not a request for permission. It is a record of action. Margaret can reverse the action. The default is action.

Symptom pattern alerts to family follow a delay protocol. Routine alerts queue for 24 hours before family notification, allowing the person to address the underlying condition or update the agent on the symptom’s resolution. Urgent alerts (chest pain pattern, fall risk, signs of stroke) skip the delay and notify both the person and the designated family contact immediately. The delay is not bureaucratic. It is dignity-preserving: most concerning symptoms resolve, and a family member who learns about every concerning symptom learns to ignore the alerts.
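
The delay protocol can be sketched as a function that decides when, if ever, the family contact hears about an alert. The 24-hour window and the three urgent categories come from the paragraph above. The pattern identifiers, the `SymptomAlert` type, and the rule that a resolved routine alert is never sent are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

ROUTINE_DELAY = timedelta(hours=24)
# Hypothetical identifiers for the urgent categories named in the text.
URGENT_PATTERNS = {"chest_pain", "fall_risk", "stroke_signs"}


@dataclass
class SymptomAlert:
    pattern: str
    raised_at: datetime
    resolved: bool = False  # set when the person reports the symptom has resolved


def family_notify_time(alert: SymptomAlert) -> Optional[datetime]:
    """When the family contact should be notified, or None if not at all."""
    if alert.pattern in URGENT_PATTERNS:
        return alert.raised_at  # urgent: skip the delay entirely
    if alert.resolved:
        return None  # assumed behavior: resolved routine alerts stay private
    return alert.raised_at + ROUTINE_DELAY


raised = datetime(2026, 1, 1, 8, 0)
print(family_notify_time(SymptomAlert("fatigue_trend", raised)))
print(family_notify_time(SymptomAlert("chest_pain", raised)))
```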

Appointment rescheduling requires human confirmation. The agent surfaces a proposed reschedule, presents the implications (the rescheduled date is six weeks later than the original, which means the lab work needs to move too), and waits for Margaret to confirm. The cost of an unauthorized reschedule is high enough (missed care, downstream consequences) that the autonomous default would be wrong.

Care transition planning requires human approval at every step. When the discharge planner at the hospital generates a home services recommendation, the Care Transition Manager organizes the recommendation into a structured proposal: which services, what schedule, what cost, what insurance coverage, what alternatives. Margaret or her designated proxy approves each component. The agent does not execute the transition. It manages the decision space.

The reason is regulatory and ethical. Care transitions are where elderly patients are most often harmed by mistakes. The medication reconciliation gap between hospital and home accounts for a documented share of preventable readmissions. An autonomous agent that gets this wrong creates legal exposure that the architecture must not accept. The autonomy default is 0.25 not because the agent is incapable but because the consequences of action without approval are too severe.
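
The gradient described in this section can be sketched as a policy table plus a gate. The four action categories and their approval requirements come from the text. The identifiers, the enum, and the choice to treat any unlisted action as requiring approval are assumptions for illustration.

```python
from enum import Enum


class ApprovalMode(Enum):
    AUTONOMOUS = "execute without asking"
    NOTIFY_AFTER = "execute, then notify"
    CONFIRM_FIRST = "wait for confirmation"
    APPROVE_EACH_STEP = "wait for approval of every component"


# Hypothetical action names mapped to the modes described in the text.
POLICY = {
    "medication_reminder": ApprovalMode.AUTONOMOUS,
    "pharmacy_refill": ApprovalMode.NOTIFY_AFTER,
    "appointment_reschedule": ApprovalMode.CONFIRM_FIRST,
    "care_transition_step": ApprovalMode.APPROVE_EACH_STEP,
}


def may_execute(action: str, human_approved: bool = False) -> bool:
    """Return True if the action may run now under the policy table."""
    # Assumption: unknown actions fall back to the most restrictive mode.
    mode = POLICY.get(action, ApprovalMode.APPROVE_EACH_STEP)
    if mode in (ApprovalMode.AUTONOMOUS, ApprovalMode.NOTIFY_AFTER):
        return True
    return human_approved


print(may_execute("pharmacy_refill"))
print(may_execute("care_transition_step"))
```

Defaulting unlisted actions to the most restrictive mode keeps the gradient tied to risk: a new capability has to be placed on the scale deliberately before it can act on its own.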

The clinician interface

The health concierge integrates with the clinical record through FHIR R4. What it reads, what it writes, and what it never does are the three boundaries that define the integration.

The agent reads the active medication list, the problem list, the allergy list, recent labs, and the post-visit notes from the person’s care providers. The read is constrained: only data the patient has consented to share with the agent, only providers the patient has explicitly added, only records relevant to the agent’s function. The MoC (Model of Context) consent architecture (Series 05) governs the granularity.

The agent writes adherence data, symptom reports, vital signs trends, and structured pre-visit summaries back to the providers who accept FHIR write-back from patient-initiated agents. The set of providers who accept this is currently small. As of mid-2026, FHIR write-back from patient-initiated sources is a partial reality at major academic centers and integrated systems and a future state for most community practices. The architecture is built for the partial reality and ready for the full one. What the agent writes today is more limited than what it will write in twelve to eighteen months as integrated systems extend their FHIR endpoints.
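
As an illustration of what a single vital-signs write-back might look like, the sketch below builds a FHIR R4 Observation for one blood pressure reading as a plain dictionary. The LOINC codes and UCUM unit follow the standard FHIR blood pressure vital-signs profile. The function names, the patient id, and the idea that the agent sends individual Observation resources are assumptions; the source does not specify the resource types it writes.

```python
import json
from datetime import datetime, timezone

LOINC = "http://loinc.org"
UCUM = "http://unitsofmeasure.org"


def _bp_component(code: str, display: str, value: float) -> dict:
    """One systolic or diastolic component, in mmHg."""
    return {
        "code": {"coding": [{"system": LOINC, "code": code, "display": display}]},
        "valueQuantity": {"value": value, "unit": "mmHg", "system": UCUM, "code": "mm[Hg]"},
    }


def blood_pressure_observation(patient_id: str, systolic: float,
                               diastolic: float, taken_at: datetime) -> dict:
    """Build a FHIR R4 Observation dict for a single blood pressure reading."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "category": [{"coding": [{
            "system": "http://terminology.hl7.org/CodeSystem/observation-category",
            "code": "vital-signs",
        }]}],
        "code": {"coding": [{
            "system": LOINC,
            "code": "85354-9",
            "display": "Blood pressure panel with all children optional",
        }]},
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": taken_at.isoformat(),
        "component": [
            _bp_component("8480-6", "Systolic blood pressure", systolic),
            _bp_component("8462-4", "Diastolic blood pressure", diastolic),
        ],
    }


# "example" is a placeholder patient id, not a real record.
reading = blood_pressure_observation(
    "example", 138, 86, datetime(2026, 5, 1, 7, 30, tzinfo=timezone.utc)
)
print(json.dumps(reading, indent=2))
```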

The agent never diagnoses. It never prescribes. It never contradicts a clinical order. The line between “monitoring and decision support” and “medical practice” is the regulatory line that triggers FDA medical device classification. Crossing that line changes the architecture from a software product into a regulated device subject to 510(k) clearance, post-market surveillance, and a different liability regime. The product roadmap holds the line.

The architectural consequence: when Margaret asks “Should I increase my blood pressure medication?” the health concierge does not answer the question as posed. It surfaces the relevant data (the trend, the recent readings, the historical context), prepares a structured question for her cardiologist, and offers to send the question through the patient portal. The decision is the cardiologist’s. The preparation is the agent’s. The boundary is not a limitation imposed on an otherwise capable system. The boundary is what makes the system capable of operating at all.

The cognitive capacity overlay

The health concierge’s behavior is not static. It adapts continuously to the person’s cognitive state, driven by the Cognitive State Estimator that the cognitive concierge also calls.

When the estimator reports a high-capacity day, the agent operates at full conversational complexity, expects multi-turn dialogue, and surfaces nuanced information. When the estimator reports a low-capacity day (the response patterns suggest disorientation, the vocabulary has narrowed, the response time has lengthened), the agent simplifies. Reminders use shorter sentences. Questions become more concrete. The visual layout, on devices that support it, shifts toward larger fonts and fewer options. The agent does not announce the change. The person does not receive a notification saying “your cognitive score is lower today.”
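
One way to picture the adaptation is as a smooth interpolation between a simplified presentation and a full one. The sketch assumes the estimator emits a capacity score between 0.0 and 1.0, and the specific sentence lengths, font scales, and option counts are invented for illustration; the source states only the direction of each adjustment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Presentation:
    max_sentence_words: int
    font_scale: float
    max_options: int


# Hypothetical endpoints for the lowest and highest capacity scores.
LOW = Presentation(max_sentence_words=8, font_scale=1.5, max_options=2)
HIGH = Presentation(max_sentence_words=24, font_scale=1.0, max_options=5)


def adapt(capacity: float) -> Presentation:
    """Interpolate presentation settings for a capacity score in [0.0, 1.0]."""
    c = min(1.0, max(0.0, capacity))  # clamp out-of-range scores

    def lerp(lo: float, hi: float) -> float:
        return lo + (hi - lo) * c

    return Presentation(
        max_sentence_words=round(lerp(LOW.max_sentence_words, HIGH.max_sentence_words)),
        font_scale=round(lerp(LOW.font_scale, HIGH.font_scale), 2),
        max_options=round(lerp(LOW.max_options, HIGH.max_options)),
    )


print(adapt(0.5))
```

Interpolating instead of switching between named modes matches the point made here: there is no visible mode change for the person to notice or manage.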

The continuous adaptation is what makes the agent usable across the trajectory of cognitive change. A system that requires the person to switch modes is a system that fails when the person’s capacity to switch modes itself declines. A system that adapts continuously is one the person can use without ever becoming aware that adaptation is happening.

The honest limitation: the Cognitive State Estimator is a 200M parameter model performing inference from conversational signal. It is not a clinical diagnostic. It cannot distinguish between mild cognitive impairment, a low-sleep day, depression, and an acute infection. When the model detects significant deviation from the person’s baseline, it does not diagnose. It surfaces a question to the person’s clinical care team and adjusts the agent’s behavior in the meantime. The clinician decides. The agent watches.

What the health concierge cannot do

The health concierge cannot replace the clinician. It cannot diagnose, prescribe, or contradict orders. It cannot reach across into the territories of the buying agent, the financial concierge, or the legal advocate without the user’s consent and the right context handoff. It cannot operate at full capability when the underlying clinical record is unavailable: no FHIR endpoint, no integration. In that case it operates on patient-reported data alone, which is a meaningful but reduced capability.

It cannot detect what it cannot sense. The five-day fatigue trend that precedes the infection is detectable because Margaret reports her energy each morning. The fall risk that emerges from gait variability is detectable because Margaret’s wearable measures it. The medication timing slide that worsens her hypertension is detectable because the medication reminders log compliance. The architecture is honest about what depends on what. A health concierge running without sensor data is a different system than the one described above. It is still useful. It is not the same.

The next article describes the buying agent: the concierge that demonstrates the structural inversion BlueMirror represents, the agent with zero seller bias, and the membrane architecture that protects the buyer during agent-to-agent negotiation.

Cross-References

What Changes When the AI Knows Your Health (BML-01 series). The editorial framing of the health concierge from the user’s perspective, including the daily-life consequences of the architectural decisions described here.

The Cognitive Concierge (BMT-01.07). Shares the Cognitive State Estimator with the health concierge and demonstrates how one model drives behavioral adaptation across multiple agents.

The Escalation Hierarchy (BMT-04.04). Details the autonomy framework that governs which actions execute autonomously, which require notification, and which require approval, applied here to the health domain.

The Health Record Integration (BMT-07.02). The FHIR integration architecture that the clinician interface depends on, including write-back patterns and the regulatory boundary.

Technical Appendix BMT-01.02-A is available to partners and investors at partners.bluemirror.tech.