Jerome is the clinical informatics director at a home care agency that has been deploying remote patient monitoring for seven years. In that time he has worked through three major platform failures. They all followed the same pattern: the system made a decision it should not have made alone, and the human intervention arrived too late. The system was not designed to escalate. It was designed to decide.
His evaluation of BlueMirror centered on a single question: at what point does the system stop deciding and start involving people? The answer is a five-level hierarchy. The interesting part, he found, was not the levels themselves but the failure mode analysis for each one. A framework that describes the levels but not what goes wrong when the wrong level is chosen is not an operational framework. It is a diagram.
Five levels with failure modes
Level 1 is fully automated. The system decides and acts with no notification. Medication reminder timing, ambient temperature adjustment, routine calendar updates, grocery re-orders within established preferences: these are Level 1. The criteria are strict: the action must be routine, reversible, low-stakes, within established patterns, and covered by domain consent. If an action is wrongly placed at this level, the failure mode is that the system takes an action the person did not want without ever asking her. Recovery is to undo the action, log the error, and adjust the automation threshold for that action type.
Level 2 is Act and Notify. The system decides, acts, and tells the person by the end of the day. Medication refill placed, grocery substitution made, appointment rescheduled due to a provider change: these are Level 2. The criteria are that the action is routine but involves an external commitment, is reversible within a reasonable window, and is something the person should know about even if she does not need to approve it. The failure mode is that the person would have decided differently but the action is already in progress. Recovery is to reverse where possible and reclassify that action type to Level 3.
Level 3 is Recommend and Wait. The system recommends an action and waits for approval before acting. Switching pharmacies for thirty dollars per month in savings, changing a Medicare plan during enrollment, scheduling a specialist appointment: Level 3. The criteria are that the action involves significant commitment, is difficult to reverse, involves financial impact, or crosses domain boundaries. The failure mode runs in the opposite direction: the person misses a time-sensitive opportunity because the system waited instead of acting. Recovery is the escalation timeout, described below.
Level 4 is Present and Defer. The system presents information and defers entirely. It does not recommend. Complex financial decisions with multiple valid options, care transition planning, family coordination decisions with relational implications: Level 4. The criteria are that multiple reasonable options exist, the system cannot determine the person’s preference with confidence, and the stakes are high enough that a wrong recommendation is worse than no recommendation. The failure mode is that the person wanted guidance, not just information. Recovery is straightforward: provide a recommendation when the person asks for one. The system does not withhold its judgment permanently. It withholds it by default when the decision is genuinely ambiguous and the downside of a wrong recommendation is high.
Level 5 is Emergency. The system acts immediately, notifies emergency contacts, and bypasses normal consent and autonomy boundaries. Fall detected with no response, vital signs indicating an acute event, wandering detected outside a safe zone: Level 5. The criteria are immediate safety risk. The failure mode is a false positive that triggers an unnecessary emergency response. Recovery requires immediate reassurance of the person, a false positive analysis, and threshold adjustment to reduce recurrence. False positives are embarrassing and erode trust. False negatives in this category are life-threatening. The architecture accepts false positive risk to prevent false negatives, and invests in multi-signal convergence to keep the false positive rate manageable.
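The point Jerome singled out, that every level carries a named failure mode and a recovery path, can be made concrete by storing them alongside the level itself. The Python below is a minimal sketch, not BlueMirror’s implementation: the names `Level`, `LevelSpec`, `HIERARCHY`, and `reclassify_after_failure` are hypothetical, and the only reclassification rule it encodes is the one the text states (a Level 2 failure moves the action type to Level 3).

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    FULLY_AUTOMATED = 1
    ACT_AND_NOTIFY = 2
    RECOMMEND_AND_WAIT = 3
    PRESENT_AND_DEFER = 4
    EMERGENCY = 5


@dataclass(frozen=True)
class LevelSpec:
    waits_for_person: bool  # does the system hold off until the person decides?
    recommends: bool        # does it offer its own recommendation?
    failure_mode: str
    recovery: str


# Hypothetical encoding of the five levels described above.
HIERARCHY = {
    Level.FULLY_AUTOMATED: LevelSpec(
        waits_for_person=False, recommends=False,
        failure_mode="acted on something the person did not want, without asking",
        recovery="undo, log the error, adjust the automation threshold"),
    Level.ACT_AND_NOTIFY: LevelSpec(
        waits_for_person=False, recommends=False,
        failure_mode="person would have decided differently; action in progress",
        recovery="reverse where possible; reclassify the action type to Level 3"),
    Level.RECOMMEND_AND_WAIT: LevelSpec(
        waits_for_person=True, recommends=True,
        failure_mode="time-sensitive opportunity missed while waiting",
        recovery="escalation timeout"),
    Level.PRESENT_AND_DEFER: LevelSpec(
        waits_for_person=True, recommends=False,
        failure_mode="person wanted guidance, not just information",
        recovery="provide a recommendation when asked"),
    Level.EMERGENCY: LevelSpec(
        waits_for_person=False, recommends=False,
        failure_mode="false positive triggers an unnecessary emergency response",
        recovery="reassure the person, analyze the false positive, adjust thresholds"),
}


def reclassify_after_failure(level: Level) -> Level:
    """Level 2 failures move the action type up to Level 3; the other levels
    recover in place (undo, timeout, recommendation on request, threshold tuning)."""
    return Level.RECOMMEND_AND_WAIT if level == Level.ACT_AND_NOTIFY else level
```

Keeping the failure mode next to the level means a level cannot be added to the table without someone also writing down how it goes wrong.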
How the system chooses
The Escalation Classifier SLM, a dedicated 100-million-parameter model running in under fifty milliseconds, evaluates every pending decision against five criteria.
Reversibility: can this be undone? Reversible actions can be automated. Irreversible actions escalate by default.
Stakes: what is the downside if the system gets it wrong? Low stakes automate. High stakes escalate. The system has a calibrated stakes model per domain and per action type that updates as it learns from the person’s responses.
Precedent: has the system made this specific decision before and was the person satisfied with the outcome? Strong positive precedent supports automation. Novel situations escalate. Precedent is tracked per action type, not per domain in aggregate: fifty successful appointment scheduling decisions do not create precedent for an insurance enrollment decision.
Domain sensitivity: the domain modifier from the Human Agency Scale adjusts the classification. The healthcare domain escalates more by default. Entertainment escalates less. The modifier reflects the asymmetry of consequences across domains.
Cognitive state: if the Cognitive State Estimator indicates reduced capacity, escalation thresholds shift, but not uniformly. The system does not simply escalate everything when cognitive capacity is low. It distinguishes between decisions the person can still make well (meal preference, activity choice, simple scheduling) and decisions that now exceed current capacity (financial commitments, consent modifications, medication changes). For the former, the system continues to surface decisions with simpler language and clearer options. For the latter, the system acts more conservatively and engages the safe default rather than pressing the person for a decision she may not be equipped to make reliably.
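The Escalation Classifier is described as a 100-million-parameter SLM, and the sketch below is not that model. It is a hand-written, rule-based stand-in showing how the first four criteria, plus the Level 5 safety check, could combine into a level. Every weight, threshold, and name (`PendingDecision`, `classify`, `DOMAIN_MODIFIER`, `MIN_PRECEDENT`, the 0.0 to 1.0 stakes scale) is a hypothetical assumption. Cognitive state is left out here because the text treats it as a continuous adjustment applied on top of the classification rather than as one more fixed rule.

```python
from dataclasses import dataclass

# Hypothetical values: positive modifiers push toward escalation.
DOMAIN_MODIFIER = {"healthcare": 1, "financial": 1, "entertainment": -1}
MIN_PRECEDENT = 5  # hypothetical count of satisfied outcomes before automating


@dataclass
class PendingDecision:
    action_type: str        # precedent is keyed on this, not on the domain
    domain: str
    reversible: bool
    stakes: float           # hypothetical scale: 0.0 trivial .. 1.0 severe
    ambiguous_preference: bool = False  # several valid options, preference unknown
    immediate_safety_risk: bool = False


def classify(decision: PendingDecision, satisfied: dict) -> int:
    """Return an escalation level from 1 to 5 using hand-written rules.

    `satisfied` maps action_type -> number of past decisions of that exact
    type whose outcome the person was satisfied with.
    """
    if decision.immediate_safety_risk:
        return 5                       # Emergency: act immediately
    score = 0
    if not decision.reversible:
        score += 2                     # irreversible actions escalate by default
    if decision.stakes >= 0.7:
        score += 2
    elif decision.stakes >= 0.3:
        score += 1
    if satisfied.get(decision.action_type, 0) < MIN_PRECEDENT:
        score += 1                     # novel action types escalate
    score += DOMAIN_MODIFIER.get(decision.domain, 0)

    if score <= 0:
        return 1                       # fully automated
    if score == 1:
        return 2                       # act and notify
    if decision.ambiguous_preference and decision.stakes >= 0.7:
        return 4                       # present and defer
    return 3                           # recommend and wait
```

With these made-up weights, a well-precedented grocery re-order lands at Level 1 and a routine medication refill at Level 2 (the healthcare modifier alone pushes it up). A plan change lands at Level 3 even for a person with fifty satisfied appointment-scheduling decisions, because precedent is looked up by `action_type` and none exists for `plan_change`.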
The cognitive state paradox
The most difficult calibration in the escalation hierarchy is what to do when cognitive capacity declines. The intuition that declining capacity means more escalation is wrong in a specific way.
A person experiencing a difficult cognitive day who is presented with ten decisions that all require her approval does not receive those decisions with more care and consideration than she would on a clear day. She receives them with less. Decision fatigue sets in faster. The options feel more overwhelming. The temptation to approve everything just to make the questions stop is stronger. Over-escalating to a person with reduced cognitive capacity creates confusion and decision fatigue that may produce worse outcomes than careful automated action would have.
The correct calibration is domain-specific conservative action: the system acts on what it can act on safely and conservatively, reduces the number of decisions surfaced to the person, and makes those remaining decisions clearer and simpler. It does not ask more. It asks less, but asks it better.
This continuous calibration is not a mode switch. There is no “reduced capacity mode” that the system enters as a state. There is a continuous assessment from the Cognitive State Estimator that adjusts every escalation decision in real time. The effect is gradual. The person does not experience a shift. She experiences a system that seems to require less of her on harder days, which is exactly the intended experience.
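One way to express “asks less, but asks it better” without a mode switch is to let a single capacity estimate scale everything continuously. The sketch below assumes the Cognitive State Estimator yields one score from 0.0 to 1.0 and that each pending decision carries a demand and a priority estimate; `Question`, `plan_questions`, and the scaling constants are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Question:
    action_type: str
    demand: float    # hypothetical 0.0..1.0: capacity the decision requires
    priority: float  # hypothetical: higher means more important to ask today


def plan_questions(questions, capacity, max_questions=10):
    """Scale what gets asked by a continuous capacity estimate (0.0..1.0).

    Returns (asked, held, max_options):
      asked       - decisions surfaced to the person, highest priority first
      held        - decisions not surfaced; handled with the conservative
                    safe default instead of pressing the person
      max_options - how many choices each surfaced question may offer
    """
    within = sorted((q for q in questions if q.demand <= capacity),
                    key=lambda q: q.priority, reverse=True)
    beyond = [q for q in questions if q.demand > capacity]

    # The question budget is proportional to capacity; no state flag is set.
    budget = max(1, math.floor(max_questions * capacity))
    asked = within[:budget]
    held = beyond + within[budget:]

    # Fewer, clearer options on harder days (never fewer than two).
    max_options = max(2, math.floor(5 * capacity))
    return asked, held, max_options
```

Because the budget, the demand cutoff, and the option count are all functions of the same score, a slightly harder day produces slightly fewer questions rather than a switch into a different mode.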
Escalation timeout
When the system asks and the person does not respond, the question does not stay open indefinitely. Domain-appropriate timeouts govern what happens next, and the timeout action is always the safer option.
Healthcare: four hours for routine decisions, immediate action for anything urgent. After a routine timeout, the safe default is to keep what is current: do not change the appointment, do not change the medication, maintain the status quo.
Financial: twenty-four hours for routine decisions, four hours for time-sensitive matters. After timeout, no action. Financial inaction is safer than financial action when the person has not responded.
Social: forty-eight hours. After timeout, no action. Social invitations can wait without harm.
Emergency: no timeout. The system acts immediately.
The timeout default is never the more aggressive action. The system does not default to the larger commitment, the newer plan, or the faster path when the person does not respond. It defaults to maintaining the current state or taking no action. This is a design choice with a clear rationale: the cost of inaction in most domains is delay, which is recoverable. The cost of wrong action is commitment, which is often not.
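The timeout rules map naturally onto a lookup table. The sketch below uses the durations stated above; the domain and urgency labels as plain strings, the action names (`keep_waiting`, `maintain_status_quo`, `no_action`, `act_now`), and the function `resolve_unanswered` are hypothetical.

```python
from datetime import datetime, timedelta

# Waiting periods from the text, keyed by (domain, urgency).
TIMEOUTS = {
    ("healthcare", "routine"): timedelta(hours=4),
    ("financial", "routine"): timedelta(hours=24),
    ("financial", "time_sensitive"): timedelta(hours=4),
    ("social", "routine"): timedelta(hours=48),
}

# What happens when the wait expires: never the more aggressive action.
SAFE_DEFAULT = {
    "healthcare": "maintain_status_quo",
    "financial": "no_action",
    "social": "no_action",
}


def resolve_unanswered(domain: str, urgency: str,
                       asked_at: datetime, now: datetime) -> str:
    """Decide what to do about a question the person has not answered."""
    if domain == "emergency" or (domain == "healthcare" and urgency == "urgent"):
        return "act_now"                    # no timeout: act immediately
    if now - asked_at < TIMEOUTS[(domain, urgency)]:
        return "keep_waiting"
    return SAFE_DEFAULT[domain]
```

Note that no entry in `SAFE_DEFAULT` commits the person to anything new; the most a timeout can do is hold the current state.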
Person override of escalation level
The escalation hierarchy is the default. The person’s preferences are the override, within one hard limit.
Margaret can tell the system to stop notifying her about grocery substitutions. That moves routine substitution decisions from Level 2 to Level 1 for her account. She can tell the system to always ask before scheduling anything with a specific physician. That moves those scheduling decisions from Level 2 to Level 3 for that specific provider. The escalation levels are adjustable per action type, per provider, and per domain, according to the person’s expressed preferences.
The one non-overridable escalation is Level 5. The system will always act in a life-threatening situation regardless of the person’s autonomy settings, regardless of her notification preferences, regardless of any prior instruction to “stop bothering me about my blood pressure.” Some protections are not configurable away by the person, because their purpose is to protect the person from situations she cannot assess from inside them.
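The override rules can be layered over the default classification with a most-specific-first lookup and a single hard check placed before it. The class `EscalationOverrides`, its methods, and the restriction of preferences to Levels 1 through 4 are hypothetical choices for this sketch, not a documented BlueMirror API.

```python
EMERGENCY = 5


class EscalationOverrides:
    """Per-person escalation preferences layered over the default level.

    Lookup order, most specific first: (action_type, provider), action_type,
    domain. Level 5 can never be overridden.
    """

    def __init__(self):
        self._by_provider = {}
        self._by_action = {}
        self._by_domain = {}

    def set(self, level, *, action_type=None, provider=None, domain=None):
        if not 1 <= level <= 4:
            raise ValueError("preferences can only choose Levels 1-4")
        if provider is not None:
            if action_type is None:
                raise ValueError("a provider override needs an action_type")
            self._by_provider[(action_type, provider)] = level
        elif action_type is not None:
            self._by_action[action_type] = level
        elif domain is not None:
            self._by_domain[domain] = level
        else:
            raise ValueError("give an action_type, provider, or domain")

    def effective_level(self, default_level, action_type, domain, provider=None):
        if default_level == EMERGENCY:
            return EMERGENCY        # the one non-overridable escalation
        if (action_type, provider) in self._by_provider:
            return self._by_provider[(action_type, provider)]
        if action_type in self._by_action:
            return self._by_action[action_type]
        return self._by_domain.get(domain, default_level)
```

Because the Level 5 check runs before any lookup, a recorded preference such as “stop bothering me about my blood pressure” (stored as a Level 1 override) is ignored whenever the default classification for that event is an emergency.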
Jerome’s evaluation concluded that the failure mode analysis was the part that distinguished the hierarchy from diagrams he had seen elsewhere. Every level had a named failure mode. Every failure mode had a recovery path. A system that only describes its correct behavior has not been designed for the real world.
Cross-References
The Human Agency Scale (BMT-04.01). The HAS settings that inform escalation defaults across domains.
Cognitive Capacity and Consent (BMT-04.05). The deeper treatment of capacity-dependent escalation and the decision-maker transition.
When Agents Disagree (BMT-02.06). Conflict resolution between concierge agents as a specialized form of escalation.
The Cognitive Concierge (BMT-01.07). The agent architecture producing the cognitive state assessment that adjusts escalation thresholds in real time.
Technical Appendix BMT-04.04-A is available to partners and investors at partners.bluemirror.tech.
