The Negotiation Sandbox

Marcus manages vendor relationships for a hospital network that is integrating AI scheduling across seventeen facilities. He has watched three AI vendor integrations fail in the past four years, each time in the same pattern: the system worked in demos and in controlled testing, then someone in the production environment hit an edge case that was not in the spec. The edge case was a negotiation state that no one had anticipated, and it produced an outcome that was either wrong or unverifiable. The problem was not the algorithm. The problem was that nobody could prove what had happened.

His question about BlueMirror’s integration architecture was specific: when his scheduling system and BlueMirror’s health concierge agent negotiate an appointment, what exactly is logged, and what can he prove after the fact if something goes wrong?

The answer is the negotiation sandbox.

Why unstructured negotiation is unsafe

Agent-to-agent negotiation without structured isolation is an inherently unsafe interaction. Consider what can go wrong when two agents exchange messages in an open channel. A pharmacy agent communicating with a scheduling agent can observe timing correlations: a prescription refill request that precedes every specialist appointment reveals that the appointment is medication-related without any health data being explicitly shared. An adversarial agent that stalls a negotiation indefinitely can prevent the person from getting a better offer elsewhere while the stall continues. An insurance agent that communicates with a vendor agent through a side channel, outside the primary negotiation thread, can coordinate against the person’s interests without any of that coordination appearing in the negotiation record. An agent can treat the absence of a rejection as acceptance, quietly advancing a commitment because the person’s agent stayed silent.

None of these failure modes requires malicious intent. Some are the natural result of agents optimizing independently for their own objectives. Some are the result of specification gaps. All of them share a common property: they are hard to detect and impossible to prove without a record that covers the full negotiation.

The sandbox exists to create that record and to prevent the conditions that make the failure modes possible.

The shared state space

Every negotiation sandbox contains a shared state space that both agents read and write to within defined rules.

Proposals and counterproposals from both sides live in the shared state. Either agent can see what the other has offered, because transparency in the state space is a feature, not a bug: an agent that cannot see the current state cannot negotiate effectively, and an agent that modifies the state without the other seeing it is not negotiating but manipulating. Agreed terms are marked tentative until both agents explicitly accept the full agreement. Points of contention, places where the agents have not yet found common ground, are documented as such.
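One way to picture the shared state is as a single record that both agents read and write. The sketch below is illustrative only: the names `SharedState`, `Term`, and `TermStatus`, and the agent labels, are assumptions made for this example and are not BlueMirror's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class TermStatus(Enum):
    PROPOSED = "proposed"
    TENTATIVE = "tentative"   # agreed for now, not binding until full acceptance
    CONTESTED = "contested"   # a documented point of contention


@dataclass
class Term:
    value: str
    proposed_by: str
    status: TermStatus = TermStatus.PROPOSED


@dataclass
class SharedState:
    """Hypothetical shared state: one object, visible to every participant."""
    terms: dict[str, Term] = field(default_factory=dict)
    history: list[tuple[str, str, str]] = field(default_factory=list)

    def propose(self, agent: str, name: str, value: str) -> None:
        # A counterproposal replaces the current value; history keeps both.
        self.terms[name] = Term(value, proposed_by=agent)
        self.history.append((agent, name, value))

    def mark(self, name: str, status: TermStatus) -> None:
        self.terms[name].status = status

    def view(self) -> dict[str, tuple[str, str]]:
        # Every agent gets the same view; there is no per-agent hidden state.
        return {n: (t.value, t.status.value) for n, t in self.terms.items()}


state = SharedState()
state.propose("hospital_agent", "slot", "Thu 14:00")
state.propose("concierge_agent", "slot", "Thu 15:00")   # counterproposal
state.mark("slot", TermStatus.TENTATIVE)
state.propose("hospital_agent", "transport", "not provided")
state.mark("transport", TermStatus.CONTESTED)
```

Because `view()` takes no agent argument, neither side can hold a private version of the state, which is the property the paragraph above describes.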

What never enters the sandbox is raw context. The Memory of Context data, the five-layer structure that holds Margaret’s complete situation, does not exist inside the sandbox. The internal agent brings into the sandbox only the context that the exploration bounds permitted for this interaction type at this trust tier. The hospital scheduling agent negotiating inside the sandbox sees what the bounds allowed: scheduling constraints, accessibility requirements, time preferences. It does not see, and cannot infer from sandbox state, anything beyond what the Context Gate Controller permitted. The sandbox is isolated from the internal agent’s full context by design.
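A minimal sketch of that filtering step, assuming the bounds can be expressed as an allow-list of field names per interaction type and trust tier. The table `EXPLORATION_BOUNDS`, the function `package_context`, and the tier label `verified_partner` are hypothetical names chosen for illustration.

```python
# Hypothetical bounds table: (interaction type, trust tier) -> permitted fields.
EXPLORATION_BOUNDS = {
    ("appointment_scheduling", "verified_partner"): {
        "scheduling_constraints",
        "accessibility_requirements",
        "time_preferences",
    },
}


def package_context(full_context: dict, interaction_type: str, trust_tier: str) -> dict:
    """Return only the fields the bounds permit; default to sharing nothing."""
    allowed = EXPLORATION_BOUNDS.get((interaction_type, trust_tier), set())
    return {k: v for k, v in full_context.items() if k in allowed}


full_context = {
    "scheduling_constraints": "weekday afternoons",
    "accessibility_requirements": "step-free access",
    "time_preferences": "after 13:00",
    "medication_history": "stays outside the sandbox",
}
shared = package_context(full_context, "appointment_scheduling", "verified_partner")
```

An allow-list like this only covers explicit fields. Preventing inference from the sandbox state, which the text also requires, is a separate property that this sketch does not attempt to show.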

Five enforced rules

Five rules govern every sandbox, and the membrane enforces all five without exception.

Complete logging is not optional. Every message, every proposal, every counterproposal, every state change, and every agreement is logged with cryptographic signatures from both agents and the membrane itself. The log is tamper-evident: any modification to the record after the fact is detectable. When Marcus asks what happened during a negotiation, the answer is the log, and the log is provable.
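Tamper evidence of this kind is commonly built from a hash chain in which each entry commits to the one before it. The sketch below shows that general technique, not BlueMirror's logging format. As a simplifying assumption it uses HMAC with demo keys in place of real per-party digital signatures, and all names (`KEYS`, `append`, `verify`) are hypothetical.

```python
import hashlib
import hmac
import json

# Demo keys only. A real system would use asymmetric signatures per party.
KEYS = {"agent_a": b"key-a", "agent_b": b"key-b", "membrane": b"key-m"}
GENESIS = "0" * 64


def _digest(prev_hash: str, event: dict) -> str:
    body = json.dumps(event, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + body).hexdigest()


def _sign(key: bytes, digest: str) -> str:
    return hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()


def append(log: list, event: dict) -> None:
    """Add an event that commits to the previous entry and is signed by all three."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    digest = _digest(prev_hash, event)
    sigs = {name: _sign(key, digest) for name, key in KEYS.items()}
    log.append({"event": event, "prev": prev_hash, "hash": digest, "sigs": sigs})


def verify(log: list) -> bool:
    """Recompute the chain; any edited, reordered, or unsigned entry fails."""
    prev_hash = GENESIS
    for entry in log:
        digest = _digest(prev_hash, entry["event"])
        if entry["prev"] != prev_hash or entry["hash"] != digest:
            return False
        for name, key in KEYS.items():
            if not hmac.compare_digest(entry["sigs"].get(name, ""), _sign(key, digest)):
                return False
        prev_hash = digest
    return True


log: list = []
append(log, {"type": "proposal", "from": "agent_a", "slot": "Thu 14:00"})
append(log, {"type": "counterproposal", "from": "agent_b", "slot": "Thu 15:00"})
intact = verify(log)

log[0]["event"]["slot"] = "Fri 09:00"   # after-the-fact modification
tampered_detected = not verify(log)
```

Changing any earlier event changes its digest, which breaks both that entry's signatures and the `prev` link of every later entry, so the modification is detectable from the log alone.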

No side-channel communication is permitted during an active sandbox. From the moment a sandbox opens until it closes, the membrane blocks all alternative communication paths between the two agents. A hospital scheduling agent cannot send a direct API call to BlueMirror’s systems while a sandbox negotiation is underway. A vendor agent cannot reach a BlueMirror partner channel to influence the negotiation through a different pathway. The sandbox is the only channel. Everything that matters to the negotiation happens inside it.
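The rule can be read as a routing check the membrane applies to every message between two agents. The following is a deliberately small sketch under that assumption. The `Membrane` class and the channel labels are hypothetical, and it models only direct pairwise traffic, not indirect paths through third parties.

```python
class Membrane:
    """Hypothetical enforcement point for agent-to-agent traffic."""

    def __init__(self) -> None:
        self._active: set[frozenset] = set()

    def open_sandbox(self, a: str, b: str) -> None:
        self._active.add(frozenset((a, b)))

    def close_sandbox(self, a: str, b: str) -> None:
        self._active.discard(frozenset((a, b)))

    def allow(self, sender: str, receiver: str, channel: str) -> bool:
        # While a sandbox is open for this pair, only the sandbox channel passes.
        if frozenset((sender, receiver)) in self._active:
            return channel == "sandbox"
        return True


membrane = Membrane()
membrane.open_sandbox("hospital_agent", "concierge_agent")
```

Using an unordered pair as the key means the block applies in both directions, so neither agent can open a side path to the other while the sandbox is active.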

Timeout enforcement prevents indefinite stalling. Every sandbox has a temporal bound set at creation, based on the interaction type. Routine appointment scheduling: 30 seconds from invitation to agreement. Standard procurement negotiation: five minutes. Complex multi-party care coordination: up to 24 hours, with defined check-in intervals. An agent that does not reach agreement within the bound does not get more time by waiting. The sandbox closes. No agreement is reached. The person is notified if the interaction was consequential.
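As a sketch, the bounds above could be held in a lookup table that the membrane consults at creation and checks on every tick. The durations come from the text. The key names and both functions are assumptions for illustration, and the 24-hour entry is the stated upper limit rather than a fixed value.

```python
from datetime import timedelta

# Bounds quoted in the text; the dictionary keys are hypothetical labels.
TIMEOUTS = {
    "routine_appointment": timedelta(seconds=30),
    "standard_procurement": timedelta(minutes=5),
    "multi_party_care": timedelta(hours=24),
}


def is_expired(interaction_type: str, opened_at: float, now: float) -> bool:
    """True once the bound set at creation has elapsed (times in seconds)."""
    return now - opened_at >= TIMEOUTS[interaction_type].total_seconds()


def close_if_expired(interaction_type: str, opened_at: float, now: float) -> str:
    # Waiting never extends the bound: expiry closes the sandbox with no agreement.
    if is_expired(interaction_type, opened_at, now):
        return "closed_no_agreement"
    return "active"
```

Passing `now` in as a parameter, rather than reading the clock inside the function, keeps the check deterministic and easy to audit against logged timestamps.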

Commitment on explicit acceptance means that tentative agreements are not binding agreements. An agent cannot treat a proposal as accepted because the other agent did not reject it. Acceptance requires an explicit acceptance message from both agents within the sandbox. Inside the sandbox, there is no silence. There is only explicit state.
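A minimal sketch of the commitment check, assuming each acceptance message names the exact set of terms it accepts. The fingerprinting approach and the names `agreement_id` and `is_committed` are illustrative assumptions, not a documented BlueMirror interface.

```python
import hashlib
import json


def agreement_id(terms: dict) -> str:
    """Fingerprint of the full set of terms (sketch; any stable hash would do)."""
    return hashlib.sha256(json.dumps(terms, sort_keys=True).encode()).hexdigest()


def is_committed(terms: dict, participants: set, acceptances: dict) -> bool:
    """Binding only if every participant explicitly accepted these exact terms.

    `acceptances` maps agent name -> the agreement_id that agent accepted.
    A missing entry is silence, and silence never counts as acceptance.
    """
    current = agreement_id(terms)
    return all(acceptances.get(agent) == current for agent in participants)


terms = {"slot": "Thu 14:00", "transport": "hospital provided"}
agents = {"concierge_agent", "hospital_agent"}

only_one = {"hospital_agent": agreement_id(terms)}
both = {agent: agreement_id(terms) for agent in agents}
stale = dict(both, concierge_agent=agreement_id({"slot": "Thu 14:00"}))
```

Here `only_one` models one agent staying silent and `stale` models an agent that accepted an earlier, partial version of the terms. Neither produces a commitment; only `both` does.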

Human escalation is available at any point. Either agent can flag an impasse, a proposed term that exceeds the internal agent’s commitment authority, or any condition that warrants the person’s direct involvement. When escalation happens, the person sees the current sandbox state exactly as it stands: what has been proposed, what has been agreed tentatively, where the disagreement sits. The person makes the decision with full visibility into the negotiation.

The optional mediator

Multi-party negotiations use an optional mediator that changes the dynamic without weakening the sandbox model.

Care coordination involving a hospital, a pharmacy, a transportation provider, and an insurance plan is too complex for bilateral negotiation. Negotiating pairwise would mean six separate sandboxes, one for each pair among the four parties, with no mechanism for cross-sandbox consistency. The multi-party sandbox addresses this by creating a single shared state space with a mediator agent.

The mediator can be a trusted third-party agent from a neutral party, or a BlueMirror system agent. Its role is to propose compromises, identify Pareto improvements that no single agent would propose because they cannot see all parties’ constraints, and break deadlocks when two agents are stuck. The mediator sees the shared state space, which contains what all parties have permitted to be shared. It does not see any party’s private context. A mediator that proposes “Thursday 2pm with hospital transport provided” has proposed that because of what it can see in the shared state, not because of private information about any party.
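To make "Pareto improvement" concrete: an option qualifies if no party is worse off than under the current proposal and at least one party is better off. The sketch below assumes, purely for illustration, that each party has published a numeric preference score per option in the shared state. The function name and the scores are hypothetical.

```python
def pareto_improvements(current: str, options: dict[str, dict[str, int]]) -> list[str]:
    """Options where no party scores lower than under `current` and one scores higher."""
    base = options[current]
    better = []
    for name, scores in options.items():
        if name == current:
            continue
        no_one_worse = all(scores[p] >= base[p] for p in base)
        someone_better = any(scores[p] > base[p] for p in base)
        if no_one_worse and someone_better:
            better.append(name)
    return better


# Scores as shared by each party (higher is better); values are made up.
options = {
    "Wed 9am": {"hospital": 2, "pharmacy": 2, "transport": 1},
    "Thu 2pm": {"hospital": 3, "pharmacy": 2, "transport": 2},
    "Fri 4pm": {"hospital": 3, "pharmacy": 1, "transport": 3},
}
suggestions = pareto_improvements("Wed 9am", options)
```

With these scores only "Thu 2pm" qualifies, because "Fri 4pm" leaves the pharmacy worse off. The function reads nothing but the shared scores, which mirrors the constraint that the mediator works from shared state and not from any party's private context.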

Mediator interventions are logged with the same cryptographic requirements as every other sandbox event. The mediated agreement is no different in its audit properties from a bilateral agreement.

Sandbox lifecycle

Creation happens when the internal agent determines that an interaction requires structured negotiation rather than a simple request-response exchange. The membrane creates the sandbox, assigns the exploration bounds for this interaction, sets the timeout, and sends an invitation to the external agent. The external agent’s acceptance of the invitation is itself logged.

Execution is the negotiation. Both agents interact within the rules. The membrane monitors in real time, not as a passive observer but as an active enforcer: if a message violates the no-side-channel rule, the membrane blocks it. If a proposed term would require disclosing context beyond what the bounds permit, the internal agent cannot include it. If the timeout approaches, the membrane signals both agents that time is limited.

Closure by agreement requires both agents to explicitly accept the full set of agreed terms. The acceptance messages are signed by both agents and the membrane. The committed actions are queued for execution by the relevant concierge agents. The sandbox is archived.

Closure by timeout produces no agreement and no commitment. The person is notified if the interaction warranted escalation. The agent’s failure to reach agreement within the timeout is noted, though a single timeout is not itself a trust violation.

Closure by violation is the most consequential outcome. If one agent violates sandbox rules, the sandbox terminates immediately. No agreement stands. The violating agent’s trust tier drops according to the severity of the violation. The person is notified. The full audit trail is preserved and flagged for review.
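The lifecycle above can be summarized as a small state machine with one opening path and three terminal states. This is a sketch under assumed names: the `SandboxState` members and the event strings are hypothetical labels for the stages the text describes.

```python
from enum import Enum, auto


class SandboxState(Enum):
    INVITED = auto()            # created, invitation sent
    ACTIVE = auto()             # invitation accepted, negotiation running
    CLOSED_AGREEMENT = auto()
    CLOSED_TIMEOUT = auto()
    CLOSED_VIOLATION = auto()


# Allowed transitions; anything not listed is rejected.
TRANSITIONS = {
    (SandboxState.INVITED, "invitation_accepted"): SandboxState.ACTIVE,
    (SandboxState.INVITED, "timeout"): SandboxState.CLOSED_TIMEOUT,
    (SandboxState.ACTIVE, "both_accepted"): SandboxState.CLOSED_AGREEMENT,
    (SandboxState.ACTIVE, "timeout"): SandboxState.CLOSED_TIMEOUT,
    (SandboxState.ACTIVE, "violation"): SandboxState.CLOSED_VIOLATION,
}


def step(state: SandboxState, event: str) -> SandboxState:
    """Apply one event, refusing transitions the lifecycle does not define."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.name}") from None


def binding(state: SandboxState) -> bool:
    # Only closure by agreement produces a commitment.
    return state is SandboxState.CLOSED_AGREEMENT
```

Closed states have no outgoing transitions in the table, so a sandbox that ended by timeout or violation cannot later be moved into an agreed state.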

Marcus received the answer to his question. Every message was logged. Every agreement required explicit acceptance. Every violation was recorded with a cryptographic trail that could not be modified after the fact. He noted that this was the first integration architecture he had reviewed where the failure modes were as carefully specified as the success modes.

Cross-References

Bounded Exploration (BMT-03.03). The exploration bounds enforced inside every sandbox.

Attack Resistance (BMT-03.06). What happens when an agent violates sandbox rules.

The Audit Trail (BMT-07.04). The cryptographic logging architecture that underpins sandbox records.

Context Packaging for Experts (BMT-08.03). How the sandbox model extends to human expert interactions.

Technical Appendix BMT-03.04-A is available to partners and investors at partners.bluemirror.tech.
