The Agent Stack #028 — Monday Build
The Pentagon just gave us the biggest signal yet about where AI agents are heading. Last Friday, it signed deals with OpenAI, Google, Microsoft, AWS, Nvidia, and xAI to deploy AI on classified networks. Anthropic got left out over usage disputes. This isn't just procurement news: it's your blueprint for building enterprise-grade agents that handle sensitive data.
Why This Matters for Builders
Government contracts routinely preview enterprise requirements roughly 18 months early. The Pentagon's criteria tell us exactly what Fortune 500 CISOs will demand:
- Air-gapped deployment: Your agent must run completely offline
- Audit trails: Every decision needs cryptographic logging
- Data residency: No cloud dependencies for sensitive operations
- Multi-vendor strategy: Avoid single points of failure
The exclusion of Anthropic over “usage terms disputes” is the smoking gun. Claude’s constitutional AI approach apparently didn’t meet classified requirements. That suggests the Pentagon prioritises control over safety guardrails.
Build Pattern: The Classified Agent Architecture
Here’s how to architect agents for high-security environments:
    class SecureAgent:
        def __init__(self, security_level="SECRET"):
            self.local_llm = self._load_air_gapped_model()
            self.audit_logger = CryptographicLogger(security_level)
            # Sensitive documents and context stay in isolated local storage
            self.data_vault = IsolatedStorage()

        def process_request(self, query, context):
            # Log every interaction with tamper-proof signatures
            session_id = self.audit_logger.start_session(query)

            # Process locally - no external API calls
            response = self.local_llm.generate(
                prompt=self._sanitise_prompt(query),
                context=self._validate_context(context)
            )

            # Cryptographic proof of processing
            self.audit_logger.log_decision(
                session_id, response, self._get_reasoning_chain()
            )

            return self._apply_output_filters(response)
The key insight: separate your reasoning engine from your data plane completely. Use local models like Code Llama 34B or Mistral 7B running on dedicated hardware.
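If you want to see what the reasoning side of that separation could look like, here is a rough sketch of an air-gapped model loader using Hugging Face transformers in offline mode. The path, model choice, dtype, and function name are illustrative assumptions, not a reference implementation:

    import os

    # Refuse any network lookups; transformers honours these offline flags
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_PATH = "/opt/models/mistral-7b-instruct"  # checkpoint pre-copied onto the air-gapped host

    def load_air_gapped_model(path=MODEL_PATH):
        # local_files_only makes the call fail loudly rather than phone home
        tokenizer = AutoTokenizer.from_pretrained(path, local_files_only=True)
        model = AutoModelForCausalLM.from_pretrained(
            path,
            local_files_only=True,
            torch_dtype=torch.float16,
            device_map="auto",
        )
        return model, tokenizer

Slot something like this behind SecureAgent._load_air_gapped_model. The real guarantee still comes from the air gap itself; the offline flags just make accidental egress fail fast instead of silently reaching out.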
For the audit system, implement Merkle trees for tamper-proof logs. Every agent decision gets cryptographically signed and chained to previous decisions. No retroactive editing possible.
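The Merkle layer needs less machinery than it sounds. Here is a minimal standard-library sketch; it assumes each decision has already been serialised to bytes, and the batch size is up to you:

    import hashlib

    def sha256(data):
        return hashlib.sha256(data).digest()

    def merkle_root(leaves):
        # Hash every decision, then collapse pairs until one root remains
        if not leaves:
            return sha256(b"")
        level = [sha256(leaf) for leaf in leaves]
        while len(level) > 1:
            if len(level) % 2:
                level.append(level[-1])  # duplicate the last node on odd-sized levels
            level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        return level[0]

    decisions = [b"session-1: classified SECRET", b"session-2: released to analyst"]
    print(merkle_root(decisions).hex())

Sign each batch's root and fold it into the next batch, and a retroactive edit to any single decision breaks every root that follows it.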
Infrastructure Decisions
Model Selection: The Pentagon chose vendors who can ship models, not just APIs. OpenAI’s partnership suggests GPT models will get on-premises versions. Google’s inclusion points to Gemini Pro getting air-gapped deployment.
Hardware Requirements: Nvidia’s inclusion isn’t just for training. They’re providing the H100 clusters these models need to run locally. Budget £50k minimum for serious classified-level inference.
Network Architecture: Think zero-trust but more paranoid. Every component assumes every other component is compromised. Use hardware security modules for key management.
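One concrete way to act on "assume every other component is compromised" is to have components refuse any message that does not carry a verifiable MAC. A standard-library sketch follows; the component names are made up, and in a real deployment the keys would live in the HSM rather than in process memory:

    import hashlib
    import hmac
    import json
    import os

    PLANNER_KEY = os.urandom(32)  # illustrative only; real keys stay inside the HSM

    def sign_message(key, message):
        payload = json.dumps(message, sort_keys=True).encode()
        return hmac.new(key, payload, hashlib.sha256).hexdigest()

    def verify_message(key, message, tag):
        return hmac.compare_digest(sign_message(key, message), tag)

    msg = {"from": "planner", "to": "executor", "action": "classify", "doc_id": "memo-041"}
    tag = sign_message(PLANNER_KEY, msg)
    assert verify_message(PLANNER_KEY, msg, tag)  # the executor drops anything that fails this check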
Quick Hits
• Anthropic’s exclusion: Their constitutional AI requires ongoing training updates that conflict with air-gapped requirements. Lesson: build agents that improve through local feedback loops, not cloud retraining.
• xAI's inclusion: Musk's new company made the cut despite being months old. Their differentiator appears to be "maximum truth-seeking" without safety constraints, which is exactly what military applications want.
• Multi-vendor strategy: Pentagon isn’t picking winners. They want redundancy across providers. Build your agent architecture to swap LLM backends without rewriting business logic.
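Here is a minimal sketch of that swap-friendly seam: business logic talks to one small interface, and each vendor gets its own adapter behind it. The class names are illustrative:

    from abc import ABC, abstractmethod

    class LLMBackend(ABC):
        """The only surface the agent's business logic is allowed to touch."""

        @abstractmethod
        def generate(self, prompt, max_new_tokens=256):
            ...

    class LocalMistralBackend(LLMBackend):
        """Adapter for a locally hosted model; write one of these per vendor."""

        def __init__(self, model, tokenizer):
            self.model = model
            self.tokenizer = tokenizer

        def generate(self, prompt, max_new_tokens=256):
            inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
            output = self.model.generate(**inputs, max_new_tokens=max_new_tokens)
            return self.tokenizer.decode(output[0], skip_special_tokens=True)

If SecureAgent takes one of these in its constructor instead of loading a model itself, switching from Mistral to an on-prem GPT build becomes a new adapter, not a rewrite.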
One Thing to Try
Build a local audit system this week. Create a simple agent that logs every decision to a local SQLite database with SHA-256 hashes linking each entry. Test it with a basic task like document classification. You’ll quickly discover what enterprise security teams actually care about: provenance, not performance.
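Here is a minimal version of that logger, standard library only; the table and column names are just placeholders:

    import hashlib
    import json
    import sqlite3
    import time

    conn = sqlite3.connect("audit.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS audit_log (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        ts REAL,
        decision TEXT,
        prev_hash TEXT,
        entry_hash TEXT)""")

    def log_decision(decision):
        # Each entry's hash covers the previous entry's hash, forming a chain
        row = conn.execute(
            "SELECT entry_hash FROM audit_log ORDER BY id DESC LIMIT 1"
        ).fetchone()
        prev_hash = row[0] if row else "GENESIS"
        payload = json.dumps(decision, sort_keys=True)
        entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        conn.execute(
            "INSERT INTO audit_log (ts, decision, prev_hash, entry_hash) VALUES (?, ?, ?, ?)",
            (time.time(), payload, prev_hash, entry_hash),
        )
        conn.commit()
        return entry_hash

    log_decision({"doc": "memo-041", "label": "INTERNAL", "model": "mistral-7b"})

Verification is just replaying the chain from the first row and recomputing the hashes: any edited or deleted entry breaks everything after it, which is exactly the provenance story a security review wants to hear.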
The government’s AI requirements today become your customer’s requirements tomorrow. Start building for classified-level security now.