Fall 2025 Poster Honorable Mention Submitted January 2026

μInference: From Minimal Stack to SL5 Weight Enclave

Vansh Agrawal, Samruddhi Rahate, Godric Aceves

Mentored by Luis Cosio

Working report from the SPAR program. May not reflect the authors' current views.

Abstract

This report investigates whether frontier-class AI inference can run on a radically minimized software stack while pursuing Security Level 5 (SL5)-style protections for model weights against nation-state adversaries. We began with μInference v0, a security-by-simplification experiment that measured how much of a conventional inference stack (kernels, libraries, orchestration) can be removed while still producing useful inference. As SL5 guidance in the ecosystem evolved to emphasize Weight Enclaves, low-bandwidth boundaries, and accelerator-centric machine security, we expanded the project from pure minimization into an end-to-end Weight Enclave prototype. Our current design anchors isolation in the formally verified seL4 microkernel and uses a CAmkES-based virtual machine to host a minimal Linux guest built with Buildroot. Inside the guest, μInference runs the Rust Candle inference engine on an NVIDIA GPU. A bandwidth-limited egress proxy and an allow-by-exception policy around weight access make bulk exfiltration of model weights less feasible. We compare attack-surface indicators (compiled lines of code, dependencies, reachable services) and lightweight security tests against a baseline GPU container stack. The main limitation today is that competitive GPU inference still requires a full Linux guest and a large GPU driver stack inside the weight-touching trust boundary.
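To make the idea of a bandwidth-limited egress boundary concrete, the following is a minimal sketch of a byte-based token bucket, together with a back-of-the-envelope estimate of how long bulk exfiltration would take under a given cap. It is not the project's proxy implementation: the class name `TokenBucket`, the function `exfiltration_days`, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Byte-based token bucket. Time is passed in explicitly so the logic is deterministic."""

    rate_bytes_per_s: float  # sustained egress cap (hypothetical, operator-chosen)
    burst_bytes: float       # largest burst allowed through the boundary
    tokens: float = 0.0
    last_ts: float = 0.0

    def __post_init__(self) -> None:
        # Start with a full bucket so a normal-sized response is not delayed.
        self.tokens = self.burst_bytes

    def allow(self, n_bytes: int, now: float) -> bool:
        """Return True and charge the bucket if n_bytes may leave at time `now`."""
        elapsed = max(0.0, now - self.last_ts)
        # Refill in proportion to elapsed time, never beyond the burst size.
        self.tokens = min(self.burst_bytes, self.tokens + elapsed * self.rate_bytes_per_s)
        self.last_ts = now
        if n_bytes <= self.tokens:
            self.tokens -= n_bytes
            return True
        return False


def exfiltration_days(weight_bytes: float, rate_bytes_per_s: float) -> float:
    """Lower bound, in days, on moving weight_bytes through the sustained cap."""
    return weight_bytes / rate_bytes_per_s / 86_400


# Illustrative numbers only: a 1 kB/s cap with a 4 kB burst allowance.
bucket = TokenBucket(rate_bytes_per_s=1_000, burst_bytes=4_000)
first = bucket.allow(4_000, now=0.0)   # the initial burst fits
second = bucket.allow(1, now=0.0)      # bucket is empty, so this is refused
third = bucket.allow(1_000, now=1.0)   # one second of refill covers 1 kB
```

Under these assumed numbers, a hypothetical 140 GB weight file pushed through a 10 kB/s cap would need at least `exfiltration_days(140e9, 10_000)`, roughly 162 days of sustained egress. This is the sense in which a low-bandwidth boundary targets bulk exfiltration rather than ordinary inference traffic.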