Writeup · · Nils & Herman
Joining NVIDIA Inception
Tenura just joined NVIDIA Inception. A feasibility report on bringing up an NVIDIA L4 from bare metal — how far we got, where we stopped, and what that taught us about where the lease layer runs on NVIDIA's stack.
We just joined NVIDIA Inception. This is not a certification announcement — Inception is NVIDIA’s startup program. The more interesting question is downstream of that: what would it mean for Tenura’s lease layer to be recognized inside NVIDIA’s stack?
Below is the feasibility report we ran on NVIDIA L4 to start finding out.
What we’re building
Tenura’s core abstraction is the lease: a capability-bound, time-limited grant of access to a hardware resource that expires by default. When a lease expires, the resource is reclaimed; if reclamation fails, the resource is fenced, meaning no new lease can be issued against it until an operator clears the fence. Cleanup that “usually happens” is not cleanup. We would rather block a new lease than risk leaking the last one’s state into it.
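The lease lifecycle described above can be sketched as a small state machine. This is a minimal illustration, not Tenura's implementation. The type and function names are hypothetical, time is a caller-supplied millisecond counter, and "reclamation" is reduced to a success flag that the caller reports after attempting a scrub. The property it encodes is the one stated above: expiry is the default, a failed reclaim fences the resource, and only an operator action clears the fence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical resource states (illustrative names only). */
enum lease_state {
    RES_FREE,    /* no lease held; a new one may be issued */
    RES_LEASED,  /* an active, time-limited lease holds the resource */
    RES_FENCED,  /* reclamation failed; blocked until an operator clears it */
};

struct resource {
    enum lease_state state;
    uint64_t lease_id;       /* id of the current (or most recent) lease */
    uint64_t expires_at_ms;  /* absolute deadline of the current lease */
};

/* Issue a lease only from RES_FREE: a fenced resource is never handed out. */
bool lease_issue(struct resource *r, uint64_t id, uint64_t now_ms, uint64_t ttl_ms) {
    assert(r != NULL);
    if (r->state != RES_FREE)
        return false;
    r->state = RES_LEASED;
    r->lease_id = id;
    r->expires_at_ms = now_ms + ttl_ms;
    return true;
}

/* True once the deadline has passed: the caller must reclaim now. */
bool lease_expired(const struct resource *r, uint64_t now_ms) {
    assert(r != NULL);
    return r->state == RES_LEASED && now_ms >= r->expires_at_ms;
}

/* Record the outcome of reclamation: success frees, failure fences. */
void lease_reclaimed(struct resource *r, bool scrub_ok) {
    assert(r != NULL);
    if (r->state != RES_LEASED)
        return;
    r->state = scrub_ok ? RES_FREE : RES_FENCED;
}

/* Operator action: the only transition out of RES_FENCED. */
void fence_clear(struct resource *r) {
    assert(r != NULL);
    if (r->state == RES_FENCED)
        r->state = RES_FREE;
}
```

Reclamation failure is deliberately not retried into RES_FREE automatically. Blocking the next lease is the conservative choice the text argues for.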
Once leases are the unit of access, GPUs — and memory, and NVMe, and fabric — can be shared between tenants with the same determinism today’s hyperscalers get from rebooting between workloads, but at sub-second granularity, without losing utilization.
If leases are going to be a real hardware-boundary primitive, the question that matters is where they run: below the vendor driver, where resource ownership actually begins, or above it, integrating with the driver’s lifecycle. We have spent the last several months finding out, empirically.
What we built on NVIDIA L4, and where we stopped
We ran a feasibility pass against an NVIDIA L4 to see how far a vendor-driver-free bare-metal initialization path goes on production datacenter silicon. The short version: it goes further than we initially expected, and not as far as we hoped.
What worked, reproducibly:
- Userspace MMIO to BAR0 from an unbound device, byte-exact against the nvidia kernel driver’s first-touch capture
- BAR0 reads with 0xbadf in the upper 16 bits classified as NVIDIA’s privilege-lockout sentinel, distinct from the PCI “no device” value 0xFFFFFFFF. Took us a while to spot the pattern.
- SEC2 Booter firmware extracted from the OGKM BINDATA archive (raw DEFLATE, not XZ: an easy wrong turn), uploaded to SEC2 IMEM, matching the driver’s expected image to the byte
- WprMeta handoff and radix3 page tables composed correctly enough that the GSP RISC-V supervisor boots and reports alive
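Two of the checks in that list are simple enough to sketch.

The first is a classifier that separates the two read-failure patterns described above: upper 16 bits equal to 0xbadf, versus an all-ones read. The 0xbadf test is a heuristic, since a register could in principle hold such a value legitimately.

The second is a magic-byte sniffer that shows why raw DEFLATE is an easy wrong turn. XZ (FD 37 7A 58 5A 00), gzip (1F 8B) and zlib (a CMF/FLG header whose 16-bit value is divisible by 31) all announce themselves. Raw DEFLATE has no header, so it can only be recognized by ruling the framed formats out and then attempting to inflate it.

All names here are illustrative, not taken from Tenura's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical classification of a 32-bit BAR0 read. */
enum bar0_read_kind {
    BAR0_OK,            /* plausible register value */
    BAR0_PRIV_LOCKOUT,  /* upper 16 bits == 0xbadf: privilege-lockout sentinel */
    BAR0_NO_DEVICE,     /* all ones: PCI "no device" / master-abort pattern */
};

enum bar0_read_kind classify_bar0_read(uint32_t v) {
    if (v == 0xFFFFFFFFu)
        return BAR0_NO_DEVICE;
    if ((v >> 16) == 0xbadfu)  /* heuristic: a real register could match */
        return BAR0_PRIV_LOCKOUT;
    return BAR0_OK;
}

enum blob_format { BLOB_XZ, BLOB_GZIP, BLOB_ZLIB, BLOB_UNKNOWN };

/* Identify framed compression formats by their headers. BLOB_UNKNOWN is
 * where raw DEFLATE lands, because it carries no magic bytes at all. */
enum blob_format sniff_blob(const uint8_t *p, size_t n) {
    static const uint8_t xz_magic[6] = { 0xFD, '7', 'z', 'X', 'Z', 0x00 };
    assert(p != NULL || n == 0);
    if (n >= 6 && memcmp(p, xz_magic, sizeof xz_magic) == 0)
        return BLOB_XZ;
    if (n >= 2 && p[0] == 0x1F && p[1] == 0x8B)
        return BLOB_GZIP;
    /* zlib (RFC 1950): CM == 8, CINFO <= 7, and (CMF*256 + FLG) % 31 == 0.
     * Raw DEFLATE can occasionally pass this too, so treat it as a hint. */
    if (n >= 2 && (p[0] & 0x0F) == 8 && (p[0] >> 4) <= 7 &&
        ((p[0] << 8) | p[1]) % 31 == 0)
        return BLOB_ZLIB;
    return BLOB_UNKNOWN;
}
```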
The unlock for all of it was almost embarrassingly small. Sixteen attempts to populate SEC2 IMEM had failed silently. One line in our kernel module fixed it on attempt seventeen:
pci_set_master(pdev); // 16 attempts of silence; 1 line.
We spent a stretch blaming AWS Nitro’s hidden IOMMU on g6.xlarge for blocking GPU DMA. It wasn’t Nitro. It was the line above.
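For context on why one line mattered: pci_set_master sets the Bus Master Enable bit (bit 2, PCI_COMMAND_MASTER) in the device’s PCI Command register. Without that bit, the device cannot initiate DMA.

One plausible reading, which the text above does not state, is that the SEC2 IMEM upload depended on device-initiated DMA from host memory. That would explain why it failed without an error while BAR0 MMIO kept working.

The sketch below checks the same bits in a config-space snapshot, for example the bytes of /sys/bus/pci/devices/<BDF>/config on Linux. The enum and function names are hypothetical; the offset and bit values match linux/pci_regs.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard PCI config-space layout (same values as PCI_COMMAND,
 * PCI_COMMAND_MEMORY and PCI_COMMAND_MASTER in linux/pci_regs.h). */
#define CFG_COMMAND       0x04    /* offset of the 16-bit Command register */
#define CMD_MEMORY_SPACE  0x0002  /* device decodes MMIO accesses to its BARs */
#define CMD_BUS_MASTER    0x0004  /* device may initiate DMA; set by pci_set_master() */

/* Hypothetical diagnosis codes for this sketch. */
enum dma_readiness {
    DMA_READY,
    DMA_SHORT_CONFIG,     /* snapshot too small to contain the Command register */
    DMA_NO_MEMORY_SPACE,  /* BAR MMIO would not decode */
    DMA_NO_BUS_MASTER,    /* MMIO works, but device-initiated DMA is blocked */
};

/* PCI configuration space is little-endian. */
static uint16_t cfg_read16(const uint8_t *cfg, size_t off) {
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

enum dma_readiness check_dma_readiness(const uint8_t *cfg, size_t len) {
    assert(cfg != NULL);
    if (len < CFG_COMMAND + 2)
        return DMA_SHORT_CONFIG;
    uint16_t cmd = cfg_read16(cfg, CFG_COMMAND);
    if (!(cmd & CMD_MEMORY_SPACE))
        return DMA_NO_MEMORY_SPACE;
    if (!(cmd & CMD_BUS_MASTER))
        return DMA_NO_BUS_MASTER;
    return DMA_READY;
}

/* Command value with Bus Master Enable set and every other bit preserved:
 * the bit pci_set_master() turns on. */
uint16_t with_bus_master(uint16_t cmd) {
    return (uint16_t)(cmd | CMD_BUS_MASTER);
}
```

The distinction between DMA_NO_MEMORY_SPACE and DMA_NO_BUS_MASTER is the useful part. The second state looks healthy to any MMIO-only probe, which is what makes this failure silent.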
What did not work: the next step. Past GSP-alive, the path runs into NVIDIA’s closed signed-firmware infrastructure and PLM-gated internal state. The full GSP RPC round-trip — the boundary where the host actually starts dispatching compute work — requires either custom HS-signed firmware (which only NVIDIA can produce) or an alternative interface defined in cooperation with NVIDIA. Neither is a shortcut available to us.
We closed the feasibility report on that result. The host-visible handoff artifacts we produce byte-match the kernel driver’s. The locked doors are inside the chip.
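A byte-match claim is only as strong as the comparison behind it. Below is a minimal sketch of such a comparison that reports the first divergent offset and the number of differing bytes, rather than a bare yes/no.

The struct and function names are hypothetical. Both buffers are assumed to be in memory already; how the reference capture is taken from the kernel driver is not covered here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct byte_diff {
    bool   equal;       /* true only if lengths and every byte match */
    size_t first_diff;  /* first mismatching offset; meaningful only if !equal */
    size_t diff_count;  /* differing bytes within the common prefix */
};

struct byte_diff byte_compare(const uint8_t *ours, size_t ours_len,
                              const uint8_t *ref, size_t ref_len) {
    assert((ours != NULL || ours_len == 0) && (ref != NULL || ref_len == 0));
    struct byte_diff d = { true, 0, 0 };
    size_t common = ours_len < ref_len ? ours_len : ref_len;
    for (size_t i = 0; i < common; i++) {
        if (ours[i] != ref[i]) {
            if (d.equal) {  /* record only the first divergence */
                d.equal = false;
                d.first_diff = i;
            }
            d.diff_count++;
        }
    }
    /* Identical prefix but different lengths: diverges at the shorter end. */
    if (d.equal && ours_len != ref_len) {
        d.equal = false;
        d.first_diff = common;
    }
    return d;
}
```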
What that taught us
Two things:
- The vendor-driver-free path on NVIDIA datacenter silicon is gated by NVIDIA, not by engineering effort on our side. Closed signed firmware is a deliberate security property, and the locks work as intended.
- For NVIDIA workloads, the production path for the lease layer runs above the nvidia kernel driver, not under it: fabricbiosd integrating with the driver lifecycle, leases bounding CUDA contexts, capability tokens scoping access.
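The capability-token part of that design can be illustrated with a minimal authorization check. This is a sketch under stated assumptions, not fabricbiosd’s actual token format: the scope bits, field names and function are all hypothetical.

A production token would also need a MAC or signature so that a holder cannot forge or widen it. That check is omitted here.

What the sketch shows is that a token is bound to one lease and one device, and that it expires by default. An operation outside the active lease is refused even if the scope bitmask would allow it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access scopes a token can grant. */
enum cap_scope {
    CAP_CTX_CREATE = 1u << 0,  /* create a CUDA context under this lease */
    CAP_MEM_ALLOC  = 1u << 1,  /* allocate device memory */
    CAP_LAUNCH     = 1u << 2,  /* launch kernels */
};

struct cap_token {
    uint64_t lease_id;       /* lease this token is bound to */
    uint32_t gpu_index;      /* device the lease covers */
    uint32_t scopes;         /* bitmask of cap_scope values */
    uint64_t expires_at_ms;  /* should never exceed the lease's own expiry */
};

/* Authorize one operation. Signature/MAC verification is intentionally
 * omitted from this sketch. */
bool cap_allows(const struct cap_token *t, uint64_t active_lease_id,
                uint32_t gpu_index, uint32_t needed, uint64_t now_ms) {
    assert(t != NULL);
    if (t->lease_id != active_lease_id)
        return false;  /* stale or foreign lease */
    if (t->gpu_index != gpu_index)
        return false;  /* wrong device */
    if (now_ms >= t->expires_at_ms)
        return false;  /* expired by default */
    return (t->scopes & needed) == needed;  /* every requested bit granted */
}
```

Where such a check would sit (for example, before a context is created or memory is allocated) is an assumption of the sketch, not something the text specifies.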
The customer-facing implication is simple: Tenura’s NVIDIA support does not depend on bypassing NVIDIA’s security model.
For full vendor-driver-free exploration of the lease primitive — the path that runs below any vendor driver — we are targeting silicon where the doors are not locked from the inside: AMD MI300X and Intel Gaudi, gated by early openness audits we are working through now.
Where Inception fits
The answer to “what would NVIDIA recognition of Tenura look like?” splits into two paths.
The near-term path is the one that already works: the lease layer running above CUDA and the nvidia kernel driver on production hardware. Inception’s credits and ecosystem access help us get that path tested at scale on real NVIDIA infrastructure.
The longer-term path is the conversation our feasibility report points at: what would it look like for a lease-bounded initialization path to coexist with NVIDIA’s signed-firmware substrate, on terms NVIDIA is comfortable with? That is not a request for an exception. It is the kind of architecture conversation that becomes more possible because of Inception.
If you’re at NVIDIA and any of this is in your area, we’d like to talk: [email protected], or [email protected] directly.
We’ll write more here as we learn.