
RightNow AI has released AutoKernel, an open-source framework that uses an autonomous LLM agent loop to automatically optimize GPU kernels for any PyTorch model. The tool aims to democratize high-performance GPU programming by eliminating the need for specialized kernel engineering expertise.
RightNow AI, a research-focused AI startup, has unveiled AutoKernel — an open-source framework designed to take one of the most painful bottlenecks in machine learning engineering and hand it over to an autonomous AI agent. The tool targets GPU kernel optimization for arbitrary PyTorch models, promising to eliminate the need for specialized low-level programming expertise that has long been a barrier to peak model performance.
At its core, AutoKernel is a system that wraps a large language model inside an iterative agent loop. You feed it a PyTorch model — whether it’s a custom transformer, a diffusion backbone, or a fine-tuned LLaMA variant — and the agent autonomously generates, benchmarks, and refines Triton kernels until it converges on faster implementations.
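RightNow AI has not published the loop's internals in a form reproduced here, but the generate–benchmark–refine cycle can be sketched in miniature. Everything below is illustrative: `benchmark` is a fake latency curve standing in for compiling and timing a Triton kernel on real hardware, and the candidate-proposal step stands in for the LLM call.

```python
def benchmark(tile):
    """Stand-in for compiling and timing a candidate kernel.

    A fake latency curve with its minimum at tile=72 (hypothetical);
    the real system would measure wall-clock time on the GPU."""
    return abs(tile - 72) / 10 + 1.0

def agent_loop(rounds=3):
    """Toy closed loop: coarse sweep, then refine around the best result."""
    best_tile, best_lat = None, float("inf")
    for _ in range(rounds):
        if best_tile is None:
            candidates = [16, 32, 64, 128]                # coarse first pass
        else:
            candidates = [best_tile - 8, best_tile + 8]   # refine near best
        for tile in candidates:
            lat = benchmark(tile)
            if lat < best_lat:          # profiling result feeds the next round
                best_tile, best_lat = tile, lat
    return best_tile, best_lat

best_tile, best_lat = agent_loop()
print(f"best tile {best_tile}, simulated latency {best_lat:.2f}")
```

The point of the sketch is the feedback edge: each round's measurements constrain what the next round proposes, which is what separates this approach from one-shot code generation.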
The research team published their findings in a detailed paper, framing the approach as a “set it and forget it” workflow. In practical terms, an engineer could kick off an optimization run before leaving for the night and return to meaningfully faster kernel code by morning. No hand-tuned CUDA. No weeks of profiling. Just an agent doing what agents do best: grinding through iteration at a pace humans cannot match.
The framework is fully open source, which is a deliberate strategic choice by RightNow AI to encourage community adoption and contribution. If you’ve been following our coverage in “AI Agents Demand Better Governance Systems Now,” you know this mirrors a broader trend of companies betting on transparency to build developer trust.
To understand why AutoKernel matters, you need to appreciate just how difficult it is to write efficient GPU code. A kernel, in this context, is a function designed to execute across thousands of parallel GPU cores simultaneously. When large models like GPT-2 or LLaMA perform inference, the majority of compute time is consumed by kernel-level operations — matrix multiplications, softmax calculations, attention mechanisms, and layer normalizations.
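A back-of-envelope FLOP count makes this concrete. The shapes below are GPT-2-small-like (d_model = 768, sequence length 1024), and the per-operation counts are rough approximations rather than exact figures, but they show why matrix multiplications dominate a transformer layer's compute:

```python
# Rough per-token FLOPs for one transformer layer
# (multiply-accumulates counted as 2 FLOPs each).
d, n = 768, 1024  # model width, sequence length

qkv_out_proj = 2 * 4 * d * d   # Q, K, V and output projections
attn_scores  = 2 * 2 * n * d   # QK^T scores plus attention-weighted V
mlp          = 2 * 8 * d * d   # two linear layers with 4x expansion
matmul_flops = qkv_out_proj + attn_scores + mlp

# Elementwise and reduction ops are orders of magnitude smaller:
softmax_ops   = 5 * n          # a few ops per score entry (very rough)
layernorm_ops = 2 * 8 * d      # two layernorms, a handful of ops each
other_flops   = softmax_ops + layernorm_ops

share = matmul_flops / (matmul_flops + other_flops)
print(f"matmul share of layer FLOPs: {share:.1%}")
```

Even with generous accounting for the elementwise operations, the matmul share comes out above 99%, which is why kernel-level efficiency on those operations dominates end-to-end inference speed.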
Getting these operations to run optimally requires deep knowledge of:
- The GPU memory hierarchy (registers, shared memory, caches, and HBM) and how to keep data close to the compute units
- Coalesced memory access patterns that maximize effective bandwidth
- Occupancy: balancing threads, registers, and shared memory so each streaming multiprocessor stays busy
- Warp-level behavior, including the cost of branch divergence
- Hardware-specific features such as tensor cores, which vary across GPU generations
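One of those constraints, occupancy, reduces to simple arithmetic: a streaming multiprocessor can only run as many thread blocks as its thread, register, and shared-memory budgets allow, and the tightest limit wins. The hardware limits below are illustrative round numbers, not any specific GPU's spec sheet:

```python
# Hypothetical per-SM hardware budgets (illustrative, not a real part).
regs_per_sm     = 65536
smem_per_sm     = 100 * 1024   # bytes of shared memory
max_threads_sm  = 2048
max_blocks_sm   = 32

# A hypothetical kernel's per-block resource usage.
threads_per_block = 256
regs_per_thread   = 64
smem_per_block    = 16 * 1024

# Each resource imposes its own cap on resident blocks; the minimum wins.
by_threads = max_threads_sm // threads_per_block
by_regs    = regs_per_sm // (regs_per_thread * threads_per_block)
by_smem    = smem_per_sm // smem_per_block
blocks     = min(by_threads, by_regs, by_smem, max_blocks_sm)

occupancy = blocks * threads_per_block / max_threads_sm
print(f"{blocks} resident blocks -> {occupancy:.0%} occupancy")
```

Here register pressure is the binding constraint: the kernel could fit six blocks by shared memory and eight by thread count, but registers cap it at four, leaving half the SM's thread slots idle. Finding and relaxing that kind of bottleneck is exactly the work kernel engineers do by hand.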
This is why kernel engineers are among the most sought-after (and expensive) specialists in the entire AI industry. Companies like NVIDIA employ entire teams dedicated to hand-crafting libraries like cuDNN and cuBLAS. The talent pool is extraordinarily shallow, and demand has only accelerated as model sizes balloon.
AutoKernel sits at the intersection of two powerful trends: the democratization of AI infrastructure and the rise of agentic AI workflows. By automating kernel optimization, RightNow AI is effectively commoditizing a skill set that has traditionally been locked behind years of specialized training.
Consider the implications for smaller teams. A three-person startup building a real-time inference product currently has two options: use off-the-shelf kernels from PyTorch and accept suboptimal performance, or spend months (and significant capital) hiring someone who can squeeze every last FLOP out of their hardware. AutoKernel proposes a third path — let an agent handle it.
This also has significant cost ramifications. GPU compute remains the single largest expense for most AI companies. Even marginal improvements in kernel efficiency — say, a 15-20% speedup on key operations — can translate directly into lower cloud bills and faster iteration cycles. For organizations training or serving models at scale, those savings compound rapidly.
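The arithmetic is straightforward, assuming a workload that is fully kernel-bound (real savings scale with the fraction of wall-clock time actually spent in the optimized kernels). All figures below are hypothetical:

```python
# Hypothetical numbers: how an 18% kernel speedup translates into
# cloud savings for a team serving a model around the clock.
gpu_hourly_rate = 2.50        # $/GPU-hour, illustrative on-demand price
gpus            = 64
hours_per_month = 24 * 30
speedup         = 0.18        # fraction of kernel time eliminated

monthly_bill = gpu_hourly_rate * gpus * hours_per_month
savings      = monthly_bill * speedup  # only valid if fully kernel-bound
print(f"${monthly_bill:,.0f}/mo bill -> ${savings:,.0f}/mo saved")
```

At this scale the savings exceed $20,000 a month, and the same fractional speedup grows linearly with fleet size.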
RightNow AI isn’t operating in a vacuum. Several projects have explored automated kernel generation, including efforts from Meta’s compiler team and Google’s work on XLA optimizations. OpenAI’s Triton language itself was designed to make GPU programming more accessible, though it still requires meaningful expertise to use effectively.
What distinguishes AutoKernel is its agent-based architecture. Rather than relying on static compiler heuristics or rule-based transformations, the system treats optimization as a search problem — iteratively generating candidates, profiling them against actual hardware, and using the results to inform the next round of generation. This closed-loop approach mirrors how human experts actually work, just at machine speed.
For deeper context on how autonomous agents are reshaping AI development workflows, check out our previous analysis, “AI Agents Demand Better Governance Systems Now.”
Several questions remain open as AutoKernel moves from research artifact to production tool:
- How is correctness verified? An agent-generated kernel that is fast but subtly wrong is worse than a slow one.
- How well do the optimizations transfer across GPU architectures, driver versions, and input shapes?
- What does an optimization run itself cost in LLM inference and benchmarking compute, and when does that outweigh the savings?
RightNow AI’s AutoKernel represents a genuinely compelling vision: a world where high-performance GPU code isn’t a luxury reserved for teams with deep pockets and rare talent. By releasing the framework as open source, the team is making a bet that community-driven development will accelerate the tool’s maturity faster than any proprietary approach could.
Whether AutoKernel can deliver production-grade reliability remains to be proven at scale. But the direction is unmistakable: the era of hand-tuning GPU kernels may be entering its twilight. For ML engineers who have spent countless hours staring at NVIDIA Nsight profiler output, that’s a future worth watching closely.