Mark Collier briefed me on two updates under embargo at KubeCon Europe 2026 last month: Helion, which opens up GPU kernel ...
A from-scratch PyTorch implementation of TurboQuant (ICLR 2026), Google's two-stage vector quantization algorithm for compressing LLM key-value caches — enhanced with a comprehensive, research-grade ...