Preprint on geometric coupling between routers and experts in sparse mixture-of-experts models.
Research Engineer, LLM Inference
Sage Ahrac
I am a Research Engineer at IBM Research and an MSc student at Tel Aviv University, fortunate to be advised by Mor Geva. I work on distributed LLM inference serving, especially KV-cache management, scheduling, and cache-aware routing. Separately, I am drawn to mechanistic questions around Mixture-of-Experts routing geometry and latent-space monitoring. I also like building small cloud apps and agentic tools that sometimes work.
Selected work
Added vllm launch render, highlighted in the vLLM v0.18.0 release.
Added LoRA-aware prefix-cache routing for multi-adapter inference.
Created sparse tensor delta encoding for autonomous-driving data in Daft.