Analog computing in LLMs

A few days ago, this paper was published in Nature [1], claiming a huge improvement in LLM inference using analog computing: “Our architecture reduces attention latency and energy consumption by up to two and four orders of magnitude.” If that holds up, we may soon run LLMs on devices no bigger than a smartphone while consuming less power than a light bulb. After taking a deep dive, I believe this might turn out to be one of the most influential results of 2025. ...

September 14, 2025 · 3 min · Daniel López Montero