Notes on Generalization Theory

How do we know whether a machine learning model will perform well on unseen data? What happens as you continue to add samples to the dataset? Is it better to have a more complex model or a simpler one? These questions have been studied for decades and are central to statistical learning theory. Generalization theory provides mathematical guarantees and bounds on the generalization ability of families of functions. I have prepared a few notes on the basics of generalization theory. The only prerequisite is probability theory, and the notes are intended to be self-contained. They include the most important results, such as Dudley’s Theorem and McDiarmid’s Theorem. ...
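As a taste of the kind of result these notes cover, McDiarmid’s inequality (stated here from general knowledge, not quoted from the notes themselves) bounds how far a function of independent variables can deviate from its expectation when changing any single input changes the function by at most a constant:

```latex
% McDiarmid's inequality. Suppose f satisfies the bounded-differences condition
%   |f(x_1,\dots,x_i,\dots,x_n) - f(x_1,\dots,x_i',\dots,x_n)| \le c_i
% for every coordinate i. Then for independent X_1,\dots,X_n and any t > 0,
\[
  \Pr\bigl( f(X_1,\dots,X_n) - \mathbb{E}\, f(X_1,\dots,X_n) \ge t \bigr)
  \;\le\; \exp\!\left( \frac{-2 t^2}{\sum_{i=1}^{n} c_i^2} \right).
\]
```

Applied to the empirical risk (where each sample changes the average loss by at most a bounded amount), this kind of concentration bound is a standard building block for generalization guarantees.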

January 4, 2026 · 1 min · Daniel López Montero

Fusion Energy Simulation: Tokamak

I think there are three major milestones remaining for humanity, and one of them is clean and abundant energy. Fusion energy has the potential to provide a nearly limitless source of clean energy by replicating the processes that power the Sun. However, achieving controlled fusion reactions on Earth has proven to be a formidable challenge. The best-known fusion prototype is the tokamak, which uses a magnetic field to confine plasma within a toroidal (doughnut-shaped) chamber. ...

December 20, 2025 · 19 min · Daniel López Montero

Self-Attention, Kernel Methods and G-Metric Spaces

For some time, I’ve been thinking about how to generalize self-attention mechanisms. Most existing attention mechanisms rely on pairwise similarities (dot products) between query and key vectors. However, higher-order relationships (involving triples or tuples of elements) could capture richer interactions. I then found that several people are already exploring this idea under the name “higher-order attention” [5]. However, this approach comes with a performance cost. Traditional self-attention has a complexity of O(n^2), while higher-order attention is even more computationally expensive. In this post, I’d like to share my perspective on this topic, connecting it with kernel methods and generalized metric spaces. ...
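The pairwise mechanism the excerpt contrasts against can be sketched in a few lines of NumPy. This is a generic illustration of standard scaled dot-product self-attention, not code from the post itself; the projection matrices here are illustrative placeholders:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Standard pairwise self-attention: scores[i, j] = <q_i, k_j> / sqrt(d).
    # The (n, n) score matrix is exactly the O(n^2) cost mentioned above;
    # higher-order variants replace it with an O(n^3) (or worse) tensor.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)          # (n, n) pairwise similarities
    return softmax(scores, axis=-1) @ V    # each row: convex combo of values

# Toy example with identity projections (placeholders, not tuned weights).
rng = np.random.default_rng(0)
n, d = 4, 8
X = rng.normal(size=(n, d))
out = self_attention(X, np.eye(d), np.eye(d), np.eye(d))
```

A third-order variant would score triples (q_i, k_j, k_l) instead of pairs, which is where the extra computational cost comes from.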

October 30, 2025 · 17 min · Daniel López Montero

The Importance of (Good) Metrics

Initially, I wanted this post to focus solely on metrics in machine learning. However, the concept of metrics is far more universal, and it doesn’t make sense to treat it as an isolated problem. This is more of a philosophical post; the ultimate goal is to make you think and reflect. We live surrounded by metrics: grades from teachers, performance reviews from employers, publication counts in academia, FLOPs in computing, ELO in chess, salary, IQ for intelligence, movie ratings on IMDB, book ratings on Goodreads, stars/reviews on Amazon, election results in democracies, F1 score in machine learning, GDP for countries, EBITDA in finance, likes/followers on Instagram, time spent on TikTok for content recommendation algorithms, etc. ...

October 16, 2025 · 9 min · Daniel López Montero

Gaussian Processes

Gaussian Process Regression is one of the most elegant and theoretically rich algorithms in machine learning. With this post, I want to celebrate the mathematical beauty underlying Gaussian Processes. I will divide this post into two sections: theory and practice, accompanied by code examples. One of the key advantages of Gaussian Processes compared to Deep Learning methods is that they inherently provide interpretability (through confidence intervals and uncertainty estimation). They also offer excellent extrapolation properties, as we will see, and a way to incorporate knowledge about the structure of the data into the model. However, these benefits come at a cost. The algorithm has a wide variety of hyperparameters that are difficult to configure; for instance, kernel selection alone is challenging. Understanding and having a good intuition for the inner workings of this algorithm (and the data) is key to making the most of it. ...
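The post’s own code examples are not shown in this excerpt; as a minimal, self-contained sketch, here is GP regression with an RBF kernel in plain NumPy (hyperparameters and data are illustrative), showing how the posterior mean and covariance deliver the uncertainty estimates mentioned above:

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel: k(x, x') = exp(-(x - x')^2 / (2 l^2)).
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2 * length_scale**2))

def gp_posterior(x_train, y_train, x_test, noise=1e-6, length_scale=1.0):
    # Standard GP regression posterior:
    #   mean = K_* K^{-1} y,   cov = K_** - K_* K^{-1} K_*^T
    K = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_test, x_train, length_scale)
    K_ss = rbf_kernel(x_test, x_test, length_scale)
    K_inv = np.linalg.inv(K)  # fine for toy sizes; use Cholesky in practice
    mean = K_s @ K_inv @ y_train
    cov = K_ss - K_s @ K_inv @ K_s.T
    return mean, cov  # sqrt(diag(cov)) gives pointwise uncertainty bands

# Toy data: the posterior interpolates the training points,
# and the predictive variance shrinks to ~0 there.
x = np.array([0.0, 1.0, 2.0])
y = np.sin(x)
mean, cov = gp_posterior(x, y, np.array([1.0]))
```

The kernel choice encodes the structural assumptions about the data; swapping the RBF kernel for, say, a periodic one is exactly the kind of modeling decision the post discusses.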

October 9, 2025 · 17 min · Daniel López Montero

Analog Computing in LLMs

A few days ago, this paper was published in Nature [1], claiming a huge improvement in LLM inference using analog computing: “Our architecture reduces attention latency and energy consumption by up to two and four orders of magnitude.” This could mean we may soon run LLMs on devices no bigger than smartphones while consuming less power than a light bulb. After taking a deep dive, I believe this might turn out to be one of the most influential results of 2025. ...

September 14, 2025 · 3 min · Daniel López Montero