Entropy Blog
2024
LogitLens from scratch with Hugging Face Transformers
In this short tutorial, we’ll implement LogitLens to inspect the internal representations of a pre-trained Phi-1.5 model. LogitLens is a straightforward yet effective...
Vision Transformer in pure JAX.
I decided to do this for two reasons. The first reason is that, for years, I had to put up with my Ph.D. advisor coming into the lab while I was happily coding my P...
Visualizing attention maps in pre-trained Vision Transformers from timm
Goal: Visualizing the attention maps for the CLS token in a pretrained Vision Transformer from the timm library.
Short notes on types of parallelism for training neural networks
As neural networks grow larger (see LLMs, though it now looks like there is also a trend towards smaller models such as Gemma2-2b) and datasets become more mass...
Efficiency Metrics in Machine Learning
In the world of machine learning, efficiency is a buzzword we hear all the time. New methods or models often come with the claim of being more efficient than...
FLOPs with PyTorch's built-in FLOPs counter
It is becoming more and more common to use FLOPs (floating-point operations) to measure the computational cost of deep learning models. For PyTorch users, un...
Adaptive Computation Modules
This brief post summarizes a project I have been working on over the past few months. You can find further details about this work here.
2023
Manifold learning
I have stumbled across this concept many times, so here I am writing a brief recap for myself.
Explainability for Graphs with PyTorch Geometric and Captum
In this Colab notebook, we show how to use explainability methods on Graph Neural Networks.
Entropy and Self-Information
This post contains short notes on entropy and self-information, and why machine learning adopted them from information theory.
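As a quick illustration of the two quantities this post covers, here is a minimal sketch in plain Python (the function names are my own, not from the post): self-information measures the surprise of a single outcome, and entropy is its expectation over a distribution.

```python
import math

def self_information(p, base=2):
    # Self-information I(x) = -log_b p(x): rarer events carry more information.
    return -math.log(p, base)

def entropy(dist, base=2):
    # Shannon entropy H(p) = sum_x p(x) * I(x): the expected self-information.
    # Terms with p(x) = 0 are skipped, following the convention 0 * log 0 = 0.
    return sum(p * self_information(p, base) for p in dist if p > 0)

# With base 2, a fair coin flip carries exactly 1 bit,
# while a biased coin carries less.
print(self_information(0.5))   # 1 bit
print(entropy([0.5, 0.5]))     # 1 bit
print(entropy([0.9, 0.1]))     # less than 1 bit
```

A uniform distribution over four outcomes gives 2 bits, and a degenerate (certain) distribution gives 0 bits, which matches the intuition that certain outcomes are uninformative.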
A Primer on Graph Neural Networks with PyTorch Geometric
In this Colab notebook, we show how to train a simple Graph Neural Network on the MUTAG dataset.