commentary · 2 articles

commentary

Articles

Frontier evals are converging. That tells us less than you think.
Every model on the leaderboard is within a few points of every other. Here's what that does — and doesn't — mean for which one you should ship.
April 25, 2026 · 1 min
Open weights is fine. Open data is the actual fight.
Llama and Mistral and DeepSeek give us weights. They do not give us reproducibility. The next decade of AI policy turns on the difference.
April 20, 2026 · 1 min