Frontier evals are converging. That tells us less than you think.
Every model on the leaderboard is within a few points of every other. Here's what that does — and doesn't — mean for which one you should ship.
Articles
Years of patient mech-interp work just turned a corner. The result is the most concrete reason for optimism in alignment research in years.