Chain-of-Thought Scaling: A New Paper Changes the Calculus
Researchers demonstrate that step-by-step reasoning scales more predictably than standard next-token prediction, with significant implications for test-time compute.
A comparative look at how the latest frontier models perform across reasoning, coding, and multimodal tasks.
Latest
Fortune 500 deployments accelerated sharply in Q1. We break down what is actually shipping versus proof-of-concept, and which verticals are leading.
Alignment techniques are advancing, but our ability to verify internal model behavior lags. A look at where interpretability research needs to go next.
MCP has shipped integrations across dozens of developer tools. We review what works, what does not, and where the spec should evolve.