Chain-of-Thought Scaling: A New Paper Changes the Calculus
Researchers demonstrate that step-by-step reasoning scales more predictably than standard next-token prediction, with significant implications for test-time compute.
A comparative look at how the latest frontier models perform across reasoning, coding, and multimodal tasks.
Latest
Fortune 500 deployments accelerated sharply in Q1. We break down what is actually shipping versus proof-of-concept, and which verticals are leading.
Alignment techniques are advancing, but our ability to verify internal model behavior lags. A look at where interpretability research needs to go next.
MCP has shipped integrations across dozens of developer tools. We review what works, what does not, and where the spec should evolve.