The rapid evolution of artificial intelligence demands rigorous benchmarking to assess what AI agents can actually do. Enter PaperBench, a benchmark designed to evaluate AI agents’ ability to replicate state-of-the-art research. It challenges agents to reproduce 20 ICML 2024 Spotlight and Oral papers from scratch,...