A new paper, 'AI Agents That Matter', examines the challenges of evaluating AI agents and proposes ways to improve how they are benchmarked. Its recommendations include cost-controlled evaluations, jointly optimizing accuracy and cost rather than accuracy alone, and greater standardization of evaluation practices, all aimed at advancing agents that are useful in practice.
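One way to make accuracy-cost optimization concrete is to compare agents on both axes at once and keep only those on the Pareto frontier: an agent stays on the frontier if no other agent is both at least as cheap and at least as accurate, and strictly better on one of the two. The sketch below assumes each agent's evaluation run is summarized by a single total cost in dollars and an accuracy fraction. The `AgentResult` type, the `pareto_frontier` helper, and the agent names and numbers are hypothetical and purely illustrative; they are not the paper's code or results.

```python
from typing import List, NamedTuple


class AgentResult(NamedTuple):
    name: str        # hypothetical agent label
    cost_usd: float  # assumed: total inference cost of one evaluation run
    accuracy: float  # assumed: fraction of benchmark tasks solved


def pareto_frontier(results: List[AgentResult]) -> List[AgentResult]:
    """Return the agents not dominated on (lower cost, higher accuracy).

    Agent A dominates agent B if A costs no more and is at least as
    accurate, and is strictly better on at least one of the two.
    Exact duplicates collapse to a single frontier entry.
    """
    # Sort by cost ascending; among equal costs, put the most accurate first.
    ordered = sorted(results, key=lambda r: (r.cost_usd, -r.accuracy))
    frontier: List[AgentResult] = []
    best_accuracy = float("-inf")
    for r in ordered:
        # Keep an agent only if it beats every cheaper (or equally cheap)
        # agent already seen; otherwise something dominates it.
        if r.accuracy > best_accuracy:
            frontier.append(r)
            best_accuracy = r.accuracy
    return frontier


# Illustrative, made-up numbers: not taken from the paper.
results = [
    AgentResult("simple-baseline", 0.50, 0.70),
    AgentResult("retry-x5", 2.40, 0.78),
    AgentResult("complex-agent", 9.00, 0.76),
    AgentResult("cheap-but-weak", 0.10, 0.40),
]

frontier = pareto_frontier(results)
print([r.name for r in frontier])  # ['cheap-but-weak', 'simple-baseline', 'retry-x5']
```

In this toy example, `complex-agent` drops off the frontier because `retry-x5` is both cheaper and more accurate, which is the kind of comparison an accuracy-only leaderboard would hide.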