Model choice actually did more work than the tools did ...
A new benchmark that pits AI against previously unseen maths problems shows that the systems still fall short of top human expertise.