In this study, we design and use evaluation models to both evaluate and autonomously refine the performance of digital agents that browse the web or control mobile devices. The evaluator and ...
If you are not using LLM-as-a-Judge, you can set the corresponding key in the .env file to any placeholder value to bypass the error.