News

In this study, we design and use evaluation models to both evaluate and autonomously refine the performance of digital agents that browse the web or control mobile devices. The evaluator and ...
If you are not using LLM-as-a-Judge, you can assign any placeholder value to the required key in the .env file to bypass the error.
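
As a rough illustration of why a placeholder is enough, here is a minimal sketch of a startup check that only tests whether a variable is present. It is an assumption about how such a check might look, not the project's actual code, and the variable name `JUDGE_API_KEY` is hypothetical.

```python
import os

# Hypothetical variable name; the project's real key name may differ.
JUDGE_KEY_VAR = "JUDGE_API_KEY"


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set; add it to the .env file")
    return value


if __name__ == "__main__":
    # Any non-empty placeholder passes a presence-only check like this one.
    os.environ.setdefault(JUDGE_KEY_VAR, "placeholder")
    print(require_env(JUDGE_KEY_VAR))
```

A check of this kind never validates the value, so a placeholder is safe only as long as the judge is never actually called.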