Today's Large Audio Language Models (LALMs) are stuck in an offline paradigm: you hand them a complete audio clip, wait, and get a reply. Streaming audio models exist, but each one only handles a ...
SDPG is the main contribution. It extends GRPO with an exact per-token forward KL between the actor (without privileged context) and itself conditioned on privileged context c: ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results