DeepPHY integrates six diverse and challenging environments to evaluate interactive physical reasoning in agentic VLMs. Our results indicate that even state-of-the-art models have significant ...