Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning