On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification