Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination