A Pathwise Optimization Approach for Reinforcement Learning in Merchant Energy Operations