Efficient Policy Gradient Reinforcement Learning in High-Noise Long Horizon Settings