Tackling Choke Point Induced Performance Bottlenecks in a Near-Threshold GPGPU