Accelerating Data Parallel Applications via Hardware and Software Techniques