Optimizing the Performance of Multi-threaded Linear Algebra Libraries Based on Task Granularity