Towards efficient deep neural network execution with model compression and platform-specific optimization