On compact deep learning with neural network architecture optimization