Towards Label Efficiency and Privacy Preservation in Video Understanding