Fair and diverse data representation in machine learning