Learning Affordance, Environment And Interaction Representations By Watching People In Video