Grounding robot motion in natural language and visual perception