Multimodal Knowledge Integration For Object Detection And Visual Reasoning