Video pretrained transformer with an ensemble of experts