Bandit Learning Problems In Recommendation Systems: Self-Reinforcing User Preferences, Delayed Feedback, And Online Learning To Rank