We consider a dynamic combinatorial optimization problem in which, at each time step, the decision maker selects a subset of cardinality K from N possible items and observes feedback in the form of the index of one of the items in that subset, or none. Each of the N items is ascribed a value (reward), which is collected if the item is chosen. This problem is motivated by assortment selection in online retail, where the items are products. As in that literature, we assume that the choice of item given the offered subset is governed by a multinomial logit (MNL) choice model whose parameters are a priori unknown. The objective of the decision maker is to maximize the expected cumulative reward over a finite horizon T or, alternatively, to minimize the regret relative to an oracle that knows the MNL choice model parameters. We formulate this problem as a multiarmed bandit problem that we refer to as the MNL-bandit problem. We present a Thompson sampling-based algorithm for this problem and show that it achieves near-optimal regret as well as attractive empirical performance.

Funding: S. Agrawal is supported in part by the Division of Civil, Mechanical and Manufacturing Innovation [NSF Grant 1846792]. V. Goyal is supported in part by the Division of Civil, Mechanical and Manufacturing Innovation [NSF Grants 1351838 and 1636046]. A. Zeevi is supported in part by the Division of Computer and Network Systems [NSF Grant 0964170] and the United States-Israel Binational Science Foundation [Grant 2010466].
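To make the feedback structure concrete, the MNL choice model referenced above can be sketched as follows: given an offered subset S, item i in S is chosen with probability proportional to its valuation v_i, with the leftover mass assigned to a no-purchase outcome. This is a minimal illustration; the valuations `v` and the offered subset below are hypothetical, not taken from the paper.

```python
def mnl_choice_probs(v, subset):
    """Choice probabilities under the multinomial logit (MNL) model:
    item i in the offered subset S is chosen with probability
    v[i] / (1 + sum of v[j] for j in S); the remaining probability
    mass corresponds to choosing none of the offered items."""
    denom = 1.0 + sum(v[i] for i in subset)
    probs = {i: v[i] / denom for i in subset}
    probs[None] = 1.0 / denom  # key None = no-purchase outcome
    return probs

# Hypothetical valuations for N = 4 items; offer a subset of cardinality K = 2.
v = [0.8, 0.5, 0.3, 0.2]
probs = mnl_choice_probs(v, subset=[0, 2])
```

A Thompson sampling approach in this setting would, at each step, sample plausible valuations from a posterior maintained over the unknown parameters and offer the subset that is optimal for the sampled valuations; the sketch above only fixes the observation model that such an algorithm learns from.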