In this article we propose several modeling choices for extending propensity score analysis to clustered data. We describe four model specifications for estimating the propensity score: a single-level model, a fixed effects model, and two random effects models. We also consider conditioning on the propensity score both within clusters and across clusters. We examine the assumptions underlying these modeling choices and the type of randomized experiment that each approach approximates. Using a simulation study, we compare how well these modeling and conditioning choices reduce bias due to confounding variables at both the person and cluster levels. An applied example based on a study by Hughes, Chen, Thoemmes, and Kwok (2010) evaluates the effect of retention in Grade 1 on passing an achievement test in Grade 3. In our simulation study, models that account for the clustered nature of the data both in estimating the propensity score and in conditioning on it performed best; however, other modeling choices also performed well. The applied example illustrates the practical limitations of these models when cluster sizes are small.