Authors
Luning Sun, Zijie Qin, Shan Wang, Xuetao Tian, Fang Luo
Abstract
Forced-choice questionnaires involve presenting items in blocks and asking respondents to provide a full or partial ranking of the items within each block. To prevent involuntary or voluntary response distortions, blocks are usually formed of items that possess similar levels of desirability. Assembling forced-choice blocks is not a trivial process, because in addition to desirability, both the direction and magnitude of relationships between items and the traits being measured (i.e., factor loadings) need to be carefully considered. Based on simulations and empirical studies using item pairs, we provide recommendations on how to construct item pairs matched by desirability. When all pairs contain items keyed in the same direction, score reliability is improved by maximizing within-block loading differences. Higher reliability is obtained when even a small number of pairs consist of unequally keyed items.
Keywords: Forced-choice questionnaire; Thurstonian IRT model; social desirability
Article information
Conflict of interest disclosures: The authors have no conflicts of interest to disclose.
Ethical principles: The authors affirm having followed professional ethical guidelines in preparing this work, which received ethical approval from the Institutional Review Board, Faculty of Psychology, Beijing Normal University (Reference numbers: 202012140058 and 202206030087), where the requirement for participants' consent was waived.
Funding: This work was supported by Grant U1911201, 62207002 from the National Natural Science Foundation of China, and Grant 22YJAZH077 from the Humanities and Social Sciences Research of the Ministry of Education.
Role of the funders/sponsors: The funder had no role in the design and conduct of the study; collection, management, analysis, and interpretation of data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.
Acknowledgments: We are very grateful for the generous help we received from the Editor-in-Chief, Professor Alberto Maydeu-Olivares, with the manuscript. We would also like to thank the anonymous reviewers as well as Dr Joe Watson and Dr Chia-Wen Chen for their comments on prior versions of this manuscript. The ideas and opinions expressed herein are those of the authors alone, and endorsement by the authors' institutions is not intended and should not be inferred.
Notes
1. In a very small number of trials where the Thurstonian IRT model was able to converge in Mplus, more than half of the factor loadings were not significant, suggesting local minima that failed to recover the item parameters. Such trials were also considered as non-convergence in our results.
2. When calculating the ARBs of the factor loadings and the thresholds, we excluded certain outliers, which were defined as absolute values of above five for factor loadings and absolute values of above ten for thresholds.
3. Under Maximum conditions, an item was paired up with the one that had the most distant factor loading among all SDI-matched items from a different trait. Item pairs were constructed sequentially. For each trial, 100 test forms were assembled using random items as the starting point, and the test form with the highest average within-pair loading difference was used in the simulation.
4. Under Random conditions, in each item bank, we selected the same set of items that were used under the Maximum condition so that there was no difference in the values of the factor loadings between the corresponding trials in the two conditions.
5. Unfortunately, the original file that contained all the rating data was missing. The SDs reported here were calculated based on the ratings of 43 participants.
6. Due to a labelling mistake, one extra negative item was selected for the Emotional Stability domain.
7. The decision for this arbitrary value was partially informed by the conditions in the simulation studies, where the average within-pair difference in the SDIs on a 5-point scale was mostly below 0.3.
8. One Openness item with factor loading of 0.29 was included, as there were not enough Openness items with absolute factor loadings above 0.3.
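The pairing procedure described in Note 3 can be illustrated with a short Python sketch. This is not the authors' code. It is a minimal greedy reading of the note under stated assumptions: the names `Item`, `assemble_form`, `mean_loading_gap`, and `best_form` are hypothetical, "SDI-matched" is taken to mean an absolute SDI difference no larger than a tolerance `sdi_tol` (the default of 0.3 is an assumption, not a value given in the text), and "most distant factor loading" is taken as the largest absolute loading difference. Preferring forms that pair more items before comparing average loading differences is likewise an added assumption.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: int
    trait: str
    loading: float  # factor loading on the item's own trait
    sdi: float      # social desirability index (e.g., mean rating)


def assemble_form(items, sdi_tol, rng):
    """Build pairs sequentially, visiting items in a random order."""
    order = list(items)
    rng.shuffle(order)  # random starting point for this test form
    unused = {it.item_id: it for it in order}
    pairs = []
    for item in order:
        if item.item_id not in unused:
            continue  # already placed in an earlier pair
        # Candidates: unused, different trait, SDI within the tolerance.
        candidates = [
            other for other in unused.values()
            if other.item_id != item.item_id
            and other.trait != item.trait
            and abs(other.sdi - item.sdi) <= sdi_tol
        ]
        if not candidates:
            continue  # no SDI-matched partner left; item stays unpaired
        # Pick the partner with the most distant factor loading.
        partner = max(candidates, key=lambda o: abs(o.loading - item.loading))
        pairs.append((item, partner))
        del unused[item.item_id]
        del unused[partner.item_id]
    return pairs


def mean_loading_gap(pairs):
    """Average absolute within-pair loading difference (0.0 if no pairs)."""
    if not pairs:
        return 0.0
    return sum(abs(a.loading - b.loading) for a, b in pairs) / len(pairs)


def best_form(items, sdi_tol=0.3, n_forms=100, seed=0):
    """Assemble n_forms candidate forms and keep the best one.

    Assumption: forms with more pairs are preferred first, then the
    highest average within-pair loading difference.
    """
    rng = random.Random(seed)
    forms = [assemble_form(items, sdi_tol, rng) for _ in range(n_forms)]
    return max(forms, key=lambda f: (len(f), mean_loading_gap(f)))
```

Calling `best_form(items)` on a list of `Item` objects returns a list of `(Item, Item)` tuples. Because the pairing is greedy and order-dependent, repeating the assembly from many random starting orders and keeping the best form, as Note 3 describes, reduces the chance of settling on a poor pairing.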