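Computer science
Scope (computer science)
Information retrieval
Preference
Selection (genetic algorithm)
Ranking (information retrieval)
Learning to rank
Search engine
Test (biology)
Relevance (law)
Measure (data warehouse)
Population
Machine learning
Data mining
Statistics
Mathematics
Demography
Paleontology
Sociology
Programming language
Law
Biology
Political science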
Authors
Mark Sanderson, Monica Lestari Paramita, Paul Clough, Evangelos Kanoulas
Identifier
DOI:10.1145/1835449.1835542
Abstract
This paper presents results comparing user preference for search engine rankings with measures of effectiveness computed from a test collection. It establishes that preferences and evaluation measures correlate: systems measured as better on a test collection are preferred by users. This correlation is established for both "conventional web retrieval" and for retrieval that emphasizes diverse results. The nDCG measure is found to correlate best with user preferences compared to a selection of other well-known measures. Unlike previous studies in this area, this examination involved a large population of users, gathered through crowdsourcing, exposed to a wide range of retrieval systems, test collections and search tasks. Reasons for user preferences were also gathered and analyzed. The work revealed a number of new results, but also showed that there is much scope for future work refining effectiveness measures to better capture user preferences.