估计员
计算机科学
公制(单位)
线性回归
数据挖掘
回归分析
变量(数学)
回归
统计
数学
人工智能
计量经济学
机器学习
运营管理
数学分析
经济
作者
Mengke Qiao,Ke‐Wei Huang
标识
DOI:10.1287/isre.2020.0977
摘要
There is a surge of interest in social science studies in applying data mining methods to construct variables for regression analysis. For example, text classification was applied to classify whether the review is subjective or objective. The derived review subjectivity was used as an independent variable in the regression to examine its impact on review helpfulness. In the classification phase of these studies, researchers need to subjectively choose a classification performance metric for optimization. No matter which performance metric is chosen, the constructed variable still includes classification error because the variable cannot be classified perfectly. The misclassification of constructed variables will lead to inconsistent estimators of regression coefficients in the following phase. To correct the estimation inconsistency, we summarize and modify existing proofs in econometrics to derive theoretical formulas of consistent estimators in generalized linear models. The main implication of our theoretical result is that the inconsistency can be corrected by theoretical formulas, even when the classification accuracy is poor. Therefore, we propose that a classification algorithm should be tuned to minimize the standard error of the focal coefficient derived based on the corrected formula. As a result, researchers derive a consistent and most precise estimator in generalized linear models.
科研通智能强力驱动
Strongly Powered by AbleSci AI