Minor Issues Escalated to Critical Levels in Large Samples: A Permutation-Based Fix
arxiv(2024)
Abstract
In the big data era, reevaluating traditional statistical methods is paramount
because of the challenges posed by vast datasets. While larger samples
theoretically enhance accuracy and hypothesis-testing power without increasing
false positives, practical concerns about inflated Type I errors persist. The
prevalent belief is that larger samples can uncover subtle effects, which
makes it necessary to consider both p-values and effect sizes. Yet the
reliability of p-values from large samples remains debated.
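To make the p-value versus effect-size point concrete, the sketch below uses a hypothetical helper built on a one-sample z-test (a simplifying assumption, not taken from the paper). It holds a negligible standardized effect d = 0.01 fixed and lets n grow. Because z = d·√n, the p-value falls from about 0.92 at n = 100, to about 0.32 at n = 10,000, to below 1e-20 at n = 1,000,000. The effect size itself never changes.

```python
import math


def one_sample_p_value(effect_size, n):
    """Two-sided p-value of a one-sample z-test for a standardized mean difference (Cohen's d)."""
    # For a fixed effect size the z statistic grows like sqrt(n).
    z = effect_size * math.sqrt(n)
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


d = 0.01  # a negligible standardized effect
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: d = {d}, p = {one_sample_p_value(d, n):.3g}")
```

A large n alone turns a trivially small effect into a "highly significant" one. This is why the effect size has to be read alongside the p-value.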
This paper warns that larger samples can escalate minor issues into
significant errors, leading to false conclusions. Through a simulation study,
we demonstrate how growing sample sizes amplify the consequences of two
commonly encountered violations of model assumptions in real-world data and
lead to incorrect decisions. This underscores the need for vigilant analytical
approaches in the era of big data.
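The abstract does not name the two violations. The sketch below is a hypothetical illustration only. It picks one familiar candidate: a small intra-cluster correlation (ICC) that a naive two-sample z-test ignores by treating every observation as independent.

Both groups are drawn from the same distribution, so every rejection is a Type I error. The variance of a group mean is inflated by roughly the design effect 1 + (m - 1)·ICC. Holding the number of clusters fixed and growing the cluster size m should therefore push the rejection rate well above α = 0.05.

The function names, the ICC of 0.01, and the cluster counts are all assumptions for illustration, not the paper's simulation design.

```python
import math

import numpy as np


def naive_z_pvalue(a, b):
    """Two-sided p-value of a two-sample z-test that treats every observation as independent."""
    se = math.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    z = (a.mean() - b.mean()) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def clustered_sample(rng, n_clusters, cluster_size, icc):
    """Flat sample of clustered data with total variance 1 and intra-cluster correlation icc."""
    # A shared random effect per cluster is the (hypothetical) "minor" assumption violation.
    cluster_effects = rng.normal(0.0, math.sqrt(icc), size=(n_clusters, 1))
    noise = rng.normal(0.0, math.sqrt(1.0 - icc), size=(n_clusters, cluster_size))
    return (cluster_effects + noise).ravel()


def naive_type1_error(cluster_size, n_clusters=10, icc=0.01, reps=1000, alpha=0.05, seed=0):
    """Share of null datasets (no true group difference) that the naive z-test rejects."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        a = clustered_sample(rng, n_clusters, cluster_size, icc)
        b = clustered_sample(rng, n_clusters, cluster_size, icc)
        rejections += naive_z_pvalue(a, b) < alpha
    return rejections / reps


# Sample size grows through larger clusters while the number of clusters stays at 10.
for m in (2, 20, 200, 500):
    print(f"n per group = {10 * m:5d}: empirical Type I error = {naive_type1_error(m):.3f}")
```

The ICC never changes, yet the same weak correlation that is harmless at 20 observations per group produces frequent false positives at 5,000.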
In response, we introduce a permutation-based test that counterbalances the
effects of sample size and assumption discrepancies by neutralizing them
between the actual and permuted data. We demonstrate that this approach keeps
Type I error rates stable at the nominal level across various sample sizes,
thereby ensuring robust statistical inference even when conventional
assumptions are violated in big data.
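The abstract does not spell out how the permuted data are constructed. The sketch below is a generic cluster-level Monte Carlo permutation test in a hypothetical clustered-data setting, not the authors' procedure. Its names and settings are assumptions.

It shows one standard way to make a reference distribution share an assumption violation with the observed data: shuffle group labels between whole clusters rather than individual observations. Every permuted dataset then carries the same within-cluster correlation as the actual one. The violation therefore appears on both sides of the comparison, however large each cluster grows.

```python
import numpy as np


def cluster_permutation_pvalue(clusters_a, clusters_b, n_perm=999, seed=0):
    """Monte Carlo permutation p-value for a difference in group means.

    clusters_a, clusters_b: arrays of shape (n_clusters, cluster_size), equal cluster sizes.
    Labels are shuffled between whole clusters, so each permuted dataset keeps the
    within-cluster correlation of the observed data.
    """
    rng = np.random.default_rng(seed)
    # With equal cluster sizes, a group mean equals the mean of its cluster means.
    means = np.concatenate([clusters_a.mean(axis=1), clusters_b.mean(axis=1)])
    n_a = clusters_a.shape[0]
    observed = abs(means[:n_a].mean() - means[n_a:].mean())
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(means)
        hits += int(abs(shuffled[:n_a].mean() - shuffled[n_a:].mean()) >= observed)
    # Count the observed labelling as one permutation so the p-value is never 0.
    return (hits + 1) / (n_perm + 1)


def clustered_group(rng, n_clusters, cluster_size, icc, shift=0.0):
    """Clustered data with total variance 1 and intra-cluster correlation icc."""
    effects = rng.normal(0.0, np.sqrt(icc), size=(n_clusters, 1))
    noise = rng.normal(0.0, np.sqrt(1.0 - icc), size=(n_clusters, cluster_size))
    return shift + effects + noise


def permutation_type1_error(cluster_size, n_clusters=10, icc=0.01, reps=200, alpha=0.05, seed=1):
    """Share of null datasets (no true group difference) rejected by the permutation test."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for r in range(reps):
        a = clustered_group(rng, n_clusters, cluster_size, icc)
        b = clustered_group(rng, n_clusters, cluster_size, icc)
        rejections += int(cluster_permutation_pvalue(a, b, n_perm=199, seed=r) < alpha)
    return rejections / reps


for m in (5, 50, 500):
    print(f"cluster size {m:3d}: empirical Type I error = {permutation_type1_error(m):.3f}")
```

Under the null, the cluster means are exchangeable at every cluster size. The rejection rate should therefore stay near α instead of climbing with n. The trade-off is that the effective sample size becomes the number of clusters, which limits how small the p-value can get.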
For reproducibility, our R code is publicly available at:
.