Towards a Rigorous Statistical Analysis of Empirical Password Datasets

Jeremiah Blocki, Peiyuan Liu

arxiv(2023)

引用 2|浏览39
暂无评分
摘要
A central challenge in password security is to characterize the attacker's guessing curve i.e., what is the probability that the attacker will crack a random user's password within the first G guesses. A key challenge is that the guessing curve depends on the attacker's guessing strategy and the distribution of user passwords both of which are unknown to us. In this work we aim to follow Kerckhoffs's principal and analyze the performance of an optimal attacker who knows the password distribution. Let λ G denote the probability that such an attacker can crack a random user's password within G guesses. We develop several statistically rigorous techniques to upper and lower bound λ G given N independent samples from the unknown password distribution ${\mathcal{P}}$. We show that our upper/lower bounds on λ G hold with high confidence and we apply our techniques to analyze eight large password datasets. Our empirical analysis shows that even state-of-the-art password cracking models are often significantly less guess efficient than an attacker who can optimize its attack based on its (partial) knowledge of the password distribution. We also apply our statistical tools to re-examine different models of the password distribution i.e., the empirical password distribution and Zipf's Law. We find that the empirical distribution closely matches our upper/lower bounds on λ G when the guessing number G is not too large i.e., G ≪ N. However, for larger values of G our empirical analysis rigorously demonstrates that the empirical distribution (resp. Zipf's Law) overestimates the attacker's success rate. We apply our statistical techniques to upper/lower bound the effectiveness of password throttling mechanisms (key-stretching) which are used to reduce the number of attacker guesses G. Finally, if we are willing to make an additional assumption about the way users respond to password restrictions, we can use our statistical techniques to evaluate the effectiveness of various password composition policies which restrict the passwords that users may select.
更多
查看译文
关键词
password distribution, password cracking, password dataset, theoretical bounds, statistical analysis, password policies
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要