Text

Wikipedia & Benford's Law

Benford’s law is the tendency for small digits to be more common than large ones when looking at the first non-zero digits in a large, heterogenous collection of numbers. These frequencies range from about 30% for a leading 1 down to about 4.6% for a leading 9, as opposed to the constant 11.1% you would get if they all appeared at the same rate.

Since I recently wrote about unpacking the pages from a dump of the English Wikipedia, I thought would see if Benford’s law manifested in the text of Wikipedia, as it seems like it fits the idea of a “large, heterogenous collection of numbers” quite well.

The notebook containing the full code is here.