There are plenty of methods to choose from for classification problems, all with their own strengths and weaknesses. This post will try to compare three of the more basic ones: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and logistic regression.
Benford’s law is the tendency for small digits to be more common than large ones when looking at the first non-zero digits in a large, heterogeneous collection of numbers. These frequencies range from about 30% for a leading 1 down to about 4.6% for a leading 9, as opposed to the constant 11.1% you would get if all nine digits appeared at the same rate.
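As a quick check on those figures, here is a minimal sketch of the usual statement of Benford’s law, in which a leading digit d is expected with probability log10(1 + 1/d). The function name `benford_expected` is made up for illustration and is not taken from the notebook.

```python
import math


def benford_expected(digit: int) -> float:
    """Expected share of numbers whose first non-zero digit is `digit`."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Expected frequency for each possible leading digit.
expected = {d: benford_expected(d) for d in range(1, 10)}

for d, p in expected.items():
    print(f"{d}: {p:.1%}")
```

The nine probabilities sum to 1, with roughly 30.1% for a leading 1 and 4.6% for a leading 9.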
Since I recently wrote about unpacking the pages from a dump of the English Wikipedia, I thought I would see if Benford’s law manifested in the text of Wikipedia, as it seems to fit the idea of a “large, heterogeneous collection of numbers” quite well.
The notebook containing the full code is here.
When looking at examples of LSTMs in Keras, I’ve found a lot that focus on using them to predict future stock prices. Most are pretty bare-bones though, consisting of little more than a basic LSTM network and a quick plot of the prediction. Though I think the utility of these models is a little questionable, they brought a question into my head: how accurate are the predictions made by a model trained on one stock if it’s predicting on another stock?
The full code can be found here.
Bayesian statistics is centered on constructing certain assumptions about how the probability of an event is distributed, and then adjusting that belief as new information comes in. It can be more involved to construct a Bayesian model as opposed to the “look at many things in aggregate” approach used in frequentist statistics. But it has nice properties, and we’ll take a look at them in a real albeit fairly unimportant context: the Pokemon video games.
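To make “adjusting that belief as new information comes in” concrete, here is a minimal sketch of one of the simplest Bayesian updates: a Beta prior on the probability of an event, updated with observed successes and failures. This is a generic illustration and not necessarily the model used in the post; the function names and the counts (3 successes, 7 failures) are made up.

```python
def update_beta(alpha: float, beta: float, successes: int, failures: int) -> tuple[float, float]:
    """Return the Beta posterior parameters after observing new outcomes."""
    return alpha + successes, beta + failures


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Assumption: a flat Beta(1, 1) prior, i.e. no initial preference for any probability.
prior = (1.0, 1.0)

# Hypothetical observations: the event happened 3 times out of 10 tries.
posterior = update_beta(*prior, successes=3, failures=7)

print("prior mean:", beta_mean(*prior))
print("posterior mean:", beta_mean(*posterior))
```

The prior mean of 0.5 moves toward the observed rate as data arrives, and further observations can be folded in by calling `update_beta` again on the posterior.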
First broadcast in 1988, Mystery Science Theater 3000 is a television show whose nominal story involves a guy being trapped in space by a couple of mad scientist types…which is actually just an excuse to have a few guys make fun of really, really bad movies. This raises a few unusual questions about the series (as far as TV series go, anyway), like how the movie quality relates to the episode quality. Thankfully, this isn’t too hard to get data on, as we can just look at the IMDB ratings for both.