COMMON STATISTICAL DISTRIBUTIONS

Statistical distributions are an important tool in data science. A distribution helps us understand a variable by showing which values it is most likely to take.

Moreover, once we know the distribution of a variable, we can calculate the probability of specific situations occurring.

In this article, I share 6 statistical distributions that often occur in real-life data, each with an intuitive example.

1. Normal or Gaussian Distribution

The Normal or Gaussian distribution is arguably the most famous distribution, as it occurs in many natural situations. A normal distribution shows the probability density for a population of continuous data (for example, the height in cm of all NBA players).

In other words, it shows how likely it is that any NBA player's height falls close to a given value. Most players are close to the mean height, while far fewer are much taller or much shorter. A normal distribution is symmetrical about the mean.
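To make this concrete, here is a minimal sketch using `NormalDist` from Python's standard library. The mean of 200 cm and standard deviation of 8 cm are made-up values for illustration only, not real NBA statistics.

```python
from statistics import NormalDist

# Hypothetical parameters, chosen only for illustration (not real NBA data).
heights = NormalDist(mu=200, sigma=8)

# Probability that a player is shorter than 190 cm.
p_below_190 = heights.cdf(190)

# Probability that a player is within one standard deviation of the mean.
p_within_1sd = heights.cdf(208) - heights.cdf(192)

# Symmetry: being more than 10 cm above the mean is exactly as likely
# as being more than 10 cm below it.
p_above_210 = 1 - heights.cdf(210)

print(f"P(height < 190)       = {p_below_190:.3f}")
print(f"P(192 < height < 208) = {p_within_1sd:.3f}")
print(f"P(height > 210)       = {p_above_210:.3f}")
```

Whatever mean and standard deviation we pick, about 68% of values land within one standard deviation of the mean, and the two tail probabilities match because of the symmetry.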

2. T-Distribution

Just like a normal distribution, a t-distribution is symmetrical around the mean, and its breadth depends on the spread of the data. While a normal distribution describes a whole population, a t-distribution is designed for situations where the sample size is small. The t-distribution becomes broader, with heavier tails, as the sample size decreases, to take into account the extra uncertainty we are faced with.

The shape of a t-distribution depends on the number of degrees of freedom, which for a single sample is calculated as the sample size minus one. As the sample size, and thus the degrees of freedom, gets larger, the distribution tends towards a normal distribution, because with a larger sample we're more certain about our estimates of the true population statistics.
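A rough sketch of that convergence: the helper `t_pdf` below is my own illustrative function (the standard library has no t-distribution), written from the textbook density formula. It compares how much density sits far out in the tail, at 3 standard units from the mean, for a few arbitrary sample sizes.

```python
import math
from statistics import NormalDist

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

# Density of the standard normal distribution at the same point, for reference.
normal_density = NormalDist().pdf(3.0)

# Sample sizes are arbitrary examples; degrees of freedom = sample size - 1.
for sample_size in (3, 10, 30, 100):
    df = sample_size - 1
    print(f"n={sample_size:>3}  df={df:>2}  t density at 3.0 = {t_pdf(3.0, df):.5f}")

print(f"standard normal density at 3.0 = {normal_density:.5f}")
```

The tail density shrinks as the sample size grows and approaches the normal value, which is the "tends towards a normal distribution" behaviour described above.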

3. Binomial Distribution

A Binomial Distribution can end up looking a lot like a normal distribution. The main difference is that instead of describing continuous data, it describes how many times one of two possible discrete outcomes occurs over a fixed number of trials, for example the number of heads from a series of coin flips.

Imagine flipping a coin 10 times, and from those 10 flips, noting down how many were "Heads". It could be any number between 0 and 10. Now imagine repeating that task 1,000 times…

If the coin we are using is indeed fair (not biased to heads or tails), then the distribution of outcomes should start to look like a bell shape centred on 5. About two-thirds of the time we get 4, 5, or 6 "heads" from each set of 10 flips, and more extreme results are much rarer!
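The coin-flip experiment can be sketched in a few lines. The helper name `binomial_pmf` and the random seed are my own choices for illustration; the exact probabilities come from the standard binomial formula, and the simulation simply repeats the 10 flips 1,000 times.

```python
import random
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_flips, p_heads = 10, 0.5
pmf = [binomial_pmf(k, n_flips, p_heads) for k in range(n_flips + 1)]

# Simulate 1,000 rounds of 10 flips (the seed is arbitrary, for repeatability).
rng = random.Random(1)
rounds = 1_000
tally = [0] * (n_flips + 1)
for _ in range(rounds):
    heads = sum(rng.random() < p_heads for _ in range(n_flips))
    tally[heads] += 1

for k in range(n_flips + 1):
    print(f"{k:>2} heads: exact {pmf[k]:.4f}   simulated {tally[k] / rounds:.3f}")

# Exact chance of landing in the middle of the distribution.
p_middle = sum(pmf[4:7])
print(f"P(4, 5, or 6 heads) = {p_middle:.4f}")
```

For a fair coin the exact chance of 4, 5, or 6 heads is 672/1024, or roughly 66%, while 0 or 10 heads each happen only about once in 1,024 rounds.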

4. Bernoulli Distribution

The Bernoulli Distribution is a special case of the Binomial Distribution with just a single trial. It considers only two possible outcomes: success or failure, true or false.

It's a really simple distribution, but worth knowing! As an example, consider the probability of rolling a 6 with a standard die.

If we roll a die many, many times, we should roll a 6 about 1 time in every 6 (16.7%), and therefore roll something else (a 1, 2, 3, 4, or 5) about 5 times in every 6 (83.3%).
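Here is a small simulation of that idea, treating each roll as a single success/failure trial. The number of rolls and the seed are arbitrary choices of mine.

```python
import random

p_six = 1 / 6                 # theoretical probability of "success" (rolling a 6)
rng = random.Random(42)       # arbitrary seed, for repeatability
n_rolls = 60_000              # arbitrary, just "many, many times"

# Each roll is one Bernoulli trial: True if it is a 6, False otherwise.
sixes = sum(rng.randint(1, 6) == 6 for _ in range(n_rolls))
observed = sixes / n_rolls

print(f"Observed share of sixes:     {observed:.4f}")
print(f"Theoretical p:               {p_six:.4f}")
print(f"Observed share of non-sixes: {1 - observed:.4f}")
```

The observed share should sit close to 16.7%, and it gets closer the more rolls we simulate.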

5. Uniform Distribution

A Uniform Distribution is a distribution in which all events are equally likely to occur. For example, consider the results from rolling a die many, many times.

We note which number we got on each roll and tally these up. If we roll the die enough times (and the die is fair), the tallies should even out, because the chance of getting each of the six outcomes is exactly the same (1 in 6).
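The tallying can be sketched like this, again with an arbitrary number of rolls and seed:

```python
import random
from collections import Counter

rng = random.Random(0)   # arbitrary seed, for repeatability
n_rolls = 60_000         # arbitrary, just "enough times"

# Tally how often each face comes up.
counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))

for face in range(1, 7):
    share = counts[face] / n_rolls
    print(f"face {face}: {counts[face]:>6} rolls ({share:.3%})")
```

Each face should end up with a share close to 16.7%. The shares are never perfectly equal in a finite simulation, but the gaps shrink as the number of rolls grows.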

6. Poisson Distribution

A Poisson Distribution is a discrete distribution similar to the Binomial Distribution (in that we're looking at the probability of whole-numbered outcomes). Unlike the normal and t-distributions, however, this one is generally not symmetrical: the count cannot go below 0, but it has no upper limit.

The Poisson distribution describes the number of events or outcomes that occur during some fixed interval. Most commonly this is a time interval, for example the number of sales per hour in a shop.
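As a sketch of the sales example, assume the shop averages 3 sales per hour. That figure is invented for illustration, and `poisson_pmf` is my own helper written from the standard Poisson formula.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when lam events are expected per interval."""
    return lam**k * exp(-lam) / factorial(k)

mean_sales_per_hour = 3.0  # hypothetical average, not real shop data

for k in range(11):
    prob = poisson_pmf(k, mean_sales_per_hour)
    print(f"{k:>2} sales: {prob:.4f} {'#' * round(prob * 100)}")

# Probability of a quiet hour (2 sales or fewer) and of a busy one (more than 5).
p_at_most_2 = sum(poisson_pmf(k, mean_sales_per_hour) for k in range(3))
p_more_than_5 = 1 - sum(poisson_pmf(k, mean_sales_per_hour) for k in range(6))

print(f"P(sales <= 2) = {p_at_most_2:.4f}")
print(f"P(sales > 5)  = {p_more_than_5:.4f}")
```

The printed bars stop at 0 on the left but trail off gradually to the right, which is the lopsided shape described above.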