Average person will read this blog 0.87 times

Today, we have a serious thing to talk about – best practices.


At Keboola we say that analytics has only two roles in any organisation: the goal is either to save money or to make new money. So we mostly focus on actionable insight, and for the most part, very valuable insights are presented in very simple charts.

Looking at the average of something seems to be Analytics 101, but in our daily practice we are surprised how often the meaning of a simple average is completely misinterpreted, taking the entire situation out of context and creating false and inaccurate information. Now let’s break down what this means, because very often these ‘insight’ pieces are designed to conceal incomplete information.



Average or median?

Surprisingly, the average person often does not know the difference between the average and the median. Since you are reading our geeky blog, chances are you do. In any case, let me provide an example from a real-life scenario.


Let’s say the average salary in your country is 1000 USD; you just saw it on the telly. Your instinctive reaction might be: “Oh really? I know a lot of people and hardly one in four earns this much or more.” As you see, the average value of anything can be very deceptive.

Example: take a group of 10 people at a power plant. Most of them are low-skilled workers, a few have office jobs, and one is the director of the plant. Their salaries are as follows:

[Table: salaries of the 10 employees]

Now when we average these salaries, we get exactly 1021 USD. But does this number provide any useful insight at all?


Given average expenses on housing and necessities, can I reliably determine how much spare cash everyone in this segment has to spend on my product or service? In this dataset there is one big outlier (the director) who completely skews the average. To do a better analysis, we need more views. Let’s use some techniques from descriptive statistics, as the smart people at universities call it.


The median is the value of the element right in the middle of the list of salaries ordered from lowest to highest. In our case it is 370 USD (a list of 10 elements has no single middle element, so we take the average of the two “middle” values). This value gives us better information about what a typical employee earns. But let’s not stop there.
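
To make the difference concrete, here is a minimal Python sketch. The salary list in it is hypothetical: these are not the original figures, only made-up values chosen so that the average and median come out at the 1021 USD and 370 USD quoted above.

```python
from statistics import mean, median

# Hypothetical salaries in USD. NOT the original table: made-up values
# chosen only so that the mean and median match the figures in the text.
salaries = [280, 310, 320, 340, 360, 380, 410, 640, 910, 6260]

average = mean(salaries)   # pulled upwards by the single large value
middle = median(salaries)  # average of the 5th and 6th sorted values

# Drop the largest salary (the "director") to see how much one outlier matters.
average_without_director = mean(sorted(salaries)[:-1])

# How many people actually earn less than the average?
below_average = sum(1 for s in salaries if s < average)

print(f"mean:                  {average}")
print(f"median:                {middle}")
print(f"mean without director: {average_without_director:.0f}")
print(f"earning below mean:    {below_average} of {len(salaries)}")
```

With these made-up numbers, nine of the ten people earn less than the average, which is exactly the kind of situation where the median is the more honest summary.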



Let’s do a basic histogram of the values rounded to the nearest hundred. This is simply a graph telling us which values occur most frequently in the dataset, so we can, for example, pick the right group of customers to target: big enough that we have a lot of customers, and with enough disposable cash that they are able to buy our product. In this example we see that the most frequently represented values are 300 and 400.
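
A histogram like this takes only a few lines of Python. The sketch below again uses a hypothetical salary list (made-up values, not the original data), rounds each salary to the nearest hundred and counts how many people fall into each bucket.

```python
from collections import Counter

# Hypothetical salaries in USD. NOT the original table: made-up values
# used only to illustrate the technique.
salaries = [280, 310, 320, 340, 360, 380, 410, 640, 910, 6260]

# round(x, -2) rounds an integer to the nearest hundred.
buckets = Counter(round(s, -2) for s in salaries)

# Print a simple text histogram, one row per bucket, lowest bucket first.
for bucket in sorted(buckets):
    print(f"{bucket:>5} USD | {'#' * buckets[bucket]}")

# The two most common buckets are the segments worth looking at first.
top_two = [bucket for bucket, _ in buckets.most_common(2)]
print("most frequent buckets:", top_two)
```

For these made-up values the two most frequent buckets are 300 and 400, while the director sits alone in a bucket far to the right, which the average alone would never show.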


One view isn’t enough

In essence, data visualisation and interpretation is a very tricky subject and, let’s face it, sometimes these misinterpretations are intended. But let’s go with the assumption that this is not what you are doing. In that case you need as many relevant angles as possible, and you achieve this by combining the available analytical techniques and methodologies, as well as different data dimensions that do not necessarily come from one source. Something CSC knows a bit about!

As always, we would love to hear your opinion, and if you enjoyed the read, share the hell out of this post on your social channels please!

Pavel Bulowski & Vojtech Kurka