02 Mar Measuring Facebook Audience Interactions Accurately is Easy, Right?

 

At Keboola we work with reporting the performance of social channels as part of a larger digital media mix on a daily basis. Some of the data-related problems we run into are easy to solve, while others are more ‘interesting’. The one I will talk about today started with a fairly simple client request: tracking how interactions on Facebook posts develop over time. Piece of cake, one would think.

 

Know Thy API

The Facebook Insights API offers a large selection of metrics about brand pages, including their development over time, typically segmented by day. Facebook page posts, however, are a completely different story: for them the API returns only the absolute number at the moment of the request. So we started extracting this metric on a daily basis, because whatever isn't available in the API, we would figure out on our own.

(Image: example of the API reply to the ‘post_likes’ query, showing the number of post likes on a given day)

With a few days of history I was able to start working on an SQL transformation that would tell us how many likes were added to the Facebook page posts. At first, naively I admit, I wrote a simple SQL query to aggregate the absolute numbers across posts, segmented by day. The initial theory sounded very simple: “Count the likes for every day; the increment is today's count minus yesterday's.” Well, not in reality. All I needed to do was look at the output table.

(Image: absolute number of Facebook page post likes by day)
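The naive total-minus-total approach can be sketched with a few lines of SQL. This is an illustrative sketch only: the table and column names are my assumptions, not the actual Keboola schema, and the numbers are made up to show the failure mode described below.

```python
import sqlite3

# Assumed schema: one row per (snapshot date, post) holding the absolute
# like count the API returned that day.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post_likes (snapshot_date TEXT, post_id TEXT, likes INTEGER);
INSERT INTO post_likes VALUES
  -- Day 1: three posts are still visible in the API
  ('2015-03-01', 'a', 100), ('2015-03-01', 'b', 50), ('2015-03-01', 'c', 10),
  -- Day 2: post 'c' has disappeared from the API response
  ('2015-03-02', 'a', 105), ('2015-03-02', 'b', 53);
""")

# Naive increment: today's grand total minus yesterday's grand total.
totals = dict(conn.execute(
    "SELECT snapshot_date, SUM(likes) FROM post_likes GROUP BY snapshot_date"))
naive_increment = totals['2015-03-02'] - totals['2015-03-01']
print(naive_increment)  # -2: likes apparently "lost" overnight
```

Even though posts ‘a’ and ‘b’ gained 8 likes between the two days, the naive query reports a negative increment, which is exactly the symptom in the table above.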

 

API Limitations – The Twist

How could we lose hundreds of likes on the posts overnight? Simple answer: we didn't! The cause was the Facebook API not providing metrics for posts older than 90 days. Hint hint: check the post_count column below.

(Image: absolute number of post likes and absolute number of Facebook page posts by day)

The goal was to report the number of post likes added to the page posts every day. I am proud to say that, several coffees and a proper geek-out later, the result was worth it. The trick was not to subtract the total numbers of post likes, but to compare only the likes on posts present in both the current day and the previous day.

More practically, this is achieved by creating a list of post IDs present in the data for each day. The next step is to sum the likes on all posts whose ID appears both in today's list and in the nearest previous day's list, and subtract the two sums.

(Image: my thought chain: define the closest previous date, count all likes for today and for that date on the posts that appear in both days' data, and subtract one from the other)

Earlier experience taught me that things don't always go exactly according to plan. Data extraction can, for example, fail on authentication, and we can miss a day's worth of data. This is why I treat the previous day not as ‘yesterday’ but as the ‘closest previous date’.
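The pairwise approach can be sketched as a self-join: for each snapshot, find the closest previous snapshot date, join on post ID so only posts present in both snapshots survive, and sum the per-post deltas. Again, the table and column names are assumptions for this sketch, and the sample data deliberately skips a day to show why ‘closest previous date’ matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post_likes (snapshot_date TEXT, post_id TEXT, likes INTEGER);
INSERT INTO post_likes VALUES
  ('2015-03-01', 'a', 100), ('2015-03-01', 'b', 50), ('2015-03-01', 'c', 10),
  -- 2015-03-02 is missing entirely (e.g. the extractor failed to authenticate)
  ('2015-03-03', 'a', 105), ('2015-03-03', 'b', 53);
""")

# For each snapshot date, join each post to its row on the closest
# previous snapshot date; posts missing from either side drop out.
gained = conn.execute("""
SELECT t.snapshot_date, SUM(t.likes - p.likes) AS likes_gained
FROM post_likes t
JOIN post_likes p
  ON p.post_id = t.post_id
 AND p.snapshot_date = (SELECT MAX(x.snapshot_date) FROM post_likes x
                        WHERE x.snapshot_date < t.snapshot_date)
GROUP BY t.snapshot_date
""").fetchall()
print(gained)  # [('2015-03-03', 8)]
```

Post ‘c’, which aged out of the API, no longer distorts the result, and the gap on 2015-03-02 is handled because the join looks for the closest previous date rather than literally ‘yesterday’.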

The method I described allowed me to derive, through the Facebook API, a level of information the API itself does not provide. This leads me to a few points by way of conclusion.

 

What Does This Learning Experience Tell Us?

The situations and problems I described in this post show why a data analytics framework with full automation, high reliability and transparency is an absolute must. If you are limited by your environment and technology, your capability and output will most probably be compromised as well.

This situation is also a great example of why the ability to query your data with SQL, without limitations, matters for problems of this complexity.

 

Tip of the Day: Go SQL, Booo Drag&Drop!

On a more serious note, inaccuracies like the one I just solved are (unfortunately) issues we often come across in reporting done by agencies or third parties. A common practice is that on the first day of every month, analysts manually download data from digital sources like Facebook, Google Analytics, AdWords, YouTube or Twitter in order to collate reports for their clients. This comes with serious accuracy and scalability concerns. The reliance on human labor, and the cost/benefit of such a practice in the agency business, is a whole different topic.

At Keboola we advocate fully automating ETL/data collection and reporting through smart, scalable and agile data integration technology, so that the future evolution of your analytics practice is not limited.

 


Written by Vojtech Kurka
Senior Solution Architect, Keboola Singapore Pte. Ltd.
