08 Aug Data challenges with Google Analytics Premium and BigQuery

The difference when you go Premium

Premium is almost the same as the standard Google Analytics. One major difference is the possibility of raw data extraction while freemium version works on sampling basis. Having access to raw data sounds great. But in what form does it come? How hard/easy is it to work with it?

Let us break it down and highlight what you need to be prepared for as there are obstacles and situations that are hard to foresee.  

image02

 

Daily data export to BigQuery

After setting up the automatic export of your data from Google Analytics Premium to the BigQuery (currently the only option to export the data directly), you will have each day a new table with data. In order to be able to work with it further, you need to compile it into one table in standardized format and clean it.

1000 days restriction

Another difficulty of working with BigQuery is the 1000 days date restriction. Essentially if you have more than 3 years of data you won’t be able to pull it. Also BigQuery isn’t ideal for work with high dimensional data. If you have lot of attributes, you will frequently find yourself looking for workarounds which you just shouldn’t need to do.   Keboola Connection makes this really easy with its native Big Query data connector. Further cleansing and wrangling of your data from here will be a lot easier in Keboola because we address following challenges with systemic approach rather than messy workarounds.

Data Structure flattening JSON nested objects

After you get the data, it is important to look at their structure. In the raw form, every row is a JSON object with another nested JSON objects. This is bad news for almost every traditional database and BI Solutions working on top of them. Now we have to convert the JSON nested objects into a set of tables connected by Primary and Foreign keys. For this, we have our own (open sourced!) library that can do this. If you want to see in more detail how we do it, check out our Github library.

No more hard-coded ways of getting data that needs to change every time the source structure of the data changes.Which leads us to the next point.

image03image01  

Schema changes over time – dealing with new columns

Sometimes it happens that the structure of the data changes. For example there is a new attribute in an event you have on your page. In Keboola, we can deal with it automatically, not breaking the whole ETL. The new column from the previous step is simply added to the end of the table, sitting there, waiting to be used, but most importantly not breaking your entire data flow.   For more practical example, imagine you include new data to better measure your e-commerce site funnel. Using GA’s custom dimensions you pack new user id or better identification of bought product. You surely do not want to revise the ETL just because of this.

Incremental loading of the data

When you have your data pipeline set up including loading, parsing and cleansing, the next tricky thing will be the incremental load. Raw data load from your web can be huge and full load becomes very inconvenient or straight up prohibitive in terms of actionability on your data. It just takes too long to process it. Based on custom made primary key Keboola can automatically create Single Source of Truth to store your data safely without duplicates.

image00  

 

Simply speaking in Keboola Connection you can perform an incremental load easily by just ticking a box. Long gone are the days when you had to write custom Python or SQL script to perform this. And use it all over again, debug it properly and have only 1 person that actually knows what it does… Transparency and reusability are the way forward!  

Expert advice

If you are currently or will be in the near future exploring implementation and usage of Google Analytics 360 or analysis of raw data from Google BigQuery, we recommend getting in touch with our regional partner Sparkline, a certified Google 360 partner with lots of experience in this field.

Do you have a different experience or other challenges with implementing Google Analytics Premium? We would love to hear from you!