Getting valuable insights from unstructured text data



Unimaginable piles of new data, mostly unstructured, are generated every single second on the Internet and beyond, and both the volume and the speed keep growing fast. Have you ever wondered how much valuable information could be hidden in texts like chats, reviews, comments, tweets and posts on your social sites, and how to get valuable insights from them? Or how you could improve your customer support and reduce churn if you automatically analysed and categorized incoming emails, tickets, conversations and reviews from your CRMs and support platforms? What topics, sentiments and trends are people talking about across multiple platforms? Could we somehow enrich our existing structured data with information from text? The answer is YES! You can now easily do all of this and tap into the possibly tremendous value lying out there!


Generally, there are countless use cases where text analysis, a.k.a. NLP (Natural Language Processing), comes in handy, as there are lots of data sources to extract text from. Geneea and Keboola Connection enable you to beat this seemingly complex task in literally a couple of clicks! Let’s check it out.

What is Geneea?

The Geneea NLP platform helps you analyse large amounts of text, whether it is customer feedback, news articles, social media posts, blogs, e-mails or legal documents in your archives. It can do pretty cool things: read, understand and interpret large amounts of text, all in a very short time. It is trained to find important information such as people, places, products and organizations, to detect sentiment, and to label and categorize documents. All of this without changing your business processes: simply plug in the output of Geneea and start using the previously hidden information to cut costs or boost revenue.

Geneea + Keboola = gold mining combo

Under the hood, Geneea is rocket science: an app that can find valuable information in your unstructured text data. But before you can get any output from this NLP platform, you have to prepare the data for analysis. This step usually consists of data extraction and data cleansing.


Data often lies in obscure places: SQL or NoSQL databases, text or tabular files on your disk, FTP or Google Drive, Salesforce, Zendesk, Slack, social media sites, any cloud service accessible via a REST API, web pages and so on. The list of places where your data might be found is endless. With Keboola Connection, connecting to all these data sources is fast and easy.

Once you have the data, you can cleanse and transform it into the required form in Keboola Connection. After the transformation, simply plug your data into the Geneea app and wait a couple of minutes for the output. How simple is that! Afterwards, it is easy to enrich your existing data with the results and push them to Tableau, GoodData, Qlik, Power BI, Google Data Studio or any other reporting tool of your choice for visualization and analytics.
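As an illustration of what the cleansing step can involve, here is a minimal plain-Python sketch; it is not Keboola's or Geneea's own code. It assumes the reviews arrive as records with an id and a text field (placeholder names) and that the analysis takes a table with one row per document. The sketch collapses stray whitespace, drops empty reviews and case-insensitive duplicates, and serializes the result as CSV.

```python
import csv
import io
import re


def clean_reviews(rows):
    """Normalize whitespace and drop empty or duplicate (case-insensitive) reviews.

    Each row is assumed to be a dict with hypothetical 'id' and 'text' keys.
    """
    seen = set()
    cleaned = []
    for row in rows:
        # Collapse runs of whitespace, including newlines, into single spaces.
        text = re.sub(r"\s+", " ", row.get("text") or "").strip()
        if not text:
            continue  # nothing to analyse
        key = text.lower()
        if key in seen:
            continue  # same text (ignoring case) already kept
        seen.add(key)
        cleaned.append({"id": str(row["id"]), "text": text})
    return cleaned


def to_csv(rows):
    """Serialize cleaned rows as CSV text with an id,text header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "text"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical raw reviews, as they might come out of a scraper.
raw = [
    {"id": 1, "text": "  Great   app!\n"},
    {"id": 2, "text": ""},
    {"id": 3, "text": "great app!"},
    {"id": 4, "text": "Crashes on start, please fix"},
]
print(to_csv(clean_reviews(raw)))
```

Only reviews 1 and 4 survive; the comma in the second review is quoted automatically by the csv module, so the file stays a valid table when uploaded.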


How we did text analysis in less than 4 hours

We at Keboola Asia wanted to see how easy it is to get valuable insights from some interesting texts. For this purpose we chose the reviews of an Android app belonging to one of our clients. Let’s see how we did it!


1) Reviews are not easy to access via an API. To make things easier, we scraped a couple of thousand reviews from the Google Play website and stored them in a local CSV file.


2) We simply uploaded this CSV file with reviews to Keboola Connection storage.


3) Then we plugged the imported table into the Geneea app and set it up.



4) We chose which tasks we wanted to perform and ran the app.


5) Several minutes later, the output was ready. Every Geneea task generates a separate table that can easily be linked back to the source data via the id.


6) Finally, we decided to visualise the output in Microsoft Power BI, one of the many BI tools Keboola can feed data to. You can play with the interactive dashboard here.
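Step 5 relies on each output table linking back to the source data via the id. The join itself can be sketched in plain Python as below, assuming a hypothetical document-level output table with one row per review and columns named id and sentiment; the real Geneea column names and values may differ. Tables with several rows per review (detected entities, for example) would need grouping rather than this one-to-one lookup.

```python
import csv
import io

# Hypothetical source table: one row per review, keyed by id.
SOURCE_CSV = """id,rating,text
r1,5,Works great
r2,2,"Crashes on start, please fix"
"""

# Hypothetical output table of a document-level task (one row per review).
SENTIMENT_CSV = """id,sentiment
r1,positive
r2,negative
"""


def read_table(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def join_on_id(source, output):
    """Left join: copy the output's non-id columns onto each source row with the same id."""
    by_id = {row["id"]: row for row in output}
    joined = []
    for row in source:
        merged = dict(row)
        extra = by_id.get(row["id"], {})  # source rows without output stay unchanged
        merged.update({k: v for k, v in extra.items() if k != "id"})
        joined.append(merged)
    return joined


joined = join_on_id(read_table(SOURCE_CSV), read_table(SENTIMENT_CSV))
for row in joined:
    print(row["id"], row["rating"], row.get("sentiment"))
```

The same join could equally be done in a SQL transformation or inside the BI tool; the shared id is what turns it into a plain key lookup.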



Yes, it is that simple!

All done in 4 hours, including the visualization. Even without enriching the results with other relevant company data, there is plenty of interesting information. Once we finished and saw the result, lots of ideas for improvements came to mind, e.g. viewing the reviews over time or correlating them with app updates.
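Viewing the reviews over time could start from something as simple as counting sentiment labels per month. The sketch below assumes hypothetical joined rows carrying an ISO date (YYYY-MM-DD) and a sentiment label; both field names and all sample values are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical joined rows: review date (ISO format) and a sentiment label.
rows = [
    {"date": "2016-08-14", "sentiment": "positive"},
    {"date": "2016-08-29", "sentiment": "negative"},
    {"date": "2016-09-02", "sentiment": "negative"},
    {"date": "2016-09-10", "sentiment": "negative"},
]


def sentiment_by_month(rows):
    """Count sentiment labels per calendar month, keyed as 'YYYY-MM' in date order."""
    counts = defaultdict(Counter)
    for row in rows:
        month = row["date"][:7]  # ISO dates start with YYYY-MM
        counts[month][row["sentiment"]] += 1
    return {month: dict(counter) for month, counter in sorted(counts.items())}


print(sentiment_by_month(rows))
```

Plotting these monthly counts next to the app's release dates would be one way to look for the correlation with app updates.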

Last but not least, this whole process, from getting the input data to pushing Geneea outputs to a reporting tool, can be fully automated with Keboola Connection custom apps and orchestrations.

Have a bunch of text files lying around your company without getting any value from them? Talk to us!

Article by Frantisek Rehor