How to Get Insights?

Amir Harjo
Apr 25, 2024


People seem allured by the promise of big data. Charlatans promote the idea that big data can change how we understand the world. In business, that means generating more money with more data. But it is not as easy as it sounds.

Although many proponents claim that big data can do more for business, others say that big data will not deliver on its promise without the right, correct, and clean data. No good data, no insights.

We collect more data, yet our understanding of the world is far from complete. Andrew Ng said in one of his LinkedIn posts that now is the age of data cleanliness. Bob Hoffman, in Advertising for Skeptics, states that we have more data but cannot generate insights.

Insight is hard because it is not a technical issue. It is a domain issue. Without understanding the domain, there will be no insight that can be used to solve the problem.

Now, if you are already an expert in your domain, what is the best way or framework to get insights? This question has been bugging me for a long time because I have been trained and mandated by the company I work for to get insights using analytics.

From experience and reading books, I created a framework to get insights. I call it the Six-C framework: six words starting with C that can be used to get insights. Please note that this is not a sequence or a ranking. The only mandatory step is the first one. Which of the others we use depends on the requirements and the data available.

Here is the Six-C framework that I use for creating insights.

Check

Check the data: the columns available and the type of data in each column. Is there a primary or secondary key? Is there any missing data? Is there any correlation between columns?

Aggregate the data. What are the counts for each categorical column? For numeric data, what does the distribution look like? Visualize the distribution of the data.

We might expect the numeric data to look something like this.

Source: Towards Data Science

But surprise, surprise, we might encounter something like this.

Source: Towards Data Science

The data we are exploring might be incomplete. But if a few business questions can be answered, we can start from there and collect more data when required.
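To make the Check step concrete, here is a minimal sketch in plain Python. The rows, column names, and the `check` helper are all hypothetical, made up for illustration. It counts missing values, flags columns that could serve as a key, and profiles numeric versus categorical columns.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sample rows; None marks a missing value.
rows = [
    {"customer_id": 1, "gender": "F", "spend": 120.0},
    {"customer_id": 2, "gender": "M", "spend": None},
    {"customer_id": 3, "gender": "F", "spend": 80.0},
    {"customer_id": 4, "gender": None, "spend": 400.0},
]

def check(rows):
    """Summarize each column: missing count, key candidacy, and a basic profile."""
    report = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        present = [v for v in values if v is not None]
        info = {
            "missing": len(values) - len(present),
            # No missing values and no repeats: a candidate key.
            "candidate_key": len(present) == len(values) == len(set(present)),
        }
        if present and all(isinstance(v, (int, float)) for v in present):
            # Numeric column: a rough picture of the distribution.
            info["min"], info["max"] = min(present), max(present)
            info["mean"], info["median"] = mean(present), median(present)
        else:
            # Categorical column: count per category.
            info["counts"] = dict(Counter(present))
        report[col] = info
    return report

report = check(rows)
for col, info in report.items():
    print(col, info)
```

A mean far from the median, as in the `spend` column here, is a hint that the distribution is skewed and worth plotting before trusting any average.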

Categorize

One of the columns in the data might be categorical, for example gender, color, or product category. We can explore how the categories differ by aggregating within each one and correlating the result with other columns. How one category relates to the other columns might reveal something.

However, some questions might not be answerable with the current grouping. The natural next step is to create our own grouping based on our needs. For example, with transaction data, we can group customers by spending or number of transactions.

There will be some natural categories or groupings in the data. People can be grouped into male or female. Fruit can be grouped by type, such as oranges, or by color. Products can be grouped into beauty products, home products, and so on.

Using these natural groups, we can plot the differences between groups to understand the characteristics of each group.

We can even create non-natural groups. For example, we can group people based on their spending on a certain product, run the analysis on those groups, and get the insight.
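As a rough sketch of a non-natural grouping, the snippet below buckets customers by total spending. The transactions, the customer names, and the thresholds in `spend_segment` are assumptions for illustration only; real cut-offs should come from the business context.

```python
from collections import defaultdict

# Hypothetical transactions: (customer, amount).
transactions = [
    ("ana", 40), ("ana", 70), ("budi", 15),
    ("citra", 300), ("citra", 250), ("dewi", 90),
]

def spend_segment(total):
    """Assumed thresholds; pick cut-offs that make sense for the business."""
    if total < 50:
        return "low"
    if total < 200:
        return "medium"
    return "high"

# Aggregate per customer: total spend and number of transactions.
totals = defaultdict(float)
counts = defaultdict(int)
for customer, amount in transactions:
    totals[customer] += amount
    counts[customer] += 1

# Assign each customer to a spending segment.
segments = defaultdict(list)
for customer, total in totals.items():
    segments[spend_segment(total)].append(customer)

print(dict(segments))
```

Once customers carry a segment label, the same aggregation used for natural categories applies to the new groups.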

If we have too many variables and are curious how they might form groups, we can use the k-means algorithm or another clustering method to get some sense of the grouping.
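To show the idea behind k-means, here is a bare-bones sketch of Lloyd's algorithm in plain Python. The data points and the starting centroids are hypothetical, and the starting centroids are fixed by hand so the result is reproducible. In practice a library implementation would be the usual choice, and variables on very different scales should be standardized first.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign points to the nearest centroid, then recenter."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster;
        # keep the old centroid if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (monthly spend, visits per month).
points = [(10, 1), (12, 2), (11, 1), (95, 9), (100, 10), (105, 11)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (50, 5)])

for center, members in zip(centroids, clusters):
    print(center, members)
```

The algorithm only proposes groups. Whether a cluster means anything, for example "occasional shoppers" versus "regulars", is still a domain judgment.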

Compare

Comparing data seems an obvious way to get insight. Growth is one of the most widely used measurements in business, and it is basically a comparison of one period's measurement with another's, such as versus last year, versus the previous period, or across the same stores.
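A small sketch of the growth comparison, with made-up store names and sales figures. It contrasts total growth with same-store growth, which only counts stores present in both periods.

```python
def growth(current, previous):
    """Relative change from previous to current, e.g. 0.10 means +10%."""
    if previous == 0:
        raise ValueError("growth is undefined when the previous value is zero")
    return (current - previous) / previous

# Hypothetical sales per store for the same month in two years.
sales_last_year = {"store_a": 200.0, "store_b": 100.0}
sales_this_year = {"store_a": 220.0, "store_b": 90.0, "store_c": 50.0}

# Same-store comparison: only stores that were open in both periods.
same_stores = sales_last_year.keys() & sales_this_year.keys()
same_store_growth = growth(
    sum(sales_this_year[s] for s in same_stores),
    sum(sales_last_year[s] for s in same_stores),
)

# Total comparison: includes the newly opened store_c.
total_growth = growth(
    sum(sales_this_year.values()),
    sum(sales_last_year.values()),
)

print(f"same-store growth: {same_store_growth:.1%}")
print(f"total growth: {total_growth:.1%}")
```

With these numbers the total looks much healthier than the same-store figure, because most of the growth comes from the new store rather than from existing ones.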

Another form of comparison is between categories. For example, we have two product categories and sell them at various discounts. We can see, for each category, how deep each discount is. Using this, we can decide what action needs to be taken.

Correlate to find Cause

When we compare, we might get a sense of how the business or program is performing. But we might not know which action actually caused something to happen. Correlation statistics alone will not answer causality (cause and effect), but when we are the domain expert and know what is happening in the operation of the business, we might be onto something. For example, how deep is the discount, and how much sales value does it generate?
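For the discount example, a Pearson correlation coefficient is one simple way to quantify the relationship. The sketch below uses invented discount and sales figures, and the `pearson` helper is written out by hand for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical weekly observations: discount depth (%) and sales value.
discount_pct = [0, 5, 10, 15, 20]
sales_value = [100, 110, 125, 130, 150]

r = pearson(discount_pct, sales_value)
print(f"correlation between discount depth and sales: {r:.2f}")
```

A coefficient close to 1 says the two move together. It does not say the discount caused the sales, which is where knowledge of what else was happening in the business (a campaign, a holiday, a stock-out) comes in.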

Contradict

I am sure that, with experience and reading various books, we think we have a hunch about the best action for different scenarios. However, sometimes checking the data for contradictions is a good way to get insight. Take the subprime mortgage crisis in 2008. A lot of people thought the housing business was great. Everyone borrowed money and bought more houses. However, some people were smart enough to check the data and think, “It can’t be right.” They reaped the rewards.

Connect

Connect different things. You might ask, “What is the difference from correlate?”

Well, I make this distinction because there are a lot of things that can influence a business outcome, from internal operations to external forces. Correlate means we want to understand the effect of internal operations on the outcome. Connect means we collect data from various sources, internal and external, triangulate the data, and find an interesting story.
