How Do Bank Clients Behave?

A data-based approach using credit card customer data from the Kaggle platform.


You might have heard statements like “If you study, you will earn more money”, or you may believe that “smarter people manage their accounts more responsibly”. But are those statements true?

You might have your own opinion, as I do. But what does the data suggest? Let’s put aside our prejudices and take a deep dive into a database of bank clients. This data will not give a broad view of society as a whole, but it could offer some hints, at least for the countries where this bank operates.

As you might expect, real bank data is quite hard to get because of privacy policies. But I managed to obtain an interesting openly available dataset from the Kaggle platform.

The dataset covers around 10,000 customers and records their age, salary, marital status, credit card limit, credit card category, and so on. Around 18 features are specified for each customer.

Part 1: Salary Impact

Have you ever heard the statement “People with higher education earn more money”? I have heard it a lot, so I was really curious to see whether the data behaves that way.

But first we have to talk about the data: which features of the dataset can we use to test this? The dataset has a feature called “Income_Category”, which gives the annual income split into ranges: “Less than $40k”, “$40k - $60k”, “$60k - $80k”, “$80k - $120k” and “$120k +”, plus “Unknown” for the customers whose income the bank does not have.

On the other hand, the feature called “Education_Level” provides the following levels: “Uneducated”, “High School”, “College”, “Graduate”, “Post-Graduate” and “Doctorate”, plus “Unknown” for the customers for whom this information is missing.

For this analysis, the first step was to build a table of counts for each combination of labels. For example: how many people earn less than $40k and are uneducated? The final table is:

As you might expect, the educational levels do not all have the same number of people. To avoid any misinterpretation in the comparison, we have to turn the counts into proportions within each educational level. The final table then looks like the following:
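For readers who want to reproduce these two tables, here is a minimal sketch with pandas. The column names “Education_Level” and “Income_Category” come from the dataset, but the rows below are toy values made up for illustration, not the real data.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT taken from the real dataset.
df = pd.DataFrame({
    "Education_Level": ["Uneducated", "Uneducated", "Uneducated", "Uneducated",
                        "Doctorate", "Doctorate"],
    "Income_Category": ["Less than $40k", "Less than $40k", "$40k - $60k", "$120k +",
                        "Less than $40k", "$120k +"],
})

# Table of counts: one row per education level, one column per income range.
counts = pd.crosstab(df["Education_Level"], df["Income_Category"])

# Same table as proportions within each education level (each row sums to 1).
proportions = pd.crosstab(
    df["Education_Level"], df["Income_Category"], normalize="index"
)

print(counts)
print(proportions.round(2))
```

With matplotlib installed, calling `proportions.plot(kind="bar")` on a table like this is one possible way to draw the bar charts used for the comparisons.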

This table alone will not tell us a lot, but we can use it to compare the different educational levels in a more visual way. To do so, I first drew a bar chart for each educational level to see whether any trend shows up. The result is quite interesting.

First, let’s take the educational level with the highest proportion of people earning less than $40k, which is “Uneducated”, and compare it with the level that has the lowest proportion of people earning that amount, which is “Post-Graduate”. The resulting graph is:

We can see a tendency to earn more with a Post-Graduate degree than with no education, except at the extreme of earning more than $120k. That is interesting and leads me to another question: who earns more than $120k? Let’s plot that graph.

These results are quite unexpected. For people who earn more than $120k, it seems that having more education is counterproductive, except in the case of a Doctorate degree.

We might expect the incomes of uneducated people and of people with a Doctorate to be really different, but we saw that their proportions are quite similar for $120k+. What happens at the other levels?

Once again the result is unexpected: the graphs for Doctorate and Uneducated are pretty similar.

*We also have to consider that this data comes from a bank’s customers and should not be read as a reflection of society, because the marketing department might have introduced some bias in who becomes a customer.

Before closing the topic of salary: you might also have heard about the salary gap between men and women. Let’s take advantage of the fact that we have gender information and compare their incomes.

We can clearly see in the graph that the majority of females earn less than $60k, while the majority of males earn more than $60k. The difference is large and obvious, so maybe the marketing department should take it into consideration.

Part 2: Debt Responsibility

To study how responsible the customers are with their debts, we will use the revolving balance on the credit card, which is the unpaid debt carried over on their credit card from one month to the next.

Now, let’s assume that people with higher education are more responsible with their accounts. To check whether this statement is true, we will calculate the average debt for each education level.

We can observe that customers with a Doctorate have the lowest average debt, but in general the differences are not large enough to point to any trend. So we will conclude that education level has no visible impact on the health of the account.
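The average debt per education level can be computed with a single group-by. This is only a sketch: the column name `Total_Revolving_Bal` is my assumption about how the revolving balance is stored, and the rows are toy values made up for illustration.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT real values from the dataset.
df = pd.DataFrame({
    "Education_Level": ["Uneducated", "Uneducated", "Graduate", "Graduate", "Doctorate"],
    # Assumed column name for the revolving balance on the credit card.
    "Total_Revolving_Bal": [1200, 800, 1500, 500, 900],
})

# Average revolving balance per education level, lowest first.
avg_debt = (
    df.groupby("Education_Level")["Total_Revolving_Bal"]
      .mean()
      .sort_values()
)

print(avg_debt)
```

Grouping by “Income_Category” instead of “Education_Level” gives the same kind of table per income range.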

Now let’s test another common statement: “people with more money manage their bank accounts better”. To test it, we will do the same thing as in the previous question.

In this case we can see a clear trend: the higher the income, the higher the debt. But now the question is: is this increase steep enough? As you might have already noticed, a debt of $1k is not the same for somebody with an income of $20k as for a person with an income of $150k.

Therefore, to make the comparison proportional, we will divide the debt by the middle value of each income range. For the “$120k +” range we will use $140k as the mid value, which is not exact, but there is no middle value between 120 and infinity. If we calculate this and plot it as a bar chart, we obtain:

We can see in the plot that, proportionally speaking, debt at high income levels is much lower than at low income levels. In practice, this represents the effort that people at the different income levels have to make to pay off their debts.
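The division by the mid value can be sketched in a few lines of plain Python. The $140k for the open-ended range is the value chosen above; the other mid values are my assumption (simple midpoints, reading “Less than $40k” as $0 to $40k), and the average debts in the example are hypothetical numbers, not results from the dataset.

```python
# Mid value assumed for each income range, in dollars.
# $140k for the open-ended range is the article's choice; the others are
# simple midpoints, with "Less than $40k" read as the range $0 - $40k.
MID_INCOME = {
    "Less than $40k": 20_000,
    "$40k - $60k": 50_000,
    "$60k - $80k": 70_000,
    "$80k - $120k": 100_000,
    "$120k +": 140_000,
}


def debt_to_income(avg_debt_by_range):
    """Return the average debt as a fraction of the mid income of each range."""
    return {
        income_range: debt / MID_INCOME[income_range]
        for income_range, debt in avg_debt_by_range.items()
        if income_range in MID_INCOME  # ranges without a mid value, like "Unknown", are skipped
    }


# Hypothetical average debts, for illustration only.
example = {"Less than $40k": 1_000, "$120k +": 1_400, "Unknown": 1_100}
ratios = debt_to_income(example)

for income_range, ratio in ratios.items():
    print(f"{income_range}: {ratio:.1%}")
```

In this made-up example the higher debt of the top range is still the smaller share of income, which is the kind of effect the bar chart shows.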

Part 3: Loyal Customers

For this last part we will analyse the loyalty of the clients and develop a profile of the most loyal customers of this bank. If the marketing department uses this information, it will be able to focus on attracting this profile of client.

To do so, we will use the information in the dataset that tells us whether a client has left the bank (churned). With it we will train a machine learning model called Logistic Regression, which estimates the probability that a customer churns. This model is not the most accurate one, but it reaches an accuracy of 90% and, above all, it lets us interpret how it calculates its results. We can then use it to obtain the weight that each feature has on the prediction.
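To give an idea of how the weights are read, here is a minimal sketch with scikit-learn. It is not the model trained for this article: the two feature names are hypothetical, the data is synthetic and generated inside the snippet, and standardizing the features before fitting is my own choice so that the weights are comparable with each other.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500

# Two synthetic features with hypothetical names (illustration only).
transactions = rng.normal(60, 20, n)
months_inactive = rng.normal(2, 1, n)

# Synthetic label: fewer transactions and more inactivity mean more churn.
logit = -0.08 * (transactions - 60) + 1.0 * (months_inactive - 2) - 1.5
churned = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Standardize so that the size of each weight is comparable across features.
X = StandardScaler().fit_transform(np.column_stack([transactions, months_inactive]))
model = LogisticRegression().fit(X, churned)

# A positive weight pushes the prediction towards churn, a negative one away from it.
for name, weight in zip(["transactions", "months_inactive"], model.coef_[0]):
    print(f"{name}: {weight:+.2f}")

# Estimated churn probability for each customer (second column of predict_proba).
churn_probability = model.predict_proba(X)[:, 1]
```

Features with a negative weight describe customers who are less likely to churn, so a profile of loyal clients can be read from the features with the most negative weights.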

Using that information, we created the following profile of the most loyal clients:

The most loyal clients are married males with an income lower than $60k. They are pretty active and make a lot of transactions in their account, often for small amounts. They also have higher debts and use many of the products offered by the bank. Lastly, they tend to move more money through their accounts each month.


In this article, we took a look at the behaviour of the bank customers in the dataset provided on the Kaggle platform.

  1. First, we checked whether education has an impact on income, and we discovered that for typical values the behaviour is what everybody would expect (more studies = more income). But the situation changes at the extreme, for people who earn more than $120k per year. We also checked the income difference between males and females, and we discovered that females in general earn less than males.
  2. Second, we analysed how responsible the customers are with their debts. We found that education has no impact on the debt level. On the other hand, we saw that people who earn more tend to have higher debts, but the effort needed to pay off those debts is much lower.
  3. In the final part, we developed the profile of the most loyal customer.

The findings here are observational, not the result of a formal study. But this analysis can give you hints about the things you can find and explore in this field.

To see more about this analysis, check out my GitHub here.

Data Scientist with postgraduate studies from the University of Granada
