AN ANALYSIS OF A FICTITIOUS DATA FROM QUICKCHECK

Mary Bajisma Ashiru
7 min readJan 23, 2024

In today’s blog, I’ll be writing about the analysis I derived from a fictitious dataset from QuickCheck. This analysis was done last year in August but didn’t make it to this space.

The dataset was a very messy one and I had sleepless nights trying to clean the data before I could analyze it.

Thank you for always engaging with my posts. Without you, yes YOU, this blog will not exist.

I’ll appreciate your candid feedback on this. Kindly take your time to go through the analysis.

The data cleaning tool I used for this analysis is Microsoft Excel.

The visualization tool I used for this analysis is Microsoft Power BI.

You can access the Power BI report here.

You can access the PDF of the Powerbi report here.

Excerpt from Visualization

Dataset Description:

In this project, I analyzed the dataset provided by QuickCheck to investigate the factors that affect the rejection/approval of loans.

  • Two datasets were provided which included the users dataset and the applications dataset.
  • The primary key that is common to the two datasets is the user_id. There was no duplicate user_id in the users dataset but there were duplicates in the applications dataset which means that some user_id’s applied more than once. The total unique user_id in the users dataset is 298,634.
  • The dataset had some columns that were not necessary for analysis which were deleted.
  • The dataset had blank entries in some columns that were not deleted because they were required for analysis. They were replaced with NULL.

Statement of the Problem:

It was discovered during analysis that some applicants were approved while some were rejected. Out of the loan applicants that were approved, some defaulted in repayment while some are overdue which is a loss to the organization.

Due to this discovery in the non-refund of collected loans from the organization, it is paramount to solve this problem by suggesting factors that can reduce the rate of losing funds to loan applicants.

Aim and Objectives of the Analysis:

The analysis aims to identify patterns and insights that will inform the QuickCheck team to reduce the number of defaulted loans.

The objectives of the report are:

1) To critically identify the factors that could have affected the repayment of loans.

2) To suggest ideas that would yield more profit once effected.

Data Cleaning and Preparation Steps:

I noticed three relationships between the two datasets which is:

  • VLOOKUP was used to look up the user_id in the users dataset from the user_id in the applications dataset and it was discovered that not all the users that had registered had applied for a loan.

The steps below summarize the steps I took in preparing the data for analysis.

  • Ogun state appeared twice in the state column of the users dataset but with a different spelling. The misspelled one was named correctly to avoid splitting the information.
  • All blank spaces across the columns in the users dataset were replaced with null.
  • From the applications dataset, I created a new column by adding the principal to the interest and late fee to determine the total amount an ID is to repay. This was named “TOTAL TO BE PAID”.
  • I created a new balance column in the applications dataset by subtracting the amount an ID has repaid from the total amount to be paid.
  • The IDs in the defaulted category were checked in the paid category to see if some IDs paid a previous loan before defaulting another one.
  • A date column was created to calculate the difference between the application date when an ID defaulted and the application date when an ID paid.

Findings from Data Cleaning:

  • It was discovered that out of the 298,634 registered IDs, 266,978 have applied for loans while 31,656 are yet to apply for loans.
  • There were 543,448 IDs in the applications dataset, 140,607 applications were approved, and 402,841 applications were rejected.
  • None of the registered IDs own a car.
  • The newly computed repayment balance of 9,690 IDs out of 543,448 IDs was not the same as the initial balance in the dataset.
  • 5,125 IDs repaid more than they were supposed to pay which totaled to an extra value of 506,719 for the organization.
  • 20,312 IDs defaulted and 10,729 of the IDs applied more than once while 9,583 of the defaulted IDs did not apply again.
  • From the defaulted IDs, 7,780 that defaulted paid in other applications. 7,778 of them had paid in a previous loan before defaulting while 2 of them defaulted and were still approved for another loan.
  • 71 of the IDs had application dates that were the same as when they defaulted and when they paid.
  • 2 IDs had a negative amount that was repaid.
  • The number of children of some users had outliers e.g., negative number of children and over 20 number of children.

Deductions from Analysis:

Having done an in-depth analysis based on the datasets provided, the following can be deduced.

Comparing the approved to applied IDs, the following was deduced.

  • More males applied for loans than females, even though, more males were granted approval, female loan applicants had a higher approval percentage (55%) while males was 51%.
  • None of the IDs that recorded no gender were granted loan approval. This means that gender specification played a vital role in loan approval.
  • 2 IDs that applied with a negative number of children were not granted loan approval.
  • More IDs in Lagos state applied and were approved but Edo state loan applicants had a higher approval percentage compared to Lagos state.
  • IDs using Access Bank had more applicants, applicants using GTB had more approvals while applicants using Providus Bank had a higher approval percentage (89%). User_id’s without bank details were not approved of their loans.
  • More IDs with payment system code 3 applied and were approved but payment system code 1 had a higher approval percentage (85%).
  • More IDs without email validation applied but IDs with email validation had a higher approval percentage (74%).
  • More IDs with education status code 2 applied and were approved with a higher approval percentage (56%). It was not a criterion as 12% of IDs with blank details were granted approval.
  • More IDs with employment status code 1 applied and were approved but employment status code 0 had a higher approval percentage (63%). User_id’s without employment status details were not approved of their loans.
  • More IDs with professional category code 6 applied and were approved but professional category code 4 had a higher approval percentage (66%). Only 6% of IDs without professional category details were approved of their loans.

Comparing the paid/defaulted to applied IDs, the following was deduced.

  • The percentage of females who paid their loans was more compared to the percentage of males who paid their loans (65%:63%).
  • The percentage of IDs in Abuja state that paid their loans was more compared to the percentage of other states that paid their loans (68%).
  • The percentage of IDs using Providus Bank that paid their loans was more compared to the percentage of other bank users that paid their loans (78%).
  • The percentage of IDs with payment system code 3 that paid their loans was more compared to the percentage of IDs with payment system code 1 by 1% (64%:63%).
  • The percentage of IDs with email validation that paid their loans was more compared to the percentage of IDs without email validation (68%).
  • The percentage of IDs with education status code 2 that paid their loans was more compared to the percentage of IDs with other education status codes (64%).
  • The percentage of IDs with employment status code 2 that paid their loans was more compared to the percentage of IDs with other employment status codes (73%).
  • The percentage of IDs with professional category code 1 that paid their loans was more compared to the percentage of IDs with other professional category codes (67%).
  • More interest was made from applicants who borrowed loans due to business reasons.
  • The organization made more interest in 2022 compared to other years. There were many reasons applicants collected loans in 2022 compared to other years.
  • The organization lost interest of over 1.3 million in value and borrowed funds of over 36 million in value from IDs that defaulted on their loans from 2020–2022.
  • User IDs with monthly income less than 0 in value were not granted loan approval.
  • In January 2018, users that used Product ID code 1 were all not from Origin 2.
  • There was no user record for January 2019 and January 2021.
  • In January 2020, users used Product 1 and Product 2 but were all not from Origin 2.
  • In January 2022, Products 1, 2, and 8 were used by users from Origin 2 while Products 1, 2, 8, and 9 were used by users not from Origin 2. More users from Origin 2 used Product 2 compared to other products while more users not from Origin 2 used Product 1 compared to other products.

Recommendation and Conclusion:

  • Defaulted IDs in a previous loan application should not be approved of another loan.
  • The dataset did not show if a user had collateral to stand in place in case a loan defaulted (none of the applicants own a car). Measures should be put in place to achieve this which could reduce losses in the organization.
  • Users that have a good repayment history of collected loans should be reached out to by the call center department to encourage more usage of the application and taking loans as this will yield more profit for the organization. For example, it was discovered from the dataset that 638 users have applied over 10 times and paid back their loans.
  • Users who apply for loans due to business should be approved of their loans after passing certain checks as the record shows that they are more likely to pay back their loans.

I want to say a very big thank you for taking the time to engage with this analysis. You are deeply appreciated.

Kindly view a PowerPoint documentation of this analysis here.

--

--

Mary Bajisma Ashiru

I am a data analyst with over 2 years of experience. My personal experiences in solving data related problems are documented here.