Cleaning and Analyzing data with SQL and Tableau

Mary Bajisma Ashiru
3 min readMar 17, 2022

During the week, I downloaded a dataset from Kaggle titled “QS World University Rankings 2017–2022”. The data is a csv file which contains the world university rankings from 2017 up to this present year 2022 and I decided to make a quick analysis from it. I found this data interesting even though unfortunately, no university from Nigeria was ranked in the dataset.

After downloading the dataset, I re-saved the csv file as a Microsoft Excel workbook and then imported it into Microsoft SQL Server Management Studio using the SQL server import and export wizard.

First I noticed that when I tried to import the dataset, it was giving me errors which led me to the internet to find a solution to the problem. After reading solutions from numerous sites including stack overflow, I couldn’t find a solution to the error feedback, I then took a critical look at the problem. The solution I used which has been working for me since then is:

  • when choosing a Microsoft excel file as your source file to import from, always change the Microsoft excel version to the version of the Microsoft office package you are using on your computer, which is contrary to what I saw on the internet “which says you should leave the automatic version that is given once you input your excel file (Microsoft Excel 2007–2010)”.

After doing the above, the data was imported successfully and the queries below were written:

  • The first query was used to return the entire data from the dataset.
  • The next two queries were used to delete two columns named “link” and “logo” which were not necessary for my analysis.
  • The last two queries were used to delete the rows that had null values under the “rank_display” and “score” columns after discovering that there were null values. This reduced the data rows from 6,482 to 2,803.

After ensuring the data was properly cleaned for my use, I exported the cleaned data to Microsoft Excel by going to the get data from other sources under the data pane in Excel and choosing SQL server, after which I typed in my SQL server name, picked the database to import from, picked the table to import and imported the data as a table.

The imported data was then exported to my Tableau Public for final analysis which I will state below. My analysis was centered on universities with rank display of 1–10 for all years in the dataset.

The data depicted the following:

  • over the years, the 10 universities below has always been interchangeably ranked:
  1. Massachusetts Institute of Technology (MIT)
  2. Stanford University
  3. Harvard University
  4. University of Cambridge
  5. California Institute of Technology (CALTECH)
  6. University of Oxford
  7. UCL
  8. ETH Zurich — Swiss Federal Institute of Technology
  9. Imperial College London
  10. University of Chicago
  • MIT has always been ranked 1st position from 2017–2022.
  • University of Oxford which was ranked 6th position in 2017 is ranked 2nd position in 2022.
  • Stanford University which has always been ranked 2nd position is ranked 3rd position in 2022.
  • University of Cambridge which was ranked 4th position in 2017 is ranked 3rd position in 2022.
  • Harvard University which has always been ranked 3rd position is ranked 5th position in 2022.
  • CALTECH which was ranked 5th position in 2017 is ranked 6th position in 2022.
  • Imperial College London which was ranked 9th position in 2017 is ranked 7th position in 2022.
  • UCL AND ETH Zurich are ranked 8th position in 2022.
  • University of Chicago is ranked 10th position in 2022.
  • The number of international students intake from 2017–2022 for all 10 universities were the same figure.
  • North America (United States) and Europe (United Kingdom) regions have top ranked Universities compared to other regions.

The link to the raw file and tableau workbook is supplied below:

https://public.tableau.com/views/WorldUniversityRankingAnalysis/MITSRANKVARIATION?:language=en-US&publish=yes&:display_count=n&:origin=viz_share_link

--

--

Mary Bajisma Ashiru

I am a data analyst with over 2 years of experience. My personal experiences in solving data related problems are documented here.