Data Exploration and Visualisation

Programme outline

Learning objectives
  • Identify and explain the 5 V’s (Volume, Velocity, Value, Variety, and Veracity) of big data.
  • Perform data discovery by accessing data from multiple sources and importing files in various formats into suitable data structures.
  • Create and manipulate NumPy arrays, and produce visualisations in Python using the Matplotlib and Seaborn libraries.
  • Conduct data profiling by obtaining descriptive statistics, checking for missing values, and visualising data with common plots.
  • Conduct cross-column profiling to analyse relationships between numerical and categorical data through various bivariate and multivariate plots.
Day 1
  • Characteristics of Big Data – 5 V’s, Types of Digital Data, Types of Databases
  • Role of Data Exploration in Big Data
  • Data Discovery – Reading common file formats as DataFrames, reading other file formats, working with databases in Python using sqlite3, accessing data from AWS S3, accessing data from websites via APIs
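As a rough illustration of the Data Discovery topic above, the following minimal sketch reads CSV data into a pandas DataFrame and then round-trips it through an in-memory SQLite database. The sample data, the table name `weather`, and the column names are hypothetical and are not taken from the course material.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
csv_text = "city,temp_c\nSingapore,31.0\nOslo,4.5\nCairo,\n"

# Read the CSV text into a DataFrame (pd.read_csv also accepts a file path).
df = pd.read_csv(io.StringIO(csv_text))

# Write the DataFrame to an in-memory SQLite database, then query it back.
conn = sqlite3.connect(":memory:")
df.to_sql("weather", conn, index=False)
warm = pd.read_sql_query("SELECT city FROM weather WHERE temp_c > 20", conn)
conn.close()

print(df.shape)               # rows and columns read from the CSV
print(warm["city"].tolist())  # cities warmer than 20 degrees Celsius
```

The empty temperature for Cairo is read as a missing value, so it is excluded by the SQL comparison.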
Day 2
  • Creating NumPy arrays, plotting using Matplotlib and Seaborn libraries.
  • Data Profiling – Column Profiling – Getting descriptive statistics, checking for missing values, identifying variable types and unique values, common plots for numerical and categorical data
  • Data Profiling – Cross-column Profiling – Common plots for numerical vs numerical and numerical vs categorical data, bivariate categorical plots, bivariate comparison scatter plots, multivariate categorical plots
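The Day 2 topics could look something like the sketch below, which builds a NumPy array, profiles a small DataFrame column by column, and draws one numerical vs numerical scatter plot with Matplotlib. The data and the column names (`height_cm`, `weight_kg`, `group`) are invented for illustration and are not part of the course material. Seaborn is left out to keep the dependencies small.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Create a NumPy array and use it as one column of a small, made-up dataset.
heights = np.array([150.0, 160.0, 170.0, 180.0])
df = pd.DataFrame(
    {
        "height_cm": heights,
        "weight_kg": [50.0, 58.0, np.nan, 80.0],  # one missing value on purpose
        "group": ["a", "b", "a", "b"],
    }
)

# Column profiling: descriptive statistics, missing values, types, unique values.
summary = df.describe()             # count, mean, std, quartiles for numeric columns
missing = df.isna().sum()           # number of missing values per column
dtypes = df.dtypes                  # variable type of each column
n_groups = df["group"].nunique()    # number of distinct categories

# Cross-column profiling: numerical vs numerical scatter plot.
fig, ax = plt.subplots()
ax.scatter(df["height_cm"], df["weight_kg"])
ax.set_xlabel("height_cm")
ax.set_ylabel("weight_kg")
ax.set_title("Height vs weight (hypothetical data)")
```

In an interactive session, `plt.show()` or `fig.savefig(...)` would display or save the figure.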
Mode of assessment
  • Quiz