Data Exploration and Visualisation

Identify and explain the 5 V’s (Volume, Velocity, Value, Variety, and Veracity) of big data.
Perform data discovery by accessing data from multiple sources and importing files of various formats into suitable data structures, such as pandas DataFrames.
Create and manipulate NumPy arrays, and produce visualisations in Python using the Matplotlib and Seaborn libraries.
Conduct data profiling by obtaining descriptive statistics, checking for missing values, and visualising data with common plots.
Conduct cross-column profiling to analyse relationships between numerical and categorical data through various bivariate and multivariate plots.
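
As a rough illustration of the NumPy and Matplotlib skills listed above, the sketch below creates arrays in a few common ways, applies vectorised operations, and draws a line plot and a histogram. It assumes NumPy and Matplotlib are installed; the array values and random samples are made up for illustration, and the figure is rendered to memory rather than shown on screen.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Creating NumPy arrays in a few common ways
a = np.array([[1, 2, 3], [4, 5, 6]])   # from a nested Python list, shape (2, 3)
zeros = np.zeros((2, 3))                # 2x3 array filled with 0.0
x = np.linspace(0, 2 * np.pi, 50)       # 50 evenly spaced points from 0 to 2*pi
rng = np.random.default_rng(seed=0)     # seeded generator for reproducible samples
samples = rng.normal(loc=0.0, scale=1.0, size=1000)

# Vectorised operations: no explicit Python loops needed
col_means = a.mean(axis=0)              # mean of each column
a_t = a.T                               # transpose, shape (3, 2)

# Two Matplotlib panels: a line plot and a histogram
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(x, np.sin(x), label="sin(x)")
ax1.set_xlabel("x")
ax1.set_ylabel("sin(x)")
ax1.legend()
ax2.hist(samples, bins=30)
ax2.set_xlabel("value")
ax2.set_ylabel("count")
fig.tight_layout()

# Save to an in-memory PNG instead of a file on disk
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

Seaborn builds on Matplotlib, so the histogram panel could equally be drawn with `sns.histplot(samples, ax=ax2)` if Seaborn is available.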

Characteristics of Big Data – 5 V’s, Types of Digital Data, Types of Databases
Role of Data Exploration in Big Data
Data Discovery – Reading common file formats as DataFrames, reading other file formats, working with databases in Python using sqlite3, accessing data from AWS S3, accessing data from websites via APIs
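
A minimal data-discovery sketch, assuming pandas is installed: it reads CSV text as a DataFrame, loads JSON records of the kind a web API might return, and round-trips data through an in-memory SQLite database using Python's built-in sqlite3 module. The file contents, the table name `sales`, and the column names are hypothetical.

```python
import io
import json
import sqlite3

import pandas as pd

# 1. Reading a common file format (CSV) as a DataFrame.
#    io.StringIO stands in for a path such as "sales.csv" (hypothetical file).
csv_text = """region,product,units
North,A,10
South,B,4
North,B,7
"""
df_csv = pd.read_csv(io.StringIO(csv_text))

# 2. Reading another format: a list of JSON records, e.g. an API response body.
json_text = '[{"region": "East", "product": "A", "units": 3}]'
df_json = pd.DataFrame(json.loads(json_text))

# 3. Working with a database in Python via the standard-library sqlite3 module.
conn = sqlite3.connect(":memory:")          # throwaway in-memory database
df_csv.to_sql("sales", conn, index=False)   # write the DataFrame to a table
query = (
    "SELECT region, SUM(units) AS total_units "
    "FROM sales GROUP BY region ORDER BY region"
)
df_sql = pd.read_sql_query(query, conn)     # query results come back as a DataFrame
conn.close()

# Combine sources with matching columns into one DataFrame for exploration
combined = pd.concat([df_csv, df_json], ignore_index=True)
print(df_sql)
```

The S3 and API cases follow the same pattern of "fetch, then build a DataFrame". For example, `pd.read_csv("s3://my-bucket/data.csv")` (hypothetical bucket) works when the optional `s3fs` package and AWS credentials are set up, and a JSON API response fetched with `requests.get(url, timeout=10).json()` can be passed to `pd.DataFrame` or `pd.json_normalize`.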

Creating NumPy arrays; plotting with the Matplotlib and Seaborn libraries
Data Profiling – Column Profiling – Obtaining descriptive statistics, checking for missing values, identifying variable types and unique values, common plots for numerical and categorical data
Data Profiling – Cross-column Profiling – Common plots for numerical vs numerical and numerical vs categorical data, bivariate categorical plots, bivariate comparison scatter plots, multivariate categorical plots
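
The profiling steps above can be sketched with pandas and Matplotlib on a small synthetic dataset. This assumes pandas, NumPy and Matplotlib are installed; the columns `height_cm`, `weight_kg` and `group` are hypothetical, and one missing value is injected so the missing-value check has something to find. Column profiling covers summary statistics, missing values, variable types and unique values; cross-column profiling covers a correlation, a grouped mean, a scatter plot coloured by category and a box plot per category.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to memory, no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical dataset: two numerical columns and one categorical column
rng = np.random.default_rng(seed=42)
n = 60
df = pd.DataFrame({
    "height_cm": rng.normal(170, 8, n).round(1),
    "weight_kg": rng.normal(65, 10, n).round(1),
    "group": rng.choice(["A", "B", "C"], size=n),
})
df.loc[0, "weight_kg"] = np.nan  # inject one missing value

# --- Column profiling ---
summary = df.describe()          # count, mean, std, min, quartiles, max of numeric columns
missing = df.isna().sum()        # missing values per column
dtypes = df.dtypes               # type of each variable
n_unique = df.nunique()          # distinct non-missing values per column
group_counts = df["group"].value_counts()

# --- Cross-column profiling ---
corr = df[["height_cm", "weight_kg"]].corr()           # numerical vs numerical (Pearson)
group_means = df.groupby("group")["weight_kg"].mean()  # numerical vs categorical

groups = sorted(df["group"].unique())
fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Numerical column: histogram
axes[0, 0].hist(df["height_cm"], bins=15)
axes[0, 0].set(title="height_cm", xlabel="cm", ylabel="count")

# Categorical column: bar chart of category counts
axes[0, 1].bar(group_counts.index.tolist(), group_counts.to_numpy())
axes[0, 1].set(title="group", xlabel="group", ylabel="count")

# Numerical vs numerical, coloured by a third (categorical) variable
for g in groups:
    sub = df[df["group"] == g]
    axes[1, 0].scatter(sub["height_cm"], sub["weight_kg"], s=12, label=g)
axes[1, 0].set(xlabel="height_cm", ylabel="weight_kg")
axes[1, 0].legend(title="group")

# Numerical vs categorical: one box per group, missing values dropped
axes[1, 1].boxplot(
    [df.loc[df["group"] == g, "weight_kg"].dropna().to_numpy() for g in groups]
)
axes[1, 1].set_xticklabels(groups)
axes[1, 1].set(xlabel="group", ylabel="weight_kg")

fig.tight_layout()
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

With Seaborn, the same panels map onto `sns.histplot`, `sns.countplot`, `sns.scatterplot(..., hue="group")` and `sns.boxplot(data=df, x="group", y="weight_kg")`, and `sns.pairplot(df, hue="group")` gives a quick multivariate overview.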

Find out more

Subscribe to our mailing list and learn about the latest developments in SUTD Academy.

Submit an enquiry or schedule a call with our friendly team at +65 6499 7171.