Programme outline
Learning objectives
- Understand the role of data preparation in big data.
- Structure data into arrays and DataFrames, performing operations like reshaping, slicing, appending, dropping, transposing, and melting with Python.
- Apply data cleaning techniques such as imputing missing values, renaming columns, and handling unbalanced datasets.
- Enrich data by merging tables.
- Aggregate data using pivot tables and groupby operations with Python.
Day 1
- Role of data preparation in big data
- Data types, data sizes, data encoding
- Data structuring with arrays
- Data structuring with DataFrames – Reshaping, slicing, appending and dropping, transposing and shifting, melting
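As a rough preview of the Day 1 operations, here is a minimal NumPy/pandas sketch. It is not taken from the course materials, and the sales table, its store names and its month columns are hypothetical. Appending is shown with `pd.concat` because `DataFrame.append` was removed in pandas 2.0.

```python
import numpy as np
import pandas as pd

# Arrays: reshape a flat array of 6 values into 2 rows x 3 columns, then slice
arr = np.arange(6).reshape(2, 3)
first_col = arr[:, 0]  # every row, column 0

# Hypothetical wide-format sales table (illustrative data only)
df = pd.DataFrame({
    "store": ["A", "B"],
    "jan": [100, 150],
    "feb": [120, 130],
})

# Slicing: label-based with .loc, position-based with .iloc
jan_sales = df.loc[:, "jan"]
first_row = df.iloc[0]

# Appending rows: pd.concat replaces DataFrame.append (removed in pandas 2.0)
new_row = pd.DataFrame({"store": ["C"], "jan": [90], "feb": [110]})
df = pd.concat([df, new_row], ignore_index=True)

# Dropping a column by name and a row by index label (both return new frames)
without_feb = df.drop(columns=["feb"])
without_first = df.drop(index=0)

# Transposing swaps rows and columns; shift moves values down one row
transposed = df.set_index("store").T
prev_jan = df["jan"].shift(1)  # the first value becomes NaN

# Melting: wide (one column per month) -> long (one row per store-month pair)
long_df = df.melt(id_vars="store", var_name="month", value_name="sales")
print(long_df)
```

Melting is the step that most often needs explaining: it turns column headers (`jan`, `feb`) into values of a new `month` column, which is the long format that `groupby` and plotting libraries usually expect.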
Day 2
- Data cleaning – Imputing missing values, types of missing data (MCAR: missing completely at random; MAR: missing at random; MNAR: missing not at random), renaming columns, dropping duplicates, dropping rows and columns, string manipulation (handling unbalanced datasets, data transformation)
- Data enrichment – Accessing and manipulating dates and times, joining tables with merge()
- Data aggregation – Pivot tables, groupby(), aggregate(), describe()
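The Day 2 topics chain naturally into one small pipeline. The sketch below shows them in pandas: cleaning, then date handling and a merge, then aggregation. It is illustrative only, not the course's reference solution, and the `orders` and `customers` tables and their column names are hypothetical. Mean imputation is used only as a simple baseline. It is most defensible when values are MCAR, and even then it understates the variance.

```python
import numpy as np
import pandas as pd

# Hypothetical raw orders table with a duplicate row, a missing value,
# untidy column names and padded strings (illustrative data only)
orders = pd.DataFrame({
    "Order ID": [1, 2, 2, 3, 4],
    "Cust": [" alice", "bob ", "bob ", "alice", "carol"],
    "Amount": [10.0, np.nan, np.nan, 30.0, 25.0],
    "Date": ["2024-01-05", "2024-01-20", "2024-01-20", "2024-02-03", "2024-02-10"],
})

# Cleaning: rename columns, drop exact duplicate rows, tidy strings
orders = orders.rename(columns={"Order ID": "order_id", "Cust": "customer",
                                "Amount": "amount", "Date": "date"})
orders = orders.drop_duplicates().reset_index(drop=True)
orders["customer"] = orders["customer"].str.strip().str.title()

# Mean imputation as a simple baseline (most defensible under MCAR)
orders["amount"] = orders["amount"].fillna(orders["amount"].mean())

# Dates: parse strings to datetimes and extract the month number
orders["date"] = pd.to_datetime(orders["date"])
orders["month"] = orders["date"].dt.month

# Enrichment: left-join a second hypothetical table on the shared key
customers = pd.DataFrame({"customer": ["Alice", "Bob", "Carol"],
                          "region": ["North", "South", "North"]})
enriched = orders.merge(customers, on="customer", how="left")

# Aggregation: groupby + agg, a pivot table, and summary statistics
by_region = enriched.groupby("region")["amount"].agg(["sum", "count"])
pivot = enriched.pivot_table(index="region", columns="month",
                             values="amount", aggfunc="sum", fill_value=0)
summary = enriched["amount"].describe()
print(by_region)
print(pivot)
```

A left join (`how="left"`) is used here so that an order whose customer is missing from the lookup table would be kept with a missing `region` rather than silently dropped.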
Day 3
- Project consultation
- Project presentation
Mode of assessment
- Assignment
- Project