Data Wrangling and Preparation with Programming
Programme Outline
Learning Objectives and Structure
- Perform the basic machine learning (ML) modelling tasks expected of a junior data scientist
- Understand the types of data and databases in the business context
- Appreciate the use of a data dictionary and harness the potential of metadata for data science
- Acquire organizational datasets from data lakes and other democratized data sources for data enrichment purposes
- Structure data into an appropriate form for data analysis
- Manipulate data structures to support the data-wrangling phase
- Perform data wrangling on the acquired dataset
- Address data quality issues with appropriate data cleansing techniques
- Iterate through the data mining process progressively, using data wrangling and exploratory analysis tools
Programme Structure: Participants will go through 4 days of training. The class will reconvene on the 5th day for a presentation as part of the course assessment.
Day 1
- Overview of Data Science Pipeline
- What is Data Wrangling and Data Preparation?
- Data Acquisition
- Understand how a data scientist prepares a dataset for data modelling
- What is data discovery?
- Types of data
- Types of databases
- Data Dictionary and Metadata
- Data Models
Day 2
- Data Mining and CRISP-DM
- Common Computing Infrastructure
- Interactive Data Exploratory Analysis (IDEA)
- Basics of Descriptive Statistics
Day 3
- Breakdown of Data Preparation Phases
- Dataset Structuring: Data Frame Handling
- Data Cleaning
Day 4
- Data Enrichment and Alternative Sources
- Data Enrichment: Data Aggregation
- Data Enrichment: Data Standardisation
Day 5
- Project Presentation
Assessment
Participants will be assessed via a group-based project presentation in the 5th session of the course. There will also be formative assessments and case studies to assess each participant’s understanding and competency.
Subject Credits
Upon completing this course and satisfying its passing requirements, learners will be awarded 12 subject credits.