50.038 Computational Data Science
This course provides students with the necessary background and experience in data science technology and concepts. Students will gain experience in tackling a complete data science project, from data gathering and pre-processing to data analysis through machine learning tools. Students will learn to apply fundamental concepts in machine learning to data storage and distributed processing as a foundation for their project.
Pre-requisites
- 10.014 Computational Thinking for Design (For AY2020 to AY2024) or
- 10.025 Computational Thinking for Design (For AY2025 and subsequent batches)
Learning objectives
- Be aware of the main goals of data science, its main application domains and current challenges.
- Apply tools to build basic models for solving typical data analytics problems.
- Visualise the structure of big data in order to uncover hidden patterns.
- Design and implement distributed database systems for managing heterogeneous data.
- Perform basic operations on a moderately complex distributed computation system, such as Spark.
- Explain the fundamentals of statistical machine learning and deep learning.
- Appreciate the technical skills necessary to be a capable data scientist.
Measurable outcomes
- Identify important concepts and current challenges in data science.
- Design feature representations for image, text and time series data.
- Analyse data and build simple models in tools such as Weka, Python and Tableau.
- Implement a distributed computation model using Spark.
- Evaluate the performance of different models using empirical benchmarks.
- Mathematically explain common machine learning models such as SVMs, logistic regression systems and neural networks.
- Implement machine learning algorithms using software such as R, C++ and PyTorch.
- Manage big data using Hadoop and MapReduce.
Topics covered
- Intro to DS and Hadoop
- Features + Text
- Visualisation
- Regression and Time Series
- Classification
- Intro to Deep Learning
- Word2vec
- Digital Media (CNN)
- RNN
- Negative Sampling, Attention Mechanism, 1D Convolution and Pre-trained Feature Extraction