50.045 Information Retrieval
Course Description
Automatic methods of Information Retrieval (IR) have gained greater significance in recent years due to the dramatic increase in the amount of data available on the Web. The data is often present in multiple forms (such as text, image, and video), so the IR techniques deployed on the Web must be able to search and retrieve across all of these data formats. In this course, the study of IR will focus on the methodologies of indexing, processing, and querying primarily textual data, and will be extended to video and image data in the latter part of the course.
Pre-requisite
50.007 Machine Learning or 50.021 Artificial Intelligence or 50.038 Computational Data Science or 50.039 Theory and Practice of Deep Learning
Learning Objectives
- Gain knowledge about the basic concepts and techniques of IR.
- Understand the basic functionality and underlying algorithms of an IR system.
- Understand modern neural networks and deep learning-based techniques that are used in today’s IR systems.
- Learn about several applications of IR, e.g., question answering and image and video retrieval.
- Learn how to develop a basic IR system from scratch and evaluate the system.
- Learn classification, clustering, and topic modeling, which are core modules in an IR system.
Measurable Outcomes
- Identify important concepts of Information Retrieval.
- Learn vector space modeling, modern deep learning techniques for IR, and evaluation methods, and use this knowledge to complete the project.
- Evaluate the performance of different IR models using empirical benchmarks.
- Implement different IR applications such as Question Answering, Image and Video Retrieval systems.
- Use libraries such as scikit-learn and Keras for data processing and IR model creation.
- Mathematically explain common neural network-based models and the word2vec and GloVe distributed word representations used in building IR systems.
Topics Covered
- Boolean retrieval and the vector space model (VSM)
- Word Embeddings
- Probabilistic IR, Relevance Feedback, and BM25
- Introduction to Neural Networks
- Text Processing and Classification using Neural Networks
- Language Modeling: Transformers, BERT, RoBERTa
- IR using Language Modeling
- Question Answering
- Personalization
- Image Retrieval
- Large Language Models
- Language Models for Information Retrieval, e.g., how to leverage ChatGPT for Information Retrieval
- Question Answering and its utilization in Information Retrieval
Required Texts and Reading
Introduction to Information Retrieval, by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze.
http://nlp.stanford.edu/IR-book/html/htmledition/irbook.html