MTH511A: STATISTICAL SIMULATION AND DATA ANALYSIS

Instructor details

Instructor: Arnab Hazra

E-mail: ahazra@iitk.ac.in

Office: 576 Faculty Building

Web: https://sites.google.com/view/arnabhazra09/

 

Course Description 

This course has two aspects: statistical simulation and data analysis. The course will be broken up broadly into three parts: 

(i) Statistical simulation in general,

(ii) Some selective case studies and necessary statistical tools,

(iii) Demonstration of the role of simulation in the context of case studies.

 

Prerequisites

MSO201A or the instructor’s approval. R is a mandatory part of this course; lack of prior experience with R will not be accepted as an excuse.

 

Lectures and Tutorials

The course will be taught in hybrid mode. Attendance is NOT compulsory. Recordings of each lecture will be uploaded on mooKIT.

Venue: L10

Mon: 5pm - 6pm

Wed: 5pm - 6pm

Thu: 5pm - 6pm (tutorial)

Fri: 5pm - 6pm

 

Course Webpage 

Materials will be shared on mooKIT at https://hello.iitk.ac.in/course/mth511a2223/.

 

Quizzes

There will be five quizzes in total (5 points each) over the semester: two before the mid-sem exam and three after it. Each student’s lowest quiz score will be dropped.
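
As a rough illustration of the drop-the-lowest rule, here is a minimal Python sketch (Python is used only for illustration; the course itself uses R). The function name `quiz_total` is hypothetical, and the sketch assumes the quiz component is simply the sum of the best four scores, which gives a maximum of 20 points.

```python
def quiz_total(scores):
    """Sum of five quiz scores (each out of 5) after dropping the lowest one."""
    if len(scores) != 5:
        raise ValueError("expected exactly five quiz scores")
    if any(s < 0 or s > 5 for s in scores):
        raise ValueError("each quiz is scored out of 5")
    # Dropping the single lowest score leaves the best four quizzes.
    return sum(scores) - min(scores)


print(quiz_total([4, 5, 3, 2, 5]))  # 17: the score of 2 is dropped
```

A student who misses one quiz would, under this reading, enter 0 for it and lose nothing, since that 0 is the score that gets dropped.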

 

References

The following books would be useful references:

• “Simulation” by Sheldon M. Ross (Academic Press, Fourth Edition, 2006), Chapters 1-5.

• “Non-Uniform Random Variate Generation” by Luc Devroye. [Online book available]

• “Statistical Inference” by Casella and Berger.

• “The Elements of Statistical Learning” by Hastie, Tibshirani, and Friedman.

• “Convex Optimization” by Boyd and Vandenberghe.

• “An Introduction to the Bootstrap” by Efron and Tibshirani.

• “Monte Carlo Statistical Methods” by Robert and Casella.

 

Marks Distribution

Quizzes: 20%

Mid-sem Exam: 20%

Group Project: 40% (Presentation 20% + Report 20%)

End-sem Exam: 20%
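
To show how these components add up to a total out of 100, here is a small Python sketch (for illustration only; the course itself uses R). The names `MAX_MARKS` and `total_marks` are hypothetical, and the sketch assumes each component is already scaled to its weight, for example a quiz component out of 20.

```python
# Maximum marks per component, copied from the distribution above.
MAX_MARKS = {
    "quizzes": 20,
    "mid_sem": 20,
    "project_presentation": 20,
    "project_report": 20,
    "end_sem": 20,
}


def total_marks(earned):
    """Add up marks earned per component, checking each against its maximum."""
    if set(earned) != set(MAX_MARKS):
        raise ValueError("provide marks for every component exactly once")
    for name, marks in earned.items():
        if not 0 <= marks <= MAX_MARKS[name]:
            raise ValueError(f"{name} must lie between 0 and {MAX_MARKS[name]}")
    return sum(earned.values())


example = {
    "quizzes": 17,
    "mid_sem": 14,
    "project_presentation": 18,
    "project_report": 16,
    "end_sem": 12,
}
print(total_marks(example))  # 77
```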

 

Expectations from the group project

The group project is the most important part of this course. MTH511A is a 10-credit course, which means you are expected to spend about 10 hours per week on it, or 140 hours over 14 weeks. The project carries 40% of the marks, which corresponds to roughly 56 hours of work, and expectations will be set accordingly. Any attempt to copy from previous years’ reports or similar sources will be severely penalized. Each group will reproduce a research paper, with the difficulty level depending on the group size.

A list of statistics journals (which also includes some mathematical and other types of journals) can be found at https://www.scimagojr.com/journalrank.php?category=2613&area=2600&type=j. The list contains some journals that fall outside our area of interest (theoretical or applied statistics); discard those.

I suggest not picking a paper that is very interesting but not doable within a single semester. I also suggest picking from Q2 and Q3 journals; in my experience, reimplementing a paper from a Q1 journal may be too hard in this short time. But you are free to choose. I understand that most, or probably all, of you are not familiar with this, but it is a good start; spend some time getting familiar with it.

To clarify, the purpose of the project is to re-implement everything done in a paper, NOT just to present the paper. Each team has to submit its code for data exploration as well as simulation studies (cleanly written and commented so that I can run it and reproduce the results) and the final project report, and also has to give a short presentation.

 

Grading policy

For MSc Statistics students:

 

• 95 or above – A* Grade (Outstanding)

• Marks lying in [90, 95) – A Grade (Excellent)

• Marks lying in [80, 90) – B+ Grade (Very Good)

• Marks lying in [70, 80) – B Grade (Good)

• Marks lying in [60, 70) – C+ Grade (Fair)

• Marks lying in [50, 60) – C Grade (Satisfactory)

• Marks lying in [40, 50) – D+ Grade (Marginal)

• Marks lying in [35, 40) – D Grade (Pass)

• Marks lying in [25, 35) – E Grade (Exposure but Fail)

• Less than 25 marks – F Grade (Fail)

 

For all other students:

• 90 or above – A* Grade (Outstanding)

• Marks lying in [85, 90) – A Grade (Excellent)

• Marks lying in [75, 85) – B+ Grade (Very Good)

• Marks lying in [65, 75) – B Grade (Good)

• Marks lying in [55, 65) – C+ Grade (Fair)

• Marks lying in [45, 55) – C Grade (Satisfactory)

• Marks lying in [35, 45) – D+ Grade (Marginal)

• Marks lying in [30, 35) – D Grade (Pass)

• Marks lying in [20, 30) – E Grade (Exposure but Fail)

• Less than 20 marks – F Grade (Fail)
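
The two scales above use half-open intervals, so a mark exactly on a boundary belongs to the higher grade. A minimal Python sketch of the lookup follows (for illustration only; the course itself uses R). The names `CUTOFFS`, `GRADES`, and `letter_grade`, and the programme labels, are hypothetical; the cut-offs are copied from the lists above, and the sketch assumes total marks lie between 0 and 100.

```python
from bisect import bisect_right

# Lower cut-offs for grades E, D, D+, C, C+, B, B+, A, A* on each scale.
CUTOFFS = {
    "msc_statistics": [25, 35, 40, 50, 60, 70, 80, 90, 95],
    "other": [20, 30, 35, 45, 55, 65, 75, 85, 90],
}
GRADES = ["F", "E", "D", "D+", "C", "C+", "B", "B+", "A", "A*"]


def letter_grade(marks, programme="other"):
    """Map total marks to a letter grade using half-open intervals [low, high)."""
    if not 0 <= marks <= 100:
        raise ValueError("marks must lie between 0 and 100")
    # bisect_right counts the cut-offs that are <= marks, which is
    # exactly the index of the grade in GRADES.
    return GRADES[bisect_right(CUTOFFS[programme], marks)]


print(letter_grade(90, "msc_statistics"))  # A
print(letter_grade(90, "other"))           # A*
```

The same total of 90 therefore gives an A on the MSc Statistics scale and an A* on the scale for all other students.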

 

Course Content

Week 1: Introduction to Monte Carlo

Week 2: Generating random variables - Discrete

Week 3: Generating random variables - Continuous

Week 4: Importance Sampling and Monte Carlo

Week 5: Maximum likelihood estimation with examples

Week 6: Gradient-based optimization methods

Week 7: Gradient-based optimization methods

Week 8: Least squares and optimization

Week 9: Bootstrap and cross-validation

Week 10: MM algorithm and EM algorithm

Week 11: Stochastic optimization

Week 12: Simulated annealing

Week 13: Introduction to Bayesian methods

Week 14: Markov chain Monte Carlo