This course covers both the theory and the practical aspects of speech signal processing. It requires a background in digital signal processing and probability theory. Although the course includes mathematics, the goal is for participants to appreciate the math behind the practice without getting lost in the math itself. Assignments will include reading, mathematical, and implementation tasks, with enough depth to challenge participants who enjoy mathematics. Although the course content lists several books, the quizzes and the final exam will be based only on material delivered in class, the assignments, and the class notes (which consist of selected portions of specific textbooks). The projects will cover various topics in speech and audio processing and will use tools such as HTK, CMU Sphinx, deep learning frameworks, VoiceXML, and MATLAB. Each participant will be expected to submit a report, give a demo, and present the project assigned to them. The instructor will support the projects to the best extent possible.
- Part I by Rajesh Hegde: Week 0 to Week 6
Overview of speech recognition. Modeling the speech production mechanism; the source-system model of speech; physiological and mathematical categorization of speech sounds. Discrete-time processing of speech signals; relevance of the DFT, the Z-transform, and convolution. Filter banks and analytical pole-zero modeling in speech recognition. Short-time Fourier analysis and spectral estimation models for speech (DTFT, DFT). Pole-zero and all-pole modeling of speech; the LPC model for speech. Basics of speech coding. Homomorphic deconvolution of speech signals, cepstral analysis, and features for speech recognition (MFCC). Vector quantization and pattern recognition. GMMs for speaker and language identification.
- Part II by Vipul Arora: Week 9 to Week 16
Conventional ASR systems, Gaussian Mixture Models, Hidden Markov Models, Finite State Transducers, Decision Trees, Kaldi toolkit, Hybrid HMM-DNN ASR systems, Deep Neural Networks, End-to-end ASR systems, Connectionist Temporal Classification.
UG students: BTech students (3rd and 4th year)
PG students: all MTech and PhD students
Outcomes of this Course
On completion of the course, the student should be able to
- Understand the concepts and practical aspects of speech signal processing
- Model the speech production mechanism using the source-system model of speech
- Analyse and model speech signals in different domains
- Extract different features from speech and use them in developing various speech processing algorithms
- Design automatic speech recognition (ASR) systems
- Use deep learning-based methods, Kaldi, and OpenFst for various speech processing applications