V. Ramasubramanian

Foundation Models and Responsible AI: Some emerging scenarios in speech technology

Abstract

Foundation Models (FMs) have ushered in a new era of AI over the last few years in domains such as language, image, video, speech and audio. FMs can now be designed and pre-trained on very large amounts of unlabelled data in these domains and then applied to diverse downstream tasks after fine-tuning on small, task-specific supervised data. However, the heavy dependence of such diverse, specialized downstream tasks on monolithic pre-trained FMs brings with it a spectrum of risks related to Fairness, Equity, Security, Privacy, Explainability, Safety, Ethics etc., all of which are now being addressed within the larger ambit of the emerging notion of “Responsible AI”. This talk will consider some of these aspects of Responsible AI from the viewpoint of the design and deployment of FMs in the context of ‘speech technology’. Within the narrower context of ASR (automatic speech recognition), the talk will focus on ‘inclusion’, or the lack thereof, manifesting as “Data Bias” with respect to gender, age, demographics, accent, dialect, speaking style etc. All of these are dominant sources of variability in the speech signal, and they are never represented adequately enough in the pre-training and fine-tuning data to achieve acceptable generalizability. The talk will close with potential directions and solution spaces for ‘bias-mitigation’ in these scenarios, and will identify and advocate generic, domain-agnostic theories and practices that need to emerge, given the current ubiquitous use of FMs, to address this broad spectrum of risks.

Bio

Dr. Ramasubramanian (Ram) has been a Professor in the Dept. of Data Science and Artificial Intelligence (DSAI) at IIIT Bangalore (IIITB) since 2017. His current research interests span automatic speech recognition (ASR), machine learning, deep learning, few-shot learning (FSL), self-supervised learning (SSL), and associative memory formulations. He has been engaged in research on various ‘speech’ domain topics for nearly four decades, starting with his PhD from TIFR. Since 1984, he has worked at various institutions and universities: TIFR (1984-99) as Research Scholar, Fellow and Reader; University of Valencia, Spain (1991-92) as Visiting Scientist; Advanced Telecommunications Research (ATR) Laboratories, Kyoto, Japan (1996-97) as Invited Researcher; Indian Institute of Science (IISc), Bangalore (2000-04) as Research Associate; Siemens Corporate Research & Technology (2005-13) as Senior Member Technical Staff and as Head of Professional Speech Processing – India (2006-09); PES University, South Campus, Bangalore (2013-2017) as Professor; and, presently, IIIT Bangalore (2017-present) as Professor.
