
Haizhou Li

Prof. Haizhou Li

Title

Seeing to Hear Better

Abstract

Humans have a remarkable ability to direct their auditory attention to a single sound source of interest in a multi-talker environment, or "cocktail party", an ability we call selective auditory attention. For machines, however, speech separation and speaker extraction from multi-talker speech remain challenging signal processing problems. In this talk, we study deep learning solutions to monaural speech separation and speaker extraction that enable selective auditory attention. We review findings from human audio-visual speech perception that motivate the design of speech perception algorithms, and we introduce applications of these algorithms in speech enhancement, speaker extraction, and speech recognition. We will also discuss computational auditory models, open technical challenges, and recent advances in the field.

Biography

Haizhou Li is a Presidential Chair Professor and Associate Dean (Research) at the School of Data Science, The Chinese University of Hong Kong, Shenzhen, China. Dr. Li is also with the Department of Electrical and Computer Engineering, National University of Singapore (NUS), Singapore.

Dr. Li’s research interests include automatic speech recognition, natural language processing, and information retrieval. He served as the Editor-in-Chief of the IEEE/ACM Transactions on Audio, Speech, and Language Processing (2015-2018). Dr. Li received the National Infocomm Awards 2002, the Institution of Engineers Singapore (IES) Prestigious Engineering Achievement Award in 2013 and 2015, the President's Technology Award 2013, and the MTI Innovation Activist Gold Award 2015 in Singapore. He was named one of the two Nokia Visiting Professors in 2009 by the Nokia Foundation, an IEEE Fellow in 2014 for leadership in multilingual, speaker, and language recognition, an ISCA Fellow in 2018 for contributions to multilingual speech information processing, and Bremen Excellence Chair Professor in 2019. Dr. Li is a Fellow of the Academy of Engineering Singapore and a member of ACL, ACM, and APSIPA.

Sakriani Sakti

Prof. Sakriani Sakti

Title
Language Technology for All: From the technology and indigenous community perspectives

Abstract
The development of advanced spoken language technologies based on automatic speech recognition (ASR) and text-to-speech synthesis (TTS) has enabled computers to learn how to listen and speak. Many applications and services are now available, but they still support fewer than 100 languages. Nearly 7000 living languages, spoken by 350 million people, remain unsupported. This is because these systems are commonly built with supervised machine learning, which requires large amounts of speech paired with corresponding transcriptions. In this talk, I will first introduce several successful approaches proposed in pursuit of "language technology for all", which build language technologies for diverse languages from limited resources. Then, I will share thoughts and feedback from indigenous communities gathered at events, workshops, and panel discussions over the years. The challenge is not only how to build language technologies for language diversity, but also how to ensure that these technologies are helpful to under-resourced language communities.

Biography
Sakriani Sakti is an associate professor at the Japan Advanced Institute of Science and Technology (JAIST), an adjunct associate professor at the Nara Institute of Science and Technology (NAIST), a visiting research scientist at the RIKEN Center for Advanced Intelligence Project (RIKEN AIP), Japan, and an adjunct professor at the University of Indonesia. She is also a member of the IEEE SLTC (2021-2023) and an associate editor of the IEEE/ACM Transactions on Audio, Speech, and Language Processing (2020-2023). She was a board member of Spoken Language Technologies for Under-resourced Languages (SLTU) and the general chair of SLTU2016. She was the general chair of the "Digital Revolution for Under-resourced Languages (DigRevURL)" workshop, held as an Interspeech special session in 2017, and of DigRevURL Asia in 2019. She was involved in creating the joint ELRA and ISCA Special Interest Group on Under-resourced Languages (SIGUL) in 2018 and has since become its chair and ISCA liaison representative. In collaboration with UNESCO and ELRA, she served on the organizing committee of the international conference "Language Technologies for All (LT4All): Enabling Linguistic Diversity and Multilingualism Worldwide" in 2019. She was also on the organizing committee of the Zero Resource Speech Challenge in 2019 and 2020. Her research interests include deep learning, zero-resource speech technology, multilingual speech recognition and synthesis, spoken language translation, social-affective dialog systems, and cognitive communication.