This presentation surveys four decades of research on speech and speaker recognition, tracing the technology from the 1950s to the present. It outlines the successive generations of automatic speech recognition (ASR) technology: heuristic approaches, pattern matching, statistical frameworks, and discriminative and robust approaches. It then looks ahead to a possible future generation based on extended knowledge processing. The research was carried out at NTT Labs (with Bell Labs) and the Tokyo Institute of Technology, in collaboration with other laboratories. It addresses the evolution and open challenges of ASR, including the recognition of spontaneous speech and rich transcription.
Selected topics from 40 years of research on speech and speaker recognition
Sadaoki Furui
Tokyo Institute of Technology, Department of Computer Science
furui@cs.titech.ac.jp
Generations of ASR technology (timeline, 1950 to 2010)
Prehistory (before 1952)
1G (1952–1968): Heuristic approaches (analog filter bank + logic circuits)
2G (1968–1980): Pattern matching (LPC, FFT, DTW)
3G (1980–1990): Statistical framework (HMM, n-gram, neural net)
3.5G (1990 onward): Discriminative approaches, robust training, normalization, adaptation, spontaneous speech, rich transcription
4G (?): Extended knowledge processing
Legend: Our research (NTT Labs (+Bell Labs), Tokyo Tech); Collaboration with other labs