Research output per year
Professor
Accepting PhD Students
As an engineer, the way I learn how something works is to take it apart and put it back together again. The fundamental questions around which most of my research revolves are: What are the basic building blocks of speech? What is speech made of? How can we take speech apart and put it back together again? Will that tell us how it works? To answer these questions, I am working in a number of areas. Often, the research methodology involves solving a practical problem or building a useful application, such as a speech recogniser, synthesiser or a speech search engine.
In speech recognition, I have investigated new acoustic models, such as Linear Dynamical Models, factorial HMMs and other graphical models that can represent speech not as 'beads on a string' but as streams of interacting factors. I have devised ways to automatically find an inventory of suitable sub-word units to model, and have also worked on other alternatives to phonetic units, such as graphemes. This work has been applied to automatic speech recognition and to Spoken Term Detection, an underpinning technique required to realise speech search engines. One long-standing interest is the use of phonological/acoustic/articulatory features and articulatory measurement data as tools to develop models of speech.
In speech synthesis, I work on both unit selection methods and HMM-based / DNN-based speech synthesis. In both of these areas, the definition of the unit of speech is crucial. Both typically use context-dependent phonemes or diphones. In this context, we can gain some insight into the basic building blocks of speech by asking: what contextual features must we model? In unit selection, this means learning the target cost; in HMM- or DNN-based speech synthesis, it relates to the clustering of acoustically similar units or the representation of linguistic context.
I am increasingly interested in perceptual measures in speech synthesis, not just for evaluation of the final output, but within the synthesis process itself. In unit selection, perceptual measures should be used to determine equivalent units or contexts, because acoustic similarity and perceptual interchangeability are not the same thing. In HMM-based speech synthesis, the training criterion should be perceptual: perhaps minimum generation error gives us a way to use such a criterion? How can the requirements of acoustic modelling fit with this idea of perceptual equivalence?
In both recognition and synthesis, I work on multilingual systems as an additional way to look at the basic units of speech. Is there a universal set of building blocks for speech, and can we build systems that use common model structures or unit inventories for multiple languages?
Building large complex systems, as is often the case in speech technology, requires a team. I have therefore built up and sustained a team of postdoctoral Research Fellows, supported by research grants, with whom I pursue these research interests.
So far, I have helped to define three new research areas. In the late 1990s, I pioneered the use of articulatory feature representations of speech for use in automatic speech recognition. This was an early attempt at deconstructing speech into a factorial representation. The article "Detection of phonological features in continuous speech using neural networks" remains one of the key references in this field and has been cited 69 times to date. The results reported in this article are still the 'ones to beat'.
Following on from the work on articulatory features, I started to work within the dynamic Bayesian network formalism, spending time at the University of Washington to collaborate with the leading groups in this area; my work led to me being invited to co-lead a summer workshop on this topic at Johns Hopkins University in 2006.
For many years, I was one of only a very small number of people around the world who worked in both automatic speech recognition and speech synthesis. Within the last few years, the number of people 'crossing over' from recognition to synthesis has exploded. I am a pioneer in this newly emerging field, 'unified models for speech recognition and synthesis', which is already delivering exciting results. I was the Project Director and co-ordinator of the EC FP7-funded, six-partner, 3 million euro project EMIME, a high-profile project exploring this area. I was then the Project Director and co-ordinator of the project Simple4All, which made significant advances in unsupervised learning for speech synthesis. This was followed by the five-year Natural Speech Technology project (a programme grant), in which we made advances in machine learning for the speech synthesis front end and in Deep Neural Networks. That work is now being directly applied in the SCRIPT project, working with the BBC World Service to deliver speech synthesis techniques that scale well across languages.
Past: Speech Processing 1 and Speech Processing 2; Speech Synthesis and Automatic Speech Recognition; Advanced Phonetics; Annual Good Academic Practice session for all incoming Linguistics Masters students.
Current: M.Sc. SLP programme; Speech Processing (undergraduate and postgraduate); Speech Synthesis (undergraduate and postgraduate)
The Centre for Speech Technology Research (CSTR) is a world-leading research centre in the field of speech technology, including speech synthesis and automatic speech recognition. I became Director of CSTR in February 2011. The two main responsibilities of the Director are leadership of the group and sustaining grant income. CSTR varies in size, generally being around 20-35 people, and enjoys an enviable reputation as a welcoming, open and collaborative group. Maintaining this culture over the years, as both the student and Research Fellow members of CSTR gradually change, is a key part of our success. My approach to this, and to leadership in general, is simply one of leading by example.
The financial responsibilities of the Director of CSTR include ensuring a smooth funding profile is maintained, in order to support a stable group of 10-15 postdoctoral Research Fellows, all of whom are on soft money. CSTR also supports a number of administrative staff who are also mainly paid from soft money. The responsibility of bringing in the research grant income is shared between myself and Prof. Steve Renals: between us, we bring in the majority of funding to CSTR.
In 2001, I redesigned and relaunched the M.Sc. in Speech and Language Processing. At the time I took over this programme, it had become outdated and was examined solely on coursework. Many courses were taught specially and only for this M.Sc. I completely overhauled the course design, bringing it into line with the existing modular structure of the Informatics M.Sc. programme. This enabled the sharing of modules across the two programmes, decreased the teaching effort required from Linguistics, increased the number of option modules available to students, and brought Informatics M.Sc. students into Linguistics-taught modules.
1998 University of Edinburgh, Ph.D
1993 University of Cambridge, M.Phil. (taught course)
1992 University of Cambridge, B.A., First Class Honours
King, Simon (Recipient), Lu, Heng (Recipient) & Watts, Oliver (Recipient), 22 Sept 2019
Prize: Prize (including medals and awards)
Chermaz, Carolin (Recipient), Valentini Botinhao, Cassia (Recipient) & King, Simon (Recipient), 19 Sept 2019
Prize: Prize (including medals and awards)
1/10/22 → 31/03/26
Project: Research
Non-EU industry, commerce and public corporations
1/09/20 → 31/08/24
Project: Research
Espic, F. (Creator), Valentini Botinhao, C. (Creator), Wu, Z. (Creator) & King, S. (Creator), Edinburgh DataShare, 24 Jun 2016
DOI: 10.7488/ds/1433
Dataset
Ronanki, S. (Creator), Henter, G. (Creator), Wu, Z. (Creator) & King, S. (Creator), Edinburgh DataShare, 7 Jul 2016
DOI: 10.7488/ds/1435
Dataset
Clark, R. (Creator), King, S. (Creator), Yang, C. (Creator), Yamagishi, J. (Creator) & Brown, G. (Creator), Edinburgh DataShare, 2 Mar 2021
DOI: 10.7488/ds/2994
Dataset
Wester, M. (Creator), Watts, O. (Creator), Henter, G. (Creator), Yamagishi, J. (Creator), Wu, Z. (Creator), Dall, R. (Creator), Corley, M. (Creator), Aylett, M. (Creator), Clark, R. (Creator), King, S. (Creator), Merritt, T. (Creator), Richmond, K. (Creator), Ronanki, S. (Creator) & Tomalin, M. (Creator), Edinburgh DataShare, 8 Jun 2015
http://datashare.is.ed.ac.uk/handle/10283/786
Dataset
Henter, G. (Creator), Ronanki, S. (Creator), Watts, O. (Creator), Wester, M. (Creator), Wu, Z. (Creator) & King, S. (Creator), Edinburgh DataShare, 20 Jan 2016
DOI: 10.7488/ds/1317
Dataset
Simon King & Phillipa Rewaj
4/12/16
1 Media contribution
Press/Media: Public Engagement Activities
Simon King, Cassia Valentini Botinhao & Cassie Mayo
9/09/13
26 items of Media coverage
Press/Media: Research