Predicting binaural colouration using VGGish embeddings

Thomas McKenzie, Alec Wright, Dan Turner, Pedro Llado

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

An initial feasibility study is presented exploring the use of a pre-trained feature extractor, designed for large-scale audio classification, applied to the task of predicting the colouration between binaural signals. A multilayer perceptron (MLP) is trained to predict binaural colouration using feature embeddings obtained from the VGGish network and data from five previously conducted listening tests. The evaluation compares seven versions of the network, each trained using a different data augmentation method, to three existing signal processing methods: basic spectral difference (BSD), log-spectral distance (LSD) and an auditory model for predicting binaural colouration (PBC-2). Results show that while the MLP networks are comparable to BSD and LSD, further work is needed to compete with the more accurate PBC-2, such as using audio features specific to colouration.
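The two simpler spectral baselines can be illustrated with a short sketch. The following is an assumed, generic formulation (the paper's exact definitions may differ, e.g. in windowing, frequency weighting or auditory-band averaging): BSD is computed here as the mean absolute difference between two magnitude spectra in dB, and LSD as the root-mean-square of that same dB difference.

```python
import numpy as np

def _mag_db(x, n_fft=512, eps=1e-10):
    """Magnitude spectrum in dB (illustrative; windowing and frame averaging omitted)."""
    return 20.0 * np.log10(np.abs(np.fft.rfft(x, n_fft)) + eps)

def basic_spectral_difference(x, y, n_fft=512):
    """BSD sketch: mean absolute difference between the two dB spectra."""
    return float(np.mean(np.abs(_mag_db(x, n_fft) - _mag_db(y, n_fft))))

def log_spectral_distance(x, y, n_fft=512):
    """LSD sketch: root-mean-square difference between the two dB spectra."""
    d = _mag_db(x, n_fft) - _mag_db(y, n_fft)
    return float(np.sqrt(np.mean(d ** 2)))

# Identical signals give zero difference; a broadband 6 dB gain shifts the
# whole dB spectrum, so both measures report roughly 20*log10(2) ≈ 6 dB.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
print(basic_spectral_difference(x, x))
print(log_spectral_distance(x, 2.0 * x))
```

Because both measures operate on dB spectra, a pure level change appears as a constant offset, which is one reason auditory models such as PBC-2 can outperform them on perceptual colouration data.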
Original language: English
Title of host publication: AES International Conference on Artificial Intelligence and Machine Learning for Audio
Publisher: Audio Engineering Society
Pages: 1-10
Number of pages: 10
Publication status: Published - 2 Sept 2025
