AES Milan workshop on Audio Repurposing using Source Separation

At the AES 144th Convention in Milan, I chaired a workshop on Audio Repurposing using Source Separation. The workshop was organised by the MARuSS project at the University of Surrey, and drew on expert input from partners at BBC R&D, Fraunhofer IDMT and Fraunhofer IIS.

Source separation attempts to estimate an audio source given a mixture containing that source: for example, extracting the vocal of a pop recording from the mastered stereo track. A scientific overview of research into this kind of separation can be found here. Typically, the estimated source has questionable sound quality if auditioned in isolation. However, many of the artefacts arising from the separation process can be masked (become unnoticeable) when the separated source is mixed back into the track. In this way, useful changes can be made (for example, remixing the levels in the original stereo track) with minimal noticeable distortion.
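The level remix described above can be illustrated with a little arithmetic. The following is a minimal sketch, not the method used by any of the workshop contributors: the function name `remix`, the toy sine signals and the 10% leakage figure are all assumptions made for illustration. It adds a scaled copy of the separated estimate back onto the original mixture, so that any content the separation missed is left untouched rather than lost.

```python
import math


def remix(mixture, estimate, gain_db):
    """Change the level of one source in a mixture using its separated estimate.

    mixture, estimate: equal-length sequences of samples (floats).
    gain_db: desired level change for the target source, in decibels.
    (Hypothetical helper for illustration only.)
    """
    if len(mixture) != len(estimate):
        raise ValueError("mixture and estimate must have the same length")
    gain = 10.0 ** (gain_db / 20.0)  # decibels to linear amplitude
    # The mixture already contains the source once, so add (gain - 1) copies
    # of the estimate instead of replacing the source outright.
    return [m + (gain - 1.0) * e for m, e in zip(mixture, estimate)]


# Toy signals (assumed): a "vocal" sine plus a quieter "backing" cosine.
n = 8
vocal = [math.sin(2 * math.pi * k / n) for k in range(n)]
backing = [0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]
mixture = [v + b for v, b in zip(vocal, backing)]

# An imperfect estimate (assumed): the vocal with 10% of the backing leaking in.
estimate = [v + 0.1 * b for v, b in zip(vocal, backing)]

boosted = remix(mixture, estimate, gain_db=6.0)    # vocal raised by 6 dB
unchanged = remix(mixture, estimate, gain_db=0.0)  # 0 dB leaves the mix as it was
```

In this toy case the vocal is raised by the full 6 dB, while the leaked backing rises by less than 1 dB. The sketch covers only the signal arithmetic; whether the remaining artefacts are actually masked is a perceptual question that has to be settled by listening.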

The aims of the workshop, building on the idea of using state-of-the-art source separation in the real world, were fourfold:

  • Give an overview of the state of the art in source separation and how it sounds;
  • Discuss and demonstrate current applications of source separation in audio production;
  • Discuss evaluation of source separation in applied contexts;
  • Give perspectives on future research directions and applications of source separation.

These aims were met thanks to the input of our four experts.

Estefanía Cano Cerón [slides] gave an introduction to source separation and its evaluation. Methods for source separation model one or more of the following: the source, the source position, or the predicted interference. A number of commercial products already utilise source separation; however, evaluation is still an open challenge. In particular, researchers running listening tests report inconsistencies with some of the objective perceptual models regularly used in the literature. Her take-home messages were: a clear definition of the target source is critical for both model design and evaluation; understanding how the mixture was created will result in better separation algorithms; and do not neglect the importance of a clear evaluation procedure!

Jon Francombe [slides] then presented the results of source separation applied to remixing object-based audio content. In this context, audio extracted by blind source separation (for speech) and beamforming (for music) was presented to listeners mixed with the original channel-based recordings. Listeners preferred the music mix containing a beamformer-extracted piano source, and remixed the speech to make the target speaker clearer while maintaining acceptable audio quality. The results are presented in two of our recent papers [IEEE Transactions on Multimedia] [LVA/ICA 2018].

Jouni Paulus [slides] also discussed recent research using source separation to enable end-user control of dialogue level. Such adjustments would allow listeners to compensate for factors such as hearing impairment, background noise in the listening environment, and listening to content in a non-native language. As above, blind source separation can facilitate these adjustments even when perfect separation is not achieved. This work is described in a recent article in IEEE Transactions on Broadcasting.

Finally, Ryan Kim [slides] gave an overview of the work in our own project on musical audio repurposing. The project has two strands: development of source separation algorithms, and perceptual evaluation of these algorithms in the context of musical audio repurposing. Deep neural networks have been exploited extensively in the development of state-of-the-art source separation, with various architectures and combinations considered to achieve the best results. Work on perceptual evaluation has focussed on developing listening test methodologies to enable a better understanding of the strengths and limitations of current evaluation models. For example, listeners tended not to find a meaningful difference between artefacts and distortions, which might have implications for future models. The full list of project publications can be found here.
