AI Seminar: Bryan Pardo – VoiceBlock: Privacy through Real-Time Adversarial Attacks with Audio-to-Audio Models

Bryan Pardo
Professor, Computer Science and Radio, Television, and Film
Northwestern University
This seminar has been cancelled and will be rescheduled sometime in 2023.

VoiceBlock: Privacy through Real-Time Adversarial Attacks with Audio-to-Audio Models


As governments and corporations adopt deep learning systems to collect and analyze user-generated audio data, concerns about security and privacy naturally emerge. Automatic speaker recognition systems can facilitate mass surveillance, allowing search for a target speaker through thousands of concurrent voice communications, or through large databases of recorded voice data. Prior to the advent of automatic speaker recognition, these tasks required human analysts, forming a natural check on surveillance overreach. We seek to restore this check by degrading the efficacy of automated speaker recognition while maintaining the original perceptual quality of the voice communication, a step that could grant a measure of privacy from mass surveillance. Inspired by architectures for tasks such as speech denoising and enhancement, we propose a deep learning model capable of anonymizing a user’s audio stream online, in real time. Our model learns to apply a time-varying finite impulse response (FIR) filter to outgoing audio, allowing for effective and inconspicuous perturbations with a delay small enough for voice communications. This model is highly effective at de-identifying user speech from speaker recognition and can transfer to a recognition system it was not trained on. A perceptual study shows our method produces perturbations significantly less perceptible than baseline anonymization methods, when controlling for effectiveness.


Bryan Pardo studies fundamental problems in computer audition, content-based audio search, and generative modeling of audio, and also develops inclusive interfaces for audio production. He is head of Northwestern University’s Interactive Audio Lab and co-director of the Northwestern University Center for HCI+Design. Prof. Pardo has appointments in the Department of Computer Science and the Department of Radio, Television and Film. He received an M.Mus. in Jazz Studies in 2001 and a Ph.D. in Computer Science in 2005, both from the University of Michigan. He has authored over 130 peer-reviewed publications. He has developed speech analysis software for the Speech and Hearing department of the Ohio State University and statistical software for SPSS, and has worked as a machine learning researcher for General Dynamics. His patented technologies have been productized by companies including Bose, Adobe and Ear Machine. While finishing his doctorate, he taught in the Music Department of Madonna University. When he is not teaching or researching, he performs on saxophone and clarinet with the bands Son Monarcas and The East Loop.

