Aurosweta
Mahapatra
Advisor: Dr. Berrak Sisman
Speech Security · Anti-Spoofing · Deepfake Detection · Speech Synthesis · Speech for Healthcare
Latest News
About Me
I am a PhD student at the SMILE Lab, CLSP, Johns Hopkins University, advised by Dr. Berrak Sisman, working at the intersection of speech security, synthesis, and healthcare. My research targets a critical gap in modern speech security: current speech deepfake detectors often fail against emotional and expressive synthetic speech because they rely on dataset-specific artifacts rather than the true structure of speech.

Modern text-to-speech and voice conversion systems generate speech that is highly intelligible, speaker-consistent, and context-aware, making it effective for spoofed calls, impersonation, misinformation, and biometric deception, yet existing detection models remain vulnerable to exactly these attacks. I address this by designing emotion-aware and prosody-driven models inspired by human perception, which learn more generalizable representations instead of relying solely on classification objectives.

Beyond speech security, I also work on speech for healthcare, including depression and Alzheimer's detection, as well as speech synthesis. Before joining Hopkins, I was a research assistant at UT Dallas, where I conducted an in-depth literature review of speech security. I completed my master's at UCLA, working on automatic speech recognition for child speech at SPAPL under the guidance of Dr. Abeer Alwan. I am broadly interested in building reliable and trustworthy speech systems that enable the safe and ethical use of AI in real-world applications.
Academic Journey
Projects & Resources
Publications (Relevant Work)
Talks & Presentations
Milestones
Beyond the Researcher