BanglaFake

Constructing and Evaluating a Specialized Bengali Deepfake Audio Dataset

Project Overview

With the increasing misuse of AI-generated speech, especially in the form of highly realistic deepfake audio, the need for effective detection mechanisms has become urgent. While research in high-resource languages like English and Chinese has progressed, low-resource languages such as Bengali lack the necessary datasets to support meaningful detection efforts.

To bridge this gap, we present BanglaFake, the first large, publicly available single-speaker Bengali deepfake audio dataset, designed specifically for training and evaluating deepfake detection models. The dataset comprises 12,260 real and 13,260 synthetic audio clips, each 6–7 seconds long. Real speech samples were collected from the SUST TTS corpus and Mozilla Common Voice, while fake samples were generated with a VITS-based TTS model trained from scratch on Bengali phonemes.
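As a quick sanity check on the dataset layout, the sketch below counts the clips in each split and summarizes their durations. The root path and the `real/` and `fake/` folder names are assumptions for illustration, not the published directory structure.

```python
# Sketch: count the clips in each split and summarize their durations.
# The root path and the "real"/"fake" folder names are hypothetical;
# adjust them to the layout of the actual BanglaFake release.
from pathlib import Path

import soundfile as sf

DATASET_ROOT = Path("BanglaFake")  # hypothetical root directory

for split in ("real", "fake"):
    wav_paths = sorted((DATASET_ROOT / split).glob("*.wav"))
    durations = [sf.info(str(p)).duration for p in wav_paths]
    if not durations:
        print(f"{split}: no .wav files found")
        continue
    print(
        f"{split}: {len(wav_paths)} clips, "
        f"{min(durations):.1f}-{max(durations):.1f} s, "
        f"mean {sum(durations) / len(durations):.1f} s"
    )
```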

To assess quality, we conducted a Mean Opinion Score (MOS) evaluation with 30 native Bengali speakers, obtaining scores of 3.40 for naturalness and 4.01 for intelligibility. We further applied t-SNE to MFCC features to visualize the distribution of real versus synthetic audio, underscoring how closely the deepfake clips resemble real speech.
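For reference, the sketch below reproduces the spirit of that analysis with `librosa` MFCCs and scikit-learn's t-SNE. The file lists are placeholders, and the feature and projection settings are assumptions, not necessarily those behind the published plots.

```python
# Sketch: mean-pooled MFCCs projected to 2-D with t-SNE.
# The file lists are placeholders, and the MFCC/t-SNE settings are
# illustrative rather than the exact ones behind the published figures.
import librosa
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE


def mfcc_embedding(path, sr=22050, n_mfcc=20):
    """Load one clip and return its mean-pooled MFCC vector."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return mfcc.mean(axis=1)                                 # (n_mfcc,)


real_files = ["real/clip_0001.wav", "real/clip_0002.wav"]  # placeholder paths
fake_files = ["fake/clip_0001.wav", "fake/clip_0002.wav"]  # placeholder paths

features = np.stack([mfcc_embedding(p) for p in real_files + fake_files])
labels = np.array(["real"] * len(real_files) + ["fake"] * len(fake_files))

# Perplexity must stay below the number of samples, hence the min().
points = TSNE(
    n_components=2,
    perplexity=min(30, len(features) - 1),
    random_state=0,
).fit_transform(features)

for name, color in (("real", "tab:blue"), ("fake", "tab:red")):
    mask = labels == name
    plt.scatter(points[mask, 0], points[mask, 1], s=8, color=color, label=name)
plt.legend()
plt.title("t-SNE of mean MFCC features: real vs. synthetic")
plt.savefig("tsne_mfcc.png", dpi=150)
```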

BanglaFake provides a critical resource for advancing deepfake detection research in Bengali (Bangla) and serves as a foundational benchmark for future TTS and audio-forensics studies in low-resource language settings.

Audio Samples

Sample 1: বিশ্ববিদ্যালয় ও কলেজের শিক্ষকগণ এই ধরণের অনুষ্ঠানগুলোতে এক সাথে কাজ করতে পারে। ("University and college teachers can work together on events like these.")

🎧 Ground Truth:

🌀 Deepfake:

Sample 2: সমর্থকদের কাছে ইমরান খান রাজনীতিবিদের পাশাপাশি একজন মানবপ্রেমিকও। ("To his supporters, Imran Khan is a humanitarian as well as a politician.")

🎧 Ground Truth:

🌀 Deepfake:

Sample 3: অথচ এখানে চাঁদটি কী ভয়ঙ্কর অহঙ্কারীর মতো পুরো আকাশটি উজ্জ্বল করে রেখেছে। ("And yet here the moon, like something terribly arrogant, keeps the whole sky aglow.")

🎧 Ground Truth:

🌀 Deepfake:

Sample 4: তাদের দেশের ওপর অবরোধ আরোপ করা হয়েছে। ("Sanctions have been imposed on their country.")

🎧 Ground Truth:

🌀 Deepfake:

Sample 5: মুনীর চৌধুরী একজন বাংলাদেশী শিক্ষাবিদ, নাট্যকার, সাহিত্য সমালোচক, ভাষাবিজ্ঞানী এবং শহীদ বুদ্ধিজীবী। ("Munier Chowdhury was a Bangladeshi educator, playwright, literary critic, linguist, and martyred intellectual.")

🎧 Ground Truth:

🌀 Deepfake:

Methodology

We trained a VITS-based Bengali TTS model from scratch using phoneme-level inputs to generate synthetic audio. Real and fake samples were analyzed using MFCC features and visualized with t-SNE to examine distribution patterns. To evaluate realism, we conducted a Mean Opinion Score (MOS) test with 30 native speakers. All experiments, preprocessing steps, and evaluation code are available in our GitHub repository.
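As a minimal illustration of the synthesis step, the sketch below loads a trained VITS checkpoint and generates one clip. It assumes the coqui-ai/TTS Python API; the checkpoint paths, and the inference framework itself, are assumptions here, so see the GitHub repository for the exact scripts we used.

```python
# Sketch: generating one synthetic Bengali clip from a trained VITS checkpoint.
# Assumes a coqui-ai/TTS-style checkpoint; the paths below are hypothetical,
# and the actual training/inference scripts live in the GitHub repository.
from TTS.api import TTS

tts = TTS(
    model_path="checkpoints/bangla_vits/best_model.pth",  # hypothetical path
    config_path="checkpoints/bangla_vits/config.json",    # hypothetical path
    progress_bar=False,
)

# Transcript of Sample 4 above: "Sanctions have been imposed on their country."
tts.tts_to_file(
    text="তাদের দেশের ওপর অবরোধ আরোপ করা হয়েছে।",
    file_path="deepfake_sample.wav",
)
```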

Research Team

The BanglaFake project was developed by a group of machine-learning enthusiasts studying at the Institute of Information Technology (IIT), University of Dhaka. As part of our academic journey, we explored the challenges of deepfake audio detection in Bengali. We welcome feedback, collaboration, and guidance from the broader research and open-source communities.

Contact Us

For questions, feedback, or collaboration inquiries, reach out via email:
bsse1204@iit.du.ac.bd
bsse1217@iit.du.ac.bd
bsse1221@iit.du.ac.bd