With the rise of AI-generated voices, deepfake audio is becoming a significant challenge. This project builds a fake-speech detector from scratch: audio is converted to MFCC features and classified as real or fake by an RNN-based model trained without pretrained weights.
Tech Stack & Libraries:
Python (Primary Language)
TensorFlow/Keras (Deep Learning)
Librosa (Audio Processing)
Matplotlib & Seaborn (Visualization)
Scikit-learn (Confusion Matrix & Metrics)
Jupyter Notebook/VS Code (Development Environment)
Dataset:
FoR (Fake-or-Real) 2-second dataset, with train, test, and validation folders
Each folder has "real" and "fake" subdirectories
Project Workflow (a code sketch for each step follows the list):
1. Feature Extraction: Convert audio files to MFCC features
2. Model Creation: Build a custom RNN model without pretrained weights
3. Training & Validation: Train the model with the FoR 2-second dataset
4. Evaluation: Generate a confusion matrix and accuracy graph
5. Prediction: Test the model on unseen audio samples
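
Step 1, Feature Extraction: a minimal sketch of turning the FoR clips into fixed-size MFCC matrices with Librosa. The sample rate (16 kHz), coefficient count (40), and frame cap (100) are assumed values, not settings fixed by the dataset; adjust them to your setup.

```python
import os
import numpy as np
import librosa

def extract_mfcc(path, sr=16000, n_mfcc=40, max_frames=100):
    # Load the clip at a fixed sample rate so all features are comparable.
    y, _ = librosa.load(path, sr=sr)
    # Compute MFCCs: shape (n_mfcc, n_frames).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Pad or truncate along the time axis so every clip yields the same shape.
    if mfcc.shape[1] < max_frames:
        mfcc = np.pad(mfcc, ((0, 0), (0, max_frames - mfcc.shape[1])))
    else:
        mfcc = mfcc[:, :max_frames]
    # Transpose to (time_steps, n_mfcc), the layout the RNN expects.
    return mfcc.T

def load_split(split_dir):
    # Expects the FoR layout described above: split_dir/real and split_dir/fake.
    X, y = [], []
    for label, name in enumerate(["real", "fake"]):
        folder = os.path.join(split_dir, name)
        for fname in os.listdir(folder):
            if fname.lower().endswith(".wav"):
                X.append(extract_mfcc(os.path.join(folder, fname)))
                y.append(label)  # 0 = real, 1 = fake
    return np.array(X), np.array(y)
```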
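Step 2, Model Creation: one possible Keras architecture, trained entirely from scratch. The layer sizes and dropout rate are illustrative choices, not a prescribed design.

```python
import tensorflow as tf

def build_model(time_steps=100, n_mfcc=40):
    # A small LSTM stack with no pretrained weights.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(time_steps, n_mfcc)),
        tf.keras.layers.LSTM(64, return_sequences=True),
        tf.keras.layers.LSTM(32),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # output near 1 = fake
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```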
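Step 3, Training & Validation: a sketch assuming `X_train`/`y_train` and `X_val`/`y_val` were produced by `load_split()` above on the dataset's train and validation folders. The epoch and batch-size values are assumptions.

```python
# Hypothetical paths into the FoR 2-second dataset.
X_train, y_train = load_split("for-2sec/train")
X_val, y_val = load_split("for-2sec/validation")

model = build_model()
history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=30,      # assumed training budget
    batch_size=32,  # assumed batch size
)
```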
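Step 4, Evaluation: a confusion matrix computed with Scikit-learn and rendered as a Seaborn heatmap, plus an accuracy graph from the Keras training history. The 0.5 decision threshold is an assumption, and `X_test`/`y_test` are assumed to come from `load_split()` on the test folder.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report

X_test, y_test = load_split("for-2sec/test")

# Hard predictions on the held-out test split.
y_prob = model.predict(X_test).ravel()
y_pred = (y_prob > 0.5).astype(int)

# Confusion matrix as a Seaborn heatmap.
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt="d",
            xticklabels=["real", "fake"], yticklabels=["real", "fake"])
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

print(classification_report(y_test, y_pred, target_names=["real", "fake"]))

# Accuracy curves recorded by model.fit().
plt.plot(history.history["accuracy"], label="train")
plt.plot(history.history["val_accuracy"], label="validation")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()
plt.show()
```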
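Step 5, Prediction: scoring a single unseen clip with the trained model. The file name `sample.wav` is a placeholder path.

```python
# Extract features for one clip and add a batch dimension for predict().
features = extract_mfcc("sample.wav")                 # (time_steps, n_mfcc)
prob_fake = model.predict(features[None, ...])[0, 0]  # (1, time_steps, n_mfcc)
label = "fake" if prob_fake > 0.5 else "real"
print(f"{label} (p_fake = {prob_fake:.2f})")
```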