Bilal Ahmad | Hate Speech Detection in E2E Communication

Implementation Details

Model & Dataset

The core of the system is MobileBERT, fine-tuned on the Roman Urdu Hate Speech Dataset from Kaggle to detect offensive content in Roman Urdu (Urdu written in the Latin alphabet).

Optimization & Security Pipeline

To ensure the model could run efficiently and securely on mobile devices, we applied a two-stage pipeline, quantization followed by encryption:

Quantization (INT8): To reduce its size, the fine-tuned model was quantized to INT8.
- Original model (after fine-tuning): 95.2 MB
- After quantization: 25.6 MB (about 73% smaller)
Encryption: To protect intellectual property and hinder reverse engineering, the model file was encrypted before deployment.
- Encrypted payload: 34.1 MB
- Tokenizer overhead: 922 KB (~1 MB)
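
The INT8 step and the size figures above can be illustrated with a minimal sketch, assuming symmetric per-tensor quantization. The source does not say which quantization scheme or converter settings were used, so the functions quantize_int8 and dequantize_int8 and the sample weights are hypothetical. The three size constants are the figures reported above. The final check only shows that Base64 text encoding inflates data by a factor of 4/3, which would be consistent with the growth from 25.6 MB to 34.1 MB; whether the encrypted payload is actually Base64-encoded is an assumption, not something the source states.

```python
import base64
import struct


def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the INT8 codes."""
    return [v * scale for v in quantized]


# Hypothetical weights, for illustration only.
weights = [0.42, -1.27, 0.003, 0.9, -0.55]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Storage cost: 4 bytes per float32 value versus 1 byte per INT8 value.
float_bytes = len(struct.pack(f"{len(weights)}f", *weights))
int8_bytes = len(struct.pack(f"{len(q)}b", *q))

# Size figures as reported in this document (MB).
FINE_TUNED_MB = 95.2
QUANTIZED_MB = 25.6
ENCRYPTED_MB = 34.1

compression_ratio = FINE_TUNED_MB / QUANTIZED_MB   # how many times smaller
size_reduction = 1 - QUANTIZED_MB / FINE_TUNED_MB  # fraction of size removed
encryption_growth = ENCRYPTED_MB / QUANTIZED_MB    # growth after encryption

# Assumption check: Base64 turns every 3 bytes into 4 characters.
sample = bytes(3000)
base64_growth = len(base64.b64encode(sample)) / len(sample)

print(f"compression ratio: {compression_ratio:.2f}x")
print(f"size reduction:    {size_reduction:.1%}")
print(f"encryption growth: {encryption_growth:.3f} (Base64 would give {base64_growth:.3f})")
```

On the reported figures, quantization yields roughly a 3.7x reduction, slightly under the 4x that a pure float32-to-INT8 conversion of every value would give. That gap would be expected if some tensors or metadata stay in higher precision, though the source does not confirm this.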

System Architecture

The solution is designed around a three-tier architecture to ensure secure, real-time communication:

Client-Side Intelligence: The optimized MobileBERT (TFLite) runs entirely on-device, classifying content locally before any data leaves the phone.
Security Layer: Implements RSA for secure key exchange and AES for message confidentiality.
Backend Infrastructure: Utilizes Firebase Authentication and Realtime Database for managing encrypted payloads.
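
The order of operations implied by the three tiers (classify on-device, then encrypt, then hand off to the backend) can be sketched as follows. This is a minimal illustration, not the project's code: process_outgoing, OutgoingResult, toy_classifier, and placeholder_encrypt are hypothetical names, the keyword classifier stands in for the MobileBERT TFLite model, and the placeholder "encryption" is not a cipher and stands in for the AES step. The source does not say what happens to a flagged message; blocking it before upload is an assumption made here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OutgoingResult:
    blocked: bool             # True if the local classifier flagged the text
    score: float              # classifier score in [0, 1]
    payload: Optional[bytes]  # ciphertext to upload, or None when blocked


def process_outgoing(
    text: str,
    classify: Callable[[str], float],
    encrypt: Callable[[bytes], bytes],
    threshold: float = 0.5,
) -> OutgoingResult:
    """Classify plaintext locally, and encrypt it only if it passes."""
    score = classify(text)
    if score >= threshold:
        # Assumed policy: flagged text never leaves the device.
        return OutgoingResult(blocked=True, score=score, payload=None)
    return OutgoingResult(
        blocked=False, score=score, payload=encrypt(text.encode("utf-8"))
    )


# Stand-in for the on-device model: a trivial keyword check.
FLAGGED_WORDS = {"badword"}


def toy_classifier(text: str) -> float:
    words = text.lower().split()
    return 1.0 if any(w in FLAGGED_WORDS for w in words) else 0.0


def placeholder_encrypt(plaintext: bytes) -> bytes:
    # NOT real encryption: marks where the AES step would run.
    return b"ENC:" + plaintext[::-1]


ok = process_outgoing("salam, kya haal hai", toy_classifier, placeholder_encrypt)
flagged = process_outgoing("you badword", toy_classifier, placeholder_encrypt)
print(ok.blocked, flagged.blocked)
```

Because classification runs on the plaintext before encryption, the backend only ever handles ciphertext, which is what allows detection to coexist with end-to-end encryption in this design.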
