Host-based Intrusion Detection System (HIDS)
Intrusion Detection using DistilBERT
About This Demo
This application demonstrates a machine learning-based intrusion detection system trained on the ADFA-LD dataset.
- Model: DistilBERT fine-tuned for sequence classification
- Dataset: ADFA-LD (Australian Defence Force Academy Linux Dataset)
- Performance: 94.03% Accuracy | 94.50% F1-Score
Model Repository: salsazufar/distilbert-base-hids-adfa
Detection Pipeline
The system processes an input system call sequence in three stages:
- Preprocessing - Converts the raw system call sequence into 18-gram sliding windows
- Inference - Classifies each window as Normal or Attack using the DistilBERT model
- Aggregation - Determines the final detection result from all window predictions
Detection Strategy: If any window is classified as Attack, the final result is Attack.
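The three stages and the any-window rule can be sketched in a few lines of Python. This is a minimal illustration, not the demo's actual code: `sliding_windows`, `detect`, and `toy_classifier` are hypothetical names, and the toy classifier only stands in for the fine-tuned DistilBERT model. How the demo handles sequences shorter than 18 calls is not stated; this sketch assumes they produce no windows and therefore a Normal result.

```python
from typing import Callable, Dict, List, Sequence

WINDOW_SIZE = 18  # n-gram length stated in this demo
STRIDE = 1        # stride stated in the Notes section


def sliding_windows(calls: Sequence[int],
                    size: int = WINDOW_SIZE,
                    stride: int = STRIDE) -> List[List[int]]:
    """Stage 1: split a system call sequence into overlapping fixed-size windows."""
    # Assumption of this sketch: sequences shorter than `size` yield no windows.
    return [list(calls[i:i + size])
            for i in range(0, len(calls) - size + 1, stride)]


def detect(calls: Sequence[int],
           classify_window: Callable[[List[int]], str]) -> Dict[str, object]:
    """Run preprocessing, per-window inference, and any-window aggregation."""
    windows = sliding_windows(calls)
    # Stage 2: each window is classified independently.
    labels = [classify_window(w) for w in windows]
    # Stage 3: a single Attack window makes the whole sequence Attack.
    final = "Attack" if any(label == "Attack" for label in labels) else "Normal"
    return {"labels": labels, "final": final}


def toy_classifier(window: List[int]) -> str:
    """Hypothetical stand-in for the model: flags windows containing call ID 999."""
    return "Attack" if 999 in window else "Normal"


if __name__ == "__main__":
    sequence = [1] * 30 + [999] + [1] * 30
    result = detect(sequence, toy_classifier)
    print(result["final"], result["labels"].count("Attack"))
```

Passing the classifier in as a function keeps the windowing and aggregation logic independent of the model, so the same code could wrap the real model's prediction call.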
Input System Call Sequence
Detection Results
Notes
- The model uses 18-gram sliding windows with stride=1, so every run of 18 consecutive system calls is examined
- Each window is classified independently, giving a per-window view of the sequence
- The aggregation strategy flags the entire sequence as Attack if any window is classified as Attack
- The sample sequence demonstrates a transition from Normal → Attack → Normal behavior
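As a rough illustration of these notes, the sketch below computes how many windows a sequence of a given length produces and collapses per-window labels into contiguous runs, which makes a Normal → Attack → Normal transition easy to see. The helper names `window_count` and `label_runs` are hypothetical, and the label list in the usage example is made up for illustration rather than taken from the model.

```python
from itertools import groupby
from typing import List, Tuple


def window_count(n_calls: int, size: int = 18, stride: int = 1) -> int:
    """Number of full windows for a sequence of n_calls system calls."""
    if n_calls < size:
        return 0  # assumption: no partial windows are produced
    return (n_calls - size) // stride + 1


def label_runs(labels: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse per-window labels into (label, first_window, last_window) runs."""
    runs = []
    start = 0
    for label, group in groupby(labels):
        length = len(list(group))
        runs.append((label, start, start + length - 1))
        start += length
    return runs


if __name__ == "__main__":
    # Made-up per-window labels, for illustration only.
    labels = ["Normal"] * 3 + ["Attack"] * 2 + ["Normal"]
    for label, first, last in label_runs(labels):
        print(f"windows {first}-{last}: {label}")
```

With stride=1, window i covers system calls i through i+17, so a run of Attack windows can be mapped back to the region of the original sequence that triggered it.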
Developed as part of thesis research | Model trained on ADFA-LD dataset