Host-based Intrusion Detection System (HIDS)

Intrusion Detection using DistilBERT

About This Demo

This application demonstrates a machine learning-based intrusion detection system trained on the ADFA-LD dataset.

  • Model: DistilBERT fine-tuned for sequence classification
  • Dataset: ADFA-LD (Australian Defence Force Academy Linux Dataset)
  • Performance: 94.03% Accuracy | 94.50% F1-Score

Model Repository: salsazufar/distilbert-base-hids-adfa


Detection Pipeline

The system processes a system call sequence in three stages:

  1. Preprocessing - Converts the raw system call sequence into 18-gram sliding windows
  2. Inference - Classifies each window as Normal or Attack using the DistilBERT model
  3. Aggregation - Determines the final result from all window predictions

Detection Strategy: If any window is classified as Attack, the final result is Attack.
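
The three stages can be sketched as follows. This is a minimal illustration, not the demo's actual code: the function names, the handling of sequences shorter than 18 calls, and the toy classifier standing in for the DistilBERT model are all assumptions made for the example.

```python
from typing import Callable, List, Sequence

WINDOW_SIZE = 18  # n-gram length used by the demo
STRIDE = 1        # step between consecutive windows


def make_windows(calls: Sequence[int],
                 size: int = WINDOW_SIZE,
                 stride: int = STRIDE) -> List[List[int]]:
    """Stage 1: split a system call sequence into overlapping windows."""
    # Assumption: a sequence shorter than one window yields no windows.
    if len(calls) < size:
        return []
    return [list(calls[i:i + size])
            for i in range(0, len(calls) - size + 1, stride)]


def detect(calls: Sequence[int],
           classify_window: Callable[[List[int]], str]) -> str:
    """Stages 2 and 3: classify every window, then aggregate."""
    labels = [classify_window(window) for window in make_windows(calls)]
    # Any-window rule: one Attack window marks the whole sequence as Attack.
    return "Attack" if "Attack" in labels else "Normal"


def toy_classifier(window: List[int]) -> str:
    """Hypothetical stand-in for the model: flags windows containing call 999."""
    return "Attack" if 999 in window else "Normal"


if __name__ == "__main__":
    sequence = [3, 4, 5] * 10 + [999] + [3, 4, 5] * 10
    print(len(make_windows(sequence)), "windows ->", detect(sequence, toy_classifier))
```

In a real deployment, `classify_window` would wrap a call to the fine-tuned model; it is kept as a plain callable here so the windowing and aggregation logic can be checked without downloading anything.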


Input System Call Sequence


Detection Results


Notes

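  • The model uses 18-gram sliding windows with stride=1, so a sequence of n system calls (n ≥ 18) yields n − 17 overlapping windows and every run of 18 consecutive calls is classified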
  • Each window is independently classified, providing detailed analysis of the sequence
  • The aggregation strategy flags the entire sequence as Attack if any window is classified as Attack
  • The sample sequence demonstrates a transition from Normal → Attack → Normal behavior
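
Because each window is classified independently, the per-window labels can be collapsed into runs to show where in the sequence the behavior changes. The sketch below assumes the labels arrive as a list of strings in window order; the `label_runs` helper and the example labels are hypothetical and are not output from the model.

```python
from itertools import groupby
from typing import List, Tuple


def label_runs(labels: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse per-window labels into (label, first_window, last_window) runs."""
    runs = []
    index = 0
    for label, group in groupby(labels):
        length = len(list(group))
        runs.append((label, index, index + length - 1))
        index += length
    return runs


if __name__ == "__main__":
    # Made-up labels illustrating a Normal -> Attack -> Normal transition.
    example = ["Normal"] * 3 + ["Attack"] * 2 + ["Normal"] * 4
    print(label_runs(example))
    # [('Normal', 0, 2), ('Attack', 3, 4), ('Normal', 5, 8)]
```

With stride=1, window i covers system calls i through i + 17 (zero-based), so a run of Attack windows maps back to a contiguous region of the original sequence.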

Developed as part of thesis research | Model trained on ADFA-LD dataset