Network Traffic Analysis & Threat Detection
A network security analysis tool that processes PCAP files and live traffic to detect 8+ threat types through hybrid detection combining built-in rule-based checks, Sigma rules, and Random Forest ML classification.
Overview
A network security platform built in Python for analyzing captured traffic (PCAP files) and monitoring live network streams in real time. The system takes a hybrid detection approach that combines three methodologies: (1) rule-based detection for known attack patterns (large flows on ports 80/443, asymmetrical flows indicating DDoS, unusual port usage, DNS abuse, SYN floods, HTTP GET floods, ICMP floods, and port scanning), (2) a Sigma rules engine for custom YAML-based threat signatures loaded from the sigma_rules/ directory, and (3) machine learning classification using a Random Forest model to identify malicious traffic patterns.
Additional features include IP reputation checking via AbuseIPDB to identify known malicious actors, geolocation mapping that generates interactive HTML maps of the geographic distribution of detected threats, and report generation with visual analytics.
The ML classifier trains on labeled datasets (normal vs. malicious traffic) and can predict on new captures. Feature importance analysis reveals which network characteristics (packet size distribution, protocol headers, traffic volume patterns, inter-arrival times) are most predictive of malicious behavior. In testing, the Random Forest model achieved 97.3% accuracy with 93.6% precision and 95.6% recall.
The tool supports both offline PCAP analysis and live capture from network interfaces with configurable capture duration and port filtering. It is designed for security analysts, SOC teams, and network administrators who need automated threat detection with explainable model decisions and few false positives.
Key Highlights
- ✓8+ rule-based detections: DDoS, SYN/HTTP/ICMP floods, port scanning, DNS abuse
- ✓Hybrid threat detection: Sigma rules + Random Forest ML (achieved 97.3% accuracy in testing)
- ✓Live capture from network interfaces with configurable duration/filters
- ✓IP reputation checking via AbuseIPDB for known malicious actors
- ✓Interactive geolocation HTML maps showing threat origin distribution
- ✓Custom Sigma rules loaded from YAML files for extensible detection
- ✓Automated report generation with visual analytics and threat timelines
- ✓Feature importance analysis for explainable AI and model transparency
Technical Deep Dive
Hybrid Detection Architecture
The system combines three detection approaches to broaden threat coverage: (1) Rule-based detection identifies known attack patterns such as large flows (ports 80/443), asymmetrical flows (potential DDoS), SYN floods, HTTP GET floods, ICMP floods, port scanning, unusual port usage, and hosts generating heavy DNS traffic. (2) The Sigma rules engine loads custom YAML-based detection signatures from the sigma_rules/ directory, enabling security teams to define and maintain organization-specific threat patterns. (3) The machine learning classifier trains a Random Forest model on labeled datasets (normal vs. malicious traffic) to detect unknown threats through behavioral analysis. Layering the three is intended to reduce false positives while extending coverage from known attacks to previously unseen ones.
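The rule-based layer described above can be illustrated with two of its checks. This is a minimal sketch, not the project's implementation: the `Packet` record, the function names, and the threshold values are all hypothetical, and a real detector would also bound the counts to a time window.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Packet(NamedTuple):
    """Hypothetical minimal packet record (assumption: parsed elsewhere from PCAP data)."""
    src_ip: str
    dst_ip: str
    dst_port: int
    syn: bool
    ack: bool


def detect_syn_flood(packets: Iterable[Packet], threshold: int = 100) -> Dict[str, int]:
    """Return sources that sent at least `threshold` bare SYNs (SYN set, ACK clear)."""
    syn_counts: Dict[str, int] = defaultdict(int)
    for pkt in packets:
        if pkt.syn and not pkt.ack:
            syn_counts[pkt.src_ip] += 1
    return {ip: count for ip, count in syn_counts.items() if count >= threshold}


def detect_port_scan(packets: Iterable[Packet], min_ports: int = 20) -> Dict[Tuple[str, str], int]:
    """Return (source, target) pairs where the source touched at least `min_ports` distinct ports."""
    ports_seen: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
    for pkt in packets:
        ports_seen[(pkt.src_ip, pkt.dst_ip)].add(pkt.dst_port)
    return {pair: len(ports) for pair, ports in ports_seen.items() if len(ports) >= min_ports}
```

Both functions take any iterable of packets, so the same checks could in principle run over a parsed PCAP file or a live capture buffer. The thresholds are illustrative defaults and would need tuning per network.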
PCAP Analysis & Live Capture
The flows_analyzer.py module operates in two modes. Offline mode analyzes existing PCAP files, with options to specify a perspective IP (your own IP, which is excluded from analysis) and to run the full detection suite via the --overall flag. Live capture mode monitors real-time traffic on a specified network interface (use --list-interfaces to discover available options) with configurable duration (--duration) and port filtering (--port) for targeted analysis. Both modes can run individual detections (--syn-flood, --ports-scanner, --sigma-analysis) for focused investigation, and multiple captures can be processed in batch. The system handles large PCAP files through streaming analysis and provides detailed per-flow statistics for forensic investigation.
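The streaming analysis and per-flow statistics mentioned above could look roughly like the following. This is a sketch under stated assumptions: the record layout, `FlowStats`, and `aggregate_flows` are hypothetical names, and the `port` argument only mirrors the idea of port filtering rather than the tool's actual --port handling.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

# Assumed record layout: (timestamp, src_ip, dst_ip, src_port, dst_port, protocol, length)
PacketRecord = Tuple[float, str, str, int, int, str, int]
FlowKey = Tuple[str, str, int, int, str]


@dataclass
class FlowStats:
    packets: int = 0
    byte_count: int = 0
    first_seen: float = 0.0
    last_seen: float = 0.0

    @property
    def duration(self) -> float:
        return self.last_seen - self.first_seen


def aggregate_flows(records: Iterable[PacketRecord],
                    port: Optional[int] = None) -> Dict[FlowKey, FlowStats]:
    """Consume packets one at a time and keep only per-flow counters in memory."""
    flows: Dict[FlowKey, FlowStats] = {}
    for ts, src, dst, sport, dport, proto, length in records:
        # Optional port filter: skip packets that do not involve the given port.
        if port is not None and port not in (sport, dport):
            continue
        stats = flows.setdefault((src, dst, sport, dport, proto), FlowStats())
        if stats.packets == 0:
            stats.first_seen = ts
        stats.packets += 1
        stats.byte_count += length
        stats.last_seen = ts
    return flows
```

Because `records` can be a generator, memory use grows with the number of distinct flows rather than the number of packets, which is the property that makes large captures tractable.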
Machine Learning Classification & Feature Analysis
The ml_flows_analyzer.py module implements a Random Forest classifier for malicious traffic detection. Training mode accepts two PCAP files (normal traffic and malicious traffic) to build the model, automatically extracting features including packet size distribution, protocol headers, traffic volume patterns, and inter-arrival times. Prediction mode applies the trained model to new captures for threat classification. Feature importance analysis reveals which network characteristics drive classification decisions, so security analysts can understand the detection logic and validate model behavior. In testing with representative datasets, the model achieved 97.3% accuracy with 93.6% precision, 95.6% recall, a 94.6% F1-score, and an AUC of 0.973. The confusion matrix showed few false positives (at 93.6% precision, about 6.4% of samples flagged as malicious were false alarms), which matters in SOC environments where alert fatigue is a critical concern. Actual performance will vary based on training data quality and network environment characteristics.
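To make the feature-extraction step more concrete, here is one way packet size and inter-arrival features could be computed for a single flow. It is an illustrative sketch only: the function name, the input layout, and the exact feature set are assumptions, not the features ml_flows_analyzer.py actually produces.

```python
import statistics
from typing import Dict, Sequence, Tuple


def extract_flow_features(packets: Sequence[Tuple[float, int]]) -> Dict[str, float]:
    """Summarize one flow given (timestamp_seconds, size_bytes) pairs.

    Hypothetical helper: covers packet size distribution, traffic volume,
    and inter-arrival times; protocol header fields are left out.
    """
    if not packets:
        raise ValueError("flow has no packets")
    ordered = sorted(packets)  # order by timestamp
    times = [t for t, _ in ordered]
    sizes = [s for _, s in ordered]
    # Gaps between consecutive packets; empty for a single-packet flow.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "packet_count": float(len(sizes)),
        "total_bytes": float(sum(sizes)),
        "size_mean": statistics.fmean(sizes),
        "size_std": statistics.pstdev(sizes),
        "iat_mean": statistics.fmean(gaps) if gaps else 0.0,
        "iat_std": statistics.pstdev(gaps) if gaps else 0.0,
    }
```

One such dictionary per flow, stacked into a matrix with a 0/1 label for normal vs. malicious, is the kind of input a Random Forest classifier can be trained on. The per-feature importances reported by the fitted model then map back to these column names.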
Threat Intelligence & Geolocation
Integration with AbuseIPDB provides real-time IP reputation checks, querying the global threat database to identify known malicious actors, botnets, and command-and-control servers. The check_ip_reputation() function in raport_generator.py checks detected IPs against threat intelligence feeds, enriching alerts with abuse confidence scores, usage types, and historical attack records. Geographic threat distribution is visualized through generate_ip_location_map(), which creates interactive HTML maps using folium and the ipinfo.io geolocation API. These maps display detected threat origins with color-coded severity indicators, helping security teams identify geographic attack patterns, nation-state actor campaigns, and coordinated distributed attacks. The visual analytics support executive reporting and compliance documentation by providing clear evidence of the threat landscape and attack origins.
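A reputation lookup of this kind might be sketched as follows, using only the standard library. This is not the project's check_ip_reputation() function: the helper names, the 90-day lookback, and the score threshold of 50 are assumptions chosen for illustration. The endpoint, the `Key` header, and the response field names follow the AbuseIPDB v2 check API.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

ABUSEIPDB_CHECK_URL = "https://api.abuseipdb.com/api/v2/check"


def build_check_request(ip: str, api_key: str, max_age_days: int = 90) -> urllib.request.Request:
    """Build the GET request for one IP lookup (no network traffic happens here)."""
    query = urllib.parse.urlencode({"ipAddress": ip, "maxAgeInDays": max_age_days})
    return urllib.request.Request(
        f"{ABUSEIPDB_CHECK_URL}?{query}",
        headers={"Key": api_key, "Accept": "application/json"},
    )


def summarize_reputation(payload: Dict[str, Any], threshold: int = 50) -> Dict[str, Any]:
    """Reduce a decoded API response to the fields used for alert enrichment."""
    data = payload.get("data") or {}
    score = int(data.get("abuseConfidenceScore") or 0)
    return {
        "ip": data.get("ipAddress"),
        "score": score,
        "usage_type": data.get("usageType"),
        "total_reports": data.get("totalReports") or 0,
        "malicious": score >= threshold,  # threshold is an illustrative assumption
    }


def lookup_ip(ip: str, api_key: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Perform the lookup; requires network access and a valid API key."""
    with urllib.request.urlopen(build_check_request(ip, api_key), timeout=timeout) as resp:
        return summarize_reputation(json.load(resp))
```

Keeping request construction and response parsing separate from the network call makes the enrichment logic testable offline and leaves room for caching, which helps with API rate limits.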
Extensibility & Custom Detection Rules
The Sigma rules engine (read_sigma.py) provides a flexible framework for defining custom threat detection logic through YAML-based rule files stored in the sigma_rules/ directory. Security teams can create organization-specific detection signatures without modifying Python code, which allows a quick response to emerging threats and customization for unique network environments. The system automatically loads all .yml files from the sigma_rules/ directory and supports complex rule logic, including AND/OR conditions, field matching, and value lists. This extensibility makes the platform adaptable to security requirements ranging from home labs to enterprise SOCs. Because the ML classifier can also be retrained on new data, the system supports continuous improvement through threat intelligence integration and feedback from security analysts who validate detection accuracy.
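The rule logic described above (field matching, value lists, AND/OR conditions) can be sketched with a small evaluator. This is a hypothetical illustration covering only a subset of Sigma, not the code in read_sigma.py: it assumes the rule has already been parsed from YAML into a dictionary, and it ignores field modifiers and "1 of ..." style conditions.

```python
import re
from typing import Any, Dict


def selection_matches(selection: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """All fields must match (AND); a list of values matches if any value is equal (OR)."""
    for field, expected in selection.items():
        allowed = expected if isinstance(expected, list) else [expected]
        if event.get(field) not in allowed:
            return False
    return True


def rule_matches(rule: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Evaluate a rule's condition (and / or / not / parentheses) against one event."""
    detection = rule["detection"]
    results = {
        name: selection_matches(selection, event)
        for name, selection in detection.items()
        if name != "condition"
    }
    parts = []
    for token in re.findall(r"\(|\)|[A-Za-z_][A-Za-z0-9_]*", detection["condition"]):
        if token in ("and", "or", "not", "(", ")"):
            parts.append(token)
        elif token in results:
            parts.append(str(results[token]))  # substitutes the literal True or False
        else:
            raise ValueError(f"unknown selection in condition: {token}")
    # The expression now holds only True/False, boolean operators, and parentheses.
    return bool(eval(" ".join(parts), {"__builtins__": {}}, {}))


# Example rule as it might look after YAML parsing (field names are illustrative).
EXAMPLE_RULE = {
    "title": "DNS traffic to external resolver",
    "detection": {
        "dns": {"dst_port": 53, "protocol": "udp"},
        "internal": {"dst_ip": ["10.0.0.53", "10.0.0.54"]},
        "condition": "dns and not internal",
    },
}
```

Rules stay as plain data, so a new signature is added by dropping another file into the rules directory rather than by changing the evaluator.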




