STATISTICAL ANALYSIS OF ANOMALY DETECTION ALGORITHMS IN BIG DATA ENVIRONMENTS
Keywords:
Anomaly Detection, Big Data, Machine Learning, Smart Metering, Cybersecurity, Intrusion Detection System, IoT, Turbomachinery, Unsupervised Learning, Spectral Residual-Convolutional Neural Network, One-Class Support Vector Machine, Blockchain, Apache Spark, Real-Time Detection.
Abstract
In the era of big data, detecting anomalies has become a critical task across various industries, including utilities, cybersecurity, and the petroleum industry. This paper presents a comprehensive statistical analysis of anomaly detection algorithms applied in big data environments, focusing on smart metering in the utility sector, cybersecurity, turbomachinery in the petroleum industry, and the Internet of Things (IoT). We explore the application of unsupervised and supervised machine learning (ML) techniques for identifying anomalous patterns in time-series data. A hybrid approach combining Spectral Residual-Convolutional Neural Networks (SR-CNN) and martingale-based anomaly detection is applied to smart meter data, achieving high accuracy in identifying suspicious behavior. We also present a cloud-based Intrusion Detection System (IDS) that leverages Apache Spark and the MAWILab dataset, demonstrating the system's efficacy in real-time cyber-attack detection with near-perfect accuracy. Additionally, we examine the use of one-class support vector machines and YASA segmentation for anomaly detection in turbomachinery, addressing the challenges posed by unlabeled data in high-frequency sensor environments. Lastly, we evaluate anomaly detection methodologies for IoT systems, highlighting the potential of blockchain-based collaborative learning for enhancing security in these resource-constrained and distributed networks. The paper concludes with a statistical assessment of performance metrics across these domains, emphasizing the effectiveness of machine learning algorithms in big data environments.

