Department of Management Information System, Al-Istiqlal University, Jericho P.O. Box 10, Palestine
World Journal of Advanced Engineering Technology and Sciences, 2025, 16(03), 176–188
Article DOI: 10.30574/wjaets.2025.16.3.1330
Received on 27 July 2025; revised on 06 September 2025; accepted on 08 September 2025
Distributed computer systems are the cornerstone of present-day computing infrastructure, but their complexity makes them susceptible to undetected errors that can reduce performance and reliability. Although previous studies have shown how Artificial Intelligence (AI) can improve fault detection in energy systems, motors, and cloud infrastructures, a research gap remains in applying AI-based methods directly to multi-metric, CPU-level performance monitoring in distributed computing platforms. This paper addresses that gap by developing an AI-based framework that combines principal component analysis (PCA), clustering (K-means), anomaly detection (One-Class SVM), and correlation analysis, and by evaluating it on real operational data. A dataset of 8,673 records was examined, comprising CPU utilization, temperature, clock speed, cache miss rate, and power consumption. The data indicated that the system was running in a steady state, with a mean CPU utilization of 50.87% and a mean temperature of 60.2 °C, which suggests that thermal management is efficient. Nevertheless, the peak values of power consumption (1264.5 W) and temperature (120 °C) marked periods of high load that call for better control of power and cooling. PCA and K-means clustering identified three operating modes: two dominant clusters (47.78% and 51.47% of samples) representing normal states, and one minor cluster (0.75% of samples) suggesting transitional or possibly anomalous states. By comparison, the One-Class SVM model did not flag these rare cases as anomalies, which points to a limitation in its sensitivity. Overall, this study presents a new AI-based approach to system-level fault detection that provides operational insight and a basis for building predictive, fault-tolerant strategies in distributed computer systems.
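The combination of methods named in the abstract can be illustrated with a minimal sketch, assuming scikit-learn and NumPy. It is not the author's implementation: the abstract does not give preprocessing steps or hyperparameters, so the standardization, the two principal components, the One-Class SVM settings (`nu`, `gamma`), the column names, and the synthetic data generated by the hypothetical `make_synthetic` helper are all assumptions made for illustration. Only the choice of three clusters and the five metrics come from the abstract.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Hypothetical column names for the five metrics listed in the abstract.
FEATURES = ["cpu_util", "temperature", "clock_speed", "cache_miss_rate", "power"]


def make_synthetic(rng, n_a=480, n_b=510, n_rare=10):
    """Generate illustrative data only (NOT the paper's dataset):
    two large operating modes plus a small high-load mode."""
    mode_a = rng.normal([35, 52, 2.4, 0.04, 400], [5, 3, 0.2, 0.01, 40], size=(n_a, 5))
    mode_b = rng.normal([65, 68, 3.2, 0.06, 700], [5, 3, 0.2, 0.01, 50], size=(n_b, 5))
    rare = rng.normal([95, 110, 3.8, 0.15, 1200], [2, 4, 0.1, 0.02, 40], size=(n_rare, 5))
    return np.vstack([mode_a, mode_b, rare])


def analyze(X, n_clusters=3, nu=0.01):
    """Standardize, project with PCA, cluster with K-means, flag outliers
    with a One-Class SVM, and compute the metric correlation matrix."""
    # Standardize so metrics with large units (e.g. watts) do not dominate.
    Z = StandardScaler().fit_transform(X)

    # Assumed: keep two principal components for clustering.
    pca = PCA(n_components=2)
    projected = pca.fit_transform(Z)

    # Three clusters, matching the three operating modes in the abstract.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = kmeans.fit_predict(projected)
    shares = np.bincount(labels, minlength=n_clusters) / len(labels)

    # One-Class SVM returns +1 for inliers and -1 for outliers.
    # nu and gamma are assumed values, not taken from the paper.
    svm = OneClassSVM(kernel="rbf", gamma="scale", nu=nu)
    svm_flags = svm.fit_predict(Z)

    # Pearson correlation between the five metrics.
    correlation = np.corrcoef(Z, rowvar=False)

    return {
        "explained_variance": pca.explained_variance_ratio_,
        "labels": labels,
        "cluster_shares": shares,
        "svm_flags": svm_flags,
        "correlation": correlation,
    }


rng = np.random.default_rng(0)
result = analyze(make_synthetic(rng))
print("cluster shares:", np.round(result["cluster_shares"], 4))
print("samples flagged by One-Class SVM:", int((result["svm_flags"] == -1).sum()))
```

Comparing the members of the smallest K-means cluster with the samples flagged by the One-Class SVM is one way to check whether the two methods agree on rare states, which is the kind of disagreement the abstract reports.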
CPU Performance Monitoring; Statistical Analysis; Clustering; Anomaly Detection; Power Consumption
Bahaa Yahya. Artificial Intelligence–Driven Fault Detection in Distributed Computer Systems. World Journal of Advanced Engineering Technology and Sciences, 2025, 16(03), 176–188. Article DOI: https://doi.org/10.30574/wjaets.2025.16.3.1330.